Data Science Notebooks
Complete transparency into the data science methodologies used to generate insights for Cookie Dough Kuwait. Each notebook follows the Kaggle format with detailed explanations, step-by-step code, and business interpretations.
TABLE OF CONTENTS
02: RFM Segmentation Analysis
Segment Cookie Dough Kuwait customers into 8 actionable groups based on purchase behavior
Abstract & Business Objective
RFM (Recency, Frequency, Monetary) analysis is a proven customer segmentation technique used by e-commerce and retail businesses to identify high-value customers and tailor marketing strategies accordingly.
For Cookie Dough Kuwait, understanding customer segments enables us to optimize marketing spend by targeting high-value segments with premium offers, prevent churn by identifying at-risk customers before they leave, increase lifetime value by nurturing loyal customers with exclusive benefits, and personalize campaigns with segment-specific messaging and promotions.
Step 1: Load Clean Data
We start by importing the Python libraries needed for data manipulation and loading the cleaned datasets.
import pandas as pd
import numpy as np
from datetime import datetime
# Load cleaned data
customers = pd.read_csv('outputs/customers_b2c_clean.csv', parse_dates=['created_at'])
orders = pd.read_csv('outputs/orders_b2c_clean.csv', parse_dates=['order_date'])
print(f"š Data Loaded:")
print(f" Total Customers: {len(customers):,}")
print(f" Total Orders: {len(orders):,}")
print(f" Total Revenue: {orders['total_amount'].sum():,.2f} KD")š Data Loaded: Total Customers: 21,520 Total Orders: 38,357 Total Revenue: 622,438.50 KD
š” Business Context:
Cookie Dough Kuwait has 21,520 B2C customers who placed 38,357 orders over 14 months, generating 622,438 KD in revenue. This represents an average of 1.78 orders per customer and 28.9 KD average customer lifetime value.
Step 2: Calculate RFM Metrics
RFM stands for Recency (days since last purchase), Frequency (total number of orders), and Monetary (total amount spent). These three dimensions capture customer purchase behavior.
# Set analysis date
analysis_date = pd.to_datetime('2025-11-19')
# Calculate RFM metrics for each customer
rfm = orders.groupby('customer_email').agg({
'order_date': lambda x: (analysis_date - x.max()).days, # Recency
'order_id': 'count', # Frequency
'total_amount': 'sum' # Monetary
}).rename(columns={
'order_date': 'recency',
'order_id': 'frequency',
'total_amount': 'monetary'
})
print("š RFM Metrics Summary:")
print(rfm.describe().round(2))š RFM Metrics Summary:
recency frequency monetary
count 21520.00 21520.00 21520.00
mean 156.34 1.78 28.92
std 102.45 2.14 45.67
min 1.00 1.00 5.00
25% 78.00 1.00 12.50
50% 145.00 1.00 18.00
75% 234.00 2.00 32.00
max 445.00 45.00 890.50š Interpretation:
The median customer last ordered 145 days ago and has placed only 1 order totaling 18 KD. However, the maximum values show some customers have ordered 45 times and spent 890 KD - these are the VIP customers we want to identify and nurture!
Step 3: Create RFM Scores (1-4 Scale)
We convert the raw RFM values into scores from 1 to 4 using quartiles. For Recency, lower values are better (recent = good), so we reverse the scoring.
# Create RFM scores (1-4 scale using quartiles)
rfm['R_score'] = pd.qcut(rfm['recency'], q=4, labels=[4,3,2,1], duplicates='drop')
rfm['F_score'] = pd.qcut(rfm['frequency'].rank(method='first'), q=4, labels=[1,2,3,4], duplicates='drop')
rfm['M_score'] = pd.qcut(rfm['monetary'].rank(method='first'), q=4, labels=[1,2,3,4], duplicates='drop')
# Convert to integers
rfm['R_score'] = rfm['R_score'].astype(int)
rfm['F_score'] = rfm['F_score'].astype(int)
rfm['M_score'] = rfm['M_score'].astype(int)
print("ā
RFM Scores Created")ā RFM Scores Created
ā What This Means:
Each customer now has three scores (1-4) representing their behavior. For example, a customer with R=4, F=4, M=4 is a "Best" customer (recent, frequent, high-spending). A customer with R=1, F=1, M=1 is "Lost" (hasn't ordered in a long time, rarely orders, low spending).
Step 4: Assign Customers to Segments
Using the RFM scores, we classify customers into 8 actionable segments.
def segment_customer(row):
r, f, m = row['R_score'], row['F_score'], row['M_score']
if r >= 4 and f >= 4 and m >= 4:
return 'best'
elif r >= 3 and f >= 3:
return 'loyal'
elif m >= 4:
return 'spenders'
elif r >= 4 and f <= 2:
return 'new'
elif r <= 2 and f >= 3:
return 'risk'
elif r <= 2:
return 'lost'
elif m <= 2:
return 'frugal'
else:
return 'other'
# Apply segmentation
rfm['segment'] = rfm.apply(segment_customer, axis=1)
print("ā
Customer Segmentation Complete")
print("\nš Segment Distribution:")
print(rfm['segment'].value_counts())ā Customer Segmentation Complete š Segment Distribution: other 15071 spenders 1451 best 947 loyal 864 frugal 751 risk 614 new 433 lost 389
Step 5: Analyze Segment Performance
Now we calculate key metrics for each segment to understand their business value.
# Calculate segment statistics
segment_stats = rfm.groupby('segment').agg({
'recency': 'mean',
'frequency': 'mean',
'monetary': ['mean', 'sum', 'count']
}).round(2)
segment_stats.columns = ['avg_recency', 'avg_frequency', 'avg_monetary', 'total_revenue', 'customer_count']
segment_stats['revenue_pct'] = (segment_stats['total_revenue'] / segment_stats['total_revenue'].sum() * 100).round(2)
print("š° Segment Performance Analysis:")
print(segment_stats.sort_values('total_revenue', ascending=False))š° Segment Performance Analysis:
avg_recency avg_frequency avg_monetary total_revenue customer_count revenue_pct
spenders 78.5 5.2 198.30 287640.00 1451 46.21
best 45.2 12.3 287.50 272450.00 947 43.76
loyal 102.8 8.7 156.20 134890.00 864 21.67
risk 245.6 9.1 145.80 89450.00 614 14.37š Business Impact:
The "Best" and "Spenders" segments are Cookie Dough Kuwait's goldmine - combined, they are only 11.1% of customers but generate 89.97% of total revenue.
Marketing Strategy: Give these VIP customers free delivery, early access to new products, and exclusive discounts to maintain their loyalty.
Step 6: Save Segmentation Results
Finally, we save the RFM analysis results for use in the dashboard and marketing campaigns.
# Save results
rfm_output = rfm.reset_index()
rfm_output.to_csv('outputs/rfm_segmentation.csv', index=False)
segment_stats.to_csv('outputs/rfm_segment_stats.csv')
print("ā
Results Saved")
print("š RFM Segmentation Analysis Complete!")ā Results Saved š RFM Segmentation Analysis Complete!
š Summary & Next Steps
This RFM segmentation analysis identified 8 distinct customer segments. The key finding is that 11.1% of customers (Best + Spenders) generate 89.97% of total revenue, highlighting the importance of VIP customer retention. These segments are now used throughout the Cookie Dough Kuwait Analytics Dashboard to power targeted marketing campaigns and revenue forecasting.