Customer Clustering Analysis

K-Means clustering with elbow method to discover natural customer groupings

Optimal Clusters
3

K-Means clusters

Silhouette Score
0.4544

Cluster quality metric

Total Customers
19,536

Analyzed customers

Largest Cluster
49.1%

9,601 customers

Elbow Method - Optimal Cluster Selection
Inertia (within-cluster sum of squares) and silhouette score for k=2 to k=10
[Chart: Inertia (left axis, 0 to 34,000) and Silhouette Score (right axis, 0 to 0.6) plotted against Number of Clusters (k), for k = 2 to 10]

Interpretation: The elbow point indicates where adding more clusters provides diminishing returns. Higher silhouette scores (closer to 1) indicate better-defined clusters.
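To make the elbow idea concrete, here is a minimal sketch of an inertia sweep. It is not the pipeline behind this dashboard: the k-means routine is a plain Lloyd's algorithm with deterministic farthest-point seeding, the helper names (sq_dist, kmeans, inertia) are illustrative, and the twelve 2-D points are invented toy data with three obvious groups.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    # Deterministic seeding: start from the first point, then repeatedly
    # add the point farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment against the final centers.
    labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
    return labels, centers

def inertia(points, labels, centers):
    """Within-cluster sum of squared distances."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

# Invented toy data: three tight groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (8, 10), (8, 11), (9, 10), (9, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

inertias = {}
for k in range(1, 5):
    labels, centers = kmeans(points, k)
    inertias[k] = inertia(points, labels, centers)
    print(k, inertias[k])
```

On this toy data the inertia falls steeply up to k=3 and only marginally afterwards, which is the pattern the elbow method looks for. Real data rarely shows such a sharp bend, so the elbow is usually read together with the silhouette score.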

Understanding Silhouette Score & Optimal K Selection

What is the Silhouette Score?

The Silhouette Score is a metric that measures how well-defined and separated clusters are. It ranges from -1 to +1:

  • +1: Perfect clustering - objects are very close to their own cluster and far from other clusters
  • 0: Overlapping clusters - objects are on the boundary between clusters
  • -1: Incorrect clustering - objects are closer to neighboring clusters than their own
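As a rough illustration of how this score is computed, the sketch below implements the standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The function name and the four sample points are illustrative assumptions, and Euclidean distance is assumed.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

# Invented sample: two tight pairs, far apart.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_score(pts, [0, 1, 0, 1])   # each point grouped with a distant one
print(round(good, 3), round(bad, 3))
```

The sensible grouping scores close to +1, while the mismatched grouping scores below 0, matching the interpretation of the scale above.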

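For Cookie Dough Kuwait, our silhouette score of 0.4544 indicates moderately well-separated customer segments: on average, customers sit noticeably closer to their own cluster than to the nearest other cluster, although some overlap between segments remains.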

Why k=3 is Optimal

While the analysis shows that k=7 (silhouette: 0.4458) and k=10 (silhouette: 0.4503) have slightly higher silhouette scores than k=3 (silhouette: 0.4065), we chose k=3 as optimal based on three key factors:

1. Elbow Method

The inertia curve shows a clear "elbow" at k=3, where inertia drops from 31,247 (k=2) to 21,004 (k=3) - a 32.8% reduction. Beyond k=3, improvements are incremental, indicating diminishing returns.
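The 32.8% figure can be checked directly from the two inertia values quoted above. The variable names in this short sketch are illustrative.

```python
inertia_k2 = 31_247  # inertia at k=2, as quoted above
inertia_k3 = 21_004  # inertia at k=3, as quoted above

# Relative drop in inertia when going from k=2 to k=3.
reduction = (inertia_k2 - inertia_k3) / inertia_k2
print(f"{reduction:.1%}")  # 32.8%
```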

2. Business Interpretability

Three clusters provide actionable, interpretable segments: High-Value Loyal, Recent/Active, and Dormant/Lost. Seven or ten clusters would be too granular for effective marketing targeting and operational execution.

3. Overfitting Risk

Higher k values (k=7, k=10) risk overfitting - creating clusters that capture noise rather than meaningful patterns. k=3 balances model complexity with generalizability.

Key Takeaway

The optimal number of clusters is not solely determined by the highest silhouette score. Instead, it requires balancing statistical metrics (elbow point, silhouette score) with business needs (interpretability, actionability) and model complexity (avoiding overfitting). For Cookie Dough Kuwait, k=3 achieves this optimal balance.

Cluster Distribution
Customer count by cluster

Cluster Profiles
Detailed characteristics of each cluster

Dormant/Lost

9,601 customers (49.1%)

Long-inactive customers with low engagement

Recency
1,429 days
Frequency
1.22 orders
Monetary
11.99 KD

Recent/Active

8,021 customers (41.1%)

Recently active customers

Recency
483 days
Frequency
1.31 orders
Monetary
13.38 KD

High-Value Loyal

1,914 customers (9.8%)

Frequent buyers with high spend

Recency
692 days
Frequency
4.42 orders
Monetary
49.95 KD
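
The percentage shares shown in the profiles follow from the cluster counts and the total of 19,536 analyzed customers. A small sketch of that arithmetic, with illustrative variable names:

```python
# Cluster sizes as reported in the profiles above.
cluster_sizes = {
    "Dormant/Lost": 9_601,
    "Recent/Active": 8_021,
    "High-Value Loyal": 1_914,
}

total = sum(cluster_sizes.values())

# Share of all analyzed customers per cluster, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in cluster_sizes.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The three counts add up to the 19,536 analyzed customers, so every customer belongs to exactly one cluster.
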
RFM Segments vs K-Means Clusters
Cross-tabulation showing how RFM segments map to K-Means clusters (percentage within each RFM segment)
RFM Segment   Dormant/Lost   High-Value Loyal   Recent/Active
best                  0.0%              37.0%           63.0%
frugal                2.1%               0.0%           97.9%
lost                 99.9%               0.1%            0.0%
loyal                13.0%              14.5%           72.5%
new                   0.0%               0.0%          100.0%
other                56.7%               0.0%           43.3%
risk                 79.2%              20.8%            0.0%
spenders             25.7%               1.7%           72.6%
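
A cross-tabulation like this can be built by counting (RFM segment, cluster) pairs and normalizing each row by its segment total. The sketch below shows one way to do that with the standard library. The function name is hypothetical and the sample pairs are made-up toy data, not the customer records behind this table.

```python
from collections import Counter, defaultdict

def row_percentages(pairs):
    """Percentage of each segment's customers that fall in each cluster."""
    counts = defaultdict(Counter)
    for segment, cluster in pairs:
        counts[segment][cluster] += 1
    table = {}
    for segment, row in counts.items():
        segment_total = sum(row.values())
        table[segment] = {c: 100.0 * n / segment_total for c, n in row.items()}
    return table

# Made-up toy assignments: one (segment, cluster) pair per customer.
sample = (
    [("lost", "Dormant/Lost")] * 3
    + [("lost", "High-Value Loyal")] * 1
    + [("new", "Recent/Active")] * 2
)

crosstab = row_percentages(sample)
for segment, row in crosstab.items():
    print(segment, row)
```

Because each row is normalized separately, the percentages in a row sum to 100%, while the columns do not.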

Key Insights

  • "lost" RFM segment: Almost entirely (99.89%) falls into the "Dormant/Lost" cluster
  • "best" RFM segment: Splits between "High-Value Loyal" (37%) and "Recent/Active" (63%)
  • "frugal" RFM segment: Predominantly (98%) in "Recent/Active" cluster
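  • K-Means groups customers differently from quartile-based RFM segmentation: for example, the "risk" RFM segment splits between "Dormant/Lost" (79.2%) and "High-Value Loyal" (20.8%)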