Customer Clustering Analysis
K-Means clustering with elbow method to discover natural customer groupings
K-Means clusters
Cluster quality metric
Analyzed customers
9,601 customers
[Chart: Inertia and Silhouette Score by number of clusters (k)]
Interpretation: The elbow point indicates where adding more clusters provides diminishing returns. Higher silhouette scores (closer to 1) indicate better-defined clusters.
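To make the elbow idea concrete, here is a minimal sketch of an inertia sweep over k. It is not the pipeline behind this report: the toy points, the helper names (`sq_dist`, `farthest_point_init`, `kmeans_inertia`) and the deterministic farthest-point initialisation are all assumptions chosen so the example is reproducible. In practice a library implementation such as scikit-learn's `KMeans` (which exposes `inertia_`) would normally be used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not k-means++): start at the
    first point, then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]

def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the final inertia
    (sum of squared distances from each point to its centroid)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical toy data: three tight groups of five points each.
centers = [(0, 0), (10, 0), (5, 9)]
offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
for k in range(2, 6):
    drop = 1 - inertias[k] / inertias[k - 1]
    print(f"k={k}: inertia={inertias[k]:.2f}, relative drop={drop:.1%}")
```

On this toy data the largest relative drop occurs when moving from k=2 to k=3, and further increases in k reduce inertia only marginally, which is the pattern the elbow method looks for.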
What is the Silhouette Score?
The Silhouette Score is a metric that measures how well-defined and separated clusters are. It ranges from -1 to +1:
- +1: Perfect clustering - objects are very close to their own cluster and far from other clusters
- 0: Overlapping clusters - objects are on the boundary between clusters
- -1: Incorrect clustering - objects are closer to neighboring clusters than their own
For Cookie Dough Kuwait, our silhouette score of 0.4544 indicates reasonably well-defined customer segments with clear behavioral differences.
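As an illustration of the definition above, the following sketch computes the mean silhouette score from scratch. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the point's score is (b - a) / max(a, b). The function name and the four toy points are assumptions for illustration only; a library routine such as scikit-learn's `silhouette_score` would be the usual choice on real data.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette over all points (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: two obvious groups.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # groups match the labels
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good labelling: {good:.3f}, bad labelling: {bad:.3f}")
```

The well-matched labelling scores close to +1, while the labelling that cuts across the two groups scores below 0, matching the ends of the scale described above.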
Why k=3 is Optimal
While the analysis shows that k=7 (silhouette: 0.4458) and k=10 (silhouette: 0.4503) have slightly higher silhouette scores than k=3 (silhouette: 0.4065), we chose k=3 as optimal based on three key factors:
1. Elbow Method
The inertia curve shows a clear "elbow" at k=3, where inertia drops from 31,247 (k=2) to 21,004 (k=3) - a 32.8% reduction. Beyond k=3, improvements are incremental, indicating diminishing returns.
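The 32.8% figure can be checked directly from the two inertia values quoted above. The helper name `relative_drop` is only an illustrative choice.

```python
def relative_drop(before, after):
    """Fractional reduction in inertia when moving from one k to the next."""
    return (before - after) / before

inertia_k2 = 31_247  # inertia at k=2, as reported above
inertia_k3 = 21_004  # inertia at k=3, as reported above

reduction = relative_drop(inertia_k2, inertia_k3)
print(f"Reduction from k=2 to k=3: {reduction:.1%}")
```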
2. Business Interpretability
Three clusters provide actionable, interpretable segments: High-Value Loyal, Recent/Active, and Dormant/Lost. Seven or ten clusters would be too granular for effective marketing targeting and operational execution.
3. Overfitting Risk
Higher k values (k=7, k=10) risk overfitting - creating clusters that capture noise rather than meaningful patterns. k=3 balances model complexity with generalizability.
Key Takeaway
The optimal number of clusters is not solely determined by the highest silhouette score. Instead, it requires balancing statistical metrics (elbow point, silhouette score) with business needs (interpretability, actionability) and model complexity (avoiding overfitting). For Cookie Dough Kuwait, k=3 achieves this optimal balance.
Dormant/Lost
9,601 customers (49.1%): Long-inactive customers with low engagement
Recent/Active
8,021 customers (41.1%): Recently active customers
High-Value Loyal
1,914 customers (9.8%): Frequent buyers with high spend
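The percentages above follow from the three cluster sizes. A short check, using only the counts listed in this section (the variable names are illustrative):

```python
cluster_sizes = {
    "Dormant/Lost": 9_601,
    "Recent/Active": 8_021,
    "High-Value Loyal": 1_914,
}

total_customers = sum(cluster_sizes.values())
shares = {name: 100 * n / total_customers for name, n in cluster_sizes.items()}

print(f"Total customers across clusters: {total_customers:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The three clusters together cover 19,536 customers, which is the base for the 49.1%, 41.1% and 9.8% shares.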
| RFM Segment | Dormant/Lost | High-Value Loyal | Recent/Active |
|---|---|---|---|
| best | 0.0% | 37.0% | 63.0% |
| frugal | 2.1% | 0.0% | 97.9% |
| lost | 99.9% | 0.1% | 0.0% |
| loyal | 13.0% | 14.5% | 72.5% |
| new | 0.0% | 0.0% | 100.0% |
| other | 56.7% | 0.0% | 43.3% |
| risk | 79.2% | 20.8% | 0.0% |
| spenders | 25.7% | 1.7% | 72.6% |
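Each row of the table above sums to 100%, so it reads as a row-normalised cross-tabulation of RFM segment against K-Means cluster. The sketch below shows one way such a table could be built from per-customer (segment, cluster) pairs. The pairs are made-up illustrative data, not the report's customers, and `row_percentages` is a hypothetical helper. With pandas, `pandas.crosstab(..., normalize="index")` produces the same kind of table.

```python
from collections import Counter, defaultdict

def row_percentages(pairs):
    """Cross-tabulate (segment, cluster) pairs and express each row in percent."""
    counts = defaultdict(Counter)
    for segment, cluster in pairs:
        counts[segment][cluster] += 1

    table = {}
    for segment, row in counts.items():
        row_total = sum(row.values())
        table[segment] = {cluster: 100 * n / row_total for cluster, n in row.items()}
    return table

# Hypothetical per-customer labels (illustrative only).
pairs = (
    [("risk", "Dormant/Lost")] * 4
    + [("risk", "High-Value Loyal")] * 1
    + [("new", "Recent/Active")] * 3
)

table = row_percentages(pairs)
for segment, row in table.items():
    cells = ", ".join(f"{cluster}: {pct:.1f}%" for cluster, pct in row.items())
    print(f"{segment} -> {cells}")
```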
Key Insights
- "lost" RFM segment: Almost entirely (99.89%) falls into the "Dormant/Lost" cluster
- "best" RFM segment: Splits between "High-Value Loyal" (37%) and "Recent/Active" (63%)
- "frugal" RFM segment: Predominantly (98%) in "Recent/Active" cluster
- K-Means discovers different patterns than RFM quartile-based segmentation