4 Cluster Analysis
Rafael Irizarry
Cluster analysis is a class of statistical techniques that can be applied to data that exhibit natural groupings. It makes no distinction between dependent and independent variables; instead, the entire set of interdependent relationships is examined.
Applied to customer data, cluster analysis sorts through the raw data and groups the customers into clusters. A cluster is a group of relatively homogeneous customers: customers who belong to the same cluster are similar to each other and dissimilar to customers in other clusters. The primary input for cluster analysis is a measure of similarity between customers, such as a correlation coefficient, a distance measure, or an association coefficient.
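As a minimal sketch of one such similarity measure, the snippet below computes the Euclidean distance between pairs of customers. The customer names, the two variables (monthly visits and average spend), and their values are made up for illustration; they do not come from any real data set.

```python
import math

# Hypothetical customers described by two made-up variables:
# (monthly visits, average spend).
customers = {
    "A": (2.0, 10.0),
    "B": (3.0, 12.0),
    "C": (20.0, 95.0),
}

def euclidean(x, y):
    """Straight-line distance between two customers; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d_ab = euclidean(customers["A"], customers["B"])
d_ac = euclidean(customers["A"], customers["C"])

print(f"distance A-B: {d_ab:.2f}")
print(f"distance A-C: {d_ac:.2f}")
```

Here A and B are much closer to each other than either is to C, so a clustering procedure would tend to group A with B. Because Euclidean distance depends on the scale of each variable, variables measured in very different units are often standardized before distances are computed.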
The following are the basic steps involved in cluster analysis:
- Formulate the problem and select the variables you want to use as the basis for clustering.
- Compute the distance between customers along the selected variables.
- Apply the clustering procedure to the distance measures.
- Decide on the number of clusters.
- Map and interpret the clusters and draw conclusions. Illustrative techniques such as perceptual maps are useful here.
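The steps above can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the five customers and their two variables are invented, Euclidean distance is assumed as the distance measure, single-linkage agglomerative clustering is assumed as the clustering procedure, and the number of clusters `k` is simply chosen by hand. Other distances, procedures, and ways of choosing `k` are equally valid.

```python
import math
from itertools import combinations

# Step 1: hypothetical customers and the variables selected for clustering
# (values are made up for illustration).
data = {
    "A": (1.0, 1.0),
    "B": (1.5, 1.2),
    "C": (5.0, 5.0),
    "D": (5.2, 4.8),
    "E": (9.0, 1.0),
}

# Step 2: compute the pairwise distances between customers.
def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

dist = {
    frozenset((p, q)): euclidean(data[p], data[q])
    for p, q in combinations(data, 2)
}

# Step 3: apply a clustering procedure to the distances.
# Single linkage: the distance between two clusters is the
# smallest distance between any pair of their members.
def linkage(c1, c2):
    return min(dist[frozenset((p, q))] for p in c1 for q in c2)

def agglomerate(names, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [frozenset([n]) for n in names]
    while len(clusters) > k:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

# Step 4: decide on the number of clusters (chosen by hand here).
k = 3
clusters = agglomerate(data, k)

# Step 5: interpret the clusters, for example through their mean profiles.
for cluster in sorted(clusters, key=min):
    members = sorted(cluster)
    centroid = tuple(
        sum(data[m][i] for m in members) / len(members) for i in range(2)
    )
    print(members, centroid)
```

With these made-up values, the two tight pairs are merged first and the isolated customer remains a cluster of its own. In practice the choice of `k` is usually guided by inspecting how the merge distances grow, and the interpretation step relies on subject-matter knowledge as much as on the cluster means.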