Abstract
Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in ℝ^p, instead of points in ℝ^p as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.
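To make the idea of histogram-valued observations concrete, the following minimal sketch aggregates two small sets of classical values into histograms over shared bins and compares them through their cumulative distributions. It is an illustration under stated assumptions, not the dissimilarity defined in the article: the function names `to_histogram` and `cdf_dissimilarity`, the shared bin edges, and the specific formula (the sum of absolute differences between cumulative relative frequencies, weighted by bin width) are all hypothetical choices made for this example.

```python
from bisect import bisect_right
from itertools import accumulate


def to_histogram(values, bin_edges):
    """Aggregate classical values into relative frequencies over the given bins."""
    if not values:
        raise ValueError("cannot build a histogram from an empty sample")
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        if not bin_edges[0] <= v <= bin_edges[-1]:
            raise ValueError(f"value {v} lies outside the bin range")
        # Bins are half-open [edge_k, edge_{k+1}); the last bin also takes its right edge.
        k = min(bisect_right(bin_edges, v) - 1, len(counts) - 1)
        counts[k] += 1
    return [c / len(values) for c in counts]


def cdf_dissimilarity(p, q, bin_edges):
    """Illustrative CDF-based dissimilarity (an assumption, not the article's formula).

    Sums |F_p - F_q| at each bin's right edge, weighted by the bin width.
    """
    widths = [hi - lo for lo, hi in zip(bin_edges, bin_edges[1:])]
    return sum(abs(a - b) * w for a, b, w in zip(accumulate(p), accumulate(q), widths))


edges = [0, 1, 2, 3]
hist_a = to_histogram([0.5, 0.5, 1.5, 2.5], edges)  # one aggregated observation
hist_b = to_histogram([0.5, 1.5, 1.5, 2.5], edges)  # another aggregated observation
print(hist_a, hist_b, cdf_dissimilarity(hist_a, hist_b, edges))
```

A matrix of such pairwise dissimilarities is the kind of input a divisive clustering procedure would work from; the splitting rule itself is not sketched here, because the abstract does not specify it.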
Original language | English (US) |
---|---|
Article number | e1405 |
Journal | Wiley Interdisciplinary Reviews: Computational Statistics |
Volume | 9 |
Issue number | 5 |
DOIs | https://doi.org/10.1002/wics.1405 |
State | Published - Sep 1 2017 |
Keywords
- cumulative density function dissimilarity
- Euclidean extended Ichino–Yaguchi dissimilarity
- polythetic hierarchy trees
ASJC Scopus subject areas
- Statistics and Probability