Hierarchical clustering for histogram data

L. Billard; Jaejik Kim

doi:10.1002/wics.1405


Research output: Contribution to journal › Review article › peer-review

6 Scopus citations

Abstract

Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in ℝ^p, instead of points in ℝ^p as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.
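The abstract names the ingredients of the method but not their details. As an illustration only, the following is a minimal Python sketch under stated assumptions: each observation is a set of p histograms whose relative frequencies are recorded on shared, fixed bins; the dissimilarity is a simple bin-width-weighted L1 distance between cumulative relative frequencies, summed over variables (a stand-in for a CDF-based measure, not necessarily the definition used in the paper; the Euclidean extended Ichino–Yaguchi measure is not implemented); and each split follows the DIANA splinter-group rule of Kaufman and Rousseeuw, a standard divisive polythetic procedure that may differ from the authors' algorithm. All function names are hypothetical.

```python
import numpy as np

# Assumption: data has shape (n, p, k): n observations, p histogram-valued
# variables, each recorded as k relative frequencies on shared, fixed bins.
# widths has shape (p, k) and holds the bin widths for each variable.


def cdf_dissimilarity(p_hist, q_hist, widths):
    """Hypothetical CDF-based distance: bin-width-weighted L1 gap between the
    cumulative relative frequencies of two histograms on the same bins."""
    gap = np.abs(np.cumsum(p_hist) - np.cumsum(q_hist))
    return float(np.sum(gap * widths))


def dissimilarity_matrix(data, widths):
    """Sum the per-variable dissimilarities into an n x n symmetric matrix."""
    n, p, _ = data.shape
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(cdf_dissimilarity(data[i, v], data[j, v], widths[v])
                    for v in range(p))
            D[i, j] = D[j, i] = d
    return D


def diana_split(D, members):
    """Split one cluster into two with a DIANA-style splinter procedure."""
    members = list(members)
    if len(members) < 2:
        raise ValueError("need at least two members to split")
    # Seed the splinter group with the member farthest, on average, from the rest.
    avg = [D[i, [j for j in members if j != i]].mean() for i in members]
    splinter = [members[int(np.argmax(avg))]]
    rest = [m for m in members if m not in splinter]
    while len(rest) > 1:
        # Positive gain: closer on average to the splinter group than to the rest.
        gains = [D[i, [j for j in rest if j != i]].mean() - D[i, splinter].mean()
                 for i in rest]
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break
        splinter.append(rest.pop(best))
    return sorted(splinter), sorted(rest)


def divisive_hierarchy(D, n_clusters):
    """Top-down: repeatedly split the cluster with the largest diameter."""
    clusters = [list(range(D.shape[0]))]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        target = max(splittable, key=lambda c: D[np.ix_(c, c)].max())
        clusters.remove(target)
        clusters.extend(diana_split(D, target))
    return clusters


# Toy example: four observations, one variable, three unit-width bins.
data = np.array([
    [[0.8, 0.2, 0.0]],
    [[0.7, 0.3, 0.0]],
    [[0.0, 0.1, 0.9]],
    [[0.0, 0.3, 0.7]],
])
widths = np.ones((1, 3))
D = dissimilarity_matrix(data, widths)
print(divisive_hierarchy(D, 2))  # [[2, 3], [0, 1]]
```

On this toy data the first split separates the two histograms concentrated in the low bins from the two concentrated in the high bin. Recording the sequence of splits yields the hierarchy tree; a stopping rule or cluster validity criterion is left to the caller here through the hypothetical `n_clusters` argument.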

Original language	English (US)
Article number	e1405
Journal	Wiley Interdisciplinary Reviews: Computational Statistics
Volume	9
Issue number	5
DOI	https://doi.org/10.1002/wics.1405
State	Published - Sep 1 2017
Externally published	Yes

Keywords

Euclidean extended Ichino–Yaguchi dissimilarity
cumulative density function dissimilarity
polythetic hierarchy trees

ASJC Scopus subject areas

Statistics and Probability


Cite this

@article{ac3e5d875b1446de865b80f4a6ca094b,
  title = "Hierarchical clustering for histogram data",
  abstract = "Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in $\mathbb{R}^p$, instead of points in $\mathbb{R}^p$ as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.",
  keywords = "Euclidean extended Ichino--Yaguchi dissimilarity, cumulative density function dissimilarity, polythetic hierarchy trees",
  author = "L. Billard and Jaejik Kim",
  note = "Publisher Copyright: {\textcopyright} 2017 Wiley Periodicals, Inc.",
  year = "2017",
  month = sep,
  day = "1",
  doi = "10.1002/wics.1405",
  language = "English (US)",
  volume = "9",
  journal = "Wiley Interdisciplinary Reviews: Computational Statistics",
  issn = "1939-5108",
  publisher = "John Wiley and Sons Inc.",
  number = "5",
}

TY  - JOUR
T1  - Hierarchical clustering for histogram data
AU  - Billard, L.
AU  - Kim, Jaejik
PY  - 2017/9/1
Y1  - 2017/9/1
N2  - Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in ℝ^p, instead of points in ℝ^p as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.
AB  - Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in ℝ^p, instead of points in ℝ^p as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.
KW  - Euclidean extended Ichino–Yaguchi dissimilarity
KW  - cumulative density function dissimilarity
KW  - polythetic hierarchy trees
UR  - http://www.scopus.com/inward/record.url?scp=85027687872&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85027687872&partnerID=8YFLogxK
U2  - 10.1002/wics.1405
DO  - 10.1002/wics.1405
M3  - Review article
AN  - SCOPUS:85027687872
SN  - 1939-5108
VL  - 9
JO  - Wiley Interdisciplinary Reviews: Computational Statistics
JF  - Wiley Interdisciplinary Reviews: Computational Statistics
IS  - 5
M1  - e1405
ER  - 
