XF-16NAJDS-W
Research / Academic Paper ACTIVE

Extensions to the k-means algorithm for clustering large data sets with categorical values

Abstract

Data Mining and Knowledge Discovery 2, 283–304 (1998). © 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.

Zhexue Huang, ACSys CRC, CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia. huang@mip.com.au

The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well known soybean disease and credit approval data sets to …
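The three k-modes ingredients named in the abstract (simple matching dissimilarity, modes in place of means, frequency-based mode updates) can be illustrated with a minimal Python sketch. This is not the paper's implementation. It assumes batch updates, where modes are recomputed after all objects have been assigned, and caller-supplied initial modes. The function names are hypothetical. The `mixed_dissimilarity` helper shows one plausible form of a combined measure for mixed data, squared Euclidean distance on numeric attributes plus a weighted mismatch count on categorical ones. The excerpt above does not define that measure, so both its form and the weight `gamma` are assumptions here.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: count the attribute positions where two records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Frequency-based mode: the most frequent category of each attribute.

    Ties go to the category seen first (Counter ordering, Python 3.7+).
    """
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, initial_modes, max_iter=100):
    """Batch k-modes sketch: assign to the nearest mode, then recompute modes."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the cluster whose mode mismatches it least.
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Recompute each mode; an empty cluster keeps its previous mode.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Assumed combined measure: squared Euclidean part plus gamma * mismatches."""
    squared = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return squared + gamma * matching_dissimilarity(cat_a, cat_b)


if __name__ == "__main__":
    data = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("a", "x")]
    labels, modes = k_modes(data, initial_modes=[("a", "x"), ("b", "z")])
    print(labels, modes)
```

Seeding with explicit initial modes keeps the sketch deterministic. How the initial modes are chosen is a separate design decision that this example leaves open.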

Source: pdf_first_chars

Document Metadata

Issuer
Springer Science and Business Media LLC
Document Type
Research / Academic Paper
Publication Year
1998
Retrieved
5 May 2026
Source
Contact XFID for Access
Record ID
XF16NAJDSW
Validation
Inferred by XFID

Topics

Machine Learning

Cited by (1)

Other RESEARCH documents in the registry that cite this work.

How to Cite This Record

Use the XFID in citations to create a stable, permanent reference that resolves to this registry entry regardless of the source URL.

Academic / report citation
Springer Science and Business Media LLC (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. XFID: XF-16NAJDS-W. Retrieved from https://xframework.id/XF16NAJDSW
Identifier only
XF-16NAJDS-W