American Statistical Association
New York City
Metropolitan Area Chapter

Levin Lecture Series: Spring 2018 Colloquium Seminars
Department of Biostatistics
Columbia University



CLUSTERING MIXED-TYPE DATA

by

Marianthi Markatou
Professor, Department of Biostatistics
University at Buffalo

Host: Dr. Ying Wei


Abstract

Despite the existence of a large number of clustering algorithms, clustering data of mixed scale, combining interval (continuous) and categorical (nominal and/or ordinal) variables, remains a challenging problem. We show that current clustering methods for mixed-scale data suffer from at least one of two central challenges: (1) they are unable to equitably balance the contributions of continuous and categorical variables without strong parametric assumptions; (2) they are unable to properly handle data sets in which only a subset of the variables is related to the underlying cluster structure of interest. We first develop KAMILA (KAY-means for Mixed Large data), a clustering method that addresses (1), and in many situations (2), without requiring strong assumptions. We next develop MEDEA (Multivariate Eigenvalue Decomposition Error Adjustment), a weighting scheme that addresses (2) even in the presence of a large number of uninformative variables. We study theoretical aspects of our methods and demonstrate their performance using Monte Carlo simulations and real data sets.


Date: Thursday, April 12, 2018
Time: 11:30 A.M. - 12:30 P.M.
Location: Mailman School of Public Health
Department of Biostatistics
722 West 168th Street
AR Building
8th Floor Auditorium
New York, New York


Page last modified on April 3, 2018
Copyright © 1998-2018 by New York City Metropolitan Area Chapter of the ASA
Designed and maintained by Cynthia Scherer
Send questions or comments to nycasa@nycasa.org