American Statistical Association
New York City
Metropolitan Area Chapter

Mailman School of Public Health
Columbia University
Department of Biostatistics Colloquium



FINDING SIGNIFICANT LARGE-AVERAGE SUBMATRICES
IN HIGH DIMENSIONAL DATA

by

Professor Andrew B. Nobel
Departments of Statistics & Computer Science
University of North Carolina, Chapel Hill


Abstract

Exploratory analysis of gene expression and other high dimensional data often begins with row and column clustering, which yields a partition of the data matrix into disjoint sample-variable blocks (submatrices). Of particular interest in practice are submatrices whose entries are large on average. In conjunction with clinical and functional annotation, large-average submatrices are often the starting point for subsequent biological analyses, such as the identification of genetic pathways and new disease subtypes.

We describe a simple algorithm, belonging to the general category of biclustering methods, for identifying large-average submatrices in high dimensional data. Like other biclustering methods, the algorithm improves on independent clustering of samples and variables in several respects. First, the submatrices it identifies can overlap, and they need not cover the entire data matrix; both features better reflect the underlying biology. Second, the inclusion of samples and variables in a submatrix depends only on their expression values inside that submatrix. The algorithm seeks to maximize a simple measure of statistical significance and, through this measure, has close connections with the minimum description length principle. We will discuss the application of the algorithm to a recent breast cancer study and compare its performance with that of several other biclustering methods. If time permits, we will present some related theoretical results.

The talk should be accessible to statisticians, computer scientists, and computational biologists.

Joint work with Andrey Shabalin, Vic Weigman, and Charles Perou.


Date: Thursday, October 25, 2007
Time: 4:00 - 5:00 P.M.
Location: Mailman School of Public Health
Department of Biostatistics
722 West 168th Street
Judith Jansen Conference Room
4th Floor - Room 425
New York, New York

RESERVATIONS ARE NOT REQUIRED

Refreshments will be served at 3:30 P.M. in the
Biostatistics Conference Room (R627).



Copyright © 2007 by New York City Metropolitan Area Chapter of the ASA
Designed and maintained by Cynthia Scherer
Send questions or comments to nycasa@mindspring.com