DataLab Projects

Our research projects typically balance theory and application. Theoretical models provide a framework for the principled analysis of techniques and algorithms for inferring patterns from data. But data analysis is fundamentally a practical problem: investigating ideas and applying algorithms to real-world data sets is an essential component of our work. The following are a few of the application areas for our work, with some specific projects in each.

Computational Methods for Biological Phenomena

2005 - present
FMRI Activation Maps
Hierarchical Bayesian models are used to learn the number, locations, and shapes of "active" regions in the brain during functional magnetic resonance imaging (FMRI), and how these quantities vary from subject to subject and with the presence of conditions such as schizophrenia. In association with the National Alliance for Medical Image Computing.
2004 - present
Identifying Hair-Cycle Genes
New experimental devices such as gene chip arrays enable researchers to measure the time-varying expression profiles of very large numbers of genes. Using a probabilistic model, we can infer which of these genes are involved in cyclic phenomena such as hair growth, and identify groups of genes which share similar expression patterns. Joint work with Kevin Lin and Bogi Andersen.
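As a rough illustration of what "cyclic" can mean for an expression profile, the sketch below scores a time series by how much of its variance a single sinusoid explains. This is a simple least-squares stand-in, not the probabilistic model used in this project. The function name, the assumed period, and the example profiles are all hypothetical, and the samples are assumed to be evenly spaced over a whole number of periods.

```python
import math


def periodicity_score(values, period):
    """Fraction of variance explained by a sinusoid with the given period.

    Assumes evenly spaced samples covering a whole number of periods,
    so that the sine and cosine regressors are orthogonal.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    total = sum(c * c for c in centered)
    if total == 0:
        return 0.0  # a flat profile has no variance to explain
    angles = [2 * math.pi * t / period for t in range(n)]
    # Least-squares coefficients of the cosine and sine terms.
    a = 2 / n * sum(c * math.cos(w) for c, w in zip(centered, angles))
    b = 2 / n * sum(c * math.sin(w) for c, w in zip(centered, angles))
    explained = n / 2 * (a * a + b * b)
    return explained / total


# Made-up profiles sampled at 16 time points.
cyclic_gene = [math.sin(2 * math.pi * t / 8) for t in range(16)]
other_gene = [math.sin(2 * math.pi * t / 4) for t in range(16)]
print(periodicity_score(cyclic_gene, period=8))
print(periodicity_score(other_gene, period=8))
```

A score near 1 suggests the profile follows the assumed cycle; a score near 0 suggests it does not. Grouping genes with similar patterns would need a further step, such as clustering, that is not shown here.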

Estimation and Simulation in Atmospheric Systems

2003 - present
Probabilistic models, such as input-driven hidden Markov models, are fit to historical records of precipitation at rain stations in different geographical regions (Western US, Brazil, Kenya, Western Australia). The models can be used for seasonal forecasting and simulation of rainfall. Joint work with the International Research Institute for Climate Prediction, Columbia University.
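As a rough illustration of the simulation idea only, the sketch below samples daily rainfall occurrence from a two-state hidden Markov model. It is not the model used in this project: the state names, transition probabilities, and rain probabilities are made-up values, and the external inputs that would drive an input-driven model are omitted.

```python
import random

# Hypothetical parameters, chosen only for illustration.
TRANSITION = {"dry": {"dry": 0.8, "wet": 0.2},
              "wet": {"dry": 0.4, "wet": 0.6}}
RAIN_PROB = {"dry": 0.1, "wet": 0.7}  # P(rain at a station | hidden state)


def simulate_rainfall(n_days, seed=0, start="dry"):
    """Sample a hidden state path and a rain/no-rain sequence."""
    rng = random.Random(seed)
    state = start
    states, rain = [], []
    for _ in range(n_days):
        states.append(state)
        # Emit rain or no rain given the current hidden state.
        rain.append(rng.random() < RAIN_PROB[state])
        # Move to the next hidden state according to the transition row.
        state = "wet" if rng.random() < TRANSITION[state]["wet"] else "dry"
    return states, rain


states, rain = simulate_rainfall(30)
print(sum(rain), "rainy days out of", len(rain))
```

In practice the parameters would be estimated from the historical station records rather than fixed by hand.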
2004 - present
Tracking the ITCZ
The Intertropical Convergence Zone (ITCZ) is an important atmospheric feature located over the tropical ocean basins. The ITCZ is dynamic, following a general process of formation, undulation, and breakdown over periods on the order of 10 days. Using satellite imagery, we investigate how probabilistic algorithms can be used to automatically track and characterize the ITCZ and related atmospheric activity.

Learning and Data Mining in Text

2002 - present
Topic Modeling
Probabilistic models can be used to sort through large collections of text data, automatically learning what topics are represented, along with other quantities such as author interests. A nice example of the results can be seen here.

Building Models of Human Behavior

2005 - present
Monitoring Building Access
Improvements in technology have made it feasible to closely monitor an environment using wireless networks of sensors. To be useful, however, this information must be processed to determine, for example, the typical patterns of behavior present in the data and to detect anomalies or deviations from those patterns. We explore new ways of modeling behavior observed via sensors within the CalIT2 building, and automatically inferring useful information about the building and its usage.
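To make "typical patterns" and "deviations" concrete, here is a minimal sketch. It is not the model used in this project. It assumes hourly counts of people entering a building, estimates a Poisson rate for each hour of day from history, and flags a new count whose upper-tail probability under that rate is small. The data, the function names, and the 0.01 threshold are illustrative assumptions.

```python
import math
from collections import defaultdict


def hourly_rates(history):
    """Average count for each hour of day; history is (hour, count) pairs."""
    totals, days = defaultdict(int), defaultdict(int)
    for hour, count in history:
        totals[hour] += count
        days[hour] += 1
    return {h: totals[h] / days[h] for h in totals}


def poisson_tail(k, rate):
    """P(X >= k) for X ~ Poisson(rate)."""
    below = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)


def is_anomalous(hour, count, rates, threshold=0.01):
    """Flag counts that are unusually high for the given hour."""
    return poisson_tail(count, rates[hour]) < threshold


# Made-up history: (hour of day, people counted entering).
history = [(9, 12), (9, 10), (9, 14), (22, 1), (22, 0), (22, 2)]
rates = hourly_rates(history)
print(is_anomalous(22, 9, rates))   # a large late-night count
print(is_anomalous(9, 13, rates))   # an ordinary morning count
```

This sketch only flags unusually high counts. Detecting unusually low activity, or bursts that last several hours, would need a richer model.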
2004 - present
Relational Network Analysis
Often, we wish to understand or predict the relationships and interactions between a large number of "entities". Examples include social networks of people, connected by interactions such as email exchanges; the internet, characterized by links between web pages; even biological networks of biochemical interactions. In these problems, we may wish to visualize the structure of the network, understand which entities are most important (perhaps relative to another entity), or predict new connections. See also the UCI KDD page.
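Two of these questions can be sketched on a toy network. The example below ranks entities by how many connections they have, and scores unconnected pairs by how many neighbors they share as a crude guess at new connections. These are simple baselines chosen for illustration, not the methods developed in this project, and the email graph and names are made up.

```python
from collections import defaultdict
from itertools import combinations

# Made-up email exchanges between hypothetical people.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("cat", "dan"), ("dan", "eve")]


def build_graph(edges):
    """Undirected adjacency sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_ranking(graph):
    """Entities sorted by number of connections, most connected first."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))


def common_neighbor_scores(graph):
    """Score each unconnected pair by how many neighbors they share."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b not in graph[a]:
            shared = len(graph[a] & graph[b])
            if shared:
                scores[(a, b)] = shared
    return scores


graph = build_graph(edges)
print(degree_ranking(graph)[:2])
print(common_neighbor_scores(graph))
```

Importance relative to another entity, as mentioned above, would call for a measure that starts from that entity, which this degree count does not capture.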