DataLab Projects
Our research projects typically balance theory and application. Theoretical models provide a framework for principled analysis of techniques and algorithms for inferring patterns from data. But data analysis is fundamentally a practical problem: investigating ideas and applying algorithms to real-world data sets is an essential component of our work. The following are a few of the application areas for our work, with some specific projects in each.
Computational Methods for Biological Phenomena
2005 - present
fMRI Activation Maps
Hierarchical Bayesian models are used to learn
the number, locations, and shapes of "active" regions in the
brain during functional magnetic resonance imaging (fMRI),
and how these quantities vary from subject to subject and with
the presence of conditions such as schizophrenia.
In association with the National Alliance for Medical Image
Computing.
2004 - present
Identifying Hair-Cycle Genes
New experimental devices such as gene chip arrays
enable researchers to measure the time-varying expression profiles
of very large numbers of genes. Using a probabilistic model, we can
infer which of these genes are involved in cyclic phenomena such
as hair growth, and identify groups of genes which share similar
expression patterns. Joint work with Kevin Lin and Bogi Andersen.
Estimation and Simulation in Atmospheric Systems
2003 - present
Probabilistic models, such as input-driven hidden Markov
models, are fit to historical records of precipitation at rain stations in
different geographical regions (Western US, Brazil, Kenya,
Western Australia). The models can be used for seasonal forecasting
and simulation of rainfall. Joint work with the International Research
Institute for Climate Prediction, Columbia University.
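
To make the simulation idea concrete, the following is a minimal sketch of generating daily rainfall from a hidden Markov model. It assumes a plain two-state HMM with fixed transition probabilities, not the input-driven variant mentioned above, and every number in it (the transition table, the rain probabilities, the mean amount) is an illustrative placeholder, not a value fitted to any station record. The names `simulate_rainfall`, `TRANSITION`, `RAIN_PROB`, and `MEAN_RAIN_MM` are hypothetical.

```python
import random

# Hypothetical two-state rainfall HMM; all numbers are illustrative, not fitted.
STATES = ("dry", "wet")
TRANSITION = {
    "dry": {"dry": 0.8, "wet": 0.2},
    "wet": {"dry": 0.4, "wet": 0.6},
}
# Probability that a station records rain on a day, given the hidden state.
RAIN_PROB = {"dry": 0.05, "wet": 0.7}
MEAN_RAIN_MM = 8.0  # assumed mean of the exponential amount on rainy days


def simulate_rainfall(n_days, n_stations, seed=0):
    """Return (hidden states, per-day list of rainfall amounts per station)."""
    rng = random.Random(seed)
    state = "dry"
    states, rainfall = [], []
    for _ in range(n_days):
        # Advance the hidden weather state by one day.
        state = "wet" if rng.random() < TRANSITION[state]["wet"] else "dry"
        states.append(state)
        # Stations emit rain independently, given the shared hidden state.
        day = [
            rng.expovariate(1.0 / MEAN_RAIN_MM)
            if rng.random() < RAIN_PROB[state]
            else 0.0
            for _ in range(n_stations)
        ]
        rainfall.append(day)
    return states, rainfall


states, rainfall = simulate_rainfall(n_days=30, n_stations=4, seed=1)
```

In an input-driven model, the transition probabilities would instead depend on external inputs for each day; that dependence is omitted here to keep the sketch short.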
2004 - present
Tracking the ITCZ
The Intertropical Convergence Zone (ITCZ) is an important
atmospheric feature located over the tropical ocean basins. The ITCZ is
dynamic, following a general process of formation, undulation, and breakdown
over periods on the order of 10 days. Using satellite imagery, we investigate
how probabilistic algorithms can be used to automatically track and
characterize the ITCZ and related atmospheric activity.
Learning and Data Mining in Text
2002 - present
Topic Modeling
Probabilistic models can be used to sort through
large collections of text data, automatically learning what topics
are represented, along with other quantities such as author interests.
A nice example of the results can be seen
here.
Building Models of Human Behavior
2005 - present
Monitoring Building Access
Improvements in technology have made it feasible to
closely monitor an environment using wireless networks of sensors. However,
to be useful, this information must be processed to determine,
for example, the typical patterns of behavior present in the data and
to detect anomalies or deviations from those patterns. We explore new ways
of modeling behavior observed via sensors within the CalIT2 building, and
automatically inferring useful information about the building and its usage.
2004 - present
Relational Network Analysis
Often, we wish to understand or predict the
relationships and interactions between a large number of "entities".
Examples include social networks of people, connected by interactions
such as email exchanges; the web, characterized by links between
pages; even biological networks of biochemical interactions. In these
problems, we may wish to visualize the structure of the network, understand
which entities are most important (perhaps relative to another entity), or
predict new connections. See also the
UCI KDD page.
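
As one illustration of ranking entities by importance relative to another entity, the sketch below uses personalized PageRank on a small directed graph. This is a standard technique chosen for the example, not necessarily the method used in this project; the function name `personalized_pagerank`, the damping value, and the toy edge list are all assumptions.

```python
def personalized_pagerank(edges, source, damping=0.85, n_iter=100):
    """Score nodes by importance relative to `source` via power iteration.

    `edges` is a list of (sender, recipient) pairs, e.g. email exchanges.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start with all rank on the source node; total rank stays at 1.
    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(n_iter):
        new_rank = {n: 0.0 for n in nodes}
        for n in nodes:
            if out_links[n]:
                # Split this node's damped rank evenly over its out-links.
                share = damping * rank[n] / len(out_links[n])
                for dst in out_links[n]:
                    new_rank[dst] += share
            else:
                # A node with no out-links returns its rank to the source.
                new_rank[source] += damping * rank[n]
        # Restart step: the remaining mass always jumps back to the source.
        new_rank[source] += 1.0 - damping
        rank = new_rank
    return rank


# Toy network: a writes to b and c, b writes to c, c writes to a.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
scores = personalized_pagerank(toy_edges, source="a")
```

Because the restart always returns to `source`, entities that are easy to reach from it score higher, which is what makes the ranking relative to that entity.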