DataLab Software

Experimental code for reproducing the results and figures in N. Navaroli and P. Smyth, "Modeling Response Time in Digital Human Communication," Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2015). Download code | Download input files | Download precomputed experimental results

KDE: a fast Java implementation of kernel density estimation for geo-location data, written for M. Lichman and P. Smyth, "Modeling Human Location Data with Mixtures of Kernel Densities," Proceedings of the 20th ACM SIGKDD Conference (KDD 2014).
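For background, a kernel density estimate places a smoothing kernel on every observed point and averages the resulting bumps. A minimal 1-D sketch with a Gaussian kernel follows (the toolbox itself handles 2-D geo-location data and is optimized for speed; this is a plain illustration, not its API):

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a kernel density estimate f(x) built from 1-D sample points.

    Each point contributes a Gaussian bump of width `bandwidth`; the
    density at x is the average of the bumps. Function and parameter
    names are illustrative only.
    """
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        total = 0.0
        for p in points:
            z = (x - p) / bandwidth
            total += norm * math.exp(-0.5 * z * z)
        return total / len(points)

    return density

# Density is high near the cluster of points at ~1 and low elsewhere.
f = gaussian_kde([0.0, 1.0, 1.2, 3.5], bandwidth=0.5)
```

The bandwidth controls the smoothness of the estimate; choosing it well (and doing the sum fast over many points) is where implementations like this toolbox earn their keep.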

FAST (Fast and Scalable Topic Modeling Toolbox) is a toolbox of Matlab, MPI, and C implementations of various distributed and fast topic modeling algorithms, including distributed collapsed Gibbs sampling (CGS) (Newman et al., JMLR 2009), asynchronous distributed CGS (Asuncion et al., Statistical Methodology, 2011), Fast-LDA (Porteous et al., KDD 2008), and fast collapsed variational inference (Asuncion et al., UAI 2009). Code written by Arthur Asuncion and collaborators.
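To illustrate the core sampler that the distributed variants parallelize, here is a minimal single-machine collapsed Gibbs sampler for LDA. It is a sketch of the standard algorithm; names and hyperparameter defaults are illustrative and not taken from the toolbox:

```python
import random

def lda_cgs(docs, K, V, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (single machine).

    docs: list of documents, each a list of word ids in [0, V).
    Each token's topic is resampled from
      p(z = k | rest) ~ (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
    where the counts exclude the current token.
    """
    rng = random.Random(seed)
    z = [[rng.randrange(K) for _ in d] for d in docs]
    n_dk = [[0] * K for _ in docs]          # topic counts per document
    n_kw = [[0] * V for _ in range(K)]      # word counts per topic
    n_k = [0] * K                           # total tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token, sample a new topic, add it back.
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return z, n_kw

docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3]]
z, n_kw = lda_cgs(docs, K=2, V=4, n_iter=20)
```

The distributed variants in the toolbox run this inner loop concurrently over partitions of the corpus and reconcile the count arrays, which is what makes them scale.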

Latent Set Model (LSM) code: an R package implementing a latent set model for co-appearance data, from DuBois et al., ICWSM 2011. Code by Chris DuBois.

JUNG (the Java Universal Network/Graph Framework) is a Java-based software library that provides a common and extensible language for the modeling, analysis, and visualization of data that can be represented as a graph or network. Code written by Joshua O'Madadhain, Scott White, and various collaborators.

The Multivariate Non-homogeneous HMM (MVNHMM) Toolbox provides C code for modeling multivariate time series with hidden Markov models. Written by Sergey Kirshner.
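As a reminder of the basic computation such a toolbox performs, the forward algorithm evaluates the likelihood of an observation sequence under an HMM by recursively propagating state probabilities. A minimal sketch for a discrete-output, homogeneous HMM follows (the toolbox itself handles multivariate, non-homogeneous models; names here are illustrative):

```python
import math

def hmm_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi[s]   : initial state probabilities
    A[s][t] : transition probability from state s to state t
    B[s][o] : probability that state s emits symbol o

    Uses the forward recursion with per-step normalization so that
    long sequences do not underflow.
    """
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    c = sum(alpha)
    loglik = math.log(c)
    alpha = [a / c for a in alpha]
    for o in obs[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in range(n)) * B[t][o]
                 for t in range(n)]
        c = sum(alpha)
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
    return loglik

# A "sticky" two-state HMM over symbols {0, 1}.
pi = [0.6, 0.4]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
ll = hmm_loglik([0, 0, 1, 1, 1], pi, A, B)
```

In a non-homogeneous HMM, the transition matrix A would additionally depend on the time step (e.g. on covariates), but the forward recursion has the same shape.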

Curve Clustering Toolbox, a Matlab toolbox implementing a family of probabilistic model-based curve-aligned clustering algorithms. Written by Scott Gaffney.

Gaussian Mixture Modeling software provides C code for the Expectation-Maximization (EM) algorithm for fitting mixtures of Gaussians to data. Written by Igor Cadez.
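For reference, EM for a Gaussian mixture alternates an E-step (compute each component's posterior responsibility for each point) with an M-step (re-estimate weights, means, and variances from those responsibilities). A minimal 1-D, two-component sketch of the standard algorithm follows; it is an illustration, not the toolbox's C implementation:

```python
import math

def em_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances). Initialization and names are
    illustrative; a real implementation would initialize more carefully
    and monitor the log-likelihood for convergence.
    """
    mu = [min(data), max(data)]   # crude initialization from the range
    var = [1.0, 1.0]
    w = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)

    for _ in range(n_iter):
        # E-step: responsibilities r[i][k] of component k for point i.
        r = []
        for x in data:
            p = [w[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            r.append([pk / s for pk in p])
        # M-step: re-estimate parameters from the responsibilities.
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            w[k] = nk / len(data)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
            var[k] = sum(ri[k] * (x - mu[k]) ** 2
                         for ri, x in zip(r, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return w, mu, var

# Two well-separated clusters near 0 and near 5.
w, mu, var = em_gmm_1d([0.1, -0.2, 0.05, 5.0, 5.2, 4.9])
```

Each iteration is guaranteed not to decrease the data log-likelihood, which is why EM is the standard fitting procedure for mixture models.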

Useful Links

The UCI Machine Learning Repository is a repository of real-world and synthetic data sets used for the empirical analysis and comparison of machine learning algorithms.

The KDD Archive provides an online repository of large data sets encompassing a wide variety of data types, analysis tasks, and application areas.