The datasets used in Doug Downey's PhD thesis are listed below in the order they appear in the thesis. The datasets are documented in a readme file for each chapter; for more details, please see the thesis document.

The Java classes needed to execute Urns are provided here without separate documentation. The "probabilitiesForCounts" method learns urn parameters and computes probabilities for a given set of extraction parameters; an example of its use is given in main(). Note: The code requires both the Weka and Colt libraries.

An HMM-T model file is provided for a model trained on the REALM corpus above. Each line gives a term, along with its 20-dimensional latent state distribution.
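
As a rough illustration of how one line of such a model file might be read, here is a minimal Java sketch. It is not part of the provided code, and the exact file layout is an assumption: the sketch supposes that fields are separated by whitespace, that the last 20 fields on a line are the state probabilities, and that whatever precedes them is the term. The class name HmmtLineParser, the nested TermDistribution type, and the sample line built in main() are all hypothetical; please check the actual file before relying on this.

```java
import java.util.Arrays;

public class HmmtLineParser {
    // The page states that each term has a 20-dimensional latent state distribution.
    static final int NUM_STATES = 20;

    /** One term together with its distribution over latent states. */
    static final class TermDistribution {
        final String term;
        final double[] probs;

        TermDistribution(String term, double[] probs) {
            this.term = term;
            this.probs = probs;
        }

        /** Index (0-based) of the state with the highest probability. */
        int mostLikelyState() {
            int best = 0;
            for (int i = 1; i < probs.length; i++) {
                if (probs[i] > probs[best]) {
                    best = i;
                }
            }
            return best;
        }
    }

    /**
     * Parses one line. ASSUMPTION: whitespace-separated fields, with the
     * last NUM_STATES fields being the probabilities and everything before
     * them forming the term. The real file may use a different delimiter.
     */
    static TermDistribution parseLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < NUM_STATES + 1) {
            throw new IllegalArgumentException(
                "Expected a term plus " + NUM_STATES + " values, got "
                + fields.length + " fields");
        }
        int firstValue = fields.length - NUM_STATES;
        String term = String.join(" ", Arrays.copyOfRange(fields, 0, firstValue));
        double[] probs = new double[NUM_STATES];
        for (int i = 0; i < NUM_STATES; i++) {
            probs[i] = Double.parseDouble(fields[firstValue + i]);
        }
        return new TermDistribution(term, probs);
    }

    public static void main(String[] args) {
        // Synthetic example line, NOT taken from the real model file.
        StringBuilder sample = new StringBuilder("seattle");
        for (int i = 0; i < NUM_STATES; i++) {
            sample.append(' ').append(i == 3 ? "0.81" : "0.01");
        }
        TermDistribution td = parseLine(sample.toString());
        System.out.println(td.term + " -> most likely state " + td.mostLikelyState());
    }
}
```

Taking the probabilities from the end of the line, rather than assuming the term is exactly the first field, keeps the sketch usable if some terms contain spaces; if the file turns out to be tab- or comma-delimited, only the split pattern should need to change.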