Rich Linguistic Structure from Large-Scale Web Data
(2013-10-18)
The past two decades have shown an unexpected effectiveness of Web-scale data in natural language processing. Even the simplest models, when paired with unprecedented amounts of unstructured and unlabeled Web data, have ...
Revisiting Random Utility Models
(2014-06-06)
This thesis explores extensions of Random Utility Models (RUMs), providing more flexible models and adopting a computational perspective. This includes building new models and understanding their properties such as ...
Coupling and Parallelization in Statistical Inference
(2021-01-19)
This thesis considers the design of Markov chain Monte Carlo (MCMC) estimators using couplings. Couplings play an important part in Markov chain theory, and in recent decades they have also taken on a central role in ...
Optimizing Methods for Suicide Prediction
(2022-05-23)
Suicide is one of the leading causes of death worldwide, yet clinicians find it difficult to reliably identify individuals at high risk for suicide. Algorithmic approaches for suicide risk detection have been developed in ...
Discriminative Sequence Models Extract Personally Identifiable Information from Public Gene Expression Datasets
(2022-05-25)
The growing scale of functional genomics datasets is enabling researchers to better understand the genetic determinants of gene expression, for example through expression quantitative trait loci (eQTL) studies. With an ...
OpenDP Programming Framework for Renyi Privacy Filters and Odometers
(2022-05-23)
Data scientists work with large-scale sensitive data, which inevitably leads to privacy risks. Differential Privacy (DP) is a mathematical definition of privacy that aims to mitigate privacy risks inherent in data analysis ...
Off-Policy Evaluation of Reinforcement Learning in Healthcare
(2020-08-10)
Reinforcement learning is a method for learning optimal strategies for tasks which require making sequences of decisions. The ability to make decisions in a manner which balances short- versus long-term outcomes makes ...
The 2020 Presidential Election on Twitter: An Exploration of Candidates’ Social Presence, Campaign Momentum, and the Effect of Misinformation
(2021-06-04)
Twitter has evolved from a site of inconsequential information spread to an instant primary source used as the preferred outlet to discuss and witness any semblance of news that emerges each day. Political outreach thrives ...
“Please Respect Our Terms and Conditions”: A Causal Analysis of GDPR Impact on Privacy Policies
(2021-06-04)
The General Data Protection Regulation (GDPR) has been widely praised as the most consequential privacy law in history. However, GDPR's causal effects have never been formally analyzed, and all praise for GDPR is largely ...
Social Bot Detection Through Model-Based Time Series Clustering
(2021-06-04)
In order to maintain the health and safety of online communities, it is important to understand and detect potential threats. Because of the possibility of large-scale impacts achieved through automation, social bots can ...