Faculty of Arts and Sciences
Permanent URI for this community: https://dash.harvard.edu/handle/1/1
This community provides open access to material created by faculty, staff, and students of the Faculty of Arts and Sciences. All material in the repository is also harvested by search engines (such as Google Scholar) and Open Archives Initiative data harvesters.
275 results
Search Results
Publication: Analysis of the Harvard Computer Society Email Archives: An Exploration of Differential Privacy in Practice (2024-11-26) Cooper, William Chen; Dwork, Cynthia
This thesis provides a rudimentary introduction to differential privacy as a framework for modern data privacy, using the Harvard Computer Society email list archives as an investigative medium. The differentially private analysis of this dataset includes but is not limited to: time series of list usage, email topic modeling, and sentiment analysis. OpenDP’s Python package for differential privacy is used extensively to execute computations, and the API is evaluated as a standalone programming framework in its own right. Novel differentially private graph algorithms are both implemented and empirically assessed. Lastly, this thesis discusses a significant inherent challenge in balancing contrasting aspects of differential privacy and exploratory data analysis.

Publication: An Electrifying Framework for the Future of Transport: Optimizing Electric Vehicle Charging Infrastructure for Enhanced Adoption (2024-11-26) Emeigh III, Terry Robin; Voulgaris, Carole; Bangia, Sachet
As electric vehicles begin to enter price points that are competitive with internal combustion engine vehicles, a greater dialogue must be had surrounding their lack of adoption. Despite monetary incentives from the government, environmental considerations, and technological advancements, electric vehicles remain under-adopted in comparison to both their gasoline-powered and hybrid counterparts. Persistent worries about inadequate infrastructure to support electric vehicles dominate the public consciousness. These faults necessitate a robust approach to optimizing electric vehicle charging locations, one that considers existing travel behaviors so that the complexities of owning an electric vehicle do not affect potential adopters as adversely as they otherwise would.
By deploying a genetic algorithm which randomly samples properties throughout Boston, this research assesses the viability of each potential charging location under a set of criteria: the proposed location's popularity as a destination for trips with dwell times sufficient to charge an electric vehicle, accessibility to nearby amenities, number of trips generated by the property, maximal distance from other proposed chargers, and maximal distance from residences with the capacity to charge their vehicle from home. Within the subsequent analysis, we find that the optimized charging locations do indeed align with these idealized hypothetical locations, suggesting that through the deployment of the framework devised in this research, charging locations can be situated optimally into the existing travel behaviors of individuals. Refinements in charging infrastructure allocation as proposed by the methods of this research are conducive to enhanced adoption rates of electric vehicles, as these optimal charging locations situate themselves equitably among the existing travel trends of drivers.

Publication: Using Sentinel 1 C-band SAR imagery to Detect Avalanches: An Analysis of Smaller Scale Avalanches and Proposed Algorithm (2024-11-26) Olsen, Oscar; Huybers, Peter
Snow avalanches are a destructive natural hazard that poses a serious threat to humans, ecosystems, and the built environment in mountain regions. Avalanche prediction and mitigation involve considerable human subjectivity. One potential solution is to add objectivity through predictive machine learning models. These models are, however, limited by a lack of training data, because avalanche occurrences are recorded manually and only in limited numbers. Prior research has explored using Sentinel-1 C-band Synthetic Aperture Radar (SAR) imagery to detect avalanches.
The detection of avalanches using Sentinel-1 C-band SAR imagery has been addressed for large-scale avalanches; however, the detection of small-scale avalanches has not been adequately addressed. This investigation applied four techniques to detect smaller-scale avalanches: K-means, DBSCAN, global thresholding, and a replication of the Karbou et al. (2022) method. The global thresholding algorithm performed best, but identifying small-scale avalanches is difficult because these events are hard to distinguish from surrounding noise. To reduce errors due to the noise, a mask based on optical Sentinel-2 images using global thresholding was developed. The detection of small-scale avalanches remains a difficult and unsolved problem; this thesis applied relatively novel techniques and data to it and identified some leads.

Publication: On Arbitrage in Single- and Multi-token Uniswap Markets (2024-11-26) Sun, Nathan; Chen, Yiling
Uniswap and other constant product market makers have proven to be popular decentralized cryptocurrency exchanges, despite their simplicity. However, the exchange rate between pairs of currencies need not be consistent across the entire Uniswap market, and such price discrepancies open up the possibility of arbitrage. In this paper, we propose a theoretical examination of the possibility of efficient arbitrage in single- and multi-token Uniswap markets. We construct a polynomial-time cyclic arbitrage algorithm for single-token Uniswap markets and give insights into the state of the market after arbitrage. We then generalize to multi-token Uniswap markets by constructing a linear program that allows the arbitrage of asset bundles. After arbitraging the market, we then extend results by Goyal et al. (2023) to provide an optimal liquidity provision strategy for a Uniswap market.
This provides a pipeline for an arbitrageur: they first extract profit from the market via arbitrage, and then they may reinvest their profits back into the market in the form of liquidity provision. Finally, we conclude with an empirical study of the profitability of arbitrage in historical Uniswap markets.

Publication: Preaching to the Choir: An AI-Based Analysis of Religious Demand in U.S. Church Sermons, 2000-2023 (2024-11-26) Mokski, Elliott Pier; Shleifer, Andrei; McCleary, Rachel
Christianity retains an outsized impact in American society, yet it is very difficult to quantify what congregants hear at church and how it differs across space and time. In this thesis, I provide evidence that sermons respond to demand from the congregation. I construct a dataset with over 150,000 full-text sermons given at churches throughout the United States over the last two decades. I develop a novel text analysis method based on large language models (LLMs) to extract sentiment and other rhetorical attributes from text. I show that this AI-based text quantification tool can easily be applied in any social science research setting, automating the task of human labeling with 1800x reductions in cost. Using this method on the sermons, I show that pastors respond to both political demand and economic shocks, even when controlling for denominational effects. When economic times get tough, sermons become increasingly pessimistic, less charitable, and less compassionate. Yet there also exists a large racial divide in how churches react to economic hardship. As poverty increases at the census tract level, sermons in white communities become less compassionate and charitable, while those in Black neighborhoods actually increase in compassion and optimism. I verify the analysis through a Bartik-like event study using China's accession to the World Trade Organization to instrument for unemployment.
The new LLM research tools developed in this thesis allow a far more granular analysis of American churches than was previously possible, at precinct, tract, and county levels.

Publication: Developing an Educational Environmental Game (2024-11-26) Premier, Natural; Huybers, Peter
With the steady improvement of technological power and accessibility, games have become widespread and popular among all ages. Educational games especially have the potential to foster learning efficiently and enjoyably. The goal of this project was to create an environmental game that was simultaneously engaging and real enough to inform players about the nuances of climate change. The game was built completely in the Unity game engine, and relies on an environmental model to make climate simulations based on actions taken by the player in-game. The model runs in the background within the game. The end product achieved this goal, though with a limited scope. The game, called NP (Natural Premier) Enviro(mental) Game 24, makes use of both active and passive user action to engage players. The game was designed to gamify important methods of climate action to give players inspiration on real-world climate problems. The environmental science is also easily digestible, allowing for an enjoyable experience regardless of the player’s level of background knowledge. Though much of the project is the game itself, this paper is intended to provide context on the purpose of the game, design choices, and further steps beyond this project.

Publication: The Real Burnout: The Effects of Climate Change and Particulate Air Matter Pollution on K-12 Education (2024-11-26) Sekar, Janani; Dell, Melissa L; Friedman, Ben
As global warming intensifies, environmental factors pose new challenges for young individuals; the detrimental effects of pollution exposure extend into health, social well-being, and schooling. This paper introduces a novel method to utilize remote-sensing data to study pollution, specifically PM2.5 (a particulate matter air pollutant that comes from sources such as exhaust and natural fires).
To fill the knowledge gaps left by EPA pollution monitors and develop an alternative source of reliable air quality data, I fine-tune a neural model to detect PM2.5 levels from high-resolution satellite images. I construct a data set of approximately 2,500 satellite images taken before, during, and after large wildfires in California, Oregon, and Colorado. I label the images by their corresponding PM2.5 pollution levels, as reported by the nearest EPA air quality monitor. After optimizing hyperparameters, the testing accuracy is just below 90 percent for ViT, and slightly above 90 percent for ResNet-50 and Swin Transformer. The model distinguishes well between very poor and good air quality, with most ambiguities and mistakes at intermediate levels. After completing this analysis, I examine the effects of pollution exposure on student academic performance. I combine pollution data from the EPA with school-level standardized testing data in California, creating a panel that spans 16 years. I find that air pollution exposure has a statistically significant negative effect on the percentage of students who meet or exceed standards on statewide standardized tests, with more severe effects for male, Black, and economically disadvantaged students.

Publication: Rowing Against the Wind: An Analysis of the Impact of Variable Wind Conditions on Current and Prospective Rowing Selection Methods (2024-11-26) Sullivan, Alexander Michael; Brenner, Michael P
In rowing, it is vitally important to select the best possible crew of athletes in order to win. This is no trivial task. The current method of selection (seat racing) requires inefficient direct comparisons between athletes but is used because it is assumed to generate correct results. As such, two questions must be asked. Firstly, "Does the current method of selection result in the correct athletes being selected?" Secondly, "Are there alternative forms of selection that may be more accurate or efficient?"
This paper finds that the current method of selection produces incorrect and biased results when wind conditions vary. This was demonstrated by analysing selection races and regular races to identify statistical trends in their outcomes, finding unfair advantages for some athletes. This paper also examines alternative methods of selection using machine learning, measured on synthetic race data and real selection data. Whilst promising, the consistency of these models was similarly vulnerable to variable winds. Most, if not all, current and potential selection methods are barely better than a coin flip when variable winds are at their worst.

Publication: Pick Me: Reducing Wastefulness in the Random Serial Dictatorship Mechanism (2024-11-26) Bohnet Zurcher, Dominik; Parkes, David; Ravindranath, Sai
This thesis investigates algorithmic improvements to the Random Serial Dictatorship (RSD) Mechanism for allocating indivisible goods, focusing on reducing wastefulness in outcomes when truncated preference reports are allowed. It corroborates Erdil's (2014) refutation of the RSD uniqueness conjecture and answers his open question, seeking to identify the frontier of strategyproof improvements. A linear program is devised to minimize wastefulness while enforcing other desirable properties of RSD, such as ex-post efficiency, equal treatment of equals, and strategyproofness. Through this approach, I identify efficient, fair, and incentive-compatible improvements in both the 4-agent, 3-item and 4-agent, 4-item scenarios, uncovering numerous potential profiles for improvements. The proposed approach, based on solving linear programs, poses several scalability challenges.
To address these, I exploit symmetry and make use of constraint generation techniques, resulting in empirical improvements.

Publication: Geometric Methods for Quantitative Analysis of Romance Languages (2024-11-26) McDonald, Patrick William; Weber, Melanie
Previous work has introduced various quantitative methods to investigate the historical and/or phonetic interrelation of languages and their speakers. Additionally, environments such as hyperbolic space have been found (both theoretically and empirically) to be conducive to representing hierarchically structured datasets, such as phylogenetic cell data. This thesis tests the suitability of hyperbolic space for representing pronunciation data from several Romance languages, a linguistic family that apparently developed according to a hierarchical structure, i.e., one where modern languages are interrelated via tree-like descent from common ancestors. The thesis involves Python implementations of (a) a pipeline that transforms audio files into workable mathematical objects and (b) baseline methods for the aggregation and analysis of this speech data with respect to language-wise covariance structures. We then outline a framework for analyzing the speech data in a hyperbolic setting, whose performance we compare to that of the baseline methods on the tasks of (a) language space reconstruction and (b) interspeaker interpolation. We find that with proper hyperparameter tuning, the Poincaré disk model of hyperbolic geometry is indeed capable of representing the language space and speaker interrelations apparent in our Romance language dataset, suggesting that the hyperbolic setting could be a promising quantitative framework for future linguistic analysis.