Person:

Zittrain, Jonathan

Last Name

Zittrain

First Name

Jonathan

Search Results

  • Publication

    Answering impossible questions: Content governance in an age of disinformation

    (Shorenstein Center for Media, Politics and Public Policy, at Harvard University, John F. Kennedy School of Government, 2020) Bowers, John; Zittrain, Jonathan

    The governance of online platforms has unfolded across three eras – the era of Rights (which stretched from the early 1990s to about 2010), the era of Public Health (from 2010 through the present), and the era of Process (of which we are now seeing the first stirrings). Rights-era conversations and initiatives amongst regulators and the public at large centered predominantly on protecting nascent spaces for online discourse against external coercion. The values and doctrine developed in the Rights era have been vigorously contested in the Public Health era, during which regulators and advocates have focused (with minimal success) on establishing accountability for concrete harms arising from online content, even where addressing those harms would mean limiting speech. In the era of Process, platforms, regulators, and users must transcend this stalemate between competing values frameworks, not necessarily by uprooting Rights-era cornerstones like CDA 230, but rather by working towards platform governance processes capable of building broad consensus around how policy decisions are made and implemented. Some first steps in this direction, preliminarily explored here, might include making platforms “content fiduciaries,” delegating certain key policymaking decisions to entities outside of the platforms themselves, and systematically archiving data and metadata about disinformation detected and addressed by platforms.

  • Publication

    Impeachment Defends the Constitution and Bill of Rights

    (Reiss Center on Law and Security, New York University School of Law, 2021-01-13) Zittrain, Jonathan

  • Publication

    Platform Accountability Through Digital "Poison Cabinets"

    (Knight First Amendment Institute at Columbia University, 2021-04-13) Bowers, John; Sedenberg, Elaine; Zittrain, Jonathan

    Preserving records of what user content is taken down—and why—could make platforms more accountable and transparent.

  • Publication

    The Paper of Record Meets an Ephemeral Web: An Examination of Linkrot and Content Drift within The New York Times

    (Harvard Innovation Lab, Harvard Law School, 2021-04-26) Zittrain, Jonathan; Bowers, John; Stanton, Clare

    Hyperlinks are a powerful tool for journalists and their readers: diving deep into the context of an article is just a click away. But hyperlinks are a double-edged sword; for all of the internet’s boundlessness, what’s found on the web can also be modified, moved, or removed entirely. This often-irreversible decay of web content is commonly known as linkrot. It is accompanied by a similar problem, content drift: the often-unannounced changes (retractions, additions, replacements) to the content at a particular URL.

    Our team of researchers at Harvard Law School has undertaken a project to gain insight into the extent and characteristics of journalistic linkrot and content drift. Using a dataset provided to us by the Times, we examined hyperlinks in New York Times articles from the launch of the Times website in 1996 through mid-2019. We focus on the Times not because it is an influential publication whose archives are often used to help form a historical record. Rather, the substantial linkrot and content drift we find here across the New York Times corpus accurately reflects the inherent difficulties of long-term linking to pieces of a volatile web.

    Results show a near-linear increase in linkrot over time, with notable patterns emerging within certain sections of the paper or across top-level domains. Over half of articles containing at least one URL also contained a dead link. Additionally, among the ostensibly “healthy” links remaining in articles, a manual review revealed further erosion of citations through content drift.

  • Publication

    Should Donald Trump be returned to social media?

    (2022-10-14) Zittrain, Jonathan

  • Publication

    Intellectual Debt

    (Cambridge University Press, 2022-11-17) Zittrain, Jonathan

    In this chapter, law and technology scholar Jonathan Zittrain warns of the danger of relying on answers for which we have no explanations. There are benefits to utilising solutions discovered through trial and error rather than rigorous proof: though aspirin was discovered in the late 19th century, it was not until the late 20th century that scientists were able to explain how it worked. But doing so accrues ‘intellectual debt’. This intellectual debt is compounding quickly in the realm of AI, especially in the subfield of machine learning. Although we know that ML models can produce efficient, effective answers, we don’t always know why the models reach the conclusions they do. This makes it difficult to detect when they are malfunctioning, being manipulated, or producing unreliable results. When several systems interact, the ledger moves further into the red. Society’s movement from basic science towards applied technology that bypasses rigorous investigative research inches us closer to a world in which we rely on an oracle AI, one we trust regardless of whether we can audit its trustworthiness. Zittrain concludes that we must create an intellectual debt ‘balance sheet’ by allowing academics to scrutinise the systems.

  • Publication

    Answering Impossible Questions: Content Governance in an Age of Disinformation

    (Shorenstein Center for Media, Politics, and Public Policy, 2020-01-14) Zittrain, Jonathan; Bowers, John

    The governance of online platforms has unfolded across three eras – the era of Rights (which stretched from the early 1990s to about 2010), the era of Public Health (from 2010 through the present), and the era of Process (of which we are now seeing the first stirrings). Rights-era conversations and initiatives amongst regulators and the public at large centered predominantly on protecting nascent spaces for online discourse against external coercion. The values and doctrine developed in the Rights era have been vigorously contested in the Public Health era, during which regulators and advocates have focused (with minimal success) on establishing accountability for concrete harms arising from online content, even where addressing those harms would mean limiting speech. In the era of Process, platforms, regulators, and users must transcend this stalemate between competing values frameworks, not necessarily by uprooting Rights-era cornerstones like CDA 230, but rather by working towards platform governance processes capable of building broad consensus around how policy decisions are made and implemented. Some promising steps in this direction could include delegating certain key policymaking decisions to entities outside of the platforms themselves; making platforms “information” or “content” fiduciaries; and systematically archiving data and metadata about disinformation detected and addressed by platforms.

  • Publication

    Institutional Books 1.0: A 242B Token Dataset from Harvard Library's Collections, Refined for Accuracy and Usability

    (2025-06-10) Cargnelutti, Matteo; Brobston, Catherine; Hess, John; Cushman, Jack; Mukk, Kristi; Scourtas, Aristana; Courtney, Kyle; Leppert, Greg; Watson, Amanda; Whitehead, Martha; Zittrain, Jonathan

    Large language models (LLMs) use data to learn about the world in order to produce meaningful correlations and predictions. As such, the nature, scale, quality, and diversity of the datasets used to train these models, or to support their work at inference time, have a direct impact on their quality. The rapid development and adoption of LLMs of varying quality has brought into focus the scarcity of publicly available, high-quality training data and revealed an urgent need to ground the stewardship of these datasets in sustainable practices with clear provenance chains. To that end, this technical report introduces Institutional Books 1.0, a large collection of public domain books originally digitized through Harvard Library's participation in the Google Books project, beginning in 2006. Working with Harvard Library, we extracted, analyzed, and processed these volumes into an extensively documented dataset of historic texts. This analysis covers the entirety of Harvard Library's collection scanned as part of that project, originally spanning 1,075,899 volumes written in over 250 different languages for a total of approximately 250 billion tokens. As part of this initial release, the OCR-extracted text (original and post-processed) as well as the metadata (bibliographic, source, and generated) of the 983,004 volumes, or 242B tokens, identified as being in the public domain have been made available. This report describes this project's goals and methods as well as the results of the analyses we performed, all in service of making this historical collection more accessible and easier for humans and machines alike to filter, read, and use.