Publication: Issue2vec: Legal Issue Embeddings Using Citegrams
Abstract
Most case law decisions are divided into discrete sections that address specific legal issues. Although those sections are generally independent of one another, unrefined machine learning and natural language processing techniques treat an entire decision as a single document. Moreover, case law decisions contain citations to precedential decisions, but the tokens that make up those citations provide minimal value to the machine learning process.
This project explores these observations by creating a corpus of documents in which each document is a specific section from a case law decision and each citation is replaced with a unique n-gram, or “citegram.” The results demonstrate that isolating specific case law sections facilitates document similarity operations and that citegrams ably capture semantic information.
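As a rough illustration of the preprocessing described above, the following minimal Python sketch splits a decision into sections and replaces each citation with a single token. It is not the project's actual implementation: the Roman-numeral section headings, the simplified reporter pattern, the `cite_<volume>_<reporter>_<page>` token format, and the helper names `split_sections`, `replace_citations`, and `build_corpus` are all assumptions made for the example.

```python
import re

# Assumption: sections begin with a Roman-numeral heading at the start of a line,
# e.g. "I. Standing". Real decisions use many other heading conventions.
SECTION_RE = re.compile(r"^[IVX]+\.\s+", re.MULTILINE)

# Assumption: a simplified pattern for a few U.S. reporter citations such as
# "410 U.S. 113" or "347 F.3d 123". Real citation formats are far more varied.
CITATION_RE = re.compile(
    r"\b(\d{1,4})\s+(U\.S\.|F\.\s?[23]d|F\.\s?Supp\.(?:\s?[23]d)?|S\.\s?Ct\.)\s+(\d{1,5})\b"
)


def to_citegram(match):
    """Collapse one citation match into a single hypothetical citegram token."""
    volume, reporter, page = match.groups()
    reporter_slug = re.sub(r"[^A-Za-z0-9]", "", reporter).lower()
    return f"cite_{volume}_{reporter_slug}_{page}"


def replace_citations(text):
    """Replace every recognized citation in the text with its citegram."""
    return CITATION_RE.sub(to_citegram, text)


def split_sections(decision):
    """Split a decision into sections at the assumed Roman-numeral headings."""
    parts = SECTION_RE.split(decision)
    return [part.strip() for part in parts if part.strip()]


def build_corpus(decision):
    """Return one citegram-normalized document per section of the decision."""
    return [replace_citations(section) for section in split_sections(decision)]


if __name__ == "__main__":
    sample = (
        "I. Standing\n"
        "See Roe v. Wade, 410 U.S. 113 (1973).\n"
        "II. Merits\n"
        "Compare 347 F.3d 123.\n"
    )
    for document in build_corpus(sample):
        print(repr(document))
```

Because each citation collapses to one token, a downstream embedding model can treat the cited authority as a single vocabulary item rather than as separate volume, reporter, and page tokens.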