Publication:
Issue2vec: Legal Issue Embeddings Using Citegrams

Date

2022-01-20

Citation

Murphy, Owen. 2021. Issue2vec: Legal Issue Embeddings Using Citegrams. Master's thesis, Harvard University Division of Continuing Education.

Abstract

Most case law decisions are divided into discrete sections that address specific legal issues. Although those sections are generally independent of one another, unrefined machine learning and natural language processing techniques treat those sections together as a single document. Moreover, case law decisions contain citations to precedential decisions, but the tokens comprising those citations provide minimal value to the machine learning process. This project explores these observations by creating a corpus of documents in which each document is a specific section from a case law decision and each citation is replaced with a unique n-gram, or “citegram.” The results demonstrate that isolating specific case law sections facilitates document similarity operations and that citegrams ably capture semantic information.
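
The citation-replacement idea in the abstract can be illustrated with a short sketch. This record does not describe the thesis's actual tokenization scheme, so everything below is an assumption made for illustration: the tiny `REPORTERS` list, the `cite_<volume>_<reporter>_<page>` token format, and the function names are all hypothetical. The sketch only shows how every occurrence of the same citation could be collapsed into one stable token before a corpus is passed to an embedding model.

```python
import re

# Hypothetical, deliberately tiny list of reporter abbreviations.
# A real corpus would need a far more complete list (assumption for illustration).
REPORTERS = ["U.S.", "F.2d", "F.3d", "S. Ct."]

# Match "<volume> <reporter> <first page>", e.g. "410 U.S. 113".
_reporter_alt = "|".join(re.escape(r) for r in REPORTERS)
CITATION_RE = re.compile(rf"\b(\d{{1,4}})\s+({_reporter_alt})\s+(\d{{1,5}})\b")


def to_citegram(match):
    """Build one token from a matched citation (token format is an assumption)."""
    volume, reporter, page = match.groups()
    # Drop punctuation and spaces so the reporter becomes a plain slug ("U.S." -> "us").
    reporter_slug = re.sub(r"[^a-z0-9]", "", reporter.lower())
    return f"cite_{volume}_{reporter_slug}_{page}"


def replace_citations(text):
    """Replace every recognized citation in text with its single-token form."""
    return CITATION_RE.sub(to_citegram, text)


if __name__ == "__main__":
    sample = "See Roe v. Wade, 410 U.S. 113 (1973)."
    print(replace_citations(sample))
```

Because the token is derived only from the volume, reporter, and page, two sections that cite the same decision share an identical token, which is what would let a downstream model treat the citation as a single vocabulary item rather than as several low-value tokens.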

Keywords

caselaw, citations, document embedding, legal NLP, machine learning, natural language processing (NLP), Computer science, Artificial intelligence, Information science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
