Publication:

A Unified Framework for Collaborative Knowledge Graph Construction, Editing, and Distribution

Date

2026-02-10

Published Version

The Harvard community has made this article openly available.

Citation

Arango, Inaki. 2025. A Unified Framework for Collaborative Knowledge Graph Construction, Editing, and Distribution. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

Knowledge graphs (KGs) have emerged as a critical technology for grounding artificial intelligence systems in structured facts, helping to address the hallucination and reliability problems of large language models (LLMs). Despite their utility, the infrastructure required to construct, store, version, and collaboratively edit large-scale KGs remains fragmented. Previous work has addressed individual aspects of graph management but has not provided a unified, version-controlled ecosystem that supports the property-rich graphs required by modern applications. To address this infrastructure gap, this thesis introduces a comprehensive framework comprising four integrated systems: Optimus, a reproducible pipeline for graph construction; Diamond, a novel lossless binary compression format; GitGraph, a semantic version control system; and GraphEnv, an environment for multi-agent collaboration. We implemented this framework to support the end-to-end lifecycle of graph development, from initial data ingestion to downstream applications. We used Optimus to construct OptimusKG, a biomedical KG with 192,307 nodes, 21.5M edges, and 88.6M properties, and demonstrated a 56.5% reduction in build time through parallel execution. To address storage bottlenecks, we developed the Diamond algorithm and benchmarked it against standard formats; it achieves a 34× compression ratio on the popular PrimeKG dataset while preserving all node and edge properties. Furthermore, we formalized graph versioning by developing a three-way merge algorithm that performs semantic, structure-aware conflict resolution, enabling truly distributed collaboration. Finally, we integrated these tools into GRENCE, a clinical decision support application that uses our infrastructure to ground LLM reasoning in verifiable medical data.
This work establishes a robust software engineering foundation for KGs, transforming them from static artifacts into dynamic, evolving knowledge stores that can be efficiently maintained by hybrid teams of human experts and autonomous agents.
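The abstract does not say how the three-way merge works internally. As a rough illustration only, the Python sketch below shows a generic property-level three-way merge of two branches against their common ancestor. It is not GitGraph's algorithm or API. It assumes a hypothetical representation in which every node or edge is keyed by a string ID (an edge, for example, as "a->b:TREATS") and maps to a dict of properties. The names `merge_graphs` and `Conflict` are likewise hypothetical. Each value follows the usual rule: if only one branch changed it relative to the ancestor, that change wins; if both changed it differently, a conflict is reported.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Hypothetical model: element ID (node ID or edge key such as "a->b:TREATS") -> property dict.
Graph = Dict[str, Dict[str, Any]]

_MISSING = object()  # sentinel: element or property absent in a given version


@dataclass
class Conflict:
    element: str
    prop: str  # "" means a delete-vs-edit conflict on the whole element
    ours: Any
    theirs: Any


def _merge_value(base: Any, ours: Any, theirs: Any) -> Tuple[Any, bool]:
    """Standard three-way rule for one value; returns (merged_value, is_conflict)."""
    if ours == theirs:
        return ours, False  # both sides agree (or neither changed it)
    if ours == base:
        return theirs, False  # only "theirs" changed it
    if theirs == base:
        return ours, False  # only "ours" changed it
    return ours, True  # both changed it differently


def _visible(value: Any) -> Any:
    return None if value is _MISSING else value


def merge_graphs(base: Graph, ours: Graph, theirs: Graph) -> Tuple[Graph, List[Conflict]]:
    """Property-level three-way merge of two branches against their common ancestor."""
    merged: Graph = {}
    conflicts: List[Conflict] = []
    for elem in sorted(base.keys() | ours.keys() | theirs.keys()):
        b = base.get(elem, _MISSING)
        o = ours.get(elem, _MISSING)
        t = theirs.get(elem, _MISSING)
        if o is _MISSING or t is _MISSING:
            # Added, deleted, or delete-vs-edit: merge the element as a single value.
            value, clash = _merge_value(b, o, t)
            if clash:
                conflicts.append(Conflict(elem, "", _visible(o), _visible(t)))
                value = o if o is not _MISSING else t  # keep the edited copy
            if value is not _MISSING:
                merged[elem] = dict(value)
            continue
        # Element exists on both branches: merge property by property.
        b_props = b if b is not _MISSING else {}
        props: Dict[str, Any] = {}
        for key in sorted(b_props.keys() | o.keys() | t.keys()):
            ov, tv = o.get(key, _MISSING), t.get(key, _MISSING)
            value, clash = _merge_value(b_props.get(key, _MISSING), ov, tv)
            if clash:
                conflicts.append(Conflict(elem, key, _visible(ov), _visible(tv)))
                value = ov if ov is not _MISSING else tv  # provisional pick, reported above
            if value is not _MISSING:
                props[key] = value
        merged[elem] = props
    return merged, conflicts


if __name__ == "__main__":
    base = {"n1": {"name": "aspirin"}, "n2": {"name": "ibuprofen"}}
    ours = {"n1": {"name": "aspirin", "class": "NSAID"}, "n2": {"name": "ibuprofen"}}
    theirs = {"n1": {"name": "acetylsalicylic acid"}}  # renamed n1, deleted n2
    merged, conflicts = merge_graphs(base, ours, theirs)
    print(merged)     # {'n1': {'class': 'NSAID', 'name': 'acetylsalicylic acid'}}
    print(conflicts)  # []
```

In this sketch, a conflicting value provisionally keeps the "ours" version, or whichever side did not delete it, and is returned for manual resolution. A structure-aware merge of the kind the abstract describes would also need checks that this sketch omits, such as flagging an edge whose endpoint node was deleted on the other branch.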

Keywords

Computer science, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.