Graph Embeddings for Harvard Widener Library Data
Abstract
In this project, I consider complications in retrieving resources from a database, particularly library catalog entries, and examine how graph machine learning can aid that retrieval process. Typical retrieval systems are exclusively text-based: they rely on natural language models alone when determining which resources might be relevant to the user. Natural language data is indispensable and at the core of how search engines operate: one can embed the query and all entries in a coordinate space and then return the entries closest to the query under some distance metric as the most relevant search results. However, limiting the model to natural language data alone wastes one of the most powerful tools for understanding the relationships between entries: keywords.
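The text-only retrieval scheme described here can be sketched in a few lines of Python. The entry titles and embedding vectors below are hypothetical stand-ins; in practice the vectors would come from a language model applied to each catalog entry's text.

```python
import numpy as np

# Hypothetical pre-computed text embeddings for three catalog entries
# (in a real system these would be produced by a language model).
entries = {
    "A History of Boston": np.array([0.9, 0.1, 0.0]),
    "Urban Planning in New England": np.array([0.7, 0.3, 0.1]),
    "Introduction to Marine Biology": np.array([0.0, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    # Cosine similarity is a common distance metric for text embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_entries(query_vec, entries):
    """Return entry titles sorted from most to least similar to the query."""
    scored = [(title, cosine_similarity(query_vec, vec))
              for title, vec in entries.items()]
    return [title for title, _ in sorted(scored, key=lambda p: p[1], reverse=True)]

# A hypothetical embedded query, e.g. for "Boston local history".
query = np.array([0.8, 0.2, 0.0])
print(rank_entries(query, entries))
```

The history-related entries rank above the unrelated marine biology entry because their vectors point in nearly the same direction as the query's.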
By including keywords in the model, we can capture the hierarchical relationships between entries and keywords, and among the keywords themselves, which gives us additional information about how related two entries are even when their text is deceptive at face value. This introduces a new complication, however, because keywords and entries are related in a graph structure: keywords and entries are nodes, and an edge connects an entry to each keyword that describes it. Since computers cannot look at a graph and make sense of it the way humans can, we must use graph machine learning techniques to embed the graph into a coordinate space. We can then use those coordinates to draw conclusions about the relatedness of entries and the relevance of search results. This gives us a new way to approach retrieval, one that can be paired with the standard natural language approach to return more relevant results to the user.
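One minimal way to embed such an entry–keyword graph into a coordinate space is a spectral embedding via truncated SVD of the biadjacency matrix. The toy data below is hypothetical, and this is only a sketch of the general idea (practical systems might use node2vec, GraphSAGE, or similar), but it shows the key property: entries that share keywords land near each other in the embedded space.

```python
import numpy as np

# Toy bipartite graph: catalog entries linked to the keywords that describe
# them (hypothetical data; a real catalog has thousands of subject headings).
entries = ["Boston History", "New England Towns", "Coral Reefs"]
keywords = ["history", "new england", "marine life"]

# Biadjacency matrix: rows = entries, columns = keywords; a 1 marks an edge.
B = np.array([
    [1, 1, 0],  # Boston History    -> history, new england
    [1, 1, 0],  # New England Towns -> history, new england
    [0, 0, 1],  # Coral Reefs       -> marine life
], dtype=float)

# Truncated SVD places entries (rows of U) and keywords (rows of V)
# in a shared low-dimensional coordinate space.
U, s, Vt = np.linalg.svd(B, full_matrices=False)
k = 2  # embedding dimension
entry_vecs = U[:, :k] * s[:k]
keyword_vecs = Vt[:k].T * s[:k]

def entry_distance(i, j):
    """Euclidean distance between two entry embeddings."""
    return float(np.linalg.norm(entry_vecs[i] - entry_vecs[j]))

# Entries sharing keywords are close; unrelated entries are far apart.
print(entry_distance(0, 1), entry_distance(0, 2))
```

Because the first two entries share both of their keywords, their embeddings coincide, while the marine biology entry sits at a clearly larger distance; a retrieval system can exploit exactly this geometry alongside the text-embedding distances.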