Creating and exploiting the intrinsically disordered protein knowledge graph (IDP-KG)

Gray, Alasdair and Papadopoulos, Petros and Asif, Imran and Micetic, Ivan and Hatos, Andras (2022) Creating and exploiting the intrinsically disordered protein knowledge graph (IDP-KG). CEUR Workshop Proceedings, 3127. pp. 11-18. ISSN 1613-0073

Text. Filename: Gray-etal-SWATHCLS-2022-Creating-and-exploiting-the-intrinsically-disordered-protein.pdf
Final Published Version
License: Creative Commons Attribution 4.0


Abstract

There are many data sources containing overlapping information about Intrinsically Disordered Proteins (IDPs). IDPcentral aims to be a registry that aids the discovery of data about proteins known to be intrinsically disordered by aggregating the content of these sources. Traditional ETL approaches for populating IDPcentral require the API and data model of each source to be wrapped and then transformed into a common model. In this paper, we investigate using Bioschemas markup as a mechanism to populate the IDPcentral registry by constructing the Intrinsically Disordered Protein Knowledge Graph (IDP-KG). Bioschemas markup is a lightweight, machine-readable representation of the content of each page in a site, embedded in the page's HTML. For any site, it is accessible through an HTTP request. We harvest the Bioschemas markup from three IDP sources and show that the resulting IDP-KG has the same breadth of proteins as the original sources, and that it can be used to gain deeper insight into their content by querying them as a single, consolidated knowledge graph.
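
The harvesting step described in the abstract can be illustrated with a minimal sketch. It assumes the markup is embedded as JSON-LD inside script elements of type application/ld+json, which is a common way to publish Bioschemas markup but is not confirmed here as the format used by the three sources. The names JsonLdExtractor, extract_markup, fetch_markup, and SAMPLE_PAGE, as well as the example.org URL and the sample record, are hypothetical and are not taken from the paper or from IDPcentral.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            text = "".join(self._buffer).strip()
            if text:
                try:
                    self.documents.append(json.loads(text))
                except json.JSONDecodeError:
                    # Skip malformed markup rather than abort the harvest.
                    pass


def extract_markup(html):
    """Return every JSON-LD document embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


def fetch_markup(url):
    """Fetch a page over HTTP and extract its embedded JSON-LD (assumes UTF-8)."""
    with urlopen(url) as response:
        return extract_markup(response.read().decode("utf-8", errors="replace"))


# Hypothetical page content, used here so the sketch runs without network access.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Protein",
 "@id": "https://example.org/protein/P00001", "name": "Example protein"}
</script>
</head><body>Protein page</body></html>"""

for document in extract_markup(SAMPLE_PAGE):
    print(document["@type"], document["@id"])
```

A real harvester would additionally need to convert the extracted JSON-LD into RDF and merge the records from the different sources into one graph; those steps are outside the scope of this sketch.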