Authors:
Genet Asefa Gesese, Jörg Waitelonis, Zongxiong Chen, Sonja Schimmler, Harald Sack
Paper:
https://arxiv.org/abs/2408.08698
Introduction
The German National Research Data Infrastructure (NFDI) is a non-profit association established to coordinate the creation of a national research data infrastructure. It encompasses 26 consortia covering a broad spectrum of scientific disciplines, including cultural sciences, social sciences, humanities, engineering, life sciences, and natural sciences. These consortia share common goals and concepts, such as their members, structure, data repositories, and services. To enhance interoperability across these consortia, the NFDICore ontology has been developed. This mid-level ontology represents metadata related to NFDI resources, including individuals, organizations, projects, and data portals. NFDICore provides mappings to a wide range of standards across different domains, such as the Basic Formal Ontology (BFO) and Schema.org, to advance knowledge representation, data exchange, and collaboration across diverse domains.
To address domain-specific research questions for each consortium, NFDICore follows a modular architecture. Examples of modular extensions include the NFDI4Culture ontology module (CTO) for cultural heritage and the NFDI-MatWerk ontology module (MWO) for materials science. This paper introduces the NFDI4DS Ontology (NFDI4DSO) for the data science domain as a domain-specific modular extension of NFDICore. The NFDI4DataScience (NFDI4DS) consortium aims to enhance the accessibility and interoperability of research data in the domains of Data Science (DS) and Artificial Intelligence (AI). The NFDI4DSO ontology is built to achieve this by linking digital artifacts and ensuring their FAIR (Findable, Accessible, Interoperable, and Reusable) accessibility, thereby fostering collaboration across various DS and AI platforms.
The NFDI4DataScience Ontology (NFDI4DSO)
NFDI4DSO is built in a modular fashion on top of NFDICore. Like NFDICore, it is developed using a bottom-up, iterative, user-centered approach. NFDICore comprises 51 classes, 55 object properties, 8 data properties, 18 annotation properties, and 5 SWRL rules. NFDI4DSO adds a further 42 classes, 38 object properties, 9 data properties, and 8 SWRL rules. The NFDI4DSO ontology not only describes various data science artifacts but also provides information about the resources of the NFDI4DS Consortium, such as personas, consortium members, spokespersons, and task area leads.
As in NFDICore, the classes introduced in NFDI4DSO are mapped to the top-level ontology BFO and to other ontologies such as Schema.org, the FaBiO ontology, and the Conference Ontology. NFDI4DSO contains several kinds of classes, including processes, roles, and independent continuants. For instance, Figure 1 depicts how NFDI4DSO represents the relationship between the independent continuant nfdi4dso:SonjaSchimmler and her spokesperson role nfdi4dso:SpokespersonRole by mapping it to BFO. Using roles and processes lets NFDI4DSO represent the relationships between entities in detail, which increases the ontology’s expressivity. To ease integration for users who need only simple relations, NFDI4DSO also introduces direct shortcut properties, each of which can be expanded into the full BFO-compliant path expression.
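The idea of expanding a shortcut property into an explicit role path can be illustrated with a small sketch. It is not the NFDI4DSO implementation: every name with the ex: prefix (ex:spokespersonOf, ex:bearerOf, ex:roleIn, ex:NFDI4DS) is a hypothetical placeholder, and only nfdi4dso:SonjaSchimmler and nfdi4dso:SpokespersonRole come from the text above. The sketch assumes a shortcut links a person directly to a consortium, and that the expanded form routes this link through an intermediate role node.

```python
from itertools import count

Triple = tuple[str, str, str]

# Hypothetical shortcut table (an assumption for illustration):
# shortcut property -> (role class, property linking the role to its context).
SHORTCUTS = {
    "ex:spokespersonOf": ("nfdi4dso:SpokespersonRole", "ex:roleIn"),
}


def expand_shortcuts(triples: set[Triple]) -> set[Triple]:
    """Replace each shortcut triple with an explicit bearer -> role -> context path."""
    fresh = count(1)  # numbers the intermediate role nodes
    expanded: set[Triple] = set()
    for s, p, o in sorted(triples):  # sorted so node ids are deterministic
        if p not in SHORTCUTS:
            expanded.add((s, p, o))  # ordinary triples pass through unchanged
            continue
        role_class, context_prop = SHORTCUTS[p]
        role = f"_:role{next(fresh)}"
        # ex:bearerOf stands in for a BFO-style "bearer of" relation.
        expanded.add((s, "ex:bearerOf", role))
        expanded.add((role, "rdf:type", role_class))
        expanded.add((role, context_prop, o))
    return expanded


graph = {("nfdi4dso:SonjaSchimmler", "ex:spokespersonOf", "ex:NFDI4DS")}
for triple in sorted(expand_shortcuts(graph)):
    print(triple)
```

Under these assumptions, the single shortcut triple becomes three triples: the person bears a role, the role is typed as a spokesperson role, and the role is tied to the consortium. Consumers who need only the simple relation can keep using the shortcut, while reasoners can work on the expanded form.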
Ontology Implementation
NFDI4DSO has been developed and implemented in the Protégé ontology editor. Widoco has been used to automatically generate enriched and customized documentation of the ontology. The stable release, NFDI4DSO v1.0.0, is available on GitHub, along with the latest development version.
NFDI4DSO in Use
NFDI4DSO is designed to form the foundation of the NFDI4DS Knowledge Graph (NFDI4DS-KG), which is currently under development. The NFDI4DS-KG consists of two main components: the Research Information Graph (RIG) and the Research Data Graph (RDG). The RIG includes metadata about the NFDI4DS consortium’s resources, persons, and organizations, while the RDG encompasses content-related index data from the consortium’s heterogeneous data sources. The RIG serves as the backend for the NFDI4DS web portal, facilitating interactive access to and management of this data. Both the RIG and the RDG will be accessible and searchable via the NFDI4DS Registry platform. Additionally, the NFDI4DS consortium plans to collaborate with other NFDI consortia to seamlessly integrate further domain-specific knowledge into the RDG. Currently, the first version of the NFDI4DS-KG, containing the RIG, is publicly available. For example, the list of co-spokespersons of the NFDI4DS Consortium can be viewed either by navigating the data with SHMARQL or by querying it with SPARQL.
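A SPARQL query for the co-spokespersons could be prepared roughly as follows. This is only a sketch under stated assumptions: the endpoint URL is a placeholder, and the ex: prefix, ex:coSpokespersonOf, and ex:NFDI4DS are hypothetical names rather than the actual NFDI4DSO vocabulary. The request itself follows the standard SPARQL 1.1 Protocol: a GET request with a query parameter and a JSON results Accept header.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical query: the ex: namespace, property, and resource are
# placeholders, not terms taken from NFDI4DSO.
CO_SPOKESPERSON_QUERY = """
PREFIX ex: <https://example.org/ns#>
SELECT ?person WHERE {
  ?person ex:coSpokespersonOf ex:NFDI4DS .
}
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Placeholder endpoint; substitute the real NFDI4DS-KG endpoint before sending.
request = build_sparql_request("https://example.org/sparql", CO_SPOKESPERSON_QUERY)
print(request.full_url)
```

The sketch only builds the request and does not contact any server. Once a real endpoint and the real property names are filled in, the request could be sent with urllib.request.urlopen and the response parsed with the json module.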
Conclusion and Future Work
This paper presents the NFDI4DS Ontology and its use in the NFDI4DS-KG, which is currently under development. The ontology facilitates the representation and interoperability of data science artifacts within and outside of NFDI4DS. NFDI4DSO is built on top of the NFDICore ontology and mapped to BFO and other ontologies. Future work includes an extensive evaluation of the ontology using competency questions based on the persona definitions from the NFDI4DS consortium.
Acknowledgments
This publication was written by the NFDI consortium NFDI4DataScience in the context of the work of the association German National Research Data Infrastructure (NFDI) e.V. NFDI is financed by the Federal Republic of Germany and the 16 federal states and funded by the Federal Ministry of Education and Research (BMBF) – funding code M532701 / the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number NFDI4DataScience (460234259).