SKATEBOARD: Semantic Knowledge Advanced Tool for Extraction, Browsing, Organisation, Annotation, Retrieval, and Discovery

Bernasconi, Eleonora; Di Pierro, Davide; Redavid, Domenico; Ferilli, Stefano

doi:10.3390/app132111782

Department of Computer Science, University of Bari, Via E. Orabona 4, 70125 Bari, Italy

Authors to whom correspondence should be addressed.

Appl. Sci. 2023, 13(21), 11782; https://doi.org/10.3390/app132111782

Submission received: 23 September 2023 / Revised: 23 October 2023 / Accepted: 26 October 2023 / Published: 27 October 2023

(This article belongs to the Special Issue Knowledge and Data Engineering)

Abstract:

This paper introduces the Semantic Knowledge Advanced Tool for Extraction, Browsing, Organisation, Annotation, Retrieval, and Discovery (SKATEBOARD), a tool designed to facilitate knowledge exploration through the application of semantic technologies. In the current era, characterised by abundant information, the demand for advanced solutions that streamline Knowledge Extraction, management, and visualisation has grown substantially. Graph-based representations have emerged as a robust approach for uncovering intricate data relationships, complementing the capabilities offered by AI models. Acknowledging the transparency and user control challenges faced by AI-driven solutions, SKATEBOARD offers a comprehensive framework encompassing Knowledge Extraction, ontology development, management, and interactive exploration. By adhering to Linked Data principles and adopting graph-based exploration, SKATEBOARD provides users with a clear view of data relationships and dependencies. Furthermore, it integrates recommendation systems and reasoning capabilities to augment the knowledge discovery process, thus introducing a serendipity effect that arises from exploration through the SKATEBOARD interface. This paper elucidates SKATEBOARD’s functionalities while emphasising its user-centric design. After reviewing related research, we provide an overview of the SKATEBOARD pipeline, demonstrating its capacity to bridge RDF and LPG representations. Subsequent sections delve into Knowledge Extraction and exploration, culminating in the evaluation of the tool. SKATEBOARD empowers users to make informed decisions and uncover valuable insights within their data domains, with the added dimension of serendipitous discoveries facilitated by its interface exploration capabilities.

Keywords:

Semantic Web; knowledge graph; labelled property graph; extraction; browsing; organisation; annotation; retrieval; discovery

1. Introduction

In this paper, we propose a tool for exploring knowledge, leveraging semantic technologies for Knowledge Extraction, management, and interactive visualisation. The rapid growth of information and the increasing complexity of data have highlighted the need for advanced tools that empower users to gain valuable insights and discover unexpected relationships within their domain of interest.

Graphs have emerged as a powerful means of representing information when the goal is to derive knowledge and uncover hidden patterns. Unlike traditional tabular data structures, graphs allow us to capture complex relationships between entities, making them an ideal choice for knowledge representation and discovery. Through graph-based approaches, users can navigate and explore interconnected data, unlocking valuable knowledge that might remain hidden in other representations.

Artificial Intelligence (AI) has significantly transformed research methodologies, enabling machines to process vast amounts of data, interpret them, and derive valuable insights. Notably, models like OpenAI’s Generative Pre-trained Transformer (GPT) have demonstrated the capability to generate coherent text, extracting meaningful information from raw data. However, it is essential to acknowledge that these capabilities are based on learned patterns and associations rather than transparent semantic reasoning. This highlights the critical need for transparent systems that empower users with greater control and understanding of the underlying processes.

In the realm of AI and semantic technologies, there is an evident gap when it comes to tools for transparent Knowledge Extraction and exploratory visualisation. Many existing AI-based solutions generate outputs that are not easily comprehensible or controllable by end-users, leading to concerns regarding trust and accountability.

To address this pressing issue, we present a novel framework and tool called the Semantic Knowledge Advanced Tool for Extraction, Browsing, Organisation, Annotation, Retrieval, and Discovery (SKATEBOARD). SKATEBOARD is designed to provide users with complete control over all stages of information extraction and manipulation, offering a transparent approach to knowledge exploration and management.

SKATEBOARD facilitates a multi-faceted approach, encompassing the extraction of relevant information, the creation of domain-specific ontologies based on the extracted data, efficient ontology management, and a powerful platform for interactive exploration. By adopting Linked Data paradigms, such as graph-based exploration, the tool offers users unparalleled transparency, allowing them to interactively navigate through information with full visibility into relationships and dependencies. Additionally, SKATEBOARD opens the door to recommendation systems and reasoning capabilities, unlocking the potential for serendipitous discoveries and novel insights.

In the following sections, we will delve into the details of SKATEBOARD’s functionalities, showcasing how this tool empowers users to harness the power of semantic technologies to unlock the full potential of their data. Through a combination of intelligent Knowledge Extraction, ontology building, and interactive exploration, SKATEBOARD represents a promising step forward in the realm of knowledge discovery and management. Its user-centric design ensures that researchers and practitioners can confidently navigate through the complexities of their domain, making informed decisions and uncovering knowledge that goes beyond their initial expectations.

The rest of the paper is organised as follows. After discussing related work in the next section, we describe in detail the SKATEBOARD pipeline in Section 3 and how it connects RDF and LPG representations in Section 4. Section 5 and Section 6 discuss the Knowledge-Extraction and -exploration functions of SKATEBOARD, respectively, while Section 7 evaluates it. Finally, Section 8 concludes the work.

2. Related Work

In this section, we will delve into tools that share common characteristics with SKATEBOARD, particularly focusing on Knowledge Extraction, information retrieval, semantic data visualisation, and the broader utilisation of semantic technologies. We will analyse these tools, comparing their functionalities and approaches, while also underlining distinctions and parallels with the SKATEBOARD system.

In the realm of Linked Data interfaces, many systems focus on the visualisation of SPARQL endpoints and, thus, Linked Data [1,2].

SPARQL endpoints are web services that provide an interface for querying and retrieving data from semantic datasets using the SPARQL query language. They are critical access points for obtaining information from Linked Data sources through SPARQL queries.
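As a minimal sketch of what querying such an endpoint involves, the Python snippet below builds a SPARQL protocol GET request URL and parses a response in the W3C SPARQL JSON results format. The endpoint URL, the query, and the sample response are illustrative assumptions and are not taken from SKATEBOARD; no network call is made.

```python
import json
from urllib.parse import urlencode


def build_query_url(endpoint, query):
    """Build a SPARQL protocol GET URL; the query travels in the 'query' parameter.

    A real client would also send an Accept header such as
    'application/sparql-results+json' to request JSON results.
    """
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(response_text):
    """Flatten a SPARQL JSON result set into a list of {variable: value} rows."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a given row, so check before reading it.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hypothetical query and endpoint, for illustration only.
QUERY = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
    "SELECT ?s ?label WHERE { ?s rdfs:label ?label } LIMIT 1"
)
ENDPOINT = "http://example.org/sparql"

# Hand-written stand-in for what an endpoint could return.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["s", "label"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/book/1"},
         "label": {"type": "literal", "value": "Example Book"}},
    ]},
})

url = build_query_url(ENDPOINT, QUERY)
rows = parse_bindings(SAMPLE_RESPONSE)
```

Keeping URL construction and result parsing separate from the transport makes both steps easy to test without a live endpoint.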

However, SKATEBOARD goes beyond this scope, thanks to its integrated API as described in Section 4 of our work. This API allows SKATEBOARD to connect to one or more SPARQL endpoints containing Linked Data or Labelled Property Graphs (LPGs). This endows SKATEBOARD with the versatility to not only visualise Linked Data in existing endpoints, but also contribute to the creation of Linked Data, addressing the challenges associated with extracting semantic knowledge from unstructured texts and semantic annotation. Furthermore, with its integration with the GraphBRAIN system [3,4] for ontology creation and management, SKATEBOARD provides a comprehensive solution for the lifecycle of Linked Data. The GraphBRAIN system provides a dedicated API that enforces ontology compliance for all interactions with the Knowledge Graph (KG). This API offers a wide range of functionalities for KG management, classified into basic and advanced operations.

The basic functionality includes standard Create, Read, Update, Delete (CRUD) operations, enabling the management of entity instances and relationship instances in the KG. For querying, the API seamlessly interfaces with the Neo4j query language Cypher, ensuring that all queries are compliant with the ontology’s constraints.
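The idea of ontology-compliant CRUD can be illustrated with a small sketch. The GraphBRAIN API itself is not shown in this text, so the ontology dictionary and the `create_node_query` function below are hypothetical; the sketch only shows a creation request being checked against a toy ontology before a parameterised Cypher statement is produced.

```python
# Toy ontology (hypothetical): class name -> attribute names allowed for it.
ONTOLOGY = {
    "Person": {"name", "birthDate"},
    "Book": {"title", "year"},
}


def create_node_query(label, props):
    """Return (cypher, parameters) for a node creation that complies with ONTOLOGY.

    Raises ValueError if the class or any attribute is not declared.
    """
    if label not in ONTOLOGY:
        raise ValueError(f"Unknown class: {label}")
    unknown = set(props) - ONTOLOGY[label]
    if unknown:
        raise ValueError(f"Attributes not allowed for {label}: {sorted(unknown)}")
    # The label has been validated against the ontology, so it can be placed in
    # the statement; property values are passed separately as a parameter map.
    return f"CREATE (n:{label} $props) RETURN n", {"props": dict(props)}


query, params = create_node_query("Person", {"name": "Ada"})

try:
    create_node_query("Person", {"nickname": "A."})
    rejected = False
except ValueError:
    rejected = True
```

Rejecting the request before any statement reaches the database is one simple way to keep the stored graph consistent with the schema.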

The advanced functionality of the GraphBRAIN API encompasses various analysis, mining, and reasoning functions. These include tasks such as computing centrality measures for entity instances based on various algorithms, extracting relevant subgraphs starting from specified nodes (possibly considering user profiles for personalised results), finding all possible paths in the KG between given pairs of nodes, checking the consistency of the available knowledge, and deducing or adducing knowledge that may not be explicitly present in the KG.
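Two of these functions, centrality and path finding, can be sketched on a toy graph. The code below is a plain-Python illustration under our own assumptions (an undirected, unweighted graph and degree centrality as the measure); it does not reproduce the algorithms used by GraphBRAIN.

```python
from collections import defaultdict

# Toy undirected graph given as an edge list.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    return {node: len(neigh) / (n - 1) for node, neigh in adjacency.items()}


def all_simple_paths(adjacency, source, target, path=None):
    """Enumerate every path from source to target that visits no node twice."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    paths = []
    for nxt in sorted(adjacency[source]):
        if nxt not in path:
            paths.extend(all_simple_paths(adjacency, nxt, target, path))
    return paths


adjacency = build_adjacency(EDGES)
centrality = degree_centrality(adjacency)
paths = all_simple_paths(adjacency, "A", "D")
```

Here node `C` touches every other node, so it receives the highest degree centrality, and two simple paths connect `A` to `D`.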

Notably, the GraphBRAIN API is designed to be accessible to third-party applications, allowing developers to incorporate its advanced KG management capabilities into their own systems.

This integration plays a central role in SKATEBOARD’s value proposition, delivering a comprehensive solution for overseeing the entire lifecycle of Linked Data, from ontology development to advanced query and analytical capabilities. It offers adaptability, expandability, and the capability to import or export knowledge to and from other formats, including the Semantic Web standard OWL.

It is important to highlight that, while there are numerous studies examining systems for managing and visualising Linked Data, only a handful of them simultaneously tackle the creation aspect of Linked Data. This facet, seen as both the extraction of semantic knowledge from unstructured texts and semantic annotation, represents one of the distinctive strengths of SKATEBOARD.

As highlighted in the work by Bernasconi et al. [2], Linked Data interfaces can be categorised into five key macro-characteristics:

Knowledge Extraction: This category pertains to the use of tools designed to extract knowledge from unstructured data, enabling the analysis of texts or unstructured data to transform them into a structured and semantic form.
Traditional visual-information-seeking tools: This category encompasses conventional systems that enable users to search for information using visual representations, such as search engines, digital libraries, and other visual interfaces.
Visualisation of semantic data: This macro-characteristic focuses on the use of tools to visualise, retrieve, and represent semantic data, often through visualisations that make linked data more understandable and interpretable.
Semantic annotation: This category includes tools that enable collaborative annotation of semantic data, allowing for the enrichment of data with additional information or semantic metadata.
Digital library: The digital library macro-characteristic refers to specific tools dedicated to managing and exploring a collection of digital books or documents. These tools may include advanced digital catalogues, literature search systems, and other similar resources.

2.1. Knowledge Extraction

Here, we will discuss the transformation of unstructured or semi-structured text into structured knowledge representations, focusing on advanced knowledge representation techniques. Knowledge Extraction is pivotal in managing semantic data, encompassing information extraction from unstructured sources and semantic enrichment to refine raw data into structured formats. Various tools have emerged to address Knowledge Extraction, utilising natural language processing, machine learning, and knowledge representation to create coherent and machine-interpretable knowledge representations.

The significance of Knowledge Extraction is evident in initiatives such as the standardisation of RDF extraction from relational databases and projects like the conversion of Wikipedia into structured data, exemplified by DBpedia [5] and Freebase [6]. In the following sections, we examine various tools and techniques for Knowledge Extraction and semantic enrichment within the context of Linked Data interfaces.

AIDA [7] stands out as a versatile framework and online tool designed for Named Entity Recognition (NER) and resolution. Its remarkable capability lies in connecting ambiguous references to precise canonical entities within the YAGO2 knowledge base [8]. AIDA also boasts the added feature of sense tagging and provides a wide array of customizable options.

Apache Stanbol [9] is an open-source HTTP service meticulously crafted to elevate unstructured content by infusing it with semantic annotations. It excels in generating RDF-encoded outcomes through multilingual NER and resolution. Stanbol shines in sense tagging linked to renowned knowledge bases like DBpedia and GeoNames. Moreover, it offers text span grounding, confidence assessment, and seamless support for associated imagery.

DBpedia Spotlight [10] is a specialised tool with an automatic knack for pinpointing and annotating references to DBpedia resources nestled within textual material.

Open Calais [11] is a Knowledge Extraction powerhouse, renowned for its proficiency in extracting named entities and embellishing them with sense labels, factual information, and event details. It is conveniently accessible both as a web application and a web service.

Semiosearch Wikifier [12] is a system that deftly navigates the terrain of named entity resolution, expertly matching named entities or terms to their corresponding DBpedia counterparts. This feat is achieved by employing a blend of named-entity-recognition techniques and heuristic strategies.

The GLOBDEF system [13] showcases its adaptability with the incorporation of pluggable enhancement modules. These modules can be dynamically activated, giving rise to flexible data-enhancement pipelines. Their primary mission is to enhance data quality and structure, contributing significantly to improved data utilisation.

These tools signify substantial progress in the field of Knowledge Extraction, effectively bridging the gap between unstructured text and structured knowledge representations. Notable trends include advancements in Named Entity Recognition and Linking (NERL), semantic enrichment, and support for multiple languages.

In comparison, SKATEBOARD offers a holistic Knowledge Extraction solution. It excels in transforming unstructured text into structured data, identifying entities through NER, and linking these entities to existing knowledge bases like DBpedia (NEL). SKATEBOARD further enhances the extracted data through semantic annotation and ontology-based integration via its integration with the GraphBRAIN system. While the cited tools excel in specific aspects of Knowledge Extraction, SKATEBOARD provides a comprehensive solution that encompasses the entire Knowledge Extraction pipeline, making it a powerful choice for Linked Data management. Moreover, SKATEBOARD emphasises maintenance and updates to ensure long-term usability and relevance, addressing the challenges faced by some of the other tools mentioned. As the field evolves, innovative solutions like SKATEBOARD are expected to cater to diverse Knowledge Extraction needs across various domains and applications.

2.2. Traditional Visual-Information-Seeking Tools

Traditional visual-information-seeking tools, exemplified by systems like L’ERMA (https://www.lerma.it/, accessed on 9 September 2023) and TORROSSA (https://www.torrossa.com/, accessed on 9 September 2023), have long been the cornerstone of data retrieval through visual interfaces. These tools have played a fundamental role in information access. However, a noteworthy shift occurs when we contrast their capabilities with the advanced semantic technologies integrated into the SKATEBOARD system.

Traditional tools, while reliable, often struggle to deliver comprehensive search experiences, especially when dealing with unstructured data. In stark contrast, SKATEBOARD, driven by semantic technology, redefines the search landscape. Its incorporation of semantic entities facilitates refined semantic faceted browsing, empowering users to extract meaningful insights through filtered search results.

However, SKATEBOARD does more. It broadens search horizons by seamlessly integrating related entities and keywords, fostering serendipitous discovery. This allows users to stumble upon relevant content they had not explicitly sought, greatly enriching their exploration.

Furthermore, SKATEBOARD capitalises on semantic relationships, intelligently recommending related documents and establishing cross-connections between entities. This added depth and relevance redefine the search experience, a feat that traditional tools often find challenging.

While implementing semantic technologies, such as SKATEBOARD, may involve initial investments, the returns are substantial. The precision it brings to search and the serendipitous discoveries it enables make it a valuable long-term investment.

In conclusion, SKATEBOARD not only reshapes the search landscape by offering precise searches and serendipitous exploration, but also justifies its cost through the enriched user experience and enhanced search capabilities it brings to Linked Data interfaces.

2.3. Visualisation of Semantic Data

The visualisation of semantic data, stemming from document corpora, presents distinctive challenges due to its heterogeneous and dynamic nature. Conventional fixed metadata structures are inadequately equipped to manage such intricacies.

Existing research [2] has identified distinctive characteristics of tools for Linked Data visualisation, categorising them based on various interaction paradigms and types of represented information.

In terms of interaction paradigms, the following are highlighted:

Tabular interaction paradigm: This paradigm organises information about a single resource in a tabular format, enabling users to explore specific attributes, such as media files, descriptions, and links to related resources [5,14,15].

Node–link interaction paradigm: In this paradigm, resources are represented as nodes or boxes, connected by arcs, which represent relationships. Users navigate the graph by traversing these connections [16,17,18].

Visual query composition: These interfaces simplify the creation of SPARQL queries using graphical elements [19,20].

Regarding the types of represented information, the distinctions are made as follows:

Data visualisation: These tools employ graphical representations to enhance data comprehension [21].

Model visualisation: Tools in this category specialise in illustrating data models, including schemas and ontologies, offering users insights into the underlying data structures [3,22].

Data to model visualisation (schema extraction): These tools deduce ontology schemas from RDF triples using SPARQL queries [23,24].
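The schema-extraction idea in the last category can be sketched in a few lines. The cited tools work through SPARQL queries; the snippet below applies the same reasoning to an in-memory list of triples. The prefixed names and the assumption that each resource has exactly one `rdf:type` are ours, chosen for brevity.

```python
RDF_TYPE = "rdf:type"

# Illustrative instance data written as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:dante", RDF_TYPE, "ex:Person"),
    ("ex:commedia", RDF_TYPE, "ex:Book"),
    ("ex:florence", RDF_TYPE, "ex:Place"),
    ("ex:dante", "ex:wrote", "ex:commedia"),
    ("ex:dante", "ex:bornIn", "ex:florence"),
]


def extract_schema(triples):
    """Derive (domain class, predicate, range class) patterns from instance data."""
    # Assumption: one type per resource; a later type would overwrite an earlier one.
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    schema = set()
    for s, p, o in triples:
        if p != RDF_TYPE and s in types and o in types:
            schema.add((types[s], p, types[o]))
    return schema


schema = extract_schema(TRIPLES)
```

The resulting set is a compact class-level summary of the data, which is the kind of structure a model visualisation can then draw.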

These categories of tools provide various modes of interaction with Linked Data. However, it is essential to note that SKATEBOARD goes beyond these paradigms and categories, adopting a dynamic approach based on the selected entity type. For instance, SKATEBOARD offers interactive maps for locations, reveals details such as birth dates, birth places, and key information for individuals, and presents publication data and authors for books. Furthermore, SKATEBOARD customises its visualisation based on the entity type, allowing users to decipher underlying data models and promoting in-depth exploration of semantic information. As research continues, Linked Data interfaces are expected to become increasingly sophisticated, but SKATEBOARD is at the forefront, offering users a richer and more-intuitive experience.
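Choosing a visualisation from the entity type amounts to a dispatch on that type. The sketch below is a hypothetical illustration of this pattern; the type names, field names, and widget identifiers are assumptions and do not describe SKATEBOARD's actual interface code.

```python
def view_for(entity):
    """Return a description of the widget to render for the given entity."""
    renderers = {
        # Locations are shown on a map, so coordinates are required.
        "Place": lambda e: {"widget": "map", "lat": e["lat"], "lon": e["lon"]},
        # People get a profile card with whichever biographical fields exist.
        "Person": lambda e: {
            "widget": "profile",
            "fields": {k: e[k] for k in ("birthDate", "birthPlace") if k in e},
        },
        # Books get publication data and authors.
        "Book": lambda e: {
            "widget": "publication",
            "fields": {k: e[k] for k in ("authors", "published") if k in e},
        },
    }
    # Unknown types fall back to a generic view instead of failing.
    render = renderers.get(entity["type"], lambda e: {"widget": "generic"})
    return render(entity)


place_view = view_for({"type": "Place", "lat": 41.12, "lon": 16.87})
person_view = view_for({"type": "Person", "birthDate": "1265"})
other_view = view_for({"type": "Event"})
```

A lookup table of this kind keeps the set of supported entity types easy to extend without touching the calling code.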

2.4. Semantic Annotation

Semantic annotation tools are crucial for enriching documents with entities, classes, topics, or facts based on existing ontologies or knowledge bases. These tools typically fall into three main categories: manual, automatic, and semi-automatic approaches.

In the manual annotation approach, users are responsible for the semantic annotation of content. For example, Omeka S (https://omeka.org/s/, accessed on 8 September 2023) and SenTag [25] offer platforms for manual tagging of documents, although this approach may introduce errors due to human limitations and variations in criteria.

Automatic annotation methods leverage machine learning and natural language processing to extract semantic information with minimal user intervention. AnnoTag [26] provides automatic content annotations using entity-level analytics, while the ARCA [27,28,29] system associates unstructured content with concepts in a knowledge graph.

Semi-automatic approaches combine machine automation with human expertise. Tools like tagtog [30] and Recogito [31] facilitate collaborative text annotation, while GoNTogle [32] offers ontology-based semantic annotations for various document formats.

When comparing these annotation tools with SKATEBOARD, it is important to highlight SKATEBOARD’s integration with GraphBRAIN and its collaborative validation of automatic Knowledge Extraction. SKATEBOARD also manages the endpoint in the GraphBRAIN connected system, providing enhanced capabilities for semantic annotations. These integrations and features make SKATEBOARD a comprehensive solution for semantic annotation tasks.

2.5. Exploration of a Digital Library

Within the expansive landscape of Linked Data interfaces, a distinctive subset of tools emerges that is dedicated to fostering the exploration and dissemination of knowledge within digital libraries. These tools stand apart by focusing on the specialised domain of digital collections, seeking to enhance users’ engagement with the rich and diverse content within these repositories. Unlike generic SPARQL endpoints, which cater to a wide range of data sources, the tools within the Digital Library category are tailored to serve the unique needs of digital libraries. These tools go beyond the conventional approach of querying and retrieving data; they are designed to showcase the intellectual treasures embedded in library collections and provide users with an immersive experience that transcends mere data retrieval.

By catering to digital libraries, these tools empower institutions to showcase their holdings, whether they encompass books, manuscripts, artworks, or multimedia resources. The primary objective is to facilitate the exploration and discovery of valuable insights and knowledge encapsulated within the library’s catalogue, making the library’s offerings accessible to both casual visitors and dedicated researchers. These tools transform the digital library into a dynamic and interactive space through sophisticated visualisations, intuitive interfaces, and user-centric functionalities. Users can navigate through vast repositories, uncover hidden connections, and traverse the boundaries of disciplines and periods. The tools in this category aim to democratise knowledge access, enabling users to embark on intellectual journeys tailored to their interests.

Below, we delve into the intricate realm of digital libraries and the tools dedicated to enhancing the exploration of their collections. By examining the methodologies and features of these tools, we seek to shed light on their significance and impact and the unique ways they contribute to the broader landscape of Linked Data interfaces.

Yewno Discover [33] is an integrated system designed to assist scholars in their research by providing classification and visual exploration of academic materials. However, its adaptability and flexibility in diverse contexts may be limited unless customised for specific requirements. Additionally, in contrast to the proposed system, Yewno Discover utilises the Knowledge Graph (KG) structure for exploration to a lesser extent, which is a key aspect of the research questions presented here.

Sampo-UI [34] offers a comprehensive framework that includes a set of reusable and extensible components, application state management, and a read-only API for SPARQL queries. This framework facilitates the creation of user interfaces for semantic portals. Sampo employs various search paradigms, such as free-text search, faceted search, geospatial search, and temporal search, providing users with diverse ways to access search results, including tables, lists, geospatial visualisations, and temporal visualisations.

Another tool employed for exploring digital libraries is Talk to Books (https://books.google.com/talktobooks/, accessed on 22 September 2023), developed by Google. This tool allows users to explore concepts and discover books by retrieving quotes that correspond to their queries. It aims to assist users in identifying relevant books that might not be easily discoverable through conventional keyword searches. However, Talk to Books does not offer a mechanism for users to independently explore the underlying knowledge base beyond the provided quotes. Its primary focus is delivering book quotes in response to user queries rather than enabling direct exploration of the knowledge base.

ARCA [27] stands out as a versatile platform meticulously crafted for the purpose of delving into the realms of knowledge within digital libraries. This adaptable software incorporates a robust engine designed for Knowledge Extraction and semantic enrichment, paired with an intuitive interface tailored to facilitate data search and exploration. ARCA adopts the node–link visualisation approach as its central paradigm, employing a multi-level representation of information. Within this visualisation framework, the limelight is cast upon books and their associated contents, serving as a focal point for exploration. ARCA’s interaction model follows an incremental approach, allowing users to transition seamlessly from specific pieces of information to broader insights. This mode of interaction promotes serendipitous discoveries, encouraging users to stumble upon unexpected information during their exploration of the graph. To further enhance the exploration process, ARCA boasts a “trace path” component [29], which empowers users to create visual queries. This unique feature enables users to identify commonalities between two selected items, such as books sharing particular concepts or concepts shared by two books. Furthermore, ARCA seamlessly integrates an association validation system [35], facilitating collaborative endeavours aimed at enhancing data quality. This system invites users to actively contribute to the validation and refinement of associations within the digital library, fostering a collaborative approach to bolstering the accuracy and reliability of available data.
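The two kinds of commonality mentioned for the "trace path" component, concepts shared by two books and books sharing given concepts, reduce to set operations over book-to-concept associations. The following sketch uses invented data and function names and is not ARCA's implementation.

```python
# Invented associations between books and the concepts they mention.
BOOK_CONCEPTS = {
    "Book A": {"Rome", "Architecture", "Augustus"},
    "Book B": {"Rome", "Sculpture", "Augustus"},
    "Book C": {"Athens", "Sculpture"},
}


def shared_concepts(book_1, book_2):
    """Concepts that both books are associated with."""
    return BOOK_CONCEPTS[book_1] & BOOK_CONCEPTS[book_2]


def books_sharing(concept_1, concept_2):
    """Books associated with both concepts, in alphabetical order."""
    wanted = {concept_1, concept_2}
    return sorted(book for book, concepts in BOOK_CONCEPTS.items() if wanted <= concepts)


common = shared_concepts("Book A", "Book B")
books = books_sharing("Rome", "Augustus")
```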

In contrast to these tools, SKATEBOARD revolutionises the landscape of semantic extraction and knowledge discovery within digital libraries through its innovative features. It presents users with a dynamic and interactive interface that seamlessly adapts to individual preferences, making exploration a breeze. SKATEBOARD harnesses the power of GraphBRAIN [3] integration, enabling users to delve deep into intricate relationships within digital library collections. Moreover, it actively supports collaborative validation of automatic Knowledge Extraction, elevating the standards of data accuracy and reliability. SKATEBOARD transcends traditional data-retrieval methodologies, offering users a more-enriching and intuitive experience for navigating the vast expanse of knowledge within digital libraries.

3. The Pipeline

The SKATEBOARD pipeline represents the framework’s core and plays a crucial role in extracting, managing, and interactively visualising knowledge based on semantic technologies. Essentially, the pipeline acts as a structured, transparent path that guides users through various key phases, ensuring an efficient and coherent workflow. In the following subsections, we delve into the main stages of the SKATEBOARD pipeline (Figure 1).

3.1. Information Extraction

The pipeline’s starting point is extracting relevant information from a wide range of data sources. Users have the flexibility to specify these sources, which may include structured data, unstructured text, databases, or files in various formats. This phase is critical as it represents the initial step in acquiring the necessary knowledge base.

3.2. Preprocessing and Semantic Enrichment

After the data have been extracted, SKATEBOARD preprocesses them to ensure consistency with the source domain and to improve their quality. An essential aspect of this phase is NER, which identifies entities within sentences in the texts. Subsequently, NEL comes into play, disambiguating and linking these entities to existing databases or ontologies. This semantic enrichment process contributes to making the data more structured and understandable.
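To make the two steps concrete, the sketch below performs a deliberately simple form of NER and NEL: mentions are recognised by dictionary lookup and linked to the identifier stored with each dictionary entry. Real systems rely on statistical or neural models and on context-based disambiguation; the gazetteer, the example sentence, and the function name are illustrative assumptions.

```python
import re

# Tiny gazetteer: surface form -> (entity type, linked resource).
GAZETTEER = {
    "Bari": ("Place", "http://dbpedia.org/resource/Bari"),
    "Dante Alighieri": ("Person", "http://dbpedia.org/resource/Dante_Alighieri"),
}


def recognise_and_link(text):
    """Return annotations with character offsets, entity type, and linked URI."""
    annotations = []
    for surface, (entity_type, uri) in GAZETTEER.items():
        # Word boundaries avoid matching a name inside a longer word.
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            annotations.append({
                "start": match.start(),
                "end": match.end(),
                "surface": surface,
                "type": entity_type,
                "uri": uri,
            })
    return sorted(annotations, key=lambda a: a["start"])


TEXT = "Dante Alighieri never visited Bari."
annotations = recognise_and_link(TEXT)
```

Each annotation keeps the character span of the mention, so the link can be shown in context and later validated or corrected by a user.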

3.3. Ontology Creation and Management

Once the data have been extracted and prepared, SKATEBOARD sends the data to GraphBRAIN. This system allows users to create a customised ontology representing the specific domain of interest. This step allows users to define classes, properties, and relationships between objects based on the extracted data and the application’s specific requirements. This approach enables a highly personalised representation of knowledge.
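What defining classes, properties, and relationships can look like is sketched below with a small in-memory structure. This is a hypothetical illustration, not GraphBRAIN's ontology format: the `Ontology` class and its method names are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Minimal domain schema: classes with attributes, relationships with domain and range."""
    classes: dict = field(default_factory=dict)        # class name -> set of attribute names
    relationships: dict = field(default_factory=dict)  # relationship name -> (domain, range)

    def add_class(self, name, attributes):
        self.classes[name] = set(attributes)

    def add_relationship(self, name, domain, range_):
        # A relationship may only connect classes that are already declared.
        if domain not in self.classes or range_ not in self.classes:
            raise ValueError("domain and range must be declared classes")
        self.relationships[name] = (domain, range_)

    def allows(self, subject_class, relationship, object_class):
        """Check whether a relationship instance fits the declared domain and range."""
        return self.relationships.get(relationship) == (subject_class, object_class)


ontology = Ontology()
ontology.add_class("Person", ["name", "birthDate"])
ontology.add_class("Book", ["title", "year"])
ontology.add_relationship("wrote", "Person", "Book")
```

With such a schema in place, extracted facts can be checked before being stored, for example with `ontology.allows("Person", "wrote", "Book")`.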

After creating the ontology, GraphBRAIN provides tools for the ongoing management and maintenance of the ontology itself. Users can modify class and property definitions, add new relationships, or extend the ontology with new entities. This phase allows users to adapt the ontology dynamically based on the evolution of knowledge within the domain, ensuring an up-to-date and relevant ontology. SKATEBOARD receives real-time information from GraphBRAIN through its connection endpoint.

3.4. Connection to Multiple Endpoints

A distinctive feature of SKATEBOARD is its ability to accept connections from one or more endpoints simultaneously. In addition to interacting with GraphBRAIN for ontology creation and management, this functionality allows SKATEBOARD to interact with other systems and data sources, further expanding its scope and utility.
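One way to picture working with several endpoints is to query each of them for the same entity and merge the answers while recording where each statement came from. The sketch below does this sequentially, for simplicity, with two in-memory stand-ins for endpoints; the endpoint names, data, and function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Fetcher = Callable[[str], Iterable[Triple]]


def query_all(endpoints: Dict[str, Fetcher], entity: str) -> Dict[Triple, List[str]]:
    """Ask every endpoint about the entity and map each triple to its sources."""
    merged: Dict[Triple, List[str]] = {}
    for name, fetch in endpoints.items():
        for triple in fetch(entity):
            merged.setdefault(triple, []).append(name)
    return merged


def library_endpoint(entity):
    """Stand-in for a remote endpoint holding bibliographic data."""
    data = {"ex:dante": [("ex:dante", "ex:wrote", "ex:commedia")]}
    return data.get(entity, [])


def geo_endpoint(entity):
    """Stand-in for a remote endpoint holding geographic data."""
    data = {"ex:dante": [
        ("ex:dante", "ex:bornIn", "ex:florence"),
        ("ex:dante", "ex:wrote", "ex:commedia"),
    ]}
    return data.get(entity, [])


results = query_all({"library": library_endpoint, "geo": geo_endpoint}, "ex:dante")
```

Recording the source of every statement keeps the merged view transparent: a user can see which endpoint asserted what.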

3.5. Visualisation and Interactive Exploration

Finally, SKATEBOARD provides an advanced platform for research, visualisation, and interactive graph exploration based on semantically enriched data. The user interface offers advanced exploration features that enable users to filter, navigate, and analyse relationships between objects in real-time. This level of interactivity allows users to explore data and discover hidden connections and patterns dynamically. Furthermore, the SKATEBOARD interface can adapt visualisations based on the selected or searched entity types, further facilitating data analysis.

In summary, the SKATEBOARD pipeline offers a comprehensive and transparent methodology for the extraction, management, and interactive visualisation of knowledge based on semantic technologies. Through the integration with GraphBRAIN for ontology creation and management, as well as the integration of the RDF and LPG models and a powerful exploration platform, SKATEBOARD provides users with a unique framework to extract maximum value from their knowledge and discover new sets of relevant information.

4. Integration of RDF and LPG in SKATEBOARD

Within the framework of SKATEBOARD, the harmonious integration of two fundamental data models, Resource Description Framework (RDF) and Labelled Property Graph (LPG), serves as a cornerstone for our approach. This integration allows us to leverage the respective strengths of these data models, significantly expanding the framework’s scope and versatility. This section delves into how SKATEBOARD combines and harnesses these two data models to enrich knowledge and enhance system adaptability.

4.1. RDF: Semantic Structure

RDF is a widely adopted data format for representing structured knowledge within the context of the World Wide Web. Its triple-based schema (subject, predicate, object) provides an ideal framework for expressing information semantically and flexibly. This schema is well suited for defining ontologies, entity relationships, and complex data.

SKATEBOARD extensively employs RDF for specifying ontologies, classes, properties, and relationships between entities. Thanks to RDF, we can create a structured and interoperable representation of knowledge, allowing users to define custom ontologies tailored to their domains. This approach offers unparalleled flexibility, enabling data to be represented in intricate detail within specific contexts.

Furthermore, RDF provides deep semantics, enabling the definition of semantic relationships between entities and attributes, thereby enhancing the comprehensibility and recognizability of knowledge. This semantic richness enables seamless integration with different knowledge bases, enriching and disambiguating knowledge across diverse sources.
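As a minimal illustration of the triple model described above, the following Python sketch stores a few (subject, predicate, object) statements as tuples and answers a simple pattern query. The `ex:` identifiers and the `match` helper are assumptions made for this example; they are not part of SKATEBOARD or of any RDF library.

```python
from typing import Optional

# A triple is (subject, predicate, object); all identifiers below are hypothetical.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("ex:Processor", "rdf:type", "rdfs:Class"),
    ("ex:Memory", "rdf:type", "rdfs:Class"),
    ("ex:cpu1", "rdf:type", "ex:Processor"),
    ("ex:cpu1", "rdfs:label", "Example CPU"),
]


def match(triples: list[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples that agree with every non-None component of the pattern."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# All resources declared as classes in this toy graph.
classes = [t[0] for t in match(TRIPLES, p="rdf:type", o="rdfs:Class")]
```

Leaving a pattern component as `None` plays the role of a variable in a graph query, which is why the same helper can list classes, instances, or labels.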

4.2. LPG: The Graph Managed by GraphBRAIN

On the other hand, LPGs are highly effective data models for modelling intricate relationships between entities within a localised context. This format is rooted in nodes and edges and is particularly adept at representing graphical data, including social networks, conceptual schemas, and localised knowledge graphs.

In the SKATEBOARD context, the LPG is under the purview of GraphBRAIN, a pivotal component of our system. This graph empowers users to create and manage bespoke ontologies to represent their domains in exhaustive detail. GraphBRAIN offers advanced tools for defining classes, properties, and relationships between entities based on extracted data and specific application requisites. Consequently, users can model data comprehensively within specific contexts.

The LPGs administered by GraphBRAIN also provide formidable reasoning capabilities, enabling advanced analysis and the uncovering of concealed patterns within entity relationships. This facet is paramount for in-depth data analysis within the SKATEBOARD framework.

4.3. Usage Example: The History of Computing

To gain a deeper insight into this integration, let us delve into a practical example centred on the evolution of computing. Imagine we possess an RDF ontology defining various computer hardware classes (e.g., DBpedia). Concurrently, we employ an LPG managed by GraphBRAIN to elucidate the intricate connections and interactions between these hardware components.

Through RDF, we define fundamental classes such as “Processors” and “Memory”, establishing a foundational knowledge structure in a comprehensible and interoperable format. Simultaneously, within the LPG governed by GraphBRAIN, we painstakingly capture the nuances of interactions among specific hardware components, harnessing the reasoning capabilities inherent to LPGs.

For instance, we can designate a processor as an RDF class and use the GraphBRAIN interface to manage the ontology as an LPG. This lets us visually depict how that processor interfaces with a specific RAM module or motherboard, and describe the relationships and definitions of each hardware component in detail. With this approach, we can explore the fundamental ontology of computer hardware (RDF) while precisely delineating the connections between its components (LPG). The outcome is a holistic view of knowledge within the computer hardware domain.
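One common way to move from a triple representation to a property graph can be sketched as follows. This is an assumed mapping for illustration only, not the conversion used by SKATEBOARD or GraphBRAIN: `rdf:type` statements become node labels, literal objects become node properties, and all other statements become labelled edges. The `Literal` and `Node` classes and every `ex:` identifier are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Literal:
    """Marks an object as a literal value rather than a resource identifier."""
    value: object


@dataclass
class Node:
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


def rdf_to_lpg(triples):
    """Map triples to (nodes, edges) under the conventions assumed above."""
    nodes: dict[str, Node] = {}
    edges: list[tuple[str, str, str]] = []
    for s, p, o in triples:
        node = nodes.setdefault(s, Node())
        if p == "rdf:type":
            node.labels.add(o)               # class membership -> node label
        elif isinstance(o, Literal):
            node.properties[p] = o.value     # literal -> node property
        else:
            nodes.setdefault(o, Node())      # make sure the target node exists
            edges.append((s, p, o))          # resource -> labelled edge
    return nodes, edges


triples = [
    ("ex:cpu1", "rdf:type", "ex:Processor"),
    ("ex:cpu1", "ex:name", Literal("Example CPU")),
    ("ex:ram1", "rdf:type", "ex:Memory"),
    ("ex:cpu1", "ex:interfacesWith", "ex:ram1"),
]
nodes, edges = rdf_to_lpg(triples)
```

In this toy graph the processor and the memory module end up as two labelled nodes joined by a single `ex:interfacesWith` edge, mirroring the hardware example in the text.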

The seamless integration of RDF and the LPG managed by GraphBRAIN in SKATEBOARD epitomises a potent and versatile approach for representing, enriching, and querying knowledge. This unique integration empowers users to interpret data within specific contexts while preserving a foundational ontological structure. In this way, SKATEBOARD serves as an encompassing platform for extracting, managing, and exploring knowledge grounded in semantic technologies, thereby providing a substantial advantage in analysing and discovering pertinent new information.

5. Knowledge Extraction in SKATEBOARD

SKATEBOARD is a powerful Knowledge Extraction tool with a direct interface to GraphBRAIN, the system specialised in ontology creation, management, and maintenance. In this section, we will explore the various phases involved in the Knowledge Extraction process using SKATEBOARD, with a particular emphasis on its integration with GraphBRAIN.

5.1. Data Identification

The initial phase of Knowledge Extraction is the identification of relevant data. In this phase, it is essential to define the domain of interest, identify relevant data sources, and establish data-acquisition methods. Accurately choosing the research domain is fundamental as it will influence both the selection of data sources and the extraction techniques employed. These domains can range from the broad world of literature to specialised fields like archaeology.

Data sources can take various forms: structured, semi-structured, or unstructured, and can be found in various repositories, including digital libraries, online encyclopedias, structured databases, semi-structured documents, and fully unstructured texts. The choice of data acquisition methods depends on the nature of the data and the source itself. For example, the use of web crawlers may be suitable for acquiring web resources, while data-mining techniques may be necessary for extracting data from databases.

Ultimately, this phase aims to acquire and prepare the data necessary for the development of the knowledge graph.

5.2. Construction of the Knowledge Graph Ontology

The next phase involves the construction of the ontology for the knowledge graph, providing a high-level structure for the knowledge graph itself. This phase is crucial when either a clearly defined domain ontology exists that can serve as the basis for the knowledge graph ontology or when working with structured data that provides a framework for ontology construction.

Building the ontology for the knowledge graph allows the definition of predefined entity types and the relationships between them. For this construction, common ontologies like FOAF or GeoNames can be employed, along with widely adopted ontology languages such as RDF(S) and OWL, which can be serialised in XML.

A central aspect of SKATEBOARD is its integration with GraphBRAIN, enabling domain experts to manually develop and maintain the ontology, ensuring it is tailored to the specific domain requirements. Additionally, SKATEBOARD can connect to various ontology sources, including DBpedia, further enriching the available knowledge.

5.3. Knowledge Extraction

Once the data are acquired and the ontology is defined, the next step is the extraction of knowledge from the data themselves. The primary objective of this phase is to extract entities, establish relationships among them, and capture meaningful attributes.

Entity extraction involves the discovery and detection of entities across a wide range of data. SKATEBOARD utilises NER, focusing on discovering and classifying entities into predefined categories or types. Furthermore, SKATEBOARD performs NEL to connect recognised entities to relevant ontologies, such as DBpedia and GraphBRAIN.

Relation extraction is an essential step for linking entities together. This phase varies depending on the data’s nature, but employs Natural Language Processing (NLP) techniques for unstructured data. The availability of ontologies, as integrated into SKATEBOARD, allows for assigning relationships between the extracted entities based on predefined definitions.
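The entity recognition and linking steps can be illustrated with a deliberately simple dictionary lookup. The text above does not specify which NER or NEL models are used, so the gazetteer approach, the `GAZETTEER` table, and the `ex:` identifiers below are assumptions standing in for those components.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, identifier in a target ontology).
GAZETTEER = {
    "alan turing": ("Person", "ex:Alan_Turing"),
    "eniac": ("Device", "ex:ENIAC"),
}


def recognise_and_link(text: str) -> list[dict]:
    """Find gazetteer entries in the text and attach their type and identifier."""
    mentions = []
    for surface, (etype, iri) in GAZETTEER.items():
        # Word boundaries avoid matching inside longer words.
        pattern = rf"\b{re.escape(surface)}\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append({
                "mention": m.group(0),
                "start": m.start(),
                "type": etype,
                "iri": iri,
            })
    # Report mentions in reading order.
    return sorted(mentions, key=lambda x: x["start"])


found = recognise_and_link("A lecture on Alan Turing and the ENIAC.")
```

Recognition (finding and typing the mention) and linking (attaching an identifier) happen in one pass here; a statistical pipeline would normally separate the two and add disambiguation.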

5.4. Knowledge Processing

The subsequent phase in the Knowledge Extraction process is the processing of the extracted knowledge. The main objective in this phase is to ensure that the extracted knowledge is of high quality by eliminating ambiguity, redundancy, and incompleteness.

5.5. Knowledge Integration

Knowledge integration, also known as knowledge fusion, involves merging information from diverse sources and cleaning it to remove redundancy, contradictions, and ambiguities. This process includes data cleansing, entity resolution, and assigning unique identifiers to entities.

Data cleansing involves removing unnecessary symbols and common words, improving the overall quality of knowledge. Entity resolution is a critical step that seeks to determine if different entities refer to the same real-world objects, effectively connecting them in the knowledge graph.
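A minimal sketch of cleansing followed by entity resolution is given below. It assumes that labels are normalised by lower-casing, stripping punctuation, and sorting tokens, and that two labels refer to the same entity when their normalised forms are at least 90% similar according to `difflib.SequenceMatcher`. The threshold and the greedy clustering strategy are illustrative choices, not the method used in SKATEBOARD.

```python
import re
from difflib import SequenceMatcher


def normalise(label: str) -> str:
    """Lower-case, replace punctuation with spaces, and sort the tokens."""
    cleaned = re.sub(r"[^\w\s]", " ", label.lower())
    return " ".join(sorted(cleaned.split()))


def same_entity(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two labels as co-referent when their normalised forms are similar enough."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold


def resolve(labels: list[str], threshold: float = 0.9) -> list[list[str]]:
    """Greedy clustering: each label joins the first cluster whose representative matches."""
    clusters: list[list[str]] = []
    for label in labels:
        for cluster in clusters:
            if same_entity(cluster[0], label, threshold):
                cluster.append(label)
                break
        else:
            clusters.append([label])
    return clusters


clusters = resolve(["Italo Calvino", "italo  calvino.", "Calvino, Italo", "Umberto Eco"])
```

Each resulting cluster could then be assigned a single unique identifier, which is the step that connects duplicate mentions in the knowledge graph.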

5.6. Knowledge Completion

The ultimate goal of this phase is to complete and enrich the knowledge within the knowledge graph, including performing reasoning, triple validation, and optimising the knowledge graph.

Reasoning on knowledge is based on predefined rules between relationships and can utilise machine learning methods to discover new knowledge based on existing information. Triple validation ensures that only valid and relevant information is included in the knowledge graph, applying integrity constraints and other conditions.

Optimising the knowledge graph may involve removing nodes or relationships unrelated to the domain of interest, contributing to maintaining a coherent and logical structure.
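The completion steps above can be sketched with a toy schema. The example assumes that triple validation checks each statement against domain and range constraints, and that reasoning applies one hand-written rule; the schema, the type table, the rule, and all identifiers are hypothetical.

```python
# Hypothetical schema: predicate -> (required subject type, required object type).
SCHEMA = {
    "wrote": ("Person", "Work"),
    "publishedIn": ("Work", "Place"),
}

# Hypothetical type assignments for the entities in the toy graph.
TYPES = {"author1": "Person", "work1": "Work", "city1": "Place"}


def is_valid(triple, schema=SCHEMA, types=TYPES) -> bool:
    """Accept a triple only if its predicate is known and domain/range types agree."""
    s, p, o = triple
    if p not in schema:
        return False
    domain, range_ = schema[p]
    return types.get(s) == domain and types.get(o) == range_


def infer_active_in(triples):
    """Rule: wrote(a, w) and publishedIn(w, c) imply activeIn(a, c)."""
    published = {(s, o) for s, p, o in triples if p == "publishedIn"}
    inferred = set()
    for a, p, w in triples:
        if p == "wrote":
            for work, place in published:
                if work == w:
                    inferred.add((a, "activeIn", place))
    return inferred


candidates = [
    ("author1", "wrote", "work1"),
    ("work1", "publishedIn", "city1"),
    ("city1", "wrote", "author1"),   # violates the domain/range constraints
]
valid = [t for t in candidates if is_valid(t)]
inferred = infer_active_in(valid)
```

Validating before reasoning keeps the malformed third statement from contributing to any inferred knowledge.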

In conclusion, SKATEBOARD serves as a highly versatile tool for Knowledge Extraction, which, in collaboration with GraphBRAIN and the integration with external ontologies like DBpedia, significantly streamlines the entire process of knowledge graph creation and management. This systematic approach ensures the generation of knowledge graphs suitable for a wide range of applications across various domains.

6. Interactive Graph Exploration in SKATEBOARD

The interactive graph exploration in SKATEBOARD provides an advanced interface that empowers users to interactively explore the knowledge graph. This interface has been meticulously designed with the goal of delivering an innovative exploratory experience, fostering the discovery of new information, and enabling users to gain a profound understanding of the relationships among objects within the graph.

6.1. User Journey

In Figure 2, we illustrate the interface designed for knowledge exploration. This interface plays a crucial role in facilitating the search and exploration of the extensive knowledge base to which it is connected.

The user experience begins with the search bar, where a keyword can be entered to initiate the exploration of the knowledge base. Once the search is initiated, the interface executes a specific query based on the connected endpoint. The results are then presented in a table listing all nodes in the reference knowledge graph whose label contains the search string, together with related nodes ordered by a similarity ranking.
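The search behaviour described above can be approximated as follows. The similarity measure used for ranking related nodes is not specified in the text, so this sketch assumes plain string similarity from `difflib`; the `search` function and the node identifiers are hypothetical.

```python
from difflib import SequenceMatcher


def search(labels: dict[str, str], keyword: str, related_limit: int = 3):
    """Return (nodes whose label contains the keyword, other nodes ranked by similarity)."""
    key = keyword.lower()
    # Direct hits: the label contains the search string.
    exact = [nid for nid, label in labels.items() if key in label.lower()]
    # Remaining nodes are scored by string similarity to the keyword (an assumption).
    scored = [
        (SequenceMatcher(None, key, label.lower()).ratio(), nid)
        for nid, label in labels.items()
        if nid not in exact
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    related = [nid for _, nid in scored[:related_limit]]
    return exact, related


labels = {
    "n1": "Divine Comedy",
    "n2": "The Comedy of Errors",
    "n3": "Inferno",
    "n4": "Hamlet",
}
exact, related = search(labels, "comedy", related_limit=1)
```

A production ranking would more plausibly rely on graph structure or embeddings than on surface strings; the two-part result shape is the point of the sketch.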

At this point, the user can select a resource of interest and drag it to the central part of the interface, which we will refer to as the “dashboard”. Each node in the graph can be explored in two modes: through the visualisation of all primary connections, i.e., relationships closely connected to the selected node, or through a dedicated table that presents the most relevant information related to the selected entity’s type.

A distinctive feature of the system is the presence of specific views for each type of selected entity. This allows for the visualisation of closely connected relationships, more-complex relationships, and advanced queries. For example, starting from selecting an author, you can click on a map visualisation to find all publication locations related to works published by that author.

The system profiles the user anonymously, and thanks to this profiling, it can suggest topics relevant to the user’s interests. The history of searched topics allows for tracing the user’s areas of interest and accessing previously searched topics.

An element of innovation is that the system goes beyond knowledge graphs and includes labelled property graphs such as those supported by Neo4j. This allows for visualising graphs with custom domain ontologies and integrating information from public endpoints like DBpedia with proprietary ontologies.
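Because RDF endpoints and labelled property graphs are queried with different languages, a keyword search has to be translated per endpoint. The sketch below shows one way to do this for SPARQL and Cypher. The function name, the `n.name` property, and the result limit are assumptions for illustration; the actual queries issued by SKATEBOARD are not given in the text.

```python
def build_search_query(endpoint_kind: str, keyword: str) -> tuple[str, dict]:
    """Return (query text, parameters) for the given kind of endpoint."""
    if endpoint_kind == "sparql":
        # The keyword is escaped and inlined into the query string.
        safe = keyword.lower().replace("\\", "\\\\").replace('"', '\\"')
        query = (
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
            "SELECT ?s ?label WHERE {\n"
            "  ?s rdfs:label ?label .\n"
            f'  FILTER(CONTAINS(LCASE(STR(?label)), "{safe}"))\n'
            "} LIMIT 20"
        )
        return query, {}
    if endpoint_kind == "cypher":
        # Cypher supports query parameters ($keyword), so nothing is inlined.
        # The `name` property is a hypothetical choice for the node label field.
        query = (
            "MATCH (n) WHERE toLower(n.name) CONTAINS toLower($keyword) "
            "RETURN n LIMIT 20"
        )
        return query, {"keyword": keyword}
    raise ValueError(f"unsupported endpoint kind: {endpoint_kind}")


cypher_query, cypher_params = build_search_query("cypher", "Dante")
sparql_query, sparql_params = build_search_query("sparql", "Dante")
```

Keeping the translation behind one function lets the rest of the interface stay unaware of which kind of endpoint is connected.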

Another important aspect is the collaborative validation of node associations. Users can improve data quality in connected knowledge bases by contributing to the creation of high-quality content for domain experts.

In Figure 3, you can observe an example of selecting an entity of type “Expression”, such as a literary work. The right panel displays the most pertinent information for an entity of this type. In the following figure, you can see an exploration of an entity of type “Person”, with specific information for that entity type and that specific endpoint.

The primary goal of the SKATEBOARD interface is to highlight the connected endpoint and offer domain experts a space to share knowledge and enrich their information repository through innovative information visualisation, combining visualisation paradigms for linked data and graphs: node–link, tabular, and multilevel. This approach allows for incremental exploration of connected resources and discovering paths that link graph entities, fully leveraging the potential of semantic integration and graph reasoning in LPGs.
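Discovering a path that links two graph entities can be illustrated with a breadth-first search over an adjacency list. This is a generic sketch under the assumption of an unweighted graph; it is not taken from the SKATEBOARD code base, and the node names are hypothetical.

```python
from collections import deque


def shortest_path(adjacency: dict, start: str, goal: str):
    """Return a shortest list of nodes from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}          # node -> node it was reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour in previous:
                continue              # already visited
            previous[neighbour] = current
            if neighbour == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


graph = {
    "author": ["work"],
    "work": ["author", "publisher"],
    "publisher": ["work", "city"],
    "city": ["publisher"],
}
path = shortest_path(graph, "author", "city")
```

Expanding one layer of neighbours at a time also matches the incremental style of exploration, where each step reveals only the nodes adjacent to what is already on screen.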

6.2. Key Functionalities

The key functionalities enabled by the SKATEBOARD interface encompass:

Navigation;
Selection and filtering;
Guided exploration;
Multi-level incremental visualisation;
Contextual highlighting;
Collaborative validation.

Users have the flexibility to traverse the graph, employing drag and zoom functionalities for navigation. This navigation capability empowers users to delve into specific subdomains of information or undertake a comprehensive exploration of the entire graph. This approach delivers an instinctive visual experience, affording users a holistic insight into the graph and its intricate web of connections.

Users can pinpoint individual nodes or edges within the graph, focusing their attention on areas of particular interest. This selective approach serves to emphasise associated elements and visualise the relationships among them, facilitating the comprehension of interconnections within the graph. Furthermore, the incorporation of filtering options allows users to declutter the visual representation, zeroing in on the most pertinent portions of the graph.

The exploration interface is further enriched with tools designed to guide users during graph exploration. Users receive recommendations to discover relevant elements or relationships, encouraging navigation toward often-overlooked information. This guided exploration is underpinned by the capabilities of recommendation and reasoning systems seamlessly integrated into the framework.

The interface offers a multi-tiered visualisation scheme, permitting users to explore the graph at varying depths. Users can seamlessly zoom in or out, transitioning between an overarching view and an intricate examination of individual nodes and their relationships. This approach facilitates the identification of concealed associations between elements and facilitates a deeper understanding of their interconnections.

The exploration interface also boasts contextual highlighting features during navigation. For instance, when a user selects a node representing an entity, related nodes are automatically highlighted. This feature aids users in identifying pivotal and pertinent relationships as they progress in their exploration.

SKATEBOARD inherits from the ARCA system [27] the capability for collaborative validation of information from the endpoints it connects to. Multiple users can interact simultaneously with the graph and validate information. This feature promotes knowledge sharing and encourages interaction among different experts, enhancing the discovery and learning process.

6.3. Proof of Concept Availability

The proof of concept for our system can be accessed at the following URL: http://digitalmind.di.uniba.it:3000 (accessed on 22 September 2023).

7. Evaluation

This comprehensive evaluation scrutinises the SKATEBOARD system’s potential in managing, retrieving, and visualising the sprawling World Literature Knowledge Graph (WLKG) [36]. The WLKG represents an extensive repository, housing an impressive compilation of 194,269 writers and their literary works, meticulously aggregated from many authoritative sources, including but not limited to Wikidata [37], Goodreads [38], Google Books [39], and Open Library (https://openlibrary.org/, accessed on 22 September 2023).

As a versatile information-management tool, SKATEBOARD has been strategically employed across a spectrum of domains. Its application has traversed the realms of historical computing, aiding in managing and navigating data relevant to the evolution of computer science. It has also facilitated the exploration of the fascinating world of archaeology, assisting researchers and enthusiasts in their quest to uncover the secrets of the past. Furthermore, SKATEBOARD has been instrumental in the fruition of the endpoint for World Literature, allowing users to delve into the rich tapestry of literary history.

We elected to focus on the WLKG for this comprehensive evaluation, harnessing SKATEBOARD’s prowess to deliver a meticulous analysis. This decision stemmed from our recognition of the profound importance of World Literature as a repository of human knowledge, culture, and creativity. By placing SKATEBOARD in the context of the WLKG, we aimed to shed light on its effectiveness, usability, and potential enhancements in an environment rich in data and diverse in user expectations.

In the following sections, we delve into the intricacies of this evaluation, exploring SKATEBOARD’s performance, user experience, and its role as an enabler of knowledge exploration within the captivating universe of World Literature.

7.1. Methodology

To conduct a comprehensive evaluation of SKATEBOARD’s effectiveness in managing, retrieving, and visualising the WLKG, we established a structured setup:

Environment configuration: We set up a dedicated evaluation environment with access to the WLKG endpoint powered by SKATEBOARD.
User profiles: We recruited a diverse group of users, including literature enthusiasts, researchers, and academics, to represent various perspectives and use cases.
Evaluation metrics: We defined Key Performance Indicators (KPIs) to measure the system’s efficiency, accuracy, and user satisfaction during information retrieval and visualisation tasks.
Test scenarios: We designed test scenarios that reflect real-world usage, including querying for authors, exploring literary works, and examining historical literary trends.
Data collection: We collected quantitative data, such as response times and query success rates, and qualitative data through user feedback and observations.

Informed by the existing body of literature on the evaluation of search tools [40], we identified multiple measures falling into two distinct categories:

Subjective self-reported measures by users: These encompass both quantitative responses on Likert scales and qualitative responses to open-ended questions provided by the users.
Objective measures: These include data such as the user interface event logs, task completion times, and the specific search terms employed.

In structuring the questionnaire and activities for user participation during the evaluation, we adopted a revised framework proposed by Kelly [41]. We identify four key segments:

Demographic information: Gathering insights into user demographics, including gender, age, and educational background.
Pre-task assessment: Exploring users’ existing knowledge or familiarity with the subject matter or the system under examination.
Post-task evaluation: Focusing on the user’s experience during specific tasks, emphasising efficiency, effectiveness, and overall satisfaction in gauging system usability.
Post-system assessment: Encompassing the user’s overarching experience and impressions when interacting with the information system.

7.1.1. Key Evaluation Criteria

Throughout the testing process, our focus was directed towards comprehensively examining the user interaction experience with the interface. Our inquiries aimed to assess several pivotal factors, each contributing to a holistic evaluation of the system’s performance in facilitating information discovery and retrieval within a digital library context. The key factors scrutinised during our evaluation included:

User satisfaction: Our investigations delved into the users’ satisfaction with the system in achieving their research objectives, particularly in discovering and retrieving information from the World Literature.
Effectiveness: We explored the system’s effectiveness in presenting information to the users, assessing how well it conveyed relevant data and content.
Support: An essential aspect of the evaluation focused on evaluating the extent to which the system supported users during their searches and explorations within the digital library.
Usefulness: Recognising the pivotal role of usefulness in determining the overall usability of a system, we assessed the system’s utility and its direct impact on user engagement and satisfaction.
Learnability: Our analysis considered how users adopted and familiarised themselves with the system, shedding light on the ease with which they navigated and harnessed its capabilities.

By rigorously evaluating these key factors, our assessment aimed to provide a holistic understanding of the system’s performance.

7.1.2. Users

Our evaluation enlisted the participation of a diverse cohort comprising forty-five individuals carefully chosen from academic and research backgrounds. These individuals possessed a profound knowledge of the domain encapsulated within SKATEBOARD, explicitly focusing on World Literature. This eclectic mix of participants included students and established researchers, ensuring a comprehensive assessment of the system’s efficacy across varying levels of expertise and research acumen within World Literature.

By engaging individuals entrenched in the domain under consideration, our evaluation sought to garner insights from those well versed in the intricacies of World Literature, thereby contributing to a more profound and nuanced assessment of SKATEBOARD’s performance and its alignment with the expectations of its intended user base.

7.2. Questions

In this section, we outline the specific tasks and questions presented to the users as part of our evaluation process. These tasks were designed to comprehensively assess the performance and user experience of the SKATEBOARD system across the key factors described in Section 7.1.1. To ensure a thorough evaluation, we employed a combination of quantitative and qualitative methods. Quantitative assessments were conducted using Likert scale questions to gather numerical ratings, while qualitative feedback was collected to gain deeper insights into user experiences.

The following tables provide a comprehensive list of user questions corresponding to each key factor. These questions were instrumental in evaluating the system’s overall performance and identifying areas for improvement. We encourage a detailed review of both the quantitative and qualitative responses to draw meaningful conclusions about the SKATEBOARD system’s effectiveness in meeting user needs.

7.2.1. User Satisfaction

Motivation: These questions were designed to assess user satisfaction with the SKATEBOARD system. USQO1 quantifies satisfaction, while USQO2 evaluates the likelihood of recommendations. USQA allows users to provide qualitative feedback on the factors affecting their satisfaction:

USQO1: How satisfied are you with the SKATEBOARD system’s ability to help you achieve your research objectives in discovering and retrieving information from World Literature? Response is a Likert scale score from 1 (not satisfied at all) to 5 (very satisfied).

USQO2: How likely are you to recommend the SKATEBOARD system to a colleague or peer for their research needs? Response is a Likert scale score from 1 (not likely at all) to 5 (very likely).

USQA: Describe your experience with the SKATEBOARD system in terms of user satisfaction. What aspects of the system contributed to or hindered your satisfaction? Response is open.

7.2.2. Effectiveness

Motivation: These questions aimed to evaluate the effectiveness of the SKATEBOARD system in presenting relevant information. EQO1 and EQO2 provide quantitative measures, while EQA gathers qualitative insights with specific examples:

EQO1: Rate the effectiveness of the SKATEBOARD system in presenting relevant information to you. Response is a Likert scale score from 1 (not effective at all) to 5 (very effective).

EQO2: How well did the SKATEBOARD system convey relevant data and content for your research needs? Response is a Likert scale score from 1 (not effective at all) to 5 (very effective).

EQA: In your own words, explain what you found effective or ineffective about the SKATEBOARD system when it came to presenting information. Provide specific examples if possible. Response is open.

7.2.3. Support

Motivation: These questions assessed the level of support users received from the SKATEBOARD system. SQO1 provides a quantitative rating, SQO2 evaluates ease of access, and SQA gathers qualitative insights on support experiences:

SQO1: Rate the level of support you received. Response is a Likert scale score from 1 (very little support) to 5 (excellent support).

SQO2: Were you able to easily find help or support resources when needed while using SKATEBOARD? Response is a Likert scale score from 1 (not easy at all) to 5 (very easy).

SQA: Share your thoughts on the support you received (or didn’t receive) from the SKATEBOARD system. Were there any specific instances where you felt supported or unsupported? Response is open.

7.2.4. Usefulness

Motivation: These questions evaluated the usefulness of the SKATEBOARD system:

UQO1: How useful did you find the SKATEBOARD system for your research and information needs? Response is a Likert scale score from 1 (not useful at all) to 5 (extremely useful).

UQO2: To what extent did the usefulness of SKATEBOARD impact your overall engagement and satisfaction with the system? Response is a Likert scale score from 1 (not useful at all) to 5 (extremely useful).

UQA: Describe the ways in which the SKATEBOARD system was useful or not useful in your research endeavours. Were there specific features or functionalities that stood out in terms of usefulness? Response is open.

7.3. Results

In this section, we present the outcomes of our evaluation, encompassing both quantitative and qualitative data.

7.3.1. Quantitative—Likert Scale Analysis

This section summarises the response distributions for the Likert scale questions listed in Section 7.2 (see Figure 4), each of which assesses a different aspect of the user experience.

USQO1—user satisfaction: The results for this question indicated a diverse range of opinions among respondents regarding their satisfaction. While there was a notable number of respondents who selected Option 3 (neutral), suggesting a balanced sentiment, Options 4 (satisfied) and 5 (very satisfied) also received substantial endorsements. Some respondents expressed lower levels of satisfaction with Options 1 (not satisfied at all) and 2 (slightly satisfied).

USQO2—user satisfaction: Similar to the previous user satisfaction question, this one, on the likelihood of recommending the system, also presented a varied sentiment among respondents. Options 3 (neutral), 4 (likely), and 5 (very likely) received substantial numbers of responses, indicating a range of positive sentiments. However, Options 1 (not likely at all) and 2 (slightly likely) were also represented among the responses, suggesting diversity in respondents’ willingness to recommend the system.

EQO1—effectiveness: When assessing the effectiveness of the system, a notable number of respondents selected Options 3 (neutral) and 4 (effective). However, Options 1 (not effective at all) and 2 (slightly effective) were also present in the responses, indicating varying views on the system’s effectiveness.

EQO2—effectiveness: Similar to the previous effectiveness question, this one also presented a mix of opinions. Option 3 (neutral) was prominent, but there was also a range of responses in Options 2 (slightly effective) and 4 (effective). This suggests that respondents had diverse perceptions of the system’s effectiveness.

SQO1—support: The results for this question on support indicated a concentration of responses in Option 3 (neutral), signifying a somewhat balanced sentiment. However, there were notable numbers of respondents who selected Options 4 (good support) and 5 (excellent support), indicating positive views of the support received.

SQO2—support: Similar to the previous support question, this one also showed a concentration in Option 3 (neutral) and a range of opinions in Options 2 (slightly dissatisfied) and 4 (good support). This indicates that respondents had varying perceptions of the support provided.

UQO1—usefulness: When evaluating usefulness, a substantial number of respondents chose Options 3 (useful) and 4 (very useful). However, Options 1 (not useful at all) and 2 (slightly useful) also received responses, suggesting varying levels of perceived usefulness.

UQO2—usefulness: Like the previous question on usefulness, this one also displayed positive sentiments with a concentration in Options 3 (useful) and 4 (very useful). Options 1 (not useful at all) and 2 (slightly useful) were also represented among the responses.

LQO1—learnability: For the learnability question, the majority of respondents selected Options 3 (neutral), 4 (easy to learn), and 5 (very easy to learn), indicating a positive sentiment towards the system’s learnability. However, Options 1 (difficult to learn) and 2 (slightly difficult to learn) also received responses.

LQO2—learnability: Similar to the previous question on learnability, this one also showed a preference for Options 3 (neutral), 4 (easy to learn), and 5 (very easy to learn). Options 1 (difficult to learn) and 2 (slightly difficult to learn) had fewer representations in the responses.

In analysing the Likert scale responses across the key factors, it was evident that the opinions and sentiments of the respondents varied considerably. These diverse perspectives shed light on the multifaceted nature of user experiences with the SKATEBOARD system.

For user satisfaction (USQO1 and USQO2), there was a mix of responses ranging from neutral to highly satisfied, reflecting the complexity of user sentiments. Similarly, when evaluating effectiveness (EQO1 and EQO2), support (SQO1 and SQO2), usefulness (UQO1 and UQO2), and learnability (LQO1 and LQO2), we observed a wide range of responses, signifying that users perceived these aspects differently.

This variability in responses underscored the importance of taking into account the diverse needs and preferences of users. It also highlighted potential areas for improvement and optimisation within the SKATEBOARD system to better cater to the varying requirements of its user base.

Overall, these Likert scale results provided valuable insights into user satisfaction, system effectiveness, support, usefulness, and learnability. Understanding these diverse perspectives is essential for enhancing the SKATEBOARD system and ensuring it meets the needs of its users effectively.
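Response distributions of the kind summarised above can be computed with a few lines of Python. The sketch below is illustrative only: the `placeholder` list contains made-up values that demonstrate the calculation and are not data from this study.

```python
from collections import Counter
from statistics import median


def likert_summary(responses: list[int], scale=(1, 2, 3, 4, 5)) -> dict:
    """Return the sample size, the share of each option, and the median response."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    total = len(responses)
    distribution = {option: counts.get(option, 0) / total for option in scale}
    return {"n": total, "distribution": distribution, "median": median(responses)}


# Made-up placeholder responses, NOT results from the evaluation.
placeholder = [3, 4, 4, 5, 2, 3, 4, 1, 5, 3]
summary = likert_summary(placeholder)
```

The median is reported rather than the mean because Likert responses are ordinal; this is a design choice of the sketch, not a statement about the analysis performed in the study.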

7.3.2. Qualitative—Topic Modelling Analysis

In the process of analysing qualitative open-ended responses, we carefully examined users’ input using two distinct approaches: sentiment analysis and topic analysis related to the key factors associated with the qualitative responses. These two methods provided us with a deeper understanding of users’ opinions and perceptions.

Regarding sentiment analysis, we employed the NLTK SentimentIntensityAnalyzer to assess the overall sentiment expressed in the responses. This allowed us to gauge whether responses were generally positive, negative, or neutral. The results of this analysis were used to determine users’ levels of satisfaction or dissatisfaction with the subject under consideration.

For the topic analysis related to key factors, we used the Latent Dirichlet Allocation (LDA) model to identify key themes within the responses. Each key factor was associated with a specific set of representative keywords. Using the LDA model, we identified relevant topics within the responses and calculated the percentage of each topic that aligned with the key factor users were asked to evaluate (see Figure 5 for SQA and UQA, Figure 6 for USQA and EQA, and Figure 7 for LQA). This allowed us to understand which themes were most relevant to each aspect being assessed.
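The paper does not give the formula used for this alignment percentage. One plausible reading, shown here purely as an assumption, is the share of a topic's top words that belong to the keyword set of the key factor. The keyword set, the topics, and the function name `alignment_percentage` are hypothetical; in practice the top words would come from a fitted LDA model, which is outside the scope of this sketch.

```python
def alignment_percentage(topic_words, factor_keywords):
    """Percentage of a topic's top words found in the key factor's keyword set.

    This overlap measure is an assumed reading of "alignment", not the
    authors' documented method.
    """
    if not topic_words:
        return 0.0
    keywords = {word.lower() for word in factor_keywords}
    hits = sum(1 for word in topic_words if word.lower() in keywords)
    return 100.0 * hits / len(topic_words)


# Hypothetical keyword set for the learnability factor.
learnability_keywords = {"learn", "easy", "intuitive", "tutorial", "understand"}

# Hypothetical top words per topic, standing in for LDA output.
topics = {
    "topic_0": ["easy", "learn", "graph", "interface"],
    "topic_1": ["search", "author", "book", "result"],
}

for name, words in topics.items():
    print(name, alignment_percentage(words, learnability_keywords))
```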

To represent the results visually, we created a word cloud for each key factor (Figure 5, Figure 6 and Figure 7). The word clouds display the most frequent and most representative words associated with each key factor, providing an immediate visualisation of the main themes emerging from the responses. Additionally, we generated a chart illustrating the distribution of topics corresponding to each key factor. This visual representation provides an overview of the relationship between user responses and the key factors we were evaluating.
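A word cloud is driven by word frequencies, so the counting step behind it can be sketched as below. The stop-word list, the sample answers, and the function name `word_frequencies` are hypothetical and serve only to illustrate the idea; the rendering of the cloud itself is not shown.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "i", "was", "very"}


def word_frequencies(responses, top_n=5):
    """Return the top_n most frequent non-stop-words across all responses."""
    words = []
    for text in responses:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(token for token in tokens if token not in STOPWORDS)
    return Counter(words).most_common(top_n)


# Hypothetical open-ended answers, not taken from the study.
answers = [
    "The graph is easy to explore",
    "Easy to learn and the graph view is useful",
]
print(word_frequencies(answers, top_n=2))
```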

To emphasise the correlation between user responses and these key factors, we computed the average sentiment score for each key factor. This aggregated score summarises the sentiment expressed in the responses associated with that factor.
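The aggregation into an average sentiment score per key factor can be sketched as a plain arithmetic mean over the per-response scores. The scores below are hypothetical placeholders rather than results from the study, and `average_sentiment` is a name introduced for this example.

```python
from statistics import mean

# Hypothetical per-response compound scores, grouped by qualitative question.
# These are illustrative values, NOT results reported in the paper.
scores_by_factor = {
    "SQA": [0.5, 0.25, -0.25],
    "LQA": [0.75, 0.25],
}


def average_sentiment(scores_by_factor):
    """Return the mean sentiment score per key factor, skipping empty groups."""
    return {
        factor: mean(scores)
        for factor, scores in scores_by_factor.items()
        if scores
    }


for factor, avg in average_sentiment(scores_by_factor).items():
    print(f"{factor}: {avg:.3f}")
```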

By employing this approach, we were able to obtain a detailed and holistic perspective on user opinions. This enabled us to identify primary areas of interest and gauge the alignment of responses with the key factors we were investigating.

8. Conclusions

The evaluation of the SKATEBOARD system, as presented in this study, provided a holistic understanding of its performance, user satisfaction, and areas for improvement. Our analysis encompassed both quantitative Likert scale responses and qualitative open-ended feedback, offering a comprehensive view of user perspectives.

Quantitatively, we observed a diverse range of opinions among users across various key factors. User satisfaction, system effectiveness, support quality, usefulness, and learnability were assessed, and responses spanned the spectrum from highly satisfied to neutral to less satisfied. This diversity of opinions underscored the multifaceted nature of user experiences with the SKATEBOARD system.

Qualitatively, we examined user feedback in more depth using sentiment analysis and topic modelling. This analysis revealed nuanced insights into user sentiments and key themes within their responses. Word clouds and topic distribution charts visually represented the most prominent themes and sentiments associated with each key factor, providing valuable context to the quantitative findings.

In conclusion, the evaluation of the SKATEBOARD system yielded valuable insights that can guide its future development and refinement. The diverse perspectives and feedback from users serve as a foundation for making informed decisions and enhancing the system to better meet the varying needs and preferences of its user base.

8.1. Discussion

Looking ahead, it is important to address the areas highlighted by users, whether in user interface design, system performance, customisation options, training and documentation, integration capabilities, or feature requests. By actively incorporating user feedback and continuing to improve the SKATEBOARD system, we can ensure that it delivers a satisfying and effective user experience. This iterative approach to system development supports the aim of not merely meeting but surpassing user expectations, leading to a more resilient and user-centric SKATEBOARD system.

8.2. Future Directions

The evaluation of the SKATEBOARD system provided valuable insights into its current performance and user satisfaction. To further enhance the system and ensure its continued effectiveness, several key areas deserve attention:

Performance optimisation: Ensuring the system’s performance is vital for user satisfaction, particularly for tasks that require responsiveness and efficiency. Future work should focus on optimising the system’s performance by addressing issues related to lags, delays, and resource utilisation. Regular performance monitoring and fine-tuning are essential.

Integration and interoperability: Improving the system’s integration capabilities with other tools and systems is crucial. Future development should ensure seamless interoperability, allowing users to integrate the SKATEBOARD system effortlessly into their existing workflows.

Feature expansion and innovation: User feedback and feature requests should be carefully considered and prioritised. Future development efforts should focus on expanding the system’s features based on user needs and emerging industry trends, promoting continuous innovation and maintaining competitiveness.

By focusing on these core areas and embracing these future directions, the SKATEBOARD system can evolve into a more user-centric, efficient, and adaptable platform. A commitment to continuous improvement and meeting user needs will ensure the system’s long-term success and solidify its role as a valuable tool for its users.

Furthermore, this paper emphasised the pivotal role of graphs in representing information and their suitability for knowledge representation and discovery. Additionally, it underscored the transformative impact of artificial intelligence, specifically AI models like OpenAI’s GPT, in enabling semantically meaningful reasoning from data.

Acknowledging these advancements, it is crucial to recognise a substantial gap in the current landscape, where many AI-driven solutions lack transparency and user control. To bridge this gap, SKATEBOARD is introduced as a comprehensive framework and tool that empowers users throughout the information extraction and manipulation process. It offers a multi-faceted approach, encompassing information extraction, ontology creation, ontology management, and interactive exploration, all guided by Linked Data principles and a graph-based exploration paradigm.

SKATEBOARD’s unwavering commitment to transparency ensures that users can seamlessly navigate information, gaining a clear understanding of relationships and dependencies. Furthermore, the tool introduces the potential for recommendation systems and reasoning capabilities, fostering serendipitous discoveries and the emergence of novel insights. This comprehensive approach positions SKATEBOARD as a pioneering solution in addressing the challenges posed by the evolving knowledge exploration and discovery landscape.

Author Contributions

Conceptualisation, E.B., D.D.P., S.F. and D.R.; methodology, E.B., D.D.P., S.F. and D.R.; software, E.B.; validation, E.B., D.D.P., S.F. and D.R.; formal analysis, E.B., D.D.P., S.F. and D.R.; investigation, E.B.; resources, E.B., D.D.P., S.F. and D.R.; data curation, E.B., D.D.P. and D.R.; writing—original draft preparation, E.B., D.D.P., S.F. and D.R.; writing—review and editing, E.B., D.D.P., S.F. and D.R.; visualisation, E.B.; supervision, S.F.; project administration, S.F.; funding acquisition, S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the projects Future AI Research (PE00000013), Spoke 6 (FAIR) Symbiotic AI, and Cultural Heritage Active innovation for Next-GEn Sustainable society (CHANGES) (PE00000020), Spoke 3 (Digital Libraries, Archives and Philology), under the NRRP MUR program funded by the NextGenerationEU.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The entire dataset collected during the analysis of the presented Linked Data interfaces in this survey is available on the website: http://digitalmind.di.uniba.it:3000/ (accessed on 22 September 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
AVG	Average Sentiment Scores
EQA	Effectiveness—Qualitative Question
EQO1	Effectiveness—Quantitative Question 1
EQO2	Effectiveness—Quantitative Question 2
GPT	Generative Pre-trained Transformer
KG	Knowledge Graph
LDA	Latent Dirichlet Allocation
LD	Linked Data
LQA	Learnability—Qualitative Question
LQO1	Learnability—Quantitative Question 1
LQO2	Learnability—Quantitative Question 2
LPG	Labelled Property Graph
NEL	Named Entity Linking
NER	Named Entity Recognition
NERL	Named Entity Recognition and Linking
NLP	Natural Language Processing
RDF	Resource Description Framework
SKATEBOARD	Semantic Knowledge Advanced Tool for Extraction Browsing Organisation Annotation Retrieval and Discovery
SQO1	Support—Quantitative Question 1
SQO2	Support—Quantitative Question 2
SQA	Support—Qualitative Question
UQA	Usefulness—Qualitative Question
UQO1	Usefulness—Quantitative Question 1
UQO2	Usefulness—Quantitative Question 2
USQA	User Satisfaction—Qualitative Question
USQO1	User Satisfaction—Quantitative Question 1
USQO2	User Satisfaction—Quantitative Question 2
WLKG	World Literature Knowledge Graph

Figure 1. SKATEBOARD pipeline.

Figure 2. SKATEBOARD interface—Part 1.

Figure 3. SKATEBOARD interface—Part 2.

Figure 4. Likert scale of results.

Figure 5. Support (qualitative question) and usefulness (qualitative question)—average sentiment scores, wordcloud, and LDA.

Figure 6. User satisfaction (qualitative question) and effectiveness (qualitative question)—average sentiment scores, wordcloud, and LDA.

Figure 7. Learnability (qualitative question)—average sentiment scores, wordcloud, and LDA.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
