Next Issue
Volume 8, June
Previous Issue
Volume 8, April
 
 

Big Data Cogn. Comput., Volume 8, Issue 5 (May 2024) – 7 articles

Cover Story (view full-size image): This paper introduces a novel approach to enhance topic modelling by extending traditional outputs beyond isolated tokens and using internal textual data to extract and map high-scoring keywords directly. Unlike previous methods that rely on external language sources, this approach avoids the associated risks of unavailability and privacy issues. A comparative analysis with large language models (LLMs) shows that this method not only aligns with but often surpasses existing models by effectively bridging detailed thematic elements. Further evaluations with a variety of datasets and models confirm its superior interpretability and efficiency, as validated by independent annotators. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both, html and pdf forms. To view the papers in pdf format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
Order results
Result details
Select all
Export citation of selected articles as:
22 pages, 604 KiB  
Article
XplAInable: Explainable AI Smoke Detection at the Edge
by Alexander Lehnert, Falko Gawantka, Jonas During, Franz Just and Marc Reichenbach
Big Data Cogn. Comput. 2024, 8(5), 50; https://doi.org/10.3390/bdcc8050050 - 17 May 2024
Viewed by 543
Abstract
Wild and forest fires pose a threat to forests and thereby, in extension, to wild life and humanity. Recent history shows an increase in devastating damages caused by fires. Traditional fire detection systems, such as video surveillance, fail in the early stages of [...] Read more.
Wild and forest fires pose a threat to forests and thereby, in extension, to wild life and humanity. Recent history shows an increase in devastating damages caused by fires. Traditional fire detection systems, such as video surveillance, fail in the early stages of a rural forest fire. Such systems would see the fire only when the damage is immense. Novel low-power smoke detection units based on gas sensors can detect smoke fumes in the early development stages of fires. The required proximity is only achieved using a distributed network of sensors interconnected via 5G. In the context of battery-powered sensor nodes, energy efficiency becomes a key metric. Using AI classification combined with XAI enables improved confidence regarding measurements. In this work, we present both a low-power gas sensor for smoke detection and a system elaboration regarding energy-efficient communication schemes and XAI-based evaluation. We show that leveraging edge processing in a smart way combined with buffered data samples in a 5G communication network yields optimal energy efficiency and rating results. Full article
Show Figures

Figure 1

18 pages, 2027 KiB  
Article
Runtime Verification-Based Safe MARL for Optimized Safety Policy Generation for Multi-Robot Systems
by Yang Liu and Jiankun Li
Big Data Cogn. Comput. 2024, 8(5), 49; https://doi.org/10.3390/bdcc8050049 - 16 May 2024
Viewed by 403
Abstract
The intelligent warehouse is a modern logistics management system that uses technologies like the Internet of Things, robots, and artificial intelligence to realize automated management and optimize warehousing operations. The multi-robot system (MRS) is an important carrier for implementing an intelligent warehouse, which [...] Read more.
The intelligent warehouse is a modern logistics management system that uses technologies like the Internet of Things, robots, and artificial intelligence to realize automated management and optimize warehousing operations. The multi-robot system (MRS) is an important carrier for implementing an intelligent warehouse, which completes various tasks in the warehouse through cooperation and coordination between robots. As an extension of reinforcement learning and a kind of swarm intelligence, MARL (multi-agent reinforcement learning) can effectively create the multi-robot systems in intelligent warehouses. However, MARL-based multi-robot systems in intelligent warehouses face serious safety issues, such as collisions, conflicts, and congestion. To deal with these issues, this paper proposes a safe MARL method based on runtime verification, i.e., an optimized safety policy-generation framework, for multi-robot systems in intelligent warehouses. The framework consists of three stages. In the first stage, a runtime model SCMG (safety-constrained Markov Game) is defined for the multi-robot system at runtime in the intelligent warehouse. In the second stage, rPATL (probabilistic alternating-time temporal logic with rewards) is used to express safety properties, and SCMG is cyclically verified and refined through runtime verification (RV) to ensure safety. This stage guarantees the safety of robots’ behaviors before training. In the third stage, the verified SCMG guides SCPO (safety-constrained policy optimization) to obtain an optimized safety policy for robots. Finally, a multi-robot warehouse (RWARE) scenario is used for experimental evaluation. The results show that the policy obtained by our framework is safer than existing frameworks and includes a certain degree of optimization. Full article
(This article belongs to the Special Issue Field Robotics and Artificial Intelligence (AI))
Show Figures

Figure 1

14 pages, 2188 KiB  
Article
Enhanced Linear and Vision Transformer-Based Architectures for Time Series Forecasting
by Musleh Alharthi and Ausif Mahmood
Big Data Cogn. Comput. 2024, 8(5), 48; https://doi.org/10.3390/bdcc8050048 - 16 May 2024
Viewed by 391
Abstract
Time series forecasting has been a challenging area in the field of Artificial Intelligence. Various approaches such as linear neural networks, recurrent linear neural networks, Convolutional Neural Networks, and recently transformers have been attempted for the time series forecasting domain. Although transformer-based architectures [...] Read more.
Time series forecasting has been a challenging area in the field of Artificial Intelligence. Various approaches such as linear neural networks, recurrent linear neural networks, Convolutional Neural Networks, and recently transformers have been attempted for the time series forecasting domain. Although transformer-based architectures have been outstanding in the Natural Language Processing domain, especially in autoregressive language modeling, the initial attempts to use transformers in the time series arena have met mixed success. A recent important work indicating simple linear networks outperform transformer-based designs. We investigate this paradox in detail comparing the linear neural network- and transformer-based designs, providing insights into why a certain approach may be better for a particular type of problem. We also improve upon the recently proposed simple linear neural network-based architecture by using dual pipelines with batch normalization and reversible instance normalization. Our enhanced architecture outperforms all existing architectures for time series forecasting on a majority of the popular benchmarks. Full article
Show Figures

Figure 1

13 pages, 1255 KiB  
Article
International Classification of Diseases Prediction from MIMIIC-III Clinical Text Using Pre-Trained ClinicalBERT and NLP Deep Learning Models Achieving State of the Art
by Ilyas Aden, Christopher H. T. Child and Constantino Carlos Reyes-Aldasoro
Big Data Cogn. Comput. 2024, 8(5), 47; https://doi.org/10.3390/bdcc8050047 - 10 May 2024
Viewed by 566
Abstract
The International Classification of Diseases (ICD) serves as a widely employed framework for assigning diagnosis codes to electronic health records of patients. These codes facilitate the encapsulation of diagnoses and procedures conducted during a patient’s hospitalisation. This study aims to devise a predictive [...] Read more.
The International Classification of Diseases (ICD) serves as a widely employed framework for assigning diagnosis codes to electronic health records of patients. These codes facilitate the encapsulation of diagnoses and procedures conducted during a patient’s hospitalisation. This study aims to devise a predictive model for ICD codes based on the MIMIC-III clinical text dataset. Leveraging natural language processing techniques and deep learning architectures, we constructed a pipeline to distill pertinent information from the MIMIC-III dataset: the Medical Information Mart for Intensive Care III (MIMIC-III), a sizable, de-identified, and publicly accessible repository of medical records. Our method entails predicting diagnosis codes from unstructured data, such as discharge summaries and notes encompassing symptoms. We used state-of-the-art deep learning algorithms, such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, bidirectional LSTM (BiLSTM) and BERT models after tokenizing the clinical test with Bio-ClinicalBERT, a pre-trained model from Hugging Face. To evaluate the efficacy of our approach, we conducted experiments utilizing the discharge dataset within MIMIC-III. Employing the BERT model, our methodology exhibited commendable accuracy in predicting the top 10 and top 50 diagnosis codes within the MIMIC-III dataset, achieving average accuracies of 88% and 80%, respectively. In comparison to recent studies by Biseda and Kerang, as well as Gangavarapu, which reported F1 scores of 0.72 in predicting the top 10 ICD-10 codes, our model demonstrated better performance, with an F1 score of 0.87. Similarly, in predicting the top 50 ICD-10 codes, previous research achieved an F1 score of 0.75, whereas our method attained an F1 score of 0.81. These results underscore the better performance of deep learning models over conventional machine learning approaches in this domain, thus validating our findings. The ability to predict diagnoses early from clinical notes holds promise in assisting doctors or physicians in determining effective treatments, thereby reshaping the conventional paradigm of diagnosis-then-treatment care. Our code is available online. Full article
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
Show Figures

Figure 1

23 pages, 1966 KiB  
Article
Imagine and Imitate: Cost-Effective Bidding under Partially Observable Price Landscapes
by Xiaotong Luo, Yongjian Chen, Shengda Zhuo, Jie Lu, Ziyang Chen, Lichun Li, Jingyan Tian, Xiaotong Ye and Yin Tang
Big Data Cogn. Comput. 2024, 8(5), 46; https://doi.org/10.3390/bdcc8050046 - 28 Apr 2024
Viewed by 678
Abstract
Real-time bidding has become a major means for online advertisement exchange. The goal of a real-time bidding strategy is to maximize the benefits for stakeholders, e.g., click-through rates or conversion rates. However, in practise, the optimal bidding strategy for real-time bidding is constrained [...] Read more.
Real-time bidding has become a major means for online advertisement exchange. The goal of a real-time bidding strategy is to maximize the benefits for stakeholders, e.g., click-through rates or conversion rates. However, in practise, the optimal bidding strategy for real-time bidding is constrained by at least three aspects: cost-effectiveness, the dynamic nature of market prices, and the issue of missing bidding values. To address these challenges, we propose Imagine and Imitate Bidding (IIBidder), which includes Strategy Imitation and Imagination modules, to generate cost-effective bidding strategies under partially observable price landscapes. Experimental results on the iPinYou and YOYI datasets demonstrate that IIBidder reduces investment costs, optimizes bidding strategies, and improves future market price predictions. Full article
(This article belongs to the Special Issue Business Intelligence and Big Data in E-commerce)
Show Figures

Figure 1

18 pages, 5253 KiB  
Review
Digital Twins for Discrete Manufacturing Lines: A Review
by Xianqun Feng and Jiafu Wan
Big Data Cogn. Comput. 2024, 8(5), 45; https://doi.org/10.3390/bdcc8050045 - 26 Apr 2024
Viewed by 710
Abstract
Along with the development of new-generation information technology, digital twins (DTs) have become the most promising enabling technology for smart manufacturing. This article presents a statistical analysis of the literature related to the applications of DTs for discrete manufacturing lines, researches their development [...] Read more.
Along with the development of new-generation information technology, digital twins (DTs) have become the most promising enabling technology for smart manufacturing. This article presents a statistical analysis of the literature related to the applications of DTs for discrete manufacturing lines, researches their development status in the areas of the design and improvement of manufacturing lines, the scheduling and control of manufacturing line, and predicting faults in critical equipment. The deployment frameworks of DTs in different applications are summarized. In addition, this article discusses the three key technologies of high-fidelity modeling, real-time information interaction methods, and iterative optimization algorithms. The current issues, such as fine-grained sculpting of twin models, the adaptivity of the models, delay issues, and the development of efficient modeling tools are raised. This study provides a reference for the design, modification, and optimization of discrete manufacturing lines. Full article
Show Figures

Figure 1

26 pages, 12425 KiB  
Article
Topic Modelling: Going beyond Token Outputs
by Lowri Williams, Eirini Anthi, Laura Arman and Pete Burnap
Big Data Cogn. Comput. 2024, 8(5), 44; https://doi.org/10.3390/bdcc8050044 - 25 Apr 2024
Viewed by 807
Abstract
Topic modelling is a text mining technique for identifying salient themes from a number of documents. The output is commonly a set of topics consisting of isolated tokens that often co-occur in such documents. Manual effort is often associated with interpreting a topic’s [...] Read more.
Topic modelling is a text mining technique for identifying salient themes from a number of documents. The output is commonly a set of topics consisting of isolated tokens that often co-occur in such documents. Manual effort is often associated with interpreting a topic’s description from such tokens. However, from a human’s perspective, such outputs may not adequately provide enough information to infer the meaning of the topics; thus, their interpretability is often inaccurately understood. Although several studies have attempted to automatically extend topic descriptions as a means of enhancing the interpretation of topic models, they rely on external language sources that may become unavailable, must be kept up to date to generate relevant results, and present privacy issues when training on or processing data. This paper presents a novel approach towards extending the output of traditional topic modelling methods beyond a list of isolated tokens. This approach removes the dependence on external sources by using the textual data themselves by extracting high-scoring keywords and mapping them to the topic model’s token outputs. To compare how the proposed method benchmarks against the state of the art, a comparative analysis against results produced by Large Language Models (LLMs) is presented. Such results report that the proposed method resonates with the thematic coverage found in LLMs and often surpasses such models by bridging the gap between broad thematic elements and granular details. In addition, to demonstrate and reinforce the generalisation of the proposed method, the approach was further evaluated using two other topic modelling methods as the underlying models and when using a heterogeneous unseen dataset. To measure the interpretability of the proposed outputs against those of the traditional topic modelling approach, independent annotators manually scored each output based on their quality and usefulness as well as the efficiency of the annotation task. The proposed approach demonstrated higher quality and usefulness, as well as higher efficiency in the annotation task, in comparison to the outputs of a traditional topic modelling method, demonstrating an increase in their interpretability. Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
Show Figures

Figure 1

Previous Issue
Next Issue
Back to TopTop