Second Edition of Predictive Analytics and Data Science

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information and Communications Technology".

Deadline for manuscript submissions: 30 June 2024 | Viewed by 10327

Special Issue Editors


Dr. Agnes Vathy-Fogarassy
Guest Editor
Department of Computer Science and Systems Technology, University of Pannonia, 8200 Veszprém, Hungary
Interests: artificial intelligence; machine learning; data mining; health informatics; network analysis

Prof. Dr. János Abonyi
Guest Editor
MTA-PE Lendület Complex Systems Monitoring Research Group, Department of Process Engineering, University of Pannonia, H-8200 Veszprém, Hungary
Interests: chemical engineering; complex systems; computational intelligence; network science; process engineering

Special Issue Information

Dear Colleagues,

The development and maintenance of predictive, data-driven models pose several challenges, such as feature selection, model structure optimisation, sensitivity analysis, model validation, model maintenance, transfer learning and adaptation, model deployment, and evaluation of the benefits of applying the models.

This Special Issue solicits papers covering the development, validation, application, and maintenance of predictive analytics models and presenting real-life applications. The potential topics include, but are not limited to:

  • Classification-based prediction models;
  • Regression-based prediction models;
  • Forecasting using deep learning methods and algorithms;
  • Managing uncertainty and missing data in forecasting;
  • The life cycle of predictive models, and maintaining predictive models;
  • Development and validation of online predictive models;
  • Self-learning predictive models;
  • Predictive analytics in Industry 4.0 (application of sensors, historical experience);
  • Predictive analytics in healthcare and the economy (e.g., patient pathway prediction, predicting complications, customer relationship management, risk reduction, churn prevention, market trend analysis, credit scoring);
  • Social media and text-analysis-based predictive models and systems.

Dr. Agnes Vathy-Fogarassy
Prof. Dr. János Abonyi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, you can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • classification
  • regression
  • deep learning
  • uncertainty
  • validation and maintenance
  • self-learning
  • real-life applications

Published Papers (6 papers)

Research

38 pages, 1413 KiB  
Article
Advanced Machine Learning Techniques for Predictive Modeling of Property Prices
by Kanchana Vishwanadee Mathotaarachchi, Raza Hasan and Salman Mahmood
Information 2024, 15(6), 295; https://doi.org/10.3390/info15060295 - 22 May 2024
Viewed by 380
Abstract
Real estate price prediction is crucial for informed decision making in the dynamic real estate sector. In recent years, machine learning (ML) techniques have emerged as powerful tools for enhancing prediction accuracy and data-driven decision making. However, the existing literature lacks a cohesive synthesis of methodologies, findings, and research gaps in ML-based real estate price prediction. This study addresses this gap through a comprehensive literature review, examining various ML approaches, including neural networks, ensemble methods, and advanced regression techniques. We identify key research gaps, such as the limited exploration of hybrid ML-econometric models and the interpretability of ML predictions. To validate the robustness of regression models, we conduct generalization testing on an independent dataset. Results demonstrate the applicability of regression models in predicting real estate prices across diverse markets. Our findings underscore the importance of addressing research gaps to advance the field and enhance the practical applicability of ML techniques in real estate price prediction. This study contributes to a deeper understanding of ML’s role in real estate forecasting and provides insights for future research and practical implementation in the real estate industry.
(This article belongs to the Special Issue Second Edition of Predictive Analytics and Data Science)
13 pages, 449 KiB  
Article
A Proactive Decision-Making Model for Evaluating the Reliability of Infrastructure Assets of a Railway System
by Daniel O. Aikhuele and Shahryar Sorooshian
Information 2024, 15(4), 219; https://doi.org/10.3390/info15040219 - 13 Apr 2024
Viewed by 766
Abstract
Railway infrastructure is generally classified as either fixed or movable infrastructure assets. Failure in any of the assets could lead to the complete shutdown and disruption of the entire system, economic loss, inconvenience to passengers and the train operating company(s), and can sometimes result in death or injury in the event of the derailment of the rolling stock. Considering the importance of the railway infrastructure assets, it is necessary to continuously explore their behavior, reliability, and safety. In this paper, a proactive multi-criteria decision-making model that is based on an interval-valued intuitionistic fuzzy set and several quantitative reliability parameters has been proposed for the evaluation of the reliability of the infrastructure assets. Results from the evaluation show that the failure mode ‘Broken and defective rails’ has the most risk and reliability concerns. Hence, priority should be given to this failure mode to avoid a total system collapse.

19 pages, 574 KiB  
Article
Generally Applicable Q-Table Compression Method and Its Application for Constrained Stochastic Graph Traversal Optimization Problems
by Tamás Kegyes, Alex Kummer, Zoltán Süle and János Abonyi
Information 2024, 15(4), 193; https://doi.org/10.3390/info15040193 - 31 Mar 2024
Viewed by 710
Abstract
We analyzed a special class of graph traversal problems, where the distances are stochastic and the agent can cover only a limited range in one go. We showed that both constrained shortest Hamiltonian pathfinding problems and disassembly line balancing problems belong to the class of constrained shortest pathfinding problems, which can be represented as mixed-integer optimization problems. Reinforcement learning (RL) methods have proven their efficiency in multiple complex problems. However, researchers concluded that the learning time increases radically as the state and action spaces grow. In continuous cases, approximation techniques are used, but these methods have several limitations in mixed-integer search spaces. We present the Q-table compression method as a multistep method with dimension reduction, state fusion, and space compression techniques that project a mixed-integer optimization problem into a discrete one. The RL agent is then trained using an extended Q-value-based method to deliver a human-interpretable model for optimal action selection. Our approach was tested in selected constrained stochastic graph traversal use cases, and the results are compared with those of the simple grid-based discretization method.

32 pages, 1285 KiB  
Article
Comparative Analysis of NLP-Based Models for Company Classification
by Maryan Rizinski, Andrej Jankov, Vignesh Sankaradas, Eugene Pinsky, Igor Mishkovski and Dimitar Trajanov
Information 2024, 15(2), 77; https://doi.org/10.3390/info15020077 - 31 Jan 2024
Viewed by 2287
Abstract
The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow, costly, and vendor-specific assignments. Therefore, we investigate recent natural language processing (NLP) advancements to automate the company classification process. In particular, we employ and evaluate various NLP-based models, including zero-shot learning, One-vs-Rest classification, multi-class classifiers, and ChatGPT-aided classification. We conduct a comprehensive comparison among these models to assess their effectiveness in the company classification task. The evaluation uses the Wharton Research Data Services (WRDS) dataset, consisting of textual descriptions of publicly traded companies. Our findings reveal that the RoBERTa and One-vs-Rest classifiers surpass the other methods, achieving F1 scores of 0.81 and 0.80 on the WRDS dataset, respectively. These results demonstrate that deep learning algorithms offer the potential to automate, standardize, and continuously update classification systems in an efficient and cost-effective way. In addition, we introduce several improvements to the multi-class classification techniques: (1) in the zero-shot methodology, we use TF-IDF to enhance sector representation, yielding improved accuracy in comparison to standard zero-shot classifiers; (2) next, we use ChatGPT for dataset generation, revealing potential in scenarios where datasets of company descriptions are lacking; and (3) we also employ K-Fold to reduce noise in the WRDS dataset, followed by conducting experiments to assess the impact of noise reduction on the company classification results.

20 pages, 8983 KiB  
Article
An Effective Ensemble Convolutional Learning Model with Fine-Tuning for Medicinal Plant Leaf Identification
by Mohd Asif Hajam, Tasleem Arif, Akib Mohi Ud Din Khanday and Mehdi Neshat
Information 2023, 14(11), 618; https://doi.org/10.3390/info14110618 - 18 Nov 2023
Cited by 4 | Viewed by 2997
Abstract
Accurate and efficient medicinal plant image classification is of utmost importance as these plants produce a wide variety of bioactive compounds that offer therapeutic benefits. With a long history of medicinal plant usage, different parts of plants, such as flowers, leaves, and roots, have been recognized for their medicinal properties and are used for plant identification. However, leaf images are extensively used due to their convenient accessibility and are a major source of information. In recent years, transfer learning and fine-tuning, which use pre-trained deep convolutional networks to extract pertinent features, have emerged as an extremely effective approach for image-identification problems. This study leveraged the power of three component deep convolutional neural networks, namely VGG16, VGG19, and DenseNet201, to derive features from the input images of the medicinal plant dataset, containing leaf images of 30 classes. The models were compared and ensembled to make four hybrid models to enhance the predictive performance by utilizing the averaging and weighted averaging strategies. Quantitative experiments were carried out to evaluate the models on the Mendeley Medicinal Leaf Dataset. The resultant ensemble of VGG19+DenseNet201 with fine-tuning showcased an enhanced capability in identifying medicinal plant images with an improvement of 7.43% and 5.8% compared with VGG19 and VGG16, respectively. Furthermore, VGG19+DenseNet201 can outperform its standalone counterparts by achieving an accuracy of 99.12% on the test set. A thorough assessment with metrics such as accuracy, recall, precision, and the F1-score firmly established the effectiveness of the ensemble strategy.
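As an illustration of the averaging and weighted averaging strategies mentioned in this abstract, the following is a minimal sketch of how per-model class probabilities could be combined. It is not the authors' implementation: the function names, the weights, and the three-class probability vectors are hypothetical values chosen for the example.

```python
from typing import List, Sequence


def weighted_average_ensemble(
    prob_sets: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-model class-probability vectors into a single vector.

    Equal weights give plain averaging; unequal weights give weighted averaging.
    """
    if not prob_sets or len(prob_sets) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_sets[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(prob_sets, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must output the same number of classes")
        for k, p in enumerate(probs):
            # Normalise the weights so the result is still a probability vector.
            combined[k] += (weight / total) * p
    return combined


def predict_class(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical softmax outputs of two fine-tuned models for one leaf image.
model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.7, 0.1]

simple = weighted_average_ensemble([model_a, model_b], [1, 1])    # plain averaging
weighted = weighted_average_ensemble([model_a, model_b], [3, 1])  # trust model_a more

print(predict_class(simple), predict_class(weighted))
```

In practice the weights would typically be chosen from validation performance; the values above are only there to show that the two strategies can disagree on the predicted class.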

30 pages, 2295 KiB  
Article
An Integrated GIS-Based Reinforcement Learning Approach for Efficient Prediction of Disease Transmission in Aquaculture
by Aristeidis Karras, Christos Karras, Spyros Sioutas, Christos Makris, George Katselis, Ioannis Hatzilygeroudis, John A. Theodorou and Dimitrios Tsolis
Information 2023, 14(11), 583; https://doi.org/10.3390/info14110583 - 24 Oct 2023
Viewed by 2400
Abstract
This study explores the design and capabilities of a Geographic Information System (GIS) incorporated with an expert knowledge system, tailored for tracking and monitoring the spread of dangerous diseases across a collection of fish farms. Specifically targeting the aquacultural regions of Greece, the system captures geographical and climatic data pertinent to these farms. A feature of this system is its ability to calculate disease transmission intervals between individual cages and broader fish farm entities, providing crucial insights into the spread dynamics. These data then act as an entry point to our expert system. To enhance the predictive precision, we employed various machine learning strategies, ultimately focusing on a reinforcement learning (RL) environment. This RL framework, enhanced by the Multi-Armed Bandit (MAB) technique, stands out as a powerful mechanism for effectively managing the flow of virus transmissions within farms. Empirical tests highlight the efficiency of the MAB approach, which, in direct comparisons, consistently outperformed other algorithmic options, achieving an impressive accuracy rate of 96%. Looking ahead to future work, we plan to integrate buffer techniques and delve deeper into advanced RL models to enhance our current system. The results set the stage for future research in predictive modeling within aquaculture health management, and we aim to extend our research even further.
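For readers unfamiliar with the Multi-Armed Bandit (MAB) setting named in this abstract, the sketch below shows one common variant, epsilon-greedy action selection with incremental mean updates. It is a generic illustration, not the system described by the authors: the epsilon-greedy rule, the class and function names, and the Bernoulli reward probabilities are all assumptions made for the example.

```python
import random
from typing import List


class EpsilonGreedyBandit:
    """Generic epsilon-greedy bandit (an assumed variant, not the paper's model)."""

    def __init__(self, n_arms: int, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.counts: List[int] = [0] * n_arms      # pulls per arm
        self.values: List[float] = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self) -> int:
        # Explore a random arm with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(true_means: List[float], steps: int = 5000, seed: int = 0) -> EpsilonGreedyBandit:
    """Run the bandit against hypothetical Bernoulli arms with the given success rates."""
    bandit = EpsilonGreedyBandit(len(true_means), epsilon=0.1, seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1.0 if env_rng.random() < true_means[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


# Hypothetical success probabilities for three candidate actions.
bandit = simulate([0.2, 0.5, 0.8])
print(bandit.counts, [round(v, 2) for v in bandit.values])
```

Here each arm could stand for a candidate monitoring or intervention choice and the reward for whether it turned out to be correct; that mapping is an interpretation for illustration only.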

Planned Papers

The list below contains only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer review.

Title: Quantum-Mechanical Approach to Asymmetric Opinion Polarisation in Social Networks
Authors: Ivan S. Maksymov
Affiliation: Artificial Intelligence and Cyber Futures Institute, Charles Sturt University
Abstract: We propose a quantum-mechanical model that represents a human system of beliefs as quantised energy levels of a physical system. This model underscores a novel perspective on opinion dynamics, recreating a broad range of experimental and real-world data that exhibit an asymmetry of opinion radicalisation. In particular, the model demonstrates the phenomena of pronounced conservatism versus mild liberalism when individuals are exposed to opposing views, mirroring recent findings on opinion polarisation via social media exposure. Advancing this model, we establish a solid framework that integrates elements from physics, psychology and philosophy, and also emphasise the inherent advantages of the quantum approach over traditional classical models.

Title: Not yet
Authors: Tajana Simunic Rosing
Affiliation: Department of Computer Science and Engineering, University of California
Abstract: Early prediction of the outcomes of running processes in the industrial internet of things (IIoT) is of growing interest. In the pursuit of process optimization, businesses are increasingly adopting advanced technologies, underscoring the pivotal role of precise early outcome predictions. This study tackles the challenge of early forecasting of outcomes in ongoing processes based on Hyperdimensional Computing (HDC). Our approach, OPERATE-HD, presents an innovative method utilizing HDC’s capabilities to predict process outcomes before their finalization. This method enables efficient single-pass inference at various stages of processes and enhances computational efficiency. We also develop an attribute-augmented version of OPERATE-HD that unifies heterogeneous data types into a cohesive high-dimensional space, further improving the baseline HDC-based method’s predictive capacity. We examine our proposed methods using publicly available process mining datasets that encapsulate a diverse array of business scenarios and achieve a higher F1-score and area under the ROC curve (AUC), while yielding faster prediction.
Keywords: HD Computing, Predictive Process Monitoring, Pattern-based Encoding, Early Sequence Classification

Title: Clustering Offensive Strategies in Australian Rules Football Using Social Network Analysis
Authors: Dr Marion Mundt
Affiliation: UWA Tech & Policy Lab, The University of Western Australia
Abstract: Sports teams aim to understand the tactical behaviour of their opposition to gain a competitive advantage. Prior research on tactical behaviour in team sports has predominantly focused on the relationship between key performance indicators and match outcomes. However, key performance indicators fail to capture the patterns of ball movement deployed by teams, which provide deeper insight into a team's playing style. The purpose of this study was to quantify existing ball movement strategies in Australian Rules Football (AF). Detailed descriptions of possession types from 396 matches of the 2019 season were used in this study. Social network analysis was used to measure ball movement patterns for each team during offensive phases of play. K-means clustering identified four unique offensive strategies. The most successful offensive strategy, defined by the number of matches won (83/396), achieved a win-loss ratio of 1.69, and is characterised by low ball movement predictability, low reliance on well-connected athletes, and a high number of passes. This study's insights into offensive strategy are instructional to AF coaches and high-performance support staff. The outcomes of this study can be used to support the design of tactical training and inform match day decisions surrounding optimal offensive strategies.

Title: A Data-Driven Approach to Set-Theoretic Model Predictive Control for Nonlinear Systems
Authors: Francesco Giannini; Domenico Famularo
Affiliation: DIMES - University of Calabria
Abstract: In this paper, we present a data-driven model predictive control (DDMPC) framework specifically designed for constrained single-input single-output (SISO) nonlinear systems. Our approach involves customizing a set-theoretic receding horizon controller within a data-driven context. To achieve this, we translate model-based conditions into data series of available input and output signals. This translation process leverages recent advances in data-driven control theory, enabling the controller to operate effectively without relying on explicit system models. The proposed framework incorporates a robust methodology for managing system constraints, ensuring that the control actions remain within predefined bounds. By means of time sequences, the controller learns the underlying system dynamics and adapts to changes in real-time, providing enhanced performance and reliability. The integration of set-theoretic methods allows for the systematic handling of uncertainties and disturbances, which are common in nonlinear systems. To validate the effectiveness of our DDMPC framework, we conduct extensive simulations on a nonlinear DC motor system. The results demonstrate significant improvements in control performance, highlighting the robustness and adaptability of our approach compared to traditional model-based MPC techniques. Our findings suggest that the proposed DDMPC framework not only simplifies the control design process by reducing the dependency on accurate mathematical models but also enhances the system's ability to handle complex and dynamic environments. Overall, this work contributes to the growing body of research in data-driven control by offering a practical and efficient solution for the control of constrained SISO nonlinear systems. The insights gained from this study have the potential to influence future developments in DDMPC and its applications across various engineering domains.
