Emerging Feature Engineering Trends for Machine Learning

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 January 2023) | Viewed by 7639

Special Issue Editors


Guest Editor
Department of Computer Languages and Systems, Universitat Jaume I, 12071 Castelló de la Plana, Spain
Interests: pattern recognition; machine learning; data mining; data science

Guest Editor
División Multidisciplinaria en Ciudad Universitaria, Universidad Autónoma de Ciudad Juárez, Av. José de Jesús Delgado 18100, Ciudad Juárez 32310, Chihuahua, Mexico
Interests: big data classification; meta-learning; class imbalance; time series; ensembles; neural networks

Special Issue Information

Dear Colleagues,

Feature engineering is a crucial process that builds high-quality representations from raw data so that they accurately capture the nature of the problem. High-quality data directly improve the performance and efficiency of machine learning algorithms. In the case of inference algorithms, feature engineering can also lead to interpretable models.

Although opinions differ on which techniques belong to the feature engineering process, this stage always goes hand in hand with others, such as data cleaning. The processing techniques are closely related, so the success of each one depends on the stages before and after it. Additionally, datasets present a mixture of problems that require combining preprocessing techniques from different areas.

In big data, the term smart data has recently emerged. As with standard-sized data, obtaining quality data is a key element, since quality data provide veracity and validity. However, traditional methods are inefficient when applied to big data because their space and time complexity increases. The challenge is therefore to develop or adapt feature engineering techniques that take into account the volume of data and the available technologies, programming paradigms, and platforms.

This Special Issue aims to provide comprehensive coverage of new and state-of-the-art feature engineering and data preprocessing methods for standard and big data problems. Authors are encouraged to submit papers on topics including (but not limited to):

  • Data cleaning;
  • Data imputation;
  • Data normalization;
  • Data transformation;
  • Data reduction.

Prof. Dr. José Salvador Sánchez Garreta
Prof. Dr. Vicente García
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data preprocessing
  • smart data
  • data munging
  • data wrangling
  • feature engineering
  • data cleaning
  • data normalization
  • feature extraction
  • feature selection
  • data transformation
  • data integration
  • noise identification
  • missing data
  • data reduction
  • data discretization
  • instance selection
  • instance generation
  • class imbalance

Published Papers (3 papers)


Research


18 pages, 3714 KiB  
Article
Efficient Data Preprocessing with Ensemble Machine Learning Technique for the Early Detection of Chronic Kidney Disease
by Vinoth Kumar Venkatesan, Mahesh Thyluru Ramakrishna, Ivan Izonin, Roman Tkachenko and Myroslav Havryliuk
Appl. Sci. 2023, 13(5), 2885; https://doi.org/10.3390/app13052885 - 23 Feb 2023
Cited by 20 | Viewed by 2501
Abstract
Chronic kidney disease (CKD) is a serious global health concern: it kills millions of people each year as a result of poor lifestyle choices and inherited factors. Effective prediction tools for early detection are essential due to the growing number of patients with this disease. By utilizing machine learning (ML) approaches, this study aids specialists in studying precautionary measures for CKD through early detection. The main objective of this paper is to predict and classify chronic kidney disease using ML approaches on a publicly available dataset. The CKD dataset, which includes 400 instances, was taken from the publicly available UCI (University of California, Irvine) Machine Learning Repository. ML methods (Support Vector Machine (SVM), K-Nearest Neighbors (KNN), random forest (RF), Logistic Regression (LR), and Decision Tree (DT) Classifier) are used as base learners and their performance has been compared with eXtreme Gradient Boosting (XGBoost). All ML algorithms are evaluated against different performance parameters: accuracy, recall, precision, and F1-measure. The results indicated that XGBoost outperformed the other ML algorithms, with 98.00% accuracy. The model put forth in this paper may help policymakers forecast patterns of CKD in the population. It may also enable careful monitoring of individuals who are at risk, early CKD detection, better resource allocation, and patient-centered management.
(This article belongs to the Special Issue Emerging Feature Engineering Trends for Machine Learning)
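
The abstract above names four evaluation measures: accuracy, recall, precision, and F1-measure. As an illustration only, the sketch below computes them for a binary problem from the standard confusion-matrix definitions. The function name `binary_metrics` and the toy labels are assumptions made for this example; they are not taken from the paper, and the code is not the authors' evaluation pipeline.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for labels in {0, 1}.

    Hypothetical helper for illustration; 1 is treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (made up for this example): 3 TP, 3 TN, 1 FP, 1 FN.
example = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
print(example)
```

With these toy labels every measure equals 0.75. On an imbalanced medical dataset the four values typically diverge, which is one reason to report more than accuracy alone.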

19 pages, 4787 KiB  
Article
Pedestrian Localization in a Video Sequence Using Motion Detection and Active Shape Models
by Juan Alberto Antonio Velázquez, Marcelo Romero Huertas, Roberto Alejo Eleuterio, Everardo Efrén Granda Gutiérrez, Federico Del Razo López and Eréndira Rendón Lara
Appl. Sci. 2022, 12(11), 5371; https://doi.org/10.3390/app12115371 - 26 May 2022
Cited by 2 | Viewed by 1452
Abstract
There is increasing interest in video object detection for many situations, such as industrial processes, surveillance systems, and nature exploration. In this work, we were concerned with the detection of pedestrians in video sequences. The aim was to deal with issues associated with the background, scale, contrast, or resolution of the video frames, which cause inaccurate detection of pedestrians. The proposed method was based on the combination of two techniques: motion detection by background subtraction (MDBS) and active shape models (ASM). The MDBS technique aids in the identification of a moving region of interest in the video sequence, which potentially includes a pedestrian; then, the ASM algorithm actively finds and adjusts the silhouette of the pedestrian. We tested the proposed MDBS + ASM method with video sequences from open repositories, and the results were favorable in scenes where pedestrians were in a well-illuminated environment. The mean fit error was up to 4.5 pixels. In contrast, in scenes where reflections, occlusions, or pronounced movement are present, the identification was slightly affected; the mean fit error was 8.3 pixels in the worst case. The main contribution of this work was exploring the potential of the combination of MDBS and ASM for performance improvements in the contour-based detection of a moving pedestrian walking in a controlled environment. We present a straightforward method based on classical algorithms which have been proven effective for pedestrian detection. In addition, since we were looking for a practical process that could work in real-time applications (for example, closed-circuit television video or surveillance systems), we established our approach with simple techniques.
(This article belongs to the Special Issue Emerging Feature Engineering Trends for Machine Learning)
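
The abstract above describes motion detection by background subtraction as the step that isolates a moving region of interest. The sketch below shows the simplest form of that idea, thresholded differencing against a static background frame, followed by a bounding box around the changed pixels. It is an illustration under stated assumptions, not the authors' MDBS implementation: the names `foreground_mask` and `bounding_box`, the threshold of 30, and the toy frames are all hypothetical, and the ASM silhouette-fitting step is not shown.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image: rows of 0-255 intensities


def foreground_mask(background: Frame, frame: Frame, threshold: int = 30) -> List[List[bool]]:
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the foreground pixels, or None if there are none."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Toy data (made up): a uniform 5x6 background and a frame with a bright
# 3x2 blob standing in for a moving object.
background = [[10] * 6 for _ in range(5)]
frame = [row[:] for row in background]
for r in range(1, 4):
    for c in range(2, 4):
        frame[r][c] = 200

roi = bounding_box(foreground_mask(background, frame))
print(roi)
```

A fixed background and a global threshold only suit scenes with stable lighting, which is consistent with the abstract's note that reflections and occlusions degrade the results.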

Review


17 pages, 3029 KiB  
Review
Fuzzy-Based Time Series Forecasting and Modelling: A Bibliometric Analysis
by Luis Palomero, Vicente García and José Salvador Sánchez
Appl. Sci. 2022, 12(14), 6894; https://doi.org/10.3390/app12146894 - 7 Jul 2022
Cited by 8 | Viewed by 2742
Abstract
The purpose of this paper is to present the results of a systematic literature review regarding the development of fuzzy-based models for time series forecasting in the period 2017–2021. The study was conducted using a well-established review protocol and a couple of powerful tools for bibliometric analysis to identify and analyse the main approaches adopted in the research field of interest. We analysed 118 articles published in peer-reviewed journals indexed in the 2020 Journal Citation Reports of the Web of Science. This allowed us to present an in-depth performance analysis and a science mapping regarding the current situation of fuzzy time series forecasting and modelling. The outputs of this study provide a practical base for further investigations that address this topic from both a methodological point of view and in terms of applicability.
(This article belongs to the Special Issue Emerging Feature Engineering Trends for Machine Learning)