1. Introduction
Due to its high incidence and lethality, lung cancer (LC) imposes a significant burden on healthcare systems [1,2]. By the end of 2023, there are forecast to be 609,820 cancer-related deaths in the United States, with lung cancer remaining the leading cause of cancer mortality [3]. Currently, clinical detection of lung cancer relies primarily on cytological or histopathological examination, radiographic imaging such as X-rays and CT scans, and tumor marker assays in bodily fluids. However, these existing detection techniques have obvious drawbacks, including high cost and significant harm to the human body.
Volatile organic compounds (VOCs) contained in exhaled human breath are closely associated with various diseases. Breath analysis has been proposed as one solution: a non-invasive early screening method that can be employed for various diseases, including lung cancer, diabetes and breast cancer [4,5,6]. Research has indicated that lung cancer results in elevated levels of acetone and ethanol in exhaled breath, and both of these gases can serve as reliable exhalation biomarkers for early lung cancer [7,8]. In current clinical trials, spectroscopic methods, mass spectrometry, chromatography-related techniques and electronic noses (e-noses) are considered relatively viable and efficient standard technologies for detecting VOCs in human exhaled breath [9,10,11]. However, mass spectrometry, chromatography and spectroscopic methods are constrained by their high equipment costs and the expertise required of operators. In contrast, e-nose technology based on gas sensor arrays offers greater cost advantages and development potential for application in breath analysis [12,13,14].
By applying pattern recognition algorithms to the data collected by the sensor array, an electronic nose can effectively identify complex gases. In a pattern recognition task, feature selection and feature extraction directly affect the detection performance of the electronic nose. Marzorati et al. [15] extracted nine features from the response of each gas sensor to exhaled gas. Liu et al. [16] used 19 sensors, selected 13 composite features from each sensor and combined them with classical classifiers to verify the feasibility of identifying LC patients through VOCs. Previous studies have found that the feature extraction process is particularly complex, requiring repeated trials of extraction methods to obtain good results. The Gramian Angular Field (GAF), proposed by Wang et al., is a data visualization method that converts time series into two-dimensional color images [17]. GAF encodes a one-dimensional time series as a two-dimensional color image in which key features are more prominent, displaying the important information hidden in the sensor signal more clearly. With GAF, the raw data need no hand-crafted preprocessing and are converted directly into two-dimensional color images, which both retains the deep features of the signal and avoids complex feature extraction engineering [18].
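As an illustration of the encoding (a minimal NumPy sketch, not the authors' exact pipeline), the summation variant of GAF rescales the series to [−1, 1], maps each value to a polar angle φ = arccos(x̃) and fills entry (i, j) of the image with cos(φ_i + φ_j):

```python
import numpy as np

def gasf(series):
    """Encode a 1-D series as a Gramian Angular Summation Field image."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so that arccos is defined everywhere
    x_min, x_max = x.min(), x.max()
    x_tilde = 2 * (x - x_min) / (x_max - x_min) - 1
    phi = np.arccos(np.clip(x_tilde, -1.0, 1.0))  # polar angle of each sample
    # GASF(i, j) = cos(phi_i + phi_j), a symmetric image
    return np.cos(phi[:, None] + phi[None, :])

img = gasf([0.1, 0.5, 0.9, 0.4])  # a 4 x 4 symmetric image
```

The difference variant (GADF) is obtained analogously with sin(φ_i − φ_j); both preserve temporal order along the diagonal, which is what lets a 2-D CNN pick up temporal structure from the sensor response.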
Currently, pattern recognition algorithms employed in electronic noses fall into three categories: classical gas identification algorithms (machine learning), artificial neural networks (ANNs) and biologically inspired spiking neural networks (SNNs) [19,20,21]. For complex gas recognition tasks, ANNs are currently the popular choice [22,23]. Compared with machine learning and SNNs, ANNs exhibit strong adaptability and do not require model redesign for different training tasks. They can automatically learn complex features from data without manual feature extraction, and they encompass various architectures, such as backpropagation neural networks (BPNNs) and convolutional neural networks (CNNs). Avian et al. [24] proposed a CNN architecture and built two models for analyzing VOCs in exhaled gas: the first receives signals processed by different feature extraction methods as input, while the second processes the raw signal directly. The results indicate that classifier performance varies with the feature extraction method employed, with kernel PCA (KPCA) having a positive impact. Guo et al. [25] introduced an innovative deep learning framework that combines an electronic nose with convolutional long short-term memory (ConvLSTM) to predict odor descriptor ratings, the first application of ConvLSTM to e-nose olfaction prediction.
Although convolutional neural networks have found extensive application in gas classification, they possess a substantial number of trainable parameters, high computational complexity and slow inference speed, and they are not hardware-friendly, which hinders porting pattern recognition algorithms to embedded systems. To address this issue, convolutional neural networks have increasingly evolved towards lightweight architectures [26,27].
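The parameter savings behind this trend can be made concrete with a small counting sketch (the layer sizes below are hypothetical, chosen only for illustration): a standard k × k convolution needs k·k·C_in·C_out weights, whereas a depthwise separable one needs only k·k·C_in (depthwise) plus C_in·C_out (pointwise).

```python
def standard_conv_params(k, c_in, c_out):
    # Standard convolution: one k x k x c_in kernel per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise (one k x k kernel per input channel) + pointwise (1 x 1)
    return k * k * c_in + c_in * c_out

# Example: a 3 x 3 layer with 32 input and 64 output channels
print(standard_conv_params(3, 32, 64))            # 18432 weights
print(depthwise_separable_params(3, 32, 64))      # 2336 weights, ~8x fewer
```

For this hypothetical layer the factorization cuts the weight count by roughly a factor of eight, which is the kind of saving that makes depthwise separable designs attractive for embedded deployment.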
The contributions of this article are summarized as follows:
- (1) A hardware-friendly lightweight neural network model (LTNet) using a depthwise separable convolution structure is constructed for gas classification.
- (2) To mitigate the decrease in classification accuracy caused by depthwise separable convolutions, squeeze-and-excitation (SE) attention mechanisms and residual connections are added to the model.
- (3) The convolutional and batch normalization (BN) layers are fused to reduce the model parameters, speed up inference and improve the stability of the model.
- (4) LTNet is compared with its unimproved version (LTNet (Original version)) to validate the effectiveness of the improvements made to LTNet.
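The conv–BN fusion of contribution (3) can be illustrated with a minimal sketch (shown here for a 1 × 1 convolution written as a matrix product in NumPy; the function name and shapes are illustrative, not LTNet's actual code). At inference time BN applies a fixed per-channel affine map, so it can be absorbed into the preceding convolution's weights and bias:

```python
import numpy as np

def fuse_conv_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold frozen batch-norm statistics into the preceding convolution.

    W: (c_out, c_in) weights of a 1x1 convolution; b: (c_out,) bias;
    gamma/beta/mean/var: per-channel BN parameters and running statistics.
    Returns fused (W', b') such that W'x + b' == BN(Wx + b).
    """
    scale = gamma / np.sqrt(var + eps)   # per-output-channel BN factor
    W_fused = W * scale[:, None]         # scale each output channel's row
    b_fused = (b - mean) * scale + beta  # fold the shift into the bias
    return W_fused, b_fused
```

One layer and one matrix multiply now replace two layers, which removes the BN arithmetic from the inference path entirely; this is why fusion reduces both the parameter count and the inference time.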
2. Experimental Section
Data collection was performed using the CGS-8 intelligent gas sensor analysis system provided by Beijing Ailite Technology Co., Ltd. (Beijing, China). The collected data were then transformed into images using the Gramian Angular Field (GAF), which served as valid inputs to LTNet. Two different datasets were employed to assess the performance of the model, and the overall process is illustrated in Figure 1.
2.1. Data Source I: Gas Mixture Dataset
Exhaled breath from lung cancer patients contains numerous biomarker species, such as acetone, ethanol and isoprene. By detecting the types and concentrations of these biomarkers, changes in physiological state in vivo can be assessed, enabling early screening for lung cancer. To validate LTNet's classification capabilities, this study used acetone and ethanol gases to simulate the breath biomarkers found in lung cancer patients. In the experiment, a sensor array composed of 16 commercial semiconductor metal oxide gas sensors manufactured by FIGARO was used, matching the sensor models in Data Source II (the UCI database).
The experiment employed a static gas volumetric method, using 98% AR acetone and 98% anhydrous ethanol with a microsyringe capable of handling a range of 10 for liquid extraction. As Equation (1) indicates, conducting experiments directly with high-concentration test liquids would require extracting a very small liquid volume, which is difficult to inject into the gas chamber. To address this issue, the test liquid was diluted to a concentration of 10% in the experiment.
where Q is the volume of the test liquid (mL), V denotes the volume of the gas chamber (mL), C stands for the desired gas concentration to be prepared (ppm), M is the molecular weight of the substance, d is the concentration of the test liquid, r signifies the liquid density, T_R represents the laboratory ambient temperature (°C) and T_B is the gas chamber temperature (°C).
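As a rough sketch of how such a relation is used (the exact form of Equation (1) is not reproduced above, so the function below assumes a commonly published static volumetric formula built from the variables defined in the text; it is an assumption and should be checked against the paper's Equation (1)):

```python
def injection_volume(V, C, M, d, r, T_R, T_B):
    """Liquid volume Q (mL) to inject for a target gas concentration.

    ASSUMED relation (a commonly published static volumetric formula,
    not necessarily the paper's exact Equation (1)):
        Q = (V * C * M * 1e-9) / (22.4 * d * r) * (273 + T_B) / (273 + T_R)
    with V the chamber volume (mL), C the target concentration (ppm),
    M the molecular weight (g/mol), d the test-liquid concentration
    (as a fraction), r the liquid density and T_R / T_B the ambient and
    chamber temperatures (deg C).
    """
    return (V * C * M * 1e-9) / (22.4 * d * r) * (273 + T_B) / (273 + T_R)
```

Under this form, Q is inversely proportional to the liquid concentration d, so diluting the test liquid to 10% (d = 0.1) multiplies the required injection volume by ten, which is exactly why the authors dilute before injection.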
Following the specifications in the sensor manual, the working voltage of all 16 sensors was set to 5 V. The operating current of each sensor was tuned on the CGS-8 smart gas sensing analysis system through multiple experiments to identify its optimal value. The sensor models and their optimal operating currents are presented in Table 1.
After setting the optimal operating current, the sensors must be preheated for two hours until the baseline stabilizes. Then, the evaporation and heating functions of the experimental apparatus are activated. A microsyringe is used to extract a quantity of the test liquid prepared according to Equation (1), which is then dropped vertically into the evaporation dish. The sensor array is exposed to acetone, ethanol or a binary mixture of these two VOC gases; the concentration indices for the two gas mixtures are detailed in Table 2. The experimental response time was set to approximately 120 s, and the recovery time was also about 120 s.
2.2. Data Source II: UCI Database
This work also used a public database from the University of California, Irvine (UCI) to complement and validate Data Source I. This dataset is a collection of gas sensor drift data at different concentrations, collected by the Chemical Signaling Research Laboratory at the UCI BioCircuits Institute in San Diego [28,29]. Acetone concentrations in the dataset range from 12 to 500 ppm and ethanol concentrations from 10 to 500 ppm. A total of 4650 datapoints from the UCI dataset were used for classification with LTNet.
2.3. Experimental Environment and Hardware Configuration
The algorithmic programming environment for this study is Python 3.10, running on a computer with an RTX 3060 graphics card. LTNet and the comparison networks use the Adam optimizer with a cross-entropy loss function. The model usually converges after about ten training epochs, so the number of epochs was set to 30. Conventional convolutional neural networks occupy a large amount of graphics memory during training, so for comparison purposes the batch size was set to 16. For the own mixed gas dataset, the learning rate was set to 0.0006, while for the UCI database it was set to 0.0004.
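For reference, the cross-entropy loss mentioned above can be sketched in a few lines of NumPy (a generic, numerically stable implementation, not the study's training code):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch.

    logits: (N, K) raw network outputs; labels: (N,) integer class ids.
    """
    # Numerically stable log-softmax: subtract the row-wise max first
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Average the negative log-probability assigned to the true class
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With three gas classes (acetone, ethanol, mixture), a model that outputs uniform logits incurs a loss of ln 3 ≈ 1.10, and the loss approaches zero as the correct class dominates, which is what the Adam optimizer drives the network towards.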
4. Results and Discussion
The dataset was divided into training, testing and validation sets in a 6:3:1 ratio. For the own mixed gas dataset, the numbers of images in the training, testing and validation sets were 4474, 2234 and 744, respectively. For the UCI database, the training, testing and validation sets consisted of 2791, 1395 and 464 images, respectively. In this study, validation set accuracy and six evaluation metrics were adopted as criteria for assessing both the lightweight nature of the model and its classification performance. These criteria include the model's total accuracy on the validation set across the ethanol, acetone and mixture classes (accuracy), the time required to complete thirty training epochs (training time), the GPU memory usage when different models are trained under the same conditions with a cleared background (GPU RAM), the inference time on the validation set (inference time), the model's parameter count (params) and the size of the best weights saved on the test set (weight size).
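A 6:3:1 split like the one described can be sketched as follows (a generic shuffle-and-slice helper, not the authors' code; exact set sizes depend on rounding, which is why the reported counts differ slightly from an exact 6:3:1):

```python
import random

def split_631(items, seed=42):
    """Shuffle and split items into train/test/validation at a 6:3:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n = len(items)
    n_train = round(n * 0.6)
    n_test = round(n * 0.3)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]      # remainder, ~10%
    return train, test, val
```

Fixing the shuffle seed keeps the partition reproducible across the seven model runs, so every network is trained and evaluated on identical subsets.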
4.1. Data Conversion Comparison Test and Results Discussion
In the data preprocessing stage, the Gramian Angular Summation Field (GASF) is compared with the Gramian Angular Difference Field (GADF), the Short-Time Fourier Transform (STFT) and the Markov Transition Field (MTF). Figure 4 illustrates the images produced by these four data transformation methods.
LTNet was employed to evaluate the four methods for transforming one-dimensional time series into two-dimensional images. The confusion matrices, which display the classification of each sample intuitively and are an important measure of classification performance, are depicted in Figure 5a–d, while Table 3 presents a comparative analysis of the results obtained with these four methods.
From Table 3 and the confusion matrices, it is evident that STFT exhibits the highest total classification accuracy, reaching 100%. The majority of errors for GADF and GASF are concentrated in the mixed gas class, with total classification accuracies of 98.79% and 99.06%, respectively. MTF experiences more classification errors in the mixed gas and ethanol classes, resulting in a total accuracy of only 92.34%. However, training with STFT-transformed images takes the longest, 1630.92 s, and also requires the most GPU RAM. Conversely, training with GASF-transformed images for the same thirty epochs takes only 844.38 s and occupies a mere 2.1 GB of GPU RAM, while achieving accuracy similar to STFT. Therefore, GASF was chosen as the data transformation method in this work.
4.2. Model Evaluation and Comparison Experiment
LTNet is compared with six other networks: three traditional convolutional neural networks (AlexNet, ResNet50 and VGG16), two lightweight convolutional neural networks (EfficientNet and MobileNetV3_large) and the unimproved version of LTNet (LTNet (Original version)). AlexNet is a relatively deep neural network, which helps the model learn more complex features. ResNet50 performs classification using skip-connected residual blocks. VGG16 employs deep convolutional layers for feature extraction from raw samples, followed by classification using fully connected layers. EfficientNet achieves high performance in resource-constrained environments through strategies such as compound scaling and width multipliers. MobileNetV3_large classifies raw samples using depthwise separable convolution. To verify the effectiveness of the improvements to LTNet, LTNet (Original version) was used as a baseline: it is a version of LTNet that uses neither depthwise separable convolution, the SE attention mechanism, residual connections nor the fusion of convolutional and BN layers, implementing only the basic network architecture of LTNet.
Before conducting the classification task, we compared the parameter counts and the sizes of the best weights saved during training for LTNet and the other six networks, as shown in Table 4. Among these seven models, LTNet has only 32,614 parameters, fewer than LTNet (Original version). Specifically, this is equivalent to just 0.139% of the parameters of the traditional convolutional neural network ResNet50, less than 0.1% of those of AlexNet and VGG16 and less than 1% of those of the popular lightweight convolutional neural networks EfficientNet and MobileNetV3_large. The optimal training weight size of LTNet is 0.155 MB, demonstrating more efficient memory utilization than MobileNetV3_large, EfficientNet, AlexNet and LTNet (Original version). This indicates that, relative to existing lightweight and traditional convolutional neural networks, LTNet is better suited to resource-constrained environments, and the comparison with its unimproved version demonstrates that the improvements made to LTNet have a lightweight effect.
4.3. Classification Results of Own Mixed Gas Dataset
From Figure 5e–k and Table 5, it can be observed that, on the own mixed gas dataset, LTNet makes five errors in the mixed gas category and only two in the ethanol category, achieving the highest classification accuracy of 99.06%. Additionally, LTNet's GPU RAM usage during training is significantly lower than that of traditional convolutional networks such as VGG16 and lightweight networks such as EfficientNet. It completes thirty epochs of training in only 844.38 s, the fastest among all compared networks, far surpassing ResNet50, VGG16 and EfficientNet. LTNet takes only 23 s to complete inference on the 744 validation set images, again the fastest among the compared networks; it significantly outperforms the traditional convolutional neural networks, and its inference time is only a quarter of that of the lightweight network MobileNetV3_large. Compared to LTNet (Original version), LTNet's advantages are even clearer.
4.4. UCI Database Classification Results
From Figure 5l–r and Table 6, it can be observed that, on the UCI database, LTNet achieves results similar to those on the own mixed gas dataset. LTNet still maintains the highest classification accuracy while significantly outperforming the traditional and lightweight convolutional neural network models in terms of GPU RAM, training time and inference time.
Results from the own mixed gas dataset and the UCI database demonstrate that LTNet achieves high accuracy in gas classification tasks while maintaining low computational resource requirements, further validating its lightweight nature. Moreover, LTNet attains higher accuracy than LTNet (Original version) with a stronger lightweight effect, validating the effectiveness of the improvements made to LTNet.