1. Introduction
As global warming intensifies, clean energy has grown exponentially in support of low-carbon development [1,2,3]. The rapid growth of the wind power sector poses maintenance challenges for critical wind turbine components such as the gearbox, blades, and brake rotors [4,5]. Among these, the gearbox is particularly fault-prone and largely determines whether the turbine operates stably [6,7], and a gearbox failure causes substantial economic losses [5,8]. Monitoring the operating condition of the gearbox is therefore crucial for the reliability and safety of the wind turbine. In practice, temperature variation is a suitable indicator of the gearbox's health condition [9]. For example, prolonged operation at high temperature degrades the viscosity of the lubricating oil and increases friction at the gear mesh [10,11]. The temperature then rises further, creating a vicious cycle that eventually damages the gearbox. It is therefore essential to monitor and predict the operating temperature of the wind turbine gearbox [12]. Although some temperature prediction methods are already used in practice, their accuracy is unsatisfactory because they struggle to extract features from complex nonlinear signals. To address this challenge, this paper introduces a new deep learning network. In terms of prediction accuracy, the proposed model outperforms five classical comparison models and meets the accuracy requirements of practical engineering applications.
In recent decades, a variety of temperature prediction methods have been proposed; they can be grouped into traditional mathematical model-based methods and deep learning-based methods [13,14]. Traditional approaches have contributed significantly to the advancement of temperature forecasting. For example, Li and coworkers [15] introduced a data processing technique involving outlier identification, missing value imputation, and random error reduction, and used it to improve the accuracy of temperature prediction in large-scale concrete applications. To handle locally stationary time series data, Das and coworkers [16] proposed a model-free temperature prediction method. In that work, one-step-ahead point prediction models and prediction intervals were developed to improve prediction accuracy over the widely used RAMPFIT algorithm on long, locally stationary time series. To predict the thermal characteristics of CPUs, Wang and coworkers presented an enhanced linear regression method [17], which extends the conventional linear regression model by incorporating the correlation between time series data and the model's autocorrelation. Although these methods have achieved considerable success in temperature prediction, several drawbacks limit their wider application. In particular, they were developed mainly on the basis of prior knowledge and expert experience, which makes it difficult to characterize nonlinear data such as temperature data precisely and to extract deep abstract features [18].
In contrast, deep learning-based methods can automatically mine features from nonlinear data [18,19]. Typical examples include convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM) networks [20]. Among these, LSTM is designed specifically for time series prediction: its three-gate structure (forget, input, and output gates) handles sequential data well and alleviates the vanishing and exploding gradient problems [21]. Considering temporal–spatial correlations in traffic systems, Zhao and coworkers [22] developed an LSTM-based method for short-term traffic forecasting. Vlachas and coworkers introduced a data-driven LSTM-based forecasting method for high-dimensional chaotic systems [23]; their results indicated that LSTM networks are effective at processing nonlinear data. Jia and coworkers [24] developed an LSTM model for long-term and seasonal sea temperature prediction that accounts for the effect of input length; their experiments showed that selecting an appropriate input length improves model performance.
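For reference, the three-gate structure mentioned above is usually written as follows. This is the standard LSTM formulation; the notation is ours and may differ from the symbols used in Section 2. Here $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Because $c_t$ is updated additively and scaled only by the forget gate, gradients can propagate across many time steps, which is why an LSTM suffers far less from vanishing gradients than a plain RNN.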
Despite its advantages over classical CNN and RNN models in processing time series data, the LSTM model is limited in extracting periodic features, which can reduce prediction accuracy. To overcome this problem, the bidirectional long short-term memory (BiLSTM) network can be adopted. Its bidirectional structure combines an LSTM that reads the sequence forward with one that reads it in reverse, so it can capture useful features of nonlinear temporal signals from both directions, which can significantly improve prediction accuracy. Wu and coworkers [25] proposed a BiLSTM-based tool wear prediction model. Using features obtained by singular value decomposition (SVD), their BiLSTM network extracted the periodic patterns in the SVD features and achieved higher prediction accuracy than the comparison models. In practice, the performance of a BiLSTM model depends heavily on how experts segment the time series data. Jiang and coworkers [26] therefore combined the elitist preservation genetic algorithm (EGA) with a BiLSTM model to predict battery temperature; the EGA derives an optimized data segmentation strategy, enabling the BiLSTM network to achieve higher prediction accuracy. Farah and coworkers [27] used BiLSTM to forecast confirmed cases, deaths, and recoveries in ten major countries affected by COVID-19, and the results confirmed that the BiLSTM network is robust and achieves higher prediction accuracy.
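The bidirectional idea can be illustrated with a minimal pure-Python sketch. To keep it short, it assumes a univariate input and a single hidden unit per direction, and the weights (`FWD_PARAMS`, `BWD_PARAMS`) are hypothetical, untrained values; a practical model would use a deep learning framework. The forward LSTM reads the sequence left to right, the backward LSTM reads the reversed sequence, and the backward outputs are re-aligned so that each time step receives a (forward, backward) pair.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM; p maps gate -> (w, u, b)."""
    def pre(gate):
        w, u, b = p[gate]
        return w * x + u * h_prev + b

    f = sigmoid(pre("f"))          # forget gate
    i = sigmoid(pre("i"))          # input gate
    o = sigmoid(pre("o"))          # output gate
    c_tilde = math.tanh(pre("c"))  # candidate cell state
    c = f * c_prev + i * c_tilde
    h = o * math.tanh(c)
    return h, c


def run_lstm(xs, p):
    """Return the hidden state after each element of xs (zero initial state)."""
    h, c = 0.0, 0.0
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        hs.append(h)
    return hs


def bilstm(xs, p_fwd, p_bwd):
    """Pair each time step's forward and backward hidden states."""
    forward = run_lstm(xs, p_fwd)
    backward = run_lstm(xs[::-1], p_bwd)[::-1]  # re-align to the original time order
    return list(zip(forward, backward))


# Hypothetical, untrained weights: gate -> (input weight, recurrent weight, bias).
FWD_PARAMS = {"f": (0.5, 0.4, 1.0), "i": (0.8, 0.3, 0.0), "o": (0.6, 0.2, 0.0), "c": (1.2, 0.5, 0.0)}
BWD_PARAMS = {"f": (0.4, 0.3, 1.0), "i": (0.7, 0.2, 0.1), "o": (0.5, 0.1, 0.0), "c": (1.0, 0.6, -0.1)}

for t, (h_f, h_b) in enumerate(bilstm([0.2, 0.5, 0.9, 0.4, 0.1], FWD_PARAMS, BWD_PARAMS)):
    print(f"t={t}: forward={h_f:+.3f} backward={h_b:+.3f}")
```

At every time step the backward state has already seen all later samples, while the forward state has seen all earlier ones; changing only the last sample therefore alters the backward state at `t=0` but leaves the forward state there untouched.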
Although the BiLSTM model has been successfully applied in many areas, the limitations of a single deep learning network are becoming apparent. Because the collected temperature data are nonlinear and complex and contain multiple kinds of features (e.g., temporal and periodic features), it is difficult for a single model to capture all of them. Hybrid models are therefore often used in practice, since they preserve the strengths of each constituent model and improve prediction accuracy [28]. Xiao and coworkers [29] proposed a hybrid model that combines an LSTM network with an attention mechanism to extract the more important features from historical time series data; their experiments showed that the hybrid model outperformed the individual models in robustness and prediction accuracy. Chen and coworkers [30] developed a multi-scale CNN and LSTM model for fault diagnosis that achieved higher classification accuracy than some traditional intelligent algorithms, particularly in noisy environments. To model nonlinear PM2.5 concentration data, Li and coworkers [31] proposed a CNN-LSTM hybrid model that retains the advantages of both CNN and LSTM and achieves higher prediction accuracy than either single model. Qiao and coworkers [32] proposed a hybrid model combining the wavelet transform (WT), LSTM, and a stacked autoencoder (SAE) for electricity forecasting in America, and the hybrid model achieved higher robustness.
In this paper, a pre-trained hybrid network combining a one-dimensional CNN (1DCNN) and a BiLSTM is developed to predict the temperature of wind turbine gearboxes. The hybrid model merges the 1DCNN's ability to process sequential signals and extract spatial features with the BiLSTM network's ability to extract periodic features, so that both spatial and periodic features can be extracted from the temperature data measured on the target gearbox. Moreover, with pre-training, the model parameters are not randomly initialized, which largely avoids the problem of local minima and significantly improves prediction accuracy. The method proceeds in three steps: first, a 1DCNN and a BiLSTM network are each pre-trained on the pre-training dataset; second, the hybrid 1DCNN-BiLSTM model is assembled from these two pre-trained networks; finally, the hybrid model is fine-tuned on the wind turbine gearbox dataset. To assess the effectiveness of the proposed model, a set of experiments is designed using three temperature datasets; the results indicate that the proposed model improves prediction accuracy and robustness. The contributions of this paper are summarized as follows:
A novel hybrid 1DCNN-BiLSTM model is designed so that the most important spatial and periodic features contained in the temperature data can be more fully extracted, yielding higher prediction accuracy.
Pre-training is introduced into model training, so the parameters of the hybrid model need not be randomly initialized. This effectively prevents the training process from being trapped in local minima and significantly improves the prediction accuracy of the trained network.
Several experiments on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset and a real wind turbine temperature dataset are conducted to evaluate the effectiveness of the proposed model; the results show that it outperforms several classical temperature prediction models.
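The three-step procedure described above (pre-train, assemble, fine-tune) can be sketched in pure Python under several simplifying assumptions that are ours, not the paper's: one convolution filter of width 3, one LSTM unit per direction, a linear head on the final forward and backward states, hypothetical placeholder numbers standing in for the pre-trained weights, a synthetic sine series standing in for gearbox temperature, and finite-difference gradients in place of backpropagation. All names are hypothetical. The sketch is only meant to show that the hybrid starts from the pre-trained 1DCNN and BiLSTM weights, with only the new output head initialized randomly, and that all parameters are then updated during fine-tuning.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def conv1d_relu(xs, kernel, bias):
    """Valid 1D convolution (stride 1, one filter) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(kernel[j] * xs[t + j] for j in range(k)) + bias)
            for t in range(len(xs) - k + 1)]


def lstm_last_hidden(xs, p, prefix):
    """Run a one-unit LSTM whose weights are stored under `prefix`; return the last h."""
    h = c = 0.0
    for x in xs:
        z = {g: p[f"{prefix}.w_{g}"] * x + p[f"{prefix}.u_{g}"] * h + p[f"{prefix}.b_{g}"]
             for g in "fioc"}
        c = sigmoid(z["f"]) * c + sigmoid(z["i"]) * math.tanh(z["c"])
        h = sigmoid(z["o"]) * math.tanh(c)
    return h


def hybrid_forward(window, p):
    """1DCNN -> BiLSTM -> linear head: predict the value that follows `window`."""
    feats = conv1d_relu(window, [p["cnn.k0"], p["cnn.k1"], p["cnn.k2"]], p["cnn.b"])
    h_fwd = lstm_last_hidden(feats, p, "fwd")        # reads CNN features left to right
    h_bwd = lstm_last_hidden(feats[::-1], p, "bwd")  # reads them right to left
    return p["head.w_fwd"] * h_fwd + p["head.w_bwd"] * h_bwd + p["head.b"]


def build_hybrid(pretrained_cnn, pretrained_bilstm, rng):
    """Copy the pre-trained weights; only the new output head is random."""
    params = {**pretrained_cnn, **pretrained_bilstm}
    for name in ("head.w_fwd", "head.w_bwd", "head.b"):
        params[name] = rng.uniform(-0.1, 0.1)
    return params


def mse(params, windows, targets):
    errors = [(hybrid_forward(w, params) - y) ** 2 for w, y in zip(windows, targets)]
    return sum(errors) / len(errors)


def fine_tune_step(params, windows, targets, lr=0.05, eps=1e-5):
    """One gradient-descent step on every parameter (central finite differences)."""
    trial = dict(params)
    grads = {}
    for name, value in params.items():
        trial[name] = value + eps
        loss_up = mse(trial, windows, targets)
        trial[name] = value - eps
        loss_down = mse(trial, windows, targets)
        trial[name] = value
        grads[name] = (loss_up - loss_down) / (2 * eps)
    return {name: value - lr * grads[name] for name, value in params.items()}


rng = random.Random(0)
# Hypothetical placeholders for weights produced by the pre-training stage.
PRETRAINED_CNN = {"cnn.k0": 0.3, "cnn.k1": 0.5, "cnn.k2": 0.2, "cnn.b": 0.0}
PRETRAINED_BILSTM = {f"{d}.{kind}_{g}": rng.uniform(-0.5, 0.5)
                     for d in ("fwd", "bwd") for kind in ("w", "u", "b") for g in "fioc"}

# Synthetic stand-in for a normalised gearbox temperature series (not real data).
series = [0.5 + 0.3 * math.sin(0.4 * t) for t in range(20)]
windows = [series[t:t + 6] for t in range(len(series) - 6)]
targets = [series[t + 6] for t in range(len(series) - 6)]

initial = build_hybrid(PRETRAINED_CNN, PRETRAINED_BILSTM, rng)
params = initial
for _ in range(20):
    params = fine_tune_step(params, windows, targets)
print(f"MSE before fine-tuning: {mse(initial, windows, targets):.4f}")
print(f"MSE after fine-tuning:  {mse(params, windows, targets):.4f}")
```

In a real implementation, a framework such as PyTorch or Keras would compute the gradients by backpropagation and load the pre-trained sub-network weights directly; finite differences are used here only to keep the sketch dependency-free.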
The remainder of this article is organized as follows. Section 2 presents the basic theory of the proposed model. Section 3 describes the implementation of the proposed method in detail. Section 4 reports several experiments that evaluate the performance of the proposed model. Finally, Section 5 concludes the paper.
5. Conclusions
In deep learning-based regression problems such as temperature prediction, the nonlinearity and complexity inherent in the data place heavy demands on the model's feature extraction capability. To enhance feature extraction, researchers often construct residual networks or hybrid networks. However, such models may become trapped in local minima during training, and as model depth increases, vanishing or exploding gradients can significantly reduce their effectiveness. This study therefore introduces a pre-training approach alongside enhanced feature extraction in order to circumvent these issues.
In this paper, a pre-training-based 1DCNN-BiLSTM model is proposed for temperature prediction of the wind turbine gearbox. The proposed model retains the advantages of both the 1DCNN model and the BiLSTM model. Furthermore, pre-training is employed to pre-adjust the parameters of the hybrid model, so that its initial parameters need not be generated randomly. After the pre-trained 1DCNN model and the pre-trained BiLSTM model are combined, the model parameters are fine-tuned in subsequent training. Together, the hybrid architecture and the pre-training method markedly improve temperature prediction performance.
Three experiments were designed to assess the effectiveness of the proposed model. Comparisons of the error curves and three key indicators show that the proposed method performs better in terms of both the error upper limit and the error fluctuation. A comparison with the 1DCNN-BiLSTM model without pre-training shows that pre-training significantly enhances the model's performance. The proposed model also achieved higher prediction accuracy than five other existing models. Based on the experimental case studies, the following conclusions can be drawn:
- (a) The hybrid model retains the advantages of the two single models and can therefore extract more useful features, which improves its temperature prediction performance.
- (b) Pre-training helps the model follow a better optimization path toward better parameters, which improves the model's robustness to interference.
- (c) The experiments on measured temperature data demonstrate that the proposed model is suitable for real applications.
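The three key indicators are defined in Section 4 and are not named here. As a hedged illustration only, the sketch below computes two widely used error measures (MAE and RMSE) together with simple proxies for the "error upper limit" (maximum absolute error) and the "error fluctuation" (population standard deviation of the error). These choices are our assumptions and are not necessarily the paper's indicators; `error_summary` is a hypothetical helper.

```python
import math
import statistics


def error_summary(y_true, y_pred):
    """Summarise the prediction errors e_t = y_pred_t - y_true_t."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    return {
        "mae": sum(abs_errors) / len(errors),
        "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "max_abs_error": max(abs_errors),        # proxy for the error upper limit
        "error_std": statistics.pstdev(errors),  # proxy for the error fluctuation
    }


print(error_summary([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))
```

RMSE penalizes large deviations more heavily than MAE, so reporting both, along with the maximum error, distinguishes a model that is occasionally far off from one that is consistently slightly off.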
The research also has certain limitations; in particular, the following aspects warrant further investigation:
- (a) Because the deep learning-based approach shows strong generalization capability, its broader applicability deserves further study, for example, predicting the operating temperature of other mechanical equipment such as cement production machinery and aerospace engines.
- (b) The prediction accuracy of the proposed model in Case 1 is lower than in the other two cases, probably because of the limited data volume and the high complexity of that dataset. Future work could therefore refine the model's structure and parameters and design a dynamic loss function that captures dynamic biases, with the aim of achieving higher prediction accuracy on small datasets.