Article

Analysis of the Composition of Ancient Glass and Its Identification Based on the Daen-LR, ARIMA-LSTM and MLR Combined Process

1 School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China
2 School of Chemical Engineering and Technology, National-Local Joint Engineering Laboratory for Energy Conservation in Chemical Process Integration and Resources Utilization, Hebei University of Technology, Tianjin 300130, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2023, 13(11), 6639; https://doi.org/10.3390/app13116639
Submission received: 21 April 2023 / Revised: 16 May 2023 / Accepted: 29 May 2023 / Published: 30 May 2023

Abstract

Glass relics are precious material evidence of early trade and cultural exchange between the East and the West. To explore the cultural differences and trade development between early China and foreign countries, it is extremely important to classify glass cultural relics. Despite their similar appearances, Chinese glass contains more lead, while foreign glass contains more potassium. In view of this, this paper proposes a joint Daen-LR, ARIMA-LSTM, and MLR machine learning algorithm (JMLA) for the analysis and identification of the chemical composition of ancient glass. We separate the sampling points of ancient glass into two systems: lead-barium glass and high-potassium glass. Firstly, an improved logistic regression model based on a double adaptive elastic net (Daen-LR) is used to select variables with both oracle and adaptive classification properties. Secondly, the ARIMA-LSTM model is used to establish the correlation curve of chemical composition before and after weathering and to predict the change in chemical composition with weathering. Thirdly, combining the data processed by the above two methods, a multiple linear regression model (MLR) is used to classify unknown glass products. The samples obtained by this processing method show a very good fit. In comparison with similar models such as Decision Trees (DT), Random Forests (RF), Support Vector Machines (SVM), and Random Forests based on classification and regression trees (CART-RF), the classification accuracy of JMLA is 97.9% on the train set, and the accuracy on the test set reaches 97.6%. The results demonstrate that JMLA improves the accuracy of glass type classification, greatly enhances the research efficiency of archaeological staff, and yields more reliable results.

1. Introduction

Machine learning (ML) algorithms are a set of mathematical models and statistical [1] methods that can be used in computer systems to learn and make predictions or decisions based on patterns in data. In the field of archaeology, there are many examples of machine learning algorithms applied to conservation and restoration, provenance research, and the management of cultural heritage. In 1798, the German scientist M.H. Klaproth conducted the first quantitative chemical study of Roman-era glass [2], improving the procedure for gravimetric analysis and devising various procedures for the determination of non-metallic elements, accurately determining the composition of nearly 200 minerals and various industrial products. In 2003, Professor Fuxi Gan and his research team used the proton-induced X-ray emission (PIXE) technique to quantify the chemical composition of a batch of ancient glass excavated in Yangzhou and Hubei, with the goal of studying the origin, system, and preparation process of ancient Chinese glass [3]. As more and more ancient silicate artifacts were unearthed, some scholars began to classify them based on their chemical composition. In 1992, the Korean scientist Lee Chul applied a chemometric pattern recognition method to multivariate data to classify 94 ancient Korean glass pieces using neutron activation analysis and principal component analysis [4]. In 2010, El-Taher, an Egyptian scholar, used instrumental neutron activation analysis (INAA) and HPGe detector γ-spectroscopy to determine qualitatively and quantitatively, for the first time, a total of 16 elements in feldspar rock samples collected from Gabel El Dubb, Eastern Desert, Egypt, and to classify the rock samples [5]. In 2011, the Thai scholar Won-in K. and his team used Raman spectroscopy for the first time to characterize fragments of archaeological glass samples with the aim of obtaining information from laser scattering to identify and classify the glass samples [6]. In 2019, Nadine Schibille and her team established a temporal model that serves as a tool for dating archaeological glass assemblages, as well as a geographical model that allows for a clear classification of Levantine and Egyptian plant ash glasses [7]. However, it is worth noting that applications and extensions of machine learning algorithms to cultural heritage (CH) component analysis and category identification remain very limited [8].
In recent years, when studying the chemical composition of ancient glass objects, the classification of glass has mainly been determined by the weight ratio of oxides or by analyzing the mass fraction of compounds containing lead and potassium [9,10,11,12,13]. However, the percentage of lead and potassium compounds present varies with the region where the glass was produced and the degree of weathering, which interferes with the classification of the glass. This study is therefore based on the data related to ancient glassware provided by the official website of the 2022 China Student Mathematical Modeling Competition [14]. The weathering of glass over thousands of years can cause significant changes in its internal chemical composition. As a result, determining the type of glass from the amount of a certain chemical component alone is neither reliable nor scientific. Therefore, building on the double adaptive elastic net improved logistic regression model (Daen-LR), the ARIMA-LSTM model, and the multiple linear regression model (MLR) [15,16], and optimizing and combining these three algorithms, we propose a processing method suitable for classifying complex data that can be used to predict the unknown classification of glass. This processing method is used to analyze and model the data on the chemical composition and classification of a batch of ancient Chinese glass products, to find the correlation between their chemical composition and the basis of their classification, and to use this relationship to predict the category of unknown glass. The accuracy of the algorithm model was judged by testing for heteroskedasticity in the perturbation terms, testing for multicollinearity, and testing the fit between the values produced by the model and the actual values [17].
With the continuous development of machine learning technology, a variety of machine learning models have been proposed and widely used in classification research. These include Logistic Regression (LR), Naive Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Tree (GBT), and so on. However, traditional machine learning methods have some drawbacks in solving real-world problems, such as interference from external factors, failure to meet scientific standards, random results, and poor prediction accuracy. To address these problems, it is necessary to combine established machine learning methods with more advanced techniques. This paper presents a joint machine learning algorithm using Daen-LR, ARIMA-LSTM, and the MLR model (JMLA). We first use an improved logistic regression model based on a double adaptive elastic net (Daen-LR) to select variables that have both oracle and adaptive classification properties. Secondly, we use the ARIMA-LSTM model to balance the linear and nonlinear trends in the time series data of the chemical content of glass artifacts before and after weathering. Finally, a multiple linear regression model (MLR) is used to classify the experimental samples. By testing on the data set of the 2022 China Undergraduate Mathematical Contest in Modeling, this study demonstrates the correctness of the proposed method.
The main contributions of this study include:
  • Successfully established a classification model of ancient glass products with high accuracy.
  • This study combines three different algorithms reasonably and effectively and integrates the advantages of different algorithms into the JMLA algorithm.
  • We made a comprehensive comparison of multiple test sets on multiple models, and the test results show that the algorithm given in this study is superior to other algorithms.
  • In the future, this algorithm model will also be able to support component analysis in many fields, such as water flow pollution, food safety, and environmental protection.
This paper consists of six parts: the second part describes the algorithms and principles in detail; the third part explains the data preprocessing and experimental preparation; the fourth part discusses the experimental process and results; the fifth part gives the advantages and limitations of the JMLA model compared with other models; and the sixth part presents the conclusions, implications, and suggestions for future research.

2. Theory and Method

2.1. An Improved Logistic Regression Model with Double Adaptive Elastic Net

In this analysis of ancient glass artifacts, the relationship between glass weathering and its chemical composition was identified and statistically analyzed by using an improved logistic regression model based on a double adaptive elastic net, i.e., the Daen-Logistic regression (Daen-LR) model. Furthermore, we calculated the p-value of each correlation factor and counted whether each regression coefficient was significant at the 90% confidence level, extracted the strong correlation elements, and excluded the weak correlation elements. The characteristics of the model are as follows:
Logistic regression is an effective method to solve classification problems in which effective estimation of parameters and selection of variables are extremely important. The regularization method [18], which considers adding a penalty term to the optimized loss function to estimate parameters, can simultaneously solve the two key points of logistic regression. Elastic net [19] is one of the representatives of this method.
However, the elastic net has two major shortcomings for the estimation of parameters and the identification of important variables in the traditional logistic regression model: First, the selected variables may not be consistent, i.e., the method lacks the oracle property [20]. Second, the specific effects of strongly correlated variables on the independent variables are not considered, i.e., adaptive classification effects are missing [21,22].
To overcome the first deficiency of the elastic net, the adaptive elastic net was established by combining the adaptive lasso [23] and ridge penalties to achieve consistency in variable selection. However, the adaptive coefficient vector $W_1$, which gives the adaptive elastic net its oracle property, is not easy to set correctly. It is generally determined by initial parameter estimates and the constant $\delta$.
To address the second defect of the elastic net, Van et al. [24] proposed the generalized ridge, in which parameters are first divided into groups and a different ridge penalty is then given to each group. The generalized ridge has an adaptive grouping effect, and its adaptive ridge also enjoys that effect. However, the generalized ridge cannot select variables and is therefore of limited application.
Based on the existing remedies for the elastic net's deficiencies, it follows that the adaptive lasso and the adaptive ridge have the oracle property and the adaptive grouping effect, respectively, so they can be combined to avoid both disadvantages. This combination of penalties can be called the double adaptive elastic net.
It is assumed that in the composition analysis of glass artifacts there are m chemical composition influence factors, $X = (x_1, x_2, \ldots, x_m)$ is the vector of chemical composition content characteristic variables (i.e., independent variables), m is the number of variables, and the weathering status of the corresponding glass artifact is set as y (i.e., the dependent variable), where y is the dichotomous variable of weathering or not (0 means "unweathered" and 1 means "weathered"). To assess the probability that a particular glass artifact is weathered, it is necessary to compute the predicted outcome of the model as the probability of the event $y = 1$, which can be expressed as $P = f(y = 1 \mid x_1, x_2, \ldots, x_m)$; i.e., the mathematical expression of the traditional logistic regression model is:
$$\mathrm{Logit}(P) = \ln\frac{P(y=1)}{1 - P(y=1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$$
i.e.,
$$P(y=1) = \frac{\exp\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m\right)}{1 + \exp\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m\right)}$$
In the above equation, $\beta_0, \beta_1, \beta_2, \ldots, \beta_m$ are the regression coefficients to be determined. The maximum likelihood estimation method is used to find these coefficients:
$$P(y=1 \mid X) = 1 - \frac{1}{1 + \exp\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m\right)} = \pi$$
$$P(y=0 \mid X) = 1 - \pi$$
Combining these, the probability function of y is:
$$P(y_i) = \pi^{y_i}\left(1 - \pi\right)^{1 - y_i}, \quad y_i = 0, 1;\ i = 1, 2, \ldots, n$$
According to the Bernoulli distribution, the maximum likelihood function can be expressed as:
$$l(\beta; X) = \prod_{i=1}^{n} P(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}\left(1 - \pi_i\right)^{1 - y_i}$$
The log-likelihood function is expressed as:
$$\ln(l(\beta; X)) = \sum_{i=1}^{n}\left[ y_i\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_m x_{im}\right) - \ln\left(1 + \exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_m x_{im}\right)\right)\right]$$
Since the log-likelihood function is concave, the point at which its first derivative equals zero is its maximizer. By taking the first derivative of Equation (6) with respect to the undetermined coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_m$ and setting it equal to zero, all the unknown parameters can be solved for.
Considering the shortcomings and deficiencies of the traditional logistic regression model, the oracle effect and adaptive classification effect are integrated into the traditional logistic model to create a double adaptive elastic net model, which makes the identification results of glass cultural relics more relevant and persuasive [25,26].
Theorem 1 (Oracle Property).
In the logistic model, suppose the true parameter vector is $\beta_0 = (\beta_{01}, \beta_{02}, \ldots, \beta_{0m})^T$, $A = \{j : \beta_{0j} \neq 0\} = \{1, 2, \ldots, m_0\}$ with $m_0 < m$, the Fisher information matrix is $I(\beta_0) = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}$, where $I_{11}$ is a square matrix of order $m_0$, and $\phi(X^T\beta) = \ln(1 + e^{X^T\beta})$. Then the double adaptive elastic net logistic estimator has the oracle property under the following conditions:
  • $I(\beta_0)$ is a positive definite matrix.
  • There exists an open set $\Omega$ containing $\beta_0$ such that for any $\beta \in \Omega$ there exists a function $N(\cdot)$ satisfying $\phi(X^T\beta) \le N(X) < \infty$, and for any m-dimensional vector u, $E\left[N(X)\left|X^T u\right|^3\right] < \infty$;
  • $\lambda_1 = o(\sqrt{n})$, and there is a sequence $a_n$ such that $a_n\left(\hat{\beta}^* - \beta_0\right) = O_p(1)$ and $\lim_{n\to\infty} \frac{\lambda_1 a_n^{\delta}}{\sqrt{n}} = \infty$;
  • $\lambda_2 = o(\sqrt{n})$ and $\lim_{n\to\infty} \frac{\lambda_2}{\sqrt{n}} \sum_{j=1}^{m_0} \beta_{0j}^2 = 0$.
When conditions 1–4 hold, the double adaptive elastic net estimate $\hat{\beta}$ has the following properties:
  • $\sqrt{n}\left(\hat{\beta}_A - \beta_A\right) \xrightarrow{D} N\left(0, I_{11}^{-1}\right)$;
  • $\lim_{n\to\infty} P\left(\hat{\beta}_{A^c} = 0\right) = 1$.
Theorem 2 (Adaptive Classification Effect).
Given the binary data $\{(X_i, y_i)\}_{i=1}^{n}$, where $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T$ and, for every $j \in \{1, 2, \ldots, m\}$, $\sum_{i=1}^{n} x_{ij} = 0$, $\sum_{i=1}^{n} x_{ij}^2 = 1$, $y_i \in \{0, 1\}$. Let $\hat{\beta}(\lambda_1, \lambda_2)$ be the estimate of the model and assume that $\hat{\beta}_k(\lambda_1, \lambda_2)\,\hat{\beta}_l(\lambda_1, \lambda_2) > 0$. Define $D_{\lambda_1, \lambda_2}(k, l) = \frac{1}{n}\left| w_{2k}\hat{\beta}_k(\lambda_1, \lambda_2) - w_{2l}\hat{\beta}_l(\lambda_1, \lambda_2) \right|$; then:
$$D_{\lambda_1, \lambda_2}(k, l) \le \frac{\sqrt{2\left(1 - \rho_{kl}\right)} + \frac{\lambda_1}{n}\left| w_{1k} - w_{1l} \right|}{2\lambda_2}$$
where
$$\rho_{kl} = \mathrm{corr}\left(x_k, x_l\right)$$
By combining the above two schemes to improve the logistic regression equation, with $X_i = (1, x_{i1}, x_{i2}, \ldots, x_{im})^T$, $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_m)^T$, $y_i \in \{0, 1\}$, $i = 1, 2, \ldots, n$, the estimate of $\beta$ is:
$$\hat{\beta}_{\mathrm{Daen}} = \arg\min_{\beta}\left\{ -\ln(l(\beta; X)) + \lambda_1 \sum_{j=1}^{m} w_{1j}\left|\beta_j\right| + \lambda_2 \sum_{j=1}^{m} w_{2j}\beta_j^2 \right\}$$
where
$$w_{1j} = \left|\hat{\beta}_j^*\right|^{-\delta}, \quad w_{2j} > 0, \quad \lambda_1, \lambda_2 > 0, \quad \delta > 0,$$
$$\hat{\beta}^* = \arg\min_{\beta}\left\{ -\ln(l(\beta; X)) + \lambda_2 \sum_{j=1}^{m} w_{2j}\beta_j^2 \right\}.$$
Since incorporating the correlation of variables into the regression model helps to improve the accuracy of parameter estimation and variable selection [27],
$$w_{2j} = \frac{\sum_{k=1, k \neq j}^{m} \left|\rho_{kj}\right|}{m - 1} + \varepsilon_j,$$
where $\rho_{kj} = \mathrm{corr}(x_k, x_j)$ is the correlation coefficient between variables $x_k$ and $x_j$, $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m)^T$ is a vector chosen so that $w_{21}, w_{22}, \ldots, w_{2m}$ are unequal to each other, and $\frac{1}{m}\sum_{j=1}^{m} \varepsilon_j = 1$, $0.95 \le \varepsilon_j \le 1.25$.
Equation (9) is equivalent to:
$$\hat{\beta}_{\mathrm{Daen}} = \arg\min_{\beta}\left\{ -\ln(l(\beta; X)) \right\}, \quad \mathrm{s.t.}\ \alpha \sum_{j=1}^{m} w_{1j}\left|\beta_j\right| + (1 - \alpha) \sum_{j=1}^{m} w_{2j}\beta_j^2 \le t,$$
where
$$\alpha = \frac{\lambda_1}{\lambda_1 + \lambda_2}, \quad t > 0.$$
Using the coordinate gradient method and the Newton method to solve for $\beta$, Equation (9) can be rewritten as:
$$\hat{\beta}_{\mathrm{Daen}} = \arg\min_{\beta}\left\{ I(\beta) + \lambda_1 \sum_{j=1}^{m} w_{1j}\left|\beta_j\right| \right\}$$
where
$$I(\beta) = -\ln(l(\beta; X)) + \lambda_2 \sum_{j=1}^{m} w_{2j}\beta_j^2.$$
If $\beta^{(t)}$ is the solution for $\beta$ at step t, then $I(\beta)$ can be approximated as:
$$I(\beta) \approx I\left(\beta^{(t)}\right) + \left(\beta - \beta^{(t)}\right)^T g^{(t)} + \frac{1}{2}\left(\beta - \beta^{(t)}\right)^T h^{(t)}\left(\beta - \beta^{(t)}\right)$$
where $g^{(t)}$ and $h^{(t)}$ are, respectively, the gradient and the Hessian matrix of $I(\beta)$ at $\beta = \beta^{(t)}$. Adding $\lambda_1 \sum_{j=1}^{m} w_{1j}\left|\beta_j\right|$ to Equation (12) and setting $\frac{\partial I(\beta)}{\partial \beta} = 0$ yields:
$$\beta^{(t+1)} = K\left(\beta^{(t)} - h^{-1}(t)\,g(t),\ \lambda_1 h^{-1}(t)\,W_1\right)$$
where
$$W_1 = \left(0, w_{11}, w_{12}, \ldots, w_{1m}\right)^T,$$
$$K(Q, W) = \begin{cases} Q - W, & 0 \le W < Q \\ Q + W, & 0 \le W < -Q \\ 0, & |Q| \le W \end{cases}$$
Since $\lambda_1 h^{-1}(t) W_1$ may contain entries less than 0, the parameters of some irrelevant variables cannot become 0. Thus, it can be directly rewritten as $\lambda_1 W_1$. From the above inference, the solution process of $\hat{\beta}_{\mathrm{Daen}}$ can be derived: first generate an initial value of $\beta$, then repeatedly compute $g^{(t)}$, $h^{(t)}$, and $\beta^{(t+1)}$ until convergence [26].
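To make this iteration concrete, the following is a minimal numerical sketch in Python/NumPy (not the Matlab/SPSS implementation used in the study); the data matrix X, the adaptive weights w1 and w2, and the penalty parameters lam1 and lam2 are placeholders supplied by the caller, and the update uses the rewritten threshold $\lambda_1 W_1$ described above.

```python
import numpy as np

def soft_threshold(q, w):
    """Operator K(Q, W): shrink q toward zero by w (element-wise)."""
    return np.sign(q) * np.maximum(np.abs(q) - w, 0.0)

def daen_logistic(X, y, w1, w2, lam1, lam2, n_iter=100, tol=1e-6):
    """Sketch of the Daen-LR update: Newton steps on the ridge-penalized
    negative log-likelihood I(beta), followed by soft-thresholding with the
    adaptive lasso weights w1. X is assumed to include an intercept column,
    with w1[0] = w2[0] = 0 so the intercept is not penalized."""
    n, m = X.shape
    beta = np.zeros(m)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # fitted probabilities
        g = X.T @ (p - y) + 2.0 * lam2 * w2 * beta      # gradient of I(beta)
        W = p * (1.0 - p)
        h = X.T @ (X * W[:, None]) + 2.0 * lam2 * np.diag(w2)  # Hessian of I(beta)
        beta_new = soft_threshold(beta - np.linalg.solve(h, g), lam1 * w1)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```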

2.2. Time Series Forecasting Model Based on ARIMA-LSTM

2.2.1. ARIMA(p,d,q) Model

By using the ARIMA(p,d,q) model, it is possible to analyze observations at past time points, depict the intrinsic links between them, and predict future values; this is achieved based on past values and linear error equations [28,29,30,31,32]. The ARIMA model is usually denoted as ARIMA(p,d,q), where p is the number of autoregressive terms, q is the number of moving average terms, and d is the number of differences needed to make the series stationary [33]. The correlogram, autocorrelation function (ACF), and partial autocorrelation function (PACF) of the time series provide information about the lags [34]. If the time series is found to be stationary, the model can be used for estimation and forecasting. However, if it is not stationary, it must be transformed by differencing before ARIMA can be applied. After identification, an ARIMA model is estimated for the specific stationary time series. The ARIMA model is selected based on the number of significant coefficients, the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the adjusted R2 [35]. After estimation, the selected ARIMA model needs to be diagnosed to check whether the residuals are white noise. If the residuals are not white noise, the model must be re-estimated; Q-tests and normality tests can be used to diagnose the residuals [36]. Typically, the ARIMA model is as follows:
$$y_t^* = \alpha_0 + \sum_{i=1}^{p} \alpha_i y_{t-i}^* + \varepsilon_t + \sum_{i=1}^{q} \beta_i \varepsilon_{t-i}$$
$$y_t^* = \Delta^d y_t = (1 - L)^d y_t$$
$$\left(1 - \sum_{i=1}^{p} \alpha_i L^i\right)(1 - L)^d y_t = \alpha_0 + \left(1 + \sum_{i=1}^{q} \beta_i L^i\right)\varepsilon_t$$
For this study, the more chemical content parameters are included, the better the model fits, but at the cost of increased model complexity; model selection should therefore seek the best balance between model complexity and the model's ability to explain the data. According to the Bayesian information criterion, the smallest BIC identifies the optimal trade-off between goodness of fit and complexity [37]:
$$\mathrm{BIC} = n\ln(T) - 2\ln(M)$$
T: number of samples;
n: number of unknown parameters, n = p + q + 1;
M: maximized likelihood value of the model.
The maximum likelihood estimation process for the ARIMA(p,d,q) model is [38]:
$$Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \cdots + \Phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$
$\Phi_1, \Phi_2, \ldots, \Phi_p$ represent the linear autocorrelation coefficients between $Y_{t-1}, \ldots, Y_{t-p}$ and $Y_t$. By introducing the intermediate terms, the direct relationship between $Y_{t-1}, \ldots, Y_{t-p}$ and $Y_t$ can be isolated, and this relationship is linear. $\Phi_1, \Phi_2, \ldots, \Phi_p$ measure the size of this influence, i.e., the PACF ($\theta_1, \ldots, \theta_q$ have the analogous meaning for the moving average terms). $\varepsilon_t$ is the disturbance term, with $\varepsilon_t \sim \mathrm{iid}\ N(0, \sigma^2)$. The vector of all parameters is
$$\Theta = \left(c, \Phi_1, \Phi_2, \ldots, \Phi_p, \theta_1, \theta_2, \ldots, \theta_q, \sigma^2\right)$$
The estimation of the likelihood function for the autoregressive process is conditioned on the initial value of  y , and the estimation of the likelihood function for the moving average process is conditioned on the initial value of  ε . Then ARIMA(p,d,q) is conditioned on d as the difference order and the initial values of  y  and  ε .
Assume the initial values $y_0 = \left(y_0, y_{-1}, \ldots, y_{-p+1}\right)$ and $\varepsilon_0 = \left(\varepsilon_0, \varepsilon_{-1}, \ldots, \varepsilon_{-q+1}\right)$ are known; then, from $y_1, y_2, \ldots, y_T$, the following equation can be iterated:
$$\varepsilon_t = y_t - c - \Phi_1 y_{t-1} - \Phi_2 y_{t-2} - \cdots - \Phi_p y_{t-p} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$
The sequence $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T$ for $t = 1, 2, \ldots, T$ can thus be obtained, and the conditional log-likelihood function is:
$$L(\Theta) = \ln f\left(y_T, y_{T-1}, \ldots, y_1 \mid y_0, \varepsilon_0, \Theta\right) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln \sigma^2 - \sum_{t=1}^{T} \frac{\varepsilon_t^2}{2\sigma^2}$$
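As an illustration of the ARIMA estimation and BIC-based order selection described above, a hedged sketch using the statsmodels library follows (the study itself used Matlab and SPSS); the series y is a synthetic placeholder standing in for one chemical-component sequence.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def fit_best_arima(y, max_p=3, max_q=3, d=1):
    """Grid-search ARIMA(p, d, q) orders and keep the fit with the lowest BIC,
    mirroring the order-selection criterion described in the text."""
    best_fit, best_bic = None, np.inf
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            try:
                fit = ARIMA(y, order=(p, d, q)).fit()
            except Exception:
                continue  # skip orders that fail to estimate
            if fit.bic < best_bic:
                best_fit, best_bic = fit, fit.bic
    return best_fit

# Placeholder series (e.g., one component's content ordered by sampling point)
rng = np.random.default_rng(0)
y = pd.Series(70 + np.cumsum(rng.normal(0.0, 1.0, 60)))

model = fit_best_arima(y)
print(model.summary())              # coefficients, BIC, residual diagnostics
linear_pred = model.fittedvalues    # linear component passed on as x_t^r
residuals = model.resid             # residual series handed to the LSTM stage
```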

2.2.2. LSTM Model

The ARIMA(p,d,q) model can handle the linear part of the chemical composition content in the time series well, but it has certain limitations: the resulting residual series has nonlinear characteristics, and the way the content of some chemical components changes with the degree of weathering is a nonlinear process. This calls for a deep learning model to capture the nonlinear trend of chemical composition changes [39]. The LSTM (Long Short-Term Memory) model is a deep learning model that is very good at modeling nonlinear data. Its nonlinear gate units adjust the information flowing into and out of the memory cells at each time point, so as to better fit the trend of nonlinear data changing over time.
LSTM is a special type of recurrent neural network (RNN) that performs very well on long sequences of data, mainly alleviating the vanishing gradient, exploding gradient, and overfitting problems that arise when training on long sequences [40,41,42,43]. An RNN is an artificial neural network that operates on time-series data and can use back-propagation algorithms to learn and adapt to the relationship between inputs and outputs. In contrast to a standard RNN, an LSTM has an input gate, a forget gate, and an output gate that control the way information flows through the network. These gates allow the LSTM to store past information and update the current state appropriately, providing a significant advantage when dealing with long sequences of data. The basic structure is shown in Figure 1 [44,45].
The basic unit of the LSTM network contains a forget gate, an input gate, and an output gate. The forget gate determines which part of the state storage cell is forgotten by combining the input $x_i$ with the state storage cell $C_{i-1}$ and the intermediate output $h_{i-1}$, while the input gate transforms $x_i$ by means of the $\sigma$ and tanh functions. The associated intermediate output $h_i$ is determined by the updated $C_i$ and the output $B_i$ [32]. The calculation formulas are shown in (22) to (27):
$$f_i = \sigma\left(W_f \cdot \left[h_{i-1}, x_i\right] + b_f\right)$$
$$e_i = \sigma\left(W_e \cdot \left[h_{i-1}, x_i\right] + b_e\right)$$
$$\tilde{C}_i = \tanh\left(W_c \cdot \left[h_{i-1}, x_i\right] + b_c\right)$$
$$C_i = f_i \ast C_{i-1} + e_i \ast \tilde{C}_i$$
$$B_i = \sigma\left(W_B \cdot \left[h_{i-1}, x_i\right] + b_B\right)$$
$$h_i = B_i \ast \tanh\left(C_i\right)$$
$f_i$, $e_i$, $\tilde{C}_i$, $C_i$, and $B_i$ are the forget gate, input gate, new candidate vector, updated cell state, and output gate, respectively; $W_f$ and $b_f$ (and the analogously subscripted matrices and vectors) are the corresponding weight coefficient matrices and bias terms; tanh and $\sigma$ denote the hyperbolic tangent activation function and the sigmoid activation function [45]:
$$\tanh(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)}$$
$$\sigma(x) = \frac{1}{1 + \exp(-x)}$$
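For illustration only, the following PyTorch sketch uses the gate structure of Equations (22)–(27), as implemented internally by torch.nn.LSTM, in a small regressor for the nonlinear residual series; the layer sizes, window length, and training data are arbitrary placeholders rather than the configuration used in the study.

```python
import torch
import torch.nn as nn

class ResidualLSTM(nn.Module):
    """Small LSTM regressor for the nonlinear residual series."""
    def __init__(self, input_size=1, hidden_size=32, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):             # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)         # gate equations applied at each step
        return self.head(out[:, -1])  # predict the next residual value

# One training step on a toy batch of residual windows (placeholder shapes)
model = ResidualLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x = torch.randn(16, 10, 1)            # 16 windows of 10 past residuals each
target = torch.randn(16, 1)           # next residual for each window
loss = loss_fn(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```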

2.2.3. ARIMA-LSTM Model

In order to deal with the linear and nonlinear trends in the time series data of chemical composition content before and after weathering of the cultural relics, the unique advantages of the ARIMA model in handling linear data and the excellent performance of LSTM in handling nonlinearity were combined [46,47]. First, the artifact chemical content data were processed, and the linear prediction results and residual series were obtained with the help of the ARIMA model. Then, the nonlinear factors of the residual series were further analyzed by the LSTM model, and the nonlinear prediction results were obtained. Finally, the linear and nonlinear prediction results were superimposed to obtain the final prediction of the chemical composition content. According to the decomposition principle of the time series model, it is assumed that the time series $Y = \{y_t, t = 1, 2, \ldots, N\}$ consists of linear and nonlinear components, $y_t = x_t + b(x_t)$. Therefore, the one-dimensional chemical component data are first linearly predicted by the ARIMA model to obtain the linear component $x_t^r$ and the residual series $\delta_t = y_t - x_t^r$. Then, the residual series is subjected to further nonlinear prediction to obtain the nonlinear component $b(x_t)^r$. Finally, the linear and nonlinear components are combined to obtain the final prediction $y_t^r = x_t^r + b(x_t)^r$. Root mean square error (RMSE) [48], mean absolute percentage error (MAPE), and R2 are used to evaluate the performance of the model [49,50,51,52].
R2 is usually taken as [0,1]; the closer R2 is to 1, the better the fit is, and the equations are as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2}$$
$$\mathrm{MAPE} = \sum_{i=1}^{N}\left|\frac{x_i - y_i}{x_i}\right| \times \frac{100}{N}$$
$$R^2 = \frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2 - \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
In the above equations, $x_i$ is the observed value, $y_i$ is the predicted value, N is the sample size, $\bar{y}$ is the mean of $y_i$, and $\hat{y}_i$ is the regression fit [32].
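A minimal sketch of the combination step and the three evaluation metrics follows, assuming linear_pred is the ARIMA linear component and lstm_resid_pred is the LSTM prediction of the residual series (both names are illustrative):

```python
import numpy as np

def combine_forecasts(linear_pred, lstm_resid_pred):
    """Hybrid forecast: linear ARIMA component plus predicted nonlinear residual."""
    return np.asarray(linear_pred) + np.asarray(lstm_resid_pred)

def rmse(obs, pred):
    obs, pred = np.asarray(obs), np.asarray(pred)
    return np.sqrt(np.mean((obs - pred) ** 2))

def mape(obs, pred):
    obs, pred = np.asarray(obs), np.asarray(pred)
    return np.mean(np.abs((obs - pred) / obs)) * 100.0

def r2(obs, pred):
    obs, pred = np.asarray(obs), np.asarray(pred)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```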

2.3. Multiple Linear Regression Model

Multiple linear regression (MLR) is a statistical method that predicts the distribution of the dependent variable by using multiple independent factors [15]. The goal of the MLR model is to establish linear links between independent and dependent characteristics that influence a given event, and it is an extension of classical least squares regression because it employs multiple explanatory factors.
$$y = \alpha_0 + \alpha_1 x_1 + \cdots + \alpha_i x_i + \cdots + \alpha_n x_n + \mu_i$$
where $y$ is the dependent variable, $x_1, \ldots, x_n$ are the independent variables, $\alpha_0$ is the $y$-intercept, $\alpha_i$ is the regression coefficient of the i-th independent variable, and $\mu_i$ is the model error, also known as the residual. The coefficient of determination (R2) and the mean squared error (MSE) can be used to assess the predictive performance of the MLR model [15]:
$$\mathrm{MSE} = \frac{\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}{n}$$
$$R^2 = 1 - \frac{\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}{\sum_{j=1}^{n}\left(y_j - \bar{y}_j\right)^2}$$
$y_j$ is the j-th parameter after normalization, $\hat{y}_j$ is the j-th predicted parameter, $\bar{y}_j$ is the mean of the predicted parameters, and n is the number of samples.
We performed the BP test (Breusch–Pagan test) and the White test on the perturbation term $\mu_i$ to check for heteroskedasticity. If the perturbation term is correlated with the independent variables, the regression coefficients of the model may be inaccurate, leading to large errors in the results.
In the BP test, it is assumed that the regression model is $y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i$, and the following null hypothesis is tested:
$$H_0: E\left(\varepsilon_i^2 \mid x_2, \ldots, x_k\right) = \sigma^2$$
If $H_0$ is not true, then the conditional variance $E\left(\varepsilon_i^2 \mid x_2, \ldots, x_k\right)$ is a function of $x_2, \ldots, x_k$ and is called the conditional variance function. The BP test assumes that the conditional variance function is linear:
$$\varepsilon_i^2 = \delta_1 + \delta_2 x_{i2} + \cdots + \delta_K x_{iK} + u_i$$
The null hypothesis can then be simplified to:
$$H_0: \delta_2 = \cdots = \delta_K = 0$$
If $H_0$ is true, it can be shown that $\varepsilon_i^2$ is uncorrelated with the independent variables $x_{iK}$, i.e., the perturbation term has no heteroskedasticity. Since the perturbation term $\varepsilon_i$ is not observable, the squared residual $e_i^2$ is used in an auxiliary regression on the explanatory variables:
$$e_i^2 = \delta_1 + \delta_2 x_{i2} + \cdots + \delta_K x_{iK} + \mathrm{error}_i$$
The $nR^2$ statistic is used:
$$nR^2 \xrightarrow{d} \chi^2(K-1)$$
where $R^2$ is the $R^2$ of the auxiliary regression. The difference between the White test and the BP test is that the auxiliary regression of the White test also contains the squared and cross terms of $x_{iK}$ in Equation (37), so the BP test can be regarded as a special case of the White test.
In addition, we tested the model for multicollinearity; the variance inflation factor (VIF) was used to eliminate influence factors with multicollinearity, which improved the accuracy of the model.
Assuming there are k independent variables, the variance inflation factor of the n-th variable is $\mathrm{VIF}_n = \frac{1}{1 - R_{1 \ldots k \setminus n}^2}$, where $R_{1 \ldots k \setminus n}^2$ is the goodness of fit obtained by regressing the n-th independent variable, as the dependent variable, on the remaining k − 1 independent variables; the larger $\mathrm{VIF}_n$ is, the greater the correlation between the n-th variable and the other variables [53]. If $\mathrm{VIF}_n$ is greater than 10, there is severe multicollinearity between the variables.
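These diagnostics are available off the shelf; the sketch below reproduces the BP test, the White test, and the VIF computation with statsmodels (an assumption about tooling, since the study used Stata), applied to a placeholder design matrix:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Placeholder design: chemical-content predictors plus the dummy variable Suw
df = pd.DataFrame(np.random.rand(66, 4), columns=["K2O", "Al2O3", "PbO", "Suw"])
y = np.random.rand(66)

X = sm.add_constant(df)              # add the intercept column
fit = sm.OLS(y, X).fit()

# Breusch-Pagan and White tests: H0 = homoskedastic perturbation term
bp_stat, bp_pvalue, _, _ = het_breuschpagan(fit.resid, X)
white_stat, white_pvalue, _, _ = het_white(fit.resid, X)
print(f"BP p-value: {bp_pvalue:.3f}, White p-value: {white_pvalue:.3f}")

# Variance inflation factors; VIF > 10 flags severe multicollinearity
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)
```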

3. Material and Experiment

3.1. Data Pre-Processing

This study is based on the data related to ancient glassware provided by the official website of the 2022 China Student Mathematical Modeling Competition [14]. The glass sampling points are discussed separately for two systems: lead-barium glass and high-potassium glass. The data give the proportions of the chemical components at the sampling points of this batch of artifacts. Because the data are compositional, the percentages of all chemical components at a sampling point should sum to 100%; however, owing to limitations of the detection methods and the presence of various impurities, the actual sums often deviate from 100%. Thus, in this study, samples whose component sums fall between 85% and 105% were kept as valid data, and the severely weathered glass data were excluded to eliminate the influence of outliers on the model results. The results are shown in Table A1.
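A sketch of this validity filter is given below, assuming the raw table has been loaded into a pandas DataFrame with one column per oxide and a "Glass Sampling Points" label column as in Table A1 (flagging severe weathering via an "SWP" label prefix is an assumption about the labeling scheme):

```python
import pandas as pd

OXIDES = ["SiO2", "Na2O", "K2O", "CaO", "MgO", "Al2O3", "Fe2O3",
          "CuO", "PbO", "BaO", "P2O5", "SrO", "SnO2", "SO2"]

def filter_valid_samples(df: pd.DataFrame) -> pd.DataFrame:
    """Keep sampling points whose component sum lies in [85%, 105%]
    and drop severely weathered points (assumed to carry an 'SWP'
    prefix in the sampling-point label)."""
    total = df[OXIDES].sum(axis=1)
    valid = df[(total >= 85.0) & (total <= 105.0)]
    return valid[~valid["Glass Sampling Points"].str.contains("SWP")]
```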

3.2. Experimental Procedure

This paper combines three improved algorithms in a joint model: Daen-LR, ARIMA-LSTM, and MLR. Matlab 2021b, SPSS, and Stata were used as the experimental software environment to analyze and identify the ancient glasses.
First, in this paper, the obtained pre-processed data set is used to find the relationship between the chemical composition content and weathering at its sampling points after glass classification by building an improved logistic regression model based on a double adaptive elastic net. Then, by using the ARIMA-LSTM model, we predict the content of chemical components contained in the two glasses before weathering and obtain the correlation curves of chemical components before and after weathering. Finally, based on the results obtained above, this paper uses a multiple linear regression model to predict the type of unknown glass and judges the accuracy and efficiency of the model by testing whether there is heteroskedasticity in the perturbation terms, multicollinearity, and the degree of fit between the experimental and actual values of the model. The flow chart is shown in Figure 2.

4. Process and Result

4.1. Relationship between Glass Weathering and Its Chemical Composition Based on an Improved Logistic Regression Model with Double Adaptive Elastic Net

Based on the data in Table A1, we use Matlab and SPSS to conduct modeling and calculation of the Daen-Logistic Regression model. The dependent variable here is a dichotomous variable (i.e., weathered and unweathered states), and the content of various chemical components is set as the independent variable. The double adaptive elastic net model can determine the classification results of weathered or unweathered glass under different conditions for each independent variable, which can avoid the variability of the results when the independent variables are selected differently, make the classification more adaptive, and reduce the influence of strongly correlated variables on other variables. The calculation gives the following in Table 1:
From the table for the high-potassium glass type, it can be seen that the values of two chemical components, SiO2 and K2O, are relatively large and their significance p-values are less than p = 0.1, so these two chemical components have the greatest influence on whether the surface of high-potassium glass is weathered or not. From the table for the lead-barium glass type, it can be seen that the values of three chemical components, SiO2, PbO, and P2O5, are relatively large and their significance p-values are less than p = 0.1, so these three chemical components have the greatest influence on whether the surface of lead-barium glass is weathered or not.

4.2. Prediction of the Chemical Content of Glass before Weathering Based on the ARIMA-LSTM Model

By solving Section 4.1, we obtained the relationship between glass weathering and chemical composition, and using this relationship we screened the 14 chemical components for each of the two types, high potassium and lead-barium. For high-potassium glasses, we chose to retain the SiO2 and K2O components. For lead-barium glasses, we chose to retain three components: SiO2, PbO, and P2O5; all of them have relatively complete data and strong correlations suitable for modeling analysis.
In order to help the model identify the patterns in the data, outliers with large deviations were first eliminated. SPSS 24 software was used to detect three abnormal data values of the additive or transient-change type; the existence of such outliers would produce spurious model results and lead to wrong conclusions. Taking the SiO2 content of high-potassium glass and lead-barium glass as examples, the outliers of both are shown in Table 2.
Through analysis and calculation in Matlab and SPSS, we tested and fitted the data values of all the chemical composition contents changing with the time series and established the ARIMA-LSTM prediction curve model. We found that the ARIMA(2,1,0)-LSTM parameters give the maximum likelihood value of the model, and the normalized BIC [54] values of 3.160 and 4.160 for the SiO2 content in high-potassium glass and lead-barium glass, for example, are the smallest among the candidate parameters. In addition, the stationary R2 values of the model are 0.960 and 0.934, both close to 1, and both p-values are 0.000, less than 0.05, so the results of the model can be considered significant and reasonable and to fit the prediction model well (Table 3).
After the initial completion of the estimated time series model based on the chemical composition content, a white noise test of the residuals is required. If the residuals are white noise, then it can indicate that the selected model can identify the laws of the time series data, that is, the model is acceptable; if the residuals are not white noise, then it means that there is still some information not identified; at this time, the model parameters need to be revised to continue to identify this part of the information. The study used Ljung and Box’s Q test to determine whether the residuals are white noise [55,56]:
Assuming the residual $\epsilon_t$ is a white-noise sequence, then $\rho_s = \begin{cases} 1, & s = 0 \\ 0, & s \neq 0 \end{cases}$; the sample autocorrelation coefficient is:
$$r_s = \hat{\rho}_s = \frac{\sum_{t=s+1}^{T}\left(x_t - \bar{x}\right)\left(x_{t-s} - \bar{x}\right)}{\sum_{t=1}^{T}\left(x_t - \bar{x}\right)^2}$$
The hypotheses are $H_0: \rho_1 = \rho_2 = \cdots = \rho_s = 0$ versus $H_1$: at least one $\rho_i\ (i = 1, 2, \ldots, s)$ is not 0. If $H_0$ holds, the statistic $Q = T(T+2)\sum_{k=1}^{s}\frac{r_k^2}{T - k} \sim \chi^2(s - n)$, from which the p-value can be calculated; if the p-value is less than 0.05, the null hypothesis is rejected, indicating that the model has not been fully identified and the model parameters need to be modified.
Through the model statistics, the p-values of the Ljung and Box’s Q test for SiO2 content of high potassium glass and lead-barium glass are 0.889 and 0.744, respectively, both of which are greater than 0.05, i.e., we cannot reject the original hypothesis, and we can assume that the residuals are white noise sequences and the model can be fully identified. Figure 3 shows that the autocorrelation coefficients and partial autocorrelation coefficients of all lag orders are not significantly different from 0 [57,58].
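The same residual white-noise check can be reproduced with the Ljung-Box test in statsmodels, as in the hedged sketch below (residuals stands in for the fitted model's residual series; the lag choices are illustrative):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

residuals = np.random.normal(size=60)   # placeholder for the fitted model's residuals

# Ljung-Box Q test; H0: the residuals are a white-noise sequence.
# p-values above 0.05 (0.889 and 0.744 were reported for SiO2) mean H0 is retained.
lb = acorr_ljungbox(residuals, lags=[6, 12], return_df=True)
print(lb[["lb_stat", "lb_pvalue"]])
```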
By the same method, the fitting coefficients of all the mathematical models for the measured chemical composition contents were obtained. In the high-potassium glass category, the R2 values of SiO2 and K2O were 0.960 and 0.969, respectively. In the lead-barium glass category, the R2 values of SiO2, P2O5, and PbO were 0.934, 0.951, and 0.948, respectively. Finally, the corresponding prediction model curves were drawn, from which the correlation of chemical composition content before and after weathering can be clearly seen, as shown in Figure 4. The blue curve represents the actual value of the chemical content changing with time after weathering, while the yellow curve represents the fitted value; the agreement between the two reflects the quality of the model's performance. The red curve represents the predicted value of the component content over time before weathering. It can be seen that the ARIMA(2,1,0)-LSTM model captures the correlation of chemical composition contents before and after weathering, reduces the interference of the "weathering" factor in glass classification, and improves the accuracy of the subsequent classification.

4.3. Identifying Unknown Artifact Types Based on Multiple Linear Regression Model

Based on the results of Section 4.1 and Section 4.2, we conducted chemical content testing and analysis on a batch of newly excavated glass artifacts, as shown in Table 4, and judged the categories to which they belong from the correlation of the related elements and weathering effects. Firstly, the chemical elements with significant correlation in each category were screened out according to the statistical regularities of chemical element content, and a multiple linear regression equation between elements and categories was established to obtain the experimental (fitted) category values, compare them with the actual category values in terms of error and fit, and verify the accuracy of the model.
According to the classification rules of chemical content and surface weathering, it can be initially concluded that K2O, CaO, MgO, Al2O3, FeO, PbO, BaO, and P2O5 have strong correlations with surface weathering and category, while the remaining elements have weak correlations, so the remaining elements can be deleted. In addition, because the chemical element contents of the three heavily weathered glass artifacts differ greatly from the others, which would have a large impact on the analysis of the model, they are treated as outliers. In the multiple linear regression equation, qualitative data should be coded as dummy variables, so the qualitative levels (unweathered and weathered) of the surface weathering independent variable $S_{uw}$ can be set as quantitative values (0 and 1), and the qualitative levels (high-potassium and lead-barium) of the category dependent variable $y_i$ can be set as quantitative values (A and B); the following multiple linear regression equation can then be established:
$$y_i = \alpha_0 + \alpha_1 x_{\mathrm{Si},i} + \alpha_2 x_{\mathrm{K},i} + \alpha_3 x_{\mathrm{Ca},i} + \alpha_4 x_{\mathrm{Mg},i} + \alpha_5 x_{\mathrm{Al},i} + \alpha_6 x_{\mathrm{Fe},i} + \alpha_7 x_{\mathrm{Pb},i} + \alpha_8 x_{\mathrm{Ba},i} + \alpha_9 x_{\mathrm{P},i} + \beta S_{uw,i} + \mu_i$$
where $S_{uw,i} = 1$ denotes that the i-th sample is weathered and $S_{uw,i} = 0$ denotes that the i-th sample is unweathered, so that
$$E\left(y \mid S_{uw} = 1\ \text{and other independent variables}\right) = \beta \times 1 + m\ (\text{constants})$$
$$E\left(y \mid S_{uw} = 0\ \text{and other independent variables}\right) = \beta \times 0 + m\ (\text{constants})$$
The joint significance test indicators for the F-statistic [59,60] for the above model results are as follows:
F(10,55) denotes the joint F-statistic, whose test value is 51.41 at the 95% confidence level; the null hypothesis $H_0$ is $\alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_9 = \beta = 0$. From Table 5, we can see that the p-value is 0, which is less than 0.05, so the null hypothesis is rejected. We therefore have reason to believe that the coefficients are significantly different from 0, and the model can be considered useful. The regression coefficients and corresponding p-values for the variables of interest are given in Table 6. Only when the p-value is less than 0.05 do we consider a coefficient significant and credible, so we use the regression coefficients corresponding to K2O, Al2O3, PbO, BaO, and Suw (the dummy variable "weathering"); the larger the absolute value of a regression coefficient, the greater its effect on the dependent variable.
We can derive the multiple linear regression equation for glass artifact class, chemical element content, and weathering type as follows:
$$\hat{y}_i = 0.7458 + 0.0480\,x_{\mathrm{K}} - 0.0299\,x_{\mathrm{Al}} - 0.0165\,x_{\mathrm{Pb}} - 0.0161\,x_{\mathrm{Ba}} + 0.3517\,S_{uw}$$
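Applied to a new sampling point, the fitted equation yields a score that can be thresholded, here at 0.5 for illustration, to assign a category (a sketch; the composition values in the example are placeholders):

```python
def classify_glass(k2o, al2o3, pbo, bao, weathered):
    """Evaluate the fitted regression equation; scores near 1 indicate
    high-potassium glass and scores near 0 indicate lead-barium glass."""
    suw = 1.0 if weathered else 0.0
    score = (0.7458 + 0.0480 * k2o - 0.0299 * al2o3
             - 0.0165 * pbo - 0.0161 * bao + 0.3517 * suw)
    return ("high-potassium" if score >= 0.5 else "lead-barium"), score

# Placeholder composition (wt%) for an unweathered fragment
print(classify_glass(k2o=9.9, al2o3=3.9, pbo=0.0, bao=0.0, weathered=False))
```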

4.3.1. Testing for the Presence of Heteroskedasticity in the Perturbation Term

The perturbation term $\mu_i$ is unobservable and must satisfy certain conditions. Our model assumes a spherical perturbation term, which generally has to satisfy "no autocorrelation" and "homoskedasticity": if the perturbation term is correlated with the independent variables, i.e., endogenous, the estimated regression coefficients will be inaccurate; if there is heteroskedasticity, the hypothesis test statistics we constructed become invalid, and the OLS estimator can no longer be treated as the best linear unbiased estimator [61]. Therefore, we performed the BP test and the White test on the perturbation term to check for heteroskedasticity, as shown in Table 7 [62,63,64].
Both tests were carried out with the null hypothesis $H_0$ that there is no heteroskedasticity in the perturbation term. The p-values are greater than 0.05, so $H_0$ cannot be rejected, and we can assume that there is no heteroskedasticity in the perturbation term.

4.3.2. Testing for Multicollinearity

If the data matrix X is not of full column rank, i.e., one variable can be expressed as a linear combination of the other explanatory variables, then there is "strict multicollinearity". Stata was used to calculate the VIF of each variable, and the test results are shown in Table 8.
It is generally believed that when VIF > 10 the regression equation has severe multicollinearity. SiO2 and PbO both exceed 10, but the p-value of SiO2 is higher than 0.05, i.e., not significant, so its coefficient is not included in the equation model. For PbO, although its VIF exceeds 10, its p-value is lower than 0.05, so the coefficient is still significant despite the variance inflation; without multicollinearity, the regression coefficients would be even more significant.

4.3.3. Testing the Fit of the Experimental and Actual Values of the Model and Identifying the Unknown Artifact Types

Since the dependent variable is the category of the glass artifact and there are only two categories, high-potassium and lead-barium, it can be treated as a 1-0 variable. If the experimental value is close to 1, the sample is considered high-potassium; if the experimental value is close to 0, it is considered lead-barium. From Figure 5 and Table 9, it can be seen that the 66 samples fit very well, with almost no spurious results, and the identification results for the 8 unknown cultural relics are completely consistent with reality, so the predicted values of this multiple linear regression equation can be considered quite accurate.

4.4. Comparison of Different Models

To demonstrate the superiority of the proposed method, we compare the proposed joint algorithm with similar decision and classification algorithms such as Decision Trees (DT), Random Forests (RF) [65], Support Vector Machines (SVM), and Random Forests based on classification and regression trees (CART-RF) [66,67,68,69]. We did not use any pre-trained models but trained each model from scratch. When selecting the parameters of the traditional machine learning algorithms, we took into account the number of data features and avoided overfitting, as shown in Table 10. We then performed experimental simulations of these comparison models, as well as the model proposed in this paper, using Matlab. The results are presented in Table 11. For the classification results, this study uses common evaluation indicators to judge the superiority of the model: Train Acc, Test Acc, Precision, Recall, and F1 Score. TP, TN, FP, and FN are required to define these indicators, so the confusion matrix is introduced, as shown in Figure 6. The specific definitions are as follows:
  • TP (True Positive): The true value of the data is high potassium, and the predicted value is also high potassium.
  • TN (True Negative): The true value of the data is lead barium, and the predicted value is also lead barium.
  • FP (False Positive): The true value of the data is lead barium, but it is incorrectly predicted as high potassium.
  • FN (False Negative): The true value of the data is high potassium, but it is incorrectly predicted as lead barium.
Accuracy is the simplest and most clear index for evaluating classification models, but it is a good measurement standard only when the proportion of samples in each category of the data set is fairly balanced, as shown in Equation (41):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
Precision represents the proportion of predicted positive samples that are actually positive, as shown in Equation (42):
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall represents the proportion of actual positive samples that are correctly predicted as positive, as shown in Equation (43):
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The F1 Score is the harmonic mean of precision and recall, synthesizing both. The value of the F1 Score reflects the robustness of the model: the higher F1 is, the more stable the model can be considered. As shown in Equation (44):
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
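Computed from the confusion-matrix counts, these indicators reduce to a few lines, as in the sketch below (the counts in the example are placeholders, not the study's results):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts
    (Equations (41)-(44))."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Placeholder counts for a binary high-potassium vs. lead-barium test set
print(classification_metrics(tp=17, tn=45, fp=1, fn=1))
```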

5. Discussion

From the results of the comparison experiments, it is clear that the joint algorithm proposed in this paper shows notable advantages in all performance indicators. As shown in Table 11, for Train Acc, Test Acc, Precision, Recall, and F1 Score, the margins by which JMLA outperforms the previously best algorithm model are +0.017, +0.025, +0.046, +0.035, and +0.040, respectively. We improve common machine learning algorithms and combine them with a deep learning model to make the classification results more accurate, which provides a new idea for the study of the classification of ancient cultural relics.
Considering the accuracy of the JMLA algorithm in classification results and excellent evaluation indexes, this study believes that the model proposed in this paper is suitable for providing more in-depth research ideas for the classification of ancient cultural relics. The algorithm ideas in this paper can also be applied to other related fields, such as the data analysis of nutrient elements in food, the influence of air oxidation degree on nutrient elements, the classification of water pollution degree, etc. However, there is still room for improvement in the joint algorithm to address its high computational complexity and formula complexity. Compared with the existing algorithm, the calculation cost of JMLA is higher, and the formula is more complex. Further reducing algorithm complexity and better unifying the above three algorithms will be the focus of future research.

6. Conclusions

In this paper, we propose a joint Daen-LR, ARIMA-LSTM, and MLR machine learning algorithm (JMLA). Firstly, we combine a double adaptive elastic net with a traditional logistic model to select variables that have both the oracle property and the adaptive classification property. These two characteristics eliminate the inconsistency in selecting important independent variables and the interference of strongly correlated independent variables with weakly correlated ones. Secondly, we combine the deep learning model (LSTM) with the ARIMA time series model so that it can handle both linear and nonlinear trends. By computing the ARIMA-LSTM model, we establish the correlation curve of chemical composition before and after weathering and predict the change in chemical composition with weathering. Thirdly, we feed the data processed by the above two improved methods into the multiple linear regression model to classify the unknown glass relics.
The experimental results show that the accuracy of the JMLA model on the train set is 97.9%, and the accuracy of the JMLA model on the test set is 97.6%. In addition, we compared JMLA with similar classifiers, and the results were shown in Train Acc, Test Acc, Precision, Recall, and F1 Score indexes. The difference values of JMLA’s performance over the past optimal algorithm model are +0.017, +0.025, +0.046, +0.035, and +0.040, respectively. These data show that the JMLA model has better performance than other classification models without changing the structure of similar classification models and under the same experimental conditions. The classification accuracy of the JMLA model is higher than other models, especially for large glass relics with more chemical elements and a harsh environment.
This processing method is practical and reliable in the direction related to the composition analysis and identification of cultural heritage. The application of this method is expected to improve the accuracy of the classification of cultural relics by archaeologists and can effectively reduce the impact of identification difficulties caused by factors such as harsh burial environments. It helps us to have a deeper understanding of the exchange, penetration, and development of ancient Eastern and Western cultures.
In addition, the future research directions of this study can be summarized as follows:
  • Algorithm optimization. The processing method uses a variety of machine learning algorithms that effectively combine the advantages of each algorithm with high practicality and feasibility and a good fitting effect. However, this model is only combined with an LSTM deep learning neural network, which can be combined with more advanced deep learning models in the future so as to improve the accuracy and efficiency of classification.
  • Reduce model calculation costs and formula complexity. Although the classification accuracy of the JMLA model is very high, the calculation time is relatively long compared with other models, and the formula is relatively complex, which is also a pain point for the JMLA model. Therefore, reducing the calculation amount and better integrating the three models will be the focus of future research.
  • Application prospects of this data processing method. Beyond its application to heritage composition analysis and identification, it is expected to be applied in the areas of health, food safety, and environmental protection, for example: analysis and classification of the chemical constituents of tobacco; analysis of the nutritional composition of food; classification and monitoring of pollutant composition in air, etc.

Author Contributions

Conceptualization, Z.-X.L. and P.-S.L.; data curation, P.-S.L.; formal analysis, Z.-X.L.; investigation, J.-H.L.; methodology, Z.-X.L.; project administration, Z.-H.Y., Y.-P.M. and H.-H.W.; resources, G.-Y.W.; software, P.-S.L.; supervision, G.-Y.W. and H.-H.W.; validation, J.-H.L., Z.-H.Y. and Y.-P.M.; visualization, J.-H.L.; writing—original draft, Z.-X.L. and P.-S.L.; writing—review and editing, G.-Y.W., J.-H.L. and Z.-H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62203332/22008050) and the Natural Science Foundation of Hebei Province (B2022202008).

Institutional Review Board Statement

This study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are provided in the 2022 China Undergraduate Mathematical Contest in Modeling. The authors do not have permission to share the data.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The results after glass data pre-processing.
Glass Sampling Points | SiO2 | Na2O | K2O | CaO | MgO | Al2O3 | Fe2O3 | CuO | PbO | BaO | P2O5 | SrO | SnO2 | SO2
HPNP0169.3309.996.320.873.931.743.87001.17000.39
HPNP03(1)87.0505.192.0104.0600.780.2500.66000
HPNP03(2)61.71012.375.871.115.52.165.091.412.860.70.100
HPNP0465.8809.677.121.566.442.062.18000.79000.36
HPNP0561.58010.957.351.777.52.623.27000.940.0600.47
HPNP06(1)67.6507.3701.9811.152.392.510.21.384.180.1100
HPNP06(2)59.8107.685.411.7310.056.042.180.350.974.50.1200
HPWP0792.63001.0701.980.173.24000.61000
HPWP0995.0200.590.6201.320.321.55000.35000
HPWP1096.7700.920.2100.810.260.84000000
HPWP1294.2901.010.7201.460.291.65000.15000
HPNP1359.012.8612.538.706.162.884.73001.27000
HPNP1462.473.3812.288.230.669.230.50.471.6200.16000
HPNP1561.873.217.4401.023.151.041.290.1900.26000
HPNP1665.182.114.528.270.526.180.421.070.11000.0400
HPNP1760.712.125.7100.8501.041.090.1900.18000
HPNP1879.4609.4201.533.0500001.360.072.360
HPNP2176.68004.711.226.192.373.2811.971.1000
HPWP2792.72000.940.542.510.21.54000.36000
HPWP2292.3500.741.660.643.50.350.55000.21000
LBNP2037.3600.71005.451.514.789.323.555.75000
LBNP2353.797.9200.50.711.4202.9916.9811.8600.3300
LBNP2431.94000.4701.5908.4629.1426.230.140.9100
LBNP2550.612.3100.6301.91.551.1231.96.650.190.200
LBWP2619.79001.4400.7010.5729.5332.253.130.4501.96
LBWP0820.14001.4801.34010.4128.6831.233.590.3702.58
LBWP1929.64002.930.593.571.333.5142.825.358.830.1900
LBWP1133.5900.213.510.712.6904.9325.3914.619.380.3700
LBWP0236.2801.052.341.185.731.860.2647.4303.570.1900
LBNP2868.0800.261.3414.70.410.3317.144.041.040.120.230
LBNP2963.30.920.32.981.4914.340.810.7412.312.030.410.2500
LBNP30(1)34.3401.414.490.984.352.12039.2210.2900.350.40
LBNP30(2)36.93004.240.513.862.74037.7410.351.410.480.440
LBNP3165.91001.60.893.114.590.4416.553.421.620.300
LBNP3269.7100.210.4602.3610.1119.764.880.17000
LBNP3375.5100.150.6412.3500.4716.163.550.13000
LBWP3435.7800.250.7801.620.471.5146.55100.340.2200
LBNP3565.91000.3801.440.170.1622.055.680.42000
LBWP3639.572.220.140.3701.60.320.6841.6110.830.070.2200
LBNP3760.1200.230.8902.7203.0117.2410.341.460.3103.66
LBWP3832.931.3800.6802.570.290.7349.319.790.480.4100
LBWP3926.25001.1100.500.8861.037.221.160.6100
LBWP4016.71001.8700.450.19070.216.691.770.6800
LBWP4118.4600.444.962.733.331.790.1944.129.767.460.4700
LBNP42(1)51.265.740.150.791.093.5302.6721.8810.470.080.3500
LBNP42(2)51.335.680.3501.165.6602.7220.1210.880000
LBWP43(1)12.41005.240.892.250.765.3559.857.2900.6400
LBWP43(2)21.7006.40.953.411.391.5144.753.2612.830.4700
LBNP4460.743.060.22.14012.690.770.4313.615.2200.2600
LBNP4561.282.660.110.840.74500.5315.9910.9600.2300
LBNP4655.2100.2501.674.7900.7725.2510.060.20.4300
LBNP4751.544.660.290.870.613.0600.6525.49.230.10.8500
LBWP4853.330.80.322.821.5413.651.03015.717.311.10.251.310
LBNP4928.79004.581.475.382.740.734.186.111.10.4600
LBWP4954.6100.32.081.26.51.270.4523.024.194.320.300
LBWP5017.98003.190.471.870.331.134414.26.340.6600
LBNP5045.02003.120.544.1600.730.616.226.340.2300
LBWP51(1)24.61003.581.195.251.191.3740.248.948.10.390.470
LBWP51(2)21.35005.131.452.510.420.7551.3408.75000
LBWP5225.741.2202.270.551.160.230.747.428.645.710.4400
LBNP5363.663.040.110.781.146.0600.5413.668.9900.2700
LBWP5422.2800.323.191.284.1500.8355.467.044.240.8800
LBNP5549.012.7101.1301.4500.8632.927.950.35000
LBWP5629.15001.2101.8500.7941.2515.452.54000
LBWP5725.42001.3102.1801.1645.117.30000
LBWP5830.3900.343.490.793.520.863.1339.357.668.990.2400
Note: HPWP01(1) denotes part 1 of weathered sampling point 01 of high-potassium glass, HPNP01(1) denotes part 1 of unweathered sampling point 01 of high-potassium glass, LBWP01(1) denotes part 1 of weathered sampling point 01 of lead-barium glass, LBNP01(1) denotes part 1 of unweathered sampling point 01 of lead-barium glass, and LBSWP01(1) denotes part 1 of severely weathered sampling point 01 of lead-barium glass.

References

  1. Bzdok, D.; Altman, N.; Krzywinski, M. Points of Significance Statistics versus machine learning. Nat. Methods 2018, 15, 232–233. [Google Scholar] [CrossRef]
  2. Guo, Y.; Zhan, W.; Li, W. Application of Support Vector Machine Algorithm Incorporating Slime Mould Algorithm Strategy in Ancient Glass Classification. Appl. Sci. 2023, 13, 3718. [Google Scholar] [CrossRef]
  3. Li, F.; Li, Q.; Gan, F.; Zhang, B.; Cheng, H. Chemical Composition Analysis for Some Ancient Chinese Glasses by Proton Induced X-ray Emission Technique. J. Chin. Ceram. Soc. 2005, 33, 581–586. [Google Scholar]
  4. Chul, L.; Myungzoon, C.; Seungwon, K.; Kang, H.T.; Du, L.J. Classification of Korean Ancient Glass Pieces by Pattern Recognition Method. J. Korean Chem. Soc. 1992, 36, 113–124. [Google Scholar]
  5. El-Taher, A. Elemental content of feldspar from Eastern Desert, Egypt, determined by INAA and XRF. Appl. Radiat. Isot. 2010, 68, 1185–1188. [Google Scholar] [CrossRef]
  6. Won-in, K.; Thongkam, Y.; Pongkrapan, S.; Intarasiri, S.; Thongleurm, C.; Kamwanna, T.; Leelawathanasuk, T.; Dararutana, P. Raman spectroscopic study on archaeological glasses in Thailand: Ancient Thai Glass. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2011, 83, 231–235. [Google Scholar] [CrossRef] [PubMed]
  7. Schibille, N.; Gratuze, B.; Ollivier, E.; Blondeau, E. Chronology of early Islamic glass compositions from Egypt. J. Archaeol. Sci. 2019, 104, 10–18. [Google Scholar] [CrossRef]
  8. Fiorucci, M.; Khoroshiltseva, M.; Pontil, M.; Traviglia, A.; Del Bue, A.; James, S. Machine Learning for Cultural Heritage: A Survey. Pattern Recognit. Lett. 2020, 133, 102–108. [Google Scholar] [CrossRef]
  9. Wei, J.; Chu, X.; Sun, X.Y.; Xu, K.; Deng, H.X.; Chen, J.G.; Wei, Z.M.; Lei, M. Machine learning in materials science. Infomat 2019, 1, 338–358. [Google Scholar] [CrossRef]
  10. Zhou, Y.; Hu, Y.; Tao, Y.; Sun, J.; Cui, Y.; Wang, K.; Hu, D. Study on the microstructure of the multilayer glaze of the 16th–17th century export blue-and-white porcelain excavated from Nan’ao-I Shipwreck. Ceram. Int. 2016, 42, 17456–17465. [Google Scholar] [CrossRef]
  11. Han, M.S. Characteristic Analysis of Chemical Compositions for Ancient Glasses Excavated from the Sarira Hole of Mireuksaji Stone Pagoda, Iksan. J. Conserv. Sci. 2017, 33, 215–223. [Google Scholar] [CrossRef]
  12. Lin, Y.; Liu, T.; Toumazou, M.K.; Counts, D.B.; Kakoulli, I. Chemical analyses and production technology of archaeological glass from Athienou-Malloura, Cyprus. J. Archaeol. Sci. Rep. 2019, 23, 700–713. [Google Scholar] [CrossRef]
  13. Oikonomou, A.; Triantafyllidis, P. An archaeometric study of Archaic glass from Rhodes, Greece: Technological and provenance issues. J. Archaeol. Sci. Rep. 2018, 22, 493–505. [Google Scholar] [CrossRef]
  14. The Official Website of 2022 China Undergraduate Mathematical Contest in Modeling. Available online: http://www.mcm.edu.cn/html_cn/node/5267fe3e6a512bec793d71f2b2061497.html (accessed on 14 May 2023).
  15. Kouadri, S.; Pande, C.B.; Panneerselvam, B.; Moharir, K.N.; Elbeltagi, A. Prediction of irrigation groundwater quality parameters using ANN, LSTM, and MLR models. Environ. Sci. Pollut. Res. 2022, 29, 21067–21097. [Google Scholar] [CrossRef]
  16. Gomah, M.E.; Li, G.; Khan, N.M.; Sun, C.; Xu, J.; Omar, A.A.; Mousa, B.G.; Abdelhamid, M.M.A.; Zaki, M.M. Prediction of Strength Parameters of Thermally Treated Egyptian Granodiorite Using Multivariate Statistics and Machine Learning Techniques. Mathematics 2022, 10, 4523. [Google Scholar] [CrossRef]
  17. Leonardi, B.; Ajjarapu, V. Development of multilinear regression models for online voltage stability margin estimation. IEEE Trans. Power Syst. 2010, 26, 374–383. [Google Scholar] [CrossRef]
  18. Tihonov, A.N. Solution of incorrectly formulated problems and the regularization method. Sov. Math. Dokl. 1963, 5, 1035–1038. [Google Scholar]
  19. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320. [Google Scholar]
  20. Ghosh, S. On the grouped selection and model complexity of the adaptive elastic net. Stat. Comput. 2011, 21, 451–462. [Google Scholar] [CrossRef]
  21. Li, J.T.; Jia, Y.M.; Zhao, Z.H. Partly adaptive elastic net and its application to microarray classification. Neural Comput. Appl. 2013, 22, 1193–1200. [Google Scholar] [CrossRef]
  22. Algamal, Z.Y.; Lee, M.H. Applying penalized binary logistic regression with correlation based elastic net for variables selection. J. Mod. Appl. Stat. Methods 2015, 14, 15. [Google Scholar] [CrossRef]
  23. Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar]
  24. Van, D.; Lien, T.G.; Verlaat, W.; Wieringen, W.V.; Wilting, S.M. Better prediction by use of co-data: Adaptive group-regularized ridge regression. Stat. Med. 2016, 35, 368–381. [Google Scholar]
  25. Zhang, F. Combination Model of Enterprise Credit Evaluation Based on XGBoost and Logistic Regression and Its Application. Master’s Thesis, Hebei University of Engineering, Handan, China, 2021. (In Chinese) [Google Scholar] [CrossRef]
  26. Jiang, S.; Dai, J. An Improved Elastic Net Estimate for Logistic Regression Models. Math. Theory Appl. 2022, 42, 108–119. (In Chinese) [Google Scholar]
  27. Anbari, M.E.; Mkhadri, A. Penalized regression combining the L 1 norm and a correlation based penalty. Sankhya B 2014, 76, 82–102. [Google Scholar] [CrossRef]
  28. Wang, Q.; Li, S.; Li, R.; Ma, M. Forecasting US shale gas monthly production using a hybrid ARIMA and metabolic nonlinear grey model. Energy 2018, 160, 378–387. [Google Scholar] [CrossRef]
  29. Singh, S.; Mohapatra, A. Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renew. Energy 2019, 136, 758–768. [Google Scholar]
  30. Wang, C.-C.; Chien, C.-H.; Trappey, A.J. On the application of ARIMA and LSTM to predict order demand based on short lead time and on-time delivery requirements. Processes 2021, 9, 1157. [Google Scholar] [CrossRef]
  31. Fan, D.; Sun, H.; Yao, J.; Zhang, K.; Yan, X.; Sun, Z. Well production forecasting based on ARIMA-LSTM model considering manual operations. Energy 2021, 220, 119708. [Google Scholar] [CrossRef]
  32. Li, C.; Fang, X.; Yan, Z.; Huang, Y.; Liang, M. Research on Gas Concentration Prediction Based on the ARIMA-LSTM Combination Model. Processes 2023, 11, 174. [Google Scholar] [CrossRef]
  33. Jiang, S.; Yang, C.; Guo, J.; Ding, Z. ARIMA forecasting of China’s coal consumption, price and investment by 2030. Energy Sources Part B Econ. Plan. Policy 2018, 13, 190–195. [Google Scholar] [CrossRef]
  34. Ediger, V.Ş.; Akar, S. ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy 2007, 35, 1701–1708. [Google Scholar] [CrossRef]
  35. Dey, B.; Roy, B.; Datta, S.; Ustun, T.S. Forecasting ethanol demand in India to meet future blending targets: A comparison of ARIMA and various regression models. Energy Rep. 2023, 9, 411–418. [Google Scholar] [CrossRef]
  36. De Gooijer, J.G. Partial sums of lagged cross-products of AR residuals and a test for white noise. Test 2008, 17, 567–584. [Google Scholar] [CrossRef]
  37. Wang, Y.; Liu, Q. Comparison of Akaike information criterion (AIC) and Bayesian information criterion (BIC) in selection of stock–recruitment relationships. Fish. Res. 2006, 77, 220–225. [Google Scholar] [CrossRef]
  38. Man, K. Maximum likelihood estimation for a nearly random walk model. Commun. Stat. Theory Methods 2000, 29, 677–697. [Google Scholar] [CrossRef]
  39. Qureshi, S.A.; Hsiao, W.W.-W.; Hussain, L.; Aman, H.; Le, T.-N.; Rafique, M. Recent Development of Fluorescent Nanodiamonds for Optical Biosensing and Disease Diagnosis. Biosensors 2022, 12, 1181. [Google Scholar] [CrossRef]
  40. Zheng, C.; Deng, J.; Hong, Z.; Wang, G. Prediction model of suspension density in the dense medium separation system based on LSTM. Processes 2020, 8, 976. [Google Scholar] [CrossRef]
  41. Lyu, P.; Chen, N.; Mao, S.; Li, M. LSTM based encoder-decoder for short-term predictions of gas concentration using multi-sensor fusion. Process Saf. Environ. Prot. 2020, 137, 93–105. [Google Scholar] [CrossRef]
  42. Al-Hajj, R.; Assi, A.; Fouad, M. Short-term prediction of global solar radiation energy using weather data and machine learning ensembles: A comparative study. J. Sol. Energy Eng. 2021, 143, 051003. [Google Scholar] [CrossRef]
  43. Zhu, X.; Li, L.; Liu, J.; Li, Z.; Peng, H.; Niu, X. Image captioning with triple-attention and stack parallel LSTM. Neurocomputing 2018, 319, 55–65. [Google Scholar] [CrossRef]
  44. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  45. Olah, C. Understanding LSTM Networks. 2015. Available online: https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed on 24 March 2023).
  46. Wu, X.; Zhou, J.; Yu, H.; Liu, D.; Xie, K.; Chen, Y.; Hu, J.; Sun, H.; Xing, F. The development of a hybrid wavelet-ARIMA-LSTM model for precipitation amounts and drought analysis. Atmosphere 2021, 12, 74. [Google Scholar] [CrossRef]
  47. Xu, D.; Zhang, Q.; Ding, Y.; Zhang, D. Application of a hybrid ARIMA-LSTM model based on the SPEI for drought forecasting. Environ. Sci. Pollut. Res. 2022, 29, 4128–4144. [Google Scholar] [CrossRef] [PubMed]
  48. Wei, X.; Shahani, N.M.; Zheng, X. Predictive Modeling of the Uniaxial Compressive Strength of Rocks Using an Artificial Neural Network Approach. Mathematics 2023, 11, 1650. [Google Scholar] [CrossRef]
  49. Xu, P. Prediction of Per Capita Ecological Carrying Capacity Based on ARIMA-LSTM in Tourism Ecological Footprint Big Data. Sci. Program. 2022, 2022, 6012998. [Google Scholar] [CrossRef]
  50. Manowska, A.; Rybak, A.; Dylong, A.; Pielot, J. Forecasting of Natural Gas Consumption in Poland Based on ARIMA-LSTM Hybrid Model. Energies 2021, 14, 8597. [Google Scholar] [CrossRef]
  51. Huang, Y.; Fan, J.; Yan, Z.; Li, S.; Wang, Y. Research on early warning for gas risks at a working face based on association rule mining. Energies 2021, 14, 6889. [Google Scholar] [CrossRef]
  52. Bukhari, A.H.; Raja, M.A.Z.; Sulaiman, M.; Islam, S.; Shoaib, M.; Kumam, P. Fractional neuro-sequential ARFIMA-LSTM for financial market forecasting. IEEE Access 2020, 8, 71326–71338. [Google Scholar] [CrossRef]
  53. Salmerón, R.; García, C.; García, J. Variance inflation factor and condition number in multiple linear regression. J. Stat. Comput. Simul. 2018, 88, 2365–2384. [Google Scholar] [CrossRef]
  54. Burnham, K.P.; Anderson, D.R. Multimodel inference: Understanding AIC and BIC in model selection. Sociol. Methods Res. 2004, 33, 261–304. [Google Scholar] [CrossRef]
  55. Hassani, H.; Yeganegi, M.R. Sum of squared ACF and the Ljung–Box statistics. Phys. A Stat. Mech. Appl. 2019, 520, 81–86. [Google Scholar] [CrossRef]
  56. Lee, T. Wild bootstrap Ljung–Box test for cross correlations of multivariate time series. Econ. Lett. 2016, 147, 59–62. [Google Scholar] [CrossRef]
  57. Hollas, B. An analysis of the autocorrelation descriptor for molecules. J. Math. Chem. 2003, 33, 91–101. [Google Scholar] [CrossRef]
  58. Angel, E.; Zissimopoulos, V. Autocorrelation coefficient for the graph bipartitioning problem. Theor. Comput. Sci. 1998, 191, 229–243. [Google Scholar] [CrossRef]
  59. Goudet, J. FSTAT (version 1.2): A computer program to calculate F-statistics. J. Hered. 1995, 86, 485–486. [Google Scholar] [CrossRef]
  60. Weir, B.S.; Hill, W.G. Estimating F-statistics. Annu. Rev. Genet. 2002, 36, 721–750. [Google Scholar] [CrossRef]
  61. Weaver, B.; Wuensch, K.L. SPSS and SAS programs for comparing Pearson correlations and OLS regression coefficients. Behav. Res. Methods 2013, 45, 880–895. [Google Scholar] [CrossRef]
  62. Halunga, A.G.; Orme, C.D.; Yamagata, T. A heteroskedasticity robust Breusch–Pagan test for Contemporaneous correlation in dynamic panel data models. J. Econom. 2017, 198, 209–230. [Google Scholar] [CrossRef]
  63. Jeong, J.; Lee, K. Bootstrapped White’s test for heteroskedasticity in regression models. Econ. Lett. 1999, 63, 261–267. [Google Scholar] [CrossRef]
  64. Baum, C.; Cox, N. WHITETST: Stata Module to Perform White’s Test for Heteroskedasticity. 2002. Available online: https://econpapers.repec.org/software/bocbocode/s390601.htm (accessed on 25 March 2023).
  65. Koklu, M.; Taspinar, Y.S. Determining the Extinguishing Status of Fuel Flames With Sound Wave by Machine Learning Methods. IEEE Access 2021, 9, 86207–86216. [Google Scholar] [CrossRef]
  66. Su, C.; Wang, J. Research on composition analysis and type identification of ancient glass products based on data mining. Autom. Mach. Learn. 2022, 3, 63–71. [Google Scholar] [CrossRef]
  67. Sun, C.; Li, Z. Analysis and Identification of the Composition of Ancient Glass Objects based on Statistical Research and Machine Learning Algorithms. Highlights Sci. Eng. Technol. 2023, 39, 1412–1418. [Google Scholar] [CrossRef]
  68. Pu, Q.; Jiang, L.; Liu, Z.; Wang, X.; Liu, Z. Research on Classification of Ancient Glass Products Based on Machine Learning. In Proceedings of the 2022 International Conference on Information Technology, Communication Ecosystem and Management (ITCEM), Bangkok, Thailand, 19–21 December 2022. [Google Scholar]
  69. Bai, D. Comparative study on chemical composition of ancient glass based on machine learning and deep learning. Highlights Sci. Eng. Technol. 2022, 22, 234–240. [Google Scholar] [CrossRef]
Figure 1. Basic structure diagram of the LSTM model.
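
Figure 1 sketches the gating structure of a single LSTM cell. As a generic illustration only (the authors' network size, framework, and training configuration are not given in this section), a minimal one-layer LSTM regressor of the kind the figure depicts could be assembled in Keras as follows; the window length, feature count, and layer width are assumed values.

```python
import tensorflow as tf

TIMESTEPS, FEATURES = 10, 1  # assumed input window length and feature count

# Minimal LSTM regressor: one recurrent layer followed by a dense output,
# mirroring the generic cell structure sketched in Figure 1.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(TIMESTEPS, FEATURES)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```
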
Figure 2. Processing flow chart.
Figure 3. (a) ACF and PACF patterns of SiO2 content in high-potassium glasses; (b) ACF and PACF patterns of SiO2 content in lead-barium glasses.
Figure 4. Correlation curve of chemical element content before and after weathering: (a) (High potassium) SiO2; (b) (lead barium) SiO2; (c) (High potassium) K2O; (d) (lead barium) P2O5; (e) (lead barium) PbO.
Figure 5. Fitting curve of the multiple linear regression equation.
Figure 6. Confusion matrix (1: high potassium, 0: lead barium).
Table 1. Regression coefficients β and significance p-values of the chemical components in high-potassium and lead-barium glasses.
Coefficient | High-Potassium Glass β | High-Potassium p-Value (P > |t|) | Lead-Barium Glass β | Lead-Barium p-Value (P > |t|)
β0 | 15.301 | 0.000 | 31.077 | 0.000
β1 | 2.718 | 0.099 | 7.788 | 0.005
β2 | 12.318 | 0.000 | 1.016 | 0.313
β3 | 5.410 | 0.020 | 8.320 | 0.004
β4 | 7.199 | 0.007 | 0.039 | 0.843
β5 | 7.051 | 0.008 | 2.678 | 0.102
β6 | 4.751 | 0.029 | 0.222 | 0.637
β7 | 1.072 | 0.300 | 1.293 | 0.255
β8 | 2.629 | 0.101 | 25.165 | 0.000
β9 | 1.792 | 0.181 | 0.831 | 0.362
β10 | 2.629 | 0.105 | 13.764 | 0.000
β11 | 3.142 | 0.076 | 3.702 | 0.054
β12 | 0.451 | 0.502 | 0.161 | 0.688
β13 | 1.490 | 0.222 | 0.942 | 0.332
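
Table 1 reports, for each chemical component, the regression coefficient and its p-value from the penalized logistic regression used for variable selection. The Daen-LR estimator itself is a double adaptive elastic net; as a rough stand-in for that step, the sketch below fits a plain elastic-net penalized logistic regression with scikit-learn. The composition matrix, labels, l1_ratio, and regularization strength C are placeholders rather than the authors' settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: rows = sampling points, columns = the 14 oxide contents
# of Table A1; y = 1 for high-potassium glass, 0 for lead-barium glass.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(60, 14))
y = rng.integers(0, 2, size=60)

# Elastic-net penalized logistic regression (a simplified stand-in for Daen-LR).
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=10000),
)
clf.fit(X, y)

# Components whose coefficients are shrunk exactly to zero are dropped.
coefs = clf.named_steps["logisticregression"].coef_.ravel()
print("retained component indices:", np.where(coefs != 0)[0])
```
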
Table 2. Outliers of SiO2 component content.
Relic Number | Outlier Type | Estimates | S.E. | t | Significance
High potassium 02 | Additive | 23.850 | 3.478 | 6.858 | 0.000
High potassium 13 | Transient, magnitude | 16.260 | 3.478 | 4.675 | 0.001
High potassium 13 | Transient, decay factor | 0.829 | 0.266 | 3.114 | 0.011
Lead barium 05 | Transient, magnitude | 31.926 | 4.569 | 6.988 | 0.000
Lead barium 05 | Transient, decay factor | 0.964 | 0.013 | 76.495 | 0.000
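
Table 2 lists additive and transient outliers detected in the SiO2 series, together with their estimated magnitudes and decay factors. The detection procedure itself is not shown here; a much simpler stand-in, sketched below under that caveat, is to fit a low-order ARIMA model with statsmodels and treat observations with unusually large standardized residuals as additive-outlier candidates. The series values and the AR order are placeholders.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Placeholder SiO2 sequence ordered by sampling point (values are illustrative only).
sio2 = np.array([69.3, 65.9, 61.6, 67.7, 59.8, 23.9, 62.5, 61.9, 65.2, 60.7,
                 63.0, 64.1, 66.3, 62.8, 61.2, 60.4, 65.5, 63.7, 62.2, 64.8])

# Fit a simple AR(1) model and flag large standardized residuals as
# additive-outlier candidates (a crude stand-in for the model-based
# additive/transient outlier estimation summarized in Table 2).
fit = ARIMA(sio2, order=(1, 0, 0)).fit()
z = (fit.resid - fit.resid.mean()) / fit.resid.std()
print("candidate outlier indices:", np.where(np.abs(z) > 2.5)[0])
```
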
Table 3. Parameters of the ARIMA-LSTM model for SiO2 component content in two types of glass.
Type | Stationary R2 | R2 | RMSE | MAPE | MaxAPE | MAE | MaxAE | Normalized BIC
High-potassium glass | 0.960 | 0.960 | 1.330 | 1.411 | 8.842 | 2.177 | 6.130 | 3.160
Lead-barium glass | 0.934 | 0.934 | 1.523 | 2.105 | 10.624 | 4.001 | 11.250 | 4.160
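
Table 3 summarizes goodness-of-fit statistics (stationary R2, R2, RMSE, MAPE, MAE, BIC) for the SiO2 series. The sketch below shows how the ARIMA part and such error metrics can be obtained, using the order (p, d, q) = (2, 1, 0) listed for JMLA in Table 10 and ACF/PACF diagnostics of the kind plotted in Figure 3. The series is a placeholder, and the statsmodels BIC is not the normalized BIC reported in the table.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Placeholder SiO2 series; the real input is the pre-processed data of Table A1.
rng = np.random.default_rng(1)
series = 60.0 + np.cumsum(rng.normal(0, 1, 80))

# ACF/PACF diagnostics of the kind shown in Figure 3.
fig, axes = plt.subplots(2, 1)
plot_acf(series, ax=axes[0])
plot_pacf(series, ax=axes[1])

# ARIMA(2, 1, 0): the order listed for JMLA in Table 10.
fit = ARIMA(series, order=(2, 1, 0)).fit()
pred = fit.predict(start=1, end=len(series) - 1)  # one-step in-sample predictions
actual = series[1:]

rmse = np.sqrt(np.mean((actual - pred) ** 2))
mae = np.mean(np.abs(actual - pred))
mape = 100 * np.mean(np.abs((actual - pred) / actual))
print(rmse, mae, mape, fit.bic)  # cf. the RMSE/MAE/MAPE/BIC columns of Table 3
```
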
Table 4. Chemical Composition of Unclassified Cultural Relics.
Relic Number | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8
Surface Weathering | No | Yes | No | No | Yes | Yes | Yes | No
SiO2 | 78.45 | 37.75 | 31.95 | 35.47 | 64.29 | 93.17 | 90.83 | 51.12
Na2O | 0.00 | 0.00 | 0.00 | 0.00 | 1.20 | 0.00 | 0.00 | 0.00
K2O | 0.00 | 0.00 | 1.36 | 0.79 | 0.37 | 1.35 | 0.98 | 0.23
CaO | 6.08 | 7.63 | 7.19 | 2.89 | 1.64 | 0.64 | 1.12 | 0.89
MgO | 1.86 | 0.00 | 0.81 | 1.05 | 2.34 | 0.21 | 0.00 | 0.00
Al2O3 | 7.23 | 2.33 | 2.93 | 7.07 | 12.75 | 1.52 | 5.06 | 2.12
Fe2O3 | 2.15 | 0.00 | 7.06 | 6.45 | 0.81 | 0.27 | 0.24 | 0.00
CuO | 2.11 | 0.00 | 0.21 | 0.96 | 0.94 | 1.73 | 1.17 | 9.01
PbO | 0.00 | 34.30 | 39.58 | 24.28 | 12.23 | 0.00 | 0.00 | 21.24
BaO | 0.00 | 0.00 | 4.69 | 8.31 | 2.16 | 0.00 | 0.00 | 11.34
P2O5 | 1.06 | 14.27 | 2.68 | 8.45 | 0.19 | 0.21 | 0.13 | 1.46
SrO | 0.03 | 0.00 | 0.52 | 0.28 | 0.21 | 0.00 | 0.00 | 0.31
SnO2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.49 | 0.00 | 0.00 | 0.00
SO2 | 0.51 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.11 | 2.26
Table 5. Indicators for joint significance testing of F-statistics.
F(10, 55) | 51.41
Prob > F | 0.0000
R-squared | 0.9633
Adj R-squared | 0.9558
Table 6. Regression coefficient β and significance p-value.
Type | SiO2 | K2O | CaO | MgO | Al2O3 | Fe2O3
Coef. | −0.014 | 0.048 | −0.001 | 0.004 | −0.299 | 0.035
P > |t| | 0.749 | 0.000 | 0.965 | 0.919 | 0.001 | 0.105
Type | PbO | BaO | P2O5 | SUW | yi | _cons
Coef. | −0.166 | −0.161 | −0.018 | 0.352 | 0.000 | 0.746
P > |t| | 0.001 | 0.021 | 0.078 | 0.000 | 0.000 | 0.082
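
The F statistic, R-squared, and adjusted R-squared of Table 5 and the coefficients and p-values of Table 6 are the standard outputs of an ordinary least squares fit of glass type on composition. The sketch below reproduces that kind of output with statsmodels; the composition matrix, the binary labels, and the reading of SUW as a surface-weathering dummy are assumptions. In the sketch, 66 observations and 10 regressors give an F statistic with (10, 55) degrees of freedom, the same as Table 5.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder design matrix: nine oxide contents plus an assumed
# surface-weathering dummy (SUW); y = 1 high potassium, 0 lead barium.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.uniform(0, 50, size=(66, 9)),
                  columns=["SiO2", "K2O", "CaO", "MgO", "Al2O3",
                           "Fe2O3", "PbO", "BaO", "P2O5"])
df["SUW"] = rng.integers(0, 2, size=66)
y = rng.integers(0, 2, size=66)

X = sm.add_constant(df)          # adds the _cons term of Table 6
ols = sm.OLS(y, X).fit()

print(ols.fvalue, ols.f_pvalue)        # F statistic and Prob > F (Table 5)
print(ols.rsquared, ols.rsquared_adj)  # R-squared and Adj R-squared (Table 5)
print(ols.params)                      # coefficients (Table 6)
print(ols.pvalues)                     # p-values P > |t| (Table 6)
```
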
Table 7. Results of the BP test and White test.
BP test | Prob > chi2 | 0.1725
White test | Prob > chi2 | 0.4095
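
Table 7 reports the Breusch-Pagan and White tests on the regression residuals; p-values above 0.05 mean that homoskedasticity is not rejected. A minimal sketch with the corresponding statsmodels diagnostics, run on placeholder data, is shown below.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

# Placeholder regression; in the paper the residuals come from the MLR of Table 6.
rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(66, 3)))
y = X @ np.array([1.0, 0.5, -0.2, 0.3]) + rng.normal(size=66)
res = sm.OLS(y, X).fit()

bp_stat, bp_pval, _, _ = het_breuschpagan(res.resid, res.model.exog)
w_stat, w_pval, _, _ = het_white(res.resid, res.model.exog)
print("BP test    Prob > chi2:", bp_pval)
print("White test Prob > chi2:", w_pval)
```
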
Table 8. Results of the variance inflation factor analysis.
Variable | VIF
SiO2 | 27.28
PbO | 21.06
BaO | 6.36
K2O | 5.11
P2O5 | 2.75
CaO | 2.68
MgO | 1.83
Al2O3 | 1.82
Fe2O3 | 1.63
Mean Value | 7.84
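
Each variance inflation factor in Table 8 equals 1/(1 − R_j²), where R_j² is obtained by regressing component j on the other regressors; values above 10 (here SiO2 and PbO) are commonly read as a sign of strong multicollinearity. A sketch of the computation with statsmodels, on a placeholder composition matrix, follows.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Placeholder oxide-content matrix for the nine components of Table 8.
rng = np.random.default_rng(4)
X = pd.DataFrame(rng.uniform(0, 50, size=(66, 9)),
                 columns=["SiO2", "PbO", "BaO", "K2O", "P2O5",
                          "CaO", "MgO", "Al2O3", "Fe2O3"])
Xc = sm.add_constant(X)  # VIFs are computed with an intercept in the model

vifs = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=X.columns,
)
print(vifs.sort_values(ascending=False))
print("Mean VIF:", vifs.mean())
```
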
Table 9. Identification results.
Relic Number | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8
Identification Type | High potassium | Lead barium | Lead barium | Lead barium | Lead barium | High potassium | High potassium | Lead barium
Table 10. Selected parameters of each algorithm.
Algorithm | Parameters
DT | Node splitting evaluation criterion = gini; feature division point selection criterion = random; minimum samples for internal node splitting = 2; minimum samples in leaf nodes = 1; maximum leaf nodes = 2; maximum depth of the tree = 15
RF | Node splitting evaluation criterion = gini; number of decision trees = 5; minimum samples in leaf nodes = 1; maximum depth of the tree = 15; maximum leaf nodes = 2
CART-RF | Node splitting evaluation criterion = gini; number of decision trees = 6; minimum samples in leaf nodes = 3; maximum depth of the tree = 15; maximum leaf nodes = 2
SVM | kernel = ‘rbf’; C = 20; γ = 2.00
JMLA | Qualitative variable (weathered) = 1; qualitative variable (unweathered) = 0; number of autoregressive terms (p) = 2; number of moving-average terms (q) = 0; number of differences needed for a stationary series (d) = 1
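
Table 10 gives the hyperparameters of the compared algorithms. A minimal sketch of how the DT, RF, CART-RF, and SVM baselines could be configured with scikit-learn under those settings is shown below; the class choices and any defaults not listed in the table are assumptions, and the ARIMA component of JMLA uses the order (p, d, q) = (2, 1, 0) from the last rows of the table.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Baseline classifiers configured with the hyperparameters listed in Table 10.
dt = DecisionTreeClassifier(criterion="gini", splitter="random",
                            min_samples_split=2, min_samples_leaf=1,
                            max_leaf_nodes=2, max_depth=15)

rf = RandomForestClassifier(criterion="gini", n_estimators=5,
                            min_samples_leaf=1, max_leaf_nodes=2, max_depth=15)

cart_rf = RandomForestClassifier(criterion="gini", n_estimators=6,
                                 min_samples_leaf=3, max_leaf_nodes=2, max_depth=15)

svm = SVC(kernel="rbf", C=20, gamma=2.0)
```
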
Table 11. Results of model experiments.
Algorithm | Train Acc | Test Acc | Precision | Recall | F1 Score
DT | 0.862 | 0.873 | 0.806 | 0.791 | 0.798
RF | 0.958 | 0.924 | 0.872 | 0.866 | 0.869
CART-RF | 0.962 | 0.951 | 0.929 | 0.941 | 0.935
SVM | 0.850 | 0.909 | 0.869 | 0.830 | 0.849
JMLA | 0.979 | 0.976 | 0.975 | 0.976 | 0.975
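
The train/test accuracy, precision, recall, and F1 score of Table 11, and the confusion matrix of Figure 6, follow directly from the predicted and true class labels. A brief sketch of the computation with scikit-learn on placeholder labels is given below.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Placeholder labels: 1 = high potassium, 0 = lead barium (as in Figure 6).
y_true = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```
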
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
