1. Introduction
Cyclic acid anhydrides have been used as tools for bioconjugation [1], in catalysis [1-6], in food chemistry [7], in cancer immunotherapy [8], in the synthesis of herbicides [9], in the preparation of membranes [10], in the green synthesis of macromolecules and nanoparticles [11, 12], and in polymer and copolymer chemistry [13-15], among other applications. Moreover, some cyclic anhydrides show unusual physical behavior and polymorphic phase transitions in the crystalline phase, whose properties have been obtained by differential scanning calorimetry and powder X-ray diffraction [16-18]. Thermal and calorimetric analyses have been applied to characterize materials [19, 20] and to obtain properties of polymorphic organic compounds using previously established methodologies [21-23].
The standard molar enthalpies of formation are of the utmost importance since they are generally used to determine standard molar enthalpies of reaction and thus to anticipate the exothermic or endothermic nature of a process [13]. Unfortunately, the thermochemical properties of certain compounds cannot be obtained experimentally because they decompose during thermal analysis or show several transitions on heating [24]. It is therefore necessary to fall back on functional group contribution methods and machine-learning-based models, procedures that are useful in chemical engineering [25, 26]. Among the cyclic anhydride derivatives whose standard molar enthalpies of formation had not been determined before this work are 3-methylglutaric and 3,3-dimethylglutaric anhydrides (Fig. 1).
For 3-methylglutaric anhydride (MGA), the melting point, enthalpy of fusion and molar heat capacity in the crystal phase were obtained by differential scanning calorimetry (DSC). The molar enthalpies of sublimation and vaporization at 298.15 K were determined by the Knudsen effusion method and by thermogravimetric analysis (TGA), respectively. The standard molar enthalpy of formation in the gas phase at 298.15 K was calculated from the standard molar enthalpy of sublimation and the standard molar enthalpy of formation in the crystal phase at 298.15 K. This value, together with that of glutaric anhydride (GA) [27], was used to validate the functional group contribution methods used to estimate some thermochemical properties of 3,3-dimethylglutaric anhydride (DMGA), which presented difficulties during purification and, consequently, in all its experimental analyses. Finally, a statistical method applying a multiple linear regression model based on machine learning was used to estimate the enthalpy of formation of DMGA in the gas phase.
2. Experimental
2.1. Materials and purity control
MGA [CAS: 4166-53-4] and DMGA [CAS: 4160-82-1] were acquired from Sigma-Aldrich; the mole-fraction purities reported by the supplier were 0.98 and 0.99, respectively. The temperature and molar enthalpy of fusion were measured with a Perkin Elmer DSC7 at a heating rate of 1.0 K min-1 under a high-purity nitrogen flow of 30.0 cm3 min-1 (x = 0.99997, supplied by Infra Co.). The instrument was calibrated for both temperature and heat flow using indium metal [CAS: 7440-74-6] provided by NIST, with a mole-fraction purity of 0.999999, a fusion enthalpy of 28.6 J g-1 and a melting point of 429.75 K [28-30].
The heat capacity of MGA was determined by DSC at a heating rate of 10.0 K min-1 under a constant nitrogen flow of 30.0 cm3 min-1 from 273.15 K to 304.15 K. The calibration was carried out with aluminum oxide [CAS: 1344-28-1, x = 0.9995] as a standard material provided by NIST, using the “two steps” method [31].
2.2. Combustion calorimetry
A Parr 1341 plain jacket adiabatic calorimeter was used for the combustion experiments; the methodology of this technique has been detailed in previous research [13]. The combustion energy of MGA was determined after calibration with benzoic acid, the NIST calorimetric standard (Standard Reference Material 39j), whose certified massic energy of combustion is -(
The combustion experiments of MGA were performed using a 0.022 L capacity bomb filled with high-purity oxygen (x = 0.99996, supplied by Infra Group) at a pressure of 3.04 MPa. Each test used 0.1 cm3 of deionized water, approximately 1.1 g of MGA, 13.15 g of nichrome (with burning energy of
The physical properties of some materials were considered to correct
where ε(cont) is the bomb content energy (Eq. (3)), ε_i(cont) and ε_f(cont) are the energy equivalents of the bomb contents in the initial and final states, T_i and T_f are the initial and final temperatures of the experiment and
2.3. Thermogravimetry using the Langmuir method
To estimate the enthalpy of vaporization, the procedure described by Price [36], which applies the Langmuir equation, was used:
$$\frac{1}{A}\frac{dm}{dt} = p\,\alpha\sqrt{\frac{M}{2\pi R T}} \qquad (4)$$
where (dm/dt) is the rate of mass loss, A is the vaporization area, p is the vapor pressure, T is the absolute temperature, M is the molar mass, R is the gas constant, and α is the vaporization coefficient (equal to 1 for macromolecules or under vacuum conditions).
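As an illustration of how the Langmuir relation links a measured mass-loss rate to a vapor pressure, the following is a minimal Python sketch. It assumes the usual form (1/A)(dm/dt) = p α sqrt(M/(2πRT)) in SI units; the function names, the mass-loss rate and the pan area are hypothetical values chosen only for the example, and the molar mass is computed from the molecular formula of MGA, C6H8O3.

```python
import math

R = 8.314462618  # molar gas constant, J K^-1 mol^-1


def langmuir_pressure(dm_dt, area, temperature, molar_mass, alpha=1.0):
    """Vapor pressure in Pa from a mass-loss rate, via the Langmuir equation.

    dm_dt in kg s^-1, area in m^2, temperature in K, molar_mass in kg mol^-1.
    """
    return (dm_dt / (area * alpha)) * math.sqrt(
        2.0 * math.pi * R * temperature / molar_mass
    )


def langmuir_mass_loss(pressure, area, temperature, molar_mass, alpha=1.0):
    """Inverse relation: mass-loss rate in kg s^-1 for a given vapor pressure."""
    return area * pressure * alpha * math.sqrt(
        molar_mass / (2.0 * math.pi * R * temperature)
    )


# Hypothetical inputs, for illustration only (not measured values).
M_MGA = 0.128127   # kg mol^-1, from the formula C6H8O3
rate = 2.0e-9      # kg s^-1, assumed
pan_area = 3.0e-5  # m^2, assumed
p = langmuir_pressure(rate, pan_area, 364.15, M_MGA)
print(f"p = {p:.3f} Pa")
```

Because the pressure depends on the vaporization area and on α, absolute pressures obtained this way are only as reliable as those two inputs; the enthalpy, which comes from a slope, is much less sensitive to them.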
One way to determine the enthalpy of phase change from the vapor pressure is to combine the Langmuir and Clausius-Clapeyron equations, which gives Eq. (5):
where C groups the involved constants and
A TA Instruments SDT 600 TGA/DSC was used for this procedure; its characteristics are reported in the literature [37]. The device was calibrated for mass, temperature, and enthalpy of vaporization. Mass calibration was carried out with a standard provided and certified by NIST as (
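The way an enthalpy is extracted from TGA mass-loss rates can be sketched as a straight-line fit. The snippet below assumes the linearised form ln((dm/dt)·sqrt(T)) = C − ΔH/(RT), which follows from substituting the integrated Clausius-Clapeyron relation into the Langmuir equation; the exact form of Eq. (5) used in this work may differ. The data are synthetic, generated from an assumed enthalpy over the experimental temperature interval, and the helper names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J K^-1 mol^-1


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def enthalpy_from_mass_loss(temps, rates):
    """Enthalpy in J mol^-1 from the slope of ln(rate * sqrt(T)) against 1/T."""
    x = [1.0 / t for t in temps]
    y = [math.log(r * math.sqrt(t)) for t, r in zip(temps, rates)]
    slope, _ = fit_line(x, y)
    return -slope * R


# Synthetic data: rates generated from an assumed enthalpy and an arbitrary
# constant C = 20, at six temperatures between 339.15 K and 389.15 K.
dH_assumed = 61.1e3  # J mol^-1
temps = [339.15 + 10.0 * k for k in range(6)]
rates = [math.exp(20.0 - dH_assumed / (R * t)) / math.sqrt(t) for t in temps]
dH_fit = enthalpy_from_mass_loss(temps, rates)
print(f"fitted enthalpy = {dH_fit / 1000:.1f} kJ/mol")
```

The fitted value refers to the mean temperature of the interval and still has to be corrected to 298.15 K.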
2.4. Knudsen effusion method
Another way to calculate the vapor pressure from the rate of mass loss was through the Knudsen effusion equation:
where
Equation (7), obtained by combining the previous expression with the Clausius-Clapeyron equation, made it possible to determine the enthalpy of sublimation
The rate of mass loss (Δm/Δt) at constant temperature was determined from 300.66 K to 306.54 K in steps of 2.0 K under a vacuum of 10-6 Torr; this value was used to calculate the enthalpy of sublimation. The experiments were carried out using aluminum cells with a pierced silver disk, whose dimensions were: Cell A (diameter: 1.345 mm, A0: 1.42 mm
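A short Python sketch of the Knudsen calculation is given here for orientation. It assumes the textbook form p = (Δm/Δt)/(A0·w0)·sqrt(2πRT/M), where w0 is the transmission (Clausing) factor of the orifice; the symbols, the default w0 = 1 and the mass-loss rate are assumptions, not values from this work. The orifice area is computed from the quoted diameter of Cell A and reproduces the quoted A0.

```python
import math

R = 8.314462618  # molar gas constant, J K^-1 mol^-1


def orifice_area(diameter):
    """Area of a circular orifice; same length unit squared as the input."""
    return math.pi * diameter ** 2 / 4.0


def knudsen_pressure(dm_dt, area, temperature, molar_mass, w0=1.0):
    """Vapor pressure in Pa from the Knudsen effusion equation (SI units)."""
    return dm_dt / (area * w0) * math.sqrt(
        2.0 * math.pi * R * temperature / molar_mass
    )


# Cell A: diameter 1.345 mm gives an area of about 1.42 mm^2.
area_mm2 = orifice_area(1.345)
print(f"A0 = {area_mm2:.2f} mm^2")

# Hypothetical mass-loss rate, for illustration only.
M_MGA = 0.128127  # kg mol^-1, from the formula C6H8O3
p = knudsen_pressure(1.0e-10, area_mm2 * 1.0e-6, 303.42, M_MGA)
print(f"p = {p:.3f} Pa")
```

Repeating the calculation at each temperature and plotting ln p against 1/T gives a slope of −ΔH/R, from which the sublimation enthalpy at the mean temperature follows.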
2.5. Estimation methods by contribution of functional groups
Since the enthalpy of formation of DMGA in both the gas and crystalline phases could not be determined experimentally, owing to the presence of three endothermic signals (at 352.76 K, 356.98 K and 397.15 K) before and after purification, three estimation methods proposed by Gani [50], Benson [51-53] and Naef [54, 55] were used to calculate it. The methods were validated by estimating the properties of cyclic acid anhydride derivatives, including GA and MGA. The procedure has already been described in detail by this research group [11].
Gani’s method [50] separates the functional groups of each molecule into different orders (first, second and third order). This method requires an H f0 value (42.2361 kJ mol-1) for the estimated enthalpy of formation in the gas phase. Benson’s method [51-53] estimates the enthalpy by identifying each atom in the molecule (except hydrogen) together with the atoms bonded to it; for cyclic molecules, a ring strain correction factor (rsc) must be introduced. In Naef’s method [54, 55], the enthalpy of formation in the crystalline phase is estimated from the enthalpy of combustion, and the enthalpy of formation in the gas phase is then calculated using the sublimation enthalpy.
2.6. Theoretical results using statistical algorithms
One way to find theoretical values of the enthalpy of formation in the gas phase is to apply statistical methods, which allow the experimental results to be interpreted from another point of view. For this purpose, machine learning algorithms were used [56].
2.6.1. Multiple linear regression
This regression model is characterized by the inclusion of multiple regressor variables; in other words, the dependent variable is affected by more than one independent variable. The expression representing this adjustment is presented below.
The model relates a dependent variable to n regressor variables (X n ) and a random variable (a 0) that collects all the factors that cannot be accounted for explicitly and are associated with change [57].
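For reference, the general textbook form of a multiple linear regression with n regressors can be written as below. The coefficient symbols a_1 ... a_n are assumed notation and may not match the original expression; the second line specialises it to the three regressors used later in this work (numbers of carbon, hydrogen and oxygen atoms).

```latex
\[
  Y = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_n X_n
\]
\[
  y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 ,
  \qquad x_1 = N_\mathrm{C},\; x_2 = N_\mathrm{H},\; x_3 = N_\mathrm{O}
\]
```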
2.6.2. Ridge regression
Ridge regression is a popular parameter estimation method used to address the collinearity problem frequently arising in multiple linear regression [58]. The expression representing this adjustment is presented below.
where λ is a parameter that controls the degree of penalty: the higher the penalty, the smaller the coefficients and the more robust the model is to collinearity. When λ is equal to zero, Ridge regression is equivalent to ordinary linear regression.
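The effect of λ can be illustrated with a small closed-form ridge fit. This sketch assumes the standard objective, the sum of squared residuals plus λ times the sum of squared slope coefficients (intercept unpenalised), which may differ in scaling from the expression used by the authors. The data are a made-up toy set of atom counts with responses generated from known coefficients; nothing here comes from the database of this work.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Return (intercept, coefficients) minimising RSS + lam * sum(coef**2)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [yi - y_mean for yi in y]
    # Normal equations on centred data: (Xc^T Xc + lam I) coef = Xc^T yc
    a = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    coef = solve(a, b)
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return intercept, coef


# Toy data: (C, H, O) counts and responses from known, arbitrary coefficients.
X = [(2, 4, 2), (3, 6, 2), (4, 6, 3), (5, 8, 3),
     (6, 8, 3), (7, 10, 3), (4, 4, 3), (3, 4, 2)]
TRUE = (1.0, 2.0, -3.0, 4.0)  # intercept, then one slope per column
y = [TRUE[0] + sum(b * x for b, x in zip(TRUE[1:], row)) for row in X]

b0_ols, coef_ols = ridge_fit(X, y, 0.0)      # lam = 0: ordinary least squares
b0_ridge, coef_ridge = ridge_fit(X, y, 2.0)  # lam = 2: shrunken coefficients
print(coef_ols, coef_ridge)
```

With λ = 0 the fit recovers the generating coefficients, and with λ = 2 the coefficient vector is shorter, which is the shrinkage described above.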
2.6.3. Lasso regression
LASSO (Least Absolute Shrinkage and Selection Operator) regression aims to identify the variables and corresponding regression coefficients that lead to a model that minimizes the prediction error. This is achieved by imposing a constraint on the model parameters, which ‘shrinks’ the regression coefficients towards zero, that is by forcing the sum of the absolute value of the regression coefficients to be less than a fixed value (λ) [59]. The expression representing this adjustment is presented below.
where λ is a parameter that controls the degree of penalty: the higher the penalty, the smaller the coefficients and the more robust the model is to collinearity. When λ is equal to zero, Lasso regression is equivalent to ordinary linear regression.
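The shrinkage-to-zero behaviour of LASSO can be sketched with cyclic coordinate descent and soft-thresholding. The objective assumed here is (1/(2n)) times the sum of squared residuals plus λ times the sum of absolute slope coefficients, a common convention that may differ in scaling from the authors' expression. The data set is the same kind of made-up toy example of atom counts, and all names are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam, returning 0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=5000):
    """Return (intercept, coefficients) by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]
    resid = [yi - y_mean for yi in y]  # residual for coef = 0
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            z = sum(v * v for v in col) / n
            # Correlation of column j with the residual that excludes coef[j]
            rho = sum(v * (r + coef[j] * v) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - coef[j]
            if delta != 0.0:
                resid = [r - delta * v for r, v in zip(resid, col)]
                coef[j] = new
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return intercept, coef


# Toy data: (C, H, O) counts and responses from known, arbitrary coefficients.
X = [(2, 4, 2), (3, 6, 2), (4, 6, 3), (5, 8, 3),
     (6, 8, 3), (7, 10, 3), (4, 4, 3), (3, 4, 2)]
TRUE = (1.0, 2.0, -3.0, 4.0)  # intercept, then one slope per column
y = [TRUE[0] + sum(b * x for b, x in zip(TRUE[1:], row)) for row in X]

b0_free, coef_free = lasso_fit(X, y, 0.0)    # no penalty: least squares
b0_heavy, coef_heavy = lasso_fit(X, y, 1e6)  # very large penalty
print(coef_free, coef_heavy)
```

Unlike ridge, a sufficiently large penalty sets coefficients exactly to zero, which is why LASSO also acts as a variable selector.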
3. Results and discussion
From three experiments on MGA, the mole fraction purity, the temperature and molar enthalpy of fusion, and the heat capacity were determined by DSC. The mole fraction of MGA after recrystallization from ethyl ether had an average value of (0.9996 ± 0.0001). The values obtained for the temperature and molar enthalpy of fusion were (316.05 ± 0.01) K or (
Table I shows the data and average value of
Quantity | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 | Exp. 5 |
---|---|---|---|---|---|
m (MGA)/g | 1.10927 | 1.16778 | 1.15478 | 1.17958 | 1.18016 |
m (stainless-steel)/g | 13.35335 | 13.35335 | 13.3598 | 13.35721 | 13.35438 |
m (nichrome)/g | 0.01303 | 0.0129 | 0.01251 | 0.01343 | 0.01283 |
T i /K | 292.5789 | 292.5857 | 292.6173 | 292.5802 | 292.5706 |
T f /K | 295.0831 | 295.2194 | 295.2194 | 295.2432 | 295.2344 |
ΔT corr/K | 0.0126 | 0.0132 | 0.0130 | 0.0134 | 0.0134 |
ΔT ad /K | 2.4916 | 2.6205 | 2.5891 | 2.6496 | 2.6504 |
ε(calor)(- ΔT ad )/kJ | -24.7418 | -26.0218 | -25.7100 | -26.3108 | -26.3187 |
ε i (cont)/kJ K-1 | 0.0185 | 0.0186 | 0.0186 | 0.0186 | 0.0186 |
ε f (cont)/kJ K-1 | 0.0200 | 0.0201 | 0.0201 | 0.0202 | 0.0202 |
ε(cont)(- ΔT ad )/kJ | -0.0415 | -0.0443 | -0.0437 | -0.0446 | -0.0446 |
ΔU ign/kJ | 0.0042 | 0.0042 | 0.0042 | 0.0042 | 0.0042 |
ΔU exp/kJ | 24.6607 | 25.9411 | 25.6316 | 26.2261 | 26.2382 |
ΔU dec (HNO3)/kJ | 0.0006 | 0.0009 | 0.0009 | 0.0018 | 0.0012 |
ΔU corr/kJ | 0.0169 | 0.018 | 0.0178 | 0.0182 | 0.0182 |
(-mΔ c u°)(nichrome)/kJ | 0.0763 | 0.0756 | 0.0733 | 0.0787 | 0.0752 |
(-mΔ c u°)(MGA)/kJ | 24.6853 | 25.9674 | 25.6575 | 26.2525 | 26.2645 |
Δ c u°(MGA)/kJ g-1 | -22.2536 | -22.2366 | -22.2185 | -22.2558 | -22.2550 |
Average value Δ c u°(MGA)/kJ g-1 | -22.2439 ± 0.0163 | | | | |
𝑎 Data from five representative experiments, showing the specific energies of combustion at 298.15 K and 0.1 MPa. m represents the mass of MGA, stainless steel, and nichrome; the masses were corrected for buoyancy using the density of each material. T i and T f are the initial and final temperatures of the experiment, ΔT corr is a correction term, and ΔT ad is the corrected temperature rise calculated by ΔT ad = T f - T i - ΔT corr. ε(calor) represents the energy equivalent of the entire system; ε i (cont) and ε f (cont) are the energy equivalents of the bomb contents in the initial and final states, respectively; ε(cont) is the bomb content energy calculated by ε(cont)(-ΔT ad ) = ε i (cont)(T i - 298.15 K) + ε f (cont)(298.15 K - T f - ΔT corr). ΔU ign is the ignition energy, ΔU dec (HNO3) is the experimental energy of formation of nitric acid, and ΔU exp is the energy of the experimental bomb process, calculated by ΔU exp = -[ε(calor)(-ΔT ad ) + ΔU dec (HNO3) + ΔU ign + (-mΔ c u°)(nichrome)]. ΔU corr is the correction to the standard state and Δ c u°(MGA) is the massic energy of combustion of the compound. Its uncertainty corresponds to the expanded uncertainty at a confidence level of approximately 0.95.
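The internal consistency of Table I can be checked with a few lines of Python. The sketch below only re-derives two quantities from the tabulated values: the corrected temperature rise, ΔT ad = T f - T i - ΔT corr, and the arithmetic mean of the five massic energies of combustion. The variable names are arbitrary, and the expanded uncertainty is not recomputed because its full budget is not given in the table.

```python
# Values copied from Table I (Exp. 1 to Exp. 5).
t_i = [292.5789, 292.5857, 292.6173, 292.5802, 292.5706]      # K
t_f = [295.0831, 295.2194, 295.2194, 295.2432, 295.2344]      # K
dt_corr = [0.0126, 0.0132, 0.0130, 0.0134, 0.0134]            # K
dcu_mga = [-22.2536, -22.2366, -22.2185, -22.2558, -22.2550]  # kJ g^-1

# Corrected temperature rise for each experiment.
dt_ad = [round(f - i - c, 4) for i, f, c in zip(t_i, t_f, dt_corr)]

# Mean massic energy of combustion.
mean_dcu = sum(dcu_mga) / len(dcu_mga)

print(dt_ad)
print(f"{mean_dcu:.4f} kJ/g")
```

Both results agree with the ΔT ad row and with the average value reported at the bottom of the table.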
The rate of mass loss (dm/dt) was determined by TGA in the temperature range from 339.15 K to 389.15 K, and by Knudsen effusion from 300.66 K to 306.54 K. From these data, it was possible to obtain the vaporization enthalpy at the mean temperature T m = 364.15 K and the sublimation enthalpy at T m = 303.42 K, respectively. These vaporization and sublimation enthalpies at T m were corrected to 298.15 K using Eqs. (12) and (13). The values are shown in Table II (the uncertainties of the enthalpies of sublimation and vaporization at T m correspond to those of the weighted average, and those at 298.15 K to twice the combined standard uncertainty).
Δ H(T m )/kJ mol-1 | Interval of T/K | T m /K | Δ H(298.15 K)/kJ mol-1 | Method | Process |
---|---|---|---|---|---|
61.1 ± 0.3 | 339.15-389.15 | 364.15 | 65.6 ± 0.6 | TGA | Vaporization |
81.8 ± 1.7 | 300.66-306.54 | 303.42 | 81.9 ± 3.4 | Knudsen effusion | Sublimation |
Table III compares the enthalpy of sublimation at 298.15 K obtained by Knudsen effusion with the sum of the vaporization and fusion enthalpies at 298.15 K; the two differ by 1.3 kJ mol-1 (all uncertainties correspond to twice the combined standard uncertainty). This procedure was validated with pyrene (with a difference of 0.3 kJ mol-1) and phenanthrene
Δ fus H(298.15 K)/kJ mol-1 | Δ vap H(298.15 K)/kJ mol-1 (TGA) | Δ sub H(298.15 K)/kJ mol-1 (Knudsen effusion) | (Δ fus H + Δ vap H)(298.15 K)/kJ mol-1 |
---|---|---|---|
15.0 ± 0.4 | 65.6 ± 0.6 | 81.9 ± 3.4 | 80.6 ± 0.7 |
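The last column of Table III can be reproduced by a short worked calculation. The sum of the fusion and vaporization enthalpies follows directly from the tabulated values; combining the two uncertainties in quadrature is an assumption made here, although it does reproduce the tabulated ± 0.7 kJ mol-1.

```latex
\[
  \Delta_\mathrm{fus}H + \Delta_\mathrm{vap}H
  = (15.0 + 65.6)\ \mathrm{kJ\,mol^{-1}}
  = 80.6\ \mathrm{kJ\,mol^{-1}}
\]
\[
  u = \sqrt{0.4^{2} + 0.6^{2}}\ \mathrm{kJ\,mol^{-1}}
    = \sqrt{0.52}\ \mathrm{kJ\,mol^{-1}}
    \approx 0.7\ \mathrm{kJ\,mol^{-1}}
\]
\[
  \Delta_\mathrm{sub}H(\text{Knudsen}) - (\Delta_\mathrm{fus}H + \Delta_\mathrm{vap}H)
  = (81.9 - 80.6)\ \mathrm{kJ\,mol^{-1}}
  = 1.3\ \mathrm{kJ\,mol^{-1}}
\]
```

The difference of 1.3 kJ mol-1 is smaller than the ± 3.4 kJ mol-1 uncertainty of the Knudsen value, so the two routes agree within uncertainty.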
To determine the enthalpy of formation in gas phase at 298.15 K of
where T represents the temperature at 298.15 K.
Table IV shows the functional groups of MGA and DMGA and their frequencies of occurrence; these data were used in the three estimation methods.
Benson estimation | Gani estimation | Naef estimation | ||||
---|---|---|---|---|---|---|
MGA | ||||||
Groups | Freq. | Groups | Freq. | Atom type | Neighbours | Freq. |
C-(H)3 (C) | 1 | CH3 | 1 | C sp3 | H3C | 1 |
CH3 corr (ter) | 1 | CH2(cyclic) | 2 | C sp3 | HC3 | 1 |
C-(C)3 (H) | 1 | CH(cyclic) | 1 | C sp3 | H2C2 | 2 |
C-(CO)(C)(H)2 | 2 | CO(cyclic) | 2 | C sp2 | CO=O | 2 |
CO-(C)(O) | 2 | O(cyclic) | 1 | O | C2 (2pi) | 1 |
O-(CO)2 | 1 | CH(cyclic)-CH3 | 1 | |||
rsc GA | 1 | |||||
DMGA | ||||||
Groups | Freq. | Groups | Freq. | Atom type | Neighbours | Freq. |
C-(H)3 (C) | 2 | CH3 | 2 | C sp3 | H3C | 2 |
CH3 corr (qua) | 2 | CH2(cyclic) | 2 | C sp3 | C4 | 1 |
C-(C)4 | 1 | C(cyclic) | 1 | C sp3 | H2C2 | 2 |
C-(CO)(C)(H)2 | 2 | CO(cyclic) | 2 | C sp2 | CO=O | 2 |
CO-(C)(O) | 2 | O(cyclic) | 1 | O | C2 (2pi) | 1 |
O-(CO)2 | 1 | C(cyclic)-CH3 | 1 | |||
rsc GA | 1 |
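All three methods share the same arithmetic: the estimated property is the sum, over the groups in the molecule, of the group frequency times its group value. The Python sketch below encodes the Benson columns of Table IV and a generic summation routine. The function name and dictionary layout are hypothetical, and no actual group values are supplied here; the demonstration uses a unit value for every group, which simply counts the groups.

```python
# Benson group frequencies for MGA and DMGA, copied from Table IV.
BENSON_MGA = {
    "C-(H)3(C)": 1, "CH3 corr (ter)": 1, "C-(C)3(H)": 1,
    "C-(CO)(C)(H)2": 2, "CO-(C)(O)": 2, "O-(CO)2": 1, "rsc GA": 1,
}
BENSON_DMGA = {
    "C-(H)3(C)": 2, "CH3 corr (qua)": 2, "C-(C)4": 1,
    "C-(CO)(C)(H)2": 2, "CO-(C)(O)": 2, "O-(CO)2": 1, "rsc GA": 1,
}


def group_additivity(frequencies, group_values):
    """Sum of frequency * group value over all groups in the molecule."""
    missing = [g for g in frequencies if g not in group_values]
    if missing:
        raise KeyError(f"no group value for: {missing}")
    return sum(n * group_values[g] for g, n in frequencies.items())


# Placeholder values (1.0 for every group), NOT real Benson parameters:
# with unit values the sum is just the total number of groups.
unit_values = {g: 1.0 for g in set(BENSON_MGA) | set(BENSON_DMGA)}
n_groups_mga = group_additivity(BENSON_MGA, unit_values)
n_groups_dmga = group_additivity(BENSON_DMGA, unit_values)
print(n_groups_mga, n_groups_dmga)
```

To obtain an enthalpy, `unit_values` would be replaced by the group values and ring strain correction taken from the cited references for the phase of interest.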
Tables V to VIII show the values estimated by the functional group-contribution methods proposed by Benson, Gani and Naef. Tables V and VI present the vaporization, sublimation and combustion enthalpies obtained by Naef’s method. To estimate the enthalpy of vaporization, a correction was made to the O-C2 (2pi) group, whose value reported by Naef is -7.15 kJ mol-1 [54]. If this quantity is used for cyclic acid anhydrides, however, it generates a high absolute error. For this reason, the group-additivity value (GAV) for the O-C2 (2pi) group was recalculated as the difference between the experimental vaporization enthalpy and the theoretical vaporization enthalpy of each anhydride (computed without the group being recalculated). The recalculated average value of 6.0 kJ mol-1 was then applied to estimate the vaporization enthalpy of all cyclic anhydrides. The estimated vaporization, sublimation, and combustion enthalpies had average errors of 6.4 kJ mol-1, 14.3 kJ mol-1 and 13.6 kJ mol-1, respectively. Taking these values as uncertainties, a vaporization enthalpy of (57.8 ± 6.4) kJ mol-1, a sublimation enthalpy of (97.8 ± 14.3) kJ mol-1 and a combustion enthalpy of -(3495.3 ± 13.6) kJ mol-1 were estimated for DMGA.
Compound | Experimental | Estimations | ||||
---|---|---|---|---|---|---|
Vaporization | Sublimation | Vaporization | Δ | Sublimation | Δ | |
Glutaric acid | 101.6 ± 0.8 a | 119.2 ± 1.4 a | 90.4 | 11.2 | 105.1 | 14.1 |
Maleic anhydride | 43.8 ± 3.0 b | 68.8 ± 0.8 c | 50.5 | -6.7 | 54.3 | 14.5 |
GA | 52.6 ± 3.0 b | 86.1 ± 1.6 d | 54.4 | -1.8 | 97.0 | -10.9 |
3,3-Tetramethyleneglutaric anhydride | 81.1 e | 96.4 ± 1.1 e | 70.3 | 10.8 | 81.1 | 15.3 |
Succinic anhydride | 49.9 ± 3.0 b | 80.7 ± 1.6 d | 46.6 | 3.3 | 57.8 | 22.9 |
Methylsuccinic anhydride | 47.6 ± 3.0 b | 50.0 f | 51.7 | -4.1 | 59.2 | -9.2 |
2,2-Dimethylsuccinic anhydride | 45.7 ± 3.0 b | 69.7 g | 50.0 | -4.3 | 58.7 | 11.0 |
MGA | 65.6 ± 0.6 h | 81.9 ± 3.4 h | 56.4 | 9.2 | 98.3 | -16.4 |
DMGA | 57.8 | 97.8 |
𝑎 Taken from Ref. [61]. b Taken from Ref. [62]. c Taken from Ref. [63]. d Taken from Ref. [64]. e Taken from Ref. [27]. f Calculated from Refs. [62] and [65]. g Calculated from Refs. [53] and [66]. h Experimental value of this work.
Compound | Experimental | Estimations | Δ |
---|---|---|---|
Glutaric acid | 2152.0 ± 0.5𝑎 | 2169.8 | 17.8 |
Maleic anhydride | 1390.0 ± 1.4 b | 1372.1 | -17.9 |
GA | 2206.5 ± 0.6 c | 2196.3 | -10.2 |
3,3-Tetramethyleneglutaric anhydride | 4588.7 ± 2.1 d | 4557.5 | -30.6 |
Succinic anhydride | 1537.1 ± 0.4 c | 1543.8 | 6.7 |
Methylsuccinic anhydride | 2208.0 e | 2194.8 | -9.2 |
2,2-Dimethylsuccinic anhydride | 2855.0 e | 2842.8 | -12.2 |
MGA | 2851.4 ± 1.9 f | 2847.3 | -4.1 |
DMGA | 3495.3 |
𝑎 Taken from Ref. [61]. b Taken from Ref. [63]. c Taken from Ref. [64]. d Taken from Ref. [27]. e Taken from Ref. [66]. f Experimental value of this work.
Compound | Experimental | Benson | Δ | Naef | Δ |
---|---|---|---|---|---|
Glutaric acid | 959.9𝑎 | 962.3 | 2.4 | 941.1 | -18.5 |
Maleic anhydride | 469.9 ± 1.5 b | 469.9 | 0.0 | 487.8 | 17.9 |
GA | 618.5 𝑎 | 618.4 | -0.1 | 628.8 | 10.3 |
3,3-Tetramethyleneglutaric anhydride | 667.9 ± 2.4 c | 678.6 | 10.7 | 699.1 | 31.2 |
Succinic anhydride | 608.6 ± 0.7 d | 601.6 | -7 | 601.9 | -6.7 |
Methylsuccinic anhydride | 620.0 ± 1.2 e | 622.8 | 2.8 | 630.3 | 10.3 |
2,2-Dimethylsuccinic anhydride | 651.4 f | 655.6 | 4.2 | 661.6 | 10.2 |
MGA | 653.0 ± 2.1 g | 650.4 | -2.6 | 657.1 | 4.1 |
DMGA | 688.7 | 688.5 |
𝑎 Taken from Ref. [68]. b Taken from Ref. [63]. c Taken from Ref. [27]. d Taken from Ref. [64]. e Taken from Ref. [65]. f Taken from Ref. [66]. g Experimental value of this work.
Compound | Experimental | Benson | Δ | Gani | Δ | Naef | Δ |
---|---|---|---|---|---|---|---|
Glutaric acid | 840.2 ± 4.6𝑎 | 842.5 | 2.3 | 843.5 | 3.3 | 835.9 | -4.3 |
Maleic anhydride | 401.0 ± 1.7 b | 401.0 | 0.0 | 416.2 | 15.2 | 433.5 | 32.5 |
GA | 532.4 ± 1.8 c | 532.1 | -0.3 | 541.9 | 9.5 | 531.8 | -0.6 |
3,3-Tetramethyleneglutaric anhydride | 571.5 ± 2.6𝑎 | 573.2 | 1.7 | 563.2 | -8.3 | 618.0 | 46.5 |
Succinic anhydride | 527.9 d | 528.0 | 0.1 | 542.2 | 14.3 | 552.3 | 19.6 |
Methylsuccinic anhydride | 570.0 e | 557.4 | -12.6 | 576.2 | 6.2 | 571.1 | 1.0 |
2,2-Dimethylsuccinic anhydride | 581.7 f | 590.1 | 8.4 | 579.5 | -2.2 | 602.9 | 21.2 |
MGA | 571.1 ± 4.0 g | 562.5 | -8.6 | 575.9 | 4.8 | 558.8 | -12.3 |
DMGA | 602.3 | 579.1 | 590.7 |
𝑎 Taken from Ref. [27]. b Taken from Ref. [63]. c Taken from Ref. [64]. d Taken from Ref. [69]. e Calculated from Refs. [62] and [65]. f Taken from Ref. [53]. g Experimental value of this work.
For the estimation of the enthalpies of formation in both the gas and crystalline phases by Benson’s method [51-53], the rsc of maleic anhydride had to be calculated because it is not reported; the values obtained were 16.7 kJ mol-1 for the gas phase and 19.7 kJ mol-1 for the crystalline phase. A new GAV was obtained [53] for the rsc of cyclopentane because reference [51] does not consider whether cyclopentane carries substituents, and reference [52] does not contain an rsc value for cyclopentane in the crystalline phase. The values used were 19.55 kJ mol-1 for the gas phase and 34.0 kJ mol-1 for the crystalline phase. These values were subsequently used for the estimation of 3,3-tetramethyleneglutaric anhydride.
Table VII shows the enthalpy of formation of DMGA in the crystalline phase calculated by the Benson and Naef methods. Benson’s method gave a value of -(688.7 ± 3.7) kJ mol-1, while Naef’s method gave -(688.5 ± 13.6) kJ mol-1.
Finally, Table VIII presents the enthalpies of formation of DMGA in the gas phase estimated by the Benson, Gani, and Naef methods, obtaining values of
Another way to estimate the enthalpy of formation of DMGA in the gas phase was to use the enthalpic contribution of the methyl group, -35.8 kJ mol-1, obtained from the difference between 4-methylpiperidine [70] and piperidine [71], and of
To apply an algorithm based on machine learning, it was first necessary to create a database representative of the type of molecules studied in this work. The database contains a total of 70 organic compounds, divided between carboxylic acids and acid anhydrides; since an anhydride is a derivative of an acid, their molecular interactions can be considered similar. Each compound in the database is listed with its enthalpy of formation in the gas phase.
Multiple Linear Regression, Ridge Regression and Lasso Regression models were applied to predict the enthalpy of formation in the gas phase from the number of carbon, hydrogen and oxygen atoms. The data set was divided into training and test subsets in proportions of 0.7 and 0.3, respectively, and a seed was used to ensure that the results are repeatable. The metrics used to evaluate the models were the coefficient of determination (R2), whose methodology has been explained previously [73], the root mean square error (RMSE) [74] and the mean absolute error (MAE) [75]. A K-fold cross validation [76] was also applied to the training set to assess its accuracy. The results and the fitted equations are presented in Table IX.
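The splitting and scoring steps described above can be sketched with the Python standard library alone. The helper names below are hypothetical stand-ins and not the routines used by the authors; the seed value is arbitrary, and the small arrays at the end are made-up numbers used only to exercise the metrics.

```python
import math
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle reproducibly and return (training rows, test rows)."""
    rng = random.Random(seed)  # fixed seed gives a repeatable split
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# A database of 70 entries splits into 49 training and 21 test compounds.
train, test = train_test_split(range(70), test_fraction=0.3, seed=42)
print(len(train), len(test))

# Made-up values, only to exercise the three metrics.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Reporting each metric on both subsets, as in Table IX, shows whether a model fits the training data much better than unseen data.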
Model | R2 (training set) | R2 (test set) | MAE (training set) | MAE (test set) | RMSE (training set) | RMSE (test set) | Cross val (std) b | Fitted equation a |
---|---|---|---|---|---|---|---|---|
Multiple Linear Regression | 0.9882 | 0.9836 | 13.2375 | 16.2029 | 19.4279 | 21.6080 | 0.9614 | y = -25.9068 - 32.9022x1 + 28.2672x2 + 193.1793x3 |
Ridge Regression | 0.9846 | 0.9829 | 14.4831 | 15.4201 | 22.1680 | 22.0452 | 0.9619 | y = -22.0314 - 33.7033x1 + 28.0985x2 + 177.3040x3 |
Lasso Regression | 0.9879 | 0.9833 | 12.9703 | 16.1354 | 19.6778 | 21.7944 | 0.9528 | y = -15.9902 - 32.4211x1 + 27.9305x2 + 189.1128x3 |
𝑎 x1 represents the number of carbon atoms, x2 the number of hydrogen atoms and x3 the number of oxygen atoms present in the anhydride-derived molecules; the results are in kJ mol-1. b Represents the mean of standard deviation with cv = 10.
For the hyperparameter alpha (the penalty parameter λ described above), present in both the Ridge and Lasso regressions, a for loop was run in Python to find the optimal value. For both regressions the optimal value of alpha lies in the interval (0, 2]: above 2 the coefficient of determination begins to decay, and 0 cannot be used because the model would reduce to a multiple linear regression. In this work, alpha = 2 was used in both cases.
Figure 4 shows the fits obtained with the different types of regression for both the training and test sets. As can be seen, the multiple linear regression and the lasso regression gave the best fit to the data set; the r 2 value refers to the coefficient of determination between the true and predicted data sets.
Based on the metrics shown in Table IX, the regressions with the best overall fit are the Lasso Regression and the Multiple Linear Regression; therefore, both fitted equations were used to predict the enthalpy of formation of DMGA. Since the equations presented in Table IX do not distinguish between isomers, a correction factor must be introduced. This factor was taken from the literature on the Benson-type estimation method [53]; the value considered was the quaternary -CH3 correction, which accounts for both methyl groups being attached to the same carbon atom. Using the equations in Table IX, the enthalpy of formation of MGA in the gas phase is -Δ f H°(MGA, g) = 580.3 kJ mol-1 with the Lasso Regression, 576.5 kJ mol-1 with the Ridge Regression and 582.4 kJ mol-1 with the Multiple Linear Regression; these values differ from the experimental one by 9.2 kJ mol-1, 5.4 kJ mol-1 and 11.3 kJ mol-1, respectively. For DMGA, the equations give -Δ f H°(DMGA, g) = 603.7 kJ mol-1 with the Lasso Regression, 599.0 kJ mol-1 with the Ridge Regression and 606.0 kJ mol-1 with the Multiple Linear Regression. The correction factor is -4.56 kJ mol-1 and was applied twice because of the two methyl groups attached to the quaternary carbon, so the final values are -Δ f H°(DMGA, g) = 594.6 kJ mol-1, 589.9 kJ mol-1 and 596.9 kJ mol-1 with the Lasso Regression, Ridge Regression and Multiple Linear Regression, respectively. Compared with the value obtained in Fig. 3, these differ by 0.8 kJ mol-1, -3.9 kJ mol-1 and 3.1 kJ mol-1, respectively.
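The predictions quoted above can be reproduced from the Table IX coefficients with a few lines of Python. The sketch assumes that the last coefficient of each fitted equation multiplies x3 (the number of oxygen atoms), that y corresponds to -Δ f H°(g) in kJ mol-1, and that the atom counts follow from the molecular formulas C6H8O3 (MGA) and C7H10O3 (DMGA); the names are hypothetical. Only the two equations selected in the text are included, since the ridge coefficients as printed do not reproduce the ridge values quoted in the text.

```python
# (a0, a1, a2, a3) from Table IX; y = a0 + a1*C + a2*H + a3*O, in kJ/mol.
MODELS = {
    "Multiple Linear Regression": (-25.9068, -32.9022, 28.2672, 193.1793),
    "Lasso Regression": (-15.9902, -32.4211, 27.9305, 189.1128),
}
QUATERNARY_CH3_CORRECTION = -4.56  # kJ/mol per methyl group, from the text

MGA = (6, 8, 3)    # C, H, O counts for C6H8O3
DMGA = (7, 10, 3)  # C, H, O counts for C7H10O3


def predict(coefficients, atoms):
    """Evaluate the fitted linear equation for one molecule."""
    a0, a1, a2, a3 = coefficients
    c, h, o = atoms
    return a0 + a1 * c + a2 * h + a3 * o


results = {}
for name, coefficients in MODELS.items():
    mga = predict(coefficients, MGA)
    dmga_raw = predict(coefficients, DMGA)
    # The correction is applied once per methyl group on the quaternary carbon.
    dmga = dmga_raw + 2 * QUATERNARY_CH3_CORRECTION
    results[name] = (round(mga, 1), round(dmga_raw, 1), round(dmga, 1))
    print(name, results[name])
```

The rounded outputs match the uncorrected and corrected values given in the text for the Multiple Linear Regression and the Lasso Regression.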
4. Conclusions
Some thermochemical properties of MGA were determined by differential scanning calorimetry, thermogravimetric analysis, and the Knudsen effusion method. Since DMGA could not be studied experimentally, owing to crystal transitions possibly related to different molecular forms such as conformers or polymorphs, the Benson, Gani, and Naef functional group-contribution methods were applied to calculate its enthalpies of phase change and of formation in both the gas and crystalline phases. The standard molar enthalpy of formation of DMGA in the gas phase was also estimated in two ways: 1) from the enthalpic contribution of the methyl group on GA and on MGA, and 2) from different linear regression algorithms based on machine learning. The differences between the gas-phase enthalpies of formation estimated by all the methods fall within their uncertainties. For the crystalline phase, the enthalpies of formation estimated by the Benson and Naef methods were very close.