2026 Volume 39 Issue 2
Article Contents

Yueqian Sun, Shuwen Zhang, John Z. H. Zhang, Tong Zhu. Machine Learning Driven ReaxFF Optimization for Combustion[J]. Chinese Journal of Chemical Physics, 2026, 39(2): 233-244. doi: 10.1063/1674-0068/cjcp2503023
Citation: Yueqian Sun, Shuwen Zhang, John Z. H. Zhang, Tong Zhu. Machine Learning Driven ReaxFF Optimization for Combustion[J]. Chinese Journal of Chemical Physics, 2026, 39(2): 233-244. doi: 10.1063/1674-0068/cjcp2503023

Machine Learning Driven ReaxFF Optimization for Combustion

通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Figures(5)  /  Tables(3)

Article Metrics

Article views(62) PDF downloads(1) Cited by(0)

Access History

Machine Learning Driven ReaxFF Optimization for Combustion

Abstract: Accurate simulation of combustion reactions is crucial for understanding combustion mechanisms. Reactive force fields (ReaxFF) offer a computationally efficient approach to simulating complex combustion processes, but their accuracy depends critically on parameterization. This work presents a comprehensive optimization of ReaxFF parameters for gas-phase combustion reactions using a machine learning driven approach. We constructed a dataset of 33 reactions, encompassing key reaction types in combustion. High-level double hybrid DFT calculations served as a benchmark to evaluate the performance of various density functionals, the semi-empirical PM7 method, and existing ReaxFF parameter sets. We then employed the JAX-ReaxFF framework to optimize the CHO2008 parameters, leveraging its efficient local gradient-based optimization algorithms. The optimized ReaxFF significantly improved the accuracy of potential energy and atomic force predictions, with the mean absolute error (MAE) for energy approaching that of PM7. Analysis of reaction pathways and potential energy surfaces further demonstrated the enhanced performance of the optimized force field, particularly near transition states. This optimized ReaxFF provides a good tool for simulating a wide range of combustion systems, and the presented methodology offers a general strategy for developing system-specific ReaxFF parameters.

    • Combustion reactions play a crucial role in energy production, aerospace propulsion, and various industrial chemical processes. A comprehensive understanding of combustion mechanisms is essential for improving combustion efficiency, reducing pollutant emissions, and developing advanced fuels. However, due to the inherent complexity involving multiple scales and numerous interacting species, experimental methods alone are insufficient to fully elucidate the detailed microscopic mechanisms of combustion. Therefore, theoretical chemistry methods, including molecular simulations and quantum chemical calculations, have emerged as indispensable tools for exploring reaction pathways and energetics at the atomic and molecular levels, providing essential insights that complement and extend experimental observations [16]. For example, Han et al. reported the initial chemical mechanism of soot nanoparticle formation under fuel enrichment conditions using reactive force field (ReaxFF) molecular dynamics (MD) simulations [7]. Zheng et al. simulated the initial chemical reaction of bituminous coal pyrolysis by ReaxFF MD. The simulation results showed that some radicals, such as H· and HO· radicals, play a key role in pyrolysis [8]. Selby et al. clarified the detailed mechanism of the reaction between the phenyl radical and propargyl radical to form indene by using the time-of-flight mass spectrometry method and theoretical calculations [9]. Mao et al. studied the mechanism of C5H5· radical conversion to indane, providing a detailed description of the reaction pathways and calculating the rate constants. The theoretical rate constant showed good agreement with experimental data [10]. Mebel et al. studied the folding and bridging mechanisms of polycyclic aromatic hydrocarbons (PAHs) and the factors accelerating their aggregation based on quantum chemical calculations and MD simulations [11, 12]. ​Wang et al. used quantum chemical calculations and kinetic simulations to explore the reaction mechanism of i-C4H3· radical with acetylene, identifying several new reaction pathways [13]. These results enrich the understanding of radical reactions in combustion and also provide a basis for theoretical studies and experimental designs.

      Theoretical investigations of combustion reactions typically focus on exploring potential energy surfaces (PESs) and energy barriers. However, due to the abundant formation of short-lived, highly energetic intermediates and radical species during combustion, accurate theoretical simulations of these reactions become challenging and computationally expensive. Fortunately, the ReaxFF approach [1418] has significantly advanced the computational study of high-energy chemical reactions, enabling simulations of combustion mechanisms within feasible computational resources. An increasing number of studies have demonstrated the effectiveness and potential applications of ReaxFF in elucidating combustion reaction mechanisms [1923].

      Although ReaxFF has been successfully applied to a wide variety of chemical reactions, its accuracy remains notably lower compared to quantum chemical methods. To improve the accuracy of ReaxFF-based simulations for combustion reactions, the effective optimization of the force field parameters is crucial. Van Duin et al. proposed the successive one-parameter perturbation iteration (SOPPI) method, which utilizes a single-parameter search strategy to automatically optimize ReaxFF parameters, thereby generating high-quality PESs [24]. However, the SOPPI method heavily relies on the initial choice of parameters and incurs significant computational costs. To mitigate these limitations, various global optimization approaches, such as genetic algorithms (GAs) [2527] and evolutionary algorithms (EAs) [28], have been developed. While these methods partially reduce the sensitivity to initial parameters, they still suffer from substantial computational expenses. Guo et al. introduced the intelligent ReaxFF (I-ReaxFF) method, which employs local gradient-based optimization to refine the ReaxFF parameters efficiently [29]. By optimizing the PES specifically for water molecules, their approach significantly reduced the computational cost. However, the I-ReaxFF method was limited to training with absolute energy values, excluding charges and atomic forces. More recently, the JAX-ReaxFF method [30] was proposed, leveraging powerful local optimization algorithms such as L-BFGS-B [31] and SLSQP [32] to achieve higher-quality results with lower computational expense. In addition, JAX-ReaxFF incorporates a broader set of training labels—including relative energies from bond, angle, and torsion scans, as well as partial charges and atomic forces—thereby enriching the information provided by the training dataset and enhancing both accuracy and efficiency in parameter optimization. Furthermore, Wang et al. developed ReaxFFCHO-S22 using high-accuracy quantum mechanical data and a global optimization algorithm to simulate combustion reactions involving C, H, and O elements. This model enhances the accuracy of describing pyrolysis and combustion of hydrocarbon fuel [33]. Wang et al. improved the ReaxFF by optimizing the bond order formula and parameters to make it better fit the potential energy curve calculated by quantum mechanics [34].

      Although ReaxFF was originally developed specifically for combustion reactions, systematic benchmark studies of its performance in predicting gas-phase combustion mechanisms remain relatively limited. Such systematic investigations are crucial for further expanding the applicability and reliability of ReaxFF in combustion simulations. In this work, we constructed a comprehensive dataset of gas-phase combustion reactions, covering key reaction mechanisms commonly encountered in hydrocarbon combustion processes. High-level XYGJ-OS/cc-pVTZ computational results were employed as the reference to benchmark the accuracy of several existing ReaxFF parameter sets [35]. XYGJ-OS offers accuracy comparable to that of coupled-cluster methods but with significantly lower computational cost. While XYG3 and other doubly hybrid density functionals formally scale as the fifth power of system size, XYGJ-OS formally scales as the fourth power, compared to the seventh-power scaling of CCSD(T) [35]. This combination of accuracy and reduced computational demands makes XYGJ-OS a suitable benchmark for the present study. Additionally, we evaluated the performance of widely-used density functionals—including ωB97M-V [36], M06-2X [37], B3LYP-D3 [38], as well as the semi-empirical method PM7 [39]—with XYGJ-OS as the benchmark. The comparative analysis presented here aims to provide valuable insights for optimizing and refining ReaxFF parameters, ultimately enhancing its predictive capabilities in combustion applications.

      This article is structured as follows: Section II introduces the methods of ReaxFF and JAX-ReaxFF, along with dataset preparation. Section III presents the results of the benchmark and optimization of force fields. Finally, Section IV gives the summary of the force field study for gas-phase combustion reactions.

    II.   METHOD
    • In the ReaxFF method, the PES is described by a large number of given parameters (102–103), which are fitted from the results of quantum chemical calculations [14, 40]. The ReaxFF potential function is obtained by the sum of 14 parameterized functions, as shown in Eq.(1). It includes bond energy (Ebond), over/undercoordination (Eover, Eunder), valence angle energy (Eval), torsion angle energy (Etorsion), C2 correction term (EC2) and additional terms used in the dynamic bonding model, such as lone pair (Elp) [41], three-body energy (Etriple) and penalty (Epen), three-body conjugation (Ecoa) and four-body conjugation (Econj), and energy term EH-bond describing hydrogen bonding in the system. The van der Waals energy (EvdWaals) based on the Morse potential, and the electrostatic energy term (ECoulomb) based on dynamic charges calculated from charge models with shielded and range-limited.

      ​An important difference between ReaxFF and other force fields is the introduction of dynamic partial charges and bond order (BO), and the latter is used to determine the bonding and nonbonding within the system. The ReaxFF BO serves as an indicator for determining atomic types and interatomic distances, and it can be modified by considering the information of all atoms surrounding each atom. The ReaxFF BO’s original definition is as follows:

      In the above equation, ro represents the equilibrium bond length between the corresponding atoms. $ {r}_{ij} $, $ {r}_{ij}^{\mathrm{\pi }} $, $ {r}_{ij}^{\mathrm{\pi }\mathrm{\pi }} $ denote the ideal bond lengths for the σ-σ, σ-π, and π-π bonds, respectively. The parameters $ p_{\mathrm{bo},1} $ to $ p_{\mathrm{bo},6} $ are employed to fit these three types of bonds.

      The ReaxFF BO was further optimized by incorporating additional functions and penalty terms. After the simulation, ReaxFF determines whether two atoms are bonded based on whether the BO is greater than 0.3. It then outputs the molecular formula and structural information of the corresponding species.

    • JAX is a framework based on functional programming [30]. Users first define functions, which can then be transformed or reformulated to generate new functions. In JAX-ReaxFF, the local gradient optimization algorithms performed using the JAX library serve as the key step in the main loop. The core algorithm flowchart of JAX-ReaxFF is illustrated in FIG. 1.

      To reduce computational resource consumption during geometry optimization, a neighbor list was constructed before the initial guess of the input geometry. This allows for the exclusion of irrelevant elements in the loop through the use of masks. Then, the input data are clustered [42] based on the length of the interaction list and ensured to be correctly aligned in memory to enable efficient single-instruction, multiple-data (SIMD) parallel processing. The optimization algorithm starts from the initial geometry (Geoinit) and initial force field (FFinit), and then optimizes the force field through an iterative process.

      The initial atomic forces from the FFinit are only used for the first iteration of optimization and then directly enter the main loop. Atomic forces first undergo a gradient-based parameter optimization, followed by geometry optimization. To implement gradient optimization based on JAX, the following equation is used to track the error function and calculate the parameter gradient.

      Here, m represents a model with a set of force field parameters, xi denotes the predicted results of model m, yi represents the true values from quantum chemical calculations, and σi represents the weights assigned to each training item.

      The clustered data are used in the main loop for geometry optimization. The optimization process outputs the optimized geometry and corresponding forces and calculates the error relative to the true values. The error minimization step uses the geometry from the last iteration (Geomin) and the currently optimized force field (FFcur). After applying the selected gradient-based optimization algorithm, FFcur is updated to the new trained force field. At the end of each iteration, it calculates the true error (Ecur) of the FFcur on the training dataset. If Ecur is lower than the previously recorded minimum error (Ebest), then FFcur is saved as the optimal force field (FFbest). To avoid the optimization process falling into local minima, the optimizer introduces 0.01% noise into FFbest, which helps to escape local minima. Although adding noise may improve the optimization results, it generally increases the training time cost.

      The optimized geometry from the previous step is fed into the gradient parameter optimizer to compute the new atomic forces. Subsequently, the updated geometry and atomic forces are used again for geometry optimization. Each iteration compares the current iteration’s loss with the minimum loss recorded in the historical iterations and then saves the minimum loss value. Compared with existing global optimization schemes, the JAX-ReaxFF framework can reduce the overall training time by three orders of magnitude.

      Based on the JAX-ReaxFF framework, we optimized the ReaxFF parameters. The training set included energy-based, force-based, charge-based, and geometry-based (bond, angle, torsion, etc.) items to describe the behavior of the combustion reactions.

    • We reviewed the primary reaction mechanisms previously proposed for studying gas-phase combustion processes and constructed a comprehensive dataset comprising 33 representative reactions. This dataset covers essential reaction types involved in hydrocarbon fuel combustion and includes reaction systems containing key elements typically encountered in combustion simulations, with molecular sizes ranging from small to moderately complex structures (up to 38 atoms). Specifically, reactions rxn01 to rxn11 describe the detailed mechanism of methane combustion [43], covering essential reaction pathways involved in methane oxidation. Reactions rxn12 to rxn21 represent the combustion mechanism of larger hydrocarbons, focusing on methane-to-hydrocarbon transformations and fundamental reactions associated with hydrocarbon decomposition and oxidation processes [44]. Reactions rxn22 to rxn33 illustrate the soot formation mechanism, covering the critical pathways from small carbon species (C2) up to PAHs (≤A2). This category particularly highlights the hydrogen abstraction and carbon addition (HACA) reaction steps [45]. The thermodynamic feasibility of these reaction mechanisms has been validated in previous studies. Detailed information on each reaction—including reactants, products, number of atoms, spin multiplicities, the number of intrinsic reaction coordinate (IRC) structures, and structures generated through normal mode sampling—is summarized in Table I.

      To construct a comprehensive combustion reaction dataset, we followed the approach proposed by Head-Gordon et al. [46, 47] for dataset generation, and performed IRC calculations and normal mode enhanced sampling for each reaction. For each reaction, IRC calculations were first performed to obtain representative structures along the reaction pathway. Subsequently, normal mode sampling was conducted on each IRC structure to generate additional data points, thereby enabling a more comprehensive coverage of each reaction’s PES. We adopted two normal mode sampling schemes based on the complexity of the reaction system. For reactions involving more than 20 atoms, the three lowest-frequency vibrational modes of each IRC structure were sampled. For reactions involving less than 20 atoms, all vibrational modes were sampled for each IRC structure. During the sampling process, displacements of ±0.01, ±0.025, ±0.05, ±0.075, ±0.1, ±0.125, and ±0.15 Å in the direction of each vibrational mode were applied and the resulting structures were preserved. The IRC and the normal mode calculations were performed at the B3LYP/6-31G* [38, 48] level. Single-point energies and atomic forces for all structures obtained from both IRC calculations and normal mode sampling were re-computed at the XYGJ-OS/cc-pVTZ [35, 49] level of theory, with MP2/aug-cc-pVQZ [50] used as an auxiliary basis set for higher precision calculations.

      In this work, we also check the performance of ωB97M-V [36], M06-2X [37], B3LYP-D3 [38] functionals, and the semi-empirical method PM7 [39]. The cc-pVTZ [49] basis set was used for all DFT calculations. Additionally, we selected three commonly used ReaxFF (CHO2008, CHO2016, CHO-lg), which are specifically designed to describe systems containing C, H, and O elements. Among these, CHO2008 [40] and CHO2016 [51] are the general force fields, and CHO-lg is the force field optimized by Liu and colleagues [52] for specific reaction systems (RDX/HMX, etc.). We evaluated the accuracy of these force fields on our combustion reaction dataset. The best force field was then further optimized by using JAX-ReaxFF, and the optimized ReaxFF was evaluated. All parameters involved in the JAX-ReaxFF optimization are listed in detail in Table II. Before optimization, we excluded those parameters with physical significance such as monomer, dimer, and trimer terms, and removed parameters with fixed values such as relative atomic masses and valence atomic numbers. Among the computational methods mentioned above, DFT calculations of XYGJ-OS, ωB97M-V, M06-2X, and B3LYP-D3 functionals were performed using QChem 5.0 [53]. The PM7 calculation and the calculation of the IRC and PES scan was performed using Gaussian 16 [54].

      Although the local gradient-based optimization implemented with JAX significantly reduces computational costs, substantial memory usage remains an issue when processing large datasets. Considering the limited number of parameters in the ReaxFF, we observed that training with only approximately 10% of the full dataset was sufficient to achieve robust parameter optimization. To balance computational efficiency and accuracy, we applied a proportional random reduction based on the number of structures in each reaction. This approach ensures comprehensive coverage of all reaction pathways and their corresponding PES while significantly reducing computational costs.

    III.   RESULTS
    • After performing enhanced sampling via normal mode analyses of all vibrational frequencies for each IRC structure, we selected a total of 14002, 15288, and 13561 structures for methane combustion (rxn01– rxn11), n-dodecane pyrolysis (rxn12–rxn21), and soot growth (rxn22–rxn33), respectively. We then compared the accuracy of commonly used density functionals, the semi-empirical method PM7, and previously reported ReaxFF parameter sets with the XYGJ-OS reference results in describing potential energies and atomic forces. Considering that different functionals exhibit varying convergence behaviors, there may be slight differences in the final amount of data points. The performance of the re-optimized CHO2008 force field parameters by using the JAX-ReaxFF method was also reported (Table III). ​In Table III, the minimum among the error values of the different methods is highlighted in bold.

      By comparing the results of the DFT calculation, it can be seen that mean absolute errors (MAE) of the energy and force of the three functionals are less than 3 kcal/mol and 3 kcal·mol−1Å−1. In n-dodecane pyrolysis and soot particle growth, ωB97M-V exhibits the highest accuracy in energy calculations, followed by M06-2X, while B3LYP-D3 ranks third. However, in methane combustion, ωB97M-V and M06-2X show lower accuracy than B3LYP-D3 in both energy and force calculations. The energy errors of ωB97M-V and M06-2X for each reaction are presented in FIG. 2, where rxn03, rxn07, and rxn08 exhibit notably larger errors. The MAE of ωB97M-V and M06-2X for rxn03 are 13 kcal/mol and 10 kcal/mol, respectively, while for rxn07 and rxn08, the errors exceed 5 kcal/mol. The rxn03 is the decomposition of CH3OOH, and the rxn07 and rxn08 are the reaction of formic acid with oxygen and the decomposition of formate radical, respectively. ωB97M-V and M06-2X may not provide a highly precise description of these reactions.

      By evaluating the performance of the CHO2008, CHO2016, and CHO-lg force fields in comparison with other methods, we found that the accuracy of ReaxFF is lower than that of DFT and semi-empirical methods. In the process of methane combustion and n-dodecane pyrolysis, the energy MAE of these three force fields is about twice that of the PM7 method, while for soot particle growth, it is far from that of the PM7 method. Notably, optimizing the ReaxFF parameters on the combustion reaction dataset significantly improves the accuracy of the force field. In soot particle growth and methane combustion reactions, the energy MAE is reduced from 19.389 kcal/mol and 9.315 kcal/mol to 3.218 kcal/mol and 6.138 kcal/mol, respectively, achieving an energy accuracy comparable to the semi-empirical method PM7. These results indicate that performing local gradient optimization on the target system can effectively enhance the accuracy of ReaxFF.

    • To evaluate the performance of the force fields along the reaction pathway, we selected a representative reaction from each reaction system for reaction pathway calculations. The IRC structures were obtained at the B3LYP/6-31G$ ^* $ level, and the corresponding energy calculations were performed using XYGJ-OS and the four aforementioned force fields. FIG. 3 presents the potential energy profiles along the IRC path as calculated by XYGJ-OS and ReaxFF. Rxn05 represents a hydrogen transfer reaction. In the rxn05, the potential energy curves calculated using CHO2008, CHO2016, and CHO-lg exhibit significant deviations from the XYGJ-OS reference values, whereas the energy profile obtained from the optimized ReaxFF is in agreement with the XYGJ-OS results. For the ring-closure reaction rxn24, CHO-lg overestimated the reaction energy barrier by 20 kcal/mol and CHO2008 ReaxFF overestimated it by 10 kcal/mol, compared to XYGJ-OS. In contrast, the optimized ReaxFF provides a more accurate description of the potential energy profile.

      As shown in FIG. 3(b), the potential energy curve of rxn18 (the n-pentyl cracking reaction) calculated using CHO2008, CHO2016, and CHO-lg exhibits energy discontinuities near the transition state, which is unreasonable. To further investigate the origin of these discontinuities, we decomposed the energy contributions of the force field components with Eq.(1). FIG. 4 presents the main energy terms in the ReaxFF potential function along the IRC path of rxn18. The results indicate that the discontinuities primarily arise from errors in the Eangle and Etorsion terms, suggesting that the current ReaxFF potential function does not accurately describe the energies of structures near the transition state. Although the optimized ReaxFF still exhibits some discontinuities in certain energy terms, its energy fluctuations are significantly reduced compared to other ReaxFF, resulting in improved performance in the potential energy profile. Therefore, optimizing force field parameters for the reaction system allows for a more precise description of the PES near the transition state, enhancing the continuity and accuracy of the potential energy curve.

    • Additionally, we mapped the PES near the IRC path. A flexible scan was performed for rxn01 at the B3LYP/6-31G$ ^* $ level to obtain structures near the reaction pathway, followed by energy calculations using XYGJ-OS and the four previously mentioned ReaxFF (FIG. 5). As shown in FIG. 5(e), the reaction pathway obtained using the optimized ReaxFF closely aligns with the minimum energy pathway computed at the XYGJ-OS/cc-pVTZ level, demonstrating the effectiveness of the ReaxFF optimization. Furthermore, similar conclusions were obtained for rxn05, as illustrated in FIG. S1 (Supplementary materials, SM).

      The results demonstrate significant potential for improvement in the capability of ReaxFF to describe reaction PES. With further optimization, our force field is expected to bring reaction MD simulations more closely to actual chemical processes. We also recognize that there are some issues with uneven sampling in the creation of our dataset. While points sampled along the IRC path are often uniformly distributed in the reaction coordinate(s), the transition state typically occupies a geometrically narrower region compared to the reactant and product basins. Consequently, fewer sampled structures fall within the immediate geometric vicinity of the transition state compared to those within the broader stable state regions. Eliminating redundancy in the sampled data or implementing more effective sampling strategies for the transition state region could further improve the accuracy of the PES near transition states.

    IV.   CONCLUSION
    • This study systematically evaluated and optimized the ReaxFF for simulating gas-phase combustion reactions. We also benchmarked commonly used density functionals (ωB97M-V, M06-2X, B3LYP-D3) and the semi-empirical PM7 method against high-accuracy XYGJ-OS calculations on a comprehensive dataset of 33 reactions. Our results confirmed the accuracy hierarchy as follows: ωB97M-V > M06-2X > B3LYP-D3 >> PM7. Evaluation of existing ReaxFF parameter sets (CHO2008, CHO2016, CHO-lg) revealed limitations in their ability to accurately describe reaction pathways and potential energy surfaces, particularly for the soot growth system.

      To address these limitations, we leveraged the machine learning-driven JAX-ReaxFF framework to optimize the CHO2008 parameters, utilizing a training set derived from the benchmark dataset. This optimization strategy, which incorporated energy, force, and geometry information, significantly enhanced the accuracy of ReaxFF. The optimized force field achieved a level of accuracy in describing potential energies that was comparable to that of the semi-empirical PM7 method, representing a substantial improvement over the original ReaxFF parameter sets. Specifically, we observed considerable improvements in the calculated potential energy profiles and surfaces, especially in the vicinity of transition states.

      The optimized ReaxFF presented here provides a good tool for molecular dynamics simulations of diverse combustion processes. Furthermore, the methodology employed in this study—combining a comprehensive dataset with the JAX-ReaxFF optimization framework—offers a generalizable approach for developing accurate, system-specific ReaxFF parameters for other complex chemical systems beyond combustion.

      While this work demonstrates some progress, there are still some limitations. The current dataset exhibits a bias towards structures near reactants and products, with relatively fewer data points in the transition state regions. Future work will focus on incorporating more advanced sampling techniques to address this imbalance and further refine the description of transition state regions, thereby enhancing the capability and accuracy of combustion simulations. Additionally, while the optimization significantly improves the accuracy for key reactions and shows promise in initial n-dodecane pyrolysis simulations (FIG. S2 in SM), the broad applicability of this specific optimized ReaxFF across diverse combustion regimes requires further systematic validation against reactions and conditions beyond the current training set. Future work will focus on these assessments. We believe that this work will provide valuable insights for combustion research, helping to design more efficient aircraft engines and reduce emissions.

      Supplementary materials: The potential energy surfaces of the reaction rxn05 under different methods, as well as the results of molecular dynamics simulations for the pyrolysis of n-dodecane.

    Figure (5)  Table (3) Reference (54)

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return