Mardochée Réveil, PhD

QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules

Vivin Vinod, Peter Zaspel
6/20/2024

Abstract

Progress in both Machine Learning (ML) and Quantum Chemistry (QC) methods has resulted in high-accuracy ML models for QC properties. Datasets such as MD17 and WS22 have been used to benchmark these models at some level of QC method, or fidelity, where fidelity refers to the accuracy of the chosen QC method. Multifidelity ML (MFML) methods, in which models are trained on data from more than one fidelity, have been shown to be more effective than single-fidelity methods. Much research is progressing in this direction for diverse applications ranging from energy band gaps to excitation energies. One hurdle for effective research here is the lack of a diverse multifidelity dataset for benchmarking. We provide the Quantum chemistry MultiFidelity (QeMFi) dataset, consisting of five fidelities calculated with the TD-DFT formalism. The fidelities differ in their choice of basis set: STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP. QeMFi offers the community a variety of QC properties, such as vertical excitation properties and molecular dipole moments, and further includes QC computation times, allowing the time benefit of multifidelity models for ML-QC to be benchmarked.
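To make "trained on data from more than one fidelity" concrete, here is a minimal sketch of one of the simplest two-fidelity schemes, Δ-learning: fit a model to many cheap low-fidelity samples, then fit a second model to the high-minus-low difference on a few expensive samples taken at the same inputs. This is a generic illustration, not the MFML method benchmarked in the paper; the 1-D functions `f_low` and `f_high` are hypothetical stand-ins for two fidelities (not QeMFi data), and the Gaussian-kernel ridge regression, its length scale, and its regularization `lam` are assumed choices.

```python
import numpy as np


def gaussian_kernel(a, b, length=1.0):
    """Gaussian (RBF) kernel matrix between 1-D sample arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-(d ** 2) / (2.0 * length ** 2))


class KRR:
    """Minimal kernel ridge regression with a Gaussian kernel."""

    def __init__(self, length=1.0, lam=1e-6):
        self.length = length
        self.lam = lam

    def fit(self, x, y):
        self.x_train = x
        k = gaussian_kernel(x, x, self.length)
        # Solve (K + lam * I) alpha = y for the regression weights.
        self.alpha = np.linalg.solve(k + self.lam * np.eye(len(x)), y)
        return self

    def predict(self, x):
        return gaussian_kernel(x, self.x_train, self.length) @ self.alpha


# Hypothetical 1-D stand-ins for a cheap and an expensive fidelity (not QeMFi data).
def f_low(x):
    return np.sin(x)


def f_high(x):
    return np.sin(x) + 0.3 * x


x_low = np.linspace(0.0, 6.0, 50)  # many cheap low-fidelity samples
x_high = x_low[::12]               # 5 expensive samples, nested inside x_low

# Delta-learning: learn the low fidelity, then learn the high-minus-low correction.
low_model = KRR().fit(x_low, f_low(x_low))
delta_model = KRR().fit(x_high, f_high(x_high) - f_low(x_high))


def predict_two_fidelity(x):
    return low_model.predict(x) + delta_model.predict(x)


# Baseline: a single-fidelity model trained only on the 5 expensive samples.
single_model = KRR().fit(x_high, f_high(x_high))

x_test = np.linspace(0.5, 5.5, 101)
mae_two = np.mean(np.abs(predict_two_fidelity(x_test) - f_high(x_test)))
mae_single = np.mean(np.abs(single_model.predict(x_test) - f_high(x_test)))
print(f"two-fidelity MAE: {mae_two:.4f}  single-fidelity MAE: {mae_single:.4f}")
```

The printed errors let you compare the two models on this toy problem only; how such schemes behave on real QC properties such as excitation energies is exactly what a benchmark dataset like QeMFi is meant to test.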

AI-Generated Overview


  • Research Focus: The paper introduces the Quantum chemistry MultiFidelity (QeMFi) dataset, which contains multifidelity quantum chemical properties for various molecules to facilitate the benchmarking of multifidelity machine learning (MFML) methods.

  • Methodology: The dataset was generated by sampling 15,000 geometries from the WS22 database for each of nine diverse molecules, and quantum chemical (QC) properties were calculated at five levels of fidelity using the TD-DFT formalism.

  • Results: QeMFi comprises 135,000 data points that provide information on ground state energies, vertical excitation energies, dipole moments, and QC computation times across varying fidelity levels.

  • Key Contribution(s): The dataset is notable for being the first multifidelity quantum chemical dataset that includes the computation time of its quantum chemistry calculations, thus enabling meaningful assessments of the computational efficiency of MFML methods.

  • Significance: By presenting a diverse and comprehensive multifidelity dataset alongside associated computational time data, QeMFi supports advancements in machine learning techniques within quantum chemistry, facilitating more efficient model training and validation.

  • Broader Applications: The QeMFi dataset can be leveraged in various fields that utilize quantum chemical modeling and machine learning, such as materials science, drug discovery, and chemical engineering, to foster improved predictive modeling and computational resource management.
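Recorded computation times are what let the cost side of an MFML comparison be computed rather than guessed. A minimal sketch of that bookkeeping, assuming a nested training design in which each lower fidelity has `gamma` times as many samples as the one above it, and using hypothetical per-sample times in arbitrary units (`TIME_PER_SAMPLE` is a placeholder, not QeMFi's measured values):

```python
# The five QeMFi basis-set fidelities, ordered lowest to highest.
FIDELITIES = ["STO-3G", "3-21G", "6-31G", "def2-SVP", "def2-TZVP"]

# Hypothetical per-sample QC times in arbitrary units: placeholders, NOT QeMFi values.
TIME_PER_SAMPLE = [1.0, 2.0, 3.0, 10.0, 60.0]


def nested_sizes(n_top, gamma=2):
    """Samples per fidelity (lowest first): n_top at the highest fidelity and
    gamma times more at each step down (an assumed design)."""
    k = len(FIDELITIES)
    return [n_top * gamma ** (k - 1 - i) for i in range(k)]


def training_cost(sizes, times):
    """Total QC time to generate a training set: the sum of n_f * t_f."""
    return sum(n * t for n, t in zip(sizes, times))


sizes = nested_sizes(32)
mf_cost = training_cost(sizes, TIME_PER_SAMPLE)
# For comparison: 1024 samples (the size of the lowest level) all at def2-TZVP.
sf_cost = 1024 * TIME_PER_SAMPLE[-1]

for name, n in zip(FIDELITIES, sizes):
    print(f"{name:>10}: {n} samples")
print(f"multifidelity cost: {mf_cost}, single-fidelity cost: {sf_cost}")
```

Substituting real timings for the placeholders gives the cost half of a time-benefit benchmark; whether the cheaper multifidelity set reaches the same accuracy still has to be established by training and testing the models.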
