What is it about?
Machine learning for molecular simulations has produced a large ecosystem of software, written in different languages (Python, C++, Fortran, Julia) and built on different frameworks (PyTorch, JAX, scikit-learn). These tools work well individually, but exchanging data or models between them requires ad hoc conversion code. Integrating ML models into simulation engines like LAMMPS or GROMACS presents the same problem: each combination of model and simulator needs its own interface.

metatensor addresses the data side. It provides a labeled, block-sparse array format designed for atomistic quantities. The labels carry metadata about what each element represents (atom types, angular momentum channels, spatial components), and the format stores gradients (forces, stress) alongside the values they derive from. The data structure works across Python, C, C++, Rust, and Fortran through a shared C library.

metatomic addresses the model side. It wraps a trained ML model, together with metadata describing its inputs, outputs, and capabilities, into a portable archive. A simulation engine that supports the metatomic interface can load any compliant model without knowing how the model works internally.

The paper also describes the ecosystem built around these two libraries: metatrain for training workflows, featomic for computing atomic descriptors, and integrations with LAMMPS, i-PI, ASE, PLUMED, eOn, and other simulation tools.
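To make the "labeled, block-sparse" idea concrete, here is a minimal pure-Python sketch. This is not metatensor's actual API: the class and field names (`Block`, `sample_labels`, and so on) are invented for illustration. The structure, however, mirrors the description above: values live in blocks keyed by metadata, labels describe what each row and column means, and gradients travel with the values they derive from rather than in a separate object.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a labeled, block-sparse array. NOT the real
# metatensor API; names are illustrative only.

@dataclass
class Block:
    values: list[list[float]]       # dense values for this block (samples x properties)
    sample_labels: list[dict]       # one entry per row, e.g. {"atom": 0}
    property_labels: list[dict]     # one entry per column, e.g. {"energy": 0}
    gradients: dict[str, list] = field(default_factory=dict)  # e.g. "positions"

# Keys identify blocks by metadata, e.g. atom type and angular channel,
# so only the blocks that actually exist are stored (block sparsity).
per_atom_energies = {
    ("atom_type=1", "o3_lambda=0"): Block(
        values=[[0.1], [0.2]],
        sample_labels=[{"atom": 0}, {"atom": 1}],
        property_labels=[{"energy": 0}],
        # one 3-vector gradient per sample, stored with its values
        gradients={"positions": [[0.0, 0.0, 0.1], [0.0, 0.0, 0.2]]},
    ),
}

block = per_atom_energies[("atom_type=1", "o3_lambda=0")]
print(len(block.values))               # 2  (two samples in this block)
print("positions" in block.gradients)  # True (gradients travel with values)
```

Keeping gradients inside the block means a consumer (say, a simulator that needs forces) never has to guess which gradient array corresponds to which values; the association is part of the data structure itself.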
Why is it important?
The atomistic ML field has a fragmentation problem. Each group builds models in their preferred framework, and using someone else's model in your simulation requires writing interface code specific to that model-simulator pair. The number of required interfaces grows as the product of models and simulators, which does not scale. metatensor and metatomic reduce this to a sum: each model implements the metatomic interface once, each simulator implements it once, and all combinations work. This is the same pattern that made file formats like HDF5 and protocols like MPI successful in scientific computing.

The practical consequence: a researcher can train a model with metatrain, export it as a metatomic archive, and run it in LAMMPS, i-PI, or eOn without writing any interface code. The model's metadata ensures that the simulator uses it correctly (right units, right neighbor list settings, right output quantities). The libraries are designed for long-term maintainability, with CI testing across platforms, semantic versioning, and backwards-compatible data formats.
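The product-versus-sum argument can be sketched with a toy shared interface. The names below (`AtomisticModel`, `compute_energy`, `ToyPairPotential`) are hypothetical and much simpler than metatomic's real calling convention; the point is only that the "simulator" function depends on the interface, not on any particular model, so every compliant model plugs in with zero model-specific code.

```python
from typing import Protocol

class AtomisticModel(Protocol):
    """A shared calling convention (sketch only, not metatomic's interface)."""
    def compute_energy(self, positions: list[tuple[float, float, float]]) -> float: ...

class ToyPairPotential:
    """One of many possible models; the simulator never sees its internals."""
    def compute_energy(self, positions):
        # pairwise 1/r repulsion, purely for illustration
        e = 0.0
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                r = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j])) ** 0.5
                e += 1.0 / r
        return e

def run_single_point(model: AtomisticModel, positions):
    """A 'simulator' written once against the interface, reusable with any model."""
    return model.compute_energy(positions)

print(run_single_point(ToyPairPotential(), [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]))  # 0.5
```

With N models and M simulators, pairwise bridges require N x M pieces of glue code; a shared interface requires only N + M adapters, one per tool.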
Perspectives
My contributions to this paper were the eOn integration section and the PLUMED integration. Both connect metatensor to tools I use directly in my saddle point search work. The eOn integration demonstrates the practical value of metatomic: by implementing the interface in eOn, any metatensor-compatible ML potential (PET-MAD, SOAP-based models, equivariant neural networks) becomes immediately available for saddle point searches, NEB calculations, and long-timescale dynamics without any model-specific code in eOn. eOn's core is C++, and metatensor's C API fits that architecture naturally. The PLUMED integration extends this interoperability to enhanced sampling workflows, where ML-based collective variables defined through metatensor can drive metadynamics and related methods.
Rohit Goswami
University of Iceland
Read the Original
This page is a summary of: metatensor and metatomic: Foundational libraries for interoperable atomistic machine learning, The Journal of Chemical Physics, February 2026, American Institute of Physics, DOI: 10.1063/5.0304911.