What is it about?
Machine learning for molecular simulations has produced a large ecosystem of software, written in different languages (Python, C++, Fortran, Julia) and built on different frameworks (PyTorch, JAX, scikit-learn). These tools work well individually, but exchanging data or models between them requires ad hoc conversion code. Integrating ML models into simulation engines like LAMMPS or GROMACS presents the same problem: each combination of model and simulator needs its own interface.

metatensor addresses the data side. It provides a labeled, block-sparse array format designed for atomistic quantities. The labels carry metadata about what each element represents (atom types, angular momentum channels, spatial components), and the format stores gradients (forces, stress) alongside the values they derive from. The data structure works across Python, C, C++, Rust, and Fortran through a shared C library.

metatomic addresses the model side. It wraps a trained ML model, together with metadata describing its inputs, outputs, and capabilities, into a portable archive. A simulation engine that supports the metatomic interface can load any compliant model without knowing how the model works internally.

The paper also describes the ecosystem built around these two libraries: metatrain for training workflows, featomic for computing atomic descriptors, and integrations with LAMMPS, i-PI, ASE, PLUMED, eOn, and other simulation tools.
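To make the "labeled, block-sparse" idea concrete, here is a minimal pure-Python sketch. This is not metatensor's actual API: the class and field names (`Block`, `sample_labels`, and so on) are invented for illustration. The structure, however, mirrors the description above: values live in blocks keyed by metadata, labels describe what each row and column means, and gradients travel with the values they derive from rather than in a separate object.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a labeled, block-sparse array. NOT the real
# metatensor API; names are illustrative only.

@dataclass
class Block:
    values: list[list[float]]       # dense values for this block (samples x properties)
    sample_labels: list[dict]       # one entry per row, e.g. {"atom": 0}
    property_labels: list[dict]     # one entry per column, e.g. {"energy": 0}
    gradients: dict[str, list] = field(default_factory=dict)  # e.g. "positions"

# Keys identify blocks by metadata, e.g. atom type and angular channel,
# so only the blocks that actually exist are stored (block sparsity).
per_atom_energies = {
    ("atom_type=1", "o3_lambda=0"): Block(
        values=[[0.1], [0.2]],
        sample_labels=[{"atom": 0}, {"atom": 1}],
        property_labels=[{"energy": 0}],
        # one 3-vector gradient per sample, stored with its values
        gradients={"positions": [[0.0, 0.0, 0.1], [0.0, 0.0, 0.2]]},
    ),
}

block = per_atom_energies[("atom_type=1", "o3_lambda=0")]
print(len(block.values))               # 2  (two samples in this block)
print("positions" in block.gradients)  # True (gradients travel with values)
```

Keeping gradients inside the block means a consumer (say, a simulator that needs forces) never has to guess which gradient array corresponds to which values; the association is part of the data structure itself.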
Why is it important?
The atomistic ML field has a fragmentation problem. Each group builds models in their preferred framework, and using someone else's model in your simulation requires writing interface code specific to that model-simulator pair. The number of required interfaces grows as the product of models and simulators, which does not scale. metatensor and metatomic reduce this to a sum: each model implements the metatomic interface once, each simulator implements it once, and all combinations work. This is the same pattern that made file formats like HDF5 and protocols like MPI successful in scientific computing.

The practical consequence: a researcher can train a model with metatrain, export it as a metatomic archive, and run it in LAMMPS, i-PI, or eOn without writing any interface code. The model's metadata ensures that the simulator uses it correctly (right units, right neighbor list settings, right output quantities). The libraries are designed for long-term maintainability, with CI testing across platforms, semantic versioning, and backwards-compatible data formats.
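The product-versus-sum argument can be sketched with a toy shared interface. The names below (`AtomisticModel`, `compute_energy`, `ToyPairPotential`) are hypothetical and much simpler than metatomic's real calling convention; the point is only that the "simulator" function depends on the interface, not on any particular model, so every compliant model plugs in with zero model-specific code.

```python
from typing import Protocol

class AtomisticModel(Protocol):
    """A shared calling convention (sketch only, not metatomic's interface)."""
    def compute_energy(self, positions: list[tuple[float, float, float]]) -> float: ...

class ToyPairPotential:
    """One of many possible models; the simulator never sees its internals."""
    def compute_energy(self, positions):
        # pairwise 1/r repulsion, purely for illustration
        e = 0.0
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                r = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j])) ** 0.5
                e += 1.0 / r
        return e

def run_single_point(model: AtomisticModel, positions):
    """A 'simulator' written once against the interface, reusable with any model."""
    return model.compute_energy(positions)

print(run_single_point(ToyPairPotential(), [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]))  # 0.5
```

With N models and M simulators, pairwise bridges require N x M pieces of glue code; a shared interface requires only N + M adapters, one per tool.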
Perspectives
My contributions to this paper were the eOn integration section and the PLUMED integration. Both connect metatensor to tools I use directly in my saddle point search work. The eOn integration demonstrates the practical value of metatomic: by implementing the interface in eOn, any metatensor-compatible ML potential (PET-MAD, SOAP-based models, equivariant neural networks) becomes immediately available for saddle point searches, NEB calculations, and long-timescale dynamics without any model-specific code in eOn. eOn's core is C++, and metatensor's C API fits that architecture naturally. The PLUMED integration extends this interoperability to enhanced sampling workflows, where ML-based collective variables defined through metatensor can drive metadynamics and related methods.
Rohit Goswami
University of Iceland
Read the Original
This page is a summary of: metatensor and metatomic: Foundational libraries for interoperable atomistic machine learning, The Journal of Chemical Physics, February 2026, American Institute of Physics, DOI: 10.1063/5.0304911.