Traditional molecule generation methods often rely on sequence or graph-based representations, which can limit their expressive power or require complex permutation-equivariant architectures. This paper introduces a novel paradigm for learning molecule generative models based on functional representations. Specifically, we propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in the function space. Unlike standard diffusion processes in the data space, MING employs a novel functional denoising probabilistic process, which jointly denoises the information in both the function's input and output spaces by leveraging an expectation-maximization procedure for latent implicit neural representations of data. This approach allows for a simple yet effective model design that accurately captures underlying function distributions. Experimental results on molecule-related datasets demonstrate MING's superior performance and ability to generate plausible molecular samples, surpassing state-of-the-art data-space methods while offering a more streamlined architecture and significantly faster generation times.
MING is built on Python 3.10.1 and PyTorch 1.12.1. To install the additional packages, run:
```
pip install -r requirements.txt
```
Then install RDKit for molecule graphs:
```
conda install -c conda-forge rdkit=2020.09.1.0
```
We follow GDSS to set up QM9 and ZINC250k, and DiGress to set up MOSES. To download the data, run:
```
sh setup.sh
```
We provide MING's hyperparameters in the `config/exp` folder. To train a model, run:
```
cd ming
sh sh_run.sh -d ${dataset} -t diff -e exp -n ${name}
```
where:
- `dataset`: data type (in `config/data`)
- `name`: name of the experiment (in `exp/name`)
Example:
```
cd ming
sh sh_run.sh -d zinc -t diff -e exp -n zinc
```
Set a different sampling batch size (`SAMPLER_BATCH`) in `ming/sh_run.sh` to fit specific hardware constraints:
```
export SAMPLER_BATCH=2024
```
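Inside Python code, such a variable would typically be read from the environment; a minimal sketch (the fallback default below is an assumption, not the value the repo uses — check `ming/sh_run.sh`):

```python
import os

# Read the sampling batch size exported by sh_run.sh; the fallback
# value here is hypothetical -- the scripts set SAMPLER_BATCH explicitly.
sampler_batch = int(os.environ.get("SAMPLER_BATCH", "1024"))
print(sampler_batch)
```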
We provide code that calculates the mean and standard deviation of different metrics on sampled molecules (three sampling runs). To sample, run:
```
cd ming
sh sh_run.sh -d ${dataset} -t sample -e exp -n ${name}
```
where:
- `dataset`: data type (in `config/data`)
- `name`: name of the experiment (in `exp/name`)
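For reference, the reported statistics amount to a mean and sample standard deviation over the three runs; a minimal sketch with illustrative numbers (not actual results from the paper):

```python
from statistics import mean, stdev

# Hypothetical metric values (e.g. validity) from three sampling runs;
# the numbers are illustrative only, not results from the paper.
runs = [0.95, 0.93, 0.94]
print(f"{mean(runs):.4f} +/- {stdev(runs):.4f}")  # -> 0.9400 +/- 0.0100
```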
Example:
```
cd ming
sh sh_run.sh -d zinc -t sample -e exp -n zinc
```
To download our model checkpoints, run:
```
sh setup.sh -t ckpt
```
Please cite our paper if you find it or the released code useful in your research. Thank you!
```bibtex
@inproceedings{
  nguyen2025ming,
  title={{MING}: A Functional Approach to Learning Molecular Generative Models},
  author={Van Khoa Nguyen and Maciej Falkiewicz and Giangiacomo Mercatali and Alexandros Kalousis},
  booktitle={The 28th International Conference on Artificial Intelligence and Statistics},
  year={2025},
  url={https://openreview.net/forum?id=ofoxdvlzAZ}
}
```
