MING: A Functional Approach to Learning Molecular Generative Models

Abstract

Traditional molecule generation methods often rely on sequence or graph-based representations, which can limit their expressive power or require complex permutation-equivariant architectures. This paper introduces a novel paradigm for learning molecule generative models based on functional representations. Specifically, we propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in the function space. Unlike standard diffusion processes in the data space, MING employs a novel functional denoising probabilistic process, which jointly denoises the information in both the function's input and output spaces by leveraging an expectation-maximization procedure for latent implicit neural representations of data. This approach allows for a simple yet effective model design that accurately captures underlying function distributions. Experimental results on molecule-related datasets demonstrate MING's superior performance and ability to generate plausible molecular samples, surpassing state-of-the-art data-space methods while offering a more streamlined architecture and significantly faster generation times.

Dependency

MING is built on Python 3.10.1 and PyTorch 1.12.1. To install the additional packages, run:

pip install -r requirements.txt

Then install RDKit for handling molecular graphs:

conda install -c conda-forge rdkit=2020.09.1.0
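Putting the pinned versions together, a clean environment can be set up roughly as follows (the environment name `ming-env` is illustrative, not prescribed by the repository):

```shell
# Create an isolated environment with the pinned Python version
conda create -n ming-env python=3.10.1 -y
conda activate ming-env

# Install the pinned PyTorch release, then the remaining requirements
pip install torch==1.12.1
pip install -r requirements.txt

# RDKit for molecular graph handling
conda install -c conda-forge rdkit=2020.09.1.0 -y
```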

Data

We follow GDSS to set up QM9 and ZINC250k, and DiGress to set up MOSES. To download the data, run:

sh setup.sh

Training

We provide MING's hyperparameters in the config/exp folder.

cd ming
sh sh_run.sh -d ${dataset} -t diff -e exp -n ${name}

where:

  • dataset: data type (in config/data)
  • name: name of experiment (in exp/name)

Example:

cd ming
sh sh_run.sh -d zinc -t diff -e exp -n zinc

Sampling

Set a suitable sampling batch size (`SAMPLER_BATCH`) in ming/sh_run.sh to match your specific hardware constraints:

export SAMPLER_BATCH=2024
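The effect of such a cap is to split the total number of requested samples into passes of at most `SAMPLER_BATCH` molecules each, bounding per-pass memory. A minimal sketch (this helper is illustrative, not the repository's sampler):

```python
def batch_sizes(total: int, batch: int) -> list[int]:
    """Split a total sample count into chunks of at most `batch`,
    mirroring how a SAMPLER_BATCH-style cap bounds each sampling pass."""
    sizes = [batch] * (total // batch)
    if total % batch:
        sizes.append(total % batch)  # final, smaller pass
    return sizes

# e.g. 5000 samples drawn with a cap of 2024 per pass
print(batch_sizes(5000, 2024))  # → [2024, 2024, 952]
```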

We provide code that calculates the mean and standard deviation of each metric over the sampled molecules (3 sampling runs).

cd ming
sh sh_run.sh -d ${dataset} -t sample -e exp -n ${name}

where:

  • dataset: data type (in config/data)
  • name: name of experiment (in exp/name)

Example:

cd ming
sh sh_run.sh -d zinc -t sample -e exp -n zinc
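The reported statistics over the three sampling runs amount to a per-metric mean and standard deviation, as in this minimal sketch (the metric names and values below are made up for illustration):

```python
from statistics import mean, stdev

# Hypothetical per-run metric values from three independent samplings
runs = [
    {"validity": 0.98, "uniqueness": 0.95},
    {"validity": 0.97, "uniqueness": 0.96},
    {"validity": 0.99, "uniqueness": 0.94},
]

# Aggregate each metric across runs into (mean, std)
summary = {
    metric: (mean(r[metric] for r in runs), stdev(r[metric] for r in runs))
    for metric in runs[0]
}
for metric, (m, s) in summary.items():
    print(f"{metric}: {m:.3f} ± {s:.3f}")
```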

To download our model checkpoints, run:

sh setup.sh -t ckpt

Citation

Please cite our work if you find the paper or the released code useful in your research. Thank you!

@inproceedings{nguyen2025ming,
  title={{MING}: A Functional Approach to Learning Molecular Generative Models},
  author={Van Khoa Nguyen and Maciej Falkiewicz and Giangiacomo Mercatali and Alexandros Kalousis},
  booktitle={The 28th International Conference on Artificial Intelligence and Statistics},
  year={2025},
  url={https://openreview.net/forum?id=ofoxdvlzAZ}
}
