DrrDom/rdkit-scripts

RDKit scripts

let's make life easier

The purpose of this repository is to collect useful scripts which mainly use RDKit. Contributions are welcome!

Some scripts may require further dependencies.

Comments and recommendations for contributors:

  1. The read_input.py script provides the function read_input. It reads molecules from SMI, SDF, SDF.GZ and PKL files (pickled molecules stored as tuples of mol and mol_title) and from STDIN (SMI and SDF formats are supported), and returns tuples of (mol, mol_title). It is a generator, so it can be used to process large collections of molecules. I advise using this function if you do not need other data from the input files.
  2. The _template.py file can be used as a template for new scripts. Please do not change the names of the input, output, ncpu and verbose arguments. This keeps command-line arguments consistent across scripts.
  3. Add help messages to your scripts.
  4. Ideally, scripts should read from STDIN and write to STDOUT so that they can be combined with pipes. I implemented this in gen_stereo_rdkit.py and gen_conf_rdkit.py.
  5. Any script may contain errors, so use them at your own risk. If you find a mistake, please create an issue and we will fix it. We regularly revise old scripts and fix errors, because every mistake found is only the penultimate one.
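
To illustrate points 1, 2 and 4, here is a minimal sketch of a pipe-friendly script. It is not the repository's _template.py or read_input: the names read_smi, build_parser and main, the short flags and the fallback titles are assumptions made for this example, and it yields plain (smiles, title) tuples rather than RDKit Mol objects so that it runs without any dependencies.

```python
import argparse
import sys


def read_smi(stream):
    """Yield (smiles, title) tuples lazily from a SMILES stream (one molecule per line)."""
    for i, line in enumerate(stream, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        smiles = parts[0]
        # Assumption: fall back to a sequential title when no name column is present.
        title = parts[1] if len(parts) > 1 else f"mol_{i}"
        yield smiles, title


def build_parser():
    """Build a parser with the shared argument names: input, output, ncpu, verbose."""
    parser = argparse.ArgumentParser(description="Echo SMILES and titles (illustrative skeleton).")
    parser.add_argument("-i", "--input", default=None, help="input SMILES file; STDIN if omitted.")
    parser.add_argument("-o", "--output", default=None, help="output file; STDOUT if omitted.")
    parser.add_argument("-c", "--ncpu", type=int, default=1, help="number of CPU cores to use.")
    parser.add_argument("-v", "--verbose", action="store_true", help="print progress to STDERR.")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    src = open(args.input) if args.input else sys.stdin
    dst = open(args.output, "w") if args.output else sys.stdout
    try:
        for n, (smiles, title) in enumerate(read_smi(src), start=1):
            dst.write(f"{smiles}\t{title}\n")
            if args.verbose:
                # Progress goes to STDERR so it never pollutes piped output.
                sys.stderr.write(f"\rprocessed {n} molecules")
    finally:
        if src is not sys.stdin:
            src.close()
        if dst is not sys.stdout:
            dst.close()
```

Calling main() with no arguments reads from STDIN and writes to STDOUT, so a script built this way can sit in the middle of a shell pipeline. Because read_smi is a generator, only one line is held in memory at a time.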

Particular scripts

Manipulate SDF files:

| Script | Description |
| --- | --- |
| add_prefix | Add a prefix to molecule names in SDF file. |
| extractsdf | Extract molecule names and field values from input SDF. |
| extract_mol_by_name | Extract molecules by name (partial name matching) to new SDF file. |
| insert_sdf | Add data from a text file as additional fields to input SDF file. |
| remove_dupl_by_field | Remove entries from SDF file having duplicated mol title or field value. |
| rename_mols | Identify identical entries (conformers) and rename consistently. |
| sdf_field2title | Insert field values into molecular title (or SMILES, or sequential titles). |
| sdf_title2field | Insert molecular title into a given SDF field. |
| strip_blank_lines | Remove empty lines in multi-line field values in input SDF. |

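
As a rough idea of what extracting titles and field values from an SDF involves, the sketch below parses SDF text with the standard library only. It is not the code of extractsdf; the function names iter_sdf_records and parse_record are hypothetical, and the parser assumes well-formed V2000-style records terminated by `$$$$`.

```python
import re

# A data header looks like "> <FIELD_NAME>"; capture the name between the angle brackets.
FIELD_HEADER = re.compile(r"^>.*?<([^>]+)>")


def parse_record(record):
    """Return (title, fields) for one SDF record given as a list of lines."""
    title = record[0].strip() if record else ""
    fields = {}
    name, values = None, []
    for line in record[1:]:
        match = FIELD_HEADER.match(line)
        if match:
            if name is not None:
                fields[name] = "\n".join(values)
            name, values = match.group(1), []
        elif name is not None:
            if line.strip():
                values.append(line)
            else:
                # A blank line terminates the current field value.
                fields[name] = "\n".join(values)
                name, values = None, []
    if name is not None:
        fields[name] = "\n".join(values)
    return title, fields


def iter_sdf_records(lines):
    """Yield (title, fields) for each record in an iterable of SDF lines."""
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip() == "$$$$":
            if record:
                yield parse_record(record)
            record = []
        else:
            record.append(line)
```

Passing an open file object to iter_sdf_records streams records one at a time, which matters for large SDF files.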

Format and file (inter)conversion:

| Script | Description |
| --- | --- |
| cansmi | Return canonical SMILES of input molecules. |
| frags2mols | Save disconnected components as individual molecules with suffix in name. |
| molchemaxon2pdb | Convert molecules to separate PDB files using RDKit & ChemAxon. |
| mols2pdb | Convert molecules (SMI/SDF) to PDB, adding hydrogens and conformers. |
| pkl2sdf | Convert PKL to SDF (e.g. conformers generated by gen_conf_rdkit). |
| sdf2mols | Split SDF into multiple MOL files. |
| sdf2pkl | Convert SDF to multi-conformer PKL (requires sequential mol titles). |
| smi2sdf | Convert SMILES to SDF including extra fields if present. |
| split_pdb | Split PDB by chains and save to separate PDB files. |

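
Splitting a PDB file by chain comes down to grouping coordinate records by the chain identifier, which the PDB format stores in column 22. The sketch below shows that grouping step only; split_pdb_by_chain is a hypothetical name and this is not the implementation used in split_pdb.

```python
from collections import defaultdict


def split_pdb_by_chain(lines):
    """Group ATOM/HETATM lines by chain ID (column 22, i.e. index 21)."""
    chains = defaultdict(list)
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains[line[21]].append(line.rstrip("\n"))
    return dict(chains)
```

Each value of the returned dictionary can then be written to its own file, for example one named after the chain ID. Records without a chain ID end up under the key `" "`.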

Manipulate Mol objects (calculate properties, generate conformers/stereoisomers, filter compounds, etc.):

| Script | Description |
| --- | --- |
| add_h | Add hydrogens to molecules. |
| calc_center_rdkit | Calculate geometric center of atoms. |
| compare_charged_centers | Get SMILES patterns of charged centers in two sets of molecules. |
| count_undefined_stereocenters | Count undefined stereocenters and print names + counts. |
| discard_compounds_rdkit | Remove multi-component & non-organic molecules. |
| draw_mols | Return PNG images of molecules. |
| filter_conf | Filter conformers by RMS value. |
| filter_conf_adv | Select representative conformers using clustering and advanced features. |
| gen_conf_rdkit | Generate conformers. |
| gen_stereo_rdkit | Enumerate stereoisomers (tetrahedral & double bond). |
| gen_stereo_rdkit_native | Enumerate stereoisomers using RDKit’s built-in function. |
| get_map | Calculate UMAP/t-SNE coordinates for input structures. |
| get_mol_center | Return geometric center of molecule. |
| get_substr | Filter molecules by SMARTS (supports multiple patterns & negative matches). |
| get_total_charge | Calculate total formal charge. |
| keep_largest | Keep largest fragment by heavy atom count. |
| mirror_mols | Generate mirrored 3D structures (enantiomers). |
| murcko | Return Murcko scaffolds ignoring stereochemistry. |
| neutralize | Neutralize structures. unipka_protonate3.py |
| physchem_calc | Calculate physicochemical properties (MW, logP, TPSA, QED, etc.). |
| pmapper_descriptors | Calculate 3D pharmacophore descriptors (with pmapper). |
| remove_stereo | Remove stereoconfiguration from all centers. |
| remove_dupl_rdkit | Remove duplicates via InChIKey comparison. |
| rmsd_rdkit | Calculate RMSD (MCS if atom matching fails, with symmetry checks). |
| sanitize_rdkit | Remove molecules with sanitization errors + annotate stereocenters, etc. |
| sphere_exclusion | Select diverse subset of compounds. |
| test_pains | Return list of molecules matching PAINS. |

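
Two of the operations listed above can be sketched in a few lines of RDKit. This is only an illustration of the idea, assuming RDKit is installed; the helper names are chosen for this example and the repository scripts may handle ties, titles and edge cases differently.

```python
from rdkit import Chem


def keep_largest(mol):
    """Return the disconnected fragment with the most heavy atoms."""
    frags = Chem.GetMolFrags(mol, asMols=True)
    return max(frags, key=lambda frag: frag.GetNumHeavyAtoms())


def total_charge(mol):
    """Return the sum of formal charges over all atoms."""
    return Chem.GetFormalCharge(mol)


# Sodium acetate: the salt is neutral overall, the acetate fragment is not.
mol = Chem.MolFromSmiles("CC(=O)[O-].[Na+]")
largest = keep_largest(mol)
print(Chem.MolToSmiles(largest), total_charge(largest))
```

Note that max returns the first fragment when several have the same heavy atom count, so tie-breaking here depends on fragment order.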

Supplementary scripts:

| Script | Description |
| --- | --- |
| binning | Take a table with values and return binned values based on thresholds. |

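
Threshold-based binning can be sketched with the standard library as follows. The function name bin_values and the boundary convention (a value equal to a threshold goes into the upper bin) are assumptions for this example; the binning script may use a different convention.

```python
from bisect import bisect_right


def bin_values(values, thresholds):
    """Map each value to a bin index: 0 below the first threshold, len(thresholds) at or above the last."""
    thresholds = sorted(thresholds)
    return [bisect_right(thresholds, value) for value in values]


print(bin_values([0.2, 5, 7.5, 10, 42], thresholds=[5, 10]))  # -> [0, 1, 1, 2, 2]
```

With n thresholds there are n + 1 bins, numbered from 0.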
Happy RDKitting! :)
