
steinbeck/DeepOCSR


DeepOCSR

DeepOCSR is a collaborative project to develop and maintain unified training and test data for deep-learning-based Optical Chemical Structure Recognition (OCSR). OCSR aims to translate bitmaps from 2D chemical structure diagrams (see figure 1) back into a machine-readable representation.

Figure 1: Translating bitmaps of chemical structures back to connection tables with deep neural networks.

A growing number of deep-learning-based OCSR methods are being published (see References). This repository will provide guidance to make these methods easier to compare and to give new developers easier access to proven sets of training and test data. Most, if not all, methods published so far require millions of data points for training. As of today, no sizable datasets exist of real-world images from publications annotated with machine-readable structures, so training data must be generated artificially and then augmented. Augmentation in this context means deliberately degrading image quality through image manipulation (adding noise, etc.) to match the quality of scanned pages from the literature.
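The augmentation step described above can be sketched in a few lines. This is a minimal illustration, not the pipeline of DeepOCSR or any of the cited methods. It assumes each depiction is a grayscale NumPy array in which 1 represents white paper and 0 represents ink. The function `degrade` and its parameters (`noise_prob`, `blur_passes`, `threshold`) are hypothetical. The sketch chains three simple degradations: salt-and-pepper noise, a 3x3 box blur that mimics scanner softness, and re-binarization that mimics a low-quality bilevel scan.

```python
import numpy as np


def degrade(image: np.ndarray, rng: np.random.Generator,
            noise_prob: float = 0.02, blur_passes: int = 1,
            threshold: float = 0.5) -> np.ndarray:
    """Apply simple scan-like degradations to a grayscale image in [0, 1].

    Hypothetical helper for illustration only; assumes 1 = white paper, 0 = ink.
    """
    img = image.astype(np.float64).copy()

    # 1. Salt-and-pepper noise: set a random fraction of pixels to pure black or white.
    mask = rng.random(img.shape) < noise_prob
    img[mask] = rng.integers(0, 2, size=int(mask.sum()))

    # 2. Box blur: replace each pixel by the mean of its 3x3 neighbourhood
    #    (edges padded by repetition), repeated blur_passes times.
    h, w = img.shape
    for _ in range(blur_passes):
        padded = np.pad(img, 1, mode="edge")
        img = sum(padded[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0

    # 3. Re-binarize with a global threshold: 1 = paper, 0 = ink.
    return (img >= threshold).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.ones((32, 32))
    clean[15:18, 4:28] = 0.0  # a 3-pixel-thick horizontal "bond" in black ink
    noisy = degrade(clean, rng)
    print(noisy.shape, noisy.dtype)  # (32, 32) uint8
```

One design note: with a 0.5 threshold, a single 3x3 blur pass erases strokes that are only one pixel wide, so the example bond is drawn three pixels thick. Real augmentation pipelines typically also vary rotation, scale, line width and compression artefacts.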

References

  1. Molecular Structure Extraction from Documents Using Deep Learning : https://pubs.acs.org/doi/full/10.1021/acs.jcim.8b00669 Joshua Staker*, Kyle Marshall*, Robert Abel, and Carolyn M. McQuaw

  2. Img2Mol - Accurate SMILES Recognition from Molecular Graphical Depictions : https://doi.org/10.26434/chemrxiv.14320907.v1 Djork-Arné Clevert, Tuan Le, Robin Winter, Floriane Montanari

  3. Image2SMILES: Transformer-based Molecular Optical Recognition Engine : https://doi.org/10.26434/chemrxiv.14602716.v1 Ivan Khokhlov, Lev Krasnov, Maxim Fedorov, Sergey Sosnin

  4. End-to-End Attention-based Image Captioning : https://arxiv.org/abs/2104.14721 Carola Sundaramoorthy, Lin Ziwen Kelvin, Mahak Sarin, Shubham Gupta

  5. ChemGrapher: Optical Graph Recognition of Chemical Compounds by Deep Learning : https://pubs.acs.org/doi/10.1021/acs.jcim.0c00459 Martijn Oldenhof*, Adam Arany*, Yves Moreau*, and Jaak Simm*

  6. ChemPix: Automated Recognition of Hand-drawn Hydrocarbon Structures Using Deep Learning : https://doi.org/10.26434/chemrxiv.14156957.v1 Hayley Weir, Keiran Thompson, Ben Choi, Amelia Woodward, Augustin Braun, Todd J. Martínez

  7. DECIMER 1.0: Deep Learning for Chemical Image Recognition using Transformers (ChemRxiv preprint, 2021) : https://doi.org/10.26434/chemrxiv.14479287.v1 Kohulan Rajan, Achim Zielesny, Christoph Steinbeck
