-
Notifications
You must be signed in to change notification settings - Fork 0
FinalReview #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
FinalReview #8
Conversation
NadineSchneider
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of it is only minor changes or are suggestions for changing the order of arguments
I'll continue with the "python part" later
| class ROMol; | ||
|
|
||
| struct AdditionalOutput { | ||
| // will review this structure once more fignerprint types are implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor, but this comment could be deleted
| @@ -0,0 +1,272 @@ | |||
| #ifndef RD_FINGERPRINTGEN_H_2018_05 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copyright text should be added here and to all other files, see for example; rdkit@cb81660
| public: | ||
| FingerprintArguments(const bool countSimulation, | ||
| const std::vector<std::uint32_t> countBounds, | ||
| const std::uint32_t foldedSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should avoid the word 'folded' here to and just take 'size'?
| *bitInfoMap; | ||
| // morgan fp | ||
| // maps bitId -> vector of (atomId, radius) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should have better names here for the bitInfoMap and bitInfo? Maybe just bitInfoMorgan and bitInfoRDKitFP?
| const std::vector<std::uint32_t> *atomInvariants = nullptr, | ||
| const std::vector<std::uint32_t> *bondInvariants = nullptr, | ||
| const bool hashResults = false) const = 0; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should change the order of the arguments here to: mol, arguments, fromAtoms, addtionalOutput, atomInvariant, bondInvariant, ignoreAtoms, confId, hasResults
I would expect this rather reflects the "importance" of the arguments
| const std::uint32_t foldedSize = 2048, | ||
| const std::vector<std::uint32_t> countBounds = {1, 2, 4, 8}, | ||
| const bool ownsAtomInvGen = false, const bool ownsBondInvGen = false); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I would suggest to change the order of the arguments to: radius, foldedSize , includeChirality , useBondTypes, atomInvariantsGenerator, bondInvariantsGenerator, ownsAtomInvGen, ownsBondInvGen, countSimulation, onlyNonzeroInvariants, countBounds,
| template <typename OutputType> | ||
| std::vector<AtomEnvironment<OutputType> *> | ||
| MorganEnvGenerator<OutputType>::getEnvironments( | ||
| const ROMol &mol, FingerprintArguments<OutputType> *arguments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could part of this code also be moved to the FingerprintUtils file since most of the code is the same as for the 'old' version?
| std::vector<AtomEnvironment<OutputType> *> getEnvironments( | ||
| const ROMol &mol, FingerprintArguments<OutputType> *arguments, | ||
| const std::vector<std::uint32_t> *fromAtoms, | ||
| const std::vector<std::uint32_t> *ignoreAtoms, const int confId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest moving ignoreAtoms and confId at the end of the arguments
| */ | ||
| template <typename OutputType> | ||
| FingerprintGenerator<OutputType> *getRDKitFPGenerator( | ||
| const unsigned int minPath = 1, const unsigned int maxPath = 7, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I also would move 'foldedSize' to position 3
| AtomInvariantsGenerator *atomInvariantsGenerator = nullptr, | ||
| const bool countSimulation = true, | ||
| const std::vector<std::uint32_t> countBounds = {1, 2, 4, 8}, | ||
| const std::uint32_t foldedSize = 2048, const bool ownsAtomInvGen = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also, I would suggest moving foldedSize to position 3 and ownsAtomInvGen after the atomInvariantGenerator
|
Thanks for implementing the bulkFP generator. Looks good to me! |
| fp = g.GetSparseCountFingerprint(m) | ||
| nz = fp.GetNonzeroElements() | ||
| self.assertEqual(len(nz), 1) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the additional output available in Python? If yes, could we add a test for that too. If not, could we add it to Python?
In general we should try to have a few more tests in Python, like the mixing of different FPs.
| } | ||
|
|
||
| template <typename OutputType> | ||
| ExplicitBitVect *getFingerprint(const FingerprintGenerator<OutputType> *fpGen, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the additional output available in Python? It seems like it is missing, it would be important to add this to the Python interface.
| } | ||
| } | ||
|
|
||
| void testAtomPairFoldedBitvector() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should change the name of this test.
| } | ||
|
|
||
| // Old version test name testHashedAtomPairs | ||
| void testFoldedAtomPairs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also, 'folded' should be deleted or replaced.
| BOOST_LOG(rdErrorLog) << " done" << std::endl; | ||
| } | ||
|
|
||
| void testChiralPairs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to change the name of this test to something else like testAtomPairsIncludingChirality
| delete m1; | ||
| delete generator; | ||
| } | ||
| // additional outputs are not fully functional yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still true? If yes, we should try to have it functional.
| "source": [ | ||
| "# Fingerprint Generators" | ||
| ] | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit introduction to this notebook would be great. Something mentioning that this is a new implementation of four different FPs and that it will allow more flexibility.
…es not affect count fingerprints
2969716 to
f8cd911
Compare
…rints_Final' into dev/GSOC2018_Generalized_Fingerprints_Final
GSoC 2018 - New fingerprinting API
This PR is the outcome of the Google Summer of Code project "Create a generalized fingerprinting API"
Summary
The RDKit already supports a broad variety of fingerprinting functionality but is limited by the fact that the current fingerprint implementations are all independent and have slightly different APIs. This makes things unnecessarily complicated for users and requires people who are interested in exploring new types of fingerprints to write far more code than it seems like they should have to. In this project we have created a generic decoupled fingerprinting module with a consistent and flexible API.
The changes contained here introduces this new fingerprinting API that makes it possible to use different types of molecular fingerprinting methods with a similar interface. It allows the creation of fingerprint generator objects with the same structure hence enabling users to easily substitute generator objects with each other. It also allows implementation of new fingerprinting methods by providing an abstract structure to work on and offering common fingerprint functionality such as size-limiting, count simulation, etc. out of the box.
The new API also offers more flexibility in the fingerprinting process: different fingerprint methods/algorithms can now be combined like using the Morgan atom invariants in the AtomPair fingerprint.
Outcomes
Tutorial
A Jupyter notebook is added to the repository exemplifying the usage of the new API and showing the combination possibilities using the new API. The tutorial can be found in: rdkit/docs/Notebooks/FingerprintGenerators.ipynb
Tests
Unit tests for the existing fingerprinting functions are duplicated with necessary adjustments for the new API. Some additional tests have been added for newly introduced components and the test cases for the functionality that does not currently exist in the new API is skipped.
Todo
Future work
Apart from adding the mentioned missing parts there are also possible improvements that can be built on the new API.