Performance improvements to pdep logging #1765
Merged
Motivation or Problem
This PR brings substantial performance improvements to RMG pressure dependence jobs by cutting down unnecessary costs associated with logging.
Description of Changes
The key change relates to a subtlety in how the logging module is used. We commonly call logging with statements that build the message string before passing it in, for example by calling .format() on the message.
When this executes, Python first performs the string formatting and then passes the result to logging.debug. The formatting cost is therefore paid regardless of whether the logging level requires the message to be written to the log file. However, the intended usage of logging is to pass the arguments to be formatted directly to the logging call.
Note that sprintf-style (%) formatting is required in this case. The logging module then performs the string formatting only if the logging level is such that the statement actually needs to be logged.
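A minimal, self-contained sketch of the difference, using only the standard library. ExpensiveRepr is a hypothetical stand-in for an object such as a Network whose string conversion is costly, and the logger name is arbitrary; this illustrates the logging behavior, not the RMG code itself.

```python
import logging


class ExpensiveRepr:
    """Hypothetical stand-in for an object (e.g. a Network) whose str() is costly."""

    def __init__(self):
        self.str_calls = 0  # counts how often the object was converted to a string

    def __str__(self):
        self.str_calls += 1
        return "<expensive object>"


logger = logging.getLogger("pdep_demo")  # arbitrary logger name for this sketch
logger.setLevel(logging.INFO)  # DEBUG messages will be discarded

obj = ExpensiveRepr()

# Eager: str.format() runs before logger.debug() is even called,
# so __str__ executes although the message is never emitted.
logger.debug("Network: {0}".format(obj))

# Lazy: the object is passed as an argument; because DEBUG is disabled,
# logger.debug() returns early and the %s substitution never happens.
logger.debug("Network: %s", obj)

print(obj.str_calls)  # 1: only the eager call converted the object
```

With the logger at INFO, only the eager call pays for the string conversion; the lazy call is a cheap level check.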
In the rmgpy.pdep module, there were a number of logging.debug calls that formatted Network objects and numpy arrays, which took a substantial amount of time to convert to strings. This PR changes the calls that were taking the most time to use the more efficient syntax. Additionally, the logging levels of a group of related log messages were adjusted to clean up RMG.log a bit: the "Significant corrections" line was kept at warning, while the others were changed to debug.
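One caveat, which this PR does not describe addressing: passing arguments lazily defers only the string conversion. Any expression written as an argument is still evaluated before logging.debug runs. When computing the argument is itself expensive, a common standard-library pattern is to guard the call with Logger.isEnabledFor. In the sketch below, build_network_summary is a hypothetical helper, not an RMG function.

```python
import logging

logger = logging.getLogger("pdep_demo_guard")  # arbitrary logger name for this sketch
logger.setLevel(logging.INFO)  # DEBUG messages will be discarded

calls = {"summary": 0}


def build_network_summary():
    """Hypothetical helper standing in for an expensive computation,
    e.g. summarizing a large numpy array or Network."""
    calls["summary"] += 1
    return "network summary"


# Python evaluates call arguments first, so build_network_summary() runs
# here even though the DEBUG message is then discarded.
logger.debug("Summary: %s", build_network_summary())

# Guarding with isEnabledFor() skips the expensive call entirely
# when DEBUG messages would be discarded anyway.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("Summary: %s", build_network_summary())

print(calls["summary"])  # 1: only the unguarded call did the work
```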
Testing
I tested this on a dodecane pyrolysis model set to terminate at 100 species. The model took 58 minutes to complete on master and 22 minutes on this branch, or ~2.6x faster. The runtime for update_unimolecular_reaction_networks was reduced by ~5x.
Reviewer Tips
Try a pdep job and compare performance to master.