⚡️ Speed up function find_last_node by 21,185%
#226
+2
−1
📄 21,185% (211.85x) speedup for find_last_node in src/algorithms/graph.py
⏱️ Runtime: 93.7 milliseconds → 440 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 21,000% speedup by eliminating a severe O(n×m) algorithmic bottleneck through pre-computing a set of source node IDs.
Key Optimization:
The original implementation used a nested iteration pattern: for each node, it checked all edges to verify the node wasn't a source. With n nodes and m edges, this resulted in O(n×m) comparisons, which is catastrophic for larger graphs.
The optimized version pre-computes a hash set of source IDs once.
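A minimal sketch of the two approaches described above. It assumes nodes are dicts with an "id" key and edges are dicts with a "source" key; the function names are hypothetical, and the actual code in src/algorithms/graph.py may differ.

```python
def find_last_node_nested(nodes, edges):
    # Nested pattern: for each node, scan every edge to confirm
    # the node never appears as a source. Worst case O(n * m).
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_with_set(nodes, edges):
    # Build the set of source IDs in one pass over the edges: O(m).
    sources = {e["source"] for e in edges}
    # One hash lookup per node: O(n) on average.
    return next((n for n in nodes if n["id"] not in sources), None)


# Illustrative data: a three-node chain 1 -> 2 -> 3.
nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]

print(find_last_node_nested(nodes, edges))
print(find_last_node_with_set(nodes, edges))
```

Both calls return the node with ID 3, the only node that is never an edge source. One difference to keep in mind: the set-based version requires the source IDs to be hashable, which the nested version does not.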
This reduces the algorithm to O(m + n): one pass through the edges to build the sources set, followed by one constant-time set lookup per node.
Why This Works:
Python's set implementation uses hash tables, providing average-case constant-time lookups versus the linear scan over the edges required by all(). The line profiler shows the dramatic impact.
Performance Characteristics:
The optimization excels particularly on large graphs:
Small graphs show modest improvements (30-95% faster) since overhead is dominated by Python's interpreter rather than the algorithm. The only slight regression is empty inputs (9-23% slower) where set creation overhead isn't amortized, but this is negligible at sub-microsecond scales.
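To see how the gap grows with graph size, a rough timing sketch along these lines can be used. The chain-shaped test graph, the helper names, and the sizes are illustrative assumptions, not the benchmark behind the figures reported in this PR; absolute timings will vary by machine.

```python
import timeit


def scan_all_edges(nodes, edges):
    # Nested scan: every node is checked against the edge list.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def lookup_in_set(nodes, edges):
    # Pre-computed set of source IDs, then one lookup per node.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def make_chain(size):
    # Hypothetical test graph: 0 -> 1 -> ... -> size - 1.
    # The last node is the only one that is never a source.
    nodes = [{"id": i} for i in range(size)]
    edges = [{"source": i, "target": i + 1} for i in range(size - 1)]
    return nodes, edges


for size in (10, 100, 1000):
    nodes, edges = make_chain(size)
    # Both versions must agree before timing them.
    assert scan_all_edges(nodes, edges) == lookup_in_set(nodes, edges)
    slow = timeit.timeit(lambda: scan_all_edges(nodes, edges), number=5)
    fast = timeit.timeit(lambda: lookup_in_set(nodes, edges), number=5)
    print(f"{size:>5} nodes: scan {slow:.6f}s, set {fast:.6f}s")
```

On a chain like this the nested scan does work proportional to the square of the node count, while the set version grows linearly, so the ratio between the two columns should widen as the size increases.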
Impact: If find_last_node is called in graph processing pipelines or hot paths, this optimization will dramatically reduce execution time, especially for graphs with hundreds or thousands of nodes/edges.
✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
To edit these changes, run git checkout codeflash/optimize-find_last_node-mjmt8n0l and push.