Optimize Gap Analysis by Pruning Gap Analysis Traversal Using Tiered Neo4j Queries to Improve Performance #716
- Introduce tiered gap analysis queries (strong → medium → wildcard)
- Stop traversal early when strong or medium paths exist
- Preserve existing scoring and semantics
- Add unit test to verify Tier-3 traversal is skipped when not needed

Fixes OWASP#506
PR 716: Performance Benchmark

Hi @northdpole,

Thanks for the feedback. I understand the need for safety when touching core functionality. To be absolutely sure, I ran a comparative benchmark on my local environment using the full OpenCRE dataset (18 standards).

1. The "Before vs After" Measurements

I measured the execution time of gap_analysis() for the ASVS -> WSTG pair on both branches.
2. Methodology & Rationale

Why did we test "ASVS -> WSTG"?

Why "Bolt" Protocol?

Environment:
This benchmark confirms the change is both safe (finds the high-quality data) and performant (>10x faster).
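For anyone wanting to repeat a before/after comparison like this one, a minimal timing harness could look like the sketch below. It is an illustration only: best_time and speedup are hypothetical helper names, and the two callables would need to wrap gap_analysis() on each branch.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time (in seconds) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(before: Callable[[], object], after: Callable[[], object]) -> float:
    """Ratio of baseline time to optimized time (values above 1 mean faster)."""
    # Guard against a zero denominator for extremely fast callables.
    return best_time(before) / max(best_time(after), 1e-9)
```

Taking the minimum over several runs reduces noise from caches and background load, which matters when the two timings differ by an order of magnitude.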
…nchmark harness (#748)

fix(gap-analysis): repair broken GAP_ANALYSIS_OPTIMIZED toggle

The toggle added in PR #717 was being overridden by a duplicate gap_analysis method left over from PR #716. Removed the duplicate so the feature toggle actually works as intended.

Also adds scripts/benchmark_gap.py which proved the optimized mode is 99.5% faster and uses 99.6% less memory than the original.

Closes #587
🚀 Prune Gap Analysis Search to Save Time and Memory
This PR implements a tiered pruning strategy for Gap Analysis to significantly reduce execution time and memory usage during map analysis.
The change directly addresses Issue #506 and aligns with the original design discussion around stopping early when strong or medium links are found.
🧠 Problem
Gap analysis currently performs an expensive wildcard Neo4j traversal:
This approach:
In practice, we are only interested in the strongest connections between standards.
✅ Solution: Tiered Pruning Strategy
The search is now executed in three tiers, with early exit once results are found.
Tier 1 – Strong Links
Executed first. If any paths are found, the search stops immediately.
Relationships included:
- LINKED_TO
- AUTOMATICALLY_LINKED_TO
- SAME

These correspond to the strongest connections (penalty = 0) and include equivalence (SAME) relationships.

Tier 2 – Medium Links
Executed only if Tier 1 returns no results.
Relationships included:
- LINKED_TO
- AUTOMATICALLY_LINKED_TO
- SAME
- CONTAINS

This captures hierarchical relationships without falling back to a full wildcard traversal.
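As a rough illustration of how the Tier 1 and Tier 2 relationship sets could be turned into a typed, variable-length Cypher pattern, here is a minimal Python sketch. The helper build_path_query, the NeoStandard label, the name property, the $source/$target parameters and the max_hops bound are all assumptions made for this example, not the identifiers used in this PR.

```python
from typing import Optional, Sequence

# Relationship sets per tier, as listed in this description.
STRONG = ("LINKED_TO", "AUTOMATICALLY_LINKED_TO", "SAME")
MEDIUM = STRONG + ("CONTAINS",)


def build_path_query(rel_types: Optional[Sequence[str]], max_hops: int = 20) -> str:
    """Return a Cypher path query restricted to rel_types (None means wildcard)."""
    # Cypher separates alternative relationship types with "|".
    type_filter = ":" + "|".join(rel_types) if rel_types else ""
    return (
        "MATCH p = (a:NeoStandard {name: $source})"  # hypothetical label/property
        f"-[{type_filter}*..{max_hops}]-"
        "(b:NeoStandard {name: $target}) RETURN p"
    )
```

Calling build_path_query(STRONG) yields a pattern containing [:LINKED_TO|AUTOMATICALLY_LINKED_TO|SAME*..20], while build_path_query(None) yields the untyped [*..20] wildcard pattern.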
Tier 3 – Fallback (Wildcard)
Executed only if Tier 1 and Tier 2 return no paths.
This preserves existing behavior as a fallback to ensure no loss of coverage.
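The three-tier flow with early exit can be sketched as follows. This is a simplified illustration, not the code in this PR: tiered_search and the run_query callback are hypothetical names, and run_query stands in for whatever executes the Neo4j query for a given set of relationship types.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Tier = Tuple[str, Optional[Sequence[str]]]

TIERS: List[Tier] = [
    ("strong", ("LINKED_TO", "AUTOMATICALLY_LINKED_TO", "SAME")),
    ("medium", ("LINKED_TO", "AUTOMATICALLY_LINKED_TO", "SAME", "CONTAINS")),
    ("wildcard", None),  # None stands for "any relationship type"
]


def tiered_search(
    run_query: Callable[[Optional[Sequence[str]]], list],
) -> Tuple[str, list]:
    """Run each tier in order and return at the first one that yields paths."""
    for name, rel_types in TIERS:
        paths = run_query(rel_types)
        if paths:
            return name, paths  # early exit: later tiers are never queried
    return "none", []
```

Because the wildcard tier comes last, it only runs when both typed tiers return nothing, so pairs that had results before still get results.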
🧪 Testing
A new unit test has been added to verify pruning behavior:
Test command:
All existing gap analysis tests continue to pass.
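One way a pruning check of this kind can be written is with a mock standing in for the database call, so the test can assert which tiers were queried. The sketch below is illustrative only: find_paths, its run_query parameter and the test names are hypothetical and are not the function or test added in this PR.

```python
import unittest
from unittest.mock import Mock


def find_paths(run_query):
    """Hypothetical tiered search: stop at the first tier that returns paths."""
    for tier in ("strong", "medium", "wildcard"):
        paths = run_query(tier)
        if paths:
            return paths
    return []


class TestTierPruning(unittest.TestCase):
    def test_wildcard_skipped_when_strong_paths_exist(self):
        run_query = Mock(return_value=["path"])  # every tier would find a path
        self.assertEqual(find_paths(run_query), ["path"])
        # Only the strong tier was queried; medium and wildcard were pruned.
        run_query.assert_called_once_with("strong")

    def test_wildcard_used_as_last_resort(self):
        run_query = Mock(side_effect=[[], [], ["path"]])  # only Tier 3 finds one
        self.assertEqual(find_paths(run_query), ["path"])
        self.assertEqual(run_query.call_count, 3)
```

Asserting on the mock's call history, rather than only on the returned paths, is what distinguishes a pruning test from an ordinary correctness test.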
📈 Impact
🔗 Related Issue
Prune map analysis search to save time and memory
Fixes #506
📝 Notes