Currently, gap analysis requires 64 GB of RAM on a large server, plus an external Neo4j cluster and an external Redis instance, in order to calculate ~10 GB worth of graph shortest paths.
This crashes most commercial laptops and takes more than 24 hours on a GCP medium machine.
There are many micro-optimizations we can make so that gap analysis is faster and less resource-intensive, such as:
- preload only the relevant subgraphs in Neo4j
- reuse precalculated paths
- optimize the Cypher queries and the Redis usage
- for any pair of standards, avoid trying to calculate a path between every node of standard A and every node of standard B
- experiment with cutting out the standards and calculating the gap analysis between the relevant CREs instead; since we only have 400 CREs, this should be much faster than calculating gaps between thousands of standard nodes
- optimize the Python code so that it does not access memory repeatedly
- reduce the information reported in the final result
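
Two of the ideas above, reusing precalculated paths and computing gaps between CREs rather than between standard nodes, can be illustrated with a small in-memory sketch. This is not the project's implementation: the real data lives in Neo4j and Redis, and every name here (`CRE_LINKS`, `STANDARD_TO_CRES`, `distances_from`, `gap_analysis`) and all of the toy data are hypothetical. It assumes each standard node maps to a single CRE and treats CRE links as undirected.

```python
from collections import deque
from functools import lru_cache

# Hypothetical toy data: undirected CRE-to-CRE links.
CRE_LINKS = {
    "CRE-1": {"CRE-2"},
    "CRE-2": {"CRE-1", "CRE-3"},
    "CRE-3": {"CRE-2"},
    "CRE-4": set(),
}

# Hypothetical mapping of each standard's nodes to the CRE they link to.
STANDARD_TO_CRES = {
    "StdA": {"A-1": "CRE-1", "A-2": "CRE-4"},
    "StdB": {"B-1": "CRE-3"},
}


@lru_cache(maxsize=None)
def distances_from(source):
    """Breadth-first search from one CRE, cached so each CRE is explored once."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in CRE_LINKS.get(current, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def gap_analysis(standard_a, standard_b):
    """For each node of standard A, report the fewest CRE hops to any CRE of standard B."""
    cres_b = set(STANDARD_TO_CRES[standard_b].values())
    result = {}
    for node, cre in STANDARD_TO_CRES[standard_a].items():
        reachable = distances_from(cre)
        hops = [reachable[c] for c in cres_b if c in reachable]
        result[node] = min(hops) if hops else None  # None means no path was found
    return result


print(gap_analysis("StdA", "StdB"))  # {'A-1': 2, 'A-2': None}
```

Because the search runs once per CRE and is then cached, the amount of path-finding work is bounded by the number of CREs rather than by the number of node pairs across the two standards. Whether this gives results equivalent to the current node-to-node analysis is something the experiment would need to confirm.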