draft autosolve #183
| @@ -0,0 +1,204 @@ | |||
| """Batch and single-molecule retrosynthesis runners. | |||
| smiles: str, | ||
| *, | ||
| llm: str = "anthropic/claude-3-7-sonnet-20250219:adv", | ||
| az_model: str = "Pistachio_100+", |
Separate PR: we need aizynthfinder.
|
|
||
| Returns | ||
| ------- | ||
| dict[str, Any] |
Please describe the output format in more detail.
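One way to make the output format explicit is a typed dict. This is only a sketch: the key names `steps`, `dependencies`, and `image` come from the docstring elsewhere in this PR, while the class name `SynthesisResult` and every element type below are assumptions.

```python
from typing import Any, TypedDict


class SynthesisResult(TypedDict, total=False):
    """Hypothetical shape of the dict returned by ``autosolve``."""

    # One entry per reaction step (element type is an assumption).
    steps: list[dict[str, Any]]
    # Step id -> ids of the steps it depends on (assumed layout).
    dependencies: dict[str, list[str]]
    # Only present when return_image=True (bytes is an assumption).
    image: bytes


# A minimal value that satisfies the type.
example: SynthesisResult = {"steps": [], "dependencies": {}}
```

With something like this, the `Returns` section can point at one named type instead of `dict[str, Any]`.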
| from src.utils.job_context import logger as context_logger | ||
| from src.utils.parse import format_output | ||
|
|
||
| def _run(smiles, **kwargs): |
Make it a public function; avoid "private" functions
| ) | ||
|
|
||
| stability_flag = str(stability_check) | ||
| semaphore = asyncio.Semaphore(max_concurrent) |
You need to document this thoroughly
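For the documentation, a small self-contained sketch of the bounded-concurrency pattern may help. It assumes the blocking pipeline call is pushed to the default thread pool with `run_in_executor`; the names `run_bounded` and `fake_solver` are hypothetical and not part of this PR.

```python
import asyncio
from typing import Callable


async def run_bounded(
    items: list[str],
    worker: Callable[[str], dict],
    max_concurrent: int = 4,
) -> list[dict]:
    """Run a blocking ``worker`` over ``items``, at most ``max_concurrent`` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)
    loop = asyncio.get_running_loop()

    async def process(item: str) -> dict:
        # Wait for a free slot; the slot is released when the block exits.
        async with semaphore:
            # Run the blocking call in the default thread pool so the
            # event loop stays responsive.
            return await loop.run_in_executor(None, worker, item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(process(item) for item in items))


def fake_solver(smiles: str) -> dict:
    # Stand-in for the real pipeline call (hypothetical).
    return {"smiles": smiles, "steps": []}


results = asyncio.run(run_bounded(["CCO", "c1ccccc1"], fake_solver, max_concurrent=1))
```

The semaphore caps how many molecules are in flight; it does not change the order of the returned list.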
| hallucination_mode, hallucination_classifier, | ||
| ) | ||
|
|
||
| stability_flag = str(stability_check) |
| semaphore = asyncio.Semaphore(max_concurrent) | ||
| loop = asyncio.get_running_loop() | ||
|
|
||
| async def _process(smi: str) -> dict[str, Any]: |
| def get_pipeline(): | ||
| """Lazy-import the retrosynthesis pipeline and return a runner callable. | ||
|
|
||
| Imports are deferred so that ``import deepretro.algorithms.autosolve`` |
How heavy is the aizynthfinder import?
Also, I'm not sure whether the LLM libraries are heavy, but they cause issues when they are not installed.
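If the deferred imports stay, one option is a small helper that fails with a readable message when an optional dependency is missing. This is a sketch only; `require` is a hypothetical name, and it is exercised with a stdlib module so it runs anywhere.

```python
import importlib
from types import ModuleType


def require(module_name: str, extra_hint: str = "") -> ModuleType:
    """Import ``module_name`` on first use, with a readable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        hint = f" {extra_hint}" if extra_hint else ""
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed.{hint}"
        ) from exc


# A stdlib module is used here only so the sketch runs without extra packages.
json_mod = require("json")
```

To measure how heavy an import actually is, CPython's `python -X importtime -c "import aizynthfinder"` prints a per-module timing breakdown.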
shreyasvinaya left a comment:
Some simple fixes are still needed: usage examples, type annotations, and rst docs.
Some feedback on autosolve follows.
| from deepretro.migration.job_context import logger as context_logger | ||
| from deepretro.migration.parse import format_output | ||
|
|
||
| def run_retrosynthesis(smiles, **kwargs): |
Please add a usage example and type annotations.
| def autosolve( | ||
| smiles: str, | ||
| *, | ||
| llm: str = "anthropic/claude-3-7-sonnet-20250219:adv", |
switch this to opus 4.6 as default
| az_model: str = "Pistachio_100+", | ||
| stability_check: bool = True, | ||
| hallucination_mode: str = "heuristic", | ||
| hallucination_classifier: Any = None, |
Avoid using Any in type annotations.
| ) | ||
|
|
||
|
|
||
| async def autosolve_async( |
Do we need an async autosolve?
@rbharath do we want to handle this at the function level or at the script level?
I'm not sure how well async functions work in Jupyter notebooks.
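On the notebook question: a Jupyter kernel already runs an event loop, so calling `asyncio.run()` inside a cell raises `RuntimeError`, while a top-level `await` works in recent IPython. If a synchronous entry point is wanted in both settings, a wrapper along these lines is one option. `run_sync` and `fake_autosolve_async` are hypothetical names used for illustration.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine


def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run ``coro`` to completion from synchronous code, with or without a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread (plain script): asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running (e.g. a notebook cell): run the coroutine
    # on a fresh loop in a worker thread and block until it finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_autosolve_async(smiles_list: list[str]) -> list[dict]:
    # Stand-in for the real coroutine (hypothetical body).
    await asyncio.sleep(0)
    return [{"smiles": s} for s in smiles_list]


out = run_sync(fake_autosolve_async(["CCO"]))
```

The thread-based branch blocks the calling loop until the work finishes, which is usually acceptable in a notebook but not inside a server.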
| async def autosolve_async( | ||
| smiles_list: list[str], | ||
| *, | ||
| llm: str = "anthropic/claude-3-7-sonnet-20250219:adv", |
switch default model to opus 4.6
| @@ -6,22 +6,34 @@ | |||
| """ | |||
you can skip changes to this file, it is being fixed in #193
| Examples | ||
| -------- | ||
| >>> from deepretro.algorithms.autosolve import autosolve # doctest: +SKIP | ||
| >>> result = autosolve("c1ccccc1") # doctest: +SKIP |
You might want to move the different ways of calling autosolve to the rst docs.
| ... hallucination_classifier="model_out/", | ||
| ... ) # doctest: +SKIP | ||
| """ | ||
| run_retrosynthesis = get_pipeline() |
Do we need to initialize the pipeline and then call it like a class?
Can it not be called directly?
ARY2260 left a comment:
Left a few comments; concerned about type mixing.
| 4. Recurse on each reactant until all leaves are purchasable. | ||
| 5. Flatten the tree into a step-by-step synthesis plan. | ||
|
|
||
| Examples |
Please keep the examples in the main autosolve function docstring itself.
| ``{"steps": [...], "dependencies": {...}}``. | ||
| When *return_image* is ``True``, also contains ``"image"``. | ||
| """ | ||
| import os |
please keep imports outside the function in general
|
|
||
| if isinstance(classifier, (str, Path)): | ||
| clf = HallucinationClassifier() | ||
| clf.load(str(classifier)) |
Why is this not a DeepChem-style model that uses restore to reload the model?
We do use restore; there is a thin wrapper on top to accommodate the threshold, which we calculate separately.
| from deepretro.utils.parse import format_output | ||
|
|
||
| hallucination_check, hallucination_checker_fn = resolve_hallucination( | ||
| hallucination_mode, hallucination_classifier, |
The hallucination_classifier type is quite mixed. I think it's a path as well as a callable? Why is this variation needed?
Fixed, resolve_hallucination now returns a single callable (or None) instead of a (str, callable) tuple. The hallucination_classifier parameter accepts either a path to a saved model or a pre-loaded instance; resolve_hallucination normalises both into one ready-to-use checker function. The type variation is handled internally, not leaked to the caller.
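For readers of this thread, the normalisation described above could look roughly like the sketch below. It is not the PR's code: `StubClassifier` stands in for `HallucinationClassifier`, the mode names `"off"` and `"ml"` are assumptions (only `"heuristic"` appears in the diff), and the integer in the return tuple is a placeholder whose meaning is not specified here.

```python
from pathlib import Path
from typing import Callable, Optional, Union

# Signature taken from the docstring in this PR: (product, pathways) -> (int, list)
Checker = Callable[[str, list], tuple[int, list]]


class StubClassifier:
    """Hypothetical stand-in for the PR's HallucinationClassifier."""

    def __init__(self) -> None:
        self.loaded_from: Optional[str] = None

    def load(self, path: str) -> None:
        # The real class would restore model weights here.
        self.loaded_from = path

    def check(self, product: str, pathways: list) -> tuple[int, list]:
        # Placeholder: accept every pathway.
        return 0, pathways


def heuristic_checker(product: str, pathways: list) -> tuple[int, list]:
    # Placeholder heuristic: drop empty pathways.
    return 0, [p for p in pathways if p]


def resolve_hallucination(
    mode: str,
    classifier: Union[str, Path, StubClassifier, None] = None,
) -> Optional[Checker]:
    """Normalise the user-facing options into one checker callable, or None."""
    if mode == "off":
        return None
    if mode == "heuristic":
        return heuristic_checker
    if mode == "ml":
        if classifier is None:
            raise ValueError("mode='ml' requires a classifier path or instance")
        if isinstance(classifier, (str, Path)):
            # A path was given: load the model once, here.
            clf = StubClassifier()
            clf.load(str(classifier))
        else:
            # A pre-loaded instance was given: use it as-is.
            clf = classifier
        return clf.check
    raise ValueError(f"unknown hallucination mode: {mode!r}")
```

Callers then only ever see `None` or a callable, regardless of how the classifier was supplied.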
| depth: int = 0, | ||
| max_depth: int = 50, | ||
| ) -> tuple[dict, bool]: | ||
| """Recursively solve a molecule via AZ, falling back to LLM. |
Please add a detailed docstring explaining how rec_run works, and add an example.
| Returns a nested mol/reaction tree and a *solved* flag. | ||
| """ | ||
| from rdkit import Chem | ||
| from deepretro.algorithms.llm import llm_pipeline |
Please avoid imports inside functions; dependency issues might creep in.
understood, added
| return result_dict, solved | ||
|
|
||
|
|
||
| def unsolved_leaf(smiles: str) -> dict: |
If there are more such retrosynthesis tree utilities, please move them into a utils file or similar.
| if isinstance(pathway, list): | ||
| all_solved = True | ||
| for smi in pathway: | ||
| res, stat = rec_run(molecule=smi, **recurse_kw) |
There is type mixing in the logic for molecule, i.e. smi or pathway. Why is this mixing required?
There was a case in which either a list (multiple reactants) or a single element was passed, hence the mixing. We now perform a normalising step.
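A normalising step of this kind can be very small. The sketch below is an assumption about its shape, not the PR's code; `normalise_reactants` is a hypothetical name.

```python
from typing import Union


def normalise_reactants(pathway: Union[str, list[str]]) -> list[str]:
    """Return the reactants as a list, whether one SMILES or several were given."""
    if isinstance(pathway, str):
        # A single SMILES string becomes a one-element list.
        return [pathway]
    # Copy so that callers cannot mutate the original list by accident.
    return list(pathway)


single = normalise_reactants("CCO")
several = normalise_reactants(["CC(=O)O", "CCO"])
```

After this step the recursion can always iterate over a list and no longer needs an `isinstance` branch.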
|
|
||
|
|
||
| def rec_run( | ||
| molecule: str, |
an AutoSolver class would be a better design to avoid parameter shuffle.
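To illustrate the suggestion, here is a toy sketch in which shared settings live on the instance rather than being passed through every recursive call. Everything apart from the class name and `max_depth` is hypothetical: `expand` and `is_purchasable` stand in for the AZ/LLM step and the stock lookup, and the tree layout is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AutoSolver:
    """Hypothetical sketch: configuration is held once, not threaded through calls."""

    # One-step retrosynthesis; returns None when no route is found.
    expand: Callable[[str], Optional[list[str]]]
    is_purchasable: Callable[[str], bool]
    max_depth: int = 50

    def solve(self, smiles: str, depth: int = 0) -> tuple[dict, bool]:
        """Return a nested tree for ``smiles`` and whether every leaf is purchasable."""
        node: dict = {"smiles": smiles, "children": []}
        if self.is_purchasable(smiles):
            return node, True
        if depth >= self.max_depth:
            return node, False
        reactants = self.expand(smiles)
        if not reactants:
            return node, False
        solved = True
        for reactant in reactants:
            child, child_solved = self.solve(reactant, depth + 1)
            node["children"].append(child)
            solved = solved and child_solved
        return node, solved


# Toy data so the sketch runs without AiZynthFinder or an LLM.
routes = {"CC(=O)OCC": ["CC(=O)O", "CCO"]}
stock = {"CC(=O)O", "CCO"}
solver = AutoSolver(expand=routes.get, is_purchasable=stock.__contains__)
tree, solved = solver.solve("CC(=O)OCC")
```

The recursive method takes only the molecule and the depth; everything else comes from `self`.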
ARY2260 left a comment:
A refactor to an AutoSolver class would be better.
|
|
||
|
|
||
| class AutoSolver: | ||
| """Holds shared config""" |
Expand docs significantly. This needs to be the core API object not just a config holder
| """Convert user-facing mode to a single checker callable (or None). | ||
|
|
||
| Returns None (skip checking), or a callable with signature | ||
| ``(product: str, pathways: list) -> (int, list)``. |
Document arguments and add usage examples for every function
| @@ -0,0 +1,261 @@ | |||
| """Synthesis pathway visualization. | |||
| } | ||
|
|
||
|
|
||
| def mask_protecting_groups_multisymbol(smiles: str) -> str: |
@shreyasvinaya can you please take this utility function into your PR and add a parameter for the config path, provided the path is accessible at runtime to the module that will use this function?
The path can be None by default, in which case the function emits a warning and falls back to the default dict PG_MAP.
The protecting-group code will be ported over after the main llm.py is merged. It needs a thorough refactor, and I will include this in that refactor.
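The requested behaviour (optional path, warning, fallback to `PG_MAP`) might be sketched as follows. The JSON file format, the function name `load_pg_map`, and the contents of `PG_MAP` shown here are all assumptions for illustration.

```python
import json
import warnings
from pathlib import Path
from typing import Optional, Union

# Hypothetical default; the real PG_MAP lives in the module under review.
PG_MAP: dict[str, str] = {"OBn": "$"}


def load_pg_map(config_path: Optional[Union[str, Path]] = None) -> dict[str, str]:
    """Load the protecting-group map from JSON, or fall back to PG_MAP with a warning."""
    if config_path is None:
        warnings.warn(
            "No protecting-group config given; using the default PG_MAP.",
            stacklevel=2,
        )
        # Return a copy so callers cannot mutate the module-level default.
        return dict(PG_MAP)
    with open(config_path, encoding="utf-8") as handle:
        return json.load(handle)
```

`mask_protecting_groups_multisymbol` could then accept the same optional path and call this helper once.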
|
|
||
| import joblib | ||
|
|
||
| from deepretro import logging as job_context |
This is part of a new upcoming PR, along with parse.py.
| import json | ||
| from pathlib import Path | ||
| from typing import Any, Dict | ||
|
|
@redrodeo03 Please check whether this function is still needed after the refactor. If not, delete it from the PR.
@shreyasvinaya I don't think we've used langfuse in a while. Are we OK to remove this?
| from aizynthfinder.aizynthfinder import AiZynthFinder | ||
|
|
||
| finder = AiZynthFinder(configfile=config_filename) | ||
| finder.stock.select("zinc") |
Put this in a separate small PR, and update the rst docs.
| from typing import Any | ||
|
|
||
|
|
||
| def build_ml_checker(clf: Any) -> Callable: |
Next steps:
- Please remove hallucination_helper from the models folder.
- In algorithm, create a folder named hallucination checks that holds the base check class with a core method "check_hallucinations(react, product)", plus subclasses of this base class: HeuristicHallucinationChecker() and XGBoostHallucinationChecker().
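A rough sketch of that hierarchy, under stated assumptions: the method name and argument order follow the comment above, the boolean return type is an assumption, the heuristic rule is a placeholder, and `ScoreHallucinationChecker` stands in for the model-backed subclass so the example runs without a trained model.

```python
from abc import ABC, abstractmethod
from typing import Callable


class HallucinationChecker(ABC):
    """Base class: every checker answers the same question through one method."""

    @abstractmethod
    def check_hallucinations(self, react: list[str], product: str) -> bool:
        """Return True if the proposed step looks hallucinated."""


class HeuristicHallucinationChecker(HallucinationChecker):
    def check_hallucinations(self, react: list[str], product: str) -> bool:
        # Placeholder rule, not the PR's heuristic: flag a step with no
        # reactants, or one that lists the product as its own reactant.
        return not react or product in react


class ScoreHallucinationChecker(HallucinationChecker):
    """Stand-in for the model-backed subclass; ``score_fn`` replaces a trained model."""

    def __init__(
        self,
        score_fn: Callable[[list[str], str], float],
        threshold: float = 0.5,
    ) -> None:
        self.score_fn = score_fn
        self.threshold = threshold

    def check_hallucinations(self, react: list[str], product: str) -> bool:
        # Flag the step when the score reaches the threshold.
        return self.score_fn(react, product) >= self.threshold


checkers: list[HallucinationChecker] = [
    HeuristicHallucinationChecker(),
    ScoreHallucinationChecker(lambda react, product: 0.1),
]
flags = [c.check_hallucinations(["CC(=O)O", "CCO"], "CC(=O)OCC") for c in checkers]
```

Keeping the threshold on the subclass also gives the separately computed threshold mentioned earlier in the review a natural home.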
Description
We still need the following files from src/:
- cache.py
- parse.py
- rec_prithvi.py
- prithvi.py
- llm.py