RDKit Error when preparing the USPTO dataset #32

@nuistcz

I followed the instructions to install TorchDrug and hit a RuntimeError when preparing the USPTO-50k dataset:
from torchdrug import datasets
reaction_dataset = datasets.USPTO50k("~/molecule-datasets/", node_feature="center_identification", kekulize=True)

The error log was:

reaction_dataset = datasets.USPTO50k("~/data/molecule-datasets/",
node_feature="center_identification",
kekulize=True)
Loading /home/masa/data/molecule-datasets/data_processed.csv: 100%|█| 50017/50017 [00:00<00:00, 11396
Constructing molecules from SMILES: 100%|█████████████████████| 50016/50016 [03:31<00:00, 236.07it/s]
Computing reaction centers: 0%| | 0/50016 [00:00<?, ?it/s]/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/data/graph.py:411: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
return match.nonzero().flatten()
[21:52:07]


Pre-condition Violation
getNumImplicitHs() called without preceding call to calcImplicitValence()
Violation occurred on line 188 in file /home/conda/feedstock_root/build_artifacts/rdkit_1629841762512/work/Code/GraphMol/Atom.cpp
Failed Expression: d_implicitValence > -1


Computing reaction centers: 0%| | 0/50016 [00:00<?, ?it/s]
Traceback (most recent call last):
File "", line 1, in
File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/core/core.py", line 282, in wrapper
return init(self, *args, **kwargs)
File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/datasets/uspto50k.py", line 83, in __init__
reactants, products = process_fn(reactant, product)
File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/datasets/uspto50k.py", line 142, in _get_reaction_center
reactant_hs = torch.tensor([atom.GetTotalNumHs() for atom in reactant.to_molecule().GetAtoms()])
File "/home/masa/.conda/envs/td/lib/python3.7/site-packages/torchdrug-0.1.0-py3.7.egg/torchdrug/data/molecule.py", line 332, in to_molecule
Chem.AssignStereochemistry(mol)
RuntimeError: Pre-condition Violation
getNumImplicitHs() called without preceding call to calcImplicitValence()
Violation occurred on line 188 in file Code/GraphMol/Atom.cpp
Failed Expression: d_implicitValence > -1
RDKIT: 2021.03.5
BOOST: 1_74
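
For readers hitting the same message: the pre-condition names `calcImplicitValence()`, which RDKit normally runs during sanitization or via `UpdatePropertyCache()`. The snippet below is only a minimal sketch of that general RDKit pattern on a standalone molecule. It is not a confirmed fix for TorchDrug, and the helper name `total_hydrogens` is hypothetical.

```python
try:
    from rdkit import Chem
except ImportError:  # lets the sketch be imported where RDKit is absent
    Chem = None


def total_hydrogens(smiles):
    """Return the total hydrogen count of each atom in an unsanitized molecule."""
    if Chem is None:
        raise RuntimeError("RDKit is required for this sketch")

    # sanitize=False skips the valence calculation, so the molecule starts
    # without implicit valences, much like one assembled atom by atom.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # Compute implicit valences before asking for hydrogen counts.
    # strict=False tolerates unusual valences instead of raising.
    mol.UpdatePropertyCache(strict=False)

    return [atom.GetTotalNumHs() for atom in mol.GetAtoms()]
```

For ethanol, `total_hydrogens("CCO")` returns `[3, 2, 1]`. Whether an equivalent call before `Chem.AssignStereochemistry(mol)` in `to_molecule` resolves this report is an assumption that would need to be checked against the TorchDrug source.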

My environment: Python 3.7, torch 1.7.1, CUDA 11.0, and RDKit 2021.03.5.
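
Since the error may depend on the exact RDKit build, it can help to report versions straight from the running interpreter rather than from memory. The following is a small sketch that uses only the standard library. The function name `report_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute (falling back to "unknown" otherwise).

```python
import importlib
import platform


def report_versions(packages=("torch", "rdkit", "torchdrug")):
    """Collect the Python version and the version of each named package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is missing from the active environment.
            versions[name] = "not installed"
        else:
            # Not every package defines __version__, so fall back gracefully.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `report_versions()` inside the same conda environment that raised the error and pasting the resulting dictionary makes it easier to match the report against a specific RDKit release.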

Labels: bug
