Add async data validator support for @check_output / @check_output_custom #1509

Merged: skrawcz merged 1 commit into apache:main from SummitSG-LLC:2603/async on Mar 7, 2026
@Dev-iL Dev-iL commented Mar 6, 2026

Motivation

Hamilton's data quality validators (DataValidator, BaseDefaultValidator) are synchronous-only. When using AsyncDriver, the validation wrapper functions created by @check_output / @check_output_custom are always plain def — even if the underlying validator needs to perform async work. The AsyncGraphAdapter checks asyncio.iscoroutinefunction(fn) to decide whether to await a node callable, so a sync wrapper around an async validate() silently returns an unawaited coroutine instead of a ValidationResult, corrupting downstream results.
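The failure mode described above can be reproduced without Hamilton. The following is a minimal sketch using only the standard library; `validate`, `sync_wrapper`, and `async_wrapper` are hypothetical stand-ins, not Hamilton code. It shows that a plain `def` wrapper around an async function is not seen as a coroutine function and hands back an unawaited coroutine instead of a result.

```python
import asyncio
import inspect


async def validate(value: int) -> bool:
    """Stand-in for an async validator's validate() method."""
    await asyncio.sleep(0)  # placeholder for real async work
    return value > 0


def sync_wrapper(value: int):
    # Plain def: calling validate() only creates a coroutine object; nothing awaits it.
    return validate(value)


async def async_wrapper(value: int) -> bool:
    # async def: the coroutine is awaited, so the caller gets the real result.
    return await validate(value)


# A dispatcher that relies on iscoroutinefunction() treats the sync wrapper
# as an ordinary callable and never awaits what it returns.
sync_is_coro = inspect.iscoroutinefunction(sync_wrapper)
async_is_coro = inspect.iscoroutinefunction(async_wrapper)

leaked = sync_wrapper(5)
leaked_is_coroutine = inspect.iscoroutine(leaked)  # a coroutine, not a bool
leaked.close()  # avoid the "coroutine was never awaited" warning

awaited_result = asyncio.run(async_wrapper(5))
```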

Minimal example

# my_module.py
from hamilton.data_quality.base import AsyncDataValidator, ValidationResult
from hamilton.function_modifiers import check_output_custom


class AsyncPositiveValidator(AsyncDataValidator):
    def __init__(self):
        super().__init__(importance="fail")

    def applies_to(self, datatype):
        return datatype == int

    def description(self):
        return "Value must be positive"

    @classmethod
    def name(cls):
        return "positive_validator"

    async def validate(self, dataset: int) -> ValidationResult:
        # async validation logic (e.g. await db_check(dataset))
        return ValidationResult(
            passes=dataset > 0,
            message=f"{dataset} is {'positive' if dataset > 0 else 'not positive'}",
        )


@check_output_custom(AsyncPositiveValidator())
async def doubled(input_value: int) -> int:
    return input_value * 2


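# main.py
import asyncio

from hamilton import async_driver, base

import my_module


async def main():
    # await is only valid inside a coroutine, so the driver calls live in main()
    dr = async_driver.AsyncDriver({}, my_module, result_builder=base.DictResult())
    result = await dr.execute(final_vars=["doubled"], inputs={"input_value": 5})
    print(result)  # result == {"doubled": 10}


asyncio.run(main())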

Changes

New base classes (hamilton/data_quality/base.py):

  • AsyncDataValidator — async variant of DataValidator with async def validate()
  • AsyncBaseDefaultValidator — async variant of BaseDefaultValidator for use with @check_output
  • is_async_validator() helper using inspect.iscoroutinefunction for robust detection
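As a rough illustration of the detection approach, here is a sketch of how such a helper might look. The classes below are simplified stand-ins defined only for this example (they are not Hamilton's real base classes), and the body of `is_async_validator` is an assumption based on the description above rather than the merged implementation.

```python
import inspect


class DataValidator:
    """Simplified stand-in for a sync validator base class."""

    def validate(self, dataset):
        raise NotImplementedError


class AsyncDataValidator(DataValidator):
    """Simplified stand-in: same interface, but validate() is a coroutine function."""

    async def validate(self, dataset):
        raise NotImplementedError


class SyncValidator(DataValidator):
    def validate(self, dataset):
        return dataset > 0


class DuckTypedAsyncValidator(DataValidator):
    """Subclasses the sync base directly, yet defines an async validate()."""

    async def validate(self, dataset):
        return dataset > 0


def is_async_validator(validator) -> bool:
    # Inspect the bound method itself instead of the class hierarchy.
    return inspect.iscoroutinefunction(validator.validate)
```

Because the check looks at `validate` itself, `DuckTypedAsyncValidator` is detected as async even though it does not inherit from the async base class.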

Async-aware wrapper generation (hamilton/function_modifiers/validation.py):

  • transform_node() now detects async validators and creates async def wrappers that await the validator's validate() call
  • Sync validators get a runtime guard that raises a clear TypeError if a coroutine is accidentally returned
  • Follows the established Hamilton pattern used in expanders.py, macros.py, and recursive.py
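The wrapper selection and the misuse guard can be sketched as follows. This is not the code in `validation.py`; `make_validation_wrapper` and the three validator classes are hypothetical names used only to illustrate the pattern, and validators return a bare `bool` here instead of a `ValidationResult` to keep the example self-contained.

```python
import asyncio
import inspect


class SyncPositive:
    def validate(self, dataset):
        return dataset > 0


class AsyncPositive:
    async def validate(self, dataset):
        await asyncio.sleep(0)  # placeholder for real async work
        return dataset > 0


class MisbehavingSync:
    """validate() is a plain def but hands back a coroutine by mistake."""

    def validate(self, dataset):
        return AsyncPositive().validate(dataset)


def make_validation_wrapper(validator):
    """Hypothetical helper: choose an async or sync wrapper based on validate()."""
    if inspect.iscoroutinefunction(validator.validate):

        async def async_wrapper(dataset):
            return await validator.validate(dataset)

        return async_wrapper

    def sync_wrapper(dataset):
        result = validator.validate(dataset)
        if inspect.iscoroutine(result):
            result.close()  # suppress the "never awaited" warning
            raise TypeError(
                f"{type(validator).__name__}.validate() returned a coroutine; "
                "define it with 'async def' so it can be awaited."
            )
        return result

    return sync_wrapper


sync_result = make_validation_wrapper(SyncPositive())(3)
async_result = asyncio.run(make_validation_wrapper(AsyncPositive())(3))

try:
    make_validation_wrapper(MisbehavingSync())(3)
    guard_fired = False
except TypeError:
    guard_fired = True
```

Failing loudly in the sync path turns the silent "unawaited coroutine" corruption into an immediate, explainable error.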

Documentation updates:

  • writeups/data_quality.md — new "Async Validators" section with full examples
  • docs/concepts/function-modifiers.rst — mentions async validator base classes
  • docs/reference/decorators/check_output.rst — API reference entries for async classes
  • docs/how-tos/run-data-quality-checks.rst — async validators subsection
  • examples/async/README.md — data quality with async section

How I tested this

  • Unit tests verifying async wrapper creation, mixed sync/async validators, and the misuse guard
  • End-to-end tests with AsyncDriver for async, sync, and mixed validator scenarios

Notes

  • AsyncDataValidator inherits from DataValidator, so all existing isinstance checks and type hints work unchanged.
  • Detection uses inspect.iscoroutinefunction(validator.validate) rather than isinstance, catching any validator with an async validate regardless of class hierarchy.
  • final_node_callable (which aggregates validation results) remains sync — the AsyncGraphAdapter awaits all kwargs before calling it, so results are already resolved.
  • All changes are additive with no breaking API modifications. Existing sync validators continue to work identically.
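The note about final_node_callable staying sync can be illustrated with a small sketch. `call_node` below is a simplified, hypothetical stand-in for what an async adapter does (it is not the AsyncGraphAdapter implementation): it awaits any awaitable kwargs first, so the sync aggregator only ever receives resolved values.

```python
import asyncio
import inspect


async def async_check(value: int) -> bool:
    await asyncio.sleep(0)  # placeholder for real async work
    return value > 0


def sync_check(value: int) -> bool:
    return value < 100


def aggregate(**results: bool) -> bool:
    """Sync aggregator: it only ever sees resolved values."""
    return all(results.values())


async def call_node(fn, **kwargs):
    """Simplified stand-in for an async adapter's node call."""
    resolved = {
        name: (await value) if inspect.isawaitable(value) else value
        for name, value in kwargs.items()
    }
    result = fn(**resolved)
    return await result if inspect.isawaitable(result) else result


async def main() -> bool:
    return await call_node(
        aggregate,
        positive=async_check(5),  # coroutine, awaited inside call_node
        bounded=sync_check(5),    # already a plain bool
    )


all_passed = asyncio.run(main())
```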

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@Dev-iL Dev-iL requested review from elijahbenizzy, rwhitten577, skrawcz and zilto and removed request for rwhitten577 March 6, 2026 14:04

@skrawcz skrawcz left a comment

LGTM.

Would it make sense to have validators contain a sync and async function? I can't think of a case where we'd want that. But just throwing it out there.

@skrawcz skrawcz merged commit 06d1275 into apache:main Mar 7, 2026
5 of 6 checks passed
@Dev-iL Dev-iL deleted the 2603/async branch March 7, 2026 08:24