Add async data validator support for @check_output / @check_output_custom #1509
Merged
skrawcz (Contributor) approved these changes on Mar 6, 2026 and left a comment:
LGTM.
Would it make sense to have validators contain a sync and async function? I can't think of a case where we'd want that. But just throwing it out there.
Motivation
Hamilton's data quality validators (`DataValidator`, `BaseDefaultValidator`) are synchronous-only. When using `AsyncDriver`, the validation wrapper functions created by `@check_output` / `@check_output_custom` are always plain `def`, even if the underlying validator needs to perform async work. The `AsyncGraphAdapter` checks `asyncio.iscoroutinefunction(fn)` to decide whether to `await` a node callable, so a sync wrapper around an async `validate()` silently returns an unawaited coroutine instead of a `ValidationResult`, corrupting downstream results.
Minimal example
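A minimal sketch of the failure mode described above, and of the kind of fix this PR applies. It deliberately does not import Hamilton: `ValidationResult` here is a simplified stand-in, and `AsyncRangeValidator`, `make_sync_wrapper`, `make_wrapper`, and `call_node` are hypothetical names invented for illustration, not Hamilton's actual API. The sketch uses `inspect.iscoroutinefunction` for the adapter check as well, on the assumption that it behaves like the `asyncio` variant for this case.

```python
import asyncio
import inspect
from dataclasses import dataclass


@dataclass
class ValidationResult:
    """Simplified stand-in for Hamilton's ValidationResult (assumption)."""
    passes: bool
    message: str


class AsyncRangeValidator:
    """Hypothetical validator whose validate() performs async work."""

    async def validate(self, data: float) -> ValidationResult:
        await asyncio.sleep(0)  # placeholder for real async I/O
        ok = 0.0 <= data <= 1.0
        return ValidationResult(ok, "in range" if ok else "out of range")


def make_sync_wrapper(validator):
    """Old behaviour: always a plain def, whatever validate() is."""
    def wrapper(data):
        # If validate() is async, this returns a coroutine object.
        return validator.validate(data)
    return wrapper


def make_wrapper(validator):
    """Sketch of the fix: use async def when validate() is a coroutine function."""
    if inspect.iscoroutinefunction(validator.validate):
        async def async_wrapper(data):
            return await validator.validate(data)
        return async_wrapper
    return make_sync_wrapper(validator)


async def call_node(fn, data):
    """Mimics an adapter that awaits only callables detected as coroutine functions."""
    if inspect.iscoroutinefunction(fn):
        return await fn(data)
    return fn(data)


async def demo():
    validator = AsyncRangeValidator()
    broken = await call_node(make_sync_wrapper(validator), 0.5)
    fixed = await call_node(make_wrapper(validator), 0.5)
    is_coro = inspect.iscoroutine(broken)
    if is_coro:
        broken.close()  # suppress the "never awaited" warning
    return is_coro, fixed


broken_is_coroutine, fixed_result = asyncio.run(demo())
```

With the plain `def` wrapper the adapter-like `call_node` hands back a coroutine object rather than a result; with the async-aware wrapper it hands back a resolved `ValidationResult`.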
Changes
New base classes (`hamilton/data_quality/base.py`):
- `AsyncDataValidator`: async variant of `DataValidator` with `async def validate()`
- `AsyncBaseDefaultValidator`: async variant of `BaseDefaultValidator` for use with `@check_output`
- `is_async_validator()` helper using `inspect.iscoroutinefunction` for robust detection
Async-aware wrapper generation (`hamilton/function_modifiers/validation.py`):
- `transform_node()` now detects async validators and creates `async def` wrappers that `await` the validator's `validate()` call
- Raises `TypeError` if a coroutine is accidentally returned
- `expanders.py`, `macros.py`, and `recursive.py`
Documentation updates:
- `writeups/data_quality.md`: new "Async Validators" section with full examples
- `docs/concepts/function-modifiers.rst`: mentions async validator base classes
- `docs/reference/decorators/check_output.rst`: API reference entries for async classes
- `docs/how-tos/run-data-quality-checks.rst`: async validators subsection
- `examples/async/README.md`: data quality with async section
How I tested this
- `AsyncDriver` for async, sync, and mixed validator scenarios
Notes
- `AsyncDataValidator` inherits from `DataValidator`, so all existing `isinstance` checks and type hints work unchanged.
- Detection uses `inspect.iscoroutinefunction(validator.validate)` rather than `isinstance`, catching any validator with an async `validate` regardless of class hierarchy.
- `final_node_callable` (which aggregates validation results) remains sync: the `AsyncGraphAdapter` awaits all kwargs before calling it, so results are already resolved.
Checklist