Airflow Operator for Starting Amazon Glue DataBrew Job #22037

@hsrocks

Description

We can use the DataBrew integration to add data cleaning and data normalization steps to our analytics and machine learning workflows. The operator will call the StartJobRun API https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/databrew.html#GlueDataBrew.Client.start_job_run to start a job run. We will also provide an option to wait for completion, as other available operators do, in case someone wants the job to finish before the next task is triggered.
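
To make the proposal concrete, here is a rough sketch of the start-then-optionally-wait logic such an operator could wrap. It is not the final operator: the function name `run_databrew_job`, its parameters, and the polling interval are hypothetical. It assumes a client shaped like the one returned by `boto3.client("databrew")`, using its `start_job_run` and `describe_job_run` calls.

```python
import time

# Job run states after which DataBrew will not change the state any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def run_databrew_job(client, job_name, wait_for_completion=True, poll_interval=30.0):
    """Start a DataBrew job run and optionally block until it finishes.

    `client` is assumed to behave like boto3.client("databrew").
    The function name and parameters are illustrative only.
    """
    # StartJobRun returns the ID of the newly created run.
    run_id = client.start_job_run(Name=job_name)["RunId"]

    if not wait_for_completion:
        # Fire and forget: hand the run ID back to the caller.
        return {"run_id": run_id, "state": None}

    # Poll DescribeJobRun until the run reaches a terminal state.
    while True:
        state = client.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_interval)

    if state != "SUCCEEDED":
        raise RuntimeError(
            f"DataBrew job {job_name!r} run {run_id} ended in state {state}"
        )
    return {"run_id": run_id, "state": state}
```

In an actual operator this logic would probably live in `execute()`, with the client obtained from an AWS hook, and the returned run ID could be pushed to XCom for downstream tasks.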

Use case/motivation

AWS Glue DataBrew is a visual data preparation tool that enables users to clean and normalize data without writing any code. Once a Glue DataBrew project has been set up, an ML or analytics engineer can use this operator to run its jobs from a workflow. It adds value when, for example, we have to normalize or clean data before triggering a SageMaker training or inference job, or when we want to validate the results with Glue or Athena once the cleaned data is present.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
