TypeTransformer for reading and writing from TensorFlowRecord format #1240
I haven't added this as a plugin, since the original issue description asked for this feature to follow the format of the PyTorch type transformer.
The unit test failures seem to be caused by
You should be able to add this to
I'm excited to see more TensorFlow support being contributed!
google-cloud-bigquery-storage
IPython
torch
tensorflow<=2.8.1
Ignore this. I've pinned grpcio-status<1.49.0 instead, based on a suggestion from @pingsutw in another PR, which fixed it!
I recently created this PR #1248 that adds version constraints to grpcio and grpcio-status in requirements.in. You should be able to pull that change in now that it has been merged and avoid the constraint in dev-requirements.in.
Bug description here: flyteorg/flyte#3006
I still get the same issue as above when pulling in from master. The grpcio and grpcio-status versions in requirements.in are:
grpcio<=1.47.0
grpcio-status<=1.47.0
I think it may need to be pinned to grpcio-status<1.49.0, as @pingsutw suggested (at least that was working for me), but I'm not sure.
@dennisobrien thanks, I've pushed the changes now. I've also created PR #1242 for Keras model support!
Codecov Report
@@            Coverage Diff             @@
##           master    #1240      +/-   ##
==========================================
+ Coverage   68.83%   69.08%   +0.24%
==========================================
  Files         291      295       +4
  Lines       26683    26922     +239
  Branches     2140     2531     +391
==========================================
+ Hits        18368    18598     +230
- Misses       7817     7829      +12
+ Partials      498      495       -3
@pingsutw I've pushed the requested changes.
Writing feedback here for posterity.
Draft Proposal
Why not just a type transformer for
@cosmicBboy, thanks for writing this up! I like the idea behind
Concerning your questions:
As for the code structure, will this go into
Right, I'm thinking for the

from typing_extensions import Annotated

import tensorflow as tf
from flytekit import task

AUTOTUNE = tf.data.AUTOTUNE

@task
def consume_records(
    dataset: Annotated[
        tf.data.TFRecordDataset,
        # configure kwargs to the constructor
        # https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset
        TFRecordDatasetConfig(...)
    ]
):
    # parse_tfrecord_fn, prepare_sample, and batch_size are assumed to be defined by the user
    dataset = (
        dataset
        .map(parse_tfrecord_fn, num_parallel_calls=AUTOTUNE)
        .map(prepare_sample, num_parallel_calls=AUTOTUNE)
        .shuffle(batch_size * 10)
        .batch(batch_size)
        .prefetch(AUTOTUNE)
    )

What do you think? If this looks good I can update the proposal.
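For readers following the proposal, here is a minimal sketch of how a config object attached through `Annotated` can be pulled back out of a type hint. It is not the flytekit implementation: `RecordConfig`, `FakeDataset`, and `split_annotation` are hypothetical names used only for illustration, and the sketch assumes Python 3.9+ so that `typing.Annotated` is available without `typing_extensions`.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Optional, Tuple, get_args, get_origin


@dataclass
class RecordConfig:
    """Hypothetical stand-in for the proposed TFRecordDatasetConfig."""

    compression_type: Optional[str] = None
    buffer_size: Optional[int] = None


class FakeDataset:
    """Hypothetical stand-in for tf.data.TFRecordDataset, so TensorFlow is not required."""


def split_annotation(python_type: Any) -> Tuple[Any, RecordConfig]:
    """Return (base type, config), using a default config when none is attached."""
    if get_origin(python_type) is Annotated:
        # get_args yields the base type followed by every metadata item.
        base, *metadata = get_args(python_type)
        for item in metadata:
            if isinstance(item, RecordConfig):
                return base, item
        return base, RecordConfig()
    # Plain, un-annotated type: no config was supplied.
    return python_type, RecordConfig()


base, config = split_annotation(Annotated[FakeDataset, RecordConfig(compression_type="GZIP")])
print(base.__name__, config.compression_type)

base, config = split_annotation(FakeDataset)
print(base.__name__, config.compression_type)
```

Falling back to a default config keeps the un-annotated form usable, which matches the later commits that test the case where no config is passed in the type.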
Yep! As long as we follow the same conventions as the |
@cosmicBboy looks good to me! @ryankarlos please read through the comments.
Signed-off-by: Ryan Nazareth <ryankarlos@gmail.com>
…petransformer_tf_model
Can you import
from flytekit.types.directory import TFRecordsDirectory
from flytekit.types.file import TFRecordFile

T = TypeVar("T")
Are we using this anywhere?
    return uri, metadata


def to_tf_record_dataset_from_dir(
Can we merge this and the file functions with the to_python_val methods? Seems like these functions aren't being re-used anywhere else, right? So I think it's okay to have the code within the transformers.
files = os.scandir(uri)
filenames = [os.path.join(local_dir, f.name) for f in files]
I think we need to get the file names from the local directory, not the remote path. This works when running locally, but on a Flyte backend the uri will be a remote URI.
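A minimal sketch of the point being made, assuming the directory has already been downloaded to a local path. `list_local_records` is a hypothetical helper, not the function that ended up in flytekit. It scans the local directory itself, so it never calls `os.scandir` on a remote URI (for example an `s3://` path), which would fail on a Flyte backend.

```python
import os
import tempfile
from typing import List


def list_local_records(local_dir: str) -> List[str]:
    """Return sorted full paths of the regular files directly under local_dir."""
    with os.scandir(local_dir) as entries:
        return sorted(
            os.path.join(local_dir, entry.name) for entry in entries if entry.is_file()
        )


# Simulate a directory of records that has already been downloaded locally.
with tempfile.TemporaryDirectory() as local_dir:
    for name in ("part-0.tfrecord", "part-1.tfrecord"):
        with open(os.path.join(local_dir, name), "wb"):
            pass  # create an empty placeholder file
    paths = list_local_records(local_dir)
    print([os.path.basename(p) for p in paths])
```

The resulting list of local paths is what would then be handed to the dataset constructor, instead of names derived from the remote location.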
Amazing work, @ryankarlos! A few more comments. Sorry about incrementally reviewing the PR. :/
Thank you! No, that's fine. You have spotted a lot of my errors, which is good!
…1240)
* first commit
* add tensorflow example tf record transformer
* refactor
* correct tfexample description
* fix test_native.py
* add tensorflow docs and reqs
* add tensorflow docs and reqs1
* tensorflow import in init
* fix failing tests
* add tensorflow pinned version to reqs
* pin grpcio-status to remove protobuf error
* add suggested changes
* redesign transformer
* remove old script
* fix type reference for TFREcordDataset
* refactor
* refactor
* spacing and uppercase
* redesign with tfdir and tfrecordfile subclass
* fix conflicts and typos
* address majority of comments
* refactor
* fix test with flytefile and metadata annotated
* fix check for example records in directory
* refactor and correct typing
* lint
* import annotated from typing_extensions
* tweak to tests to test case when Config not passed in as type
* add suggested changes
* add task for tfrecord dir with no config in test
* get filenames from local dir instead of remote
Signed-off-by: Ryan Nazareth <ryankarlos@gmail.com>


TL;DR
This Flyte feature lets users read from and write to `.tfrecord` files, using TensorFlow Example as a native type.
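As a rough illustration of the round trip this feature builds on, the sketch below creates `tf.train.Example` messages, writes them to a `.tfrecord` file, and reads them back, following the pattern in the TensorFlow TFRecord tutorial. It is not the flytekit transformer itself: the helper names (`make_example`, `write_examples`, `read_examples`), the feature keys, and the file name are hypothetical and chosen only for this example. It assumes TensorFlow 2.x with eager execution enabled.

```python
import os
import tempfile

import tensorflow as tf


def make_example(label: int, name: bytes, score: float) -> tf.train.Example:
    """Build a tf.train.Example with one int64, one bytes and one float feature.

    The feature keys ("label", "name", "score") are illustrative only.
    """
    features = {
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        "name": tf.train.Feature(bytes_list=tf.train.BytesList(value=[name])),
        "score": tf.train.Feature(float_list=tf.train.FloatList(value=[score])),
    }
    return tf.train.Example(features=tf.train.Features(feature=features))


def write_examples(path: str, examples) -> None:
    """Serialize each Example and append it as one record to a TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for example in examples:
            writer.write(example.SerializeToString())


def read_examples(path: str):
    """Read every record from a TFRecord file and parse it back into an Example."""
    parsed = []
    for raw_record in tf.data.TFRecordDataset([path]):
        example = tf.train.Example()
        example.ParseFromString(raw_record.numpy())
        parsed.append(example)
    return parsed


# Round trip through a temporary file (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
original = [make_example(1, b"a", 0.5), make_example(0, b"b", 1.5)]
write_examples(path, original)
restored = read_examples(path)
```

After the round trip, `restored` holds two `tf.train.Example` messages whose features can be inspected through `example.features.feature[<key>]`.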
Type
Are all requirements met?
Complete description
- `TensorflowExampleTransformer` type in `flytekit/extras/tensorflow/records.py`, which uses the [tf.train.Example](https://www.tensorflow.org/api_docs/python/tf/train/Example) message to serialize, write, and read `tf.train.Example` messages to and from `.tfrecord` files, following the examples in the TensorFlow docs: https://www.tensorflow.org/tutorials/load_data/tfrecord
- `tests/flytekit/unit/extras/tensorflow/test_transformations.py`

Tracking Issue
flyteorg/flyte#2571