Add GCS Support #6
@matthewphsmith What is the status of this PR? Would you like us to review it as well? Thanks.

It's good to go from my perspective; I just need an approval on a review.

+1, please bump the version.

def download_directory(self, remote_path, local_path):
    """
    :param Text remote_path: remote s3:// path

I tried to the PR and the
raise ValueError("Not an GS Key. Please use FQN (GS ARN) of the format gs://...")
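The error message above borrows AWS terminology ("ARN") for a GCS check. A minimal sketch of a stricter gs:// validation helper, assuming only that valid inputs look like gs://bucket or gs://bucket/key; check_gcs_path is a hypothetical name, not part of the PR:

```python
GCS_SCHEME = "gs://"


def check_gcs_path(path: str) -> str:
    """Return path unchanged if it looks like gs://bucket[/key], else raise ValueError."""
    if not path.startswith(GCS_SCHEME):
        raise ValueError(f"Not a GCS path: {path!r}. Please use the format gs://bucket/...")
    # The bucket name is everything between the scheme and the first '/'.
    bucket = path[len(GCS_SCHEME):].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"Missing bucket name in {path!r}")
    return path
```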

GCSProxy._check_binary()
cmd = [GCSProxy._GS_UTIL_CLI, "cp", "-r", local_path, remote_path]
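For context, the binary check plus command above can be approximated as follows. This is a sketch, not the PR's code: it assumes the check only needs gsutil to be resolvable on PATH (via shutil.which) and that a non-zero exit should raise; check_binary and upload_directory are hypothetical names.

```python
import shutil
import subprocess

GSUTIL_CLI = "gsutil"  # assumed CLI name, resolved through PATH


def check_binary(cli: str = GSUTIL_CLI) -> str:
    """Return the resolved path of `cli`, raising if it is not installed."""
    resolved = shutil.which(cli)
    if resolved is None:
        raise RuntimeError(f"{cli!r} not found on PATH; is the Google Cloud SDK installed?")
    return resolved


def upload_directory(local_path: str, remote_path: str, cli: str = GSUTIL_CLI) -> None:
    """Recursively copy local_path to remote_path with `<cli> cp -r`."""
    check_binary(cli)
    cmd = [cli, "cp", "-r", local_path, remote_path]
    # check_call raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.check_call(cmd)
```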
I think we can change local_path to local_path + '/*' to avoid uploading the whole engine_dir folder itself.
The current behaviour in this PR seems fine to me, because this function is expected to upload the specified local path. Maybe the caller should use ../* instead?
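Both options discussed here can be expressed in one command builder. A minimal sketch, assuming a hypothetical contents_only flag: when set, the directory's children are expanded in Python (like a shell '*', hidden files are skipped) and passed as multiple sources to gsutil cp, so no shell is needed for the wildcard.

```python
import glob
import os
from typing import List


def build_upload_cmd(local_path: str, remote_path: str, contents_only: bool = False,
                     cli: str = "gsutil") -> List[str]:
    """Build a recursive gsutil upload command.

    contents_only=False uploads the folder itself (it appears under its own name
    at the destination); contents_only=True uploads only its children, which is
    the effect of the suggested local_path + '/*'.
    """
    if contents_only:
        # Expand the children here instead of relying on shell globbing.
        sources = sorted(glob.glob(os.path.join(local_path, "*")))
        if not sources:
            raise ValueError(f"{local_path!r} is empty or does not exist")
    else:
        sources = [local_path]
    return [cli, "cp", "-r", *sources, remote_path]
```

Keeping the default as the folder itself matches the current behaviour of the PR; callers that want only the contents opt in explicitly.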
We can confirm that gsutil and aws s3 behave differently when copying a directory recursively, e.g.:
$ tree test
test
└── test1
└── test.txt
1 directories, 1 file
$ gsutil cp -r test/* gs://flyte-test
Copying file://test/test1/a/b/test.txt [Content-Type=text/plain]...
/ [1 files][ 0.0 B/ 0.0 B]
Operation completed over 1 objects.
$ gsutil ls gs://flyte-test
gs://flyte-test//
gs://flyte-test/test1/ <--------
vs.
$ gsutil cp -r test gs://flyte-test
Copying file://test/test1/a/b/test.txt [Content-Type=text/plain]...
/ [1 files][ 0.0 B/ 0.0 B]
Operation completed over 1 objects.
$ gsutil ls gs://flyte-test
gs://flyte-test//
gs://flyte-test/test/ <--------
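To make the difference concrete, here is a small model of where files end up when a directory is copied to a bucket root. It encodes only the gsutil behaviour shown in the transcript above, plus our understanding that aws s3 cp --recursive copies a directory's contents rather than the directory itself; it does not model copies into existing or non-existing sub-prefixes, and expected_objects is a hypothetical helper.

```python
import posixpath
from typing import Iterable, List


def expected_objects(local_root: str, rel_files: Iterable[str], dest: str, tool: str) -> List[str]:
    """Predict object names after a recursive copy of local_root to a bucket root.

    tool="gsutil": `gsutil cp -r <root> gs://bucket` keeps the root folder name.
    tool="aws":    `aws s3 cp --recursive <root> s3://bucket` drops it.
    """
    root_name = posixpath.basename(posixpath.normpath(local_root))
    names = []
    for rel in rel_files:
        if tool == "gsutil":
            key = posixpath.join(root_name, rel)
        elif tool == "aws":
            key = rel
        else:
            raise ValueError(f"unknown tool: {tool!r}")
        names.append(f"{dest.rstrip('/')}/{key}")
    return sorted(names)
```

With the tree above, the gsutil case yields gs://flyte-test/test/test1/test.txt, matching the top-level test/ prefix in the second listing; copying test/* with gsutil corresponds to the aws case, as in the first listing.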

I think there is a caveat using

Rebased and addressed the comments in #41.
This adds support for uploading and downloading data to/from GCS.