Azure-compatibility#610

Open
violetbrina wants to merge 11 commits into main from azure-compatibility

Conversation

@violetbrina

Collaborator

Changes to the analysis runner to enable Azure compatibility.

@violetbrina violetbrina self-assigned this Mar 28, 2023
@violetbrina violetbrina requested a review from illusional March 28, 2023 03:08

@illusional illusional left a comment

Contributor

Looking like awesome progress!

'-c',
'--cloud',
required=False,
default=DEFAULT_CLOUD_ENVIRONMENT,

Contributor

Should we omit the default here instead of providing one? That would let the analysis-runner server decide the default.
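A minimal sketch of what omitting the default could look like, assuming the CLI uses argparse-style options and sends a JSON-like body to the server. `SERVER_DEFAULT_CLOUD`, `build_request_body`, and `resolve_cloud` are hypothetical names for illustration, not existing analysis-runner code:

```python
import argparse

# Hypothetical server-side fallback; the real value would live wherever the
# analysis-runner server defines its default environment.
SERVER_DEFAULT_CLOUD = 'gcp'


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # No client-side default: None means "the user did not choose a cloud".
    parser.add_argument('-c', '--cloud', required=False, default=None)
    return parser


def build_request_body(args: argparse.Namespace) -> dict:
    """Client side: only send 'cloud' when the user passed it explicitly."""
    body = {}
    if args.cloud is not None:
        body['cloud'] = args.cloud
    return body


def resolve_cloud(body: dict) -> str:
    """Server side: fall back to the server's default when 'cloud' is absent."""
    return body.get('cloud', SERVER_DEFAULT_CLOUD)
```

With this shape, changing the default only needs a server deploy, and older CLI versions pick it up without an upgrade.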

Comment thread server/util.py
Comment on lines +124 to +138
    if environment == 'gcp':
        # do this to check access-members cache
        gcp_project = dataset_config.get('gcp', {}).get('projectId')

        if not gcp_project:
            raise web.HTTPBadRequest(
                reason=f'The analysis-runner does not support checking group members for the {environment} environment'
            )
    elif environment == 'azure':
        azure_resource_group = dataset_config.get('azure', {}).get('resourceGroup')

        if not azure_resource_group:
            raise web.HTTPBadRequest(
                reason=f'The analysis-runner does not support checking group members for the {environment} environment'
            )

Contributor

You can remove this: group member checks are not in secrets, so no gcp_project ID is needed anymore (I think).

Suggested change
    if environment == 'gcp':
        # do this to check access-members cache
        gcp_project = dataset_config.get('gcp', {}).get('projectId')
        if not gcp_project:
            raise web.HTTPBadRequest(
                reason=f'The analysis-runner does not support checking group members for the {environment} environment'
            )
    elif environment == 'azure':
        azure_resource_group = dataset_config.get('azure', {}).get('resourceGroup')
        if not azure_resource_group:
            raise web.HTTPBadRequest(
                reason=f'The analysis-runner does not support checking group members for the {environment} environment'
            )

Comment thread server/util.py Outdated
    if environment == 'gcp':
        output_dir = f'gs://cpg-{dataset}-{cpg_namespace(access_level)}/{output_prefix}'
    elif environment == 'azure':
        # TODO: need a way for analysis runner to know where to save metadata

Contributor

It follows the same sort of convention, right? The storage-account is cpg{datasetWithoutTabs}, so the output path would be:

azure://{storage-account}/{main,test}/{output_prefix}
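As a rough sketch of that convention (not the actual implementation), the helper below builds the output directory for both clouds. It assumes the storage account is `cpg` plus the dataset name reduced to lowercase letters and digits, which is what Azure storage account names allow, and that the container is `test` for test access and `main` otherwise. `namespace_for`, `azure_storage_account`, and `get_output_dir` are hypothetical names:

```python
import re


def namespace_for(access_level: str) -> str:
    # Assumption: 'test' access maps to the test namespace, everything else to main.
    return 'test' if access_level == 'test' else 'main'


def azure_storage_account(dataset: str) -> str:
    # Assumption: account name is 'cpg' + dataset with anything other than
    # lowercase letters and digits stripped out.
    return 'cpg' + re.sub(r'[^a-z0-9]', '', dataset.lower())


def get_output_dir(
    environment: str, dataset: str, access_level: str, output_prefix: str
) -> str:
    namespace = namespace_for(access_level)
    if environment == 'gcp':
        return f'gs://cpg-{dataset}-{namespace}/{output_prefix}'
    if environment == 'azure':
        account = azure_storage_account(dataset)
        return f'azure://{account}/{namespace}/{output_prefix}'
    raise ValueError(f'Unsupported environment: {environment}')
```

Raising on an unknown environment avoids leaving `output_dir` unset, which the `TODO` branch above would otherwise do.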

Comment thread test/hail_batch_job.py
import hailtop.batch as hb


@click.command()

@illusional illusional Mar 28, 2023

Contributor

There are a couple of test workflows in examples/batch; can you use them, or move this one there?
