Skip to content

Stop using docker manifest to check for image presence #17883

Merged
potiuk merged 1 commit into
apache:mainfrom
potiuk:avoid_raace_condition_on_configuring_experimental_docker
Sep 21, 2021
Merged

Stop using docker manifest to check for image presence #17883
potiuk merged 1 commit into
apache:mainfrom
potiuk:avoid_raace_condition_on_configuring_experimental_docker

Conversation

@potiuk

@potiuk potiuk commented Aug 28, 2021

Copy link
Copy Markdown
Member

We need to set the "experimental" flag in CI in order to use
docker manifest command to check for presence of the images
in ghcr.io. In order to use them we need to enable experimental
features via ~/.docker/config.json.

Sometimes, very rarely, we had the case that the config file
got broken and the problem turned out to be that we tried
to do this experimental replacement in parallel by several
running "wait image" commands (🤦 here for myself)
that were apparenlty overriding the same config.json
at the same time in non-atomic way, which (very rarely)
led to corrupted file.

However for quite some time we pulled the image immediately
after it was available, in order to verify the image,
so rather than checking if the image
is there via manifest, we can simply pull the image
and effect will be the same - if it fails, the image is not
there, if it has been pulled - we can immediately verify
it. We do not need experimental flag at all for that
so no messing around with .docker/config.json is needed
at all.

@potiuk

potiuk commented Aug 28, 2021

Copy link
Copy Markdown
Member Author

One more fix to accidentally failing tests . Such a 🤦 on me.

@potiuk potiuk changed the title Avoids race condition on parallel builds Avoid race condition when setting experimental docker flag Aug 28, 2021
@potiuk potiuk force-pushed the avoid_raace_condition_on_configuring_experimental_docker branch 7 times, most recently from 75041d2 to e9116b4 Compare August 30, 2021 18:40
@uranusjr uranusjr closed this Aug 31, 2021
@uranusjr uranusjr reopened this Aug 31, 2021
@potiuk potiuk closed this Aug 31, 2021
@potiuk potiuk reopened this Aug 31, 2021
@potiuk

potiuk commented Aug 31, 2021

Copy link
Copy Markdown
Member Author

It were bad - downgraded constraints (the downgrade should be fixed by #17939 )

Comment thread scripts/ci/images/ci_prepare_prod_image_on_ci.sh Outdated
@potiuk potiuk force-pushed the avoid_raace_condition_on_configuring_experimental_docker branch 3 times, most recently from c4e9391 to 3fb4213 Compare September 10, 2021 20:41
@potiuk potiuk force-pushed the avoid_raace_condition_on_configuring_experimental_docker branch 3 times, most recently from 3369726 to 916dcec Compare September 18, 2021 11:31
@potiuk potiuk changed the title Avoid race condition when setting experimental docker flag Stop using docker manifest to check for image presence Sep 18, 2021
@potiuk

potiuk commented Sep 18, 2021

Copy link
Copy Markdown
Member Author

I removed completely the "experimental" flag @uranusjr - this only caused troubles and I actually need to pull the image anyway (for verification) once it is ready - so rather than checking if image is there via manifest I am pulling it straight away now - this will be cleaner, less experimental and simpler (and it will work better I hope - sometimes the config.json got irreparably broken.

@potiuk potiuk force-pushed the avoid_raace_condition_on_configuring_experimental_docker branch from 916dcec to 0abef5f Compare September 18, 2021 12:34
@potiuk potiuk closed this Sep 18, 2021
@potiuk potiuk reopened this Sep 18, 2021
@potiuk

potiuk commented Sep 18, 2021

Copy link
Copy Markdown
Member Author

Rebuilding to see if the failures are transient.

@potiuk potiuk force-pushed the avoid_raace_condition_on_configuring_experimental_docker branch from d086793 to 55d4a9b Compare September 18, 2021 19:44
@potiuk potiuk closed this Sep 18, 2021
@potiuk potiuk reopened this Sep 18, 2021
@potiuk

potiuk commented Sep 18, 2021

Copy link
Copy Markdown
Member Author

Just MSSQL tests wich has to be fixed anywyay. But it's good

@potiuk potiuk force-pushed the avoid_raace_condition_on_configuring_experimental_docker branch 5 times, most recently from 6d0c620 to 5015b58 Compare September 20, 2021 09:26
@potiuk

potiuk commented Sep 20, 2021

Copy link
Copy Markdown
Member Author

All Green. This will remove another source of random CI failures.

@potiuk potiuk requested review from BasPH and turbaszek September 20, 2021 15:00
We need to set the "experimental" flag in CI in order to use
`docker manifest` command to check for presence of the images
in ghcr.io. In order to use them we need to enable experimental
features via ~/.docker/config.json.

Sometimes, very rarely, we had the case that the config file
got broken and the problem turned out to be that we tried
to do this experimental replacement in parallel by several
running "wait image" commands (:facepalm: here for myself)
that were apparenlty overriding the same config.json
at the same time in non-atomic way, which (very rarely)
led to corrupted file.

However for quite some time we pulled the image immediately
after it was available, in order to verify the image,
so rather than checking if the image
is there via manifest, we can simply pull the image
and effect will be the same - if it fails, the image is not
there, if it has been pulled - we can immediately verify
it. We do not need experimental flag at all for that
so no messing around with .docker/config.json is needed
at all.
@potiuk potiuk force-pushed the avoid_raace_condition_on_configuring_experimental_docker branch from 5015b58 to c766e4b Compare September 20, 2021 15:03
@potiuk potiuk merged commit a8184e4 into apache:main Sep 21, 2021
@potiuk potiuk deleted the avoid_raace_condition_on_configuring_experimental_docker branch September 21, 2021 09:46
potiuk added a commit to potiuk/airflow that referenced this pull request Sep 22, 2021
The change apache#17883 missed re-tagging images after downloading
which caused unnecessary image rebuilding for PRs as well as using
wrong image for doc building - previous version of docs were used
when building docs!
potiuk added a commit that referenced this pull request Sep 22, 2021
The change #17883 missed re-tagging images after downloading
which caused unnecessary image rebuilding for PRs as well as using
wrong image for doc building - previous version of docs were used
when building docs!
potiuk added a commit to potiuk/airflow that referenced this pull request Sep 22, 2021
Recently we had a problem that our CI got broken because
pulled images were not tagged properly after apache#17883 missed image
tagging. This has been fixed in apache#18433 but the problem is that this
might happen in the future and mignt not get noticed on time.

This PR prevents from similar situations happnening. Whenever we
try to run doc building or tests we set --pull policy to never
for both docker and docker compose which should simply fail if
the images were not pulled and tagged properly rather than
fail over to pulling latest `main` image.
potiuk added a commit that referenced this pull request Sep 22, 2021
Recently we had a problem that our CI got broken because
pulled images were not tagged properly after #17883 missed image
tagging. This has been fixed in #18433 but the problem is that this
might happen in the future and mignt not get noticed on time.

This PR prevents from similar situations happnening. Whenever we
try to run doc building or tests we set --pull policy to never
for both docker and docker compose which should simply fail if
the images were not pulled and tagged properly rather than
fail over to pulling latest `main` image.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants