fix(ug_util): prioritize user-data.users over the default user config#6860

Open
mostafaCamel wants to merge 11 commits into canonical:main from mostafaCamel:eng/PR-6703

@mostafaCamel
Contributor

@mostafaCamel mostafaCamel commented May 3, 2026

Proposed Commit Message

fix(ug_util): prioritize user-data.users over the default user config

Fixes GH-6703

Additional Context

Test Steps

Unit tests

tox -e py3 -- tests/unittests/distros/test_user_data_normalize.py::TestUGNormalize

Integration tests

  • Create a new ~/.config/pycloudlib.toml file and put [lxd] in this file

CLOUD_INIT_OS_IMAGE='noble' CLOUD_INIT_PLATFORM=lxd_container tox -e integration-tests -- tests/integration_tests/modules/test_users_groups.py tests/integration_tests/modules/test_set_password.py

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@mostafaCamel mostafaCamel changed the title from "Add unit test to test precedence behavior between user-data.users and…" to "fix(ug_util): prioritize user-data.users over the default user config" May 3, 2026
@mostafaCamel
Contributor Author

mostafaCamel commented May 3, 2026

The first commit I pushed is failing as expected, as it only adds the unit test.

The expected failure:

TOTAL                                                          33026   6291  12224   1597    79%
=========================== short test summary info ============================
FAILED tests/unittests/distros/test_user_data_normalize.py::TestUGNormalize::test_users_dict_override_default_attribute - assert True is False
= 1 failed, 5641 passed, 5 skipped, 13 xfailed, 2 xpassed, 84 warnings in 118.45s (0:01:58) =
py3: exit 1 (129.32 seconds) /home/runner/work/cloud-init/cloud-init> .tox/py3/bin/python -m pytest -vv --cov=cloudinit --cov-branch --color=yes pid=2384
  py3: FAIL code 1 (135.99=setup[6.67]+cmd[129.32] seconds)
  evaluation failed :( (136.04 seconds)
Error: Process completed with exit code 1.
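
For readers following along, the precedence this unit test checks can be sketched roughly as follows. This is only an illustration under assumptions: `merge_default_user` and the sample `default_cfg` values are hypothetical and are not the code in `ug_util.py`. The idea is simply that keys set explicitly under `users` for the default user's name should win over the distro's default user config.

```python
from typing import Any, Dict, List, Union

UserEntry = Union[str, Dict[str, Any]]


def merge_default_user(
    default_cfg: Dict[str, Any], users: List[UserEntry]
) -> Dict[str, Any]:
    """Hypothetical helper: overlay user-data on the default user config.

    Not the ug_util implementation; it only illustrates the intended
    precedence (user-data.users beats the default user config).
    """
    merged = dict(default_cfg)
    for entry in users:
        # The literal "default" only requests the default user; it carries
        # no overrides of its own, so only dict entries are considered.
        if isinstance(entry, dict) and entry.get("name") == default_cfg["name"]:
            merged.update(entry)  # explicit user-data keys take priority
    return merged


# Sample values are assumptions made for this sketch.
default_cfg = {"name": "ubuntu", "lock_passwd": True, "shell": "/bin/bash"}
users = ["default", {"name": "ubuntu", "lock_passwd": False}]
merged = merge_default_user(default_cfg, users)
```

With these sample values, `lock_passwd` comes from the `users` entry while `shell`, which the entry does not set, keeps its default.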

@mostafaCamel
Contributor Author

#6703

@mostafaCamel
Contributor Author

After pushing the code fix, all the unit tests now pass.

Member

@holmanb holmanb left a comment

Please add an integration test.

@mostafaCamel
Contributor Author

mostafaCamel commented May 6, 2026

Note to myself about the integration tests (I will work on them over the weekend):

  • Declare 2 constants in the Python integration test: a plain-text password and its hash
    • I first need to run openssl passwd -6 "yourplainpassword" in my terminal to generate the hash, then pass it as a constant in the Python script
  • Launch 2 Ubuntu LXC containers: A (with cloud-config.users ["default", {"name": "ubuntu", "hashed_passwd": "superpassword"}]) and B (with cloud-config.users ["default", {"name": "ubuntu", "hashed_passwd": "superpassword", "lock_passwd": False}])
  • Confirm that the 2 containers are up and running (I don't know if the test helpers have such a function; this can be skipped given the following steps)
  • grep -E "^ubuntu:!<hashedpassword>:" /etc/shadow for container A (and that there is only one occurrence). Notice the exclamation mark, which indicates the password is locked
  • grep -E "^ubuntu:<hashedpassword>:" /etc/shadow for container B (and that there is only one occurrence). No exclamation mark here, as lock_passwd is False
    • No need to confirm there is only one occurrence in the grep output: ^ubuntu: is enough, as usernames are unique
  • Console access to container A should fail
  • Console access to container B should succeed
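
The /etc/shadow checks in the list above could also be expressed in Python rather than grep. The following is a minimal sketch under assumptions: `parse_shadow`, `is_locked`, and `FAKE_HASH` are hypothetical names for illustration (this is not the repository's `fetch_and_parse_etc_shadow`), and the hash is a placeholder rather than real `openssl passwd -6` output.

```python
from typing import Dict


def parse_shadow(text: str) -> Dict[str, str]:
    """Map each username to its password field from /etc/shadow content."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Fields are colon-separated: name, password field, then aging data.
        name, password, *_rest = line.split(":")
        fields[name] = password
    return fields


def is_locked(password_field: str) -> bool:
    """A leading "!" in the password field marks the password as locked."""
    return password_field.startswith("!")


# Placeholder hash for illustration only, not real `openssl passwd -6` output.
FAKE_HASH = "$6$saltsalt$notarealhash"
shadow_a = f"root:*:20000:0:99999:7:::\nubuntu:!{FAKE_HASH}:20000:0:99999:7:::\n"
shadow_b = f"root:*:20000:0:99999:7:::\nubuntu:{FAKE_HASH}:20000:0:99999:7:::\n"

locked_a = is_locked(parse_shadow(shadow_a)["ubuntu"])  # container A
locked_b = is_locked(parse_shadow(shadow_b)["ubuntu"])  # container B
```

Comparing the parsed field against the constant avoids having to escape the `$` characters of the hash inside a grep pattern.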

… default user config

Signed-off-by: Mostafa Abdelwahab <mostafa.abdelwahab@canonical.com>
Signed-off-by: Mostafa Abdelwahab <mostafa.abdelwahab@canonical.com>
@mostafaCamel mostafaCamel force-pushed the eng/PR-6703 branch 2 times, most recently from 74bc21d to 151d862 Compare May 15, 2026 15:30
@mostafaCamel
Contributor Author

Pushed a new commit with the integration tests:

  • added a new test_hashed_password_without_lock_passwd_override_is_locked in tests/integration_tests/modules/test_set_password.py to confirm the lock_passwd: True behavior (exclamation mark in /etc/shadow)
  • added a new test_default_user_settings_override method (NOTE: marked as a ci test) in tests/integration_tests/modules/test_users_groups.py to confirm that overriding the default settings actually works (shell and lock_passwd)
  • added a new test_default_user_settings (NOTE: not marked as a ci test) in tests/integration_tests/modules/test_users_groups.py as a "negative control" to confirm the default settings of the ubuntu user. I don't think it is worth running in CI, hence I did not add the ci pytest mark
  • moved the function fetch_and_parse_etc_shadow to tests/integration_tests/util.py as it is now used by multiple test files

The integration test succeeded locally: CLOUD_INIT_OS_IMAGE='noble' CLOUD_INIT_PLATFORM=lxd_container tox -e integration-tests -- tests/integration_tests/modules/test_users_groups.py::test_default_user_settings_override
However, the same test fails in CI: the assertion fails because of a discrepancy between the expected value (the override /bin/sh) and the actual value (which seems to still be the default).
It is as if my code fix is not being ingested in the deb package used for the CI.

I ran the integration test again locally with questing to match the CI, and the local test still succeeds: CLOUD_INIT_OS_IMAGE='questing' tox -e integration-tests -- tests/integration_tests/modules/test_users_groups.py::test_default_user_settings_override

@mostafaCamel
Contributor Author

mostafaCamel commented May 15, 2026

I confirmed that the Debian package generated in the CI task (link in the comment above) contains my change:

  • Went into the "Archive debs as artifacts" step and downloaded the zip from the download URL it printed
  • Unzipped the file, which created the .deb file cloud-init-questing-deb/cloud-init-base_26.1-1~bddeb~25.10.1_all.deb on my local machine
  • Ran dpkg-deb -x cloud-init-questing-deb/cloud-init-base_26.1-1~bddeb~25.10.1_all.deb cloud-init-questing-deb to extract the files
  • Confirmed that my code fix is in the file cloud-init-questing-deb/usr/lib/python3/dist-packages/cloudinit/distros/ug_util.py

So normally this should lead to the code being ingested in the CI tests, given that the "Run integration tests" step is running CLOUD_INIT_CLOUD_INIT_SOURCE="$(ls /home/runner/work/_temp/cloud-init-base*.deb)" CLOUD_INIT_OS_IMAGE=questing CLOUD_INIT_LOCAL_LOG_PATH=./cloudinit_logs tox -e integration-tests-ci -- --color=yes tests/integration_tests/

@mostafaCamel
Contributor Author

Pushed a commit with some additional prints. Confirmed that I am running the Debian package:

/usr/bin/cloud-init 26.1-1~bddeb~25.10.1

Package: cloud-init-base
Status: install ok installed
Priority: optional
Section: admin
Installed-Size: 2994
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: all
Source: cloud-init
Version: 26.1-1~bddeb~25.10.1
Replaces: cloud-init (<< 25.1~), cloud-init-base
Depends: cloud-guest-utils | cloud-utils, dhcpcd-base, iproute2, netcat-openbsd, netplan.io, procps, python3, python3-debconf, python3-requests
Recommends: eatmydata, gdisk, gnupg, python3-apt, software-properties-common
Suggests: openssh-server, ssh-import-id
Breaks: cloud-init (<< 25.1~), cloud-init-base
Conffiles:
 /etc/cloud/cloud.cfg eeaeae68aa0cf2a59134fe341491e769
 /etc/cloud/cloud.cfg.d/05_logging.cfg a7649f3a7332b9e9e0511dfa195d0212
 /etc/cloud/cloud.cfg.d/README f5175bd4df5c37ce781f93d20f59561a
 /etc/cloud/templates/chef_client.rb.tmpl a0844ddc9a42776d41a03d62d10ea139
 /etc/cloud/templates/chrony.conf.ubuntu.tmpl 981b6e81533cbfe2011a72cf83f0f858
 /etc/cloud/templates/hosts.debian.tmpl 941773df489d046d87ae491c3c95d8ec
 /etc/cloud/templates/ntp.conf.ubuntu.tmpl c4917fcac8096871f30136d046d2dade
 /etc/cloud/templates/sources.list.ubuntu.deb822.tmpl 4531340ef6e63ffeda6c35e8cf219bfd
 /etc/cloud/templates/timesyncd.conf.tmpl 9c6b3af8058efc219987863f45bb9198
 /etc/profile.d/Z99-cloud-locale-test.sh 2a8d91f29f0510d142fd03ede447d6b7
 /etc/profile.d/Z99-cloudinit-warnings.sh f45ec83cecfebd00c6071e56895c20bc
 /etc/rsyslog.d/21-cloudinit.conf d4cf2e5d3cb9914cf7e6cdc08d298339
 /etc/logrotate.d/cloud-init-base 1136c49642289d614837572cac18d9f6 obsolete
Description: initialization and customization tool for cloud instances
 Cloud-init with minimal dependencies, refer to cloud-init for more
 information.
Homepage: https://cloud-init.io/
/usr/bin/cloud-init

@mostafaCamel
Contributor Author

Also worth noting: this behavior also applies to the override of the password locking. CI failed as well when I commented out the shell assertion:

        # Check shell
        # shell_set = (
        #     client.execute(["getent", "passwd", "ubuntu"])
        #     .stdout.strip()
        #     .split(":")[-1]
        # )
        # assert "/bin/sh" == shell_set
        # Check password is not locked
        passwd_status = client.execute(["passwd", "-S", "ubuntu"]).stdout
>       assert re.search(r"^ubuntu\s+P\b", passwd_status)
E       AssertionError: assert None
E        +  where None = <function search at 0x7f12d4d5d260>('^ubuntu\\s+P\\b', 'ubuntu L 2026-05-15 0 99999 7 -1')
E        +    where <function search at 0x7f12d4d5d260> = re.search
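
To make the failing assertion above easier to read, the status column of `passwd -S` can be extracted first and then compared. In that output the second field is `L` for a locked password, `P` for a usable password, and `NP` for no password. A minimal sketch, where `password_status` is a hypothetical helper and the sample line is the one from the CI failure:

```python
import re

# Sample line copied from the CI failure above.
ci_output = "ubuntu L 2026-05-15 0 99999 7 -1"


def password_status(passwd_s_output: str, user: str) -> str:
    """Return the status field (second column) of `passwd -S` output."""
    match = re.search(
        rf"^{re.escape(user)}\s+(\S+)", passwd_s_output, re.MULTILINE
    )
    if match is None:
        raise ValueError(f"no status line for {user!r}")
    return match.group(1)


status = password_status(ci_output, "ubuntu")
```

An assertion such as `assert status == "P", status` would then report the actual status (`L` in the CI run) instead of `assert None`.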

Signed-off-by: Mostafa Abdelwahab <mostafa.abdelwahab@canonical.com>
@mostafaCamel
Contributor Author

Still failing with the same error, even after I added a commit that runs cloud-init clean and then reboots before starting the test assertions.

My guess is that the only proper way to test the behavior is to have CLOUD_INIT_SOURCE as None (use the code in the image) or IN_PLACE (inject the changes into the lxd containers). I filed #6885

@mostafaCamel mostafaCamel requested a review from holmanb May 16, 2026 08:01
@mostafaCamel
Contributor Author

mostafaCamel commented May 16, 2026

Hi again @holmanb, tl;dr: I added integration tests. One of them is failing in CI (but succeeding locally). I suspect the failure is because default user settings cannot be changed post-first-boot. The CI test still fails even after I added a cloud-init clean and a reboot before the test assertions.

I guess my options are one of:

You can verify that this test works locally by checking out my branch and running CLOUD_INIT_OS_IMAGE='questing' tox -e integration-tests -- tests/integration_tests/modules/test_users_groups.py::test_default_user_settings_override on your local machine, assuming your local ~/.config/pycloudlib.toml file just contains [lxd]

@mostafaCamel
Contributor Author

The following commit (where I clean the seed before the reboot and the test assertions) did not help. Same assertion failure:

diff --git a/tests/integration_tests/modules/test_users_groups.py b/tests/integration_tests/modules/test_users_groups.py
index f4c32d0ea..3642b8dad 100644
--- a/tests/integration_tests/modules/test_users_groups.py
+++ b/tests/integration_tests/modules/test_users_groups.py
@@ -219,7 +219,7 @@ def test_default_user_settings_override(client: IntegrationInstance):
     # Github CI does not use cloud_init_source in place but rather installs
     # the cloud-init-base package after the instance boot so default user is
     # not overriden until cloud-initn is cleaned and the instance rebooted.
-    clean_cloud_init_and_restart_instance(client)
+    clean_cloud_init_and_restart_instance(client, remove_seed_directory=True)
     # Check shell
     shell_set = (
         client.execute(["getent", "passwd", "ubuntu"])
diff --git a/tests/integration_tests/util.py b/tests/integration_tests/util.py
index cfc104982..8c4f03bb6 100644
--- a/tests/integration_tests/util.py
+++ b/tests/integration_tests/util.py
@@ -710,13 +710,20 @@ def fetch_and_parse_etc_shadow(client):
     return users, dupes
 
 
-def clean_cloud_init_and_restart_instance(client):
+def clean_cloud_init_and_restart_instance(client, remove_seed_directory=False):
     """Clean cloud-init and restart the instance
 
     This function cleans the cloud-init state and restarts the instance,
     waiting for cloud-init to complete its initialization.
+
+    remove_seed_directory: False by default. If True, rruns
+    `cloud-init clean --seed --logs` which also removes the seed directory
     """
     client.instance.clean()
+    if remove_seed_directory:
+        # Use client.execute until https://github.com/canonical/pycloudlib/issues/502
+        # is resolved
+        client.execute(["cloud-init", "clean", "--logs", "--seed"])
     client.instance.restart()
     wait_for_cloud_init(client).stdout.strip()
     client.execute("cloud-init status --wait")

@mostafaCamel
Contributor Author

I will stop trying for now. Could it be that self.instance.restart() does not actually do any restart?

@mostafaCamel
Contributor Author

Hopefully final TL;DR:

  • the test keeps succeeding locally (in-place source, lxd container) but fails in GitHub CI (Debian package source, seemingly lxd_container)
  • I added client.instance.clean to clean cloud-init and client.restart(), and I am still getting the CI failure
  • I confirmed locally with KEEP_INSTANCE=true (and exiting mid-test) that client.clean() ends up removing the directory /var/lib/cloud/instance/sem from the container
  • I added a debug to the integration-tests-ci target. The debug logs more or less confirm that the _do_restart of LXDInstance (child of BaseInstance) gets called. Also, the 4-second difference between 2026-05-17 15:04:18 and 2026-05-17 15:04:22 suggests there is an actual restart. The test assertion then fails as usual.

My wild guess at this point is that, even after cloud-init clean --logs, the removal of the sem directory, and (presumably) a reboot, cloud-init is unable to apply users_groups (at least on the default user).

I attached the CI debug logs below because I do not know the retention time of the CI job.

------------------------------ Captured log call -------------------------------
2026-05-17 15:04:14 INFO      pycloudlib.instance:instance.py:285 executing: sh -c 'sudo cloud-init clean --logs'
2026-05-17 15:04:14 DEBUG     pycloudlib.instance:instance.py:289 executing: sh -c 'sudo cloud-init clean --logs'
2026-05-17 15:04:15 INFO      pycloudlib.instance:instance.py:285 executing: sh -c 'sudo echo '"'"'uninitialized'"'"' > /etc/machine-id'
2026-05-17 15:04:15 DEBUG     pycloudlib.instance:instance.py:289 executing: sh -c 'sudo echo '"'"'uninitialized'"'"' > /etc/machine-id'
2026-05-17 15:04:15 INFO      pycloudlib.instance:instance.py:285 executing: sh -c 'sudo rm -rf /var/log/syslog'
2026-05-17 15:04:15 DEBUG     pycloudlib.instance:instance.py:289 executing: sh -c 'sudo rm -rf /var/log/syslog'
2026-05-17 15:04:15 INFO      integration_testing:instances.py:85 Restarting instance and waiting for boot
2026-05-17 15:04:15 INFO      pycloudlib.instance:instance.py:285 executing: sh -c sync
2026-05-17 15:04:15 DEBUG     pycloudlib.instance:instance.py:289 executing: sh -c sync
2026-05-17 15:04:18 DEBUG     pycloudlib.instance:instance.py:132 Pre-reboot boot_id: de41cf7e-bba2-44bb-965e-8b1f198a11a6
2026-05-17 15:04:18 DEBUG     pycloudlib.instance:instance.py:362 restarting cloudinit-0517-150406snkpcmpd
2026-05-17 15:04:22 INFO      pycloudlib.instance:instance.py:512 _wait_for_execute to complete
2026-05-17 15:04:22 DEBUG     pycloudlib.instance:instance.py:103 Unable to find valid IP. Found network: {'eth0': {'addresses': [{'address': 'fe80::216:3eff:fe22:d3dc', 'family': 'inet6', 'netmask': '64', 'scope': 'link'}], 'counters': {'bytes_received': 272, 'bytes_sent': 266, 'errors_received': 0, 'errors_sent': 0, 'packets_dropped_inbound': 0, 'packets_dropped_outbound': 0, 'packets_received': 2, 'packets_sent': 3}, 'host_name': 'vethed221a87', 'hwaddr': '00:16:3e:22:d3:dc', 'mtu': 1500, 'state': 'up', 'type': 'broadcast'}, 'lo': {'addresses': [{'address': '127.0.0.1', 'family': 'inet', 'netmask': '8', 'scope': 'local'}, {'address': '::1', 'family': 'inet6', 'netmask': '128', 'scope': 'local'}], 'counters': {'bytes_received': 0, 'bytes_sent': 0, 'errors_received': 0, 'errors_sent': 0, 'packets_dropped_inbound': 0, 'packets_dropped_outbound': 0, 'packets_received': 0, 'packets_sent': 0}, 'host_name': '', 'hwaddr': '', 'mtu': 65536, 'state': 'up', 'type': 'loopback'}}
2026-05-17 15:04:23 INFO      paramiko.transport:transport.py:1786 Connected (version 2.0, client OpenSSH_10.0p2)
2026-05-17 15:04:23 INFO      paramiko.transport:transport.py:1786 Authentication (publickey) successful!
2026-05-17 15:04:24 INFO      pycloudlib.instance:instance.py:532 _wait_for_cloudinit to complete
2026-05-17 15:04:24 INFO      pycloudlib.instance:instance.py:285 executing: sh -c 'command -v systemctl'
2026-05-17 15:04:24 DEBUG     pycloudlib.instance:instance.py:289 executing: sh -c 'command -v systemctl'
2026-05-17 15:04:25 INFO      pycloudlib.instance:instance.py:285 executing: cloud-init status --wait --long
2026-05-17 15:04:25 DEBUG     pycloudlib.instance:instance.py:287 waiting for start
2026-05-17 15:04:25 INFO      pycloudlib.instance:instance.py:285 executing: sudo -- sh -c 'cloud-init status'
2026-05-17 15:04:25 DEBUG     pycloudlib.instance:instance.py:289 executing: sudo -- sh -c 'cloud-init status'
2026-05-17 15:04:26 INFO      pycloudlib.instance:instance.py:285 executing: sudo -- sh -c 'cloud-init status --wait'
2026-05-17 15:04:26 DEBUG     pycloudlib.instance:instance.py:289 executing: sudo -- sh -c 'cloud-init status --wait'
2026-05-17 15:04:26 INFO      pycloudlib.instance:instance.py:285 executing: sudo -- getent passwd ubuntu
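
The log above records a pre-reboot boot_id, which suggests a more direct check than timestamps for whether the restart really happened: compare the boot_id before and after. A minimal sketch under assumptions: `run` is a hypothetical stand-in for something like `client.execute` that returns stdout as a string, the second ID is made up, and whether the boot_id changes across a container restart depends on the platform.

```python
from typing import Callable

BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(run: Callable[[str], str]) -> str:
    """Read the kernel boot_id through a command runner.

    `run` is a hypothetical stand-in for something like client.execute:
    it takes a shell command and returns its stdout as a string.
    """
    return run(f"cat {BOOT_ID_PATH}").strip()


def rebooted(before: str, after: str) -> bool:
    """On a regular Linux boot the kernel generates a new boot_id."""
    return bool(before) and bool(after) and before != after


# Fake runner so the sketch runs without an instance. The first ID is the
# one from the log above; the second is made up for illustration.
_ids = iter(
    [
        "de41cf7e-bba2-44bb-965e-8b1f198a11a6\n",
        "11111111-2222-3333-4444-555555555555\n",
    ]
)


def fake_run(_command: str) -> str:
    return next(_ids)


pre = read_boot_id(fake_run)
# ... the instance restart would happen here ...
post = read_boot_id(fake_run)
did_reboot = rebooted(pre, post)
```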

@mostafaCamel
Contributor Author

Silly me. The problem was that I was using passwd instead of hashed_passwd. According to the documentation, most of the keys cannot be updated for existing users (except a few keys like hashed_passwd and lock_passwd). In my case, distros/__init__.py did not update lock_passwd because it couldn't find a non-empty password (because I kept using passwd instead of hashed_passwd).
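
The mistake described above is easy to miss in user-data, so here is a small illustrative check. It is a sketch under assumptions: `ineffective_password_keys` is a hypothetical helper, not part of cloud-init, and it encodes only what the comment above states (for an existing user, `passwd` is not applied while `hashed_passwd` and `lock_passwd` are).

```python
from typing import Any, Dict, List


def ineffective_password_keys(user_cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical lint for a `users` entry that targets an existing user.

    Based only on the comment above: `passwd` is not applied to an existing
    user, while `hashed_passwd` and `lock_passwd` are.
    """
    problems = []
    if "passwd" in user_cfg and "hashed_passwd" not in user_cfg:
        problems.append("passwd")
    return problems


# "<hash>" is a placeholder, not a real password hash.
before_fix = {"name": "ubuntu", "passwd": "<hash>", "lock_passwd": False}
after_fix = {"name": "ubuntu", "hashed_passwd": "<hash>", "lock_passwd": False}

problems_before = ineffective_password_keys(before_fix)
problems_after = ineffective_password_keys(after_fix)
```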

@mostafaCamel
Contributor Author

OK, final comment: tests now succeed both locally and in GitHub CI. No need for a cloud-init clean then restart.

hashed_passwd setting works fine
I made the assertion for the shell setting conditional, as the shell would be adjusted for IN_PLACE but not updated when the Debian package is used
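
One way to picture the conditional shell assertion is sketched below. This is an assumption-laden illustration, not the code in the PR: `should_check_shell` is a hypothetical helper, and it assumes the configured source is available as a plain string such as `"IN_PLACE"`, `"NONE"`, or a path to a .deb file.

```python
def should_check_shell(cloud_init_source: str) -> bool:
    """Hypothetical gate for the shell assertion.

    Per the comment above, the shell override only takes effect when the
    fix is already present at first boot (IN_PLACE), not when the Debian
    package is installed after the instance has booted.
    """
    return cloud_init_source.strip().upper() == "IN_PLACE"


check_in_place = should_check_shell("IN_PLACE")
check_deb = should_check_shell("/home/runner/work/_temp/cloud-init-base.deb")
```

The `lock_passwd` assertion would stay unconditional, since that key is applied to existing users as well.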
