Skip to content

Improve the performance when using enumeration#8395

Open
aplopez wants to merge 6 commits intoSSSD:masterfrom
aplopez:enumerate
Open

Improve the performance when using enumeration#8395
aplopez wants to merge 6 commits intoSSSD:masterfrom
aplopez:enumerate

Conversation

@aplopez
Copy link
Contributor

@aplopez aplopez commented Jan 21, 2026

This PR includes:

  • Removal of an unused function.
  • Stop logging a possibly extremely long filter.
  • Fixes a wrong condition invalidating an optimization.
  • Adds a test case for an existing test.

Enumeration, specially when there are 15,000+ users, is slow. This fix helps, but it doesn't work miracles.
In my test environment, the enumeration went from 8 minutes to about 1.

It is important to know that, with such an amount of users, many operations time out. It is necessary to increment the timeout in[nss] and for the domain, but also set large values for ldap_enumeration_refresh_timeout and ldap_search_timeout in the domain. I used these values to avoid any timeout (YMMV):

[domain/ldap.test]
ldap_enumeration_refresh_timeout = 30000
ldap_search_timeout = 6000
timeout = 6000
...

[nss]
timeout = 6000
...

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively improves performance by optimizing logging, removing an unused function, and correcting a condition related to enumeration. The changes are well-aligned with the stated goals of enhancing enumeration performance, especially for large user bases. The addition of a new test case for the general enumeration scenario ensures that the modified logic is adequately covered.

@alexey-tikhonov
Copy link
Member

Mistype in the commit message: "We must look into de TS cache"

@aplopez
Copy link
Contributor Author

aplopez commented Jan 22, 2026

Mistype in the commit message: "We must look into de TS cache"

Fixed.

@alexey-tikhonov
Copy link
Member

alexey-tikhonov commented Jan 23, 2026

I think fix is correct in the sense it fixes a bug.

But I think logic of sysdb_enumpwent_filter() can and should be improved in general to avoid a case when dn_filter expands to entire db.

In particular, if addtl_filter isn't set, then sysdb_search_ts_users(enum_filter(NULL)) is expected to return entire db, right? And using this as additional filter results in the same as '*' but extremely slow.
Or do I miss something?

dn_filter, &ts_cache_res);
if (ret != EOK && ret != ENOENT) {
goto done;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this go out immediately from else branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which else branch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (ts_res.count > 0) {} else {go-out}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand we cannot go to out because we still need to proceed with the code that follows the big if {} block.

Copy link
Member

@alexey-tikhonov alexey-tikhonov Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

  • search is by name
  • timestamp cache has nothing for this name pattern

Imo, this means main cache also has nothing. No?
Could you provide an example where ts-search finds nothing but main cache finds something?

Copy link
Contributor Author

@aplopez aplopez Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not simply ignored. It is ignored if no entry is found, but if entries are found, it is used (and only those entries are returned).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, one of the test falls in this case and fails if I add the goto-out as you proposed.
Maybe there is an error in the test setup. I need to confirm this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not simply ignored. It is ignored if no entry is found, but if entries are found, it is used (and only those entries are returned).

Do I understand correctly it returns outdated entries in this case, despite filter clearly set?

Copy link
Member

@alexey-tikhonov alexey-tikhonov Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, other than this issue discussed in this thread, patch set looks good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not simply ignored. It is ignored if no entry is found, but if entries are found, it is used (and only those entries are returned).

Do I understand correctly it returns outdated entries in this case, despite filter clearly set?

Yes. That's what happens. But this happens in known cases. When you call FindByAttr() it will return only new entries. New calls will return newer entries only until there is none, in which case, all the entries will be returned.

Maybe @sumit-bose can explain why?

@aplopez aplopez marked this pull request as ready for review February 24, 2026 13:19
Function sysdb_enumpwent() is not used.
It was replaced by sysdb_enumpwent_filter().
When there are too many users (17,000+) this message can be too long.
Limit it to the first 50 characters.

Resolves: SSSD#6951
We must look into the TS cache only when a name is provided.
Using the TS cache on an unfiltered enumeration is useless.

Resolves: SSSD#6951
Added a case that was not checked before. It is the case
when `attr`, `attr_name` and `addtl_filter` are all `NULL`.
Create the filter to retrieve only the requested entries.

Do not create a new filter and search for matches if there is
no results from the previous search. The called functions
handle this case correctly but why waisting time calling them?
Function cache_req_user_by_filter_lookup() will set or not the recent
filter depending on whether data->name.attr is set or not. As mentioned
in the comment, it should be done base on whether the refernced
attribute is name or not.
@alexey-tikhonov alexey-tikhonov added the coverity Trigger a coverity scan label Feb 24, 2026
@alexey-tikhonov
Copy link
Member

Note: Covsan is green so far.

@alexey-tikhonov alexey-tikhonov removed the coverity Trigger a coverity scan label Feb 24, 2026
@alexey-tikhonov
Copy link
Member

Hm,
F44:

FAILED tests/test_infopipe.py::test_infopipe__list_by_name (ldap) - AssertionError: ListByName('user-*', 0) is missing element 10002
assert '/org/freedesktop/sssd/infopipe/Users/test/10002' in ['/org/freedesktop/sssd/infopipe/Users/test/10001', '/org/freedesktop/sssd/infopipe/Users/test/10003']

Looks relevant, but why f44 only... race condition?

@aplopez
Copy link
Contributor Author

aplopez commented Feb 25, 2026

Looks relevant, but why f44 only... race condition?

I reran the tests and a different test failed. 😮‍💨
Locally, on my PC (Fedora 43, though) the test passes every time.

@aplopez
Copy link
Contributor Author

aplopez commented Feb 26, 2026

And now all the tests passed. There is some instability in F44, but not related to this PR.

@alexey-tikhonov
Copy link
Member

And now all the tests passed. There is some instability in F44, but not related to this PR.

It is very suspicious that it was test_infopipe__list_by_name that I didn't see failing before.
Can there be a race condition in the test itself that is triggered by slow runner?

@aplopez
Copy link
Contributor Author

aplopez commented Feb 26, 2026

It is very suspicious that it was test_infopipe__list_by_name that I didn't see failing before. Can there be a race condition in the test itself that is triggered by slow runner?

I thought the same until I noticed this test failed once and never again. The second time a completely different test failed. The third time, the latest, none.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants