Skip to content

service: Support for server-side pagination (partial listings)#188

Merged
filak-sap merged 1 commit into
SAP:masterfrom
rettichschnidi:rs/upstream/support-_next
Feb 15, 2022
Merged

service: Support for server-side pagination (partial listings)#188
filak-sap merged 1 commit into
SAP:masterfrom
rettichschnidi:rs/upstream/support-_next

Conversation

@rettichschnidi

@rettichschnidi rettichschnidi commented Jan 2, 2022

Copy link
Copy Markdown
Contributor

Reading the spec, using the __next field, not $top/$skip/__count seems
to be the pristine way of fetching an entity collection in full:

In response payloads, representing Collections of Entries, if the
server does not include an object for every Entry in the Collection of
Entries identified by the request URI then the response represents a
partial listings of the Collection. In this case, "__next" name/value
pair is included to indicate the response represents a partial
listing. The value of the name/value pair is a URI which identifies
the next partial set of entities from the originally identified
complete set.

However, I am not sure if this the right way to implement this functionality. Feedback greatly appreciated.

Main question:

In case the maintainers agree with the direction of this PR I will work on getting it out of the draft state:

  • Adding some example usage/documentation
  • Adding a changelog entry

@rettichschnidi rettichschnidi force-pushed the rs/upstream/support-_next branch from ebc21e5 to 4103ec8 Compare January 2, 2022 21:00
@phanak-sap phanak-sap added enhancement New feature or request question Further information is requested labels Jan 2, 2022
@phanak-sap

phanak-sap commented Jan 2, 2022

Copy link
Copy Markdown
Contributor

Hi @rettichschnidi - you are very productive today with questions and PRs. :) Thanks for that, frankly we need that from userbase.

You must understand the history of this package. Sadly there is no "odata v2 / v4 service, covering all test cases in the respective specification" that could be integration tests pointed against, to check if pyodata adhers to odata specification in its entirety (feature coverage). So everything goes "bottom-up". Pyodata was primary created for usage inside SAP - for integration testing of Fiori Apps, because no python package was usable at that time. Then open-sourced, "as-is".

I guess, there is no big design decision for _"why this project went with $top/$skip", more likely it is just __next is not implemented at all, since we covered our use cases with top and skip so far :)

My question - did you considered Batch requests/response?

Comment thread pyodata/v2/service.py Outdated
Comment thread pyodata/v2/service.py
@filak-sap

Copy link
Copy Markdown
Contributor

Impressive. I like this draft PR. Please, move it to "ready" PR.

What are the reasons that this project went with $top/$skip (unable to fetch data from the next page (more than 1000 records) #78) until now?

Just because I am a lame OData user who knows only the bare minimum to be able to accomplish my job :)

@rettichschnidi

Copy link
Copy Markdown
Contributor Author

My question - did you considered Batch requests/response?

I did not. My use case it "give me the whole database". I do not think batch requests can help me with that?

@rettichschnidi

Copy link
Copy Markdown
Contributor Author

Pyodata was primary created for usage inside SAP - for integration testing of Fiori Apps, because no python package was usable at that time. Then open-sourced, "as-is".

I am glad you did this. Thanks!

@rettichschnidi rettichschnidi force-pushed the rs/upstream/support-_next branch from 4103ec8 to 9970d64 Compare January 11, 2022 22:30
@codecov-commenter

codecov-commenter commented Jan 11, 2022

Copy link
Copy Markdown

Codecov Report

Merging #188 (43de733) into master (0599db2) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
+ Coverage   92.66%   92.70%   +0.03%     
==========================================
  Files           6        6              
  Lines        2768     2783      +15     
==========================================
+ Hits         2565     2580      +15     
  Misses        203      203              
Impacted Files Coverage Δ
pyodata/v2/service.py 90.81% <100.00%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0599db2...43de733. Read the comment docs.

@rettichschnidi rettichschnidi marked this pull request as ready for review January 12, 2022 02:06
Comment thread docs/usage/querying.rst Outdated
break

# We got a partial answer - continue with next page
employees = northwind.entity_sets.Employees.get_entities().next_url(employees.next_url).execute()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I didn't realize users will have to do this kind of exercise. Could you please create some Iterator wrapper? My goal was to create a library that integrates into python standard.

@phanak-sap phanak-sap Jan 12, 2022

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filak-sap I understand this does not look great.

However, I would see also benefits in this approach. IMHO there is more control over the URLs sent, how it will look like (possible additional parameters) and how many requests are sent, when exactly and how big of response we receive. If hide subsequent requests behind iterator, that would yield entities based on presence of _next attribute, wouldn't we in the end transfer the entire DB to the memory in the model.py (for that particular entityset), regardless yielding of entity after entity out of the pyodata library to the client script? It is gut-feeling concern, I am not able to point to a code line that would do that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I would see also benefits in this approach. IMHO there is more control over the URLs sent, how it will look like (possible additional parameters) and how many requests are sent, when exactly and how big of response we receive.

+1 - We definitely should give the user control over this.

If hide subsequent requests behind iterator, that would yield entities based on presence of _next attribute, wouldn't we in the end transfer the entire DB to the memory in the model.py (for that particular entityset), regardless yielding of entity after entity out of the pyodata library to the client script?

Returning a generator which yields individual entities (instead of a single ListWithTotalCount) could easily be implemented to not issue a 2nd request towards the OData server until all available entities have been consumed. This keeps the memory consumption on the level we have currently.

To avoid API breakage, how about this: Add a method recursive() to class QueryRequest. It could take a parameter strategy which controls whether to return a list of class ListWithTotalCount or a generator which either returns a generator of ListWithTotalCount or entity objects. Default would be the "strategy" which is currently implemented: Leaving it up to the user.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rettichschnidi Thank you for the proposal! Why not to make the comfortable strategy the default when you will add another method anyways? Could you please post some pseudo code?

@rettichschnidi rettichschnidi Jan 13, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to either break the API (switch to a generator) or keep all elements in memory (so that I can assemble a ListWithTotalCount which contains truly all elements) and therefore risking some kind of resource exhaustion. Not a good default IMHO.

Tell me which you you prefer and I will post some code.

@rettichschnidi rettichschnidi Jan 13, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about passing ListWithTotalCount? Something like this:

employees = northwind.entity_sets.Employees.get_entities().next_page(employees).execute()

In case there is no next_url set in employees, the resulting set would be empty and this could be used as indication for the user that all data has been returned for real.

@phanak-sap phanak-sap Jan 13, 2022

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO throwing exception to in essence valid scenario is even worse than if employees.next_url is None: in a loop. Especially if the exception is related to odata protocol details that must be known.

Option experimental
We can take this in a way as an experimental API, (like they exists in Python 3.10.0 itself) - merge, not document, not in changelog, with follow-up issue that will either stick to it (and add documentation and release notes), change the API to some proposed in comments, or rollback the change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can take this in a way as an experimental API, (like they exists in Python 3.10.0 itself) - merge, not document, not in changelog, with follow-up issue that will either stick to it (and add documentation and release notes), change the API to some proposed in comments, or rollback the change.

I could roll with this. @phanak-sap: Any chance you'd accept this for version 1.8?

@rettichschnidi rettichschnidi Jan 23, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized that 1.8.0 is already released.

I just pushed a new version without change note and no documentation change anyway. Maybe merging this and letting (interested) people play with it brings some fresh, useful ideas somewhen down the road?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: well, they can always checkout the current code from your fork for this PR to play with as I did. Does not need to be merged to master for that. And by the force-push

Where this PR will be merged as currently is, or not, there are tasks outside this PR to track. I created issue #198 for that, so far just to link, will add task-list after this PR is resolved.

@jfilak What is your preference here? Merge current code to master? Request Changes? This is PR that I am not comfortable to approve or disapprove just on my own.

@rettichschnidi rettichschnidi force-pushed the rs/upstream/support-_next branch from 9970d64 to eb6e692 Compare January 13, 2022 14:23
@phanak-sap

Copy link
Copy Markdown
Contributor

My question - did you considered Batch requests/response?
I did not. My use case it "give me the whole database". I do not think batch requests can help me with that?

I was not exact enough. I ment consider in the solution - whatever it will be - several responses with _next field in one batch response. Should not be problem in current implementation IMHO, but perhaps with the iterator approach.

E.g. modified this test:

def test_batch_request(service):

@phanak-sap phanak-sap requested a review from filak-sap January 17, 2022 17:16
@phanak-sap phanak-sap removed the question Further information is requested label Jan 20, 2022
@phanak-sap phanak-sap added this to the 1.9.0 milestone Jan 20, 2022
@phanak-sap

phanak-sap commented Jan 20, 2022

Copy link
Copy Markdown
Contributor

The more I look at the docs/usage/querying.rst update, the more I must agree with @jfilak that the example of _next inside loop with if employees.next_url is None: somewhat strange to whatever else we've got in pyodata.

I simply waited for his re-review, but I am inclined to move a bit forward, by leaving this out of upcoming 1.8.0 version - which I will do soon for what we already have stack up pending release. Better to be really sure how the new API should look like. I have marked this PR to be part of next MINOR version.

@rettichschnidi rettichschnidi force-pushed the rs/upstream/support-_next branch from eb6e692 to 43de733 Compare January 23, 2022 16:54
@rettichschnidi

Copy link
Copy Markdown
Contributor Author

I was not exact enough. I ment consider in the solution - whatever it will be - several responses with _next field in one batch response. Should not be problem in current implementation IMHO, but perhaps with the iterator approach.

Now I get it - yes, certainly something to keep in mind, whatever the solution will look like.

@filak-sap

Copy link
Copy Markdown
Contributor

Could you please rebase to the latest master (I don't like merge commits).

@rettichschnidi rettichschnidi force-pushed the rs/upstream/support-_next branch from 43de733 to 9321044 Compare January 28, 2022 18:39
@rettichschnidi rettichschnidi changed the title service: Support for dealing with partial listings service: Support for server-side pagination(partial listings) Jan 31, 2022
@rettichschnidi rettichschnidi changed the title service: Support for server-side pagination(partial listings) service: Support for server-side pagination (partial listings) Jan 31, 2022
Reading the spec, using the __next field, not $top/$skip/__count seems
to be the pristine way of fetching an entity collection in full:

> In response payloads, representing Collections of Entries, if the
> server does not include an object for every Entry in the Collection of
> Entries identified by the request URI then the response represents a
> partial listings of the Collection. In this case, "__next" name/value
> pair is included to indicate the response represents a partial
> listing. The value of the name/value pair is a URI which identifies
> the next partial set of entities from the originally identified
> complete set.
@rettichschnidi rettichschnidi force-pushed the rs/upstream/support-_next branch from 9321044 to d22fd1f Compare January 31, 2022 17:52
@rettichschnidi

Copy link
Copy Markdown
Contributor Author

Could you please rebase to the latest master (I don't like merge commits).

I did so. Anything else I can do to get this PR forward?

@phanak-sap

Copy link
Copy Markdown
Contributor

Hi @rettichschnidi - basically I am waiting for answer from @jfilak / @filak-sap on my comment, how to proceed forward: #188 (comment).

Because of the while true: if employees.next_url is None discussion, I do not want currently to merge it to master just by myself - even undocumented, it would be de-facto pyodata API for foreseeable future. It can be played locally even without being on pypi, just by cloning your fork (and for your project you probably are already monkey patching).

Note, it is definitely a valid enhancement request, it is tracked under #198 even if this PR would somehow ended closed without merging. I opted for being undecided on this PR for some more time.

@filak-sap filak-sap merged commit 41abb98 into SAP:master Feb 15, 2022
@filak-sap

Copy link
Copy Markdown
Contributor

Well done. Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants