service: Support for server-side pagination (partial listings)#188
Conversation
ebc21e5 to
4103ec8
Compare
|
Hi @rettichschnidi - you are very productive today with questions and PRs. :) Thanks for that, frankly we need that from userbase. You must understand the history of this package. Sadly there is no "odata v2 / v4 service, covering all test cases in the respective specification" that could be integration tests pointed against, to check if pyodata adhers to odata specification in its entirety (feature coverage). So everything goes "bottom-up". Pyodata was primary created for usage inside SAP - for integration testing of Fiori Apps, because no python package was usable at that time. Then open-sourced, "as-is". I guess, there is no big design decision for _"why this project went with $top/$skip", more likely it is just My question - did you considered Batch requests/response? |
|
Impressive. I like this draft PR. Please, move it to "ready" PR.
Just because I am a lame OData user who knows only the bare minimum to be able to accomplish my job :) |
I did not. My use case it "give me the whole database". I do not think batch requests can help me with that? |
I am glad you did this. Thanks! |
4103ec8 to
9970d64
Compare
Codecov Report
@@ Coverage Diff @@
## master #188 +/- ##
==========================================
+ Coverage 92.66% 92.70% +0.03%
==========================================
Files 6 6
Lines 2768 2783 +15
==========================================
+ Hits 2565 2580 +15
Misses 203 203
Continue to review full report at Codecov.
|
| break | ||
|
|
||
| # We got a partial answer - continue with next page | ||
| employees = northwind.entity_sets.Employees.get_entities().next_url(employees.next_url).execute() |
There was a problem hiding this comment.
Oops, I didn't realize users will have to do this kind of exercise. Could you please create some Iterator wrapper? My goal was to create a library that integrates into python standard.
There was a problem hiding this comment.
@filak-sap I understand this does not look great.
However, I would see also benefits in this approach. IMHO there is more control over the URLs sent, how it will look like (possible additional parameters) and how many requests are sent, when exactly and how big of response we receive. If hide subsequent requests behind iterator, that would yield entities based on presence of _next attribute, wouldn't we in the end transfer the entire DB to the memory in the model.py (for that particular entityset), regardless yielding of entity after entity out of the pyodata library to the client script? It is gut-feeling concern, I am not able to point to a code line that would do that.
There was a problem hiding this comment.
However, I would see also benefits in this approach. IMHO there is more control over the URLs sent, how it will look like (possible additional parameters) and how many requests are sent, when exactly and how big of response we receive.
+1 - We definitely should give the user control over this.
If hide subsequent requests behind iterator, that would yield entities based on presence of _next attribute, wouldn't we in the end transfer the entire DB to the memory in the model.py (for that particular entityset), regardless yielding of entity after entity out of the pyodata library to the client script?
Returning a generator which yields individual entities (instead of a single ListWithTotalCount) could easily be implemented to not issue a 2nd request towards the OData server until all available entities have been consumed. This keeps the memory consumption on the level we have currently.
To avoid API breakage, how about this: Add a method recursive() to class QueryRequest. It could take a parameter strategy which controls whether to return a list of class ListWithTotalCount or a generator which either returns a generator of ListWithTotalCount or entity objects. Default would be the "strategy" which is currently implemented: Leaving it up to the user.
There was a problem hiding this comment.
@rettichschnidi Thank you for the proposal! Why not to make the comfortable strategy the default when you will add another method anyways? Could you please post some pseudo code?
There was a problem hiding this comment.
I'd have to either break the API (switch to a generator) or keep all elements in memory (so that I can assemble a ListWithTotalCount which contains truly all elements) and therefore risking some kind of resource exhaustion. Not a good default IMHO.
Tell me which you you prefer and I will post some code.
There was a problem hiding this comment.
How about passing ListWithTotalCount? Something like this:
employees = northwind.entity_sets.Employees.get_entities().next_page(employees).execute()In case there is no next_url set in employees, the resulting set would be empty and this could be used as indication for the user that all data has been returned for real.
There was a problem hiding this comment.
IMHO throwing exception to in essence valid scenario is even worse than if employees.next_url is None: in a loop. Especially if the exception is related to odata protocol details that must be known.
Option experimental
We can take this in a way as an experimental API, (like they exists in Python 3.10.0 itself) - merge, not document, not in changelog, with follow-up issue that will either stick to it (and add documentation and release notes), change the API to some proposed in comments, or rollback the change.
There was a problem hiding this comment.
We can take this in a way as an experimental API, (like they exists in Python 3.10.0 itself) - merge, not document, not in changelog, with follow-up issue that will either stick to it (and add documentation and release notes), change the API to some proposed in comments, or rollback the change.
I could roll with this. @phanak-sap: Any chance you'd accept this for version 1.8?
There was a problem hiding this comment.
Realized that 1.8.0 is already released.
I just pushed a new version without change note and no documentation change anyway. Maybe merging this and letting (interested) people play with it brings some fresh, useful ideas somewhen down the road?
There was a problem hiding this comment.
side note: well, they can always checkout the current code from your fork for this PR to play with as I did. Does not need to be merged to master for that. And by the force-push
Where this PR will be merged as currently is, or not, there are tasks outside this PR to track. I created issue #198 for that, so far just to link, will add task-list after this PR is resolved.
@jfilak What is your preference here? Merge current code to master? Request Changes? This is PR that I am not comfortable to approve or disapprove just on my own.
9970d64 to
eb6e692
Compare
I was not exact enough. I ment consider in the solution - whatever it will be - several responses with _next field in one batch response. Should not be problem in current implementation IMHO, but perhaps with the iterator approach. E.g. modified this test: python-pyodata/tests/test_service_v2.py Line 1338 in ec56995 |
|
The more I look at the docs/usage/querying.rst update, the more I must agree with @jfilak that the example of _next inside loop with I simply waited for his re-review, but I am inclined to move a bit forward, by leaving this out of upcoming 1.8.0 version - which I will do soon for what we already have stack up pending release. Better to be really sure how the new API should look like. I have marked this PR to be part of next MINOR version. |
eb6e692 to
43de733
Compare
Now I get it - yes, certainly something to keep in mind, whatever the solution will look like. |
|
Could you please rebase to the latest master (I don't like merge commits). |
43de733 to
9321044
Compare
Reading the spec, using the __next field, not $top/$skip/__count seems to be the pristine way of fetching an entity collection in full: > In response payloads, representing Collections of Entries, if the > server does not include an object for every Entry in the Collection of > Entries identified by the request URI then the response represents a > partial listings of the Collection. In this case, "__next" name/value > pair is included to indicate the response represents a partial > listing. The value of the name/value pair is a URI which identifies > the next partial set of entities from the originally identified > complete set.
9321044 to
d22fd1f
Compare
I did so. Anything else I can do to get this PR forward? |
|
Hi @rettichschnidi - basically I am waiting for answer from @jfilak / @filak-sap on my comment, how to proceed forward: #188 (comment). Because of the Note, it is definitely a valid enhancement request, it is tracked under #198 even if this PR would somehow ended closed without merging. I opted for being undecided on this PR for some more time. |
|
Well done. Thank you very much! |
Reading the spec, using the __next field, not $top/$skip/__count seems
to be the pristine way of fetching an entity collection in full:
However, I am not sure if this the right way to implement this functionality. Feedback greatly appreciated.
Main question:
In case the maintainers agree with the direction of this PR I will work on getting it out of the draft state: