Skip to content

Comments

Fix $select clause behavior for Find Requests#1697

Merged
severussundar merged 21 commits intomainfrom
dev/shyamsundarj/fix-select-clause-behavior
Sep 15, 2023
Merged

Fix $select clause behavior for Find Requests#1697
severussundar merged 21 commits intomainfrom
dev/shyamsundarj/fix-select-clause-behavior

Conversation

@severussundar
Copy link
Contributor

@severussundar severussundar commented Sep 11, 2023

Why make this change?

For the below GET request,

GET https://localhost:5001/api/Book?$select=title

the response contains the id field as well. If a table contains composite PK, all the Pks are returned in the response.

Response:

{
    "value": [
        {
            "title": "New title #1",
            "id": 1
        },
        {
            "title": "Also Awesome book",
            "id": 2
        },
        {
            "title": "Great wall of china explained",
            "id": 3
        },
        ...
    ]
}

When both $select and $orderby are used together, the response contains $orderby fields in addition to the ones in $select

GET https://localhost:5001/api/Book?$select=id,publisher_id&$orderby=name

the response contains the name field as well.

Response:

{
    "value": [
        {
            "id": 2,
            "publisher_id": 1234,
            "name": "Also Awesome book"
        },
       ...
       ]
}    

Similar behavior was observed with views, where the configured key-fields were returned in the response. If multiple key-fields are configured, all of them were returned.

image

What is this change?

  • To support pagination and $first with Find requests, it is necessary to provide the nextLink field in the response. For the calculation of nextLink, the fields such as primary keys, fields in $orderby clause are retrieved from the database in addition to the fields requested in the $select clause. The nextLink calculation is done at DAB layer after executing the database query because it is impossible to know beforehand whether a table/view would contain more than 100 records (or more entries than requested in the $first clause if that is applicable).

  • Seeing that these additional fields needs to be retrieved from the database, the database queries cannot be modified to select only the fields present in $select clause. So, the additional fields are removed at the DAB layer before returning the response.

Why can't the database queries by modifed to select only the fields present in $select query string?

When the table contains more than 100 records, the nextLink URL field is returned in the response. The nextLink URL contains $after which is a base 64 encoded value. $after is calculated using all the primary key and $orderby column values of the 100th record (or nth when $first=n is used).

The row values from the 100th (or nth when $first=n is used) record is necessary because of two reasons

a) The last record's values are necessary to correctly fetch the subsequent set of records. Let's say the first GET request fetches 100 records. To be able to fetch the next set of rows: 101-200 in the right ordering, the database query will include the condition ~ WHERE ( ( [orderby_column] > [orderby_value_of_row100] ) OR ( [orderby_column] == [orderby_value_of_row100] && [pk_column] > [pk_value_of_row100]). Here orderby_value_of_row100 and pk_value_of_row100 refers to the values of orderby column of the 100th row and pk column of the 100th row respectively.
b) $orderby can be performed in ASC or DESC order. The 100th record will be different in both the ordering and we rely on the value of the 100th row to be able to select the next set of rows (101-200, etc.)

Logic for $after field calculation: SqlPaginationUtil.MakeCursorFromJsonElement()

Granted, all of this is applicable only when nextLink field is necessary in the response, it is not possible to know beforehand whether a table contains more than 100 records ( or n records when first n records is selected using $first clause). We select 101 records (or n+1 records for $first=n) and then determine at DAB layer if nextLink field is necessary. If the database query returns 101 (or n+1 records), then we conclude that a nextLink field is necessary. Essentailly, not being able to know beforehand the number of records in the table forces to select the additional fields.

How was this tested?

  • Integration Tests
  • Manual Tests

Sample Request(s)

  • Primary Key fields are not returned in the response when absent in $select query string
    image

  • Orderby fields are not returned in the response
    image

Copy link
Contributor

@seantleonard seantleonard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following why we have to do all the introduced post processing when the SQL query could be modified to return the correct results (which exclude the pk fields/select/orderby clause fields.

That would be helpful to have in the pr description. In your tests, you make adjustments to the validation queries to address this ask, so why not in the actual engine code?

@severussundar
Copy link
Contributor Author

I'm not following why we have to do all the introduced post processing when the SQL query could be modified to return the correct results (which exclude the pk fields/select/orderby clause fields.

That would be helpful to have in the pr description. In your tests, you make adjustments to the validation queries to address this ask, so why not in the actual engine code?

When the table contains more than 100 records, the nextLink URL field is returned in the response. The nextLink URL contains $after which is a base 64 encoded value. $after is calculated using all the primary key and $orderby column values of the 100th record (or nth when $first=n is used).

The row values from the 100th (or nth when $first=n is used) record is necessary because of two reasons

a) The last record's values are necessary to correctly fetch the subsequent set of records. Let's say the first GET request fetches 100 records. To be able to fetch the next set of rows: 101-200 in the right ordering, the database query will include the condition ~ WHERE ( ( [orderby_column] > [orderby_value_of_row100] ) OR ( [orderby_column] == [orderby_value_of_row100] && [pk_column] > [pk_value_of_row100]). Here orderby_value_of_row100 and pk_value_of_row100 refers to the values of orderby column of the 100th row and pk column of the 100th row respectively.
b) $orderby can be performed in ASC or DESC order. The 100th record will be different in both the ordering and we rely on the value of the 100th row to be able to select the next set of rows (101-200, etc.)

Logic for $after field calculation: SqlPaginationUtil.MakeCursorFromJsonElement()

Granted, all of this is applicable only when nextLink field is necessary in the response, it is not possible to know beforehand whether a table contains more than 100 records ( or n records when first n records is selected using $first clause). We select 101 records (or n+1 records for $first=n) and then determine at DAB layer if nextLink field is necessary. If the database query returns 101 (or n+1 records), then we conclude that a nextLink field is necessary. Essentailly, not being able to know beforehand the number of records in the table forces to select the additional fields.

In your tests, you make adjustments to the validation queries to address this ask, so why not in the actual engine code?

The tests primarily validate that no extra columns in addition to those requested through $select are returned in the response. To construct the expected response, I've used database queries to pick the necessary fields. These are not the exact database queries that DAB generates for the corresponding Find API requests. And, importantly, they cannot be the same queries as DAB selects additional fields to ensure correct ordering [The earlier section explains this in greater detail]. Since, there is no change in the nextLink generation logic, no new tests are added to validate that. The existing pagination and $first clause tests validate the following.
a) nextLink field is returned in the response when necessary
b) The correctness of the nextLink field.

@seantleonard
Copy link
Contributor

I have updated the nuget feed so that it can pull the correct version of packages.

Copy link
Contributor

@seantleonard seantleonard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last nit on List usage. Otherwise, thanks for making these changes!

@severussundar severussundar merged commit 29c46c2 into main Sep 15, 2023
@severussundar severussundar deleted the dev/shyamsundarj/fix-select-clause-behavior branch September 15, 2023 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working rest

Projects

None yet

Development

Successfully merging this pull request may close these issues.

REST - Using $select always returns the primary key fields and $orderby fields in the response

3 participants