Skip to content

Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True#21307

Merged
potiuk merged 2 commits into
apache:mainfrom
kazanzhy:pg_server_side_cursor_on_empty_20007
Feb 15, 2022
Merged

Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True#21307
potiuk merged 2 commits into
apache:mainfrom
kazanzhy:pg_server_side_cursor_on_empty_20007

Conversation

@kazanzhy

@kazanzhy kazanzhy commented Feb 3, 2022

Copy link
Copy Markdown
Contributor

Attribute description of simple psycopg2 cursor is available after .execute() method.
But for the named cursor (server-side cursor) any .fetch* method call is needed.

In case of an empty result set like .execute("SELECT 1 LIMIT 0"):

  • for the simple cursor, there are initialized .description and an empty iterator
  • for the server-side cursor, there are not initialized .descriptionand an empty original iterator.
    But also we have self.rows=[None], as a result of fetchone() on an empty iterator of the server-side cursor.
    This None makes _PostgresServerSideCursorDecorator non-empty iterator.

I decided to remove rows attribute because it could consist of a maximum of one row, and also psycopg2 server-side cursor have .scroll() method and it is safe to call .scroll() on empty resultset even without .fetchone()

closes #20007

@boring-cyborg boring-cyborg Bot added area:providers provider:google Google (including GCP) related issues labels Feb 3, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If .scroll(-1) is more clear then I change

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your current approach better.

@kazanzhy kazanzhy Feb 4, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the misleading.

First of all server-side cursor doesn't support relative mode by default. It should be initialized like conn.cursor(name='...', scrollable=True). But absolute mode worked without initialization.

But both scrolls cause a reset of .description attribute. So, unfortunately, they could not be used.

@kazanzhy

This comment was marked as outdated.

@kazanzhy kazanzhy force-pushed the pg_server_side_cursor_on_empty_20007 branch from 80c506a to e6db1c7 Compare February 3, 2022 22:53
@kazanzhy kazanzhy changed the title Fixed error in PostgresToGCSOperator on the empty resultset Fixed PostgresToGCSOperator(use_server_side_cursor=True) fail on empty resultset Feb 3, 2022
@kazanzhy kazanzhy changed the title Fixed PostgresToGCSOperator(use_server_side_cursor=True) fail on empty resultset Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True Feb 3, 2022
@uranusjr

uranusjr commented Feb 4, 2022

Copy link
Copy Markdown
Member

Actually, I think _PostgresServerSideCursorDecorator could be removed if

Are these existing tests to verify the behaviour of use_server_side_cursor? This suggested change makes sense to me but maybe I missed some cases expected by users.

@kazanzhy kazanzhy marked this pull request as ready for review February 4, 2022 17:57
@kazanzhy kazanzhy requested a review from turbaszek as a code owner February 4, 2022 17:57
@kazanzhy

kazanzhy commented Feb 4, 2022

Copy link
Copy Markdown
Contributor Author

Today I made few tests:

Current version (code from main branch):

  • Query returns rows and use_server_side_cursor=False: file with data
  • Query returns rows and use_server_side_cursor=True: file with data
  • Query returns nothing and use_server_side_cursor=False: empty file
  • Query returns nothing and use_server_side_cursor=True: ERROR

Updated version (this PR code):

  • Query returns rows and use_server_side_cursor=False: file with data
  • Query returns rows and use_server_side_cursor=True: file with data
  • Query returns nothing and use_server_side_cursor=False: empty file
  • Query returns nothing and use_server_side_cursor=True: empty file

The last one is not what is expected in the issue but I think creating the empty file is the right behavior

I'm expected, that operator on empty table not creating file and no upload it on Google Cloud.

@kazanzhy kazanzhy force-pushed the pg_server_side_cursor_on_empty_20007 branch 2 times, most recently from 512f28b to 1584d42 Compare February 5, 2022 23:28
@potiuk potiuk closed this Feb 6, 2022
@potiuk potiuk reopened this Feb 6, 2022
@kazanzhy kazanzhy force-pushed the pg_server_side_cursor_on_empty_20007 branch from 1584d42 to 6321d54 Compare February 9, 2022 20:35
@kazanzhy kazanzhy force-pushed the pg_server_side_cursor_on_empty_20007 branch from 6321d54 to ad1d827 Compare February 11, 2022 10:24
@potiuk potiuk merged commit 2eb1056 into apache:main Feb 15, 2022
@kazanzhy kazanzhy deleted the pg_server_side_cursor_on_empty_20007 branch February 16, 2022 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PostgresToGCSOperator fail on empty table and use_server_side_cursor=True

3 participants