Skip to content

Support timestamp literals with precision specifier #7249

@westonpace

Description

@westonpace

Is your feature request related to a problem or challenge?

Postgres supports an optional precision specifier in timestamp literals (e.g. timestamp (3) '2021-01-01 00:00:00.123' . The postgres spec technically only allows 0-6 but given that Arrow timestamps support nanoseconds it would probably be best to support 0-9.

Describe the solution you'd like

For my purposes, It would be sufficient to only support precision values of 0, 3, 6, and 9 (seconds, milliseconds, microseconds, and nanoseconds) though it should be possible to support values that aren't a multiple of 3 since the expectation is that this value is only used for parsing the literal and it is not a constraint on the type at all (e.g. a timestamp(5) could be stored at microsecond resolution as long as the string is parsed correctly).

Ideally, output would look like the following:

❯ select arrow_typeof(timestamp (6) '2021-01-01 00:00:00.123456789');
+-------------------------------------------+
| arrow_typeof(Utf8("2021-01-01 00:00:00.123456")) |
+-------------------------------------------+
| Timestamp(Microsecond, None)              |
+-------------------------------------------+

❯ select arrow_typeof(timestamp (3) '2021-01-01 00:00:00.123456789');
+-------------------------------------------+
| arrow_typeof(Utf8("2021-01-01 00:00:00.123")) |
+-------------------------------------------+
| Timestamp(Millisecond, None)              |
+-------------------------------------------+

❯ select arrow_typeof(timestamp (0) '2021-01-01 00:00:00.123456789');
+-------------------------------------------+
| arrow_typeof(Utf8("2021-01-01 00:00:00")) |
+-------------------------------------------+
| Timestamp(Second, None)                   |
+-------------------------------------------+

❯ select arrow_typeof(timestamp (9) '2021-01-01 00:00:00.123456789');
+-------------------------------------------+
| arrow_typeof(Utf8("2021-01-01 00:00:00.123456789")) |
+-------------------------------------------+
| Timestamp(Nanosecond, None)               |
+-------------------------------------------+

Example postgres output: https://www.db-fiddle.com/f/oiHdDy1v78mC1zKbCFvWdV/0

Describe alternatives you've considered

A pretty usable workaround at the moment is to cast:

# These should be equivalent
timestamp (6) '2021-01-01 00:00:00'
arrow_cast(timestamp '2021-01-01 00:00:00', 'Timestamp(Microsecond, None)')

Unfortunately, this requires df-specific functions (arrow_cast) and it would also break backwards compatibility with Lance's current SQL parsing.

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions