[Python] Reading JSON with explicit schema is ignoring not null constraint #13177

@Kowol

Hey!

I'm trying to read a JSON file using an explicit schema, like so:
Input file (issue.json):

{"id": "value", "nested": {"value": 1}}
{"id": "value", "nested": {"value": 1}}

Code:

import pyarrow.json as pj
import pyarrow as pa

schema = pa.schema([
    pa.field("id", pa.string(), nullable=False),
    pa.field("nested", pa.struct([pa.field("value", pa.int64(), nullable=False)]))
])

table = pj.read_json('./issue.json', parse_options=pj.ParseOptions(explicit_schema=schema))

print(schema)
print(table.schema)

But the resulting table schema is different: it doesn't contain the not null constraints.

Provided explicit schema:

id: string not null
nested: struct<value: int64 not null>
  child 0, value: int64 not null

Table schema:

id: string
nested: struct<value: int64>
  child 0, value: int64

I also tried casting the table to the schema (table.cast(schema)). That works for the top-level not null constraint, but for the nested struct it throws an error:

pyarrow.lib.ArrowTypeError: cannot cast nullable field to non-nullable field: struct<value: int64> struct<value: int64 not null>

Is there another way to force the schema?
