Hey!
I'm trying to read JSON using an explicit schema, like so:
Input file (issue.json):
{"id": "value", "nested": {"value": 1}}
{"id": "value", "nested": {"value": 1}}
Code:
import pyarrow.json as pj
import pyarrow as pa
schema = pa.schema([
    pa.field("id", pa.string(), nullable=False),
    pa.field("nested", pa.struct([pa.field("value", pa.int64(), nullable=False)])),
])
table = pj.read_json('./issue.json', parse_options=pj.ParseOptions(explicit_schema=schema))
print(schema)
print(table.schema)
But the resulting table schema is different: it doesn't contain the not-null constraints.
Provided explicit schema:
id: string not null
nested: struct<value: int64 not null>
child 0, value: int64 not null
Table schema:
id: string
nested: struct<value: int64>
child 0, value: int64
I also tried casting the table to the schema (table.cast(schema)). That works for the top-level not-null constraint, but for the nested struct it throws an error:
pyarrow.lib.ArrowTypeError: cannot cast nullable field to non-nullable field: struct<value: int64> struct<value: int64 not null>
Is there another way to force the schema?