[Rust] StructArray: handling duplicate field names

The Arrow spec leaves the handling of duplicate field names to implementors.

The C++ implementation either ignores duplicates or raises an error; the Java implementation can ignore, append, replace, or raise an error. Both use ignore as the default. Here are the references:
- <https://github.com/apache/arrow/blob/57376d28cf433bed95f19fa44c1e90a780ba54e8/cpp/src/arrow/type.cc>
- <https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/java/vector/src/main/java/org/apache/arrow/vector/complex/AbstractStructVector.java>
  
  I'm no expert in databases or data science, but as far as I know, in the traditional RDBMS domain it is unusual to allow duplicate field names. Furthermore, in the data analysis domain, perhaps it is usual to normalize/clean various kinds of bad/dirty data **interactively** with tools like `pandas`?
  
  Back to the problem, here is an example: given the duplicate field names A A A B B, a user who knows the actual data MAY choose to replace the first A with the second A, append the third A, and ignore the second B. Or was the duplication just a mistake?
  
  Quoting [~nevi_me]: "I also prefer raising an error by default, as that'll make users aware very quickly". It is not acceptable to silently append/ignore/replace duplicate fields, producing unexpected results that the user is not aware of at all.
  
  If we choose to support `replace`, `ignore` or `append`, we must at least let the user control the exact behavior. For IPC data, perhaps custom metadata (for file, message and field) is the only choice. I suggest just recording this problem here and continuing to raise an error until it is really necessary to support other solutions.
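
To make the four options concrete, here is a minimal sketch in plain Rust of what a user-controlled policy could look like. It does not use the `arrow` crate: `DuplicatePolicy` and `resolve_duplicates` are hypothetical names invented for illustration, and the meaning given to each variant is my assumption, not something the spec or the existing implementations define. The function applies one policy to the whole list of names; the per-occurrence choices from the A A A B B example would need a richer input.

```rust
use std::collections::HashMap;

/// Hypothetical policy for a field whose name has already been seen.
/// Variant semantics are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DuplicatePolicy {
    /// Fail on the first duplicate (the default proposed in this issue).
    Error,
    /// Keep the first occurrence and drop later ones.
    Ignore,
    /// Let a later occurrence take the slot of the earlier one.
    Replace,
    /// Keep every occurrence, duplicates included.
    Append,
}

/// Returns the indices (into `names`) of the fields that survive,
/// in output order, or an error message under `DuplicatePolicy::Error`.
pub fn resolve_duplicates(
    names: &[&str],
    policy: DuplicatePolicy,
) -> Result<Vec<usize>, String> {
    let mut kept: Vec<usize> = Vec::new();
    // Maps a field name to the position in `kept` of its first occurrence.
    let mut seen: HashMap<&str, usize> = HashMap::new();

    for (i, name) in names.iter().copied().enumerate() {
        let existing = seen.get(name).copied();
        match existing {
            None => {
                seen.insert(name, kept.len());
                kept.push(i);
            }
            Some(pos) => match policy {
                DuplicatePolicy::Error => {
                    return Err(format!(
                        "duplicate field name `{}` at index {}",
                        name, i
                    ));
                }
                DuplicatePolicy::Ignore => {}
                DuplicatePolicy::Replace => kept[pos] = i,
                DuplicatePolicy::Append => kept.push(i),
            },
        }
    }
    Ok(kept)
}
```

For the names A A A B B (indices 0 to 4), this sketch keeps indices 0 and 3 under `Ignore`, 2 and 4 under `Replace`, all five under `Append`, and returns an error under `Error`.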

**Reporter**: [Qingyou Meng](https://issues.apache.org/jira/browse/ARROW-11178) / @mqy

<sub>**Note**: *This issue was originally created as [ARROW-11178](https://issues.apache.org/jira/browse/ARROW-11178). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
