Recursive non-JSON dataclass serialization #1565
|
I'd like to better understand in what cases and where this functionality is used. However, I think Python should probably be the source of truth for a dataclass schema, and not the wire format. In general I struggle to see how reconstructing the type from a wire-serialized schema will ever be safe. You could do something like this (even without JSONSchema). But you won't be able to capture, e.g., functions defined on the dataclass, default values, special constructors, etc. Also it seems like a good number of transformers won't implement Instead, it seems much safer to simply make a new dataclass TypeTransformer for each dataclass that you encounter: Afaict the only risk this runs is if someone subclasses a dataclass, you could accidentally use the Transformer for the superclass, rather than creating a new transformer for the subclass. This is easily solved by making the This has the added advantage of actually making the generic typing of Again, this may not be necessary if we can just dump |
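To make the "one transformer per dataclass" idea above concrete, here is a minimal sketch. It is not flytekit's API: `DataclassTransformerSketch`, `get_transformer` and `_REGISTRY` are hypothetical names, and the registry is keyed on the exact class so a subclass never picks up its superclass's transformer.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict


class DataclassTransformerSketch:
    """Hypothetical per-dataclass transformer (illustration only, not flytekit)."""

    def __init__(self, python_type: type) -> None:
        if not is_dataclass(python_type):
            raise TypeError(f"{python_type!r} is not a dataclass")
        self.python_type = python_type

    def to_map(self, value: Any) -> Dict[str, Any]:
        # Exact type check: a subclass instance must go through its own transformer.
        if type(value) is not self.python_type:
            raise TypeError(
                f"expected {self.python_type.__name__}, got {type(value).__name__}"
            )
        return {f.name: getattr(value, f.name) for f in fields(value)}

    def from_map(self, data: Dict[str, Any]) -> Any:
        # Python stays the source of truth: defaults and methods come from the class.
        return self.python_type(**data)


# Hypothetical registry, keyed on the exact class object.
_REGISTRY: Dict[type, DataclassTransformerSketch] = {}


def get_transformer(python_type: type) -> DataclassTransformerSketch:
    # Dictionary lookup by class identity, so Child never resolves to Base's entry.
    if python_type not in _REGISTRY:
        _REGISTRY[python_type] = DataclassTransformerSketch(python_type)
    return _REGISTRY[python_type]


@dataclass
class Base:
    x: int = 0


@dataclass
class Child(Base):
    y: str = "y"
```

With this sketch, `get_transformer(Child).from_map({"x": 2})` rebuilds a `Child` with the default `y`, because the class itself (and not a wire schema) supplies defaults and constructors.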
|
There doesn't appear to be a valid EDIT: My guess is the JSON Schema is used by the UI and CLI to generate forms/input yamls. But I can't find where this happens. |
929bef6 to
2303d2b
Yeah I think we will need to update both the console (for displaying and building input forms), as well as Would love someone more familiar with this side of things to take this on, or at least provide pointers to what needs to be changed. |
|
yup. I'll help. let's get it merged first. |
|
Another note: I needed to rework In reality there's no way to serialize an int-key map in protobuf (structs always use string keys). There were even a couple of tests that codified this behavior (creating a dict with an int key, and expecting it to come back with a string key), but this felt quite confusing for users, so I removed it. Technically this is a breaking change, but I kinda doubt anyone is relying on this functionality given how weird it is. There might be a way of restoring this behavior in a more principled way. |
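For anyone wondering what the int-key coercion looks like in practice, here is a small stand-in using the standard library `json` module rather than protobuf (an assumption made to keep the example dependency-free). JSON objects, like struct keys, only allow string keys, so the round trip silently changes the key type.

```python
import json

# A dict with int keys, as in the removed tests described above.
original = {1: "a", 2: "b"}

# JSON objects only support string keys, so the ints are coerced on the way out
# and never restored on the way back in.
round_tripped = json.loads(json.dumps(original))

print(round_tripped)              # {'1': 'a', '2': 'b'}
print(round_tripped == original)  # False
```

The value that comes back no longer equals the value that went in, which is the surprising behavior the comment refers to.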
wild-endeavor
left a comment
hey @elibixby - thinking about it a bit more, is it a big deal from here to preserve the current behavior for dataclass_json? That is, if it's a dataclass_json, still use the existing code. If it's a pure dataclass, then use the new logic.
I'm concerned about breaking existing users. Currently a dataclass_json gets encoded to a literal Scalar Generic. In the new world it would be a Literal with a map, so a scalar object in Python would be represented as a map object in Flyte. Let me think about this a bit more; it feels a bit awkward, right?
cc @pingsutw - what do you think?
|
@wild-endeavor I agree that leaving the old logic in place for backwards compatibility is worth considering, especially if there are concerns about flytectl or console version mismatch. However, I don't agree that this is an awkward use of the Python <-> proto mapping. A dataclass is much closer to a map than it is to a scalar. A dataclass is defined by having a map EDIT: The only slightly awkward thing is the stuffing of the type schema into a struct when it could just as easily be represented with a map field on |
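A rough illustration of the "dataclass as a map" view, using only the standard library. The helpers `to_nested_map`, `from_nested_map` and `field_schema` are hypothetical names for this sketch and are not part of flytekit; the sketch also assumes field annotations are real classes rather than strings.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict


def to_nested_map(value: Any) -> Any:
    """Recursively turn a dataclass instance into a map of field name -> value."""
    if is_dataclass(value) and not isinstance(value, type):
        return {f.name: to_nested_map(getattr(value, f.name)) for f in fields(value)}
    return value


def from_nested_map(cls: type, data: Dict[str, Any]) -> Any:
    """Rebuild an instance of `cls`, recursing into fields that are dataclasses."""
    kwargs = {}
    for f in fields(cls):
        raw = data[f.name]
        # Assumes f.type is an actual class (no string annotations).
        kwargs[f.name] = from_nested_map(f.type, raw) if is_dataclass(f.type) else raw
    return cls(**kwargs)


def field_schema(cls: type) -> Dict[str, Any]:
    """Describe a dataclass as a map of field name -> type name (or nested schema)."""
    return {
        f.name: field_schema(f.type)
        if is_dataclass(f.type)
        else getattr(f.type, "__name__", str(f.type))
        for f in fields(cls)
    }


@dataclass
class Inner:
    a: int


@dataclass
class Outer:
    name: str
    inner: Inner


as_map = to_nested_map(Outer(name="o", inner=Inner(a=1)))
print(as_map)               # {'name': 'o', 'inner': {'a': 1}}
print(field_schema(Outer))  # {'name': 'str', 'inner': {'a': 'int'}}
```

Both the values and the type schema come out as maps keyed by field name, which is the shape being argued for here. The standard library's `dataclasses.asdict` performs a similar recursive conversion for values.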
|
Okay that sounds good. We can make this new transformer operate as a map/dictionary type. But yeah if we could preserve the existing dataclass_json behavior, I think that will be good. So no change for dataclass_json, this new logic for pure dataclass. is that possible? And then we can do a generic json transformer, and think about how to pull out the logic from run.py as a second/third step. what do you think? |
|
@elibixby let's not remove the code in current dataclass transformer for now.
@dataclass_json
@dataclass
class DatasetStruct(object):  # -> use dataclassJsonTransformer
    ...

@dataclass
class DatasetStruct(object):  # -> use dataclassTransformer
    ...
This way we can make sure it will not break current users. |
|
Any update on this? I think this would help resolve an issue I'm seeing with the existing transformer! Happy to pair/help on any remaining outstanding issues! |
Hey, I just haven't had time to restore the original JSON dataclass transformer to work alongside the new one. I've been focused on a couple of more pressing issues with the type system that came out of this CL (namely writing an escape hatch for the type system in #1615 and changing pickling to be a universal default rather than only for top-level types). I think everything that is needed for merging is here; just some old code/tests need to be restored alongside the new stuff. If you'd be interested in helping with that I'd appreciate it! |
pingsutw
left a comment
We need to rename the current DataclassTransformer to DataclassJsonTransformer, and move the code that @elibixby added into a new DataclassTransformer. Last, we need to update this line. If python_type has the @dataclass_json decorator, it should return DataclassJsonTransformer; otherwise, it should return DataclassTransformer.
@ggydush would you like to help update this PR?
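A sketch of the dispatch described above, under stated assumptions. The two transformer classes are empty placeholders, `fake_dataclass_json` is a stand-in for the real `dataclass_json` decorator so the example has no third-party imports, and the `to_json` attribute check is only one possible way to detect the decorator, not necessarily what the final code should use.

```python
from dataclasses import dataclass, is_dataclass


class DataclassJsonTransformer:
    """Placeholder for the existing JSON-based transformer."""


class DataclassTransformer:
    """Placeholder for the new map-based transformer."""


def fake_dataclass_json(cls):
    # Stand-in for the real decorator: it only attaches a to_json method,
    # which is the marker the check below looks for (an assumption).
    cls.to_json = lambda self: "{}"
    return cls


def pick_transformer(python_type: type) -> type:
    """Return the transformer class to use for a given dataclass type."""
    if not is_dataclass(python_type):
        raise TypeError(f"{python_type!r} is not a dataclass")
    if hasattr(python_type, "to_json"):
        return DataclassJsonTransformer  # existing behavior, unchanged
    return DataclassTransformer          # new logic for pure dataclasses


@fake_dataclass_json
@dataclass
class JsonStruct:
    a: int = 0


@dataclass
class PlainStruct:
    a: int = 0


print(pick_transformer(JsonStruct).__name__)   # DataclassJsonTransformer
print(pick_transformer(PlainStruct).__name__)  # DataclassTransformer
```

Keeping the check in one small function like this makes it easy to swap the detection rule later without touching either transformer.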
|
@pingsutw taking a look today! Will see what I can do! |
…ydush/dataclass-transform
TL;DR
Recursively call out to TypeEngine to serialize/deserialize dataclasses as LiteralMaps.
Type
Are all requirements met?
Complete description
This solves two problems:
nn.Module field, nor with a field for a type where a user writes a custom TypeTransformer)
With some additional backend work, this could also solve flyteorg/flyte#1670: since the dataclass would be serialized as a proto and not a JSON struct, individual submessages would be accessible to the Go backend without parsing arbitrary JSON. If this were implemented we could potentially eliminate the need for separate typing.NamedTuple support, as dataclass would support a superset of the functionality of a typing.NamedTuple.
Concerns:
LiteralType doesn't support any good annotation to indicate a non-univariate map. This information can be stuffed inside metadata, as is currently done with JSON dataclasses, but the "right" way to do this is likely to add a mapping field to LiteralType which maps to subtypes and represents the schema of a multivariate map. As a bonus, this would remove some of the custom logic used for serializing function signatures.
flytectl, as well as the Flyte UI, currently use metadata to perform YAML file generation and UI form generation, respectively.
Tracking Issue
flyteorg/flyte#3359
Follow-up issue
NA