Conversation
Signed-off-by: Hammad Khan <480812+hammadk373@users.noreply.github.com>
Signed-off-by: Dixon Whitmire <dixonwh@gmail.com> Signed-off-by: Hammad Khan <480812+hammadk373@users.noreply.github.com>
Signed-off-by: Dixon Whitmire <dixonwh@gmail.com> Signed-off-by: Hammad Khan <480812+hammadk373@users.noreply.github.com>
Signed-off-by: Hammad Khan <480812+hammadk373@users.noreply.github.com>
dixonwhitmire
left a comment
@hammadk373 thank you very much for these updates! The documentation, including docstrings, is really helpful!
My comments are limited to documentation and test case suggestions. I will go ahead and approve since the core implementation looks sound.
Thanks!
| <td> | ||
| <pre> | ||
| { | ||
| "name": "add_constant", |
It may be helpful to change the task name here to "validate_email" or something similar
Oh, my bad. That task is called validate_value because it is not limited to just emails.
| </td> | ||
| <td> | ||
| <b>secondary_data_source:</b> path to the secondary data file. Can be relative to the data-contract dictionary or absolute.<br> | ||
| <b>join_type:</b> {'left', 'right', 'outer', 'inner', 'cross'} which correspond roughly to the RDB join types of the same name.<br> |
Since the RDB acronym isn't defined it may be helpful to instead use the term "relational database".
| :param data_frame: The input DataFrame | ||
| :param secondary_data_source: path to the secondary data file. Can be relative to the data-contract dictionary or absolute. | ||
| Also supports http, ftp, s3 and gs paths |
Confirming that pandas supports http, ftp, s3, etc. paths and that the new "optimized-streaming" extra is not required.
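For context, a rough sketch of the path handling the docstring describes: remote URLs are passed through untouched, and relative local paths are resolved against the data-contract directory. This is not the PR's code. `resolve_source`, `REMOTE_SCHEMES`, and the inclusion of `https` are assumptions made for illustration, and the actual scheme parsing lives in the helper from PR #51, which isn't shown here. As far as I know, pandas readers such as `read_csv` accept http, ftp, s3, and gs URL strings directly, with s3 and gs relying on optional packages (`s3fs`, `gcsfs`).

```python
from pathlib import Path
from urllib.parse import urlsplit

# Assumed set of remote schemes, based on the docstring above ("https" added as a guess).
REMOTE_SCHEMES = {"http", "https", "ftp", "s3", "gs"}


def resolve_source(path: str, base_dir: str) -> str:
    """Hypothetical helper: keep remote URLs as-is, resolve local paths.

    Relative local paths are joined onto base_dir (the data-contract
    directory); absolute local paths are returned unchanged.
    """
    scheme = urlsplit(path).scheme.lower()
    if scheme in REMOTE_SCHEMES:
        # Remote location: hand the string to pandas untouched.
        return path
    local = Path(path)
    return str(local if local.is_absolute() else Path(base_dir) / local)
```

The returned string could then be handed to a pandas reader. A Windows drive letter such as `C:` parses as a one-letter scheme, which is why the check uses an explicit allow-list rather than "any non-empty scheme".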
| :param data_frame: The input DataFrame | ||
| :param secondary_data_source: path to the secondary data file. Can be relative to the data-contract dictionary or absolute. | ||
| Also supports http, ftp, s3 and gs paths | ||
| :param join_type: {'left', 'right', 'outer', 'inner', 'cross'} which correspond roughly to the RDB join types of the same name. |
"Relational database join types" may be more descriptive than "RDB".
| assert actual["input1"][1] == "FailedToMatch" | ||
| def test_join_data(): |
It may be helpful to include additional test "cases" for the inner and right join conditions. For the outer join conditions we should also have a separate record or two which do not have a matched value, just so we can verify it.
Added a simple test each for inner, left, right, and outer. I will trust that more thorough testing of these "joins" is being done in pandas :).
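For reference, a minimal sketch of what per-join-type cases with unmatched records could look like. It assumes plain pandas `DataFrame.merge` rather than the project's own join task, and the two frames and the `joined_ids` helper are made up for illustration; the tests actually added in this PR may differ.

```python
import pandas as pd

# Made-up fixtures: id 3 exists only on the left, id 4 only on the right.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 2, 4], "city": ["x", "y", "z"]})


def joined_ids(how):
    """Return the sorted ids present after merging with the given join type."""
    merged = left.merge(right, on="id", how=how)
    return sorted(merged["id"].tolist())


def test_join_types():
    # Each join type keeps a different subset of the keys.
    assert joined_ids("inner") == [1, 2]
    assert joined_ids("left") == [1, 2, 3]
    assert joined_ids("right") == [1, 2, 4]
    assert joined_ids("outer") == [1, 2, 3, 4]

    # Unmatched rows keep their key and get missing values
    # for the columns that come from the other side.
    outer = left.merge(right, on="id", how="outer").set_index("id")
    assert pd.isna(outer.loc[3, "city"])
    assert pd.isna(outer.loc[4, "name"])
```

Having one key that appears only on each side is what lets the left, right, and outer cases produce different results from the inner one.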
Signed-off-by: Hammad Khan <480812+hammadk373@users.noreply.github.com>
I have addressed all the review comments. @klwhaley, can we cut a new beta release when you get a chance (after this PR is merged)?
Uses the new parse_uri_scheme function from PR #51, so that should be merged first.
fixes: #49
fixes: #52