Distributed datacontract and regex support #48
Signed-off-by: Hammad Khan <480812+hammadk373@users.noreply.github.com>
### FileDefinition
- The top-level key within a FileDefinition serves as the FileDefinition name. This name is matched against the input CSV file (case-insensitive)
+ The top-level key within a FileDefinition serves as the FileDefinition name. This name is matched against the input CSV file using either string match (case-insensitive) or regex (case-sensitive) [see general.regexFilenames setting]
nit: can you add periods to complete the sentences in the next two lines?
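For readers following the FileDefinition change above, the two matching modes can be sketched roughly as follows. This is a minimal illustration, not the CsvToFHIR implementation: the function name `match_file_definition`, the use of the file stem (name without extension), and the choice of `re.fullmatch` are all assumptions made for the example.

```python
import re
from pathlib import Path
from typing import Iterable, Optional


def match_file_definition(
    csv_path: str,
    definition_names: Iterable[str],
    regex_filenames: bool = False,
) -> Optional[str]:
    """Return the first FileDefinition name matching the CSV file, or None.

    Hypothetical helper: matching on the file stem and using re.fullmatch
    are assumptions for this sketch, not confirmed project behavior.
    """
    stem = Path(csv_path).stem
    for name in definition_names:
        if regex_filenames:
            # Regex mode: the name is treated as a pattern, case-sensitive.
            if re.fullmatch(name, stem):
                return name
        elif name.lower() == stem.lower():
            # Legacy mode: plain string comparison, case-insensitive.
            return name
    return None
```

With `regex_filenames` left at its default of `False`, a definition named `patient` matches `Patient.csv` but not `Patient_2021.csv`; with it enabled, a pattern such as `Patient_\d{4}` matches the dated file while remaining case-sensitive.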
## Alternative Datacontract locations
CsvToFHIR uses the [smart_open](https://github.com/RaRe-Technologies/smart_open) library to read the Datacontract and any referenced file definitions within it.
Is this now used for all reading of the data contract, even when it is not in external cloud storage? Is that what we want?
It looks like the library provides an "open" function which effectively replaces the "open" function from the base library. We can update our setup here to make cloud storage an "opt-in" feature and then check for its existence at runtime.
I know that this complicates things a bit, but in the event that we have issues with the library we will still be able to function.
Once the other issues, if any, are addressed, I will be happy to submit some changes for this PR to support the "optional" open.
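The "optional open" idea discussed in this thread could look something like the sketch below. It is only an illustration under stated assumptions: the helper `read_text`, the flag `HAS_SMART_OPEN`, and the `"://"` check for remote URIs are hypothetical names and choices, not the code that was later merged. The only third-party call used is `smart_open.open`, which accepts a URI, a mode, and an `encoding` argument.

```python
import builtins

try:
    # Optional dependency: available only when cloud storage support is installed.
    from smart_open import open as _open
    HAS_SMART_OPEN = True
except ImportError:
    # Fall back to the built-in open so local files keep working.
    _open = builtins.open
    HAS_SMART_OPEN = False


def read_text(uri: str, encoding: str = "utf-8") -> str:
    """Read a data contract (or referenced file) from a path or URI.

    Hypothetical helper: treating any string containing "://" as a
    remote URI is a simplification for this sketch.
    """
    if "://" in uri and not HAS_SMART_OPEN:
        raise RuntimeError(
            f"Reading {uri!r} requires the optional smart_open dependency."
        )
    with _open(uri, "r", encoding=encoding) as handle:
        return handle.read()
```

Local paths behave the same whether or not smart_open is installed, which is the property the reviewers were after: a problem with the library cannot block reading a contract from the local file system.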
assert d is not None

def test_datacontract_with_invalid_external_config(
Are we missing a test for a data contract with an external config that is successful?
The test for a successful external config is here: https://github.com/LinuxForHealth/CsvToFHIR/blob/distributed_datacontract/tests/test_converter.py#L388. It's a more thorough test that checks both the successful loading of the config and the successful conversion of the CSV, all in one.
Having said that, I think a test aimed more directly at this functionality is a good idea, so I added one.
@klwhaley are there any further changes you want me to look at? @dixonwhitmire will you be able to tackle the optional use of the library or do you want me to try and implement that?

Hi @hammadk373 - if @klwhaley feels the issues are addressed I can take a look at the optional use of the library later today. At a high level I was going to update the "setup" to make it an optional component like we've done with jupyter and then use try/except to ensure our "open" call works a-ok with or without the dependency. If you would prefer to get started on that, that's fine with me. Just let me know.

I'm going to approve and merge. And Dixon can do his 'optional' change for the open in another PR. Then we can cut a release. Thanks!

Tracking smart_open as an "optional" component in #50
* Distributed datacontract and regex support (#48)
* Refactoring smart_open usage as an optional dependency
* removing unused imports
* new tasks
* Update tasks.py
* address review comments

Signed-off-by: Hammad Khan <480812+hammadk373@users.noreply.github.com>
Signed-off-by: Dixon Whitmire <dixonwh@gmail.com>
Co-authored-by: Dixon Whitmire <dixonwh@gmail.com>
Changes:
- Adds a `regexFilenames` setting to the `general` section of the data contract. It defaults to false to ensure that legacy behavior is the default.
- Switches from Python's `open(...)` function for reading the data contract to the smart_open library. This has the exact same behavior (in fact, it internally uses Python's `open` function) when dealing with files on the local file system. However, it also adds support for cloud storage solutions such as S3 and Azure Blob Storage. See the doc updates in the PR for more information.

Fixes: #47