Fix local (Windows) tests by cswartzvi · Pull Request #1302 · apache/hamilton

cswartzvi · 2025-04-06T02:57:21Z

I have been encountering errors and failures when running the test suite locally on a Windows machine. This pull request includes several changes aimed at allowing the test suite to pass. All changes, except the first one, are confined to the offending tests.

Changes

File Handling Improvements:

hamilton/io/utils.py: Enhanced the get_file_metadata function to correctly handle Windows drive paths where the scheme from parse.urlparse may include the Windows drive letter.
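
For context, here is a minimal sketch of the parsing behavior behind this change. It only demonstrates what the standard-library `urlparse` returns for a Windows path; `looks_like_windows_drive` is a hypothetical helper written for illustration and is not the code in `get_file_metadata`.

```python
from urllib import parse


def looks_like_windows_drive(path: str) -> bool:
    """Hypothetical helper: True if urlparse reported a drive letter as the scheme."""
    scheme = parse.urlparse(path).scheme
    # Assumption: a drive letter is a single ASCII letter, while real schemes
    # such as "s3" or "https" are longer.
    return len(scheme) == 1 and scheme.isalpha()


# urlparse treats everything before the first ":" as the scheme (lowercased),
# so the drive letter ends up there.
print(parse.urlparse(r"C:\data\file.csv").scheme)
print(looks_like_windows_drive(r"C:\data\file.csv"))
print(looks_like_windows_drive("s3://bucket/file.csv"))
```

A single-letter check is only one way to tell a drive from a URL scheme; checking whether the path exists on disk is another.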

Testing Fixture Updates:

tests/caching: Because the metadata_store and result_store used the same temporary directory, deletions during clean-up were running into Windows file-share locking. Switched to the tmp_path_factory fixture and decoupled the paths for the metadata_store and result_store.
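
The decoupling can be sketched roughly as below. The fixture names and directory names are assumptions chosen for illustration; the real fixtures in tests/caching construct the actual store objects rather than returning bare paths.

```python
import pytest


@pytest.fixture
def metadata_store_dir(tmp_path_factory):
    # mktemp() creates a new, uniquely numbered directory under pytest's
    # base temporary directory, so each store gets its own folder.
    return tmp_path_factory.mktemp("metadata_store")


@pytest.fixture
def result_store_dir(tmp_path_factory):
    return tmp_path_factory.mktemp("result_store")


def test_store_dirs_are_separate(metadata_store_dir, result_store_dir):
    # Cleaning up one store can no longer touch files held open by the other.
    assert metadata_store_dir != result_store_dir
    assert metadata_store_dir.is_dir()
    assert result_store_dir.is_dir()
```

With separate directories, deleting one store's files during teardown does not race against handles that the other store still holds.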

Environment Variable Mocking:

tests/plugins/test_pandas_extensions.py: Added mocking for the TZDIR environment variable in the test_pandas_orc_reader test. Note this is due to how Windows interacts with the IANA timezone database.
tests/test_telemetry.py: Added mocking for the HAMILTON_TELEMETRY_ENABLED environment variable in telemetry configuration tests. Previous tests were changing os.environ directly, leading to issues if the user already had `HAMILTON_TELEMETRY_ENABLED` set.
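
A minimal sketch of the mocking pattern, using `unittest.mock.patch.dict`. The `telemetry_enabled` function is a hypothetical stand-in for code that reads the variable and is not Hamilton's API; the same pattern applies to `TZDIR`, whose value depends on the local tzdata install and is not shown here.

```python
import os
from unittest import mock


def telemetry_enabled() -> bool:
    """Hypothetical stand-in for code that reads the environment variable."""
    return os.environ.get("HAMILTON_TELEMETRY_ENABLED", "true").lower() == "true"


@mock.patch.dict(os.environ, {"HAMILTON_TELEMETRY_ENABLED": "false"})
def test_telemetry_disabled_via_env():
    # Inside the patch, the variable has the value the test asked for,
    # whatever the developer's shell had set.
    assert telemetry_enabled() is False


def test_environment_is_restored():
    before = dict(os.environ)
    test_telemetry_disabled_via_env()
    # patch.dict puts os.environ back exactly as it was on exit.
    assert dict(os.environ) == before
```

Because the patch is undone automatically, a test no longer leaks its setting into later tests or overwrites a value the user had already exported.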

Platform-Specific Test Adjustments:

tests/plugins/test_plotly_extensions.py: Added a platform check to skip the test_plotly_static_writer test on Windows. There are some issues with using the plotly dependency kaleido to generate static images on Windows.
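
A platform skip of this kind might look like the sketch below. It assumes the check is made with `sys.platform`; the PR may use a different check, and the test body here is a placeholder rather than the real plotly test.

```python
import sys

import pytest


# Assumption: sys.platform is used for the platform check.
@pytest.mark.skipif(
    sys.platform == "win32",
    reason="kaleido static image export has known issues on Windows",
)
def test_plotly_static_writer_sketch():
    # Placeholder body; the real test writes a static image through plotly.
    assert True
```

Putting the explanation in `reason` makes it show up in the pytest skip report, so the limitation stays visible to future maintainers.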

How I tested this

N/A

Notes

N/A

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

Note: Colocation of the results and cache store was causing issues with Windows file-share locking.

Note: reading the ORC format with pyarrow (through pandas) requires the IANA timezone database to be installed (`tzdata`) and the `TZDIR` environment variable to be set.

Note: Plotly uses `kaleido` on Windows to generate static images, which has known issues. See plotly/Kaleido#110 or plotly/Kaleido#41 for more info.

ellipsis-dev

❌ Changes requested. Reviewed everything up to 6a9e7c1 in 1 minute and 23 seconds

More details

Looked at 108 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 14 drafted comments based on config settings.

1. hamilton/io/utils.py:37

Draft comment:
Review the condition for Windows drive paths. Relying on parsed.scheme.isalpha() might catch non-local schemes (e.g., 'http') if os.path.exists returns false, but it could be clearer by explicitly checking for a single-letter drive (e.g., len(scheme)==1).
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The current code uses isalpha() which would match any alphabetic string, while Windows drive letters are always single characters. The comment's suggestion to use len(scheme)==1 is more precise for Windows drives. However, the code already has os.path.exists() as a second condition, which would prevent false positives from other schemes. The change would be marginally more precise but doesn't fix a real problem.
The current code already works correctly due to the os.path.exists check, and the suggested change doesn't fix any actual issues. The comment might be making the code more complex without adding value.
While the suggestion is technically more precise, the current implementation is already safe and functional. The added complexity isn't justified by any real-world benefit.
The comment should be deleted as it suggests a change that doesn't fix any actual problems and could make the code unnecessarily more complex.

2. tests/caching/metadata_store/test_base.py:48

Draft comment:
The fixture now uses tmp_path_factory.mktemp('result_store'). Verify that the folder name is appropriate for metadata_store; using 'result_store' might be confusing. Consider renaming for clarity if needed.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

3. tests/caching/result_store/test_base.py:19

Draft comment:
The result_store fixture now uses tmp_path_factory for directory isolation. This change looks good for avoiding file lock issues on Windows.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

4. tests/plugins/test_pandas_extensions.py:244

Draft comment:
Good use of mock.patch.dict to set the TZDIR environment variable. It might be helpful to add a comment documenting why this is required on Windows.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

5. tests/plugins/test_plotly_extensions.py:16

Draft comment:
The skipif decorator for Windows in test_plotly_static_writer is appropriate given the issues with kaleido. Consider adding a short inline comment referencing the known dependency limitation.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

6. tests/test_telemetry.py:69

Draft comment:
Using mock.patch.dict for TELEMETRY environment variable is a good practice. Documentation in Sphinx under telemetry configuration should note this approach for isolating environment settings during tests.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

7. tests/caching/metadata_store/test_base.py:49

Draft comment:
Consider using a more descriptive and unique temporary directory name for the metadata_store fixture (e.g., 'metadata_store' instead of 'result_store') to avoid potential collisions with other fixtures.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The temporary directory name is only used for tests and doesn't affect functionality. While matching names might be slightly clearer, this is a very minor stylistic suggestion. The current name 'result_store' is not incorrect or misleading - the metadata store does store results. The comment about "avoiding collisions" seems speculative since pytest handles temp directory isolation.
The suggestion could improve code readability by maintaining consistent naming. The collision concern might be valid if multiple tests run in parallel.
The naming improvement is too minor to warrant a comment, and tmp_path_factory already handles directory isolation, so collisions aren't a real concern.
This comment should be deleted as it suggests a minor stylistic change that doesn't meaningfully improve the code.

8. tests/plugins/test_pandas_extensions.py:244

Draft comment:
Good use of mock.patch.dict to set the TZDIR environment variable for Windows compatibility. Consider adding a note in the Sphinx documentation (e.g., under a platform-specific testing section) about this workaround.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

9. tests/plugins/test_plotly_extensions.py:16

Draft comment:
Using pytest.mark.skipif to skip the static image export test on Windows is a good practice. Consider documenting the known limitation regarding Plotly's Kaleido on Windows in the test documentation for future maintainers.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

10. tests/test_telemetry.py:69

Draft comment:
Mocking the HAMILTON_TELEMETRY_ENABLED environment variable with @mock.patch.dict improves test isolation compared to direct os.environ changes. This pattern is effective for ensuring consistent telemetry behavior in tests.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

11. hamilton/io/utils.py:19

Draft comment:
Typographical error: The phrase 'come chat so that we can make sure we don't break your code' is used in the docstring (line 19) of get_file_metadata. Consider rephrasing it (e.g., 'please contact us' or 'reach out to us') for clarity. This issue also appears in other docstrings and should be corrected consistently.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

12. tests/caching/result_store/test_base.py:14

Draft comment:
Typo in the error message: the error string references _instantiate_metadata_store(), but should correctly reference _instantiate_result_store() to match the function name. Please update this to avoid confusion.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

13. tests/plugins/test_pandas_extensions.py:197

Draft comment:
Typo alert: In the test_pandas_feather_writer function, the assertion is using PandasStataWriter.applicable_types() instead of PandasFeatherWriter.applicable_types(). Also, the file path uses the extension '.dta', which is typically for Stata files. Please update these to reference the Feather writer and use an appropriate file extension (e.g., '.feather').
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

14. tests/plugins/test_pandas_extensions.py:153

Draft comment:
Typo/consistency note: In the test_pandas_html_writer function, the output file is named 'test.xml'. Since this test is for an HTML writer, consider renaming the file to use an '.html' extension for clarity and consistency.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.


elijahbenizzy

This looks good, and all tests pass. Unfortunately we don't have a Windows GitHub Actions VM, although I'd be surprised if it was hard to set up. But I think this "does no harm" and likely makes dev easier. Shipping!

cswartzvi added 5 commits April 5, 2025 15:55

Patch os.environ for local tests

0ff0280

Allow for Windows drive in schema

b3311b3

Separate results and cache store locations

57d5383

Note: Colocation of the results and cache store was causing issues with Windows file-share locking.

Set time zone env for Windows

1e429af

Note: reading the ORC format with pyarrow (through pandas) requires the IANA timezone database to be installed (`tzdata`) and the `TZDIR` environment variable to be set.

Skip plotly static image generation on windows

6a9e7c1

Note: Plotly uses `kaleido` on Windows to generate static images, which has known issues. See plotly/Kaleido#110 or plotly/Kaleido#41 for more info.

ellipsis-dev Bot reviewed Apr 6, 2025

Comment thread hamilton/io/utils.py Outdated

Refactor get_file_metadata to improve Windows drive handling

0d55de9

elijahbenizzy approved these changes Apr 7, 2025

elijahbenizzy merged commit 25d1881 into apache:main Apr 7, 2025

elijahbenizzy merged 6 commits into apache:main from cswartzvi:fix_local_tests
