feat: implement InMemoryCatalog as a subclass of SqlCatalog #1140
kevinjqliu merged 3 commits into apache:main
@kevinjqliu I applied what you suggested in the comment above. Could you recheck it now?
kevinjqliu left a comment
Added some nit comments. Thanks for working on this! The in-memory catalog is used in a bunch of places in the tests, so changing it has cascading effects.
pyiceberg/catalog/memory.py
Outdated
    This is useful for test, demo, and playground but not in production as data is not persisted.
    """

    def __init__(self, name: str, warehouse: str = "file:///tmp/warehouse", **kwargs: str) -> None:
nit: let's use something like /tmp/iceberg/warehouse so it doesn't conflict with other tmp directories. Also, I'm not sure this works when the warehouse directory hasn't been created yet.
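One way to address the "directory not created yet" concern is to create the directory up front. The following is a minimal sketch using only the standard library; `ensure_warehouse_dir` is a hypothetical helper written for illustration, not something taken from this PR or from PyIceberg, and it assumes the warehouse is a local path or a `file://` URI.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def ensure_warehouse_dir(warehouse: str) -> Path:
    """Create the local warehouse directory if it is missing (hypothetical helper)."""
    parsed = urlparse(warehouse)
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"Only local paths are handled in this sketch: {warehouse}")
    # url2pathname converts the URL path component into a local filesystem path.
    path = Path(url2pathname(parsed.path))
    # parents=True creates intermediate directories; exist_ok=True makes the call idempotent.
    path.mkdir(parents=True, exist_ok=True)
    return path


# Usage: a throwaway directory stands in for /tmp/iceberg/warehouse.
base = Path(tempfile.mkdtemp())
warehouse_path = ensure_warehouse_dir((base / "iceberg" / "warehouse").as_uri())
```

Whether the catalog itself should do this, or whether the underlying file IO already creates parent directories on write, is a question for the PR author.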
nit: I'd like to keep this as test_base because I want to parameterize all tests to make sure all the catalogs have the same behavior (see #813).
    DROP_NOT_EXISTING_NAMESPACE_ERROR = "Namespace does not exist: \\('com', 'organization', 'department'\\)"
    NO_SUCH_NAMESPACE_ERROR = "Namespace com.organization.department does not exists"
This should be done in a separate PR
tests/cli/test_console.py
Outdated
    result = runner.invoke(run, ["--output=json", "properties", "get", "table", "doesnotexist"])
    assert result.exit_code == 1
    assert result.output == """{"type": "NoSuchTableError", "message": "Table does not exist: ('doesnotexist',)"}\n"""
    assert result.output == """{"type": "ValueError", "message": "Empty namespace identifier"}\n"""
nit: shouldn't this be NoSuchTableError? Maybe the namespace needs to be created first.
@Fokko WDYT of this change? I remember we had past discussions on adding a "new" catalog implementation.
@hussein-awala Thanks for working on this 🚀
@kevinjqliu Regarding the new catalogs, my main concern was a proliferation of new catalogs, and that they would lack maintenance. I do like this change for two reasons:
I'm positive about this change. The only consideration I could make is that we hide the
I think it would be good to document the InMemoryCatalog, perhaps in the catalog section of the configuration page.
Hey @hussein-awala, would you like to make the above changes to the docs? This PR is almost ready!
Yes, I will make it ready ASAP.
Thanks for the PR! And the great docs.
I like that we can replace the old implementation in tests, but I'm on the fence about whether we should expose/advertise this as a catalog type. It is useful for certain situations and for testing, but I'm not sure how much value there is in allowing users to do
load_catalog()
and
catalog:
  default:
    type: in-memory
    warehouse: /tmp/pyiceberg/warehouse
WDYT @Fokko?
Or just:
catalog = load_catalog('default', type='in-memory', warehouse='/tmp/pyiceberg/warehouse')
I agree that this catalog impl is mostly focussed on testing/demonstration. If you use a Jupyter notebook, each time you restart the kernel you end up with a fresh catalog (you don't have to clean up any old stuff lingering around).
Thanks for the contribution @hussein-awala and thanks for the review @Fokko
closes: #1110
This PR implements a new catalog, InMemoryCatalog, as a subclass of SqlCatalog backed by an in-memory SQLite database.
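To illustrate why a catalog backed by in-memory SQLite does not persist anything, here is a small sketch using only Python's standard `sqlite3` module. It does not use PyIceberg, and the `demo_tables` table is made up for illustration; it is not the schema that SqlCatalog uses.

```python
import sqlite3

# First "session": create a table in an in-memory database and add a row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_tables (namespace TEXT, name TEXT)")
conn.execute("INSERT INTO demo_tables VALUES ('default', 'events')")
rows_before = conn.execute("SELECT namespace, name FROM demo_tables").fetchall()
conn.close()

# Second "session": a new in-memory connection starts with an empty database,
# so the table created above is gone.
conn = sqlite3.connect(":memory:")
tables_after = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()
```

Each `:memory:` connection gets its own private database, which matches the "fresh catalog after a kernel restart" behavior discussed in the conversation.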