[opt](catalog) support using loading cache for db/table list in external catalog (#33610) #34596
Merged
morningman merged 1 commit into apache:branch-2.1 on May 9, 2024
…nal catalog (apache#33610)

Existing mechanism:

1. **Master FE** uniformly retrieves table information and generates the corresponding `id -> name` mapping.
2. The `id -> name` mapping is stored in Doris's metadata and persisted.
3. **Master FE** synchronizes this information to the other FEs via **EditLog**.
4. To update the table information, a `refresh` command must be executed, or the metadata must be synchronized through an **HMS event**.

- **Advantage**: All FEs see a consistent list of tables, because the information is obtained uniformly by the Master FE. This prevents discrepancies in table visibility across FEs.
- **Disadvantage**: Changes to tables are not perceived promptly. For example, a table newly created on the Hive side is not immediately visible in Doris; it becomes visible only after a manual refresh or a periodic metadata refresh.

New mechanism:

- **Catalog** adds a new property, `use_meta_cache`, which defaults to `false`. If set to `true`, the catalog uses an independent caching method to synchronize table information.
  - Once enabled, table information is no longer obtained uniformly by the Master FE; instead, each FE fetches it independently.
  - Each FE has its own cache of the database and table lists, implemented with the **Caffeine library**.
  - The cache loads table information synchronously on access: if an entry is not cached, the FE fetches it directly from HMS.
- **Behaviors**:
  - Different FEs may see different table information because their caches were loaded at different times, but they eventually become consistent.
  - New tables created on the Hive side can be queried directly in Doris, but may not yet appear in `show databases` or `show tables`.
  - Tables deleted on the Hive side still appear in `show databases/tables`, but are inaccessible.
  - All caches are refreshed at most every 10 minutes.
- **Compatibility**:
  - For catalogs created before the upgrade, `use_meta_cache` is `false` after the upgrade.
  - For newly created catalogs, `use_meta_cache` defaults to `false` if it is not set.
  - `use_meta_cache` cannot be modified after the catalog is created.
- **MetaCache**:
  - A general cache class responsible for caching database and table information. It contains two `LoadingCache`s: one stores the name list, and the other maps each name to its object.
- **ID Generation Rules**:
  - Because table information is no longer fetched uniformly by the Master FE, a consistent rule is needed so that every FE generates the same ID for the same table. The ID of an object is the absolute value of the first 8 bytes of the SHA-256 hash of its name. The same name therefore always yields the same ID, but the ID is no longer globally unique: it is unique only within its catalog or database.
- Removed unused methods such as `getIdToTable()` and `getIdToDb()`.

I have run the P0 regression tests with both `use_meta_cache=true` and `use_meta_cache=false`.
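The name-based ID rule can be sketched in Java as follows. This is a minimal sketch, not the Doris source: the class and method names (`NameBasedId`, `genIdByName`) are hypothetical, it assumes the "top 8" refers to the first 8 bytes of the SHA-256 digest read as a big-endian `long`, and it hashes a single name string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class NameBasedId {
    // Hypothetical helper (not the Doris source): derive a deterministic,
    // non-negative long ID from an object's name.
    static long genIdByName(String name) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] hash = sha256.digest(name.getBytes(StandardCharsets.UTF_8));
            // Read the first 8 bytes of the 32-byte digest as a big-endian long.
            long raw = ByteBuffer.wrap(hash).getLong();
            // Math.abs(Long.MIN_VALUE) overflows and stays negative; map that
            // single value explicitly so the result is always >= 0.
            return raw == Long.MIN_VALUE ? Long.MAX_VALUE : Math.abs(raw);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must support SHA-256, so this is unreachable.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long orders = genIdByName("orders");
        // The same name yields the same ID on every FE, with no coordination.
        System.out.println(orders == genIdByName("orders")); // true
        System.out.println(orders >= 0);                     // true
    }
}
```

Because the ID depends only on the name, two tables with the same name in different databases get the same ID, which is why uniqueness holds only within a catalog or database.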
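The per-FE caching described in this PR (two caches, loaded synchronously on a miss and refreshed within 10 minutes) can be illustrated with a stdlib-only Java sketch. It deliberately does not use Caffeine: `SimpleLoadingCache` is a hypothetical stand-in for Caffeine's `LoadingCache`, `MetaCache`, `listNames`, and `getObject` are illustrative names rather than the Doris API, and the "HMS" is just an in-memory list.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

class MetaCacheSketch {
    // Hypothetical stand-in for a Caffeine LoadingCache (stdlib only):
    // loads synchronously on a miss and reloads entries older than ttlNanos.
    static final class SimpleLoadingCache<K, V> {
        private static final class Entry<T> {
            final T value;
            final long loadedAt;
            Entry(T value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
        }

        private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
        private final Function<K, V> loader;
        private final long ttlNanos;

        SimpleLoadingCache(Function<K, V> loader, long ttlNanos) {
            this.loader = loader;
            this.ttlNanos = ttlNanos;
        }

        V get(K key) {
            long now = System.nanoTime();
            // compute() is atomic per key, so concurrent callers for the same
            // key wait for a single load instead of each hitting the source.
            return entries.compute(key, (k, old) ->
                    (old == null || now - old.loadedAt >= ttlNanos)
                            ? new Entry<V>(loader.apply(k), now)
                            : old).value;
        }
    }

    // Hypothetical MetaCache shape: one cache for the name list and one
    // for name -> object.
    static final class MetaCache<T> {
        private static final String ALL_NAMES = ""; // single key for the list cache
        private final SimpleLoadingCache<String, List<String>> names;
        private final SimpleLoadingCache<String, Optional<T>> objects;

        MetaCache(Supplier<List<String>> listRemote,
                  Function<String, Optional<T>> getRemote,
                  long ttlNanos) {
            this.names = new SimpleLoadingCache<>(k -> listRemote.get(), ttlNanos);
            this.objects = new SimpleLoadingCache<>(getRemote, ttlNanos);
        }

        List<String> listNames() { return names.get(ALL_NAMES); }
        Optional<T> getObject(String name) { return objects.get(name); }
    }

    public static void main(String[] args) {
        // Simulated HMS table list; a real FE would call HMS here.
        List<String> hms = new CopyOnWriteArrayList<>(List.of("orders", "customers"));
        MetaCache<String> cache = new MetaCache<>(
                () -> List.copyOf(hms),
                name -> hms.contains(name) ? Optional.of("table:" + name) : Optional.empty(),
                TimeUnit.MINUTES.toNanos(10));

        System.out.println(cache.listNames());         // [orders, customers]
        hms.add("events");                             // table created on the Hive side
        System.out.println(cache.listNames());         // [orders, customers] (stale until reload)
        System.out.println(cache.getObject("events")); // Optional[table:events] (loaded on miss)
    }
}
```

The run mirrors the behaviors listed in the PR: a new Hive table is queryable immediately because the name-to-object cache loads it on a miss, while the cached name list (what `show tables` would read in this model) stays stale until its entry expires. Caffeine additionally supports size bounds and asynchronous refresh, which this sketch omits.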
Backport of #33610.