WIP - Version of PouchDB with different storage format #4984
Force-pushed from a9e1ee2 to 6392f32.
Thoughts:
I'm still wondering if there's a case to be made for another design for this adapter, one that would work across IE/Safari/Firefox/Chrome by avoiding secondary indexes and using stringification for keys on a single objectStore. But it's incumbent upon me to implement such a thing and compare the perf relative to your implementation, and frankly it's just unlikely I'll have the time to do so. Luckily we have a plugin architecture that allows for third-party adapters, so in the future we can always make these adapters pluggable.
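For context, a minimal sketch of what that alternative single-objectStore design might look like, assuming plain string keys with type prefixes standing in for secondary indexes (the store name and key format here are illustrative, not from this PR):

```js
// Hypothetical single-objectStore layout: instead of secondary indexes,
// every record type shares one store and is addressed by a stringified,
// prefix-ordered key, so IDBKeyRange scans replace the indexes.
var req = indexedDB.open('pouch-sketch', 1);

req.onupgradeneeded = function (e) {
  // One store for everything; no secondary indexes are created.
  e.target.result.createObjectStore('docs');
};

req.onsuccess = function (e) {
  var idb = e.target.result;
  var store = idb.transaction('docs', 'readwrite').objectStore('docs');

  // Keys are plain strings, so they sort identically in IE/Safari/Firefox/Chrome.
  store.put({ seq: 12, winningRev: '1-abc' }, 'doc!mydoc');
  store.put({ _id: 'mydoc', _rev: '1-abc', type: 'post' }, 'rev!mydoc!1-abc');
  store.put({ id: 'mydoc', changes: ['1-abc'] }, 'seq!0000000012');

  // A "changes" scan becomes a key-range cursor over the "seq!" prefix.
  var range = IDBKeyRange.bound('seq!', 'seq!\uffff');
  store.openCursor(range).onsuccess = function (ev) {
    var cursor = ev.target.result;
    if (cursor) {
      console.log('change:', cursor.value);
      cursor.continue();
    }
  };
};
```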
So yeah, one of the benefits of map/reduce is that it only uses the core storage API, so it will work against this and hopefully be faster, though it certainly won't be able to benefit from built-in indexes. What you described was definitely along my thinking that this could work very well with Mango selectors / without map/reduce. Going to defer most decisions about how to release until we see if this experiment works, although I am fairly sure we won't want to add yet another adapter to core. It's possible we could do concurrent releases and a migration plugin could work, but pretty much all of it depends on what this looks like when it's finished, so I will work on that (currently passes anything from …)
Force-pushed from f287a91 to b7fffee.
Just a heads up on the progress of this: I'm on holiday at the moment so going a bit slower, but made good progress beforehand. The basic API is complete; bulkDocs (put / post / delete), allDocs, get and changes are all pretty much working, and I think the only major things still to do are attachments and compaction. Going to avoid testing performance differences until the two are functionally identical, but I am now fairly confident that a working implementation in this structure is possible. I am also considering a different approach for attachments than our current setup: right now we have 3 tables involved in any work with attachments, and I am considering moving them into the actual document records. This removes the optimisation of deduplicating a single attachment used in multiple documents, however that is an optimisation for almost an anti-pattern and it severely complicates our implementation. We could do something like …
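As a rough illustration of what inlining attachments into the document record could mean (field names are illustrative, not the actual format in this branch):

```js
// Illustrative only: attachments stored inline on the document record,
// rather than in separate attachment/digest stores keyed by md5 digest.
// De-duplication across documents is lost, but reads and writes touch one record.
var docRecord = {
  id: 'mydoc',
  rev: '2-def',
  revs: {
    '1-abc': { data: { title: 'first draft' } },
    '2-def': {
      data: { title: 'second draft' },
      attachments: {
        'note.txt': {
          content_type: 'text/plain',
          digest: 'md5-XrY7u+Ae7tCTyyK7j1rNww==',
          // The binary itself lives here (e.g. a Blob or base64 string),
          // instead of a pointer into a shared attachment store.
          body: new Blob(['hello world'], { type: 'text/plain' })
        }
      }
    }
  }
};
```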
Yep, I agree with that approach for attachments. It's a premature optimization and severely complicates the code. If somebody wants to avoid excessive attachment storage, then they should use compaction. Also, this brings us in line with CouchDB's implementation, which does not dedupe. OTOH you will need to change many tests in …
OK, ran up against my first issue: https://github.com/pouchdb/pouchdb/blob/master/src/adapter.js#L671 only sends enough information for the adapter to read separately stored digests, not to read binaries stored within a document. Will redo the arguments in a way that's compatible with both methods.
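One possible shape for those arguments, as a sketch under the assumption that the adapter gets both the digest and the owning document's id/rev (the helper name is hypothetical, not the change that landed):

```js
// Sketch: pass enough context for either storage layout. An adapter that
// stores attachments by digest can ignore docId/rev; an adapter that inlines
// attachments inside the document record needs them to find the right record.
function readAttachmentArgs(doc, attachmentName) {
  var att = doc._attachments[attachmentName];
  return {
    digest: att.digest,            // enough for the old digest-keyed stores
    docId: doc._id,                // extra context for inline attachment storage
    rev: doc._rev,
    attachmentName: attachmentName
  };
}
```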
The attachments issue is fixed in the latest version and attachment support is about half done; just this and compaction left, then we are good to do some comparisons :)
Force-pushed from 777a7d3 to 1322a8f.
passes: 1557, failures: 0, duration: 322.94s :) A few notes to remind myself:
Currently the performance tests are showing this as being slower, which is super confusing since it's doing far, far less. It's possible I lost some optimisations; I'm fairly confident either something is up with my testing or there is something big missing in the code. Looking at what the code did before vs what it does now, I am still confident this will be significantly faster. Also, for size comparison, it's …
Force-pushed from e495ad2 to 1f82351.
So, not on a lot of people's radar, but this is now passing 100% in Chrome + Firefox, so I'm taking a look at what native support for secondary indexes (pouchdb-find) could look like.

In an ideal world we are compatible with current pouchdb-find, which looks like: the user defines an index, which operates on existing data at runtime, and then has the ability to use that index for queries. The issue here is that index creation in IndexedDB is done at schema creation time and requires a version upgrade to the schema, which we currently use in pouch to handle data format migrations. Supporting that will require us closing idb on index creation and doing a schema upgrade (in the cases we need one). Figuring it all out is hurting my head, so right now I want to focus on the more basic case of predefined indexes and, if we get that working, figure out whether we can / want to backport that to get runtime definition support (not 100% sure we will).

By predefined indexes I mean code along the lines of the sketch after this comment. We can store some version / index information in the database, however the issue is that we need to know the version of the database we are opening before we open it, in order to trigger schema upgrades. Right now the only proposal I can think of is:

Proposal 1. Multiple databases
Right now (in idb-next) we use 2 core objectStores, …
Cons: …
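As a rough guess at the shape of "predefined indexes" being described (the `indexes` option and the 'idb-next' adapter name are illustrative, not an existing PouchDB API; `db.find()` assumes the pouchdb-find plugin is loaded):

```js
// Sketch only: indexes are declared up front when the database is constructed,
// so the adapter knows the full schema (and therefore the IndexedDB version it
// needs) before calling indexedDB.open(), and can create every native index
// inside a single onupgradeneeded handler.
// NOTE: the `indexes` option and the 'idb-next' adapter name are hypothetical.
var db = new PouchDB('mydb', {
  adapter: 'idb-next',
  indexes: [
    { name: 'by-type', fields: ['data.type'] },
    { name: 'by-author', fields: ['data.author', 'data.created'] }
  ]
});

// Queries could then be answered from the native index rather than map/reduce
// (assumes the pouchdb-find plugin is loaded to provide db.find()).
db.find({ selector: { type: 'post' } }).then(function (result) {
  console.log(result.docs.length + ' posts');
});
```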
Small update: this is passing 100% on Firefox (so Chrome and Firefox are both at 100%). Safari passes, but intermittently; I haven't found a consistent failure yet, but on Safari 9.1.1 test.basics passes like 50% of the time. On the Tech Preview the results are far more stable, which makes me fairly confident that we can have this working / using IndexedDB in Safari. UPDATE: once I let the full suite run, passes: 1540, failures: 13, duration: 1700.55s in the Safari Tech Preview. All failures were timeouts on high-level things / retry etc., so it will pass 100% in the Tech Preview for certain.
Force-pushed from 06a893f to d084368.
Awesome!! 🎉 Any chance the performance issues were sorted out too? As for pouchdb-find, I'm still of the mindset that we can't really solve the problem of dynamic indexes by closing/reopening the database, because I suspect it will fail in multi-tab environments or worker/service worker environments (race conditions). I also imagine this will have wonky effects in Edge due to race conditions in opening/closing DBs. I say we should punt on pouchdb-find/mapreduce for now and just continue with dependentDbs etc., for the time being anyway. Also, once the monorepo is merged, we can ship this adapter as an optional adapter and hopefully start to get feedback from early adopters. I'm not against the idea of multi-DB either: it's a nice way to take advantage of parallelism, and I think inside of …
Right now this is marginally faster, which is still somewhat of a surprise: every document read is a single idb read, and a document write is 1 idb read + 1 idb write (apart from when adapter.js gets in the way). I expected this to be significantly faster, and there are still a few lost optimisations I will take a look at, but a little faster is nice, and it's still far less code, less buggy and significantly cleaner. mapreduce just works, since none of that is adapter specific. Once the monorepo is in I will get the changes outside the adapter merged, then we can see what merging this looks like. In the meantime I will work on pouchdb-find support with secondary indexes; in an ideal world we wouldn't need multiple databases, but I can't think of a way around that at the moment.
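For reference, a simplified sketch of the read/write paths described above, assuming a single combined 'docs' objectStore with out-of-line keys (the store name and record shape are illustrative, not the actual adapter code):

```js
// Sketch: a read is one IndexedDB get of the combined record; a write is one
// get (to fetch the existing record) plus one put, in a single transaction.
function getDoc(idb, id, callback) {
  var store = idb.transaction('docs', 'readonly').objectStore('docs');
  store.get(id).onsuccess = function (e) {
    // One read, no join against a separate by-sequence / document-data store.
    callback(null, e.target.result);
  };
}

function putDoc(idb, doc, callback) {
  var txn = idb.transaction('docs', 'readwrite');
  var store = txn.objectStore('docs');
  store.get(doc._id).onsuccess = function (e) {
    // Read: the existing combined record (metadata + all revision data), if any.
    var record = e.target.result || { id: doc._id, revs: {} };
    // Write: fold the new revision into the same record. Real code would also
    // update the rev tree, winning rev, update seq, and so on.
    record.revs[doc._rev] = { data: doc };
    record.winningRev = doc._rev;
    store.put(record, doc._id);
  };
  txn.oncomplete = function () { callback(null); };
  txn.onerror = function () { callback(txn.error); };
}
```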
Closed; got a new rebased PR and opened issues for discussion.
Keeping this here to track progress (I have started this like 3 times and ended up losing my local branch).
The main idea here is to improve our storage format. Our original format naively copied CouchDB's btree storage structure; that structure is fine for CouchDB but not so great for us, mostly because 1. joins are slow and expensive, and 2. mistakes in compaction can easily lead to leaks.
The idea here is that, instead of having a document metadata table that points towards a document data table, we store the document data (including revisions) along with the metadata, in a single record per document (a rough sketch of the record shape is below). Some of the possible benefits include: for queries on a user field such as doc.type, we can use underlying indexes on doc.data.type.
This is a very, very early prototype. Right now I am going through the basics test suite and getting a test passing at a time; anyone is more than welcome to join in if they want :)
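A rough sketch of the kind of combined record this describes (field names are illustrative, not the actual format in this branch):

```js
// Illustrative combined record: metadata and all revision data live together,
// so reads need no join and compaction can simply delete entries from `revs`.
var record = {
  id: 'mydoc',
  seq: 42,                         // local update sequence
  winningRev: '2-def',
  deleted: false,
  rev_tree: [/* revision tree, much as it is stored today */],
  data: { type: 'post', title: 'second draft' },   // winning revision's data
  revs: {
    '1-abc': { data: { type: 'post', title: 'first draft' } },
    '2-def': { data: { type: 'post', title: 'second draft' } }
  }
};
// With the winning revision's data on the record itself, a native IndexedDB
// index on 'data.type' could back queries on doc.type without map/reduce.
```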