Conversation

@daleharvey
Contributor

Keeping this here to track progress (I have started this like 3 times and ended up losing my local branch).

The main idea here is to improve our storage format. Our original format naively copied CouchDB's btree storage structure, which is fine for Couch but not so great for us, mostly because 1. joins are slow and expensive, and 2. mistakes in compaction can easily lead to leaks.

The idea is that instead of having a document metadata table that points to a document data table, we store the document data (including revisions) along with the metadata. Some of the possible benefits include:

  1. No joins, so everything is just faster in general
  2. Secondary indexes: with a single table format, if the user wants an index on doc.type, we can use an underlying IndexedDB index on doc.data.type (see the sketch below this list)
  3. It's much harder to leak during compaction, and easier to fix our mistakes if we do.
  4. Various improvements over the legacy adapter: backwards compatibility with PouchDB and wider browser compat are not issues at this time, since it will target the latest Firefox and Chrome. There are some things we will be able to get right the first time, like a stateless constructor; some things will be less clear, but it's a priority that this remain a very clean implementation.
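
As a very rough sketch of point 2 (names here are illustrative, not necessarily the final schema), a single object store lets a user index on doc.type map onto a native IndexedDB index on a nested key path:

var req = indexedDB.open('pouch-idb-next-sketch', 1);
req.onupgradeneeded = function (e) {
  var db = e.target.result;
  // One store holds the metadata, rev tree and revision data together.
  var docs = db.createObjectStore('docs', { keyPath: 'id' });
  // A user index on doc.type becomes a native index on the nested path.
  docs.createIndex('type', 'data.type');
};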

This is a very, very early prototype; right now I am going through the basics test suite and getting one test passing at a time. Anyone is more than welcome to join in if they want :)

@daleharvey force-pushed the idb-next branch 9 times, most recently from a9e1ee2 to 6392f32 on March 21, 2016 19:38
@nolanlawson
Member

Thoughts:

  • This sounds great. We can have a new adapter that works flawlessly on Firefox/Chrome, then falls back to old idb/websql adapters for Safari/IE/oldAndroid. A separate pouchdb-migration plugin could also migrate user data.
  • However, a third adapter will balloon the core size. Puts more pressure on someone (me probably 😅) to get custom builds in finally.
  • This is a good time to just drop map/reduce from core entirely. It won't work with what you describe anyway (doc.type magically indexing on doc.data.type). We can't allow arbitrary user map() functions if we want to use built-in secondary indexes.
  • That said, it's highly optimistic to say that pouchdb-find/Mango will map 1-to-1 to IndexedDB secondary indexes. I'm sure there will be edge cases and we'll need to decide whether to support them or just note them as a difference between us and Couch.

I'm still wondering if there's a case to be made for another design for this adapter, one that would work across IE/Safari/Firefox/Chrome by avoiding secondary indexes and using stringification for keys on a single objectStore. But it's incumbent upon me to implement such a thing and compare the perf relative to your implementation, and frankly it's just unlikely I'll have the time to do so. Luckily we have a plugin architecture that allows for third-party adapters, so in the future we can always make these adapters pluggable.
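
Roughly, the stringified-key idea would be to store index rows in the same object store under ordered keys (a hand-wavy sketch only; real collation of numbers, arrays and objects needs something like pouchdb-collate's toIndexableString):

// Hand-wavy sketch: build a lexicographically ordered key for an "index" row
// living in the same object store as the documents. Assumes the indexed
// value and the doc id are plain strings; real collation is much hairier.
function indexKey(indexName, value, docId) {
  // '\u0000' separators keep the components ordered and unambiguous.
  return '_index\u0000' + indexName + '\u0000' + value + '\u0000' + docId;
}

// A range query over the prefix '_index\u0000type\u0000' then emulates a
// secondary index on doc.type without any native IndexedDB index at all.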

@daleharvey
Contributor Author

This is a good time to just drop map/reduce from core entirely. It won't work with what you
describe anyway (doc.type magically indexing on doc.data.type). We can't allow
arbitrary user map() functions if we want to use built-in secondary indexes.

So yeah, one of the benefits of map/reduce is that it only uses the core storage API, so it will work against this and hopefully be faster, but it certainly won't be able to benefit from built-in indexes. What you described was definitely along my thinking: this could work very well with Mango selectors / without map/reduce.

Going to defer most decisions about how to release until we see if this experiment works, although I am fairly sure we won't want to add yet another adapter to core. It's possible we could do concurrent releases and a migration plugin could work, but it pretty much all depends on what this looks like when it's finished, so I will work on that (it currently passes anything from the basics and all_docs tests that doesn't use changes).

@daleharvey force-pushed the idb-next branch 5 times, most recently from f287a91 to b7fffee on April 8, 2016 15:07
@daleharvey
Contributor Author

Just a heads up on the progress of this: I am on holiday at the moment so it is going a bit slower, but I made good progress beforehand. The basic API is complete; bulkDocs (put / post / delete), allDocs, get and changes are all pretty much working, and I think the only major things still to do are attachments and compaction. Going to avoid testing performance differences until the implementations are functionally identical, but I am now fairly confident that it's possible to have a working implementation in this structure.

I am considering a different approach for attachments from our current setup. Right now we have 3 tables involved in any work with attachments, and I am considering moving them into the actual document records. This removes the optimisation where a single attachment used in multiple documents is deduplicated, however that is an optimisation for what is almost an antipattern, and it severely complicates our implementation. We could do something like

{
  id: 'mydoc',
  rev: '3-xxx',
  rev_tree: [{...}],
  revs: { 
    '3-xxx': { 
      data: {foo: 'bar'}
    }
  },
  attachments: { 
    'foo.txt': { }
  }
}

@nolanlawson
Member

Yep, I agree with that approach for attachments. It's a premature optimization and severely complicates the code. If somebody wants to avoid excessive attachment storage, then they should use compaction. Also this brings us in line with CouchDB's implementation, which does not dedupe.

OTOH you will need to change many tests in test.attachments.js to accommodate this. Probably the best approach is to just remove those tests (which rely on checking md5sums in order to work); they only apply to local databases anyway, since CouchDB does not support this dedup feature.

@daleharvey
Contributor Author

OK, ran up against my first issue: https://github.com/pouchdb/pouchdb/blob/master/src/adapter.js#L671 only sends enough information for the adapter to read separately stored digests, not to read binaries stored within a document. Will redo the arguments in a way that's compatible with both methods.

@daleharvey
Contributor Author

The attachments issue is fixed in the latest version and attachments are about half done; once attachments and compaction are finished we are good to do some comparisons :)

@daleharvey force-pushed the idb-next branch 6 times, most recently from 777a7d3 to 1322a8f on May 10, 2016 16:04
@daleharvey
Contributor Author

daleharvey commented May 10, 2016

passes: 1557 failures: 0 duration: 322.94s :)

Few notes to remind myself:

Currently the performance tests are showing this as being slower, which is super confusing since it is doing far, far less work. It's possible I lost some optimisations, but I am fairly confident either something is up with my testing or there is something big missing in the code; looking at what the code did before vs what it does now, I am still confident this will be significantly faster. Also, for size comparison, it's 269K compared to master's 356K.

@daleharvey force-pushed the idb-next branch 2 times, most recently from e495ad2 to 1f82351 on May 17, 2016 13:37
@daleharvey
Contributor Author

daleharvey commented May 17, 2016

So, not on a lot of people's radar, but this is now passing 100% in Chrome + Firefox, so I am taking a look at what native support for pouchdb-find would look like.

In an ideal world we are compatible with the current pouchdb-find, which looks like

db.createIndex({
  index: {fields: ['name']}
}).then(function () {
  return db.find({
    selector: {name: {$gt: null}},
    sort: ['name']
  });
});

The user defines an index which operates on existing data at runtime, and then has the ability to use that index for queries. The issue here is that index creation in IndexedDB is done at schema creation time and requires a version upgrade to the schema, which we currently use in Pouch to handle data format migrations.

Supporting that will require us to close the idb database on index creation and do a schema upgrade (in the cases where we need one); a rough sketch of what that forces is just below. Figuring it all out is hurting my head, so right now I want to focus on the more basic case of predefined indexes and, if we get that working, figure out whether we can / want to backport it to get runtime definition support (not 100% sure we will).
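
For illustration only (idb, currentVersion, the 'docs' store and the index name are all placeholders, not actual adapter code), runtime index creation forces a close and a versioned reopen along these lines:

// Hypothetical sketch: add an index on an existing 'docs' store at runtime.
// The open connection has to be closed first, then reopened with a higher
// version so that onupgradeneeded fires and the index can be created.
idb.close();
var req = indexedDB.open('data', currentVersion + 1);
req.onupgradeneeded = function (e) {
  // Grab the existing store via the versionchange transaction.
  var docs = e.target.transaction.objectStore('docs');
  docs.createIndex('name', 'data.name');
};
req.onsuccess = function (e) {
  idb = e.target.result;
};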

So by predefined indexes I mean code that looks like

var db = new PouchDB('data', {indexes: ['name']});
db.find({
    selector: {name: {$gt: null}},
    sort: ['name']
});

We can store some version / index information in the database; however, the issue is that we need to know the version of the database we are opening before we open it in order to trigger schema upgrades. Right now the only proposal I can think of is:

Proposal 1. Multiple databases

Right now (in idb-next) we use 2 core objectStores, DOC_STORE and META_STORE; META_STORE contains some persisted information about the database (mostly its update_seq). We can split META_STORE out into a separate idb database that stores an IDB_VERSION as well as the current indexes; any time we see a difference in the indexes options, we bump the IDB_VERSION and create / delete the relevant indexes (a rough sketch of the open flow follows the cons below).

Cons:

  • Managing multiple databases can be tricky: ensuring they get closed, handling errors, and ensuring they are updated in a consistent manner
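
The sketch mentioned above; everything here (database suffix, store and key names) is illustrative, and the comparison of stored vs requested indexes is omitted:

// Hypothetical flow for Proposal 1: read the stored IDB_VERSION from a
// small side database, then open the main database with that version.
function openMain(name, callback) {
  var metaReq = indexedDB.open(name + '-meta', 1);
  metaReq.onupgradeneeded = function (e) {
    e.target.result.createObjectStore('meta');
  };
  metaReq.onsuccess = function (e) {
    var metaDb = e.target.result;
    var txn = metaDb.transaction('meta', 'readonly');
    txn.objectStore('meta').get('idb_version').onsuccess = function (ev) {
      // If the requested indexes differ from what was stored last time,
      // we would bump this version (and persist it) before opening.
      var version = ev.target.result || 1;
      var mainReq = indexedDB.open(name, version);
      mainReq.onupgradeneeded = function (e2) {
        // create / delete the relevant indexes on DOC_STORE here
      };
      mainReq.onsuccess = function (e2) {
        callback(e2.target.result);
      };
    };
  };
}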

@daleharvey
Contributor Author

daleharvey commented May 18, 2016

Small update: this is passing 100% on Firefox (so Chrome and Firefox are 100%). Safari passes, but intermittently; I haven't found a consistent failure yet, but on Safari 9.1.1 the test.basics suite passes like 50% of the time. On the Tech Preview the tests are far more stable, which makes me fairly confident that we can have this working / using IndexedDB in Safari.

UPDATE: once I let the full suite run in the Safari Tech Preview it was passes: 1540 failures: 13 duration: 1700.55s; all of the failures were timeouts on high-level things (retry etc.), so it will pass 100% in the Tech Preview for certain.

@daleharvey force-pushed the idb-next branch 2 times, most recently from 06a893f to d084368 on May 19, 2016 15:06
@nolanlawson
Member

Awesome!! 🎉 Any chance the performance issues were sorted out too?

As for pouchdb-find, I'm still of the mindset that we can't really solve the problem of dynamic indexes by closing/reopening the database, because I suspect it will fail in multi-tab environments or worker/serviceworker environments (race conditions). I also imagine this will have wonky effects in Edge due to race conditions in opening/closing DBs.

I say we should punt on pouchdb-find/mapreduce for now and just continue with dependentDbs etc., for the time being anyway. Also once the monorepo is merged, we can ship this adapter as an optional adapter and hopefully start to get feedback from early adopters.

Also I'm not against the idea of multi-DB - it's a nice way to take advantage of parallelism, and I think inside of __pouchdb_ we can do whatever we want.

@daleharvey
Contributor Author

Right now this is marginally faster, which is still somewhat of a surprise: every document read is a single idb read, and a document write is 1 idb read + 1 idb write (apart from when adapter.js gets in the way). I expected this to be significantly faster; there are still a few lost optimisations that I will take a look at, but a little faster is nice, and it's still far less code, less buggy and significantly cleaner.

Map/reduce just works, since none of that is adapter specific. Once the monorepo is in I will get the changes outside the adapter merged, then we can see what merging this looks like. In the meantime I will work on pouchdb-find support with secondary indexes; in an ideal world we wouldn't need multiple databases, but I can't think of a way to avoid them at the moment.

@daleharvey
Contributor Author

Closed; there is a new rebased PR, and I have opened issues for discussion.
