Conversation

@daleharvey
Contributor

Keeping this here to track progress (I have started this like 3 times and ended up losing my local branch).

The main idea here is to improve our storage format. Our original format naively copied CouchDB's btree storage structure, which is fine for Couch but not so great for us, mostly because 1. joins are slow and expensive, and 2. mistakes in compaction can easily lead to leaks.

The idea is that instead of having a document metadata table that points to a document data table, we store the document data (including revisions) along with the metadata. Some of the possible benefits include:

  1. No joins, so everything is just faster in general
  2. Secondary indexes: with a single table format, if the user wants an index on doc.type, we can use an underlying IndexedDB index on doc.data.type (see the sketch below this list)
  3. It's much harder to leak during compaction, and easier to fix our mistakes if we do.
  4. Various improvements over the legacy adapter: backwards compatibility with PouchDB and wider browser compat are not issues at this time, since it will target the latest Firefox and Chrome. There are some things we will be able to get right the first time, like a stateless constructor; some things will be less clear, but it's a priority that this remain a very clean implementation.
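
As a very rough sketch of point 2 (names here are illustrative, not necessarily the final schema), a single object store lets a user index on doc.type map onto a native IndexedDB index on a nested key path:

var req = indexedDB.open('pouch-idb-next-sketch', 1);
req.onupgradeneeded = function (e) {
  var db = e.target.result;
  // One store holds the metadata, rev tree and revision data together.
  var docs = db.createObjectStore('docs', { keyPath: 'id' });
  // A user index on doc.type becomes a native index on the nested path.
  docs.createIndex('type', 'data.type');
};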

This is a very, very early prototype; right now I am going through the basics test suite and getting one test passing at a time. Anyone is more than welcome to join in if they want :)

@daleharvey force-pushed the idb-next branch 9 times, most recently from a9e1ee2 to 6392f32 on March 21, 2016 19:38
@nolanlawson
Member

Thoughts:

  • This sounds great. We can have a new adapter that works flawlessly on Firefox/Chrome, then falls back to old idb/websql adapters for Safari/IE/oldAndroid. A separate pouchdb-migration plugin could also migrate user data.
  • However, a third adapter will balloon the core size. Puts more pressure on someone (me probably 😅) to get custom builds in finally.
  • This is a good time to just drop map/reduce from core entirely. It won't work with what you describe anyway (doc.type magically indexing on doc.data.type). We can't allow arbitrary user map() functions if we want to use built-in secondary indexes.
  • That said, it's highly optimistic to say that pouchdb-find/Mango will map 1-to-1 to IndexedDB secondary indexes. I'm sure there will be edge cases and we'll need to decide whether to support them or just note them as a difference between us and Couch.

I'm still wondering if there's a case to be made for another design for this adapter, one that would work across IE/Safari/Firefox/Chrome by avoiding secondary indexes and using stringification for keys on a single objectStore. But it's incumbent upon me to implement such a thing and compare the perf relative to your implementation, and frankly it's just unlikely I'll have the time to do so. Luckily we have a plugin architecture that allows for third-party adapters, so in the future we can always make these adapters pluggable.
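
Roughly, the stringified-key idea would be to store index rows in the same object store under ordered keys (a hand-wavy sketch only; real collation of numbers, arrays and objects needs something like pouchdb-collate's toIndexableString):

// Hand-wavy sketch: build a lexicographically ordered key for an "index" row
// living in the same object store as the documents. Assumes the indexed
// value and the doc id are plain strings; real collation is much hairier.
function indexKey(indexName, value, docId) {
  // '\u0000' separators keep the components ordered and unambiguous.
  return '_index\u0000' + indexName + '\u0000' + value + '\u0000' + docId;
}

// A range query over the prefix '_index\u0000type\u0000' then emulates a
// secondary index on doc.type without any native IndexedDB index at all.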

@daleharvey
Contributor Author

This is a good time to just drop map/reduce from core entirely. It won't work with what you
describe anyway (doc.type magically indexing on doc.data.type). We can't allow
arbitrary user map() functions if we want to use built-in secondary indexes.

So yeah, one of the benefits of map/reduce is that it only uses the core storage API, so it will work against this and hopefully be faster, but it certainly won't be able to benefit from built-in indexes. What you described was definitely along my thinking: this could work very well with Mango selectors / without map/reduce.

Going to defer most decisions about how to release until we see if this experiment works, although I am fairly sure we won't want to add yet another adapter to core. It's possible we could do concurrent releases and a migration plugin could work, but it pretty much all depends on what this looks like when it's finished, so I will work on that (it currently passes anything from the basics and all_docs tests that doesn't use changes).

@daleharvey force-pushed the idb-next branch 5 times, most recently from f287a91 to b7fffee on April 8, 2016 15:07
@daleharvey
Contributor Author

Just a heads up on the progress of this: I am on holiday at the moment so it is going a bit slower, but I made good progress beforehand. The basic API is complete; bulkDocs (put / post / delete), allDocs, get and changes are all pretty much working, and I think the only major things still to do are attachments and compaction. Going to avoid testing performance differences until the implementations are functionally identical, but I am now fairly confident that it's possible to have a working implementation in this structure.

I am considering a different approach for attachments from our current setup. Right now we have 3 tables involved in any work with attachments, and I am considering moving them into the actual document records. This removes the optimisation where a single attachment used in multiple documents is deduplicated, however that is an optimisation for what is almost an antipattern, and it severely complicates our implementation. We could do something like

{
  id: 'mydoc',
  rev: '3-xxx',
  rev_tree: [{...}],
  revs: { 
    '3-xxx': { 
      data: {foo: 'bar'}
    }
  },
  attachments: { 
    'foo.txt': { }
  }
}

@nolanlawson
Member

Yep, I agree with that approach for attachments. It's a premature optimization and severely complicates the code. If somebody wants to avoid excessive attachment storage, then they should use compaction. Also this brings us in line with CouchDB's implementation, which does not dedupe.

OTOH you will need to change many tests in test.attachments.js to accommodate this. Probably the best approach is to just remove those tests (which rely on checking md5sums in order to work); they only apply to local databases anyway, since CouchDB does not support this dedup feature.

@daleharvey
Contributor Author

OK, ran up against my first issue: https://github.com/pouchdb/pouchdb/blob/master/src/adapter.js#L671 only sends enough information for the adapter to read separately stored digests, not to read binaries stored within a document. Will redo the arguments in a way that's compatible with both methods.

@daleharvey
Contributor Author

The attachments issue is fixed in the latest version and attachments are about half done; once attachments and compaction are finished we are good to do some comparisons :)

@daleharvey force-pushed the idb-next branch 6 times, most recently from 777a7d3 to 1322a8f on May 10, 2016 16:04
@daleharvey
Contributor Author

daleharvey commented May 10, 2016

passes: 1557 failures: 0 duration: 322.94s :)

Few notes to remind myself:

Currently the performance tests are showing this as being slower, which is super confusing since it is doing far, far less work. It's possible I lost some optimisations, but I am fairly confident either something is up with my testing or there is something big missing in the code; looking at what the code did before vs what it does now, I am still confident this will be significantly faster. Also, for size comparison, it's 269K compared to master's 356K.

@daleharvey force-pushed the idb-next branch 2 times, most recently from e495ad2 to 1f82351 on May 17, 2016 13:37
@daleharvey
Contributor Author

daleharvey commented May 17, 2016

So, not on a lot of people's radar, but this is now passing 100% in Chrome + Firefox, so I am taking a look at what native support for pouchdb-find would look like.

In an ideal world we are compatible with the current pouchdb-find, which looks like

db.createIndex({
  index: {fields: ['name']}
}).then(function () {
  return db.find({
    selector: {name: {$gt: null}},
    sort: ['name']
  });
});

The user defines an index which operates on existing data at runtime, and then has the ability to use that index for queries. The issue here is that index creation in IndexedDB is done at schema creation time and requires a version upgrade to the schema, which we currently use in Pouch to handle data format migrations.

Supporting that will require us to close the idb database on index creation and do a schema upgrade (in the cases where we need one); a rough sketch of what that forces is just below. Figuring it all out is hurting my head, so right now I want to focus on the more basic case of predefined indexes and, if we get that working, figure out whether we can / want to backport it to get runtime definition support (not 100% sure we will).
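
For illustration only (idb, currentVersion, the 'docs' store and the index name are all placeholders, not actual adapter code), runtime index creation forces a close and a versioned reopen along these lines:

// Hypothetical sketch: add an index on an existing 'docs' store at runtime.
// The open connection has to be closed first, then reopened with a higher
// version so that onupgradeneeded fires and the index can be created.
idb.close();
var req = indexedDB.open('data', currentVersion + 1);
req.onupgradeneeded = function (e) {
  // Grab the existing store via the versionchange transaction.
  var docs = e.target.transaction.objectStore('docs');
  docs.createIndex('name', 'data.name');
};
req.onsuccess = function (e) {
  idb = e.target.result;
};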

So by predefined indexes I mean code that looks like

var db = new PouchDB('data', {indexes: ['name']});
db.find({
    selector: {name: {$gt: null}},
    sort: ['name']
});

We can store some version / index information in the database; however, the issue is that we need to know the version of the database we are opening before we open it in order to trigger schema upgrades. Right now the only proposal I can think of is:

Proposal 1. Multiple databases

Right now (in idb-next) we use 2 core objectStores, DOC_STORE and META_STORE; META_STORE contains some persisted information about the database (mostly its update_seq). We can split META_STORE out into a separate idb database that stores an IDB_VERSION as well as the current indexes; any time we see a difference in the indexes options, we bump the IDB_VERSION and create / delete the relevant indexes (a rough sketch of the open flow follows the cons below).

Cons:

  • Managing multiple databases can be tricky: ensuring they get closed, handling errors, and ensuring they are updated in a consistent manner
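
The sketch mentioned above; everything here (database suffix, store and key names) is illustrative, and the comparison of stored vs requested indexes is omitted:

// Hypothetical flow for Proposal 1: read the stored IDB_VERSION from a
// small side database, then open the main database with that version.
function openMain(name, callback) {
  var metaReq = indexedDB.open(name + '-meta', 1);
  metaReq.onupgradeneeded = function (e) {
    e.target.result.createObjectStore('meta');
  };
  metaReq.onsuccess = function (e) {
    var metaDb = e.target.result;
    var txn = metaDb.transaction('meta', 'readonly');
    txn.objectStore('meta').get('idb_version').onsuccess = function (ev) {
      // If the requested indexes differ from what was stored last time,
      // we would bump this version (and persist it) before opening.
      var version = ev.target.result || 1;
      var mainReq = indexedDB.open(name, version);
      mainReq.onupgradeneeded = function (e2) {
        // create / delete the relevant indexes on DOC_STORE here
      };
      mainReq.onsuccess = function (e2) {
        callback(e2.target.result);
      };
    };
  };
}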

@daleharvey
Contributor Author

daleharvey commented May 18, 2016

Small update: this is passing 100% on Firefox (so Chrome and Firefox are 100%). Safari passes, but intermittently; I haven't found a consistent failure yet, but on Safari 9.1.1 the test.basics suite passes like 50% of the time. On the Tech Preview the tests are far more stable, which makes me fairly confident that we can have this working / using IndexedDB in Safari.

UPDATE: once I let the full suite run in the Safari Tech Preview it was passes: 1540 failures: 13 duration: 1700.55s; all of the failures were timeouts on high-level things (retry etc.), so it will pass 100% in the Tech Preview for certain.

@daleharvey force-pushed the idb-next branch 2 times, most recently from 06a893f to d084368 on May 19, 2016 15:06
@nolanlawson
Member

Awesome!! 🎉 Any chance the performance issues were sorted out too?

As for pouchdb-find, I'm still of the mindset that we can't really solve the problem of dynamic indexes by closing/reopening the database, because I suspect it will fail in multi-tab environments or worker/serviceworker environments (race conditions). I also imagine this will have wonky effects in Edge due to race conditions in opening/closing DBs.

I say we should punt on pouchdb-find/mapreduce for now and just continue with dependentDbs etc., for the time being anyway. Also once the monorepo is merged, we can ship this adapter as an optional adapter and hopefully start to get feedback from early adopters.

Also I'm not against the idea of multi-DB - it's a nice way to take advantage of parallelism, and I think inside of __pouchdb_ we can do whatever we want.

@daleharvey
Contributor Author

Right now this is marginally faster, which is still somewhat of a surprise: every document read is a single idb read, and a document write is 1 idb read + 1 idb write (apart from when adapter.js gets in the way). I expected this to be significantly faster; there are still a few lost optimisations that I will take a look at, but a little faster is nice, and it's still far less code, less buggy and significantly cleaner.

Map/reduce just works, since none of that is adapter specific. Once the monorepo is in I will get the changes outside the adapter merged, then we can see what merging this looks like. In the meantime I will work on pouchdb-find support with secondary indexes; in an ideal world we wouldn't need multiple databases, but I can't think of a way to avoid them at the moment.

@daleharvey
Contributor Author

Closed; there is a new rebased PR, and I have opened issues for discussion.
