Drafting with PouchDB + CouchDB

The problem

Entreprise asset management software users often run into issues when collaborating within a large set of data changes, especially when they are overlapping each other's work. Perfect synchronization is very tough and expensive, if not impossible. End up people losing their latest work edits, being puzzled by unexplained merged results or corrupting the data state.

The solution

We let user draft up all their changes (create, edit and delete) locally, and only push the changes to the remote database when the user has finalized with their draft.

These drafted changes will mix well with remote data, e.g. when you get a list of documents, it included those from remote, mixed with those in draft. Doing so ensure a consistent view when user navigate the application, seeing remote changes and local drafted changes together (with local changes took precedence if it is an update or delete operations).

All modifications will initially be stored locally as a draft, until the user decides it is time to push these changes into the remote database. To do this, the user must pull in all relevant documents from the remote database (DB) first and resolve any conflicts. Conflict happens if a new version of the document was pushed to the remote DB while user is modifying an older version of such document locally. After all conflicts are corrected, it is time to push.

There are currently two methods to do a push. The first method is easy, once pulled, after conflict resolution, user just push all the changes into the remote DB. For the second method, the user will push their draft online and share it with other co-workers or supervisors. Other users will be able to download this draft, review the draft locally in their machine, and ultimately approve and push this to remote DB like this first method.

Since it is a draft, it can be amended, discarded or shelved. Users always can restart their work, revert part of the changes, and compare their changes against the original. All these without disrupting the remote DB.


Drafter: How to draft with local PouchDB and remote CouchDB

alt-text Drafter and it’s operations

Terminology

DB - Database (PouchDB used in browser and CouchDB used at the backend)

Documents - Smallest unit of data stored in the database that can be read / written. eg. an address document may look like this:

{
    number: 35,
    street: Stirling Hwy,
    city: Crawley,
    state: WA,
    postcode: 6009
}

To read / change any one field (eg. number), the entire document has to be read / written.

Push / Pull - This is from the perspective of the initiator. For the drafter, push means to upload (push) data onto the remote DB and pull means to download (pull) data from the remote DB.

draft DB - keep track of current changes.

local DB - a copy of the remote DB for offline use and to act as a high speed cache for the remote DB.

remote DB - the master copy.

1 - Read/Find/Query

Instead of making a direct GET/FIND request to remote DB and use the result directly in the main application, we are going to get them from the localDB. Here's how it happens:

  1. Application initiates a read/find via Drafter.
  2. For Read:
    1. Drafter searches in the local DB for the document:
    2. Got it? Return it. End here.
    3. Doesn't have it? Replicate it from the remote DB.
  3. For Find:
    1. Drafter do the same find on remote DB, return only the document IDs.
    2. Drafter figure out how many of these documents are available in local DB.
    3. For documents that are missing in local DB, replicate them from remote DB.
  4. For Map/Reduced Query:
    1. Drafter perform the same query on remote DB, keep the result. Then…
    2. Drafter perform the same query on local DB, merge the remote result with the one from local. Regardless of duplicate.
    3. Return the merge result. It is up to the user of this API to deal with mixed results.
  5. Once replication is done, Drafter will perform the same read/find on local DB and return the result.

What about design document? Good question. Design document might be needed for Mango find, especially if you do sort in the find. And it is compulsory for map/reduce query. For this, we replicate the design document from remote DB into the local DB.

2 - Modification

All data modification, e.g. CREATE/UPDATE/DELETE, will be stored in the draft DB (another local PouchDB). Bear in mind that, for CouchDB, there's no operation. All edits are a new document. Newly created document is without "rev". Edited document is with a new "rev". Deleted document is with a new "rev" and a "deleted = true".

  1. Main application initiate create/update/delete via Drafter.
  2. For Create: Drafter put the new doc into draft DB.
  3. For Update/Delete:
    1. Drafter first find out whether there are already changes for the same document.
    2. If there are missing/new-to-draft documents: Drafter will replicate these documents from local DB to draft DB.
    3. Lastly, Drafter put the modified document into draft DB.
  4. When modified document has put into draft DB. Drafter will create/update a ChangeDoc. ChangeDoc is a separate document that records all the changes + revision for a particular document in draft DB. It is useful when dealing with listing changes, showing diff, revert changes and syncing with remote eventually.

3 - Fusing Draft and Remote DB

The immediate next step after storing the changes and the ChangeDocs, Drafter will do a full replication from draft DB into local DB. Now, local DB will have all the draft changes + anything replicated from the remote DB.

Remember that we insisted on doing the read/find operation on local DB all the time? The purpose is for all these read operation take into consideration of draft changes. With the powerful replication engine of CouchDB/PouchDB, the ability to merge the remote + local changes are rather painless. I mean, at this step, we are done!

Consistency First.

4 - Revert

Any changes in draft DB can be reverted, back to their original form, similar to the remote's version of it (or nothing, if the document is created). This is a cool operation, but a heavy one. We may need to optimize this in the future. Here's how Drafter does it now:

  1. Look for the document in draft DB. Revert was initiated with a ChangeDoc, not the document itself.
  2. Mark the document as "_deleted:true" and "_reverted:true".
  3. Delete the ChangeDoc.
  4. Save these changes into draft DB.
  5. Flush the draft DB:
    1. Filtered replicate all documents to a new staging PouchDB, except those marked with "_reverted:true".
    2. Delete the draft DB.
    3. Recreate draft DB.
    4. Replicate all documents from staging back to the newly created draft DB.
  6. Flush the local DB:
    1. Delete the local DB.
    2. Filtered replicate all documents from draft DB, except ChangeDoc.
    3. Replicate all Design Document from remote DB.
  7. The major reason we do flushing both DB after each revert is to totally get rid of the document. A "deleted" document still linger in the Database and will mess up future modification. There is no good way to purge in PouchDB.

Very hairy. Before this can be optimize, we should encourage user to bulk revert, instead of doing one by one revert.


Concepts

Data consistency

All data Drafter returning to the user must always represent the latest state of things:

  1. Changes in draft DB is priority number 1. Once a document is changed and registered in draft DB, we no longer care how is it in remote.
  2. Latest data from the remote DB is priority number 2. If there are latest changes in remote DB, we must always get them (provided they are not changed locally, a.k.a in draft DB.)
  3. Temporarily replicated data in local DB is least important. These data will be wiped as soon as there are remote changes or draft's changes.

To keep consistency, we fluat local DB in several condition. Flushing local DB means wipe it clean, essentially destroying it, and then do a free replication from draft DB. The second step is to ensure draft changes will always be present in local DB, even after each flush.

There are a few situations we will do a flush on local DB:

  • First boot, or browser refreshed.
  • Change occurred in remote DB. Whatever there are changes remotely, as long as we have temporary data in local DB, we wipe flush the local DB.

    The reason we wipe on change is to clean up potentially outdated temporarily data inside the local DB. And also to wipe the "find" cache and force Drafter to redo all "find" remotely first, because we do not know what has changed out there.

    Actually we do know what change, but at this stage, I can't risk fanciness for bug. So I opted for full wipe every time.

  • Revert draft changes. For every revert, we need to flush the localDB too.

    When we replicate changes from draft DB, these documents will live forever into localDB. If user performed a revert, meaning, undoing all changes in draft. The whole document in local DB needs to be purged. But we have no purge function in PouchDB, thus, we have to wipe it.

    Pushed draft changes to remote. When everything is live in the remote DB. Reset the local DB for new remote changes and wipe off all draft DB changes.

Performance Second

Since we are storing documents in the local DB, we better make use of it's close proximity and reduce remote calls when possible.

We are not doing live replication because we do not know beforehand what type of document a user is going to need. We only replicate documents on-demand, meaning, based on user requests, we replicate the bare-minimum necessary documents only.

We do listen to remote DB's changes. If there are no remote changes, great, we can safely hit the local DB, and skip calling remote, provided the request is the same and all documents are available in the local DB.


What’s Next

All features discussed in detail prior to this part are completed and used in our own development. The biggest missing part is the ability to pull changes from remote, merge them, and push them back to remote DB. Which our team will develop it very soon.


Special thanks to Kean, my teammate, who develops this technology with me. He also reviewed this post (fixing all my bad English) and provided the terminology!

Kuan

Kota Kinabalu, Malaysia
Email me

Usually types codes. A decade in games development, now trying to learn web.