Repository storage changes

November 15th, 2004 at 10:15 pm by Andi Vajda under chandlerdb

A year and a half ago, when the Chandler repository project was (re-)started, I chose to use Sleepycat’s Berkeley DB and DBXML for the persistence layer.

Why Berkeley DB? Because it does only a few things, does them really well, and leaves the rest to the developer.
Why Berkeley DBXML? Because I thought its XPath query facility was essential to supporting Chandler queries and because I thought that XML would be convenient as a format for serializing items.

It turns out that XPath queries are in very limited use in Chandler (for the Kind query only) and are unlikely to see their use increased. There are two reasons for this:

  • PyLucene, which should serve our full-text query needs.
  • The native XML format implemented by the repository is very simple in structure. Hence, it is pretty easy to maintain indexes directly in Berkeley DB on what would otherwise be serialized with DBXML.

It also turned out that, for various performance reasons, I had to move certain things out of XML altogether; the namestore comes to mind. As I was moving more things out of XML, it finally occurred to me that, for Chandler’s needs, Berkeley DBXML was overkill, and I decided to remove it.
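
To make the point about maintaining indexes by hand more concrete, here is a minimal sketch of what a Kind lookup could look like against a plain key/value store. This is not the repository's actual code: the class and method names are hypothetical, and ordinary Python dictionaries stand in for Berkeley DB tables.

```python
class ItemStore:
    """Hypothetical sketch: dicts stand in for Berkeley DB tables."""

    def __init__(self):
        self.items = {}    # uuid -> (kind, serialized XML)
        self.by_kind = {}  # kind -> set of uuids (the hand-maintained index)

    def put(self, uuid, kind, xml):
        # If the item already exists under another kind, drop the stale entry.
        previous = self.items.get(uuid)
        if previous is not None and previous[0] != kind:
            self.by_kind[previous[0]].discard(uuid)
        self.items[uuid] = (kind, xml)
        self.by_kind.setdefault(kind, set()).add(uuid)

    def delete(self, uuid):
        # Remove the item and its index entry together.
        kind, _ = self.items.pop(uuid)
        self.by_kind[kind].discard(uuid)

    def of_kind(self, kind):
        # The index answers the Kind query without parsing any XML.
        return sorted(self.by_kind.get(kind, ()))
```

The idea is simply that each write updates the index in the same step as the item, so the query never needs to look inside the XML.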

By removing DBXML, I was hoping for speed improvements but didn’t encounter any. This showed me that, contrary to the general perception at OSAF, Berkeley DBXML per se was not too slow in its current use. Still, by taking control of XML storage and going directly to Berkeley DB, I opened up more opportunities for future performance improvements. For example, values are now stored as individual XML snippets, allowing me to save only those that changed from one item version to the next, and some values may be better stored in a format other than XML altogether.
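
As a rough illustration of the per-value idea, the sketch below writes a snippet only when it differs from the last one saved for that attribute. It is an assumption-laden toy, not the repository's implementation: the names, the key layout, and the use of integer versions are all hypothetical, and dictionaries again stand in for Berkeley DB.

```python
class ValueStore:
    """Hypothetical sketch: one stored snippet per (item, attribute, version)."""

    def __init__(self):
        self.values = {}  # (item_id, attribute, version) -> XML snippet
        self.latest = {}  # (item_id, attribute) -> version last written

    def save_item(self, item_id, version, attributes):
        """Write only the snippets that changed; return the names written."""
        written = []
        for name, snippet in attributes.items():
            last = self.latest.get((item_id, name))
            if last is not None and self.values[(item_id, name, last)] == snippet:
                continue  # unchanged since the last write, nothing to store
            self.values[(item_id, name, version)] = snippet
            self.latest[(item_id, name)] = version
            written.append(name)
        return written

    def load_item(self, item_id, version):
        """Rebuild the item as of `version` from the newest snippets at or below it."""
        newest = {}
        for (iid, name, ver), snippet in self.values.items():
            if iid == item_id and ver <= version:
                if name not in newest or ver > newest[name][0]:
                    newest[name] = (ver, snippet)
        return {name: snippet for name, (ver, snippet) in newest.items()}
```

Saving a second version of an item in which only one attribute changed adds a single record, and an older version can still be rebuilt from the snippets written at or before it.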

Removing DBXML from the build also allowed me to remove Xerces-C++ and Pathan, both required by DBXML. My build tree shrank by 300 MB on OS X (almost 400 MB on Windows). The download size of Chandler should shrink considerably as well.
