Comments on Jeffrey’s Import Export Landscape
July 6th, 2004 at 3:07 am by Andi Vajda under chandlerdb
Recently, Jeffrey Harris wrote a paper on the Import Export Landscape and asked me to comment on it.
Jeffrey: Are 2, 3 and 4 equivalent? If not, what are the differences?
My understanding is that 2 and 3 are equivalent but that 4 is different. 2 and 3 are done in a repository-specific way, whereas 4 should be done in a yet-to-be-established standard way. I think that this is both one of the challenges and one of the opportunities for 4 to be successful. If it can be made to work, then non-Chandler applications should have no problem accessing Chandler data exposed in such a way, at least in read-only mode.
Jeffrey: What data beyond UUIDs and attributes of items are necessary to perfectly reconstruct the repository? Is there an intermediate level of accuracy that would omit version information or other details but would still be usable?
The replication feature planned for 0.4 or beyond could be viewed as a special case of exporting an entire repository. Technically it is not exporting, since the formats are the same, but that is not exactly true either: there can be several repository implementations with different formats, all storing the same information of course, and replication should be able to operate without regard to those implementation decisions.
What data needs to be preserved: referential integrity and consistency. If the exported repository is not shared anywhere, then preserving UUIDs is not necessary, although it is probably simpler to implement anyway. Preserving the data types of values and the order of values in collections also comes to mind. Omitting all versions but the latest would be very useful indeed. It would purge the repository of its history, reclaiming space and some speed. This is a feature on my todo list for sure.
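To make the list above concrete, here is a minimal sketch of an export that keeps only the latest version of each item while preserving UUIDs, value types, collection order and referential integrity. It does not use the Chandler repository API: the in-memory `history` layout, the `refs` attribute name and the JSON record shape are all assumptions made for illustration.

```python
import json


def latest_only(history):
    """Return {uuid: attributes}, keeping only the newest version of each item.

    `history` is a hypothetical layout: uuid -> list of (version, attributes).
    """
    return {
        item_id: max(versions, key=lambda pair: pair[0])[1]
        for item_id, versions in history.items()
    }


def dangling_references(snapshot, ref_attrs=("refs",)):
    """List (uuid, target) pairs whose target UUID is missing from the snapshot.

    `ref_attrs` names the attributes assumed to hold lists of item UUIDs.
    """
    missing = []
    for item_id, attrs in snapshot.items():
        for name in ref_attrs:
            for target in attrs.get(name, []):
                if target not in snapshot:
                    missing.append((item_id, target))
    return missing


def export_snapshot(history):
    """Serialize the latest version of every item, refusing a broken export."""
    snapshot = latest_only(history)
    if dangling_references(snapshot):
        raise ValueError("export would break referential integrity")
    records = [
        {
            "uuid": item_id,
            # Record each value's type name next to the value; JSON arrays
            # keep the order of values in collections.
            "attributes": [
                {"name": name, "type": type(value).__name__, "value": value}
                for name, value in attrs.items()
            ],
        }
        for item_id, attrs in snapshot.items()
    ]
    return json.dumps(records)
```

Dropping every version but the newest is what purges the history here. A real repository would also have to carry over anything else that consistency depends on, which this sketch does not attempt.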
Jeffrey: Now that namespaces are being used, perhaps items in certain namespaces could be exported differently than others? If we take this route, we also need to work on the schema evolution problem, i.e. what to do if I export a Contact v1.1 and your schema only defines Contact v1.0.
I think that in the case of 4 (above) the schema evolution problem is addressed at a different level (uppercase Versions as opposed to lowercase versions, uppercase Items vs lowercase items) and must be supported by the format chosen. For replication, which works with lowercase items and entire repositories, the schema evolution problem exists too and needs to be solved at that time as well. But I think these have different constraints; I don't know exactly what the differences are yet. I suspect the uppercase Version schema evolution problem to be easier because it can be more tightly controlled and can be worked on with domain knowledge, whereas the lowercase version schema merging cannot operate with domain knowledge.
Jeffrey: I believe that data shared from a remote repository is implemented locally as a separate repository. Will this be true for imported data? Is this relevant to how import is implemented?
Currently, remote items are cached locally in a separate repository. But that doesn't mean they've been imported yet. Importing foreign items into a repository should mean that they come to live in that same repository. I believe that it is upon such an import operation that the schema evolution operation needs to occur. In particular, it needs to be able to occur in a reversible way so that remote items can be written back to the original repository. Hence the need to schema evolve back and forth: once when importing items from the local cache of the remote repository, and again when writing them back there.
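One way to picture a reversible schema evolution, using the Contact v1.1 versus v1.0 example from the question above, is sketched below. This is only an illustration under assumptions, not how Chandler does or will do it: the field names, the `_stashed` side-car attribute and both function names are hypothetical. The idea is that attributes the local schema does not define are set aside on import rather than dropped, so that writing the item back can restore them.

```python
# Hypothetical local schema: the attributes that Contact v1.0 defines.
CONTACT_V1_0_FIELDS = frozenset({"name", "email"})


def import_item(remote_item, local_fields=CONTACT_V1_0_FIELDS):
    """Evolve a remote item down to the local schema without losing data.

    Attributes unknown to the local schema are moved into a '_stashed'
    side-car dict instead of being discarded.
    """
    known = {k: v for k, v in remote_item.items() if k in local_fields}
    stashed = {k: v for k, v in remote_item.items() if k not in local_fields}
    return {**known, "_stashed": stashed}


def write_back(local_item):
    """Evolve a local item back to the remote schema by restoring the stash."""
    item = {k: v for k, v in local_item.items() if k != "_stashed"}
    item.update(local_item.get("_stashed", {}))
    return item


# A Contact v1.1 item with one attribute that v1.0 does not know about.
remote = {"name": "Ada", "email": "ada@example.com", "nickname": "Countess"}
local = import_item(remote)
local["email"] = "ada@example.org"  # a local edit made under the v1.0 schema
restored = write_back(local)
```

In this sketch the round trip keeps `nickname` intact and carries the local edit to `email` back to the remote side. It only covers added attributes; renamed or retyped attributes would need an explicit mapping in each direction.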








