TCFS: The Digital Rosetta Stone

The disadvantages of keeping old file data in its raw form, discussed in the previous chapter, led us to seek a more amenable format for this mass of data. The ideal archival format would be portable across media and platforms and would be easy to reverse-engineer. We first searched for an existing solution we could adapt to this problem.

Industry has recognized the need for a platform- and medium-independent data format for some time, as demonstrated by the System Independent Data Format (SIDF), developed by the SIDF Association, and by the format used in Legato's NetWorker. To be fair to these vendors, their goal was not to define a long-term archival format. (One would be surprised at how easily supposedly short-term storage becomes long-term.) Recently, however, attention has turned to the problems of media obsolescence and technology preservation, so we must evaluate any candidate format in the context of archival storage.

A file stored in either of the above-mentioned formats is utterly indecipherable to a user not versed in the format specification. We cannot assume that future digital historians will have a copy of these specifications, or that they will be able to understand them if they do. On closer study, it becomes clear that designing a data format for long-term archival purposes is very different from designing a format for most other common applications. We therefore concluded that the existing formats we had examined were inadequate for our purposes and began developing our own.