
What We Would Really Prefer

The table of contents format is optimized to minimize space and to keep the interface to the Archivist program as simple as possible, since we would prefer to keep the critical path for any tape migration as simple as possible. It is intended to provide the user with the bare minimum of information necessary to locate a file on tape without having to run through the entire stack of tapes to get to it. Unlike the TCFS format, the table of contents format is not designed to be extensible or portable. It is also not designed to be modified once it is written, since the delta encoding has no provisions for inserting and deleting entries.
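To illustrate why a delta-encoded record resists modification, here is a minimal sketch. It assumes a generic prefix-sharing scheme in which each entry stores only what differs from the entry before it; this is not the actual table of contents encoding, and the file names and the functions encode and decode are hypothetical.

```python
def encode(names):
    """Delta-encode names as (shared_prefix_length, suffix) pairs.

    ASSUMPTION: a generic front-coding scheme chosen for illustration,
    not the real table of contents format.
    """
    out, prev = [], ""
    for name in names:
        n = 0
        while n < min(len(prev), len(name)) and prev[n] == name[n]:
            n += 1
        out.append((n, name[n:]))
        prev = name
    return out


def decode(entries):
    """Rebuild the names; each entry depends on the one decoded before it."""
    out, prev = [], ""
    for n, suffix in entries:
        prev = prev[:n] + suffix
        out.append(prev)
    return out


# Hypothetical file names.
names = ["emacs.bin", "teco.doc", "teco.init"]
encoded = encode(names)   # [(0, 'emacs.bin'), (0, 'teco.doc'), (5, 'init')]

# Naively deleting the middle entry corrupts the entry that follows it,
# because that entry was encoded relative to the deleted one.
damaged = decode(encoded[:1] + encoded[2:])
print(damaged)            # ['emacs.bin', 'emacsinit']
```

Under this assumed scheme, a correct deletion or insertion means re-encoding the entries that follow the change, which is the kind of work a write-once format avoids.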

Eventually it would be ideal to have all this information in a sophisticated database where complex queries are possible. This would give us the added ability to annotate the table of contents as it is being used. We would also have the ability to track the status of files and tapes, so we would have an effective means of producing new copies of the physical media as the copies in use deteriorate.

One of the primary uses of a database-like interface is to record the format of a file. We have software that can guess the format of a file, but that guess may be incorrect, and a static record like the table of contents format would not easily be able to accommodate that type of correction. Entries can also be annotated for content, perhaps by including an abstract for any on-line papers. If two similar files exist, the database could indicate that they are identical or carry a human-written description of the differences between the two.
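A rough sketch of how such a record might look, using Python's built-in sqlite3 module. The table layout, column names, and sample rows are all assumptions made for illustration; no such schema exists yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        id               INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        guessed_format   TEXT,     -- output of the format-guessing software
        confirmed_format TEXT,     -- human correction, NULL until reviewed
        abstract         TEXT,     -- optional content annotation
        same_as          INTEGER REFERENCES files(id)  -- identical file, if any
    )
""")

# Hypothetical entries, both initially guessed to be binary.
conn.executemany(
    "INSERT INTO files (id, name, guessed_format) VALUES (?, ?, ?)",
    [(1, "paper-draft", "binary"), (2, "paper-copy", "binary")],
)

# A user corrects the guess for file 1 and adds an abstract;
# the original guess is kept alongside the correction.
conn.execute(
    "UPDATE files SET confirmed_format = ?, abstract = ? WHERE id = ?",
    ("text", "Hypothetical abstract of an on-line paper.", 1),
)
# A user records that file 2 is identical to file 1.
conn.execute("UPDATE files SET same_as = 1 WHERE id = 2")

# Prefer the confirmed format and fall back to the guess.
formats = conn.execute(
    "SELECT name, COALESCE(confirmed_format, guessed_format) "
    "FROM files ORDER BY id"
).fetchall()
print(formats)   # [('paper-draft', 'text'), ('paper-copy', 'binary')]
```

Keeping the guess and the correction in separate columns is one possible design; it lets a later reader see both what the software inferred and what a person decided.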

The primary reason, however, for moving toward a database-like interface would be to provide a system in which users can make complex queries and annotate the entries with their own data in a controlled fashion. Since the TCFS data set is going to remain relatively static, it will be practical to invest the time to create such an interface without fear of the accumulated annotations becoming obsolete.

In the previous example, the Lab member searching for all files with the characters "EMACS" in their titles is quite limited by using just the table of contents. If we were to have the database above, he could limit the query, for instance, to files created on Thursdays and last modified in 1971. The database's tables and query engine can be optimized much more for speed, so he would not have to parse the entire table of contents. He could also use file types to differentiate between files that were EMACS startup files written in TECO and EMACS executable binary files. He could use a helpful annotation from a previous user indicating that some of the files are EMACS documentation. Our fictitious Lab member might even be altruistic and indicate that some of the files were hopelessly scrambled before they were ever converted to TCFS.
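The query described above might be sketched as follows, again with Python's sqlite3 module. The schema, file names, dates, and annotation text are hypothetical and exist only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        created   TEXT NOT NULL,   -- ISO date, YYYY-MM-DD
        modified  TEXT NOT NULL,   -- ISO date, YYYY-MM-DD
        file_type TEXT
    );
    CREATE TABLE annotations (
        file_id INTEGER REFERENCES files(id),
        author  TEXT,
        note    TEXT
    );
""")

# Hypothetical rows. 1970-01-01 and 1970-01-08 fall on Thursdays.
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", [
    (1, "EMACS INIT", "1970-01-01", "1971-03-15", "teco-source"),
    (2, "EMACS BIN",  "1970-01-02", "1971-06-01", "binary"),
    (3, "EMACS DOC",  "1970-01-08", "1972-02-02", "text"),
    (4, "LOGIN",      "1970-01-01", "1971-03-15", "text"),
])
conn.execute(
    "INSERT INTO annotations VALUES (?, ?, ?)",
    (1, "previous-user", "EMACS startup file"),
)

# Files with EMACS in the name, created on a Thursday
# (strftime('%w') yields '4'), and last modified in 1971.
rows = conn.execute("""
    SELECT f.name, f.file_type, a.note
    FROM files f
    LEFT JOIN annotations a ON a.file_id = f.id
    WHERE f.name LIKE '%EMACS%'
      AND strftime('%w', f.created) = '4'
      AND strftime('%Y', f.modified) = '1971'
""").fetchall()
print(rows)   # [('EMACS INIT', 'teco-source', 'EMACS startup file')]
```

The file_type column lets the same query separate TECO startup files from binaries, and the join brings in whatever a previous user has noted about each match.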


boogles@martigny.ai.mit.edu