The Problem We Are Trying to Solve

At MIT's Artificial Intelligence (AI) Lab and Laboratory for Computer Science (LCS), we have rooms full of 7-track and 9-track magnetic tapes in various states of decay. These tapes are the incremental, full, and archival dumps of all the machines used for everyday work by students, faculty, and staff in the Labs from 1971 to 1990. The data on any individual tape may or may not be valuable; we are unable to tell. Much of this data was written by operating systems no longer in common use, such as ITS, TOPS-20, and Genera. An air conditioner malfunction has allowed water to leak onto some of the tapes. Some of the protective outer tape rings have grown old and brittle. Many have already failed, causing tapes to fall off the tape racks and their cases to shatter. Many tapes are being stored off-site in an environmentally controlled warehouse, but some are completely uncataloged. Others may be sitting lost somewhere in the lab's basement. Furthermore, we are losing the knowledge required to make sense of the data on the tapes, which could be in any one of a dozen formats. Even perfectly preserved tapes may thus soon be reduced to gibberish. We need to pay attention to more than just the tapes themselves; we need to rescue information from the operators who wrote these tapes and from the paper dump logs. In summary, we are losing irreplaceable data every day.

Table: Inventory of Known Tapes with Valuable Archival Data

Operating System   Machine    Density (bpi)  Format   Approximate Year Written  Number of Tapes   Size
-----------------  ---------  -------------  -------  ------------------------  ----------------  ------
ITS                DM KA-10   800            9 track  1976-83                   330               --
ITS                AI KA-10   800            7 track  1971-82                   362               --
ITS                ML KA-10   800            7 track  1973-83                   203               --
ITS                MC KL-10   800            7 track  1976-85                   511               --
ITS                MC KL-10   1600           9 track  1981-86                   60                --
ITS                All KS     6250           9 track  1985-90                   159               --
ITS Total                                                                       1,625             77 GB
TOPS-20            AI OZ      1600           9 track  1982-86                   397               --
TOPS-20            AI OZ      6250           9 track  1986-88                   118               --
TOPS-20            LCS XX     mixed          9 track  1976-88                   423 (in storage)  --
TOPS-20 Total                                                                   938               132 GB
Mixed/Unspecified             1600           9 track  --                        757               30 GB
Mixed/Unspecified             6250           9 track  --                        436               63 GB
Grand Total                                                                     3,756             302 GB

The long-term goal of this project is to provide a usable system whereby users can easily search and access all the Labs' archival data in a uniform fashion, and to preserve this data in such a manner that it can be decoded even if all the supporting documentation for it is lost. The more immediate goal is to assemble the incremental, full, and archival backups from the Incompatible Timesharing System (ITS), record them on new media in a long-term (archival) format suitable for use well into the future, and provide tools to help search and manage this large data set effectively. The entire collection of readable ITS backup tapes contains approximately 77 gigabytes of data, and constitutes our smallest collection of tapes. The inventory table above provides a brief breakdown of the tapes that have been accounted for and are believed to contain useful data. There are other tapes around the lab that are unlabeled, and there are many tapes known to be uninteresting "scratch" tapes.

Starting in 1968, ITS was the workhorse of Project MAC, which later divided into the Laboratory for Computer Science (LCS) and the Artificial Intelligence (AI) Lab. ITS ran on two hardware platforms, the DEC PDP-6 and later the PDP-10, and was implemented entirely in MIDAS, an assembly language with macro facilities. One of the primary applications for ITS was MACSYMA, a symbolic and algebraic manipulation system developed under the direction of Prof. Joel Moses starting in 1969. The ITS machines were used for everything from writing computer programs to reading electronic mail to formatting dissertations and technical reports. Dr. Bawden has said that three programs commonly seen running on these systems were EMACS, MACSYMA, and the MACLISP dialect of LISP.

There has been a demonstrated need for some of the data on these tapes. We are not just trying to rescue data that will never be used. Here are some examples of requests that we have encountered:

