One of the primary goals of this project was to design a format for storing archival data that assumes as little knowledge as possible on the part of a future archaeologist reading the data. In particular, when ASCII text files stored in 36-bit words are encoded, the text should remain readable as ASCII text in 8-bit bytes, so that reading it requires no further assumptions about what the reader knows.
Fortunately, electronic mail, program source code, and much of the system documentation on ITS were stored as ASCII text files, as they are on many Unix systems today. A great deal of the useful information in an ITS file system is therefore just plain ASCII text.
To solve the problem of storing 36-bit words in 8-bit file systems, Dr. Bawden came up with the word file format. He designed it some time ago for the original problem of migrating ``important'' files between ITS and TOPS-20 machines, during the time when they coexisted in this building. The primary goal of the word file format is to preserve the readability of ASCII text. The translation is also reversible: the 36-bit words can be reassembled by reversing the process. Therefore, we decided to continue using these word files for storing ITS data in our TCFS files.
This format preserves the readability of ASCII and enables us to reassemble the 36-bit words from the 8-bit bytes. However, the readability constraint prevents us from using the most trivial encoding of 36-bit words, which would split each word across five 8-bit bytes and leave 4 bits unused. The word file format is similar to the trivial encoding, except that it converts 36-bit words that look like five 7-bit ASCII characters to the appropriate 8-bit ASCII representation. As a result, it is more difficult to reverse-engineer a translator that reassembles the 36-bit words than in the trivial case. The added complication is small, however, and in exchange the reader's task becomes much simpler for files containing solely ASCII text.
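
To illustrate why the trivial encoding fails the readability constraint, the following is a minimal sketch in Python. It is not the word file format itself: the rules that format uses for the leftover bit and for words that are not text are not described here, so they are not modeled. The function names are hypothetical, and the sketch assumes the usual PDP-10 convention of five 7-bit characters packed left-justified in the word, with the low-order bit left over.

```python
WORD_MASK = (1 << 36) - 1  # largest value that fits in a 36-bit word


def trivial_encode(word: int) -> bytes:
    """Split one 36-bit word across five 8-bit bytes (big-endian).

    The top 4 bits of the first byte are always zero.
    """
    if not 0 <= word <= WORD_MASK:
        raise ValueError("value does not fit in 36 bits")
    return word.to_bytes(5, "big")


def trivial_decode(chunk: bytes) -> int:
    """Reassemble the 36-bit word from five bytes."""
    if len(chunk) != 5:
        raise ValueError("expected exactly 5 bytes")
    word = int.from_bytes(chunk, "big")
    if word > WORD_MASK:
        raise ValueError("top 4 bits must be zero")
    return word


def pack_ascii(text: str) -> int:
    """Pack five 7-bit characters into one word (assumed convention:
    left-justified, low-order bit zero)."""
    codes = text.encode("ascii")
    if len(codes) != 5:
        raise ValueError("expected exactly 5 characters")
    word = 0
    for code in codes:
        word = (word << 7) | code
    return word << 1  # leave the low-order bit unused


def unpack_ascii(word: int) -> str:
    """Recover the five 7-bit characters, ignoring the low-order bit."""
    codes = [(word >> shift) & 0x7F for shift in (29, 22, 15, 8, 1)]
    return bytes(codes).decode("ascii")


word = pack_ascii("Hello")
print(trivial_encode(word))  # five bytes that do not spell "Hello"
print(unpack_ascii(word))    # the text, one character per 8-bit byte
```

Because the 7-bit character boundaries do not line up with the 8-bit byte boundaries, the trivially encoded bytes are unreadable as text, even though both encodings can be reversed to recover the original word.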