Wednesday, 23 January 2008

Why would I think about posterity? What has posterity ever done for me?

The first computer I did not buy was an Amstrad CPC back in the eighties. It was the very cheapest one on the market at the time and included monitor, CPU and printer in one marvellous package. The reason I did not buy it was luck. I stood in the shop ready to pay for it when it suddenly occurred to me to ask the shopkeeper whether it would be easy to transfer files from this computer to other ones. Sheepishly, he had to admit that it was impossible. The Amstrad CPC used Hitachi's 3" floppy disk drive, which hardly anyone else was using. Whatever one typed on an Amstrad CPC had to be retyped by hand on other computers if one wanted to preserve the data.

My most trusty computer was a PowerBook 170. However, it became increasingly difficult to get any files off it, as it had no USB, Ethernet or WiFi, and most modern computers no longer have floppy disk drives. It served me for about 10 years before I gave it up because of these compatibility problems.

The question of future compatibility is still surprisingly ignored in the world of high tech.

Hardware is rarely a problem any more, as most computers handle WiFi and USB memory sticks. When you buy a new PC, PDA or telephone, there is usually no big problem transferring as many files as you like.

But the problem of file formats remains.

Apple bluntly tries to sell its iWork suite in spite of the fact that only iWork applications can read iWork files, much like Amstrad did in the eighties. Admittedly, one can export files from iWork to more widely readable formats, such as PDF or Word documents, but it is completely and utterly impossible to set a widely used file format as the default.

This is nothing new, of course. Many fine applications have used obscure file formats that locked their users in: Mellel, egword and iWork's predecessor AppleWorks, just to mention a few.

AppleWorks is not even fully backward compatible with itself. The last version was not able to write in the formats of the earlier versions. And as Apple no longer sells the program, there is no legal way for your sister to access that pile of old AppleWorks documents on your hard disk, unless you have a spare license for the program to give her.

Microsoft Office probably has the most widely used file formats in the world, but with Service Pack 3 for Microsoft Office 2003, Microsoft suddenly decided to block some older file formats by default. By editing the registry as Microsoft documents, one can still re-enable access to the old "insecure" file formats, but if you are not careful, you may render the operating system unusable in the process.

So what is the best file format to choose, if you want to guarantee posterity a chance to read your text?

Word 2003 documents are probably still one of the best bets. There are so many Word documents out there, and so many free and open source programs that support the format, that it is unlikely to become unreadable any time soon.

RTF is probably a reasonably good bet if you avoid pictures and only type in Western languages. RTF files containing Chinese or Japanese text created on Mac OS X may fail to open correctly in some applications because of encoding problems. (Shun RTFD, which Apple seems to claim is a "variant" of RTF. No application on any operating system but Mac OS can read it.)

The ISO-approved ODF format is all very well, but it has not yet gained enough momentum to tell whether it will last.

The only file format specifically promoted for long-term archiving is PDF/A. It is approved by ISO for this purpose. However, not many applications are able to create PDF/A files out of the box, and it is difficult to guarantee that there will be applications that can read it in 50 years' time.

My take is that the best format for long-term archiving is plain, unformatted text. However, even with text, things are not that simple.
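One likely complication is character encoding, the same issue behind the trouble with Chinese and Japanese RTF files. Below is a small illustration, assuming Python 3 and its standard codecs; UTF-8, Shift JIS and Windows-1252 are just example encodings chosen for the sketch. The same three Japanese characters become different bytes depending on the encoding, and a program that guesses wrong either garbles the text or refuses to read it.

```python
# Sketch: one short Japanese text saved as "plain text" in two encodings.
text = "日本語"  # the word for "Japanese language"

utf8_bytes = text.encode("utf-8")      # 3 bytes per character for these kanji
sjis_bytes = text.encode("shift_jis")  # 2 bytes per character for these kanji

print(len(utf8_bytes), len(sjis_bytes))  # 9 6

# Reading UTF-8 bytes as Windows-1252 "works" but produces garbage (mojibake).
garbled = utf8_bytes.decode("cp1252")
print(garbled == text)  # False

# Reading Shift JIS bytes as UTF-8 fails outright.
try:
    sjis_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)

# Only the correct encoding recovers the original text.
print(utf8_bytes.decode("utf-8") == text, sjis_bytes.decode("shift_jis") == text)  # True True
```

Apart from an optional byte order mark, nothing inside a plain text file records which encoding was used, so whoever reads it later has to know or guess.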
