QuickTime as a Tape Archival Format

On the SIMH group, Al Kossow and others have been discussing how .tap is a terrible archival container format that also has a bunch of problems for use in emulation and simulation of systems. This is a problem I’ve been thinking about for a while since I hired Miëtek to implement SCSI tape support in MAME including the .tap format, and I had a sudden realization: There’s already a great format for representing sequential media, QuickTime!

A lot of people think QuickTime is a “video format,” but that’s not really accurate. Video and audio playback are applications atop the QuickTime container format; the container format itself is a means of representing multiple typed tracks of time-based media, each of which may have its own representation in the form of samples interpreted according to its own CODEC.

QuickTime Media Structure at a High Level

As an example, a QuickTime file containing a video with associated stereo audio and subtitles may have three tracks, each with its own timebase:

  1. The video track, whose timebase is the number of frames per second, and whose track media is the CODEC metadata needed to decode its samples.
  2. The audio track for the two audio channels, whose timebase is the number of samples per second. Its track media will be similar to that of the video, specifying the CODEC to use for the audio samples to decode.
  3. The text track for the subtitles, whose timebase is probably derived from the video timebase, whose track media will specify things like the language and font of the subtitles, and whose samples consist of the text to present and the size, location, duration, and styling for that presentation.

All of these are represented within a file as atoms, which are well-identified bags of data with arbitrary size and content, making it very easy to write general-purpose tooling and also to extend the format over time. (The last major extension to the low-level design was in the 1990s, to support 64-bit atom sizes, so it’s quite a stable format already.)
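To make the "bags of data" idea concrete, here is a minimal sketch of walking atoms in Python. It relies only on the basic header layout (a 32-bit big-endian size, a four-character type, and an optional 64-bit extended size when the 32-bit size is 1). The names iter_atoms and make_atom are mine, not part of any QuickTime library, and the test data is synthetic rather than a real movie.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_atoms(buf: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_offset, payload_size) for each atom in buf[start:end]."""
    if end is None:
        end = len(buf)
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit extended size follows the type field.
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of its container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed atom at offset {pos}")
        yield kind.decode("latin-1"), pos + header, size - header
        pos += size


def make_atom(kind: bytes, payload: bytes) -> bytes:
    """Build an atom with a plain 32-bit size (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


# Synthetic data: an 'mdat' atom followed by a 'moov' atom holding two 'trak' atoms.
movie = make_atom(b"moov", make_atom(b"trak", b"") + make_atom(b"trak", b"xyz"))
data = make_atom(b"mdat", b"\x00" * 4) + movie
top = list(iter_atoms(data))

# Container atoms are walked by recursing into their payload range.
_, moov_offset, moov_size = top[1]
children = list(iter_atoms(data, moov_offset, moov_offset + moov_size))

# An atom that uses the 64-bit extended size form.
wide = struct.pack(">I4sQ", 1, b"mdat", 16 + 5) + b"hello"
big = list(iter_atoms(wide))
```

Because the walker never has to understand a payload to skip it, generic tools can copy, list, or rearrange atoms they know nothing about, which is what makes the format easy to extend.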

Mapping QuickTime to Data Tape

Once you realize that the tracks themselves can be arbitrary, it starts to become clear how this format maps nicely to tape content: Since tapes themselves are linear, they’re fundamentally time-based.

The actual content of a tape isn’t a pure stream of raw data; it’s a set of blocks of raw data between magnetic flux marks, with some gaps between, and thanks to media decay, those blocks can be good or bad. Usually these marks are used to organize tapes into files, but that’s not a guarantee; for both archival and emulation, it’s best to stick to the low-level representation and let applications impose the higher-level semantics.

In this case, you’d have a “tape data” track whose track media describes the original medium (7-track, 9-track, etc.) and the interpretation of its samples. The samples themselves would be the marks and data blocks. And there’s even a native representation of tape gaps, in the form of non-contiguous samples.
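As a rough illustration of that mapping, here is a small Python model of a tape track. Everything in it is an assumption made for the sketch: the TapeSample fields, the Kind enum, and the idea of measuring position in arbitrary track time units are mine, not a proposed on-disk layout. It only shows that gaps need no representation of their own; they fall out of samples that aren't contiguous in time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Kind(Enum):
    BLOCK = "block"   # a block of raw data
    MARK = "mark"     # a tape mark


@dataclass(frozen=True)
class TapeSample:
    start: int          # position on the tape, in track time units (assumed)
    duration: int       # length on the tape, in the same units
    kind: Kind
    data: bytes = b""
    good: bool = True   # False if the block was read with errors


def find_gaps(samples: List[TapeSample]) -> List[Tuple[int, int]]:
    """Return (start, length) for every stretch of tape no sample covers."""
    ordered = sorted(samples, key=lambda s: s.start)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        prev_end = prev.start + prev.duration
        if nxt.start > prev_end:
            gaps.append((prev_end, nxt.start - prev_end))
    return gaps


track = [
    TapeSample(0, 80, Kind.BLOCK, b"A" * 80),
    TapeSample(100, 80, Kind.BLOCK, b"B" * 80, good=False),  # decayed block
    TapeSample(180, 1, Kind.MARK),
    TapeSample(200, 40, Kind.BLOCK, b"C" * 40),
]
gaps = find_gaps(track)
bad_blocks = [s.start for s in track if not s.good]
```

Here the gaps come out as (80, 20) and (181, 19): the spaces between the first two blocks and between the mark and the final block.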

The format can also be leveraged to support random access including writes, since the intelligence for that can be in the “CODEC” for the “tape” track media, combined with the QuickTime format’s existing support for non-destructive edits. New data can be overlaid based on its “temporal” position, which should more or less accurately simulate how a rewritten tape would actually work, while still preserving the data that was just overwritten.
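The overlay idea can be sketched in a few lines of Python. This is a deliberately simplified model under my own assumptions: positions are plain byte offsets standing in for "temporal" positions, LayeredTape and Write are hypothetical names, and it ignores the fact that on many real drives a write invalidates everything after it. The point is only that new writes are layered on top rather than destroying what they cover.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Write:
    start: int
    data: bytes

    @property
    def end(self) -> int:
        return self.start + len(self.data)


@dataclass
class LayeredTape:
    layers: List[Write] = field(default_factory=list)  # oldest write first

    def write(self, start: int, data: bytes) -> None:
        # Non-destructive: record the write instead of modifying older data.
        self.layers.append(Write(start, data))

    def read(self, start: int, length: int, as_of: Optional[int] = None) -> bytes:
        """Return what is visible in [start, start + length).

        Unwritten positions read as zero bytes. as_of limits the view to the
        first as_of writes, which recovers the tape as it was before later edits.
        """
        out = bytearray(length)
        visible = self.layers if as_of is None else self.layers[:as_of]
        for w in visible:  # later writes overlay earlier ones
            lo = max(start, w.start)
            hi = min(start + length, w.end)
            if lo < hi:
                out[lo - start:hi - start] = w.data[lo - w.start:hi - w.start]
        return bytes(out)


tape = LayeredTape()
tape.write(0, b"AAAAAAAA")
tape.write(2, b"bbb")
current = tape.read(0, 8)
original = tape.read(0, 8, as_of=1)
```

Reading the current state gives the rewritten data, while reading with as_of=1 still returns the original recording.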

Finally, QuickTime has a concept of “references” that can be used to implement things like tape files independent of (rather than inline with) the tape data itself. A catalog of block references, for example, could also be stored with the tape data’s track media to indicate the block extents for individual files on tape, thus allowing direct access by tooling without having to stream through the entire file.
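A catalog like that could be as simple as a list of block extents. The Python sketch below assumes the common (but, as noted earlier, not guaranteed) convention that tape marks separate files; MARK, build_catalog, and read_file are hypothetical names for illustration and say nothing about how QuickTime references are actually encoded.

```python
from typing import List, Optional, Tuple

MARK = None  # hypothetical sentinel standing in for a tape mark sample


def build_catalog(samples: List[Optional[bytes]]) -> List[Tuple[int, int]]:
    """Return (first_block_index, block_count) for each run of blocks between marks."""
    catalog = []
    first = None
    for index, sample in enumerate(samples):
        if sample is MARK:
            if first is not None:
                catalog.append((first, index - first))
                first = None
        elif first is None:
            first = index
    if first is not None:  # trailing file with no closing mark
        catalog.append((first, len(samples) - first))
    return catalog


def read_file(samples: List[Optional[bytes]],
              catalog: List[Tuple[int, int]], n: int) -> bytes:
    """Fetch file n directly from its extent, without scanning earlier blocks."""
    first, count = catalog[n]
    return b"".join(samples[first:first + count])


samples = [b"HDR1", b"data", MARK, b"more", b"data", b"!", MARK, MARK]
catalog = build_catalog(samples)
second_file = read_file(samples, catalog, 1)
```

Since the catalog sits beside the samples rather than inside them, tools that don't trust it can ignore it and walk the raw marks and blocks instead.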

Implementation

Since QuickTime movie files are a moderately complex structure atop a simple base, it’s important to have a reasonable API to work with both the low-level atom structures as well as the higher-level constructs like tracks, track media, sample chunks and samples. Fortunately there already exists at least one Open Source library allowing this, QTFileLib from the Darwin Streaming Server that Apple made Open Source in 1999.

Darwin Streaming Server as a whole and its QTFileLib component are written in quite straightforward “C with Classes”-style C++, and QTFileLib has an API surface representing all of the major low-level and application-level concepts of the file format. As a side effect of the implementation of its read support, it also has a lot of the API necessary for creating and wiring together QuickTime data structures for creating files, just not support for writing it all out. Structurally that should be straightforward to add. It even looks straightforward to port to plain C, if that’s desired.

“Modernize”

In a vintage computing group, someone posted a picture of a terminal in use at a modern bookstore that’s still using the same infrastructure as they have for decades, and someone replied saying that while from a retrocomputing perspective it was cool, as a business they need to “modernize!” This was my reply…

It’s my understanding that a major US tire and oil change chain used the HP 3000 (Hewlett-Packard’s minicomputer and mainframe platform) for decades, right up until HP cancelled it out from under them, and only switched away from it due to the promised end of support. That is to say, they’d be using it now if HPE still supported it today.

My understanding is that their systems were built using native technologies on MPE, the HP mini/mainframe OS, like the IMAGE database, COBOL for business logic, and MPE’s native forms package. They went through a number of transitions from HP’s 16-bit mainframe architecture to 32-bit and then 64-bit PA-RISC, from using terminal concentrators in stores connected to a district mini over packet data to using a small mini at each store with store-and-forward via a modem to the regional mini (and on up) and finally to live connections over VPN via local ISPs, and from not having any direct customer access except by calling someone at a specific store to having customer access via the corporate web site.

So tell me, why should they have switched away if their hand wasn’t forced by HP? Keep in mind that they maintained and enhanced the same applications for decades to accommodate changes in technology, regulations, and expectations, and by all accounts everything was straightforward to use, fast, and worked well. What would be in it for the company and the people working in the shops to rewrite everything regularly for the platform du jour? I’ll grant that their development staff wasn’t padding their résumés with the latest webshit, but why would that actually matter?

Lisa Source Code: Clascal Evolution

Here’s another interesting thing I’ve learned about Clascal and Object Pascal: It went through exactly the same evolution from combining object allocation & initialization to separating them that Objective-C did a decade later!

In early 1983 Clascal, classes were expected to implement a New method as a function returning that type, taking zero or more parameters, and returning an instance of that type by assigning to SELF—sound familiar? This was always implemented as a “standard” method (one without dynamic dispatch) so you couldn’t call the wrong one. A cited advantage of this is that it would prevent use of the standard Pascal built-in New() within methods—which I suspect turned out not to be what people wanted, since it would prevent interoperability.

A class could also choose to implement an OVERRIDE of the also-included Free method to release any resources it had acquired, like file handles or other object instances. And each overridden Free method had to include SUPERSELF.Free; after it did so in order to ensure that its superclass would also release any resources it had acquired.

INTERFACE

  TYPE

    Object = SUBCLASS OF NIL
      FUNCTION New: Object; STANDARD;
      PROCEDURE Free; DEFAULT;
    END;

    Person = SUBCLASS OF Object
      name: String;
      FUNCTION New(theName: String): Person; STANDARD;
      PROCEDURE Free; OVERRIDE;
    END;

  VAR

    gHeap: Heap;

IMPLEMENTATION

  METHODS OF Object;

    FUNCTION New{: Object};
    BEGIN
      SELF := Object(HAllocate(gHeap, Size(THISCLASS)));
    END;

    PROCEDURE Free;
    BEGIN
      HFree(Handle(SELF));
    END;

  END;

  METHODS OF Person;

    FUNCTION New{(theName: String): Person};
    BEGIN
      SELF := Person(HAllocate(gHeap, Size(THISCLASS)));
      IF SELF <> NIL THEN
        name := theName.Clone;
    END;

    PROCEDURE Free;
    BEGIN
      name.Free;
      SUPERSELF.Free;
    END;
  END;

By mid-1984, Clascal changed this to the CREATE method, which was declared as ABSTRACT in the base class. Note that it still doesn’t use the standard Pascal built-in New() to create object instances. However, it takes a potentially-already-allocated object so that it’s easier for a subclass to call through to its superclass for initialization, since CREATE is still not a dynamically-dispatched method. Also, instead of referencing a global variable for a heap zone in which to perform allocation, it takes the heap zone as a parameter, providing some amount of locality-of-reference that may be helpful to the VM system.

There was also a change in style to prefix class names with T.

INTERFACE

  TYPE

    TObject = SUBCLASS OF NIL
      FUNCTION CREATE(object: TObject; heap: THeap): TObject; ABSTRACT;
      PROCEDURE Free; DEFAULT;
    END;

    TPerson = SUBCLASS OF TObject
      name: TString;
      FUNCTION CREATE(theName: TString; object: TObject; heap: THeap): TPerson; STANDARD;
      PROCEDURE Free; OVERRIDE;
    END;

IMPLEMENTATION

  METHODS OF TObject;

    PROCEDURE Free;
    BEGIN
      FreeObject(SELF);
    END;

  END;

  METHODS OF TPerson;

    FUNCTION CREATE{(theName: TString; object: TObject; heap: THeap): TPerson};
    BEGIN
      IF object = NIL THEN
        object := NewObject(heap, THISCLASS);
      SELF := TPerson(object);
      WITH SELF DO
        name := theName.Clone(heap);
    END;

    PROCEDURE Free;
    BEGIN
      name.Free;
      SUPERSELF.Free;
    END;
  END;

This is starting to look even more familiar to Objective-C developers, isn’t it?

The final form of the language, Object Pascal, actually backed off on the Smalltalk terminology a little bit and renamed “classes” to “objects” and went so far as to introduce an OBJECT keyword used for defining a class. It also changed SUPERSELF. to INHERITED—yes, with whitespace instead of a dot!—as, again, developers new to OOP found “superclass” confusing.

Object Pascal also, at long last, adopted the standard Pascal built-in New() to perform object allocation (along with its counterpart Free() for deallocation) directly instead of introducing a separate function for it, since the intent can be inferred by the compiler from the type system. It also removed the need to use the METHODS OF construct to add methods, instead just prefixing the method with the class name and a period.

The final major change from Clascal to Object Pascal is that, with New() used for object allocation, the CREATE methods were changed into initialization methods instead since they just initialize the object after its allocation. They were also made procedures rather than functions returning values, and since the standard Pascal built-in New() is being used they no longer take a potentially-already-allocated object nor do they take a heap zone in which to perform the allocation. The convention is that for a class TFoo the initialization method has the form IFoo.

There was also another stylistic change, prepending field names with f to make them easy to distinguish from zero-argument function methods at a glance.

There was also a switch from not including the parameter list in the IMPLEMENTATION section to including it directly instead of in a comment.

Here’s what that looks like:

INTERFACE

  TYPE

    TObject = OBJECT
      PROCEDURE IObject; ABSTRACT;
      PROCEDURE Free; DEFAULT;
    END;

    TPerson = OBJECT(TObject)
      fName: TString;
      PROCEDURE IPerson(theName: TString); STANDARD;
      PROCEDURE Free; OVERRIDE;
    END;

IMPLEMENTATION

    PROCEDURE TObject.Free;
    BEGIN
      Free(SELF);
    END;

    PROCEDURE TPerson.IPerson(theName: TString);
    BEGIN
      fName := theName.Clone;
    END;

    PROCEDURE TPerson.Free;
    BEGIN
      fName.Free;
      INHERITED Free;
    END;

Based on the documentation I’ve read, it wouldn’t surprise me if the only reason initialization methods aren’t consistently named Initialize is that the language design didn’t support an OVERRIDE of a method using a different parameter list.

Lisa Source Code: Understanding Clascal

On January 19, Apple and the Computer History Museum released the source code to the Lisa Office System 7/7 version 3.1, including both the complete Office System application suite and the Lisa operating system. (The main components not released were the Workshop environment and its tooling, including the Edit application and the Pascal, COBOL, BASIC, and C compilers and the assembler.) Curious people have started to dig into what’s needed to understand and build it, and I thought I’d share some of what I’ve learned over the past few decades as a Lisa owner and enthusiast.

While Lisa appears to have an underlying procedural API similar to that of the Macintosh Toolbox, the Office System applications were primarily written in the Clascal language—an object-oriented dialect of Pascal designed by Apple with Niklaus Wirth—using the Lisa Application ToolKit so they could share as much code as possible between all of them. This framework is the forerunner of most modern frameworks, including MacApp and the NeXT frameworks, which in turn were huge influences on the Java and .NET frameworks.

One of the interesting things about Clascal is that it doesn’t add much to the Pascal dialect Apple was using at the time: Pascal was originally designed by Wirth to be a teaching language and several constructs useful for systems programming were left out, but soon added back by people who saw Pascal as a nice, compact language with simple semantics that’s easy to compile. While in the 1990s there was a bitter war fought between the Pascal and C communities for microcomputer development, practically speaking the popular Pascal dialects and C are almost entirely isomorphic; there’s almost nothing in C that’s not similarly simple to express in Pascal, and vice versa.

So beyond standard Pascal, Apple Pascal had a concept of “units” for promoting code modularity: Instead of having to cram an entire program in one file, you could break it up into composable units that specify their “interface” separately from their “implementation.” Sound familiar?

When creating a unit under this model, both the interface and the implementation can go in a single file, but in separate sections. So let’s say you want to create a unit that makes some simple types available along with procedures and functions to operate on them. (In code examples, I’m putting keywords in uppercase since Pascal was historically case-insensitive and it helps to make clear the distinction between language constructs and developer code.)

UNIT Geometry;

INTERFACE

  TYPE
    Point  = RECORD
               h, v: INTEGER;
             END;

  VAR
    ZeroPoint: Point;

  PROCEDURE InitGeometry;
  PROCEDURE SetPoint(VAR p: Point; h, v: INTEGER);
  FUNCTION EqualPoints(a, b: Point): BOOLEAN;

IMPLEMENTATION

  PROCEDURE InitGeometry;
  BEGIN
    SetPoint(ZeroPoint, 0, 0);
  END;

  PROCEDURE SetPoint;
  BEGIN
    p.h := h;
    p.v := v;
  END;

  FUNCTION EqualPoints;
  BEGIN
    { AND binds tighter than =, so each comparison needs parentheses. }
    IF (a.h = b.h) AND (a.v = b.v) THEN BEGIN
      EqualPoints := TRUE;
    END ELSE BEGIN
      EqualPoints := FALSE;
    END
  END;

END.

Reading through this code, what’s the first thing you notice? InitGeometry is written without parentheses, as is normal for a zero-argument procedure or function in Pascal. But in the IMPLEMENTATION section, the procedures and functions that do take arguments and return values are written without their parameter lists and return types as well.

This is why, in a lot of the Lisa codebase, they would actually be written like this:

  FUNCTION EqualPoints{(a, b: Point): BOOLEAN};
  BEGIN
    IF (a.h = b.h) AND (a.v = b.v) THEN BEGIN
      EqualPoints := TRUE;
    END ELSE BEGIN
      EqualPoints := FALSE;
    END
  END;

This is because, despite being “wordy,” Pascal also typically tries to minimize repetition and risk of error. So since you’ve already specified the INTERFACE why specify it again, and potentially get it wrong?

What’s interesting about Clascal is that it does the same thing! You define a class and its methods as an interface, and then its implementation doesn’t require repetition. This may sound convenient but in the end it means you don’t see the argument lists and return types at definition sites, so everyone wound up just copying & pasting them into comments next to the definition!

Another interesting thing about Clascal is that it sticks closer to Smalltalk terminology than most modern systems other than Objective-C (and, marginally, Swift): Instead of this it has SELF, and instead of “member functions” it has “methods,” as PARC intended. This makes perfect sense, as a bunch of the people who created and used Clascal came from PARC.

So to define a class, you simply use SUBCLASS OF SuperclassName in a TYPE definition section, provide your instance variables as if they were part of a RECORD, and declare its methods using almost-normal PROCEDURE and FUNCTION declarations (not definitions!) that require an OVERRIDE keyword to indicate a subclass override of a superclass method.

So the above code would look like this adapted to Clascal style:

UNIT Geometry;

INTERFACE

  TYPE
    TPoint = SUBCLASS OF TObject
               h, v: INTEGER;
               FUNCTION CREATE(object: TObject; heap: THeap): TPoint;
               PROCEDURE Set(h, v: INTEGER);
               FUNCTION Equals(point: TPoint): BOOLEAN;
             END;

IMPLEMENTATION

  METHODS OF TPoint;

    FUNCTION TPoint.CREATE{(object: TObject; heap: THeap): TPoint};
    BEGIN
      { Create a new object in the heap of this class, if not
        initializing an instance of a subclass. }
      IF object = NIL THEN
        object := NewObject(heap, THISCLASS);
      SELF := TPoint(TObject.CREATE(object, heap));
    END;

    PROCEDURE TPoint.Set{(h, v: INTEGER)};
    BEGIN
      SELF.h := h;
      SELF.v := v;
    END;

    FUNCTION TPoint.Equals{(point: TPoint): BOOLEAN};
    BEGIN
      Equals := (SELF.h = point.h) AND (SELF.v = point.v);
    END;
  END;

END.

In addition to SELF there’s of course SUPERSELF to send messages to your superclass instead. And messages are sent via dot notation, e.g. myPoint.Set(10,20); to send Set to an instance of TPoint. It’s just about the most minimal possible object-oriented addition to Pascal, with one exception: It takes advantage of Lisa’s heap.

Just like Macintosh, Lisa has a Memory Manager whose heap is largely organized in terms of relocatable blocks referenced by handles rather than fixed blocks referenced by pointers. Thus normally in Pascal one would write SELF^^.h := h; to dereference the SELF handle and pointer when accessing the object. However, since Clascal knows SELF and myPoint and so on are objects, it just assumes the dereference—making it hard to get wrong. What I find interesting is that, unlike the Memory Manager on Macintosh, I’ve not seen any references to locking handles so they don’t move during operations. However, since there isn’t any saving and passing around of partially dereferenced handles most of the time, I suspect it isn’t actually necessary!
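For anyone who hasn't worked with handles, here is a toy Python model of why the double dereference matters. It is not how the Lisa or Macintosh Memory Manager is implemented; RelocatableHeap and its methods are made up for illustration. A handle is modeled as an index into a table of master pointers, and every access goes through that table, so blocks can move during compaction without invalidating the handle.

```python
class RelocatableHeap:
    """Toy heap: a handle indexes a master pointer, which locates the block."""

    def __init__(self):
        self.memory = bytearray()
        self.master = []  # master pointers: [offset, size], or None once freed

    def allocate(self, size: int) -> int:
        offset = len(self.memory)
        self.memory.extend(bytes(size))
        self.master.append([offset, size])
        return len(self.master) - 1  # the handle

    def free(self, handle: int) -> None:
        self.master[handle] = None  # leaves a hole until the next compaction

    def compact(self) -> None:
        # Slide live blocks together and update their master pointers.
        new_memory = bytearray()
        for entry in self.master:
            if entry is not None:
                offset, size = entry
                entry[0] = len(new_memory)
                new_memory.extend(self.memory[offset:offset + size])
        self.memory = new_memory

    def store(self, handle: int, index: int, value: int) -> None:
        offset, _ = self.master[handle]  # dereferenced afresh on every access
        self.memory[offset + index] = value

    def load(self, handle: int, index: int) -> int:
        offset, _ = self.master[handle]
        return self.memory[offset + index]


heap = RelocatableHeap()
a = heap.allocate(4)
b = heap.allocate(4)
heap.store(b, 0, 42)
offset_before = heap.master[b][0]
heap.free(a)
heap.compact()                     # block b moves down to fill the hole
offset_after = heap.master[b][0]
value_after = heap.load(b, 0)      # the handle b still works
```

The danger in handle-based systems comes from holding on to the looked-up offset across something that might compact the heap. Code that only ever goes through the handle, as Clascal's implicit dereference does, never sees the block move.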

Honestly, as late-1970s languages go, it isn’t so bad at all. It wouldn’t even be all that difficult for the editor to show the elided argument lists and return types inline anyway; it’s the sort of thing that can be done fairly easily even in static language development environments from the 1970s.