QuickTime as a Tape Archival Format

On the SIMH group, Al Kossow and others have been discussing how .tap is a terrible archival container format that also has a bunch of problems for use in emulation and simulation of systems. This is a problem I’ve been thinking about for a while since I hired Miëtek to implement SCSI tape support in MAME including the .tap format, and I had a sudden realization: There’s already a great format for representing sequential media, QuickTime!

A lot of people think QuickTime is a “video format,” but that’s not really accurate. Video and audio playback are applications atop the QuickTime container format; the container format itself is a means of representing multiple typed tracks of time-based media, each of which may have their own representation in the form of samples interpreted according to their own CODECs.

QuickTime Media Structure at a High Level

As an example, a QuickTime file containing a video with associated stereo audio and subtitles may have three tracks, each with their own timebase:

  1. The video track, whose timebase is the number of frames per second, and whose track media is the CODEC metadata needed to decode its samples.
  2. The audio track for the two audio channels, whose timebase is the number of samples per second. Its track media will be similar to that of the video, specifying the CODEC to use for the audio samples to decode.
  3. The text track for the subtitles, whose timebase is probably derived from the video timebase, whose track media will specify things like the language and font of the subtitles, and whose samples consist of the text to present and the size, location, duration, and styling for that presentation.

All of these are represented within a file as atoms which represent well-identified bags of data with arbitrary size and content, making it very easy to write general-purpose tooling and also to extend over time. (The last major extension to the low-level design was in the 1990s, to support 64-bit atom sizes, so it’s quite a stable format already.)

Mapping QuickTime to Data Tape

Once you realize that the tracks themselves can be arbitrary, it starts to become clear how this format maps nicely to tape content: Since tapes themselves are linear, they’re fundamentally time-based.

The actual content of a tape isn’t a pure stream of raw data, it’s a set of blocks of raw data between magnetic flux marks, with some gaps between — and thanks to media decay, those blocks can be good or bad. Usually these marks are used to organize tapes into files, but that’s not a guarantee; for both archival and emulation, it’s best to stick to the low-level representation and let applications impose the higher-level semantics.

In this case, you’d have a “tape data” track whose track media describes the original medium (7-track, 9-track, etc.) and the interpretation of its samples. The samples themselves would be the marks and data blocks. And there’s even a native representation of tape gaps, in the form of non-contiguous samples.

The format can also be leveraged to support random access including writes, since the intelligence for that can be in the “CODEC” for the “tape” track media, combined with the QuickTime format’s existing support for non-destructive edits. New data can be overlaid based on its “temporal” position, which should more or less accurately simulate how a rewritten tape would actually work, while still preserving the data that was just overwritten.

Finally, QuickTime has a concept of “references” that can be used to implement things like tape files independent of (rather than inline with) the tape data itself. A catalog of block references, for example, could also be stored with the tape data’s track media to indicate the block extents for individual files on tape, thus allowing direct access by tooling without having to stream through the entire file.

Implementation

Since QuickTime movie files are a moderately complex structure atop a simple base, it’s important to have a reasonable API to work with both the low-level atom structures as well as the higher-level constructs like tracks, track media, sample chunks and samples. Fortunately there already exists at least one Open Source library allowing this, QTFileLib from the Darwin Streaming Server that Apple made Open Source in 1999.

Darwin Streaming Server as a whole and its QTFileLib component are written in quite straightforward “C with Classes”-style C++, and QTFileLib has an API surface representing all of the major low-level and application-level concepts of the file format. As a side effect of the implementation of its read support, it also has a lot of the API necessary for creating and wiring together QuickTime data structures for creating files, just not support for writing it all out. Structurally that should be straightforward to add. It even looks straightforward to port to plain C, if that’s desired.

Build LLVM and clang!

I’ve talked about the LLVM Compiler Infrastructure in the past, but what I haven’t talked about yet is just how easy and quickly you can build it on your own Mac running Leopard! This is a great way to get into hacking on compiler lexical analyzers and parsers, code generators, optimizers, and so on.

What’s more, you can build both LLVM and the new C front-end clang very easily and in five to ten minutes.

First, create a work area to check them out into, wherever you normally create your projects.

[~]% cd /Projects
[/Projects]% mkdir LLVM
[/Projects]% cd LLVM
[/Projects/LLVM]%

Then check out LLVM itself and clang from the LLVM Subversion repository.

[/Projects/LLVM]% svn checkout http://llvm.org/svn/llvm-project/llvm/trunk llvm
[/Projects/LLVM]% cd llvm/tools
[/Projects/LLVM/llvm/tools]% svn checkout http://llvm.org/svn/llvm-project/cfe/trunk clang
[/Projects/LLVM/llvm/tools]% cd ../..
[/Projects/LLVM]%

Then edit the PARALLEL_DIRS definition in llvm/tools/Makefile to tell it about clang. Just add clang onto the end, like this:

PARALLEL_DIRS := llvm-config  \
                 opt llvm-as llvm-dis \
                 llc llvm-ranlib llvm-ar llvm-nm \
                 llvm-ld llvm-prof llvm-link \
                 lli gccas gccld llvm-extract llvm-db \
                 bugpoint llvm-bcanalyzer llvm-stub llvmc2 \
                 clang

Now create a directory to build into, next to your llvm directory, and change into it.

[/Projects/LLVM]% mkdir build
[/Projects/LLVM]% cd build
[/Projects/LLVM/build]%

This is where you’ll actually run configure. This will ensure your source tree isn’t polluted with build products, and that everything stays self-contained while you hack.

[/Projects/LLVM/build]% ../llvm/configure --enable-targets=host-only
# lots of logging
[/Projects/LLVM/build]%

You’ll note that above I passed an argument to configure. This ensures that LLVM is only built to target the architecture I’m running on, to speed up the build process; this is generally fine for simple front-end development.

Now, to build LLVM as well as clang all I have to do is invoke make. LLVM is set up to correctly do parallel builds, so I’ll pass the number of CPUs I have in my machine via make -j 4.

[/Projects/LLVM/build]% make -j 4
# lots of logging
[/Projects/LLVM/build]%

That’s it! LLVM is now (hopefully) successfully built. All of the pieces are in the build directory under Debug/bin and Debug/lib and so on; see the LLVM web site for details about what the various components are.

LLVM Developers’ Meeting 2007-05

The LLVM Compiler Infrastructure is a great technology that came out of the computer science research community and can be used to develop extensible compiler platforms. Among other things, it provides a platform-independent assembly and object code (the “low level virtual machine” that its name is taken from), and a great object-oriented compilation, linking, optimization and code-generation infrastructure that you can use to efficiently target real hardware. The main idea is that LLVM provides a comprehensive back-end that you can easily build a front-end to target.

There’s a huge amount of material available on the LLVM web site, including the LLVM Assembly Language Reference Manual and LLVM Programmer’s Manual, a wide variety of papers on LLVM, and a great walkthrough of the creation of Stacker, a Forth front-end that targets LLVM. It shows how the budding language creator might leverage the tools available as part of the LLVM infrastructure. I fully expect that in time, “little languages” will have no excuse not to be JIT compiled simply because targeting LLVM is actually easier than writing your own interpreter or bytecode engine! Just walk your AST and generate naïve LLVM code for what you encounter, and let the infrastructure handle the rest. (For those who aren’t developer tools weenies, an Abstract Syntax Tree is the internal representation of a program’s structure before it’s turned into instructions to execute.)

A couple months back, the May 2007 LLVM Developers’ Meeting was held at Apple. The proceedings from this meeting — the actual session content, both in slides and in video form — are available online, and I’ve even created an LLVM Developers’ Meeting podcast (including a subscribe-directly-in-iTunes version) for easy viewing. The video may be low bit rate, but it has a 16:9 aspect ratio so you can even pretend it’s HD. (I put together the podcast primarily so I could watch the sessions on my Apple TV, since I couldn’t attend the meeting.)

So if you’re at all interested in compilers, language design or development, optimization, or development platforms in general, you’ll be very well-served by checking out LLVM. It is a seriously cool enabling technology.

Steve Yegge describes what’s wrong with Lisp

Steve Yegge, Lisp is Not an Acceptable Lisp:

You’ve all read about the Road to Lisp. I was on it for a little over a year. It’s a great road, very enlightening, blah blah blah, but what they fail to mention is that Lisp isn’t the at the end of it. Lisp is just the last semi-civilized outpost you hit before it turns into a dirt road, one that leads into the godawful swamp most of us spend our programming careers slugging around in. I guarantee you there isn’t one single Lisp programmer out there who uses exclusively Lisp. Instead we spend our time hacking around its inadequacies, often in other languages.

Steve does a very good job of articulating a lot of the things I dislike about Lisp, especially Common Lisp. One interesting thing, though, is that a lot (but not all) of the issues he raises are addressed by Dylan.

One of the more interesting things about Dylan in this context is that, despite not adopting a traditional message-based object system like Smalltalk or Objective-C (or their more static cousins C++ and Java), Dylan does push objects all the way down, but in a functional style. It appears to work pretty well, making it easy to define both nouns and verbs in the combinations a developer might need, and even (through its hygienic macro system) allow developers to extend language syntax too.

Xcode: Unit Testing

Xcode 2.1 introduced integrated unit testing to the Xcode IDE. Xcode includes two unit testing frameworks, target templates for setting up test bundle targets, and infrastructure for running unit tests every time you build your project and reporting their results in the Build Results window just like compilers and linkers do.

With Xcode unit testing, you group your test cases into test bundles which are built by separate targets. This means that you don’t need to build two versions of your software — one with tests and one without tests — and can run your tests against the actual software you want to deliver.

Xcode 2.1 and later include OCUnit for unit testing of Objective-C Cocoa software, and it includes a new framework called CPlusTest for unit testing of C++ software. Using either you should be able to test C code with relative ease. Corresponding target templates are included for creating unit test bundles appropriate for Cocoa and Carbon applications and frameworks, and file templates are included for creating OCUnit and CPlusTest test case classes.

The test bundle target templates have a shell script build phase at the very end that invokes /Developer/Tools/RunUnitTests. RunUnitTests looks at the build settings it’s passed via its environment and determines from that information how to run the tests in your test bundle.

If you’re testing a framework, RunUnitTests will run the appropriate test rig and tell it to load and run the tests in your bundle. Since your test bundle should link against your framework, your framework will be loaded when the test rig loads your bundle.

If you’re testing an application, you need to specify the application as the Test Host and Bundle Loader for your test bundle in its configuration’s build settings. The Bundle Loader setting tells the linker to link your bundle against the application that’s loading it as if the application were a framework, allowing you to refer to classes and other symbols within the application from your bundle without actually including them in the bundle. The Test Host setting tells RunUnitTests to launch the specified application and inject your test bundle into it in order to run its tests.

There’s even support in RunUnitTests for invoking a test rig of your own, rather than the test rig for one of the supplied frameworks. You just need to specify the Test Rig build setting for your test bundle; this should be the path to a tool to run. It will be passed the path to your test bundle as its first argument, and if it needs to generate failure information it can just generate it in a gcc/compiler-like format on stderr:

FailingTest.c: 10: error: (1 == 0) failed

This will cause it to show up in the Build Results window as an error, just like a compiler or linker error. You can see the RunUnitTests manpage for more information on the environment in which your test rig will be run.

There’s a great Unit Testing Guide on the Apple Developer Connection web site that has lots of information on getting started with Xcode unit testing. Check it out!

Platform Futures

On Windows, many developers seem to want to run as fast as possible away from Microsoft Visual C++ and embrace Microsoft’s C# and .NET platform for new development. Most Windows developers that I’ve seen seem downright enthusiastic about these technologies. It’s disconcerting; I’m not used to seeing Windows developers (or users) be enthusiastic about their platform.

On the Mac, many developers are trying to hold onto C++ and Carbon for as long as they can, even for new development. A new Mac developer on the Carbon list actually said he wished Apple had a C++ framework that used MFC-like “message maps” for Mac OS X-only Carbon development “to make it easier to build software fast!” (Paraphrased.) And Metrowerks is spending money & time building a next-generation C++ PowerPlant framework for Mac OS X-only Carbon development! And some developers keep on Apple’s case to try and maintain feature parity between Carbon and Cocoa.

Fortunately, Apple isn’t giving in to them as much as they might think. For instance, WebKit has a Carbon wrapper, but it’s just a wrapper; WebView is really a Cocoa framework and if you want to extend it you’re going to have to use Cocoa. The Cocoa Controller layer is only really possible to do with a rich dynamic runtime; it’ll never make it to Carbon. You can only build screen savers using Cocoa and Objective-C. You can only build preference panes using Cocoa and Objective-C. Virtually all new applications coming out of Apple are built using Cocoa and Objective-C.

(Keynote, SoundTrack, LiveType, iCal, iPhoto, iSync, iChat AV, Safari… Final Cut and Logic don’t count, since they ware originally developed for the traditional Mac OS and thus aren’t new. Neither does Shake, since it was originally developed for Irix and X11 — though it wouldn’t surprise me at all to see it rearchitected as a Cocoa application in the next couple of years.)

The future of development on Windows is C# and .NET. This has been clear since Microsoft first released .NET, and it’s especially clear in light of the latest PDC and Longhorn.

The future of development on the Mac is Objective-C and Cocoa. This has been clear ever since Apple bought NeXT, and it’s especially clear in light of the latest WWDC and Panther.

Deal with it.