Lisa Source Code: Clascal Evolution

Here’s another interesting thing I’ve learned about Clascal and Object Pascal: It went through exactly the same evolution from combining object allocation & initialization to separating them that Objective-C did a decade later!

In early 1983 Clascal, classes were expected to implement a New method as a function returning that type, taking zero or more parameters, and returning an instance of that type by assigning to SELF—sound familiar? This was always implemented as a “standard” method (one without dynamic dispatch) so you couldn’t call the wrong one. A cited advantage of this is that it would prevent use of the standard Pascal built-in New() within methods—which I suspect turned out not to be what people wanted, since it would prevent interoperability.

A class could also choose to implement an OVERRIDE of the also-included Free method to release any resources it had acquired, like file handles or other object instances. And each overridden Free method had to include SUPERSELF.Free; after it did so in order to ensure that its superclass would also release any resources it had acquired.

INTERFACE

  TYPE

    Object = SUBCLASS OF NIL
      FUNCTION New: Object; STANDARD;
      PROCEDURE Free; DEFAULT;
    END;

    Person = SUBCLASS OF Object
      name: String;
      FUNCTION New(name: String): Person; STANDARD;
      PROCEDURE Free; OVERRIDE;
    END;

  VAR

    gHeap: Heap;

IMPLEMENTATION

  METHODS OF Object;

    FUNCTION New{: Object}
    BEGIN
      SELF := Object(HAllocate(gHeap, Size(THISCLASS)));
    END;

    PROCEDURE Free
    BEGIN
      HFree(Handle(SELF));
    END;

  END;

  METHODS OF Person;

    FUNCTION New{(theName: String): Person;}
    BEGIN
      SELF := Person(HAllocate(gHeap, Size(THISCLASS)));
      IF SELF <> NIL THEN
        name := theName.Clone;
    END;

    PROCEDURE Free
    BEGIN
      name.Free;
      SUPERSELF.Free;
    END;
  END;

By mid-1984, Clascal changed this to the CREATE method, which was declared as ABSTRACT in the base class. Note that it still doesn’t use the standard Pascal built-in New() to create object instances. However, it takes a potentially-already-initialized object so that it’s easier for a subclass to call through to its superclass for initialization, since CREATE is still not a dynamically-dispatched method. Also, instead of referencing a global variable for a heap zone in which to perform allocation, it takes the heap zone, providing some amount of locality-of-reference that may be helpful to the VM system.

There was also a change in style to prefix class names with T.

INTERFACE

  TYPE

    TObject = SUBCLASS OF NIL
      FUNCTION CREATE(object: TObject; heap: THeap): TObject; ABSTRACT;
      PROCEDURE Free; DEFAULT;
    END;

    TPerson = SUBCLASS OF TObject
      name: TString;
      FUNCTION CREATE(theName: TString; object: TObject; heap: THeap): TPerson; STANDARD;
      PROCEDURE Free; OVERRIDE;
    END;

IMPLEMENTATION

  METHODS OF TObject;

    PROCEDURE Free
    BEGIN
      FreeObject(SELF);
    END;

  END;

  METHODS OF TPerson;

    FUNCTION CREATE{(theName: TString; object: TObject; heap: THeap): TPerson;}
    BEGIN
      IF object = NIL
        object := NewObject(heap, THISCLASS);
      SELF := TPerson(object);
      WITH SELF DO
        name := theName.Clone(heap);
    END;

    PROCEDURE Free
    BEGIN
      name.Free;
      SUPERSELF.Free;
    END;
  END;

This is starting to look even more familiar to Objective-C developers, isn’t it?

The final form of the language, Object Pascal, actually backed off on the Smalltalk terminology a little bit and renamed “classes” to “objects” and went so far as to introduce an OBJECT keyword used for defining a class. It also changed SUPERSELF. to INHERITED—yes, with whitespace instead of a dot!—as, again, developers new to OOP found “superclass” confusing.

Object Pascal also, at long last, adopted the standard Pascal built-in New() to perform object allocation (along with its counterpart Free() for deallocation) directly instead of introducing a separate function for it, since the intent can be inferred by the compiler from the type system. It also removed the need to use the METHODS OF construct to add methods, instead just prefixing the method with the class name and a period.

The final major change from Clascal to Object Pascal is that, with New() used for object allocation, the CREATE methods were changed into initialization methods instead since they just initialize the object after its allocation. They were also made procedures rather than functions returning values, and since the standard Pascal built-in New() is being used they no longer take a potentially-already-allocated object nor do they take a heap zone in which to perform the allocation. The convention is that for a class TFoo the initialization method has the form IFoo.

There was also another stylistic change, prepending field names with f to make them easy to distinguish from zero-argument function methods at a glance.

There was also a switch from not including the parameter list in the IMPLEMENTATION section to including it directly instead of in a comment.

Here’s what that looks like:

INTERFACE

  TYPE

    TObject = OBJECT
      PROCEDURE IObject; ABSTRACT;
      PROCEDURE Free; DEFAULT;
    END;

    TPerson = OBJECT(TObject)
      fName: TString;
      PROCEDURE IPerson(theName: TString); STANDARD;
      PROCEDURE Free; OVERRIDE;
    END;

IMPLEMENTATION

    PROCEDURE TObject.Free
    BEGIN
      Free(SELF);
    END;

    PROCEDURE TPerson.IPerson(theName: TString)
    BEGIN
      fName := theName.Clone();
    END;

    PROCEDURE TPerson.Free
    BEGIN
      fName.Free;
      INHERITED Free;
    END;

Based on the documentation I’ve read, it wouldn’t surprise me if the only reason initialization methods aren’t consistently named Initialize is that the language design didn’t support an OVERRIDE of a method using a different parameter list.

Lisa Source Code: Understanding Clascal

On January 19, Apple and the Computer History Museum released the source code to the Lisa Office System 7/7 version 3.1, including both the complete Office System application suite and the Lisa operating system. (The main components not released were the Workshop environment and its tooling, including the Edit application and the Pascal, COBOL, BASIC, and C compilers and the assembler.) Curious people have started to dig into what’s needed to understand and build it, and I thought I’d share some of what I’ve learned over the past few decades as a Lisa owner and enthusiast.

While Lisa appears to have an underlying procedural API similar to that of the Macintosh Toolbox, the Office System applications were primarily written in the Clascal language—an object-oriented dialect of Pascal designed by Apple with Niklaus Wirth—using the Lisa Application ToolKit so they could share as much code as possible between all of them. This framework is the forerunner of most modern frameworks, including MacApp and the NeXT frameworks, which in turn were huge influences on the Java and .NET frameworks.

One of the interesting things about Clascal is that it doesn’t add much to the Pascal dialect Apple was using at the time: Pascal was originally designed by Wirth to be a teaching language and several constructs useful for systems programming were left out, but soon added back by people who saw Pascal as a nice, straightforward, compact language with simple semantics that’s straightforward to compile. While in the 1990s there was a bitter war fought between the Pascal and C communities for microcomputer development, practically speaking the popular Pascal dialects and C are almost entirely isomorphic; there’s almost nothing in C that’s not similarly simple to express in Pascal, and vice versa.

So beyond standard Pascal, Apple Pascal had a concept of “units” for promoting code modularity: Instead of having to cram an entire program in one file, you could break it up into composable units that specify their “interface” separately from their “implementation.” Sound familiar?

When creating a unit under this model, both the interface and the implementation can go in a single file, but in separate sections. So let’s say you want to create a unit that makes some simple types available along with procedures and functions to operate on them. (In code examples, I’m putting keywords in uppercase since Pascal was historically case-insensitive and it helps to make clear the distinction between language constructs and developer code.)

UNIT Geometry;

INTERFACE

  TYPE
    Point  = RECORD
               h, v: INTEGER;
             END;

  VAR
    ZeroPoint: Point;

  PROCEDURE InitGeometry;
  PROCEDURE SetPoint(var p: Point; h, v: INTEGER);
  FUNCTION EqualPoints(a, b: Point): BOOLEAN;

IMPLEMENTATION

  PROCEDURE InitGeometry
  BEGIN
    SetPoint(ZeroPoint, 0, 0);
  END;

  PROCEDURE SetPoint
  BEGIN
    p.h = h;
    p.v = v;
  END;

  FUNCTION EqualPoints
  BEGIN
    IF a.h = b.h AND a.v = b.v THEN BEGIN
      EqualPoints := TRUE;
    ELSE BEGIN
      EqualPoints := FALSE;
    END
  END;

END.

Reading through this code, what’s the first thing you notice? While InitGeometry would typically be written without parentheses, as is normal for a zero-argument procedure or function in Pascal, functions and procedures that do take arguments and return values are also written without parameter lists but only in the IMPLEMENTATION section.

This is why, in a lot of the Lisa codebase, they would actually be written like this:

  FUNCTION EqualPoints{(a, b: Point): BOOLEAN}
  BEGIN
    IF a.h = b.h AND a.v = b.v THEN BEGIN
      EqualPoints := TRUE;
    ELSE BEGIN
      EqualPoints := FALSE;
    END
  END;

This is because, despite being “wordy,” Pascal also typically tries to minimize repetition and risk of error. So since you’ve already specified the INTERFACE why specify it again, and potentially get it wrong?

What’s interesting about Clascal is that it does the same thing! You define a class and its methods as an interface, and then its implementation doesn’t require repetition. This may sound convenient but in the end it means you don’t see the argument lists and return types at definition sites, so everyone wound up just copying & pasting them into comments next to the definition!

A couple of other things that are interesting about Clascal is that it sticks closer to Smalltalk terminology than most modern systems other than Objective-C (and, marginally, Swift): Instead of this it has SELF and instead of “member functions” it has “methods,” as PARC intended. This makes perfect sense as a bunch of the people who created and used Clascal came from PARC.

So to define a class, you simply use SUBCLASS OF SuperclassName in a TYPE definition section, provide your instance variables as if they were part of a RECORD, and declare its methods using almost-normal PROCEDURE and FUNCTION declarations (not definitions!) that require an OVERRIDE keyword to indicate a subclass override of a superclass method.

So the above code would look like this adapted to Clascal style:

UNIT Geometry;

INTERFACE

  TYPE
    TPoint = SUBCLASS OF TObject
               h, v: INTEGER;
               FUNCTION CREATE(object: TObject, heap: THeap): TPoint;
               PROCEDURE Set(h, v: INTEGER);
               FUNCTION Equals(point: TPoint): BOOLEAN;
             END;

IMPLEMENTATION

  METHODS OF TPoint;

    FUNCTION TPoint.CREATE{(object: TObject, heap: THeap): TPoint};
    BEGIN
      { Create a new object in the heap of this class, if not
        initializing an instance of a subclass. }
      IF object = NIL THEN
        object := NewObject(heap, THISCLASS);
      SELF := TPoint(TObject.CREATE(object, heap));
    END;
    PROCEDURE TPoint.Set{(h, v: INTEGER)};
      SELF.h := h;
      SELF.v := v;
    END;
    FUNCTION TPoint.Equals{(point: TPoint): BOOLEAN};
      Equals := a.h = b.h AND a.v = b.v;
    END;
  END;

END.

In addition to SELF there’s of course SUPERSELF to send messages to your superclass instead. And messages are sent via dot notation, e.g. myPoint.Set(10,20); to send Set to an instance of TPoint. It’s just about the most minimal possible object-oriented addition to Pascal, with one exception: It takes advantage of Lisa’s heap.

Just like Macintosh, Lisa has a Memory Manager whose heap is largely organized in terms of relocatable blocks referenced by handles rather than fixed blocks referenced by pointers. Thus normally in Pascal one would write SELF^^.h := h; to dereference the SELF handle and pointer when accessing the object. However, since Clascal knows SELF and myPoint and so on are objects, it just assumes the dereference—making it hard to get wrong. What I find interesting is that, unlike the Memory Manager on Macintosh, I’ve not seen any references to locking handles so they don’t move during operations. However, since there isn’t any saving and passing around of partially dereferenced handles most of the time, I suspect it isn’t actually necessary!

Honestly, as late-1970s languages go, it isn’t so bad at all. It wouldn’t even be all that difficult for the editor to show this information inline anyway, it’s the sort of thing that can be done fairly easily even in static language development environments from the 1970s.

Rebutting Big Nerd Ranch on Objective-C 2.0 dot notation

The Big Nerd Ranch weblog has a new post about Objective-C 2.0 dot notation. They advocate never using it and they’re completely wrong.

Given my reaction on Twitter, several people have asked me to write a more in-depth rebuttal.

I’ve already addressed when and why you should use Objective-C 2.0 properties and dot notation in an earlier post, so I won’t go into that here. I’ll just repeat my response to their weblog.

Here’s what I wrote in response:

> I disagree most emphatically. The whole point of dot notation is that, when combined with properties, it’s not *just* an alternative syntax for invoking methods. In fact, if that’s how you think about dot syntax, STOP. That’s not what it’s for at all.
>
> What dot syntax and property declarations are for is separating object *state* from object *behavior*. Classical OOP only really defines objects as only exposing behavior but the past 30+ years have demonstrated rather aptly that objects consist of both. C# was actually pioneering in this; its concept of properties is rather similar to what the combination of property declarations and dot syntax enable in Objective-C.
>
> To write idiomatic Objective-C 2.0 you should use `@property` to declare properties, and use dot syntax to access them. Period. Doing otherwise is a bad idea because it will create code that isn’t intention-revealing to other experienced Objective-C 2.0 developers. Teaching students to do otherwise is doing them a disservice, because you’re directly contradicting those responsible for the language and its evolution.

In short, Objective-C 2.0 has properties and dot notation as another way of expressing intent in your code. Use them for that, don’t refuse to use them just because they weren’t in earlier versions of the language, or because they require teaching another concept.

When to use properties & dot notation

I listened to a recent episode of the [cocoaFusion:][1] podcast about properties and dot notation today. There were a few interesting points brought up, but I felt a couple of the most important reasons to use `@property` declarations and dot notation weren’t addressed.

The biggest reason I see to use a different notation for both property declaration and property access than for method declaration and sending messages — even if property access ultimately results in a message send, as it does in Objective-C 2.0 — is **separation of state and behavior**.

In the ur-OOP days of Smalltalk, state was supposed to be *encapsulated* within objects *entirely*. This has become Smalltalk dogma and some developers have tried to propagate it to Objective-C: The idea that objects should *never* expose their state to each other, but instead only vend *behaviors*. It makes a certain amount of sense if you see objects purely as actors.

However, in today’s modern world we understand that objects aren’t *just* actors. Objects both “do things” *and* “represent things;” they’re nouns. And they have both *internal* state that they use in managing their behavior and *external* state they expose to the world.

You can use the same mechanism to do both, as Smalltalk does and as Objective-C did before 2007. However, it turns out that it can make a lot more sense to use different syntax to represent the distinct concepts, even if they’re handled by the same underlying mechanism.

For example, I might have a Person class and an NSViewController subclass that presents a Person instance in a view. The API to my Person class might look like this:

@interface Person : NSObject
@property (copy) NSString *name;
@property (copy) NSImage *picture;
@end

This sends a strong signal to the users of the class that `name` and `picture` are *external* state for a Person instance, leading me to write code like this:

Person *person = [[Person alloc] init];
person.name = @”Chris”;
person.picture = pictureOfChris;

The intent of this code is clear: I’m not asking `person` to do anything, I’m just manipulating its external state. That may cause it to do things as a side-effect, but at least in the code snippet I’m looking at, I’m just interested in changing some state.

Now let’s see what the NSViewController might look like:

@interface PersonViewController : NSViewController
@property (retain) Person *person;
– (BOOL)updatePicture;
@end

And let’s see how I’d use one, assuming it’s been instantiated and wired up in my view hierarchy already:

selectedPersonViewController.person = person;

if ([selectedPersonViewController updatePicture]) {
// note elsewhere that the person’s picture is updated
}

Even though the `-updatePicture` method has no arguments and returns a `BOOL` I (1) don’t make it a property and (2) don’t use dot notation to invoke it. Why not, since it fits the *form* of a property? Because it doesn’t fit the *intent* of a property. I’m actually telling the view controller to perform some action, not just changing some state, and I want the intent of that to be clear to a reader of the above code.

That brings me to the other major reason to both declare properties using `@property` and use dot notation to access them: Xcode. The code completion technology in Xcode tries hard to provide you with a good default completion and good choices in the completion list. But this can be very difficult, especially with the combination of Objective-C’s dynamic dispatch and the sheer size of the frameworks: There are hundreds (!) of methods declared in categories on NSObject as a result of categories describing informal protocols, and any of those methods could be valid on an arbitrary instance.

Dot notation, on the other hand, is **not** valid on arbitrary objects. Even though it compiles to exactly the same sort of `objc_msgSend` dynamic dispatch that bracket notation does, from a type system perspective it actually **requires** a declaration of the property being accessed to work correctly. The reason for this is that the name of a property is not necessarily the name of the message selector to use in dispatch; think of a property `hidden` whose underlying getter method is named `-isHidden`.

This means that tools like Xcode can provide a much more streamlined experience for properties using dot notation. If I write the above code in Xcode, rather than MarsEdit, Xcode will not offer completions like `-autorelease` and `-retainCount` when I type `person.` — it will only offer me the properties it knows are on Person.

Now, you *can* invoke arbitrary getters or setters (so long as they have a corresponding getter) using dot notation. The compiler doesn’t require an `@property` declaration for the use of dot notation, just a declaration. I could have declared the `Person.name` property like this:

@interface Person : NSObject
– (NSString *)name;
– (void)setName:(NSString *)value;
@end

The compiler will compile `person.name = @”Chris”;` just fine with this declaration, to an invocation of `-[Person setName:]` — but Xcode won’t offer you code completion for it if you don’t use `@property` to declare it. By not offering to complete non-`@property` properties for dot notation, Xcode avoids offering you *bad* completions like `autorelease` and `retain` that are technically allowed but are abominable style.

Of course, one complication is that many frameworks on Mac OS X predate `@property` declarations and thus don’t expose their state this way. That means Xcode won’t offer you completions for their state with dot notation until those frameworks do. Fortunately for iPhone developers, UIKit uses `@property` declarations pervasively. You should too.

To sum up:

* Use `@property` to declare the state exposed by your objects.
* Use dot notation to get and set objects’ state.
* Use method declarations to declare the behavior exposed by your objects.
* Use bracket notation to invoke objects’ behavior.

This results in intention-revealing code that clearly separates state and behavior both in declaration and use.

[1]: http://www.cocoafusion.net/ “cocoaFusion: podcast”

Steve Yegge describes what’s wrong with Lisp

Steve Yegge, Lisp is Not an Acceptable Lisp:

You’ve all read about the Road to Lisp. I was on it for a little over a year. It’s a great road, very enlightening, blah blah blah, but what they fail to mention is that Lisp isn’t the at the end of it. Lisp is just the last semi-civilized outpost you hit before it turns into a dirt road, one that leads into the godawful swamp most of us spend our programming careers slugging around in. I guarantee you there isn’t one single Lisp programmer out there who uses exclusively Lisp. Instead we spend our time hacking around its inadequacies, often in other languages.

Steve does a very good job of articulating a lot of the things I dislike about Lisp, especially Common Lisp. One interesting thing, though, is that a lot (but not all) of the issues he raises are addressed by Dylan.

One of the more interesting things about Dylan in this context is that, despite not adopting a traditional message-based object system like Smalltalk or Objective-C (or their more static cousins C++ and Java), Dylan does push objects all the way down, but in a functional style. It appears to work pretty well, making it easy to define both nouns and verbs in the combinations a developer might need, and even (through its hygienic macro system) allow developers to extend language syntax too.