Learning the Lingo
This week I worked out some longstanding Lingo issues!
Duplicate Scripts
The first issue was duplicate scripts in Director 4 movies. Each cast member should have at most one Lingo script associated with it, but we were running into movies in which a cast member seemingly had several scripts. There was no obvious way to deal with this - redefining the script usually led to incorrect behavior, and so did keeping the original definition.
These duplicate scripts were rare in most movies, so the problem went ignored for a while, but in our recently added target Majestic Part 1: Alien Encounter, there were several hundred scripts, and almost every one conflicted with another.
Initially, I thought that there must be something that indicated certain scripts, or at least certain handlers within these scripts, were unused. The first place I investigated was the script’s “handler vectors.” These differed between some of the duplicate scripts, and I thought they might hold the key to how the script conflicts should be handled.
“Handler vectors” were identified as an array of 16-bit integers in Anthony Kleine’s Director documentation, but there was no explanation of their purpose. I got in touch with Anthony, but he couldn’t remember what they were for, and they remained a mystery to me for weeks. Once I began deeper investigation, it quickly became apparent that the “handler vectors” are just used to map event IDs to handler IDs. Totally unrelated.
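As a rough illustration of that mapping, here is a minimal sketch. It assumes the vector is indexed by event ID, that the integers are big-endian and signed, and that -1 marks an event with no handler. None of that is confirmed above, and the event order in `EVENT_NAMES` is purely hypothetical.

```python
import struct

# Hypothetical event order; the real event IDs are not listed in this post.
EVENT_NAMES = ["mouseDown", "mouseUp", "keyDown", "keyUp"]

def read_handler_vectors(data: bytes, count: int) -> list[int]:
    """Read `count` signed 16-bit integers (big-endian is an assumption)."""
    return list(struct.unpack(f">{count}h", data[:2 * count]))

def handlers_by_event(vectors: list[int]) -> dict[str, int]:
    """Map event name -> handler ID, skipping -1 (assumed to mean 'no handler')."""
    return {
        EVENT_NAMES[event_id]: handler_id
        for event_id, handler_id in enumerate(vectors)
        if handler_id != -1
    }

# Synthetic data: handler 2 answers event 0, handler 0 answers event 2.
raw = struct.pack(">4h", 2, -1, 0, -1)
print(handlers_by_event(read_handler_vectors(raw, 4)))
```

Nothing in a table like this says whether a script is live or dead, which is why it turned out to be a dead end for the duplicate problem.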
The next suspect was the Lingo context, a container which maps script IDs to script data:
Section LctX {
    Struct header {
        ...
        Uint16 [big] freePtr
    }
    ...
    Array scripts(count) {
        Struct scriptLink {
            Uint32 unknown
            Uint32 [big] ID   // use MMAP!
            Uint16 [big] used // 0: unused, 4: used
            Uint16 [big] link // For unused entries: link to next unused, or -1.
        }
    }
}
The two areas of interest are:
- The script entry’s used field
- A linked list of unused scripts, which begins at the script entry at index freePtr. The entry’s link field gives the index of the next unused script, or -1.
However, after much investigation, it seems that there is actually no difference in how a script with used = 0 and a script with used = 4 should be handled. The linked list does indeed indicate unused scripts, but all of the entries in the list seem to have an ID of -1. Thus, these unused scripts have no script data associated with them, and they were never being loaded in the first place. There was no way they could be causing conflicts, since they didn’t really exist.
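That check can be sketched roughly as follows: read the script entries, walk the free list from freePtr, and see which entries still carry a real ID. This is only a sketch under assumptions: the entry is taken to be 12 bytes, the unknown field is read as big-endian, ID and link are read as signed so that -1 round-trips, and the header (elided in the listing) is not parsed at all. The names `ScriptLink`, `read_script_links` and `free_list` are made up for the example.

```python
import struct
from typing import NamedTuple

# Assumed entry layout: unknown (u32), ID (i32), used (u16), link (i16), big-endian.
ENTRY = struct.Struct(">IiHh")

class ScriptLink(NamedTuple):
    unknown: int
    id: int
    used: int
    link: int

def read_script_links(data: bytes, count: int) -> list[ScriptLink]:
    """Unpack `count` consecutive script entries."""
    return [ScriptLink(*ENTRY.unpack_from(data, i * ENTRY.size)) for i in range(count)]

def free_list(entries: list[ScriptLink], free_ptr: int) -> list[int]:
    """Follow `link` from `free_ptr` until -1, returning the visited indices."""
    indices = []
    index = free_ptr
    while index != -1:
        if index in indices or not 0 <= index < len(entries):
            raise ValueError(f"corrupt free list at index {index}")
        indices.append(index)
        index = entries[index].link
    return indices

# Synthetic data, not taken from a real movie.
raw = b"".join(ENTRY.pack(*fields) for fields in [
    (0, 10, 4, -1),  # used = 4, real script
    (0, -1, 0, 3),   # on the free list, no script data
    (0, 11, 0, -1),  # used = 0 but still a real script
    (0, -1, 0, -1),  # end of the free list
])
entries = read_script_links(raw, 4)
unused = free_list(entries, free_ptr=1)
loadable = [entry.id for entry in entries if entry.id != -1]
print(unused)    # [1, 3]
print(loadable)  # [10, 11]
```

In this synthetic table, filtering on ID alone already excludes everything on the free list, and the used value plays no part in deciding what gets loaded.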
To finally solve the mystery, I had to throw out what I thought I knew about how Lingo scripts are linked to cast members. For years, the Director reverse engineering community understood that:
The scripts are not owned by their individual Cast Members in the Key Table [which links cast members to most of their assets] as you may expect. Instead, each Lingo Script has the number of its corresponding Cast Member (Source: Anthony Kleine).
After a few days of testing, I noticed that the cast member ID stored within Lingo scripts was sometimes incorrect. Or, as in Majestic, almost always incorrect. There had to be some other way by which cast members were linked to their scripts.
The obvious place to look was in the cast member data, which is split into two parts - data specific to the cast member type, followed by largely standard cast member info. We had previously identified a scriptId field in the data specific to script cast members, and these IDs always seemed to be correct. However, other types of cast members could have scripts as well, and since they wouldn’t have this field, this solution wouldn’t work for them. Or so it seemed.
Long story short, we were treating too many bytes as type-specific data, and the scriptId was actually in the standard cast member info. Once that was fixed, every cast member had a single, correct scriptId associated with it. Use that to link cast members to the scripts, and no more duplicate scripts!
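To make the difference concrete, here is a small sketch that contrasts the two linking strategies. The `Script` and `CastMember` classes and their field names are invented for the example, and the data is synthetic. It only illustrates why the member-side scriptId gives one script per cast member while the number stored in the script can collide.

```python
from dataclasses import dataclass

@dataclass
class Script:
    script_id: int
    cast_member_id: int  # number stored inside the script; may be wrong

@dataclass
class CastMember:
    member_id: int
    script_id: int  # scriptId from the standard cast member info

def link_by_script_field(scripts):
    """Old approach: trust the cast member number stored in each script."""
    linked = {}
    for script in scripts:
        linked.setdefault(script.cast_member_id, []).append(script.script_id)
    return linked

def link_by_member_field(members, scripts):
    """New approach: look up each cast member's own scriptId."""
    by_id = {script.script_id: script for script in scripts}
    return {m.member_id: m.script_id for m in members if m.script_id in by_id}

# Synthetic data: both scripts claim cast member 1.
scripts = [Script(script_id=1, cast_member_id=1), Script(script_id=2, cast_member_id=1)]
members = [CastMember(member_id=1, script_id=1), CastMember(member_id=2, script_id=2)]

print(link_by_script_field(scripts))           # {1: [1, 2]}
print(link_by_member_field(members, scripts))  # {1: 1, 2: 2}
```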
Grammar
Next came improvements to the Lingo grammar.
First, I needed to differentiate between statements and expressions. An expression by itself, like 2 + 2, isn’t a valid Lingo script - it needs to be an argument to a statement, like put 2 + 2. However, we were treating expressions and statements exactly the same, which allowed incorrect scripts and significantly complicated the grammar. Once this was fixed, half of the grammar’s 441 conflicts were gone.
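The statement/expression split can be shown with a toy parser. This is a hand-written recursive-descent sketch in Python, not the project's actual grammar, and it only knows put, integers, and + and -. The point is just that the top level accepts statements only, so a bare expression is rejected.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[+\-])")

def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def statement(self):
        # Only statements are valid at the top level.
        if self.peek() != "put":
            raise SyntaxError("expected a statement")
        self.next()
        node = ("put", self.expression())
        if self.peek() is not None:
            raise SyntaxError("trailing tokens after statement")
        return node

    def expression(self):
        node = self.operand()
        while self.peek() in ("+", "-"):
            operator = self.next()
            node = (operator, node, self.operand())
        return node

    def operand(self):
        token = self.next()
        if token.isdigit():
            return ("int", int(token))
        raise SyntaxError(f"unexpected token {token!r}")

def parse(source):
    return Parser(tokenize(source)).statement()

print(parse("put 2 + 2"))  # ('put', ('+', ('int', 2), ('int', 2)))
try:
    parse("2 + 2")
except SyntaxError as error:
    print("rejected:", error)  # rejected: expected a statement
```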
Next, I needed to get rid of the differentiation between Lingo’s subroutine types. Confusingly, Lingo has (at least) 3 different types, with overlapping purposes:
- Commands - These are built-in, and invoked by a call statement, like foo() or foo.
- Functions - These are also built-in, and invoked by a call expression, like put foo(). Very rarely, you can also invoke them as statements.
- Handlers - These are user-defined, and can be invoked by either call statements or call expressions.
Now, these are separate things, but they should only be treated separately during execution. Previously we were differentiating them in the grammar, which again complicated things.
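One way to picture "separate only during execution" is a single call node that gets resolved when it runs. The sketch below is an illustration under assumptions, not the interpreter's real logic: the tables are placeholders, the lookup order (handlers, then functions, then commands) is a guess, and raising an error when a command is used for its value is likewise assumed.

```python
# Hypothetical tables; the entries are placeholders, not a real list of built-ins.
COMMANDS = {"beep": lambda: None}
FUNCTIONS = {"abs": abs}
user_handlers = {"double": lambda x: x * 2}

def call(name, args, wants_value):
    """Resolve a generic call node at execution time.

    `wants_value` is True for a call expression and False for a call statement.
    """
    if name in user_handlers:
        return user_handlers[name](*args)
    if name in FUNCTIONS:
        return FUNCTIONS[name](*args)
    if name in COMMANDS:
        if wants_value:
            raise TypeError(f"command {name!r} does not return a value")
        return COMMANDS[name](*args)
    raise NameError(f"unknown subroutine {name!r}")

print(call("double", [21], wants_value=True))  # 42
print(call("abs", [-3], wants_value=True))     # 3
call("beep", [], wants_value=False)            # runs as a statement
```

With this shape, the grammar needs one rule for calls, and the command/function/handler distinction lives entirely in the lookup.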
With that done, I began general cleanup: reorganizing things where conflicts could be eliminated, reducing the use of right recursion, and adding support for fun statements like this one:
put cast cast
What should this do? Why, of course, it prints the cast member whose ID is equal to the variable cast:
set cast = 1
put cast cast
-- (cast 1)
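For the curious, here is a toy sketch of how the two meanings of cast can be told apart. It assumes a simple rule, namely that cast followed by another operand is a cast member reference and a trailing cast is a variable. The real grammar may well resolve this differently, and the whitespace-split tokenizer and node names are invented for the example.

```python
def parse_put(source):
    """Parse `put <operand>` from whitespace-separated tokens."""
    tokens = source.split()
    if not tokens or tokens[0] != "put":
        raise SyntaxError("expected a put statement")
    node, rest = parse_operand(tokens[1:])
    if rest:
        raise SyntaxError("trailing tokens after statement")
    return ("put", node)

def parse_operand(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head.isdigit():
        return ("int", int(head)), rest
    if head == "cast" and rest:
        # `cast` followed by another operand: a cast member reference.
        member, rest = parse_operand(rest)
        return ("castref", member), rest
    # Otherwise it is a plain variable, even if it is named `cast`.
    return ("var", head), rest

def evaluate(node, variables):
    kind = node[0]
    if kind == "int":
        return node[1]
    if kind == "var":
        return variables[node[1]]
    if kind == "castref":
        return f"(cast {evaluate(node[1], variables)})"
    if kind == "put":
        return evaluate(node[1], variables)
    raise ValueError(f"unknown node {kind!r}")

tree = parse_put("put cast cast")
print(tree)                         # ('put', ('castref', ('var', 'cast')))
print(evaluate(tree, {"cast": 1}))  # (cast 1)
```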
All in all, the grammar is now truer to the original, and we’re down to 6 conflicts from 441!