.TD4 File RLE Compression

For all you RCT hackers out there - enjoy!

Here's the lowdown on the RLE compression used for some RCT files.
This ties in with the checksum calculation stuff I posted last week.
The files that are in RLE-compressed format are the .SV4, .SC4, .TD4,
and the SC.IDX files. Before doing anything to any of these files you
first have to un-rle it. Then you can hack around with it. When done,
you must rle-compress it and tack on the 4-byte checksum value.
Without the correct checksum, RCT will not recognize any of the
files regardless of the file extension. One of the first things you will
notice is that the un-rle'ed files have common sizes. To wit:

All .TD4 files decompress to 8,058 bytes.
All .SC4 files decompress to 2,056,676 bytes.
All .SV4 files decompress to 2,056,676 bytes.
The SC.IDX file decompresses to 14,864 bytes.

The identical decompressed sizes for the scenario and saved game files
lead me to conclude their formats are very similar, if not identical.

When you rle-compress a file, the resulting file size is dependent on
the data that was compressed, obviously. Thus, a 1-byte change in
a decompressed file may lead to the compressed version having a
different file size from the original file. Not to worry. Once you can
compress and uncompress the files, these file sizes are no longer
an issue. The uncompressed forms are ALWAYS the above sizes.
This is a "good thing".
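
Because the decompressed sizes are fixed, they make a handy sanity check after un-rle'ing a file. Here is a minimal sketch in C. The function name rct_expected_size, the struct, and the file type strings are made up for illustration (they are not anything RCT defines), and the lookup is case-sensitive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decompressed sizes as quoted in the text. Names here are illustrative. */
struct rct_size_entry {
    const char *kind;   /* file type tag (hypothetical spelling) */
    size_t      size;   /* decompressed size in bytes */
};

static const struct rct_size_entry rct_sizes[] = {
    { "TD4",    8058u    },
    { "SC4",    2056676u },
    { "SV4",    2056676u },
    { "SC.IDX", 14864u   },
};

/* Returns the expected decompressed size for a file type, or 0 if unknown. */
size_t rct_expected_size(const char *kind)
{
    size_t i;
    assert(kind != NULL);
    for (i = 0; i < sizeof rct_sizes / sizeof rct_sizes[0]; i++) {
        if (strcmp(rct_sizes[i].kind, kind) == 0)
            return rct_sizes[i].size;
    }
    return 0;
}
```

If a decompressor produces a different byte count than the one returned here, something went wrong before the data was ever edited.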

The procedure to use is (1) uncompress a file, (2) edit it, (3) compress
the file and (4) add the 4-byte checksum onto the end. When uncompressing,
you uncompress only (file size - 4) bytes. Discard the checksum at the end
of the file. You will need to recompute it and add it back on in step (4),
however.
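
Step (1) starts by separating the rle data from the trailing 4 checksum bytes. A minimal sketch in C, assuming the whole file has already been read into memory. The function name and parameters are hypothetical, and the checksum bytes are copied out raw because this text does not describe how to interpret or compute them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Splits a file image into the rle payload and the trailing 4 checksum bytes.
 * On success stores the payload length (file_len - 4) in *payload_len, copies
 * the last 4 bytes unchanged into checksum, and returns 0.
 * Returns -1 if the file is too short to hold a checksum at all.
 */
int rct_split_checksum(const unsigned char *file, size_t file_len,
                       size_t *payload_len, unsigned char checksum[4])
{
    assert(file != NULL && payload_len != NULL && checksum != NULL);
    if (file_len < 4)
        return -1;
    *payload_len = file_len - 4;
    memcpy(checksum, file + *payload_len, 4);
    return 0;
}
```

The payload is simply the first *payload_len bytes of the same buffer, so no copy of the rle data is needed.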

Now for the rle stuff. RLE stands for Run-Length Encoding. I'm not going to
elaborate on this a lot, other than to say it's a technique that's been around
for a long time. Also, I'm only going to describe how to un-rle (decompress)
a file. To compress it back to usable form (minus the checksum, of course),
you have to reverse the process. Luke Harless wrote a program that scans
your TD4 files and generates some numbers for each of the rides. He used
Visual Basic to do it. I'm not familiar with VB, so I'm not sure how to implement
this stuff that way, but you'll figure it out. My description is in language-neutral
form (I use C). Here's the narrative on decompressing an rle'ed file:

Read in your TD4/SC4/SV4/sc.idx file (minus the last 4 bytes). Start at the
beginning of the buffer - byte 0 - a very good place to start. We're basically
going to process the data in chunks. When all your chunks are processed,
your input buffer should be empty. Let's use "Shuttle Loop.TD4" for this
example.

At the beginning of every chunk is a size byte. The first byte we see is x'02'.
Treat it as a signed 8-bit value: it's positive (bytes 00-7F are positive while
80-FF are negative).

If the size byte is positive, add 1 to it. Here we get 3. So copy the next 3 bytes
in the buffer to your output exactly as they appear. In this case, that would be
'0F 00 80'.

The next byte we see in the buffer is x'FE'. This is another size byte. It's negative.
When you see a negative size byte, negate it and add 1 to it. Here, we get
(-1 * -2) + 1 = 3. If this were a positive size byte, we would copy the next 3 bytes
to the output. But it's negative and that means THE NEXT BYTE is to be repeated
in the output that many times: 3 in this case.

So now, our output so far is "0F 00 80 00 00 00".
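
The size byte rule can be sketched as one small C function. The name rct_chunk_count and the is_run flag are hypothetical. To avoid depending on how the compiler converts to a signed char, the sketch works on the unsigned value: a byte b in the range 80-FF stands for b - 256, so negating and adding 1 gives 257 - b.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Interprets one size byte.
 * Sets *is_run to 0 for a "copy the next N bytes" chunk, or to 1 for a
 * "repeat the next byte N times" chunk, and returns N.
 */
size_t rct_chunk_count(unsigned char size_byte, int *is_run)
{
    assert(is_run != NULL);
    if (size_byte & 0x80) {
        /* Negative as a signed byte: value is size_byte - 256. */
        *is_run = 1;
        return (size_t)(257 - (int)size_byte);   /* -(value) + 1 */
    }
    *is_run = 0;
    return (size_t)size_byte + 1u;               /* value + 1 */
}
```

For the two size bytes in the example, x'02' gives a copy of 3 bytes and x'FE' gives a repeat of 3.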

Lather - Rinse - Repeat

That's about it. Keep processing the chunks in the input buffer until you run out of
data. Start with the size byte. If positive, copy (n+1) following bytes exactly as
they appear to output. If negative, repeat the following byte in the output (n*-1)+1
times. When you're done, the size of the resulting output will be one of the numbers I gave
above, depending on the type of file.
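
The whole decompression loop might look like the following in C. This is a sketch under a few assumptions: the input buffer already has the 4 checksum bytes removed, the caller supplies an output buffer big enough for the expected size, and the function name and error convention (returning (size_t)-1) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Un-rle's in[0..in_len) into out[0..out_cap).
 * Returns the number of bytes written, or (size_t)-1 if the input is
 * truncated or the output buffer is too small.
 */
size_t rct_rle_decode(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    size_t ip = 0, op = 0;
    assert(in != NULL && out != NULL);
    while (ip < in_len) {
        unsigned char size_byte = in[ip++];
        if (size_byte & 0x80) {
            /* Negative size byte: repeat the next byte (-n + 1) times. */
            size_t count = (size_t)(257 - (int)size_byte);
            if (ip >= in_len || count > out_cap - op)
                return (size_t)-1;
            memset(out + op, in[ip++], count);
            op += count;
        } else {
            /* Positive size byte: copy the next (n + 1) bytes as they are. */
            size_t count = (size_t)size_byte + 1u;
            if (count > in_len - ip || count > out_cap - op)
                return (size_t)-1;
            memcpy(out + op, in + ip, count);
            ip += count;
            op += count;
        }
    }
    return op;
}
```

Comparing the return value against the known decompressed size for the file type is a cheap way to catch a bad read.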

Rle compression is the reverse. It's a bit more complicated to implement as you
have to scan forward in your data to see if bytes repeat, and if so how many. If they
don't repeat, how many unique values are there before there's a repeating value?
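
Those two questions ("how many times does this byte repeat?" and "how many bytes before a repeat starts?") can each be written as a small scanning helper. The sketch below is one way to do it in C. The function names and the max parameter (a cap on how far to scan) are assumptions for illustration, not part of any RCT code.

```c
#include <assert.h>
#include <stddef.h>

/* How many bytes at the start of p[0..len) equal p[0], scanning at most max. */
size_t rct_count_run(const unsigned char *p, size_t len, size_t max)
{
    size_t n = 1;
    assert(p != NULL || len == 0);
    if (len == 0 || max == 0)
        return 0;
    while (n < len && n < max && p[n] == p[0])
        n++;
    return n;
}

/*
 * How many bytes at the start of p[0..len) come before two equal adjacent
 * bytes begin, scanning at most max. Returns 0 if p starts with a repeat.
 */
size_t rct_count_literals(const unsigned char *p, size_t len, size_t max)
{
    size_t n = 0;
    assert(p != NULL || len == 0);
    while (n < len && n < max) {
        if (n + 1 < len && p[n] == p[n + 1])
            break;              /* a repeating value starts here */
        n++;
    }
    return n;
}
```

A compressor can call the first helper at the current position and fall back to the second whenever the run it finds is only 1 byte long.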

It is possible to cheat somewhat on compression. For instance, when decompressing,
"FE 00" goes to "00 00 00". The same result could be had with "02 00 00 00". Both
decompress to the same thing: 3 bytes of 00. So you could compress this chunk as
"FE 00" or "02 00 00 00". The net result is the same. Of course, you defeat the purpose
of compression here as your "compressed" output is 4 bytes instead of 2 bytes
for the 3-byte input chunk.
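
Taken to the extreme, the cheat gives a "compressor" that never looks for repeats at all and emits everything as copy chunks. A minimal sketch in C follows. The names are hypothetical, and the chunk size is capped at 125 bytes to stay within the buffer limit this text describes further on. The output is always bigger than the input, so this is only useful for getting an edited file back into a loadable form quickly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RCT_MAX_CHUNK 125u   /* per-chunk limit taken from the text */

/*
 * Encodes in[0..in_len) using only positive size bytes (copy chunks).
 * Returns the number of bytes written, or (size_t)-1 if out is too small.
 */
size_t rct_rle_encode_literal(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_cap)
{
    size_t ip = 0, op = 0;
    assert(in != NULL && out != NULL);
    while (ip < in_len) {
        size_t n = in_len - ip;
        if (n > RCT_MAX_CHUNK)
            n = RCT_MAX_CHUNK;
        if (out_cap - op < n + 1)
            return (size_t)-1;
        out[op++] = (unsigned char)(n - 1);   /* size byte: count minus 1 */
        memcpy(out + op, in + ip, n);
        op += n;
        ip += n;
    }
    return op;
}
```

Three bytes of x'00' come out as "02 00 00 00", matching the example above.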

There are a few limitations that become evident as you look at this scheme. The
largest positive size byte is 7F. So the largest chunk of literal output data (or conversely,
un-compressed input data) is x'7F' + 1 = 128 bytes. A 7F size byte says "Put the next
128 bytes exactly as they appear into the output stream". The most negative size byte
is x'80', which is -128. Negate it and add 1 and you get 129, so an 80 size byte says
"Put the next byte into the output stream 129 times".

A VERY IMPORTANT CAVEAT: The scheme used by RCT uses a 125-byte buffer for
compression. So when you un-compress, the maximum chunk of data you
get will be 125 bytes per size byte. And if you are compressing a file and you see, for
example, 1,000 bytes of x'00', you must compress ONLY 125 bytes at a time and then
begin a new chunk. The same goes for 1,000 bytes of unique non-repeating
byte values: cut off each input chunk at 125 bytes and set the size byte to x'7C'
(x'7C' + 1 = 125). This is why you see a proliferation of "84 00" bytes in the compressed
files. x'84' is -124, so each of these describes a sequence of 124 + 1 = 125 x'00's in the
raw data. A bunch of "84 00"s in a row simply means there were A LOT of x'00's in a row
being compressed.

End of RLE discussion. Remember if you are compressing, you have to calculate the
checksum on all the bytes you've output and tack it on to your output.
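
Pulling the compression rules together, including the 125-byte cap, a compressor could be sketched in C like this. It is one possible implementation under stated assumptions, not a reconstruction of what RCT itself does: the function name and error convention are invented, any run of 2 or more equal bytes becomes a repeat chunk, and everything else is gathered into copy chunks. The checksum is not computed here, since its calculation is not covered in this text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RCT_MAX_CHUNK 125u   /* per-chunk limit taken from the text */

/*
 * Rle-compresses in[0..in_len) into out[0..out_cap).
 * Returns the number of bytes written, or (size_t)-1 if out is too small.
 */
size_t rct_rle_encode(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    size_t ip = 0, op = 0;
    assert(in != NULL && out != NULL);
    while (ip < in_len) {
        /* Measure the run of identical bytes at ip, capped at 125. */
        size_t run = 1;
        while (run < RCT_MAX_CHUNK && ip + run < in_len &&
               in[ip + run] == in[ip])
            run++;
        if (run >= 2) {
            if (out_cap - op < 2)
                return (size_t)-1;
            out[op++] = (unsigned char)(257u - run);  /* negative size byte */
            out[op++] = in[ip];
            ip += run;
        } else {
            /* Gather bytes until a repeat starts, the cap is hit, or input ends. */
            size_t lit = 1;
            while (lit < RCT_MAX_CHUNK && ip + lit < in_len &&
                   !(ip + lit + 1 < in_len &&
                     in[ip + lit] == in[ip + lit + 1]))
                lit++;
            if (out_cap - op < lit + 1)
                return (size_t)-1;
            out[op++] = (unsigned char)(lit - 1);     /* positive size byte */
            memcpy(out + op, in + ip, lit);
            op += lit;
            ip += lit;
        }
    }
    return op;
}
```

With this sketch, 1,000 bytes of x'00' compress to eight "84 00" pairs. An output buffer of 2 * in_len bytes is always large enough, since the worst case adds one size byte per input byte.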

This business of rle compression is critical. At the stage where the data is compressed
or uncompressed we are totally unconcerned with what the bytes mean. It's merely a way
of storing data in files more efficiently.

Have fun!

DE

Comments and suggestions can be sent here