RUN- LENGTH ENCODING FOR IMAGE COMPRESSION.
Existing image compression formats, (such as the .gif and .jpg standards) provide perfectly adequate performance in most circumstances. However, you may want to experiment with image compression without delving into Huffman coding, or detailed Transform processing. The following run- length encoding algorithm will provide good results with images having a large uniform backround (some prior vector quantization may be useful. So is a program that removes an isolated pixel surrounded by pixels of the same colour- it might have been noise, anyway. Dithering will restore some of the original texture in the decompressed image.)
- Find a value within the colour range of the image which is not being used: With images having 256 (or more) colour levels, it will be a surprise if you cannot find one; Occasionally, a free slot will not be availiable with 16- tone images: Then, the least frequently occuring colour can merge with either the closest one, or the colour of the second lowest occurence- in that case the decompressed image will not be a lossless replica of the unprocessed image, but then a 16- tone image would not have been an accurate copy of the original scene in the first place. The free value then becomes your key to indicate that the colour value immediately preceeding it is being repeated the number of times shown in the value immediately following it.
- Starting at the lower left- hand corner, the image is scanned from left to right, and then the program moves to the line directly above.
- Read the bottom left pixel, and output this to a file. If any of the two subsequent pixels has a different value, move to the next pixel, otherwise output the key and the length of the uniform run, as well. The reason for not encoding uniform runs of length less than three, is to guarantee that the intended compressed file is never longer than the original. Example: The sequence 2,3,3,4,4,4,4,4,1 is output as 2,3,3,4,14,5,1, the key being '14'.
- If the run length is longer than the largest number in the wordsize availiable, output the key again, followed by the remaining length of the run. The use of a key could as much as halve the compressed file length, because isolated pixels do not have to be encoded as uniform sequences of length '1'.
- For (2- tone) black- and- white images, encode the value of the bottom left pixel, followed by the length of each run. A run- length of zero implies that the run- length immediately preceeding cannot be accommodated within the availiable wordlength (ie, it is longer than 255.)
- Sometimes, it is preferable to scan the image column by column, instead of row by row (eg, in an image containing strips): If the option is made availiable, a flag must be included to indicate when it is active.
- Performance is not the same at any magnification of the original image: Encoding works best, when the length of uniform sequences is much larger than three. If the savings are meagre, the program could decline to output a compressed file.
- An obvious improvement (well- known in text compression), is to encode any sequence which is repeated one or more times, rather than just uniform sequences: This time, the key is followed by the serial number of the run, its length and the sequence itself. A sequence length of zero indicates that a previously encoded sequence is being repeated: Sequences shorter than 4 are not worth encoding. If the method is used in conjunction with uniform sequence encoding, then a serial number of zero signifies a uniform sequence.
Example:
A convenient picture to compress: A binary tree. Saved as a 16- tone bitmap, the file length is about 6500 bytes, inclusive of size and resolution information, but exclusive of palette and other information required by the bitmap header.
Compressed version at 777 bytes, (to scale), inclusive of size and resolution information, but exclusive of palette information. The reconstruction is, in this case, exact. Note: The gif standard is used for transmission of all images in this website, so the file lengths are different from the values quoted above, due to additional information required by the gif header.