18.424 Homework 3 Question 5

The Burrows-Wheeler transform has been applied to create the string LHEESSEE_*PYP, where _ is a space and * is an End-Of-Data marker. The ordering of the symbols is ABC...Z_*.


5a. Invert the Burrows-Wheeler transform to obtain the original message.


5b. Turn this message into a string of integers using move-to-front encoding. (We are missing the last step in this compression, which might be Huffman coding. You don't have to do it.)


5c. Suppose that the Burrows-Wheeler transform is applied without an End-Of-Data marker. Without any extra information, the message is only determined up to cyclic shifts. Generally, if the message has n symbols, a pointer which can take on one of n values would be needed to disambiguate the message. This adds log2 n bits to the message. Suppose that the BWT is followed by move-to-front encoding and then a Huffman code. In general, will using an End-Of-Data marker be more, less or approximally equally efficient as using a pointer? Why?