
The Lempel-Ziv, or Ziv-Lempel, or even LZ compression

Mon 14 October 2019

Between 1976 and 1978, Abraham Lempel and Jacob Ziv created a procedure, and several variants of it, aimed at compressing data. The later variants turned out to be very effective at compressing many data types, such as text and images. Two facts contributed to the popularity of the LZ compression methods: they require no prior information about the statistics of the data (a constraint imposed by statistical compression methods), and the algorithm is simple but powerful.

Since their creation, the LZ algorithms have been widely used, but there is often confusion about which algorithm corresponds to which variant. Furthermore, the growing amount of data being produced has pushed the original LZ algorithms to be augmented with new data structures so that they remain feasible for analyzing big datasets, especially big texts such as the human genome, which has more than 3 billion ($3 \times 10^9$) base pairs.

In this post, I plan to cover the whys and whichs of the Lempel-Ziv algorithms called LZ76, LZ77 and LZ78.

algorithms data structures strings genomics