lzop vs compress vs gzip vs bzip2 vs lzma vs lzma2/xz benchmark, reloaded

I received a couple of interesting comments on my last attempt to benchmark these algorithms. So here is a more complete benchmark, with hopefully more detailed results.

Benchmark protocol

We are benchmarking all the algorithms supported by recent tar versions (1.22 was used):

| program | extension | version | description | supported compression levels |
|---|---|---|---|---|
| lzop | .lzo | 1.02rc1 | known to be very fast | 1 to 9, but 2 to 6 are equivalent, 3 by default |
| compress | .Z | 4.2.4 | the legacy UNIX compression algorithm | not configurable |
| gzip | .gz (.tgz) | 1.3.12 | replaced compress in recent UNIX-like OSes | 1 to 9, 6 by default |
| bzip2 | .bz2 (.tbz, .tbz2) | 1.0.5 | known to have a better compression ratio than gzip, but much slower | 1 to 9, 9 by default |
| lzma | .lzma | 4.999.9beta | new algorithm aiming at high compression ratios | 0 to 9, 6 by default |
| lzma2 | .xz (.txz) | 4.999.9beta | xz is a compression format that uses the lzma2 algorithm by default; it adds some features over lzma, such as integrity checking | 0 to 9, 6 by default |

Benchmark protocol at a glance:

  • I used the contents of the Linux 2.4.0 kernel archive as the data to compress. The uncompressed version takes 100 132 718 bytes of disk space (or 95.5 MiB).
  • Each algorithm has been tested with all supported compression levels
  • The resulting archive size has of course been measured
  • Compression and decompression tests have been run 3 times per algorithm per compression level
  • RAM usage has been measured during both compression and decompression
  • The time elapsed during compression and decompression has been measured
  • All those tests have been done in /dev/shm (i.e. in memory) to avoid disk I/O overhead
  • I tried to use the multithreading features of LZMA/LZMA2, but they are not implemented yet, as stated in the man page and confirmed by my own tests
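As a quick sanity check of the size quoted in the protocol, the byte count can be converted to binary megabytes (1 MiB = 1 048 576 bytes). This is only a minimal sketch: the helper name `bytes_to_mib` is mine and is not part of the benchmark script.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original benchmark script):
# convert a byte count to MiB, rounded to two decimals.
bytes_to_mib() {
  # LC_ALL=C keeps the decimal separator a dot whatever the locale is
  LC_ALL=C awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / 1048576 }'
}

# Size of the uncompressed kernel tree, as given in the protocol
bytes_to_mib 100132718   # prints 95.49
```

The same helper can be applied to the `size:` values printed by the benchmark script, which `stat -c '%s'` reports in bytes.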

For reference, the following script has been used to automate the benchmark:

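#!/bin/bash
# Number of runs per algorithm and per compression level
NBLOOP=3
# Directory to compress (the extracted Linux 2.4.0 kernel tree)
COMPRESS_OBJECT=linux-2.4.0

# memstats <process name>
# Poll the resident set size (RSS, as reported by ps) of the named process
# once per second until it exits, and print the last value sampled.
memstats()
{
  (
  renice 19 $$ >/dev/null 2>&1
  while : ; do
    ps --no-headers -o rss -C "$1" || break
    sleep 1
  done | tail -n 1
  )
}

# bench <tar compression option> <archive extension> [level]
# Run NBLOOP compression/decompression rounds, reporting time, memory and size.
bench()
{
  for i in $(seq 1 $NBLOOP) ; do
    trap "rm -f out.$2" EXIT
    # tar runs in the background so that memstats can watch the compressor
    /usr/bin/time -f "DONE: comp $1-$3 ($i) time: %e" tar cf "out.$2" "$COMPRESS_OBJECT" "--$1" 2>&1 >/dev/null & sleep 1
    mem=$(memstats "$1")
    wait  # let tar finish writing the archive before measuring its size
    size=$(stat -c '%s' "out.$2")
    echo "... mem: $mem size: $size"
    echo
    mkdir tmp_extract_$$ || exit 1
    trap "rm -f out.$2 ; rm -Rf tmp_extract_$$" EXIT
    /usr/bin/time -f "DONE: decomp $1-$3 ($i) time: %e" tar xf "out.$2" -C tmp_extract_$$ 2>&1 >/dev/null & sleep 1
    mem=$(memstats "$1")
    wait  # let the extraction finish before cleaning up
    echo "... mem: $mem"
    echo
    rm -f "out.$2"
    rm -Rf tmp_extract_$$
    trap - EXIT
  done
}

# compress has no compression levels, so it is benchmarked only once
for level in none ; do
  echo "=== COMPRESS ==="
  bench compress Z
done
# Each compressor reads its extra options from an environment variable
# lzop levels 2 to 6 are equivalent, so only 3 is tested among them
for level in 1 3 7 8 9 ; do
  echo "=== LZOP -$level ==="
  export LZOP="-$level"
  bench lzop lzo $level
done
for level in 1 2 3 4 5 6 7 8 9 ; do
  echo "=== GZIP -$level ==="
  export GZIP="-$level"
  bench gzip gz $level
done
for level in 1 2 3 4 5 6 7 8 9 ; do
  echo "=== BZIP2 -$level ==="
  export BZIP2="-$level"
  bench bzip2 bz2 $level
done
# The lzma program benchmarked here ships with xz, hence XZ_OPT
for level in 0 1 2 3 4 5 6 7 8 9 ; do
  echo "=== LZMA -$level ==="
  export XZ_OPT="-$level"
  bench lzma lzma $level
done
for level in 0 1 2 3 4 5 6 7 8 9 ; do
  echo "=== XZ (LZMA2) -$level ==="
  export XZ_OPT="-$level"
  bench xz xz $level
done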
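
The script prints one `DONE:` line per run, and each test is run 3 times. The post does not say how the three timings were combined into the single figure shown in the results, so the sketch below simply assumes an arithmetic mean. The function name `average_times` and the idea of saving the script output to a log file are my own additions.

```shell
#!/bin/sh
# Hypothetical post-processing helper (an assumption, not the author's method):
# read the benchmark output on stdin and print the mean time for each
# "comp <algo>-<level>" / "decomp <algo>-<level>" label.
average_times() {
  LC_ALL=C awk '
    $1 == "DONE:" {
      key = $2 " " $3                       # e.g. "comp gzip-6"
      if (!(key in sum)) order[++n] = key   # remember first-seen order
      sum[key] += $NF                       # last field is the elapsed time
      count[key]++
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s %.2f\n", order[i], sum[order[i]] / count[order[i]]
    }'
}
```

Assuming the output was saved to a file named `bench.log`, running `average_times < bench.log` would print one averaged line per algorithm, level and direction. Lines that do not start with `DONE:` (the memory and size lines) are ignored.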

Benchmark results

Here are the raw (and somewhat unreadable) results.

| algo | size (MiB) | compression time (s) | compression mem (KiB) | decompression time (s) | decompression mem (KiB) |
|---|---|---|---|---|---|
| compress | 39.56 | 2.64 | 1 124 | 1.60 | 548 |
| lzop-1 | 36.17 | 1.04 | 1 004 | 0.63 | ? |
| lzop-3 | 36.38 | 1.11 | 940 | 0.65 | ? |
| lzop-7 | 27.07 | 13.15 | 1 312 | 0.70 | ? |
| lzop-8 | 26.74 | 27.67 | 1 308 | 0.65 | ? |
| lzop-9 | 26.73 | 33.3 | 1 308 | 0.60 | ? |
| gzip-1 | 28.72 | 2.74 | 708 | 1.42 | 486 |
| gzip-2 | 27.44 | 2.90 | 708 | 1.42 | 486 |
| gzip-3 | 26.50 | 3.22 | 708 | 1.40 | 484 |
| gzip-4 | 24.77 | 3.56 | 708 | 1.33 | 486 |
| gzip-5 | 23.82 | 4.43 | 718 | 1.27 | 500 |
| gzip-6 | 23.43 | 5.78 | 716 | 1.29 | 488 |
| gzip-7 | 23.33 | 6.74 | 700 | 1.25 | 488 |
| gzip-8 | 23.25 | 9.82 | 692 | 1.27 | 488 |
| gzip-9 | 23.23 | 13.2 | 694 | 1.25 | 486 |
| bzip2-1 | 21.81 | 17.5 | 1 554 | 4.62 | 898 |
| bzip2-2 | 20.59 | 17.6 | 2 336 | 4.48 | 1 288 |
| bzip2-3 | 20.02 | 17.8 | 3 120 | 4.43 | 1 700 |
| bzip2-4 | 19.66 | 18.5 | 3 900 | 4.49 | 3 900 |
| bzip2-5 | 19.42 | 20.0 | 4 688 | 4.56 | 2 468 |
| bzip2-6 | 19.25 | 20.6 | 5 468 | 4.76 | 2 878 |
| bzip2-7 | 19.07 | 21.9 | 6 256 | 5.07 | 3 250 |
| bzip2-8 | 18.94 | 22.5 | 7 040 | 5.08 | 3 644 |
| bzip2-9 | 18.89 | 22.6 | 7 820 | 5.38 | 4 040 |
| lzma-0 | 23.16 | 10.3 | 1 980 | 3.42 | 840 |
| lzma-1 | 21.94 | 13.1 | 2 000 | 3.34 | 824 |
| lzma-2 | 20.08 | 13.1 | 5 476 | 3.11 | 1 272 |
| lzma-3 | 17.24 | 60.3 | 13 600 | 2.44 | 1 788 |
| lzma-4 | 16.64 | 66.8 | 25 376 | 2.40 | 2 814 |
| lzma-5 | 16.21 | 69.2 | 48 926 | 2.28 | 4 858 |
| lzma-6 | 15.62 | 90.5 | 96 030 | 2.21 | 8 952 |
| lzma-7 | 15.36 | 97.6 | 190 260 | 2.24 | 17 146 |
| lzma-8 | 15.17 | 106 | 378 688 | 2.25 | 33 536 |
| lzma-9 | 15.04 | 113 | 689 956 | 2.23 | 66 304 |
| xz-0 | 23.16 | 10.7 | 2 088 | 3.63 | 864 |
| xz-1 | 21.95 | 11.5 | 2 066 | 3.31 | 875 |
| xz-2 | 20.08 | 13.2 | 5 556 | 2.96 | 1 300 |
| xz-3 | 17.25 | 63.0 | 13 684 | 2.70 | 1 830 |
| xz-4 | 16.64 | 65.6 | 25 450 | 2.60 | 2 836 |
| xz-5 | 16.21 | 70.0 | 49 012 | 2.48 | 4 886 |
| xz-6 | 15.62 | 90.5 | 96 112 | 2.50 | 9 000 |
| xz-7 | 15.36 | 97.4 | 190 324 | 2.40 | 17 196 |
| xz-8 | 15.17 | 110 | 378 740 | 2.44 | 35 556 |
| xz-9 | 15.05 | 117 | 690 060 | 2.46 | 66 326 |

Results analysis

The outsiders

The compress algorithm is completely awful: it has the worst compression ratio of all. gzip at its lowest level produces a much smaller archive (28.72 vs 39.56 in the size column) in roughly the same time, while using less RAM. There’s not much more to say: forget this one.

The lzop algorithm is indeed very fast: it can compress the whole kernel tree in about one second. Level 3 (the default) is really weird: it has a lower compression ratio and a lower compression speed than level 1, so it has no advantage over level 1. Levels 7, 8 and 9 are useless: compression takes 13 to 33 seconds, and the result is still only on par with gzip -3, which needs about 3 seconds. So the only interesting level of lzop seems to be 1. Take it if you need blazing speed at the cost of a terrible compression ratio compared to the other algorithms (you’ll also get low RAM usage at no additional cost).
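
To put "about one second" in perspective, the timings can be turned into a throughput figure by dividing the uncompressed size by the elapsed time. A rough sketch, assuming the 95.5 figure from the protocol is the input size; the function name `throughput` is mine.

```shell
#!/bin/sh
# Hypothetical helper: throughput = input size / elapsed seconds,
# printed with one decimal.
throughput() {
  LC_ALL=C awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

# lzop-1: 95.5 of input compressed in 1.04 s
throughput 95.5 1.04   # prints 91.8
```

Feeding it the other compression times from the results table gives a like-for-like speed comparison between algorithms.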

Difference between XZ and LZMA2

Short answer: xz is a format that (currently) only uses the lzma2 compression algorithm.

Long answer: think of xz as a container for the compressed data generated by the lzma2 algorithm. The same paradigm applies to video files, for example: avi/mkv/mov/mp4/ogv are containers, and xvid/x264/theora are compression algorithms. The confusion arises because the xz format currently only supports the lzma2 algorithm (and it will remain the default, even if other algorithms are added some day). This confusion doesn’t happen with other formats/algorithms: gzip, for example, is both a compression algorithm and a format. To be exact, the gzip format can only encapsulate data generated by gzip… the compression algorithm. In this article I’ll use “xz” to mean “the lzma2 algorithm whose data is encapsulated in the xz format”. You’ll probably agree it’s way simpler :)
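
One visible consequence of xz being a container is that every .xz file starts with a fixed 6-byte signature (FD 37 7A 58 5A 00), whatever was compressed inside. The sketch below checks for that signature using only `od`; the function name `is_xz_file` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the .xz container
# signature, i.e. the bytes FD 37 7A 58 5A 00.
is_xz_file() {
  # Dump the first 6 bytes as hex and strip spaces and newlines
  magic=$(od -An -tx1 -N 6 "$1" | tr -d ' \n')
  [ "$magic" = "fd377a585a00" ]
}

# Usage:
#   is_xz_file out.xz && echo "xz container"
```

This only identifies the container; it says nothing about which preset level was used to produce the data inside.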

Performance of LZMA vs LZMA2 (XZ)

The performance of lzma and xz is extremely close. Lzma2 doesn’t outperform lzma (“lzma1”), as one might have expected: there’s no real difference between lzma and lzma2 in terms of compression ratio, compression/decompression speed, or RAM usage. This is because lzma2 has just a few modifications over lzma1, and most of them don’t concern the compression algorithm itself: they fix some practical issues lzma1 had (according to the xz man page). The “.lzma” format will most likely disappear in the near future in favor of the “.xz” format (which is already widely preferred over “.lzma”). And if you have read the above paragraph, yes, lzma1 was both a compression algorithm and a (messy) format. :)

Results ordered by compression ratio

In the following table, I’ve removed lzma for brevity’s sake (if you’ve read the above paragraph, you know why).

The lines in grey mean that the current algorithm+level is suboptimal: it has a lower compression ratio and a higher compression time than the algorithm+level of the row immediately above. In short: these are combinations you shouldn’t use.
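
The rule for suboptimal rows can be applied mechanically. Here is a minimal sketch that takes the rule literally, comparing each row only with the row immediately above it. The input layout (name, size, compression time, one row per line, sorted by size) and the function name `flag_suboptimal` are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: read "algo size ctime" rows sorted by increasing size
# and print the rows that are both larger (or equal) and slower to compress
# than the row immediately above.
flag_suboptimal() {
  LC_ALL=C awk '
    NR > 1 && ($2 + 0) >= prev_size && ($3 + 0) > prev_time { print $1 }
    { prev_size = $2 + 0; prev_time = $3 + 0 }'
}

# Usage with four consecutive rows of the table:
#   printf '%s\n' 'bzip2-3 20.02 17.8' 'xz-2 20.08 13.2' \
#                 'bzip2-2 20.59 17.6' 'bzip2-1 21.81 17.5' | flag_suboptimal
```

Note that a row can also be beaten by a row further up rather than by its direct neighbour; this literal reading of the rule does not catch that case.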

Pairs of numbers in orange are separated by a big gap; they are highlighted to ease readability and to pinpoint the major jumps in magnitude.

| algo | size (MiB) | ctime (s) | cmem (KiB) | dtime (s) | dmem (KiB) |
|---|---|---|---|---|---|
| xz-9 | 15.05 | 117 | 690 060 | 2.46 | 66 326 |
| xz-8 | 15.17 | 110 | 378 740 | 2.44 | 35 556 |
| xz-7 | 15.36 | 97.4 | 190 324 | 2.40 | 17 196 |
| xz-6 | 15.62 | 90.5 | 96 112 | 2.50 | 9 000 |
| xz-5 | 16.21 | 70.0 | 49 012 | 2.48 | 4 886 |
| xz-4 | 16.64 | 65.6 | 25 450 | 2.60 | 2 836 |
| xz-3 | 17.25 | 63.0 | 13 684 | 2.70 | 1 830 |
| bzip2-9 | 18.89 | 22.6 | 7 820 | 5.38 | 4 040 |
| bzip2-8 | 18.94 | 22.5 | 7 040 | 5.08 | 3 644 |
| bzip2-7 | 19.07 | 21.9 | 6 256 | 5.07 | 3 250 |
| bzip2-6 | 19.25 | 20.6 | 5 468 | 4.76 | 2 878 |
| bzip2-5 | 19.42 | 20.0 | 4 688 | 4.56 | 2 468 |
| bzip2-4 | 19.66 | 18.5 | 3 900 | 4.49 | 3 900 |
| bzip2-3 | 20.02 | 17.8 | 3 120 | 4.43 | 1 700 |
| xz-2 | 20.08 | 13.2 | 5 556 | 2.96 | 1 300 |
| bzip2-2 | 20.59 | 17.6 | 2 336 | 4.48 | 1 288 |
| bzip2-1 | 21.81 | 17.5 | 1 554 | 4.62 | 898 |
| xz-1 | 21.95 | 11.5 | 2 066 | 3.31 | 875 |
| xz-0 | 23.16 | 10.7 | 2 088 | 3.63 | 864 |
| gzip-9 | 23.23 | 13.2 | 694 | 1.25 | 486 |
| gzip-8 | 23.25 | 9.82 | 692 | 1.27 | 488 |
| gzip-7 | 23.33 | 6.74 | 700 | 1.25 | 488 |
| gzip-6 | 23.43 | 5.78 | 716 | 1.29 | 488 |
| gzip-5 | 23.82 | 4.43 | 718 | 1.27 | 500 |
| gzip-4 | 24.77 | 3.56 | 708 | 1.33 | 486 |
| gzip-3 | 26.50 | 3.22 | 708 | 1.40 | 484 |
| lzop-9 | 26.73 | 33.3 | 1 308 | 0.60 | ? |
| lzop-8 | 26.74 | 27.67 | 1 308 | 0.65 | ? |
| lzop-7 | 27.07 | 13.15 | 1 312 | 0.70 | ? |
| gzip-2 | 27.44 | 2.90 | 708 | 1.42 | 486 |
| gzip-1 | 28.72 | 2.74 | 708 | 1.42 | 486 |
| lzop-1 | 36.17 | 1.04 | 1 004 | 0.63 | ? |
| lzop-3 | 36.38 | 1.11 | 940 | 0.65 | ? |
| compress | 39.56 | 2.64 | 1 124 | 1.60 | 548 |

Some highlights

  • As we have already seen, lzop is the fastest algorithm. But even if you’re looking for speed, you may prefer gzip at its lowest compression levels: it’s also pretty fast, and achieves a far better compression ratio than lzop.
  • The highest level of gzip (9) and the lower levels of bzip2 (1, 2, 3) are outperformed by the lower levels of xz (0, 1, 2).
  • Level 0 of xz should probably be avoided: the man page somewhat discourages its use, because its meaning might change in a future version and select a non-lzma2 algorithm to try to achieve a higher compression speed.
  • The higher levels of xz (3 and above) should only be used if you want the best compression ratio and definitely don’t care about the enormous compression time and the gigantic amount of RAM used. Levels 7 to 9 are particularly insane in this regard, while offering only a ridiculously small gain in compression ratio over the mid-levels.
  • The bzip2 decompression time is particularly bad, whatever level is used. If you care about decompression time, avoid bzip2 entirely, and use gzip if you prefer speed or xz if you prefer compression ratio.
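
These highlights can be condensed into a small selection helper. This is only a sketch of one possible reading of the results: the goal names, the function name `pick_compressor` and the exact levels chosen (in particular xz -6 for the best ratio) are my own assumptions, not a recommendation taken verbatim from the benchmark.

```shell
#!/bin/sh
# Hypothetical helper: map a goal to a compressor command line,
# following one reading of the highlights above.
pick_compressor() {
  case "$1" in
    fastest) echo "lzop -1" ;;   # blazing speed, poor ratio
    speed)   echo "gzip -1" ;;   # still fast, much better ratio than lzop
    balance) echo "xz -2" ;;     # beats gzip -9 and the low bzip2 levels
    ratio)   echo "xz -6" ;;     # high ratio, slow and memory hungry
    *)       echo "unknown goal: $1" >&2; return 1 ;;
  esac
}

# Usage (the command substitution is left unquoted on purpose, so that
# "gzip -1" is split into a command and its option):
#   tar -cf - linux-2.4.0 | $(pick_compressor speed) > out.tar.gz
```

Piping tar into the compressor, rather than relying on tar's own compression options, makes the chosen level explicit on the command line.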
This post is licensed under CC BY 4.0 by the author.