====== compress vs gzip vs bzip2 vs lzma vs lzma2 aka xz benchmark ======

**Edit:** A more recent and more complete benchmark is available: [[lzop_vs_compress_vs_gzip_vs_bzip2_vs_lzma_vs_lzma2-xz_benchmark_reloaded]]

I discovered the <color red>''xz''</color> compression algorithm a few weeks ago. It's derived from LZMA, which is quite effective. Recent <color red>''tar''</color> versions even include support for it, and the official [[ftp://ftp.gnu.org/gnu/coreutils/|GNU FTP server]] now uses it for all its new program releases. I was wondering how well it performs against more commonly used algorithms such as <color red>''gzip''</color> and <color red>''bzip2''</color>. For completeness, I've also included the obsolete <color red>''compress''</color> algorithm. I also tried the plain, unmodified LZMA algorithm (as opposed to the LZMA2/xz algorithm), which is not directly supported by <color red>''tar''</color> and will probably disappear in favor of the <color red>''xz''</color> utils anyway.

For the benchmark, I used the Linux 2.6.32-rc1 kernel source files. Reads and writes are done entirely in memory, to avoid the overhead of disk I/O. Each test was run 5 times.
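To get a feel for the methodology (in-memory data, several runs per algorithm), here is a minimal sketch using the compressors shipped in Python's standard library. It is not the setup used for this benchmark, which ran the command-line tools through <color red>''tar''</color>; the ''bench'' helper, the ''CODECS'' table, the sample input and the choice to keep the best of the runs are all assumptions made for illustration. ''compress'' is left out because the standard library has no equivalent.

```python
import bz2
import gzip
import lzma
import time

# Standard-library codecs. The levels are chosen to mirror the usual
# command-line defaults (gzip -6, bzip2 -9, xz -6); this is an assumption,
# not the exact setup of the benchmark, which went through tar.
CODECS = {
    "gzip": lambda data: gzip.compress(data, compresslevel=6),
    "bzip2": lambda data: bz2.compress(data, compresslevel=9),
    "lzma": lambda data: lzma.compress(data, format=lzma.FORMAT_ALONE),
    "xz": lambda data: lzma.compress(data, format=lzma.FORMAT_XZ),
}


def bench(data, runs=5):
    """Compress `data` in memory `runs` times with each codec.

    Returns {codec name: (compressed size in bytes, best time in seconds)}.
    """
    results = {}
    for name, compress in CODECS.items():
        best = float("inf")
        size = 0
        for _ in range(runs):
            start = time.perf_counter()
            size = len(compress(data))
            best = min(best, time.perf_counter() - start)
        results[name] = (size, best)
    return results


if __name__ == "__main__":
    # Hypothetical sample input: repetitive text standing in for source code.
    sample = b"static int foo(struct bar *b) { return b->baz; }\n" * 2000
    for name, (size, seconds) in bench(sample).items():
        print(f"{name:6s} {size:8d} bytes  {seconds:.4f} s")
```

Keeping everything in a ''bytes'' object plays the same role as the in-memory reads and writes described above: the timings then reflect the compressor itself rather than the disk.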
Here are the results:

^ compression algorithm ^ command line ^ archive size ^ compression time (mm:ss) ^
| tar w/o compression | tar cf arc.tar linux-2.6.32-rc1 | 363 MB | 00:01.05 |
| tar + compress | tar cZf arc.tar.Z linux-2.6.32-rc1 | 144 MB | 00:10.08 |
| tar + gzip | tar czf arc.tar.gz linux-2.6.32-rc1 | 79 MB | 00:20.40 |
| tar + bz2 | tar cjf arc.tar.bz2 linux-2.6.32-rc1 | 62 MB | 01:11.50 |
| tar + lzma | tar cf arc.tar.lzma --use-compress-program=lzma linux-2.6.32-rc1 | 52 MB | 05:24.00 |
| tar + xz (lzma2) | tar cJf arc.tar.xz linux-2.6.32-rc1 | 52 MB | 05:24.00 |

Some nice graphs of these results, built with [[http://www.chartgo.com|this tool]]:

{{:blog:tarbench_size.png|}} {{:blog:tarbench_time.png|}}

As you can see, <color red>''lzma''</color> and <color red>''xz''</color> (lzma2) give the same results... I don't know what the real technical difference between the two is.

<color red>''xz''</color> does offer the best compression ratio, but it is only about 16% better than <color red>''bzip2''</color>. This comes at a price (and an expensive one): going by the table above, <color red>''xz''</color> is about 4.5 times slower than <color red>''bzip2''</color>, which is itself about 3.5 times slower than <color red>''gzip''</color>!

So, should you replace all your <color red>''gzip''</color>/<color red>''bzip2''</color> love with <color red>''xz''</color> love? Well, I'm not sure you should. The only reason I would use <color red>''xz''</color> is for big archives that will almost never be read again, such as CD backups. And only if you don't care at all about the time it takes to generate those archives: I personally use the excellent <color red>''duplicity''</color> for my server's daily backups, and I would never use <color red>''xz''</color> for this. I don't want my server to spend all its CPU time compressing stuff!
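The size and speed ratios discussed above can be rechecked directly from the results table. The following is a small sketch; the helper names (''to_seconds'', ''size_saving'', ''slowdown'') are mine and purely illustrative. With the table values as given, the size saving of <color red>''xz''</color> over <color red>''bzip2''</color> comes out at about 16%, and the time ratios at roughly 4.5 for <color red>''xz''</color> versus <color red>''bzip2''</color> and roughly 3.5 for <color red>''bzip2''</color> versus <color red>''gzip''</color>.

```python
def to_seconds(stamp):
    """Convert a 'mm:ss.xx' string from the table into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# (archive size in MB, compression time) copied from the results table.
RESULTS = {
    "none": (363, "00:01.05"),
    "compress": (144, "00:10.08"),
    "gzip": (79, "00:20.40"),
    "bzip2": (62, "01:11.50"),
    "lzma": (52, "05:24.00"),
    "xz": (52, "05:24.00"),
}


def size_saving(a, b):
    """Fraction by which archive `a` is smaller than archive `b`."""
    return 1 - RESULTS[a][0] / RESULTS[b][0]


def slowdown(a, b):
    """How many times longer `a` takes to compress than `b`."""
    return to_seconds(RESULTS[a][1]) / to_seconds(RESULTS[b][1])


if __name__ == "__main__":
    print(f"xz vs bzip2 size saving: {size_saving('xz', 'bzip2'):.1%}")
    print(f"xz vs bzip2 slowdown:    {slowdown('xz', 'bzip2'):.2f}x")
    print(f"bzip2 vs gzip slowdown:  {slowdown('bzip2', 'gzip'):.2f}x")
```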
I even switched from <color red>''bzip2''</color> to <color red>''gzip''</color>, as I have a lot of space on my backup backend and care more about fast backups that don't get in the way than about smaller ones.

<color red>''xz''</color> might also be a good choice if you have to transfer the same data to a lot of people and are a bit short on bandwidth... but again, only if the cost of the long compression time is lower than the cost of the extra data you would otherwise have to transfer. That might be why GNU's FTP server switched to this format!

~~META:date created = 2009-10-03 23:30:00~~
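The bandwidth trade-off can be put into rough numbers. The sketch below is a simplification under loud assumptions: it treats one second of extra compression time as costing the same as one second of transfer time, and the link speed of 1 MB/s is a made-up figure. The function name ''break_even_downloads'' is hypothetical; the 252.5 s and 10 MB inputs are the differences between the <color red>''xz''</color> and <color red>''bzip2''</color> rows of the results table.

```python
import math


def break_even_downloads(extra_compress_seconds, saved_mb, link_mb_per_s):
    """Number of downloads after which the smaller archive pays off.

    extra_compress_seconds: one-off extra time spent compressing.
    saved_mb: how much smaller the archive is, per download.
    link_mb_per_s: assumed transfer speed (hypothetical figure).
    """
    saved_seconds_per_download = saved_mb / link_mb_per_s
    return math.ceil(extra_compress_seconds / saved_seconds_per_download)


if __name__ == "__main__":
    # xz vs bzip2 from the table: 324.0 s - 71.5 s = 252.5 s extra,
    # 62 MB - 52 MB = 10 MB saved per download.
    print(break_even_downloads(252.5, 10, 1.0))
```

On a faster link the break-even point moves further out, since each download saves less time; with many downloaders, as on a public FTP server, it is reached quickly.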

 
blog/compress_vs_gzip_vs_bzip2_vs_lzma_vs_lzma2_aka_xz_benchmark.txt · Last modified: 09/03/2010 14:07 by speed47