I Did Not Know: pbzip2
I just learned about pbzip2, which lets your multicore computer use more than one core when using the bzip2 compression algorithm.
On my Mac Pro at work, I installed it with MacPorts (sudo port install pbzip2). It is this kind of awesome:
$ ls -lh original.tar
-rw-r--r-- 1 jmcmurry staff 2.4G Feb 4 13:47 original.tar
$ time bzip2 -k -v original.tar
original.tar: 36.215:1, 0.221 bits/byte, 97.24% saved,
2604288000 in, 71911733 out.
real 13m3.313s
user 12m50.536s
sys 0m3.773s
$ mv original.tar.bz2 bzip2.tar.bz2
$ time pbzip2 -k -v original.tar
Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
[Jan. 08, 2009] (uses libbzip2 by Julian Seward)
# CPUs: 8
BWT Block Size: 900k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: original.tar
Output Name: original.tar.bz2
Input Size: 2604288000 bytes
Compressing data...
-------------------------------------------
Wall Clock: 119.369207 seconds
real 1m59.612s
user 14m39.090s
sys 0m44.840s
Sweet. About 6.5x faster in wall-clock time (13m3s versus 2m0s) by adding a “p” to my command line.
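The speedup can be checked by dividing the two "real" times that time reported. Here is a rough sketch of that arithmetic, assuming a POSIX shell and awk; the helper names to_seconds and speedup are made up for this post and are not part of bzip2 or pbzip2:

```shell
# Hypothetical helpers for illustration only.

# Convert a `time`-style duration such as 13m3.313s into seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Divide the slow run's seconds by the fast run's seconds.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# The "real" times from the bzip2 and pbzip2 runs above.
speedup "$(to_seconds 13m3.313s)" "$(to_seconds 1m59.612s)"   # prints 6.55
```

Only the "real" line measures elapsed time. The "user" line adds up CPU time across all cores, which is why it is much larger than "real" in the pbzip2 run.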
The resulting compressed .bz2 files aren’t byte-for-byte identical according to md5. The pbzip2 output is a little larger, which makes sense: pbzip2 splits the input into chunks and compresses each one independently, so every chunk carries its own bit of overhead. When they decompress, though, both are identical to the original .tar file.
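Since the checksums of the two .bz2 files differ, the thing to compare is the decompressed data. A small sketch of that check, assuming bzip2 and cmp are installed; the function name matches_original is made up for this post:

```shell
# Hypothetical helper for illustration only.
# Exit status 0 if the archive ($1) decompresses to exactly the bytes
# of the original file ($2), non-zero otherwise.
matches_original() {
  # bzip2 -dc writes the decompressed data to stdout and leaves the
  # archive in place; cmp -s compares it silently against the original.
  bzip2 -dc "$1" | cmp -s - "$2"
}
```

With the files from the runs above, that would be called as matches_original bzip2.tar.bz2 original.tar and matches_original original.tar.bz2 original.tar. Streaming through cmp also avoids writing a second 2.4G copy of the tarball to disk.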
See also: mgzip.