Parallel BZIP2 (pbzip2) is awesome. I know bzip2 compresses better than
gzip, but I often wonder whether the added CPU cost is worth the extra
compression. pbzip2 is a parallel implementation of bzip2 that uses
pthreads to achieve near-linear speedup on SMP machines. It uses libbzip2
underneath, so its output is fully compatible with regular bzip2.
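To make the idea concrete, here is a minimal Python sketch of the general technique: split the input into chunks, compress each chunk independently on a worker thread, and concatenate the resulting bzip2 streams. This is not pbzip2's actual code. The function name, the chunk size, and the worker count are all made up for illustration, and any speedup depends on the interpreter releasing its lock during compression, which I am assuming rather than measuring here.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size, chosen to match bzip2's largest block size (900 kB).
CHUNK_SIZE = 900 * 1000


def parallel_bz2_compress(data: bytes, workers: int = 4,
                          chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress chunks independently and join the resulting bzip2 streams."""
    # Slice the input into fixed-size chunks; keep one empty chunk for empty input
    # so the output is still a valid bzip2 stream.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, chunks))
    # Concatenated bzip2 streams form a valid multi-stream bzip2 file.
    return b"".join(streams)


if __name__ == "__main__":
    sample = b"some log line\n" * 200_000
    packed = parallel_bz2_compress(sample)
    # bz2.decompress handles multi-stream input (Python 3.3+).
    print(bz2.decompress(packed) == sample)
```

Because each chunk becomes its own complete stream, an ordinary bzip2 decompressor that accepts concatenated streams can read the result, which is the compatibility property described above.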
Why would one want this? Well, I have a quad Xeon at work that
takes a very long time to compress certain files of 10-20 GB or more.
The precompiled win32 bzip2 pegged one CPU for hours on end. I
figure any speedup would be a good thing.
Why not just use zip? Well, it turns out that I was doing just that
with Info-ZIP and ran into a limit with zip files in general: a 4 GB maximum
file size. At first I thought about some workarounds, but I realized that the
best solution was probably to use something without such a limit.
Windows CMD.EXE seems so limiting compared to the Bourne shell, Korn shell, or bash.
Yet, there are some gems in there that make simple things simple.
I’m using this command script weekly to zip up some logs.
@echo off
FOR %%F IN (*_bak) DO c:\sysmaint\bzip2.exe -f %%F
The Parallel Bzip2 page is missing a precompiled win32
version, so I made one using MinGW. I just followed the directions in the
pbzip2 README file, but it still took a while to download MinGW and MSYS,
get the pthreads library, and build bzip2, because I wanted it statically
linked. So depending on your unix/win-fu, using this prebuilt stuff could
save you anywhere from 10 minutes to half a day.
I realized after the fact that 7-Zip might have been a good alternative. There is an easy Windows installer, probably command-line tools, and hopefully no 4 GB file limit. Having 7-Zip installed so that my coworkers can simply double-click a 7z file may make more sense. My coworkers aren’t going to know what to do with a bzip2 file, and if I show them, they will think it must not be of good quality because it runs at the command line.