Parallel BZIP2 is awesome. I know bzip2 is a
step up from gzip, but often I wonder if the added CPU cost is worth the extra
compression. Parallel bzip2 (pbzip2) is a parallel implementation of bzip2
which uses pthreads to achieve near-linear speedup on SMP machines. It uses
libbzip2 underneath, so its output is fully compatible with regular bzip2.
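To make the idea concrete, here is a rough sketch in Python of the general approach as I understand it: split the input into chunks, compress each chunk as its own bzip2 stream on a worker thread, and concatenate the streams in order. This is not pbzip2's actual code (that is C++ with pthreads). The function name parallel_bz2_compress and the 900 KB chunk size are my own assumptions for illustration. It relies on Python's bz2 module, which wraps libbzip2 and can decompress concatenated streams.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size for this sketch, roughly bzip2's largest block size.
CHUNK_SIZE = 900 * 1024


def parallel_bz2_compress(data: bytes, workers: int = 4,
                          chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress each chunk as an independent bzip2 stream, then concatenate."""
    # Fall back to a single empty chunk so empty input still yields a valid stream.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    sample = b"log line: nothing to see here\n" * 200_000
    packed = parallel_bz2_compress(sample)
    # A regular decompressor reads the concatenated streams back as one file.
    assert bz2.decompress(packed) == sample
    print(len(sample), "->", len(packed), "bytes")
```

Because every chunk is a complete bzip2 stream, the result can be read back by an ordinary decompressor, which is the compatibility point above. How much speedup you actually get from threads in Python depends on the interpreter, so treat this as an illustration of the technique rather than a benchmark.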
Why would one want this? Well, I have a quad Xeon at work which
takes a very long time to compress certain files of 10-20GB or more.
The bzip2 precompiled for win32 pegged one CPU for hours on end. I
figure any speedup would be a good thing.
Why not just use zip? Well, it turns out that I was doing just that
using Info-ZIP and running into a limit with zip files in general: a 4GB
maximum file size. At first I thought about some workarounds, but I realized
that the best solution was probably to use something without such a limit.
Windows CMD.EXE seems so limiting compared to the Bourne shell, Korn shell, or bash.
Yet, there are some gems in there that make simple things simple.
I’m using this command script weekly to zip up some logs.
FOR %%F IN (*_bak) DO c:\sysmaint\bzip2.exe -f %%F
The Parallel Bzip2 page is missing a precompiled win32
version, so I made one using MinGW. I just followed the directions in the
pbzip2 README file, but still, it took a while to download MinGW and MSYS,
get the pthreads library, and build bzip2, because I wanted it statically
linked. So depending on your unix/win-fu, using this prebuilt stuff could
save you anywhere from 10 minutes to half a day of time.