Following testing after ebe798a, this is a more than sufficient size to
cover our use case.
The old default was a drop of about 58 dB PSNR using the old code, and
this new default is about 65 dB PSNR, so it's actually an improvement
despite resulting in a smaller size.
There was no outlier whatsoever when comparing sizes around the 64
neighbourhood (with every step corresponding to a PSNR drop of about
0.07 dB), so I picked this since it's a power of two and requires no
change to the current 3dlut-size parsing logic.
I also tested smaller sizes such as 32x32x32 which performed almost as
well on colorful samples, but this results in noticeable black boost in
the dark regions, which is pretty undesirable. Therefore, we should
avoid going much further below 64x64x64.
Either way, this new size is so fast to compute that the 3dlut cache is
almost useless on my end. In fact, it might even be slower to load the
profile from the cache than to recompute it from scratch. (For caches on
a disk. For cache on a tmpfs, it makes no difference)