Monday, March 29, 2021

NTFS compression: when and how to use it, especially for IBM CMOD?

Fun fact: "NTFS can compress files using LZNT1 algorithm (a variant of LZ77)"

Birds of a feather with the OD77 compression used by IBM CMOD, and it is a very old algorithm, not a modern one... https://en.wikipedia.org/wiki/NTFS#File_compression
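To see what all of the LZ77 variants below have in common, here is a toy sketch of the core LZ77 idea: replace repeated data with back-references (offset, length) into a sliding window, plus the next literal byte. This is purely illustrative and is not the actual LZNT1 or OD77 on-disk format:

```python
# Toy LZ77: emit (offset, length, next_byte) triples over a sliding window.
def lz77_compress(data, window=255):
    out, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        for off in range(max(0, i - window), i):
            length = 0
            # Matches may overlap into the lookahead (classic LZ77 trick,
            # e.g. a run of identical bytes matches against itself).
            while i + length < len(data) - 1 and data[off + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - off, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(tokens):
    buf = bytearray()
    for off, length, nxt in tokens:
        for _ in range(length):
            buf.append(buf[-off])  # copy byte-by-byte so overlaps work
        buf.append(nxt)
    return bytes(buf)
```

Everything listed below differs mainly in how the window is searched and how the resulting tokens are entropy-coded (Huffman, arithmetic, FSE), not in this basic scheme.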

Quick history of the past 45 years of general purpose compression algorithms used by computers

  1. 1977-78: In the beginning there was LZ77, and a variation of it, LZ78
  2. 1982: LZSS, a derivative of LZ77 (used by e.g. PKZIP, ARJ, RAR etc.)
  3. 1984: LZ78 was improved upon by LZW (used for e.g. GIF, TIFF and PDF images)
  4. 1991: DEFLATE combines LZSS with Huffman coding (used in ZIP, WinZip, gzip etc.)
  5. 1996: BZIP2. This is the only one here that is NOT LZ77 based; it uses the Burrows-Wheeler transform, RLE and Huffman coding. Compresses better than LZW or DEFLATE, but is also slower.
  6. 1996: LZMA is like LZ77 but works at the bit level (instead of bytes), plus arithmetic coding: very good compression at the expense of being slow and using a lot of computation. Even slower than BZIP2, but with even better compression (used in 7-Zip, XZ Utils etc.)
  7. 1996: LZO is also Lempel-Ziv based, but instead of even better compression it focuses on much faster decompression (mostly used by the Linux kernel: initramfs, btrfs, squashfs, zram, zswap etc.)
  8. 2011: LZ4 is also LZ77 based, with even faster decompression than LZO, although at higher memory use
  9. 2013: Brotli, developed at Google, combines LZ77, Huffman coding and 2nd-order context modelling (mostly used for HTTP by web servers and browsers)
  10. 2015: LZFSE is also Lempel-Ziv based, with Finite State Entropy coding, by Apple; claimed to compress as well as DEFLATE but decompress 2x-3x faster while using fewer resources. It falls back to the LZSS-based LZVN for small sizes, as LZFSE is not good for those. According to the Squash Benchmark, LZFSE is similar in speed to ZSTD (level 6), but has a slightly worse ratio. LZVN is similar in speed to LZ4 level 4, with a slightly worse ratio as well.
  11. 2016: Zstandard (ZSTD) is also LZ77 based and uses both FSE and Huffman coding (developed at Facebook; used for HTTP transfers, software packaging, the Linux kernel for several filesystems etc.)

(quick comparison of the significant speed and compression differences here: https://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO)
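The ratio differences are easy to feel directly, because Python's standard library ships three of the codecs above: DEFLATE (zlib), BZIP2 (bz2) and LZMA (lzma). The sample text below is made up for illustration; real-world ratios depend heavily on the data:

```python
import bz2
import lzma
import zlib

# Repetitive sample text (made up); any LZ-family codec handles it well.
data = b"IBM Content Manager OnDemand stores large volumes of report data. " * 500

# Compress the same input at each codec's highest setting.
sizes = {
    "DEFLATE (zlib)": len(zlib.compress(data, level=9)),
    "BZIP2 (bz2)": len(bz2.compress(data, compresslevel=9)),
    "LZMA (lzma)": len(lzma.compress(data, preset=9)),
}

print(f"original: {len(data)} bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(data):.2%} of original)")
```

Timing the three calls (e.g. with `timeit`) on a larger, less repetitive input shows the speed ordering claimed above: zlib fastest, lzma slowest.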

As you can see:
(a) most formats in the last half century build upon LZ77.
(b) Right up to the last few years there has been constant development in the field, formerly by scientists, nowadays by big tech companies.
(c) Software like web servers and browsers (HTTP) and Linux (both the kernel, for btrfs, ext4, F2FS, initramfs, squashfs, ZFS etc., and the distributions) has been picking up the benefits of this progress.
(d) Software like Windows NTFS or IBM CMOD got stuck on and left behind with LZ77, which has been improved upon so many times since...

TL;DR:

Windows 10 and the modern Windows Server versions, however, have a "hidden" new compression type, LZX, which was created for the CompactOS initiative. It is also based on LZ77, but adds the Huffman entropy coding that LZNT1 lacked. The LZX algorithm achieves a much higher compression ratio (roughly 40-60%).

  • this is especially useful if you have large text files
  • this compression will not be automatically used by NTFS
  • Despite what the parameter name suggests, it is not only for executable files but can be applied to any file type.
  • Any change to a file decompresses it, so this is ideally used on folders with files that do not change (Program Files, Windows folders, games folders etc.)
  • If you apply it to e.g. large text files extracted from IBM CMOD, finish writing to them first and only apply it to the final files.
  • Can be turned on manually for files or all files in a specified folder:
compact /c /s /a /i /exe:lzx "C:\Program Files (x86)\*"
