Thursday, December 29, 2016

VM image compression and optimization for transfer, distribution or deep-freeze

We often need to send and receive VM images, mostly over intercontinental, slow, high-latency and/or unreliable connections. In the past we had serious trouble receiving last-minute updates under such circumstances, e.g. over hotel WiFi or 3G connections in Kuala Lumpur.

After some research, it seems most important to clean up the inside of the image before transfer, to avoid spending up to 10 hours (in my test example) transferring unnecessary clutter.

The most important findings and the dos and don'ts come first; detailed instructions and data follow afterwards!

Most important findings

  1. You can often gain more size reduction from cleaning up the VM than from compression! In my example, a 39GB demo image shrank to 25GB after a few cleanup steps.
  2. Transferring with rsync (or scp) can compress, _but_ that uses the equivalent of "gzip -6", which compresses my initial image only to 18.6GB, whereas 7zip compressed my test image to 7.23GB after cleanup.
  3. Translated into transfer/download times: I tested on an intercontinental link (Singapore-EU) with an average speed of ~320 kByte/s (~2.5 Mbit/s) at the time. The transfer time saved is almost 10 hours (16:08:00 for compressed rsync vs. 6:15:46 for cleanup plus 7zip)! The amount transferred also matters a lot if you distribute the image to several parties.
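The transfer-time arithmetic in point 3 can be reproduced with a few lines of Python. This is a minimal sketch assuming an idealised link with a constant throughput of 320,000 bytes/s (the ~320 kByte/sec above, read as decimal kilobytes) and no protocol overhead, so its results land a few minutes off the quoted 16:08:00 and 6:15:46.

```python
def transfer_seconds(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Idealised transfer time: payload size divided by sustained throughput."""
    return size_bytes / rate_bytes_per_s


def fmt_hms(seconds: float) -> str:
    """Format a duration as H:MM:SS, rounded to the nearest second."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Assumption: ~320 kByte/s sustained, read as 320,000 bytes/s.
RATE = 320_000

GZIP6_ORIGINAL = 18_625_975_551  # original image, gzip -6 (what rsync -z roughly does)
LZMA2_CLEANED = 7_230_345_981    # cleaned-up image, 7z -mx=9 -md=64m

t_rsync = transfer_seconds(GZIP6_ORIGINAL, RATE)
t_7z = transfer_seconds(LZMA2_CLEANED, RATE)

print("compressed rsync:", fmt_hms(t_rsync))         # 16:10:06
print("cleanup + 7z:    ", fmt_hms(t_7z))            # 6:16:35
print("time saved:      ", fmt_hms(t_rsync - t_7z))  # 9:53:31
```

Working backwards, the quoted 16:08:00 (58,080 s) and 6:15:46 (22,546 s) both correspond to about 320.7 kByte/s, so the two figures were measured against the same link speed.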


Don'ts

  1. Don't use zip or gzip to compress VMs!
  2. Don't over-optimize compression settings: cleanup matters more, and the extra time will not buy you much extra compression (see my insane 7zip settings tested below).
The details follow below. Should you have questions, ask away in the comments section!

Step 1: Cleanup!

The benchmark data below shows why this comes first: cleaning up the inside of the image before transfer roughly halves the compressed size, which no compression setting comes close to.
Our example is the IntelliVectorDemo/ VMware product demo image:
Size (bytes)     Step                What was done
38,852,268,020   (original)          Original image (vmware*.log and vprintproxy*.log already deleted)
37,231,468,532   clean up inside     Cleanup inside the guest:
  1. You must have KB2852386 installed to be able to remove files from WinSxS (it adds the Windows Update Cleanup option to Disk Cleanup on Windows 7 SP1 / Server 2008 R2 SP1).
  2. Set StateFlags (DWORD32) = 1 in all keys under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\VolumeCaches (create the value where it does not exist yet).
  3. Run Disk Cleanup for C: and any other drives from the drive's Properties dialog.
  4. Run:
    Dism /Online /Cleanup-Image /RestoreHealth
    Dism /Online /Cleanup-Image /SPSuperseded
    Dism /Online /Cleanup-Image /StartComponentCleanup /ResetBase
    (/RestoreHealth and /StartComponentCleanup need Windows 8 / Server 2012 or later, /ResetBase needs Windows 8.1 / Server 2012 R2 or later.)
  5. Reboot the VM: the WinSxS cleanup only happens during the reboot. Run VMware "Clean Up Disks..." afterwards.
  6. If you have the guts, be more aggressive by removing unused packages and features!
29,947,840,826   clean up VM image   After all the cleanup _in_ the VM, run the VMware disk cleanup _on_ it to shrink the disk image: VMware / VM / Manage / Clean Up Disks…
28,932,666,690   CCleaner            Run CCleaner (formerly Crap Cleaner) and CCEnhancer inside the VM.
27,528,290,245   JBoss               Use HDGraph or WinDirStat to find out what else takes up a lot of space: stale log files, application server temp files, batch folders etc. may be hanging around unnecessarily. In this case: JBoss EAP cleanup + VM / Manage / Clean Up Disks…
25,160,484,978   installs            Think about the purpose of the VM: unless it is meant to demonstrate software installation, remove the software installation files from it.
Altogether 13,691,783,042 bytes, more than a third of the image (35.2%), was not needed for the demo!
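To see which cleanup step pays off most, the sizes from the table can be differenced step by step. This Python sketch only re-computes numbers already in the table; the `steps` list is a hypothetical transcription of it.

```python
# Image size in bytes after each cleanup step, transcribed from the table above.
steps = [
    ("original", 38_852_268_020),
    ("clean up inside", 37_231_468_532),
    ("clean up VM image", 29_947_840_826),
    ("CCleaner", 28_932_666_690),
    ("JBoss", 27_528_290_245),
    ("installs", 25_160_484_978),
]

original = steps[0][1]

# Pair each step with the one before it and report what that step removed.
for (_, before), (name, after) in zip(steps, steps[1:]):
    saved = before - after
    print(f"{name:<18} saved {saved:>14,} bytes ({saved / original:6.2%} of original)")

total_saved = original - steps[-1][1]
print(f"total saved {total_saved:,} bytes = {total_saved / original:.1%}")
```

The VMware "Clean Up Disks…" step alone accounts for about 7.28 of the 13.69 GB. This is most likely because it is the step that actually hands the space freed inside the guest back to the host file system.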

Step 2: Initial transfer / distribution

There are two important parts to the network transfer itself:
  1. Initial transfer (upload or download) of the VM image: a lot more efficient in compressed form. It is very important to choose the right compression: forget Zip (DEFLATE), Gzip (DEFLATE) and Bzip (Burrows-Wheeler).
    7z using LZMA(2) is the best, both in compression ratio and in compression ratio per wall-clock time (taking its multi-threaded design into account).
  2. Updating the image with changes: can be more efficient on the decompressed image, using rsync to transmit only the changed blocks (no tests run to verify).
Both can be accomplished with rsync (to transfer just the *.7z file, or to update the VM image directory):
rsync -avvrhz --partial --stats --progress --inplace [folder or 7z file name] user@server:/mnt/rsync/VMs/
rsync -avvrhz --partial --stats --progress user@server:/mnt/rsync/VMs/[folder or 7z file name] .
Of course, you should change "/mnt/rsync/VMs" to the folder you use on the server.
The user needs SSH access to the server, and you need rsync and ssh installed locally (on Windows, use the Linux subsystem in Windows 10, or Cygwin on earlier versions of Windows).
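If you move images often, the compress-then-upload sequence can be scripted. The Python sketch below only assembles the same 7z and rsync invocations used in this post, taking the 7z flags from the -mx=9 -md=64m benchmark row. By default it prints the commands without running anything. The helper names, the folder name and the remote path are hypothetical placeholders.

```python
import shlex
import subprocess

# 7z flags from the benchmark: LZMA2, -mx=9, fast bytes 64, 64 MB dictionary, solid.
SEVEN_ZIP_FLAGS = ["-t7z", "-m0=lzma2", "-mx=9", "-mfb=64", "-md=64m", "-ms=on"]

# rsync flags from the upload command above:
#   -a archive mode (already implies -r), -vv extra verbose, -h human-readable,
#   -z compress in transit, --partial keep partially transferred files so an
#   interrupted transfer can resume, --stats/--progress for reporting,
#   --inplace write updates directly into the destination file.
RSYNC_FLAGS = ["-avvrhz", "--partial", "--stats", "--progress", "--inplace"]


def compress_cmd(archive: str, source: str) -> list[str]:
    """Build the `7z a` command that packs `source` into `archive`."""
    return ["7z", "a", *SEVEN_ZIP_FLAGS, archive, source]


def upload_cmd(local: str, remote: str) -> list[str]:
    """Build the rsync upload command; `remote` looks like user@server:/path/."""
    return ["rsync", *RSYNC_FLAGS, local, remote]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print("$", shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Hypothetical names: replace the folder, archive and remote path with your own.
    vm_dir = "IntelliVectorDemo"
    archive = vm_dir + ".7z"
    run(compress_cmd(archive, vm_dir))
    run(upload_cmd(archive, "user@server:/mnt/rsync/VMs/"))
```

With `dry_run=False`, `run` uses `subprocess.run(..., check=True)`, so a failed compression stops the script before the upload starts.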

Step 3: Updating the image (if necessary) 

To keep several copies of the image in sync:
  1. Upload the .7z file.
  2. Extract it on the server.
  3. Others download the .7z file.
  4. They extract it locally.
  5. From then on, keep syncing the changes between the extracted images on the server and locally.

The same rsync commands as above can be used for this, pointed at the VM image folder instead of the .7z file.
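Why syncing the extracted image may beat re-sending the archive: a VM disk modified in place keeps most of its blocks byte-identical, so only the changed blocks need to cross the link. A compressed archive, in contrast, tends to change throughout after even a small edit. The sketch below is a deliberately simplified model that uses fixed-offset block hashing. It is not rsync's actual rolling-checksum algorithm, which also copes with data that has shifted. The 4 KiB block size and the toy 1 MiB "image" are assumptions for illustration.

```python
import hashlib

BLOCK = 4096  # assumed block size for this illustration


def block_hashes(data: bytes, block: int = BLOCK) -> list[bytes]:
    """Strong hash of each fixed-offset block of `data`."""
    return [hashlib.md5(data[i:i + block]).digest() for i in range(0, len(data), block)]


def changed_blocks(old: bytes, new: bytes, block: int = BLOCK) -> list[int]:
    """Indices of blocks in `new` that differ from, or do not exist in, `old`."""
    old_h = block_hashes(old, block)
    new_h = block_hashes(new, block)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or h != old_h[i]]


# A 1 MiB toy "disk image" in which the guest rewrote 10 bytes in place.
old = bytes(range(256)) * 4096
edited = bytearray(old)
edited[300_000:300_010] = b"x" * 10
new = bytes(edited)

diff = changed_blocks(old, new)
print(diff)  # [73]
print(f"send {len(diff) * BLOCK} of {len(new)} bytes")
```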

Benchmark data

Compression benchmark on the IntelliVectorDemo image (vmware*.log and vprintproxy*.log deleted). Percentages are relative to the original, uncompressed image. The last four 7z runs were made on the image after the Step 1 cleanup.

Compression command                                   Size (bytes)  % orig.   CPU  Wall time  Max. mem.
(uncompressed original, logs deleted)               38,852,268,020  100.00%
gzip --fast                                         19,496,966,923   50.18%   88%  0:19:45.0  1.4MB
gzip -6 (this is what rsync compression uses)       18,625,975,551   47.94%   95%  0:31:49.4  1.4MB
gzip --best                                         18,572,524,862   47.80%  115%  0:56:14.1  1.4MB
7z a -t7z -m0=lzma2 -mx=3 -mfb=32 -md=1m -ms=on     16,460,348,353   42.37%  285%  0:49:30    61MB
7z a -t7z -m0=lzma2 -mx=5 -mfb=32 -md=16m -ms=on    14,732,928,437   37.92%  326%  1:38:24    587MB
7z a -t7z -m0=lzma2 -mx=9 -mfb=64 -md=64m -ms=on    14,079,148,699   36.24%  329%  2:04:54    2212MB
(uncompressed, after Step 1 cleanup)                25,160,484,978   64.76%
7z a -t7z -m0=lzma2 -mx=3 -mfb=32 -md=1m -ms=on      8,854,261,456   22.79%  312%  0:25:10    61MB
7z a -t7z -m0=lzma2 -mx=5 -mfb=32 -md=16m -ms=on     7,617,172,329   19.61%  284%  1:18:22    589MB
7z a -t7z -m0=lzma2 -mx=9 -mfb=64 -md=64m -ms=on     7,230,345,981   18.61%  283%  1:42:12    2070MB
7z a -t7z -m0=lzma2 -mx=9 -mfb=256 -md=256m -ms=on   7,107,098,190   18.29%  278%  2:36:43    5102MB
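The diminishing returns behind "don't over-optimize" are easiest to see as size saved per extra minute of compression. This Python sketch recomputes that from the four 7z rows for the cleaned-up image. The `cleaned_runs` list is a hypothetical transcription of the table, and nothing here was re-measured.

```python
ORIGINAL = 38_852_268_020  # bytes, uncompressed original image

# (settings, compressed size in bytes, wall time H:MM:SS) for the cleaned image.
cleaned_runs = [
    ("-mx=3 -md=1m", 8_854_261_456, "0:25:10"),
    ("-mx=5 -md=16m", 7_617_172_329, "1:18:22"),
    ("-mx=9 -md=64m", 7_230_345_981, "1:42:12"),
    ("-mx=9 -md=256m", 7_107_098_190, "2:36:43"),
]


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


prev = None  # (size, seconds) of the previous, cheaper setting
for name, size, wall in cleaned_runs:
    secs = to_seconds(wall)
    line = f"{name:<15} {size / ORIGINAL:6.2%} of original, {secs / 60:5.0f} min"
    if prev is not None:
        saved_mb = (prev[0] - size) / 1e6
        extra_min = (secs - prev[1]) / 60
        line += f", {saved_mb:7.1f} MB smaller for {extra_min:4.0f} extra min"
    print(line)
    prev = (size, secs)
```

The last step (-md=64m to -md=256m) makes the archive about 123 MB smaller, at the cost of roughly 55 extra minutes and 2.5 times the memory. At the ~320 kByte/s link speed quoted earlier, those 123 MB are worth only about six and a half minutes of transfer time.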

Test environment

Processor: i7-2640M @ 2.80GHz, 2 cores + HT (4 virtual cores)
Disk: Toshiba external 500GB USB3
RAM: 8GB physical
PC: Lenovo X1 2011 (in use since 22 Feb 2012)
OS: Windows 10 64-bit, version 1607
Linux: Windows 10 Linux subsystem, Ubuntu 14.04.5 LTS (trusty)
VMware: VMware Workstation Pro 12.5.2 build-4638234, Virtual Disk Manager
7zip: 7-Zip [64] 9.20, p7zip Version 9.20 (HugeFiles=on, 4 CPUs)
gzip: gzip 1.6
Test 1: CMOD95base, 100GB+ image with several snapshots (8 vmdk files)
Test 2: IntelliVectorDemo, simple VMware Player image (1 vmdk)

Additional interesting findings

  1. 16GB of the 25GB image is Windows (Server 2008 R2), and 8GB of that is the WinSxS folder. It seems like a good idea to use a more recent version of Windows, which also allows more flexible cleanup options with DISM.exe.
  2. gzip can only use a single CPU core, while 7zip used all of my cores in my test. So on a modern multi-core CPU, 7zip is actually not that much slower!
  3. 7zip used roughly 280%-330% CPU in my test on a 2-core + Hyper-Threading CPU. Contrary to my previous beliefs, HT actually adds significant throughput to the CPU!
  4. vmware-vdiskmanager.exe did not have any significant shrink or defrag effect on my .vmdk file. I wonder why... The "Clean Up Disks..." command in the VMware Workstation GUI worked, though.
