We often need to send and receive VM images, mostly over intercontinental, slow, high-latency and/or unreliable connections. In the past we had serious trouble receiving last-minute updates in such circumstances, e.g. over hotel WiFi or 3G connections in Kuala Lumpur.
After some research, the most important step turned out to be cleaning up the inside of the image before the transfer: in my test example this avoided up to 10 hours spent transferring unnecessary clutter.
The most important findings and the dos and don'ts come first; detailed instructions and data follow afterwards!
Most important findings
- Cleaning up the VM can often reduce its size more than compression does! In my example a 39GB demo image shrank to 25GB after a few cleanup steps.
- Transferring with rsync (or scp) can compress, _but_ rsync compresses at the "gzip -6" level, which only gets my initial image down to 18.6GB. After cleanup, 7zip compressed my test image to 7.23GB.
- Translating that to transfer / download times: I tested on an intercontinental link (Singapore-EU) with an average speed of ~320kByte/s (~2.5Mbit/s) at the time. Almost 10 hours of transfer time is avoided! (16:08:00 for compressed rsync vs. 6:15:46 for cleanup plus 7zip.) The transfer amount also matters a lot if you distribute the image to several parties.
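The arithmetic behind these times can be reproduced with the minimal Python sketch below. It assumes a constant average throughput of exactly 320,000 bytes/s. The real average was slightly higher, so the results come out a few minutes longer than the measured times. The helper names are illustrative.

```python
def transfer_seconds(size_bytes: int, bytes_per_second: float) -> float:
    """Time to move size_bytes over a link with a constant average throughput."""
    return size_bytes / bytes_per_second


def hms(seconds: float) -> str:
    """Format seconds as h:mm:ss, like the times quoted in the text."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Assumption: a constant 320,000 bytes/s, close to the Singapore-EU average.
RATE = 320_000

# Sizes in bytes, taken from the benchmark data.
SIZES = {
    "uncompressed original": 38_852_268_020,
    "rsync -z (gzip -6), no cleanup": 18_625_975_551,
    "cleanup + 7z -mx=9": 7_230_345_981,
}

for label, size in SIZES.items():
    print(f"{label:32} {hms(transfer_seconds(size, RATE))}")

saved = (transfer_seconds(SIZES["rsync -z (gzip -6), no cleanup"], RATE)
         - transfer_seconds(SIZES["cleanup + 7z -mx=9"], RATE))
print("transfer time saved:", hms(saved))
```

With these assumptions the sketch gives 16:10:06 for compressed rsync and 6:16:35 for cleanup plus 7zip, so 9:53:31 is saved.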
Don't:
- Don't use zip or gzip to compress VMs!
- Don't over-optimize compression settings: cleanup matters more, and the extra time will not buy you much extra compression (see my insane 7zip settings tested below).
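Before spending hours on compression, you can get a feel for how much LZMA gains over DEFLATE on your own image. The rough Python sketch below compresses a sample spread over the file with both methods, using only the standard zlib and lzma modules. It assumes that 16 chunks of 4 MiB are representative enough. Also, xz at preset 3 is only roughly comparable to 7z -mx=3: the container differs and it runs single-threaded here. The function names are illustrative.

```python
import lzma
import os
import zlib


def read_sample(path: str, chunks: int = 16, chunk_size: int = 4 * 1024 * 1024) -> bytes:
    """Read `chunks` pieces spread evenly over the file, or the whole file if it is small."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= chunks * chunk_size:
            return f.read()
        step = size // chunks  # >= chunk_size here, so the pieces never overlap
        parts = []
        for i in range(chunks):
            f.seek(i * step)
            parts.append(f.read(chunk_size))
        return b"".join(parts)


def sample_ratios(path: str) -> dict:
    """Compressed/original size ratios of the sample (lower is better)."""
    data = read_sample(path)
    if not data:
        raise ValueError("file is empty")
    return {
        "DEFLATE level 6 (gzip default, rsync -z)": len(zlib.compress(data, 6)) / len(data),
        "LZMA preset 3 (xz)": len(lzma.compress(data, preset=3)) / len(data),
    }
```

To use it, call sample_ratios() on one of your .vmdk files and compare the two ratios.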
See below for how to do it. Should you have questions, ask away in the comments section!
Step 1: Cleanup!
In our example, the IntelliVectorDemo VMware product demo image, each step shrank the image as follows:
Size (bytes) | Step | Details |
38,852,268,020 | - | Original image (vmware*.log and vprintproxy*.log already deleted) |
37,231,468,532 | Clean up inside | Windows cleanup inside the guest; see the steps after this table |
29,947,840,826 | Clean up VM image | After all the cleanup _in_ the VM, run the VMware disk cleanup _on_ it to shrink the disk image: VMware / VM / Manage / Clean Up Disks… |
28,932,666,690 | CCleaner | Run Crap Cleaner (CCleaner) and CCEnhancer inside the VM. |
27,528,290,245 | JBOSS | Use Hdgraph or WinDirStat to find out what else takes up a lot of space: stale log files, app server temp files, batch folders etc. may be hanging around without being needed. In this case: JBoss EAP cleanup + VM / Manage / Clean Up Disks… |
25,160,484,978 | Installs | Think about the purpose of the VM: e.g. if it is not explicitly meant for installing the software, remove the software installation files from it. |
Altogether 13,691,783,042 bytes, more than 1/3 of the image, were not necessary for the demo image!
Steps of the "Clean up inside" row:
- KB2852386 must be installed to be able to remove files from WinSxS.
- Set the DWORD (32-bit) value StateFlags=1 (create it if it does not exist) in all keys under:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\VolumeCaches
- Run Disk Cleanup for C: and any other drives from their Properties dialog.
- Run:
Dism /Online /Cleanup-Image /RestoreHealth
Dism /online /Cleanup-Image /SPSuperseded
Dism /online /Cleanup-Image /StartComponentCleanup /ResetBase
(/RestoreHealth and /StartComponentCleanup need Windows 8 / Server 2012 or later, and /ResetBase needs Windows 8.1 / Server 2012 R2 or later. On Windows 7 / Server 2008 R2 only /SPSuperseded applies.)
- Reboot the VM: the WinSxS cleanup only happens at reboot. Run VMware "Clean Up Disks..." afterwards.
- If you've got the guts, be more aggressive and remove unused packages and features!
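Setting StateFlags by hand on every key under VolumeCaches is tedious. The minimal Python sketch below writes a .reg file instead. It assumes, as in the cleanup steps, that a DWORD value StateFlags=1 on each handler key is what you want. The file name stateflags.reg and the function names are hypothetical. Enumerating the existing keys uses the standard winreg module, so that part only runs on Windows.

```python
from typing import Iterable, List

VOLUME_CACHES = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\VolumeCaches"


def stateflags_reg(handler_names: Iterable[str]) -> str:
    """Build .reg text that sets the DWORD value StateFlags=1 on each handler key."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for name in handler_names:
        lines.append(f"[{VOLUME_CACHES}\\{name}]")
        lines.append('"StateFlags"=dword:00000001')
        lines.append("")
    return "\n".join(lines)


def installed_handlers() -> List[str]:
    """Windows only: names of the existing subkeys under VolumeCaches."""
    import winreg  # standard library, but only available on Windows

    subkey = VOLUME_CACHES.split("\\", 1)[1]  # strip the HKEY_LOCAL_MACHINE part
    # KEY_WOW64_64KEY: read the 64-bit registry view even from a 32-bit Python.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    names = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey, 0, access) as key:
        index = 0
        while True:
            try:
                names.append(winreg.EnumKey(key, index))
            except OSError:  # raised once there are no more subkeys
                return names
            index += 1
```

Inside the VM, write the output of stateflags_reg(installed_handlers()) to stateflags.reg using UTF-16 encoding, which is what regedit itself uses when exporting. Then run "reg import stateflags.reg" from an elevated command prompt.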
Step 2: Initial transfer / distribution
There are two important parts to the network transfer itself:
- Initial transfer (upload or download) of the VM image: a lot more efficient in compressed form. It is very important to choose the right compression: forget Zip (DEFLATE), gzip (DEFLATE) and bzip2 (Burrows-Wheeler). 7z with LZMA(2) gave the best compression ratio in my tests and, thanks to its multi-threaded design, also good compression per wall time.
- Updating the image with changes: probably more efficient on the decompressed image, using rsync to transmit only the changed blocks (not verified by tests).
Both can be done with rsync (to transfer just the *.7z file, or to update the VM image directory):
Upload:
rsync -avvrhz --partial --stats --progress --inplace [folder or 7z file name] user@server:/mnt/rsync/VMs/
Download:
rsync -avvrhz --partial --stats --progress user@server:/mnt/rsync/VMs/[folder or 7z file name] .
Of course, change "/mnt/rsync/VMs" to the folder you use on the server. The user needs SSH access to the server, and you need rsync and ssh installed locally (on Windows: use the Linux subsystem in Windows 10, or Cygwin on earlier versions of Windows).
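On an unreliable link a long transfer will sometimes break off. Because of --partial, rerunning the same command keeps the partially received file, and rsync uses it as the basis for the next attempt, so a simple retry loop helps a lot. Below is a minimal Python sketch that assumes rsync is on the PATH. The function name and the retry/wait defaults are arbitrary choices. The sketch retries on any non-zero exit code, including argument errors, which is a simplification.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def rsync_with_retries(
    args: Sequence[str],
    max_attempts: int = 10,
    wait_seconds: float = 30.0,
    run: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run "rsync <args>" until it exits with 0 or max_attempts is used up.

    Returns the exit code of the last attempt. `run` and `sleep` are
    parameters only so the loop can be tested without a network.
    """
    code = -1
    for attempt in range(1, max_attempts + 1):
        code = run(["rsync", *args]).returncode
        if code == 0:
            break
        print(f"attempt {attempt}/{max_attempts}: rsync exited with code {code}")
        if attempt < max_attempts:
            sleep(wait_seconds)  # give the connection a moment to recover
    return code
```

For example, pass the arguments of the upload command above as a list, with your own file name and server.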
Step 3: Updating the image (if necessary)
- Upload the 7z file.
- Extract it on the server.
- Others download the 7z file.
- They extract it locally.
- From then on, keep syncing the changes between the decompressed images on the server and locally.
The same rsync commands as above can be used for this.
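The idea behind the update step can be illustrated with a simplified Python sketch. The receiver computes a digest for each fixed-size block of its old copy, and the sender only has to transmit the blocks of the new copy whose digests differ. This is not rsync's actual algorithm. rsync combines a rolling weak checksum with a strong hash, so it also finds matching blocks that moved to a different offset. It also picks its block size from the file size, which -B/--block-size overrides. The function names and the 4096-byte block size are assumptions for illustration.

```python
import hashlib
from typing import List


def block_digests(data: bytes, block_size: int = 4096) -> List[bytes]:
    """Receiver side: a strong digest for each fixed-size block of the old copy."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def blocks_to_send(new: bytes, old_digests: List[bytes], block_size: int = 4096) -> List[int]:
    """Sender side: indices of the new copy's blocks that the receiver lacks.

    Only compares blocks at the same offset, so unlike rsync it cannot detect
    data that moved, and it does not handle the new copy being shorter.
    """
    changed = []
    for index, start in enumerate(range(0, len(new), block_size)):
        digest = hashlib.sha256(new[start:start + block_size]).digest()
        if index >= len(old_digests) or digest != old_digests[index]:
            changed.append(index)
    return changed
```

For a VM disk modified in place, a one-byte change then costs one block rather than the whole multi-GB file.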
Benchmark data
Size ratio = compressed size / original size (38,852,268,020 bytes). The last four 7z rows were measured on the cleaned-up image.
Compression | IntelliVectorDemo size (bytes) | Size ratio | CPU used | Wall time | Max mem. used |
None (vmware*.log and vprintproxy*.log deleted) | 38,852,268,020 | 100.00% | - | - | - |
gzip --fast | 19,496,966,923 | 50.18% | 88% | 0:19:45.0 | 1.4MB |
gzip -6 (this is what rsync compression uses) | 18,625,975,551 | 47.94% | 95% | 0:31:49.4 | 1.4MB |
gzip --best | 18,572,524,862 | 47.80% | 115% | 0:56:14.1 | 1.4MB |
7z a -t7z -m0=lzma2 -mx=3 -mfb=32 -md=1m -ms=on | 16,460,348,353 | 42.37% | 285% | 0:49:30 | 61MB |
7z a -t7z -m0=lzma2 -mx=5 -mfb=32 -md=16m -ms=on | 14,732,928,437 | 37.92% | 326% | 1:38:24 | 587MB |
7z a -t7z -m0=lzma2 -mx=9 -mfb=64 -md=64m -ms=on | 14,079,148,699 | 36.24% | 329% | 2:04:54 | 2212MB |
7z a -t7z -m0=lzma2 -mx=3 -mfb=32 -md=1m -ms=on (after cleanup) | 8,854,261,456 | 22.79% | 312% | 0:25:10 | 61MB |
7z a -t7z -m0=lzma2 -mx=5 -mfb=32 -md=16m -ms=on (after cleanup) | 7,617,172,329 | 19.61% | 284% | 1:18:22 | 589MB |
7z a -t7z -m0=lzma2 -mx=9 -mfb=64 -md=64m -ms=on (after cleanup) | 7,230,345,981 | 18.61% | 283% | 1:42:12 | 2070MB |
7z a -t7z -m0=lzma2 -mx=9 -mfb=256 -md=256m -ms=on (after cleanup) | 7,107,098,190 | 18.29% | 278% | 2:36:43 | 5102MB |
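Compression time and transfer time add up, so the "don't over-optimize" advice can be checked with a small Python sketch. It covers the four 7z rows measured on the cleaned-up image (the ones matching the 7.23GB result quoted in the findings), with sizes and wall times copied from the table. The sketch assumes a constant link speed and that compression finishes before the upload starts. It ignores decompression time on the receiving side. The helper names are illustrative.

```python
# The four 7z rows measured on the cleaned-up image: (settings, size in bytes, wall time).
ROWS = [
    ("-mx=3 -mfb=32 -md=1m", 8_854_261_456, "0:25:10"),
    ("-mx=5 -mfb=32 -md=16m", 7_617_172_329, "1:18:22"),
    ("-mx=9 -mfb=64 -md=64m", 7_230_345_981, "1:42:12"),
    ("-mx=9 -mfb=256 -md=256m", 7_107_098_190, "2:36:43"),
]


def to_seconds(wall_time: str) -> int:
    """Convert h:mm:ss to seconds."""
    hours, minutes, seconds = (int(part) for part in wall_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def end_to_end_seconds(size_bytes: int, wall_time: str, bytes_per_second: float) -> float:
    """Compression wall time plus transfer time (decompression is ignored)."""
    return to_seconds(wall_time) + size_bytes / bytes_per_second


def best_settings(bytes_per_second: float) -> str:
    """Settings of the row with the shortest compress-then-transfer time."""
    best = min(ROWS, key=lambda row: end_to_end_seconds(row[1], row[2], bytes_per_second))
    return best[0]


# Assumption: the same ~320,000 bytes/s link as in the findings.
for settings, size, wall in ROWS:
    hours = end_to_end_seconds(size, wall, 320_000) / 3600
    print(f"{settings:26} {hours:.2f} h")
```

At 320,000 bytes/s the totals come out at 8.11, 7.92, 7.98 and 8.78 hours, so -mx=5 wins by only a few minutes over -mx=9. On a faster link (1,000,000 bytes/s) -mx=3 wins, and on a slower one (200,000 bytes/s) -mx=9 with -mfb=64 does. The insane -mfb=256 -md=256m settings never pay off in this range.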
Test environment
Environment | Details |
Processor | i7-2640M @ 2.80GHz, 2 cores + HT (4 virtual cores) |
Disk | Toshiba external 500GB USB3 |
RAM | 8GB physical |
PC | Lenovo X1 2011 (in use since 22 Feb 2012) |
OS | Windows 10 64-bit, version 1607 |
Linux | Windows 10 Linux subsystem, Ubuntu 14.04.5 LTS (trusty) |
VMware | VMware Workstation Pro 12.5.2 build-4638234, Virtual Disk Manager |
7zip | 7-Zip [64] 9.20 p7zip Version 9.20 (HugeFiles=on, 4 CPUs) |
gzip | gzip 1.6 |
Test 1 | CMOD95base 100GB+ image with several snapshots (8 vmdk files) |
Test 2 | IntelliVectorDemo simple VMware Player image (1 vmdk) |
Additional interesting findings
- 16GB of the 25GB image is Windows (Server 2008 R2), and 8GB of that is the WinSxS folder. It seems like a good idea to use a more recent version of Windows, which also allows more flexible cleanup options with DISM.exe.
- gzip can only use a single CPU core, while 7zip used all of my cores in my test. So 7zip on a modern multi-core CPU is actually not that much slower!
- 7zip used 278%-329% CPU in my test on a 2 core + Hyper-Threading CPU. Contrary to my previous beliefs, HT actually adds significant throughput to the CPU!
- vmware-vdiskmanager.exe did not have any significant shrink or defrag effect on my .vmdk file, I wonder why... The VMware Workstation GUI "Clean Up Disks..." worked, though.