Tuesday, 11 June 2013

CBT - Changed Block Tracking and its common misinterpretations.

I have been working on a vCloud project for the last few months, and it has now come to looking at the backup of the solution.  Commvault is the existing backup vendor and will continue to provide backup services in this private cloud.

The customer came to me with an interesting question about how CBT works: they wanted to understand how the changed blocks in the vmdk are tracked.  This was partly down to their thirst for information and partly, I think, because they were concerned about constraints they might see on IO and CPU.

The perception of most people is that CBT uses snapshots to record all the changes, which are then consolidated, much like a normal snapshot in VMware. This is incorrect, and although I am no storage expert, I will attempt to explain how it works from my VMware background.

How it actually works is by keeping track of the blocks that have changed in a vmdk based on significant disk events recorded for a specific vmdk.

Simple, right?  Well, the next question from my customer was "Well, how does it do that? And is it going to cause me pain on my storage IO and/or my hosts?"  This is a valid question, as recording all this information has to have some overhead.

So how does it do it? Let's use the most common usage of CBT, backup of a virtual machine.
  1. We take a full backup of a VM with CBT enabled on all disks.  A snapshot is taken so that new IO is recorded in the delta while the backup is processed from the original vmdk.  The backup application records the change clock timestamp (T1) at the time the snapshot is created.  After the backup completes, normal snapshot consolidation takes place and the changes in the delta are merged into the original vmdk.
  2. When the next backup starts 24 hours later, another snapshot is taken of the VM's vmdks.  The backup application records the change clock timestamp (T2) at the time this snapshot is created.  All new writes go to the delta vmdk while we back up the changed blocks in the original vmdk.  We still create a snapshot because we need access to the original vmdk.
  3. The backup application then backs up only the blocks of the original vmdk that have changed between the two timestamps (C).
The next part of the process I wanted to examine was "How does it know which blocks have changed?"

When we enable changed block tracking, as detailed in KB1020128, and the VM is powered on for the first time, have a look in the Datastore Browser at the VM's home directory.  You will see a file named "vm_name-ctk.vmdk"; this file is used to track the blocks that are changed on a given vmdk.  The vmdk has a pointer configured to this auxiliary file, so that changes made to blocks in the corresponding vmdk are tracked there.


So when step three, listed above, is performed the auxiliary file is used to identify blocks based on the two time stamps, and the backup application then performs a backup of the changed blocks.
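To make the three steps and the role of the tracking file a little more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the `TrackedDisk` class, its per-block list and the simple counter used as a "change clock" are all made up for illustration, and they say nothing about the real layout of the -ctk file or the real change clock. Snapshots are left out entirely; the model only shows the idea of comparing each block's last change against the timestamp of the previous backup.

```python
class TrackedDisk:
    """Toy stand-in for a vmdk plus its tracking file (hypothetical model)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        # One entry per block: the clock value of the block's last write.
        self.last_changed = [0] * num_blocks
        # Toy change clock: a counter bumped on every write (an assumption,
        # not how the real change clock is implemented).
        self.clock = 0

    def write(self, index, data):
        self.clock += 1
        self.blocks[index] = data
        self.last_changed[index] = self.clock

    def changed_since(self, timestamp):
        """Return the indexes of blocks written after the given timestamp."""
        return [i for i, t in enumerate(self.last_changed) if t > timestamp]


disk = TrackedDisk(num_blocks=8)
for i in range(8):
    disk.write(i, b"base")

# Step 1: full backup, remembering the change clock value (T1).
t1 = disk.clock
full_backup = dict(enumerate(disk.blocks))

# The guest keeps writing during the day.
disk.write(2, b"new")
disk.write(5, b"new")
disk.write(2, b"newer")

# Step 2: the next backup records its own timestamp (T2).
t2 = disk.clock

# Step 3: only blocks changed between T1 and T2 are backed up.
changed = disk.changed_since(t1)
incremental = {i: disk.blocks[i] for i in changed}
print(changed)  # [2, 5]
```

Note that block 2 was written twice but is only copied once, with its latest contents. The tracking only has to remember that a block changed, not every individual write, which is part of why the overhead can stay small.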

The change clock is not based on a normal clock; it utilizes a Unix epoch based clock.  I don't work in engineering and I am under NDA from VMware (being an employee), so I can't share any more information around the nitty-gritty details.

But stepping back, my customer wanted to know if this was going to cause them any pain with IO and/or CPU on the hosts.  The simple answer is that the impact of using CBT is minimal.  It may cause the kernel to use 1 or 2% more CPU per 20 or 30 VMs.  However, it is important to remember that this small increase in kernel overhead helps avoid the much larger IO and/or CPU load you would see if every VM were fully backed up every night because CBT was not in use.  Backups will also be much shorter, as less data is being backed up.
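As a rough illustration of why the amount of data drops, the short Python sketch below compares a week of nightly full backups against one full backup followed by CBT-based incrementals. The numbers (a 100 GB disk and a 3% daily change rate) and the function name `weekly_backup_gb` are purely hypothetical and not taken from the customer's environment; plug in your own figures.

```python
def weekly_backup_gb(disk_gb, daily_change_percent, nights=7):
    """Return (full every night, one full plus incrementals) in GB for one VM."""
    full_every_night = disk_gb * nights
    # Assumed: each incremental copies only the blocks changed that day.
    incremental_gb = disk_gb * daily_change_percent / 100
    with_cbt = disk_gb + incremental_gb * (nights - 1)
    return full_every_night, with_cbt


# Hypothetical example values, not measurements.
without_cbt, with_cbt = weekly_backup_gb(disk_gb=100, daily_change_percent=3)
print(without_cbt, with_cbt)  # 700 118.0
```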

Hope this has been helpful, many thanks.

Phil

