Storage
The number of hosts able to share a read-only file has increased from 8 to 32
A read-only file in VMware is often a source VMDK used for
linked clones in either VMware View or VMware vCloud Director (VCD). Linked clones are used for quick deployment
in both View and VCD.
The previous limit was 8 hosts, which effectively limited View and VCD clusters
to 8 hosts: if one of the linked clones was placed on a 9th host
in the cluster, that host would be denied access to the source VMDK file.
In vSphere 5.1 this limit has been raised to allow up to 32 hosts
to access the read-only file. This
removes the 8-host restriction for View and VCD, and the limit is now
effectively the 32-host cluster maximum.
Introduction of a new VMDK type, the SE (space-efficient) virtual disk
A well-known problem with thin-provisioned VMDK files was
that if space was freed inside the guest OS, for example by deleting files, the
VMDK file did not shrink.
With SE virtual disks this space can be reclaimed. The process works as follows (a short API illustration follows the list):
- VMware Tools scans the guest OS disks for allocated but unused blocks and marks them as free.
- A SCSI UNMAP command is issued in the guest to instruct the virtual SCSI layer in the VMkernel to mark
the blocks as free in the SE VMDK.
- Once the VMkernel knows which blocks are free, it reorganizes the SE VMDK so that the
data is contiguous, with all the free blocks at the end of the VMDK.
- The VMkernel then sends either a SCSI UNMAP command to the SCSI array or an RPC TRUNCATE command to
NFS-based storage.
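As an illustration of the new disk type, below is a minimal pyVmomi sketch (not from the original text) that lists a VM's virtual disks and reports which ones use the SE sparse backing exposed by the vSphere 5.1 API. The vCenter address, credentials and VM name are placeholder assumptions.

```python
# Minimal pyVmomi sketch: report which of a VM's disks use the new
# SE sparse backing type. Connection details and the VM name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "view-desktop-01")  # placeholder VM name
    view.DestroyView()

    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualDisk):
            backing = dev.backing
            kind = ("SE sparse"
                    if isinstance(backing, vim.vm.device.VirtualDisk.SeSparseBackingInfo)
                    else type(backing).__name__)
            print(dev.deviceInfo.label, "->", kind, getattr(backing, "fileName", ""))
finally:
    Disconnect(si)
```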
Improvements in detecting APD (all paths down) and
PDL (permanent device loss)
It is common that when an APD condition is seen by an ESXi host,
the host becomes unresponsive and eventually disconnects from
vCenter. This happens because the hostd
process does not know whether the removal of the storage devices is a permanent or a
transient state for the lost paths, so hostd does not time out the
rescan operation used to rediscover the paths, or any of the other threads it is
processing. Everything simply waits for I/O to
be returned from the storage array, and because hostd has only a finite pool of
worker threads, they can all end up blocked, at which point the hostd process often becomes unresponsive and crashes.
Over the last few releases of vSphere, detection has improved with the introduction
of PDL (permanent device loss) detection, which recognizes specific SCSI sense codes
returned by the target array. In vSphere 5.1 this function has been improved
further, and the way APD conditions are handled has changed.
The following information is an extract from the VMware
documentation listed in the appendix of this document.
In vSphere 5.1, a new time-out value for APD is being introduced. There
is a new global setting for this
feature called Misc.APDHandlingEnable. If this value is set to 0, the
current (vSphere 5.0) condition is used,
i.e., permanently retrying failing I/Os. If Misc.APDHandlingEnable is
set to 1, APD handling is enabled to follow the new model, using the time-out
value Misc.APDTimeout. This is set to a 140-second time-out by default, but it
is tunable. These settings are exposed in the UI. When APD is detected, the
timer starts. After 140 seconds, the device is marked as APD Timeout. Any
further I/Os are fast-failed with a status of No_Connect, preventing hostd and
others from getting hung. If any of the paths to the device recover, subsequent
I/Os to the device are issued normally, and special APD treatment concludes.
The above text indicates that the advanced setting Misc.APDHandlingEnable
should be set to 1 to enable APD time-outs and to prevent the hostd process
from crashing when APD occurs.
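These two settings can be checked and applied per host through the vSphere API as well as the UI. The pyVmomi sketch below is an illustration only, not taken from the VMware documentation; the vCenter and host names are placeholders, and some pyVmomi releases are picky about the numeric type used for integer advanced options.

```python
# Hedged pyVmomi sketch: enable the vSphere 5.1 APD time-out model on one host.
# Connection details and the host name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ssl._create_unverified_context())
try:
    host = si.RetrieveContent().searchIndex.FindByDnsName(
        dnsName="esxi01.example.com", vmSearch=False)
    adv = host.configManager.advancedOption  # vim.option.OptionManager

    # Show the current values before changing anything.
    for key in ("Misc.APDHandlingEnable", "Misc.APDTimeout"):
        print(key, "=", adv.QueryOptions(key)[0].value)

    # Enable the new APD handling model and keep the default 140 s time-out.
    # Note: some pyVmomi releases expect integer option values wrapped as longs.
    adv.UpdateOptions(changedValue=[
        vim.option.OptionValue(key="Misc.APDHandlingEnable", value=1),
        vim.option.OptionValue(key="Misc.APDTimeout", value=140),
    ])
finally:
    Disconnect(si)
```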
Another setting that should be configured is disk.terminateVMOnPDLDefault.
This allows HA to restart the affected VMs on another
host that is unaffected by the storage issue. There is a known problem with this
setting causing HA to restart machines that were gracefully shut down during the outage;
specifying the additional advanced setting das.maskCleanShutdownEnabled removes this
problem. Both advanced settings should be used together for the best results from an APD/PDL condition.
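das.maskCleanShutdownEnabled is an HA (das) advanced option set at the cluster level, while disk.terminateVMOnPDLDefault is configured per host. As a rough illustration of the cluster-level change, a pyVmomi sketch might look like the following; the cluster name and connection details are placeholder assumptions.

```python
# Hedged pyVmomi sketch: add das.maskCleanShutdownEnabled to an HA cluster's
# advanced options. Cluster name and connection details are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "View-Cluster")  # placeholder
    view.DestroyView()

    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            option=[vim.option.OptionValue(key="das.maskCleanShutdownEnabled",
                                           value="true")]))
    # modify=True merges this change into the existing cluster configuration.
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
finally:
    Disconnect(si)
```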
Storage DRS V2.0
Improvements to Storage DRS include additional latency detection and awareness of
datastore placement on the storage device. Storage DRS introduces storage correlation, a new feature used to detect whether datastores
reside on the same SAN spindles. There would be little benefit in moving a VM
from one datastore to another if both reside on the same set of physical
spindles. Previously, SDRS would analyse
constraints on the datastores and move a VM to a less populated datastore on
the assumption that the VM would receive a performance benefit from the datastore
being less populated. Now SDRS
investigates whether the target datastore sits on the same spindles; if it does, SDRS
concludes there will be little to no benefit and, depending on the
aggressiveness of the SDRS settings, will not move the VM's VMDK files.
VmObservedLatency
is another new SDRS metric, used to measure the latency from the
time the VMkernel receives the storage command to the time the VMkernel
receives a response from the storage array.
This is an improvement over the previous level of monitoring, which only
measured latency after the storage request had left the ESXi host; the new metric
captures latency inside the host as well. This is
useful because the latency between the array and the host may be 1 or 2
milliseconds, while the latency inside the host, from the VMkernel down, could be
20-30 milliseconds due to the number of commands being issued and queued on the HBA
that is being used for a specific datastore.
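VmObservedLatency feeds the I/O load-balancing side of SDRS, whose latency threshold is configurable per datastore cluster. The pyVmomi sketch below is a rough illustration of adjusting that threshold, not an extract from VMware documentation; the datastore cluster name, threshold value and connection details are assumptions.

```python
# Hedged pyVmomi sketch: enable SDRS I/O load balancing on a datastore cluster
# and set the I/O latency threshold. Names and values are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.StoragePod], True)
    pod = next(p for p in view.view if p.name == "View-Datastore-Cluster")  # placeholder
    view.DestroyView()

    spec = vim.storageDrs.ConfigSpec(
        podConfigSpec=vim.storageDrs.PodConfigSpec(
            enabled=True,
            ioLoadBalanceEnabled=True,
            ioLoadBalanceConfig=vim.storageDrs.IoLoadBalanceConfig(
                ioLatencyThreshold=15)))  # milliseconds, placeholder value
    content.storageResourceManager.ConfigureStorageDrsForPod_Task(
        pod=pod, spec=spec, modify=True)
finally:
    Disconnect(si)
```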