vSAN 8 Express Storage Architecture

Learn about advancements in vSAN technology and the new features and the benefits of the re-architected vSAN offering

Author:  Ian Moore, boxxe Hybrid Cloud Hub Leader
Posted:  10 November 2023
Category:  Blog

At VMware Explore Barcelona, we learned about several advancements in vSAN technology included in vSAN 8 Express Storage Architecture.  We’re here to focus on some of the new features and the benefits of the re-architected vSAN offering. 

The first question to address when discussing a refreshed architecture for a product that has been a staple of the VMware product catalogue and a fundamental component of HCI is: why?  VMware vSAN OSA (Original Storage Architecture) has served us well for a number of years.  However, advancements in compute architecture made it possible to introduce ESA (Express Storage Architecture), which leverages increases in available CPU, memory, networking speed, bandwidth and storage.  These advancements have increased the amount of resource that can be consumed as vSAN overhead, which in return delivers a more performant, efficient platform with additional capabilities.

Efficiency

Fault tolerance is a major consideration for any storage platform, and vSAN is no different.  To provide fault tolerance, OSA uses a large amount of resource to perform the same data services on multiple hosts.  Exactly how resource intensive this is depends on the level of fault tolerance applied to each VM disk by policy-based management.

In OSA, data services are implemented at the bottom of the stack, so tasks including compression, dedupe, checksum & encryption are performed on every host contributing to the fault tolerance policy level of a VM.  If you are using RAID 6 (4 data + 2 parity) to protect a VM, this results in each action being performed six times.

ESA has been rearchitected to make this more efficient.  The major change is that actions are carried out at the source, at the top of the vSAN stack, rather than at the destination, at the bottom of the vSAN stack.  Each action is performed once at the top of the stack and the result is then written to the bottom of the stack, which is a much more efficient process.
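To make the difference concrete, here is a rough counting sketch in Python.  It is not vSAN code: the function names are hypothetical and the model is a deliberate simplification that only counts how many times the data services listed above would run for a single write under each approach.

```python
# Illustrative counting model only; this is not how vSAN is implemented.
# The service list is taken from the article; the function names are made up.
DATA_SERVICES = ["compression", "dedupe", "checksum", "encryption"]


def service_runs_bottom_of_stack(hosts_in_policy: int) -> int:
    """OSA-style model: each data service runs on every host holding a component."""
    return len(DATA_SERVICES) * hosts_in_policy


def service_runs_top_of_stack() -> int:
    """ESA-style model: each data service runs once at the source before the write."""
    return len(DATA_SERVICES)


raid6_hosts = 6  # RAID 6 as 4 data + 2 parity components, one per host
print("bottom of stack:", service_runs_bottom_of_stack(raid6_hosts))
print("top of stack:   ", service_runs_top_of_stack())
```

In this simplified model the work at the bottom of the stack grows with the number of hosts in the policy, while the work at the top of the stack stays constant.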

Resiliency

One of the frequent requests from vSAN customers is an increase in the resiliency of the vSAN platform.  Adaptive RAID configuration has been introduced to meet this requirement and protect data in the event of a host contributing to the vSAN pool being offline for more than 24 hours.  When this situation occurs, vSAN will automatically switch to the RAID configuration best suited to the number of available hosts.  For example, if a cluster with 5 nodes with VM disks protected by a RAID 5 configuration has a host failure, after 24 hours the VM will be reconfigured from a 4+1 configuration (4 data + 1 parity) to a 2+1 configuration (2 data + 1 parity).  The reason for this change is to protect the accessible VM from an additional host failure, resulting in an additional layer of resiliency.  This process is automatic, so it doesn't require any intervention by administrators.  There is an advanced setting which can be used to change the 24-hour period, however it is recommended to leave this at the default value in production environments.
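The trade-off between the two RAID 5 layouts in the example above can be worked through with a little arithmetic.  The sketch below is a simplified model, not vSAN's placement logic: it assumes one stripe component per host, and the helper names are hypothetical.

```python
from fractions import Fraction


def capacity_multiplier(data: int, parity: int) -> Fraction:
    """Raw capacity consumed per unit of usable data for a data+parity stripe."""
    return Fraction(data + parity, data)


def can_place(data: int, parity: int, available_hosts: int) -> bool:
    """Simplifying assumption: every stripe component needs its own host."""
    return available_hosts >= data + parity


available_hosts = 4  # the 5-node cluster from the example, with one host offline
for data, parity in [(4, 1), (2, 1)]:
    print(
        f"{data}+{parity}: "
        f"{float(capacity_multiplier(data, parity)):.2f}x raw capacity, "
        f"fits on {available_hosts} hosts: {can_place(data, parity, available_hosts)}"
    )
```

Under these assumptions the 2+1 layout uses more raw capacity per unit of data than 4+1, but it can be fully placed on the remaining hosts, which is what allows the data to be protected against a further failure.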

With ESA there are still options for running small clusters, such as the two node cluster with a witness node or a three node cluster.  The same recommendation applies: include an additional host so that the configured FTT (failures to tolerate) can be retained in the event of a single host failure.

The concept of a disk group or multiple disk groups within each vSAN host is no longer included in ESA and is replaced with a system whereby all disks in a vSAN host are available to the vSAN storage pool.  This is a huge change in the architecture of vSAN and a welcome one considering the impact of a disk failure in OSA.  Depending on whether a cache or capacity disk failed and which features were enabled, the result could range from a single disk marked as failed to an entire disk group going offline.  In a configuration where a single disk group is present in each host, this could leave a host unable to contribute storage to the vSAN storage pool.

In ESA a disk failure will not have such an effect, as all remaining disks in the storage pool are still accessible.  In comparison to a disk failure condition in OSA, this results in an increase in availability, capacity & performance.

Performance

ESA is optimised for RAID5 & RAID6, taking advantage of advancements in storage hardware including TLC NVMe based devices.  Large blocks are written to the devices for maximum performance with a focus on full stripe writes. 

With the performance leg residing in a single tier, data is collected in a log structured file system & written to the performance leg which is almost always a RAID1 configuration.  Once enough data is collected here it is written to the capacity leg using full stripe writes to a RAID5 or RAID6 configuration.  This results in the write performance of RAID1 with the fault tolerance and space efficiency of RAID5 & RAID6. 
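The idea of collecting small writes and then flushing them as full stripes can be illustrated with a toy Python model.  This is only a sketch under stated assumptions: the `StripeBuffer` class and `xor_blocks` helper are hypothetical, a 4+1 RAID 5 stripe is assumed, all blocks are assumed to be the same size, and the mirroring of the buffered data is not modelled.

```python
from dataclasses import dataclass, field


def xor_blocks(blocks: list) -> bytes:
    """Compute a RAID 5 style parity block as the byte-wise XOR of equal-sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


@dataclass
class StripeBuffer:
    """Toy model: small writes accumulate, then flush as one full stripe."""
    stripe_width: int = 4  # data blocks per stripe (4+1 RAID 5 assumed)
    pending: list = field(default_factory=list)          # stands in for the performance leg
    flushed_stripes: list = field(default_factory=list)  # stands in for the capacity leg

    def write(self, block: bytes) -> None:
        self.pending.append(block)
        # Flush only when a complete stripe is available, so parity is
        # computed once from data already in hand (no read-modify-write).
        while len(self.pending) >= self.stripe_width:
            self._flush_full_stripe()

    def _flush_full_stripe(self) -> None:
        data = self.pending[: self.stripe_width]
        del self.pending[: self.stripe_width]
        self.flushed_stripes.append((data, xor_blocks(data)))


buf = StripeBuffer()
for i in range(6):
    buf.write(bytes([i]) * 4)  # six small 4-byte writes

print(len(buf.flushed_stripes), "full stripe(s) written,", len(buf.pending), "block(s) still buffered")
```

Because parity is calculated from a complete stripe that is already buffered, the model never has to read old data or old parity back before writing, which is the benefit of full stripe writes described above.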

Scalable snapshots have been introduced to ESA to address the issue of multiple snapshots (for example 5 or more) impacting the performance of a VM disk.  Performance degradation occurs when data is stored in multiple snapshot files.  In ESA, metadata is used instead of a separate snapshot file so all data is written to the base disk.  When the snapshot is deleted, only the metadata changes so there is no longer a requirement to merge the data from the snapshot file into the base disk as this is where it is already located.  This results in a huge decrease in the time it takes to delete a snapshot & reduction in the performance degradation associated with multiple snapshots.   

Management

Auto-policy management looks for changes in the vSAN environment, such as the addition of a host, then makes recommendations via Skyline Health regarding the default policy applied to vSphere workloads.  An example of this would be the addition of a host that would allow the policy RAID level to increase from RAID5 to RAID6.  VMware have included a button in vSAN 8 U2 which simplifies the process of applying the recommended change to the default policy.  Once the policy has been updated, it will then need to be applied to VMs for the change in policy to take effect.

Summary

VMware has made huge advancements in vSAN technology with improvements in efficiency, resilience, performance and management.  With all of the benefits of vSAN 8, taking advantage of this by introducing it into your environment seems like a no-brainer.  If you didn't know, now you know.  When is your upgrade to vSAN 8 scheduled for?
