## Goal

Improve IO performance and cut costs without adding operational risk.
## Benchmark methodology

Measure at three layers to avoid false conclusions: the guest IO tool, VM-level platform metrics, and disk-level platform metrics.
- Linux fio

```shell
# Random read/write 4k mix for OLTP
# --size is required when the test file does not already exist
sudo fio \
  --name=randrw --filename=/mnt/data/testfile \
  --rw=randrw --rwmixread=70 \
  --bs=4k --iodepth=32 --numjobs=8 \
  --size=4g \
  --ioengine=libaio --runtime=120 --time_based --group_reporting
```
- Windows DiskSpd

```shell
# 64k sequential read for backup or analytics
# -si interleaves sequential IO across threads; -r would make it random
DiskSpd.exe -d120 -b64K -o32 -t8 -Sh -si -w0 C:\test\testfile.dat
```
- Collect platform metrics
  - VM: Data Disk IOPS Consumed Percentage, VM Uncached Bandwidth Consumed Percentage, Queue Depth, Latency.
  - Disk: read ops per second, write ops per second, bytes per second.
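Outside the portal, the same platform metrics can be pulled with `az monitor metrics list`; a sketch, assuming `$VM_ID` holds the full VM resource ID:

```shell
# Pull throttling-related metrics at 1-minute grain (defaults to the last hour).
# $VM_ID is a placeholder for the full VM resource ID.
az monitor metrics list \
  --resource "$VM_ID" \
  --metric "Data Disk IOPS Consumed Percentage" "VM Uncached Bandwidth Consumed Percentage" \
  --interval PT1M \
  --aggregation Average \
  --output table
```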
## Sample metrics from a test run

| Workload | Disk type | Size (GiB) | IOPS | p99 latency (ms) | Throughput (MBps) |
| --- | --- | --- | --- | --- | --- |
| OLTP 4k, 70r/30w | Premium SSD v2 | 256 | 8,200 | 2.1 | 190 |
| OLAP 64k read | Premium SSD v2 | 512 | 7,100 | 1.6 | 720 |
| Boot and patch | Ephemeral OS (NVMe placement) | n/a | n/a | Boot 40 percent faster | n/a |
Numbers above are illustrative from lab tests. Always validate on your SKU and region.
## SKU comparison and what to use when

| Capability | Premium SSD v2 (data disk) | Premium SSD (OS or data) | Ultra Disk |
| --- | --- | --- | --- |
| OS disk support | No | Yes | No |
| Host caching support | No | Yes | No |
| Max IOPS per disk | Up to 80,000 | Up to 20,000 | Up to 160,000 |
| Max MBps per disk | Up to 1,200 | Up to 900 | Up to 2,000 |
| Performance control | Set capacity, IOPS, and MBps independently | Fixed tier by size; optional performance tiering on some sizes | Set IOPS and MBps independently |
| Cost model | Capacity plus provisioned IOPS plus MBps | Fixed by size tier; optional performance tier uplift | Capacity plus provisioned IOPS plus MBps |
| Best for | Databases, queues, hot logs, game servers | OS disks, general workloads on fixed tiers | Extreme transactional or analytics peaks |
## Ephemeral OS disks

- Stores the OS on the VM's local SSD or NVMe rather than on a remote managed disk. Ideal for stateless fleets that can be reimaged.
- Not for long-lived pets. There are no snapshots or backups. Plan for reimaging and use configuration management.
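Because recovery means reimage rather than restore, it is worth rehearsing the operation. A minimal sketch, assuming `$RG`, `$VMSS`, and `$VM` are already set:

```shell
# Reimage one scale set instance back to the golden image
# (local ephemeral OS disk contents are lost)
az vmss reimage -g "$RG" -n "$VMSS" --instance-id 0

# Or reimage a standalone VM that uses an ephemeral OS disk
az vm reimage -g "$RG" -n "$VM"
```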
## Cost scenarios

Focus on performance per pound, not the raw cheapest option.
Scenario A: a small database that needs 8,000 IOPS and 250 MBps on 256 GiB of data.
- Premium SSD v2: a single disk set to 8,000 IOPS and 300 MBps (headroom above the 250 MBps requirement).
- Legacy Premium SSD would need a larger tier or multiple striped disks to reach 8,000 IOPS, which lifts capacity and cost you do not need.
- Result: fewer disks, simpler ops, lower £ per transaction.
Scenario B: read-heavy analytics with a daily peak window.
- Keep v2 at a low baseline, then raise IOPS and MBps for the 6-hour window with an automated `az disk update` in a scheduled job. You pay for the higher provisioned IOPS and MBps only while they are configured.
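One low-tech way to automate the window is a pair of cron entries (an Automation runbook works equally well); the times, resource names, and figures below are illustrative:

```shell
# crontab fragment: raise performance at 08:00 UTC, drop it again at 14:00 UTC
# "myrg" and "data-pv2" are placeholder names
0 8  * * * az disk update -g myrg -n data-pv2 --disk-iops-read-write 12000 --disk-mbps-read-write 500
0 14 * * * az disk update -g myrg -n data-pv2 --disk-iops-read-write 3000  --disk-mbps-read-write 125
```

Note that Azure restricts how often a disk's provisioned performance can be adjusted (a small number of times per 24 hours), so keep the schedule coarse.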
## How to estimate

- v2 monthly cost = capacity charge + (IOPS rate × provisioned IOPS) + (MBps rate × provisioned MBps). Rates vary by region. Use the workbook calculator link below to plug in your figures.
- For Premium SSD tiers, monthly cost = tier price × number of disks.
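As a back-of-envelope check, the v2 formula can be scripted. Every rate below is a made-up placeholder (look up real per-region rates on the Azure pricing page); the 3,000 IOPS / 125 MBps baseline is the documented free allowance per v2 disk:

```shell
# Sketch of the v2 monthly cost formula with hypothetical rates.
CAP_GIB=256; PROV_IOPS=8000; PROV_MBPS=300
RATE_GIB=0.081    # £ per GiB-month (hypothetical)
RATE_IOPS=0.004   # £ per extra provisioned IOPS-month (hypothetical)
RATE_MBPS=0.032   # £ per extra provisioned MBps-month (hypothetical)

# Premium SSD v2 includes 3,000 IOPS and 125 MBps per disk at no extra
# charge; you pay only for provisioning above that baseline.
TOTAL=$(awk -v cap="$CAP_GIB" -v iops="$PROV_IOPS" -v mbps="$PROV_MBPS" \
            -v rg="$RATE_GIB" -v ri="$RATE_IOPS" -v rm="$RATE_MBPS" 'BEGIN {
  extra_iops = (iops > 3000) ? iops - 3000 : 0
  extra_mbps = (mbps > 125)  ? mbps - 125  : 0
  printf "%.2f", cap * rg + extra_iops * ri + extra_mbps * rm
}')
echo "Estimated monthly cost: £$TOTAL"
```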
## IaC and commands that matter

### Create a Premium SSD v2 data disk and attach it

#### Azure CLI
```shell
# Create the disk with explicit performance settings
az disk create -g $RG -n data-pv2 --size-gb 256 \
  --sku PremiumV2_LRS --zone 1 \
  --disk-iops-read-write 8000 \
  --disk-mbps-read-write 300

# Attach to a VM at LUN 0; host caching is not supported on v2
az vm disk attach -g $RG --vm-name $VM --name data-pv2 --lun 0 --caching None

# Change performance later without downtime
az disk update -g $RG -n data-pv2 \
  --disk-iops-read-write 12000 \
  --disk-mbps-read-write 500
```
#### PowerShell

```powershell
# PowerShell uses the backtick, not backslash, for line continuation
$cfg = New-AzDiskConfig -Location $loc -Zone 1 -DiskSizeGB 256 `
    -AccountType PremiumV2_LRS -DiskIOPSReadWrite 8000 -DiskMBpsReadWrite 300 -CreateOption Empty
New-AzDisk -ResourceGroupName $rg -DiskName "data-pv2" -Disk $cfg

$vm = Get-AzVM -ResourceGroupName $rg -Name $vmName
$disk = Get-AzDisk -ResourceGroupName $rg -Name "data-pv2"
$vm = Add-AzVMDataDisk -VM $vm -Lun 0 -ManagedDiskId $disk.Id -CreateOption Attach -Caching None
Update-AzVM -VM $vm -ResourceGroupName $rg
```
#### Terraform

```hcl
resource "azurerm_managed_disk" "data_pv2" {
  name                 = "data-pv2"
  location             = azurerm_resource_group.rg.location
  resource_group_name  = azurerm_resource_group.rg.name
  storage_account_type = "PremiumV2_LRS"
  disk_size_gb         = 256
  create_option        = "Empty"
  disk_iops_read_write = 8000
  disk_mbps_read_write = 300
  zone                 = "1"
}

resource "azurerm_virtual_machine_data_disk_attachment" "vm_data" {
  managed_disk_id    = azurerm_managed_disk.data_pv2.id
  virtual_machine_id = azurerm_linux_virtual_machine.vm.id
  lun                = 0
  caching            = "None" # v2 does not support host caching
}
```
## Ephemeral OS disk on a scale set

### Bicep

```bicep
param location string
param vmssName string
param adminUser string

resource vmss 'Microsoft.Compute/virtualMachineScaleSets@2024-03-01' = {
  name: vmssName
  location: location
  sku: {
    name: 'Standard_D2ads_v6'
    capacity: 3
    tier: 'Standard'
  }
  zones: [ '1' ]
  properties: {
    upgradePolicy: { mode: 'Rolling' }
    virtualMachineProfile: {
      storageProfile: {
        // trimmed: add an imageReference for your golden image
        osDisk: {
          createOption: 'FromImage'
          diffDiskSettings: {
            option: 'Local'
            placement: 'NvmeDisk' // NVMe placement needs a supported SKU, e.g. Dadsv6
          }
        }
        dataDisks: []
      }
      osProfile: {
        computerNamePrefix: 'web'
        adminUsername: adminUser
        // trimmed: add SSH keys or a password
      }
      networkProfile: {} // trimmed: add networkInterfaceConfigurations
    }
  }
}
```
## Autoscale on disk pressure

### CLI

```shell
# Create an autoscale profile for the VMSS
az monitor autoscale create -g $RG --resource $VMSS_ID -n vmss-autoscale \
  --min-count 2 --max-count 10 --count 3

# Scale out if VM Uncached IOPS is hot
az monitor autoscale rule create -g $RG --autoscale-name vmss-autoscale \
  --condition "VM Uncached IOPS Consumed Percentage > 90 avg 10m" --scale out 1

# Scale in when cooled
az monitor autoscale rule create -g $RG --autoscale-name vmss-autoscale \
  --condition "VM Uncached IOPS Consumed Percentage < 40 avg 15m" --scale in 1
```
## Alerting for throttling

### CLI

```shell
# Alert when a VM is near the disk IOPS cap
az monitor metrics alert create -g $RG -n disk-iops-near-cap \
  --scopes $VM_ID \
  --condition "avg Data Disk IOPS Consumed Percentage > 90" \
  --window-size 5m --evaluation-frequency 1m \
  --description "IOPS nearing provisioned limit" --action-group $AG_ID

# Alert on bandwidth capping as well
az monitor metrics alert create -g $RG -n disk-mbps-near-cap \
  --scopes $VM_ID \
  --condition "avg VM Uncached Bandwidth Consumed Percentage > 90" \
  --window-size 5m --evaluation-frequency 1m --action-group $AG_ID
```
### ARM metric alert

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Insights/metricAlerts",
      "name": "disk-iops-near-cap",
      "apiVersion": "2018-03-01",
      "location": "global",
      "properties": {
        "severity": 2,
        "enabled": true,
        "scopes": ["/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm>"],
        "evaluationFrequency": "PT1M",
        "windowSize": "PT5M",
        "criteria": {
          "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
          "allOf": [
            {
              "name": "iops-consumed",
              "metricName": "Data Disk IOPS Consumed Percentage",
              "operator": "GreaterThan",
              "timeAggregation": "Average",
              "threshold": 90
            }
          ]
        }
      }
    }
  ]
}
```
## Case study: before and after

Context: a payments API on D2ads_v5 VMs with a Premium SSD OS disk and two P20 data disks. Traffic spiked at lunch and in the evenings, and users saw timeouts.
- Before
- After
## FinOps checklist

- Tag every disk with owner, environment, RPO/RTO, and performance intent.
- Right-size monthly by comparing IOPS and MBps consumed versus provisioned. Trim v2 settings when peaks pass.
- Prefer v2 for any workload with spiky or variable performance needs. Keep the OS on Premium SSD or an ephemeral OS disk.
- Reserve capacity for VMs where appropriate, but keep disks flexible.
- Export metrics to Log Analytics and review weekly. Act on any metric that stays above 85 percent for more than 5 minutes.
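For the weekly review, a sketch of that Log Analytics pull, assuming diagnostic settings already route platform metrics to the `AzureMetrics` table and `$WORKSPACE_ID` holds the workspace GUID:

```shell
# Weekly review: which VMs spent time above 85% of their provisioned IOPS.
# At 1-minute metric grain, count() approximates minutes spent hot.
az monitor log-analytics query -w "$WORKSPACE_ID" -t P7D --analytics-query '
AzureMetrics
| where MetricName == "Data Disk IOPS Consumed Percentage"
| where Average > 85
| summarize minutes_hot = count() by Resource, bin(TimeGenerated, 1d)
| order by minutes_hot desc
'
```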