Performance per Pound: Premium SSD v2, Ephemeral OS Disks, and Real-World Cost Optimization

by G.R Badhon

Goal: improve IO and cut costs without risk.

Benchmark methodology

Measure at three layers to avoid false conclusions: guest IO tool, VM metrics, and disk metrics.

  1. Linux fio
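# Random read/write 4k mix (70 percent read) for OLTP.
# --direct=1 bypasses the page cache so the disk is measured rather than RAM.
# --size lets fio create the test file if it does not exist yet; adjust to suit.
sudo fio \
  --name=randrw --filename=/mnt/data/testfile --size=10G \
  --rw=randrw --rwmixread=70 \
  --bs=4k --iodepth=32 --numjobs=8 --direct=1 \
  --ioengine=libaio --runtime=120 --time_based --group_reporting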
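  2. Windows DiskSpd
# 64k sequential read for backup or analytics.
# -Sh disables software and hardware caching, -w0 is read only, -L records latency.
# -c10G creates the test file if it is missing. Sequential is the default; -r would make the test random.
DiskSpd.exe -c10G -d120 -b64K -o32 -t8 -Sh -L -w0 C:\test\testfile.dat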
  3. Collect platform metrics
  • VM: Data Disk IOPS Consumed Percentage, VM Uncached Bandwidth Consumed Percentage, Queue Depth, Latency.
  • Disk: Read Ops per second, Write Ops per second, Bytes per second.
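
To line up the guest numbers with the platform metrics, it helps to reduce each fio run to the same three figures: IOPS, throughput, and p99 latency. The sketch below is one way to do that, assuming the fio command is run with `--output-format=json --output=fio.json` added (those flags are not in the command above) and that `--group_reporting` leaves a single aggregated job entry. The function name `summarize_fio` and the file name `fio.json` are illustrative.

```python
import json


def summarize_fio(report: dict) -> dict:
    """Reduce a fio JSON report to total IOPS, MB/s, and worst p99 latency in ms."""
    # With --group_reporting, fio aggregates all jobs into the first entry.
    job = report["jobs"][0]
    summary = {"iops": 0.0, "mbps": 0.0, "p99_ms": 0.0}
    for direction in ("read", "write"):
        stats = job.get(direction, {})
        summary["iops"] += stats.get("iops", 0.0)
        # bw_bytes is bytes per second; convert to decimal megabytes per second.
        summary["mbps"] += stats.get("bw_bytes", 0) / 1_000_000
        # Completion latency percentiles are reported in nanoseconds.
        p99_ns = stats.get("clat_ns", {}).get("percentile", {}).get("99.000000", 0)
        summary["p99_ms"] = max(summary["p99_ms"], p99_ns / 1_000_000)
    return summary


def summarize_fio_file(path: str) -> dict:
    """Load a fio JSON report from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_fio(json.load(handle))
```

Calling `summarize_fio_file("fio.json")` after a run gives figures that can be compared directly with the VM and disk metrics for the same time window.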

Sample metrics from a test run

| Workload | Disk type | Size GiB | IOPS | Latency ms p99 | Throughput MBps |
|---|---|---|---|---|---|
| OLTP 4k 70r30w | Premium SSD v2 | 256 | 8,200 | 2.1 | 190 |
| OLAP 64k read | Premium SSD v2 | 512 | 7,100 | 1.6 | 720 |
| Boot and patch | Ephemeral OS (NVMe placement) | n/a | n/a | Boot 40 percent faster | n/a |

The numbers above are illustrative results from lab tests. Always validate on your own SKU and region.

SKU comparison and what to use when

| Capability | Premium SSD v2 (data disk) | Premium SSD (OS or data) | Ultra Disk |
|---|---|---|---|
| OS disk support | No | Yes | No |
| Host caching support | No | Yes | No |
| Max IOPS per disk | Up to 80,000 | Up to 20,000 | Up to 160,000 |
| Max MBps per disk | Up to 1,200 | Up to 900 | Up to 2,000 |
| Performance control | Set capacity, IOPS, MBps independently | Fixed tier by size, optional performance tiering on some sizes | Set IOPS and MBps independently |
| Cost model | Capacity plus provisioned IOPS plus MBps | Fixed by size tier, optional performance tier uplift | Capacity plus provisioned IOPS plus MBps |
| Best for | Databases, queues, hot logs, game servers | OS disks, general workloads on fixed tiers | Extreme transactional or analytics peaks |

Ephemeral OS disks

  • Store the OS on local SSD or NVMe of the VM. No remote managed OS disk. Ideal for stateless fleets that can be reimaged.
  • Not for long-lived pets. The OS disk cannot be snapshotted or backed up, so plan for reimage and use configuration management.

Cost scenarios

Focus on performance per pound, not raw cheapest.

Scenario A: a small database that needs 8k IOPS and 250 MBps with 256 GiB of data.

  • Premium SSD v2 single disk set to 8,000 IOPS and 300 MBps.
  • Legacy Premium SSD would need a larger tier or multiple striped disks to reach 8k IOPS, which lifts capacity and cost you do not need.
  • Result: fewer resources, simpler ops, lower £ per transaction.
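
The capacity overhead of the legacy route can be made concrete with a little arithmetic. The sketch below counts how many disks of a given Premium SSD tier would be needed to meet Scenario A, and how much capacity that buys. The per-tier limits in `TIERS` are assumptions taken from commonly published figures and should be checked against current documentation; the sketch also assumes striping scales linearly and ignores VM-level caps.

```python
import math

# Assumed legacy Premium SSD limits per disk: (size GiB, IOPS, MBps).
# Verify these against current Azure documentation before relying on them.
TIERS = {
    "P15": (256, 1100, 125),
    "P20": (512, 2300, 150),
    "P30": (1024, 5000, 200),
    "P40": (2048, 7500, 250),
    "P60": (8192, 16000, 500),
}


def disks_needed(tier: str, need_gib: int, need_iops: int, need_mbps: int) -> int:
    """Smallest stripe width of one tier that meets capacity, IOPS, and MBps."""
    size_gib, iops, mbps = TIERS[tier]
    return max(
        math.ceil(need_gib / size_gib),
        math.ceil(need_iops / iops),
        math.ceil(need_mbps / mbps),
    )


def striping_options(need_gib: int, need_iops: int, need_mbps: int) -> list:
    """List (tier, disk count, total GiB) per tier, smallest total capacity first."""
    options = []
    for tier, (size_gib, _, _) in TIERS.items():
        count = disks_needed(tier, need_gib, need_iops, need_mbps)
        options.append((tier, count, count * size_gib))
    return sorted(options, key=lambda option: (option[2], option[1]))


if __name__ == "__main__":
    for tier, count, total_gib in striping_options(256, 8000, 250):
        print(f"{tier}: {count} disk(s), {total_gib} GiB provisioned")
```

Under these assumed limits, every legacy option provisions at least 2,048 GiB to serve a 256 GiB dataset, which is the unneeded capacity the scenario refers to.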

Scenario B: read-heavy analytics window with daily peaks

  • Keep v2 at a low baseline, then raise IOPS and MBps for the 6-hour window using an automated az disk update in a scheduled job. You pay for the higher provisioned IOPS and MBps only while they are configured.
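
A quick way to see what the schedule saves is to compute the time-weighted provisioned performance over a day, which is what the bill tracks if charges accrue only while a setting is configured, as stated above. This is a minimal sketch; the baseline and peak values in the usage example are hypothetical, not figures from the scenario.

```python
def time_weighted(baseline: float, peak: float, peak_hours: float,
                  hours_per_day: float = 24.0) -> float:
    """Average provisioned value across a day with one raised window."""
    if not 0 <= peak_hours <= hours_per_day:
        raise ValueError("peak_hours must be between 0 and hours_per_day")
    off_peak_hours = hours_per_day - peak_hours
    return (peak * peak_hours + baseline * off_peak_hours) / hours_per_day


if __name__ == "__main__":
    # Hypothetical settings: 3,000 IOPS baseline, 12,000 IOPS for a 6-hour window.
    print(time_weighted(3000, 12000, 6))  # average provisioned IOPS over the day
    print(time_weighted(125, 500, 6))     # average provisioned MBps over the day
```

With those hypothetical settings the disk is billed as if it held 5,250 IOPS all day rather than 12,000, which is the gap the scheduled job is there to capture.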

How to estimate

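  • v2 monthly cost equals the capacity charge, plus the IOPS rate times provisioned IOPS, plus the MBps rate times provisioned MBps. A baseline of 3,000 IOPS and 125 MBps is included with the disk, so only provisioning above that baseline is charged. Rates vary by region. Use the workbook calculator link below to plug your figures in.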
  • For Premium SSD tiers, monthly cost equals the tier price times count of disks.
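
The two formulas above can be sketched as small functions. All rates are parameters because they vary by region; the values in the usage example are placeholders, not real prices. The `free_iops` and `free_mbps` defaults assume the included v2 baseline of 3,000 IOPS and 125 MBps; set them to 0 to bill every provisioned unit.

```python
def pv2_monthly_cost(capacity_gib: float, iops: float, mbps: float, *,
                     gib_rate: float, iops_rate: float, mbps_rate: float,
                     free_iops: float = 3000, free_mbps: float = 125) -> float:
    """Capacity charge plus charges for IOPS and MBps above the free baseline."""
    billable_iops = max(0.0, iops - free_iops)
    billable_mbps = max(0.0, mbps - free_mbps)
    return (capacity_gib * gib_rate
            + billable_iops * iops_rate
            + billable_mbps * mbps_rate)


def premium_ssd_monthly_cost(tier_price: float, disk_count: int) -> float:
    """Fixed tier price multiplied by the number of disks."""
    return tier_price * disk_count


if __name__ == "__main__":
    # Placeholder rates for illustration only; look up the rates for your region.
    v2 = pv2_monthly_cost(256, 8000, 300,
                          gib_rate=0.10, iops_rate=0.005, mbps_rate=0.04)
    legacy = premium_ssd_monthly_cost(tier_price=100.0, disk_count=2)
    print(f"v2: {v2:.2f}  legacy: {legacy:.2f}")
```

Keeping the rates as arguments makes it easy to rerun the comparison per region, or to feed in a time-weighted IOPS figure for a scheduled workload.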

IaC and commands that matter

Create Premium SSD v2 data disk and attach

Azure CLI

# Create disk with explicit performance
az disk create -g $RG -n data-pv2 --size-gb 256 \
  --sku PremiumV2_LRS --zone 1 \
  --disk-iops-read-write 8000 \
  --disk-mbps-read-write 300

# Attach to a VM at LUN 0, no host caching on v2
az vm disk attach -g $RG --vm-name $VM --name data-pv2 --lun 0 --caching None

# Change performance later without downtime
az disk update -g $RG -n data-pv2 \
  --disk-iops-read-write 12000 \
  --disk-mbps-read-write 500 

PowerShell

$cfg = New-AzDiskConfig -Location $loc -Zone 1 -DiskSizeGB 256 `
  -SkuName PremiumV2_LRS -DiskIOPSReadWrite 8000 -DiskMBpsReadWrite 300 -CreateOption Empty
New-AzDisk -ResourceGroupName $rg -DiskName "data-pv2" -Disk $cfg
$vm = Get-AzVM -ResourceGroupName $rg -Name $vmName
$disk = Get-AzDisk -ResourceGroupName $rg -Name "data-pv2"
$vm = Add-AzVMDataDisk -VM $vm -Lun 0 -ManagedDiskId $disk.Id -CreateOption Attach -Caching None
Update-AzVM -VM $vm -ResourceGroupName $rg 

Terraform

resource "azurerm_managed_disk" "data_pv2" {
  name                 = "data-pv2"
  location             = azurerm_resource_group.rg.location
  resource_group_name  = azurerm_resource_group.rg.name
  storage_account_type = "PremiumV2_LRS"
  disk_size_gb         = 256
  create_option        = "Empty"
  disk_iops_read_write = 8000
  disk_mbps_read_write = 300
  zone                 = "1"
}

resource "azurerm_virtual_machine_data_disk_attachment" "vm_data" {
  managed_disk_id    = azurerm_managed_disk.data_pv2.id
  virtual_machine_id = azurerm_linux_virtual_machine.vm.id
  lun                = 0
  caching            = "None" # v2 does not support host caching
} 

Ephemeral OS disk on a scale set

Bicep

param location string
param vmssName string
param adminUser string

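// NvmeDisk placement needs API version 2024-03-01 or later.
resource vmss 'Microsoft.Compute/virtualMachineScaleSets@2024-03-01' = {
  name: vmssName
  location: location
  sku: {
    name: 'Standard_D2ads_v6'
    capacity: 3
    tier: 'Standard'
  }
  zones: [ '1' ]
  properties: {
    upgradePolicy: { mode: 'Rolling' }
    virtualMachineProfile: {
      storageProfile: {
        // imageReference is omitted here; supply an image that fits on the local disk.
        osDisk: {
          createOption: 'FromImage'
          caching: 'ReadOnly' // ephemeral OS disks require ReadOnly caching
          diffDiskSettings: {
            option: 'Local'
            placement: 'NvmeDisk'
          }
        }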
        dataDisks: [ ]
      }
      osProfile: {
        computerNamePrefix: 'web'
        adminUsername: adminUser
      }
      networkProfile: { }
    }
  }
} 

Autoscale on disk pressure

CLI

# Create autoscale and rules for VMSS
az monitor autoscale create -g $RG --resource $VMSS_ID -n vmss-autoscale \
  --min-count 2 --max-count 10 --count 3

# Scale out if VM Uncached IOPS is hot
az monitor autoscale rule create -g $RG --autoscale-name vmss-autoscale \
  --condition "VM Uncached IOPS Consumed Percentage > 90 avg 10m" --scale out 1

# Scale in when cooled
az monitor autoscale rule create -g $RG --autoscale-name vmss-autoscale \
  --condition "VM Uncached IOPS Consumed Percentage < 40 avg 15m" --scale in 1 

Alerting for throttling

CLI

# Alert when a VM is near the disk IOPS cap
az monitor metrics alert create -g $RG -n disk-iops-near-cap \
  --scopes $VM_ID \
  --condition "avg Data Disk IOPS Consumed Percentage > 90" \
  --window-size 5m --evaluation-frequency 1m \
  --description "IOPS nearing provisioned limit" --action $AG_ID

# Alert on bandwidth capping as well
az monitor metrics alert create -g $RG -n disk-mbps-near-cap \
  --scopes $VM_ID \
  --condition "avg VM Uncached Bandwidth Consumed Percentage > 90" \
  --window-size 5m --evaluation-frequency 1m --action $AG_ID

ARM metric alert

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Insights/metricAlerts",
      "name": "disk-iops-near-cap",
      "apiVersion": "2018-03-01",
      "location": "global",
      "properties": {
        "severity": 2,
        "enabled": true,
        "scopes": ["/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm>"] ,
        "evaluationFrequency": "PT1M",
        "windowSize": "PT5M",
        "criteria": {
          "allOf": [
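            {
              "name": "iops-consumed",
              "criterionType": "StaticThresholdCriterion",
              "metricNamespace": "Microsoft.Compute/virtualMachines",
              "metricName": "Data Disk IOPS Consumed Percentage",
              "operator": "GreaterThan",
              "timeAggregation": "Average",
              "threshold": 90
            }
          ],
          "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria"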
        }
      }
    }
  ]
} 

Case study before and after

Context: a payments API on D2ads v5 VMs, each with a Premium SSD OS disk and two P20 data disks. Traffic spikes during lunch and evenings, and users noticed timeouts.

  • Before
  • After

FinOps checklist

  • Tag every disk with owner, environment, RPO/RTO, and performance intent.
  • Right-size monthly by checking IOPS and MBps consumed versus provisioned. Trim v2 settings when peaks pass.
  • Prefer v2 for any workload with nonlinear performance needs. Keep OS on Premium SSD or Ephemeral OS disks.
  • Reserve capacity for VMs where appropriate but keep disks flexible.
  • Export metrics to Log Analytics and review weekly. Act on any chart that rides over 85 percent for more than 5 minutes.
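
The 85 percent rule in the last item is easy to automate once the metrics are exported. The sketch below assumes one consumed-percentage sample per minute, already pulled from Log Analytics into a plain list; the function names and default thresholds are illustrative.

```python
def longest_run_above(samples: list, threshold: float = 85.0) -> int:
    """Length of the longest consecutive run of samples above the threshold."""
    longest = 0
    current = 0
    for value in samples:
        if value > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def needs_action(samples: list, threshold: float = 85.0,
                 max_minutes: int = 5) -> bool:
    """True when per-minute samples stay above the threshold for more than max_minutes."""
    return longest_run_above(samples, threshold) > max_minutes


if __name__ == "__main__":
    # Hypothetical per-minute "IOPS Consumed Percentage" values.
    quiet = [40, 60, 88, 91, 70, 50]
    hot = [70, 86, 90, 92, 95, 91, 89, 60]
    print(needs_action(quiet), needs_action(hot))
```

Counting consecutive samples rather than averaging the window keeps a short spike from hiding behind quiet minutes, which matches the "rides over" wording of the checklist.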
