Microsoft 365 makers build apps and flows that people rely on every day. This guide shows how to get full end‑to‑end visibility using Power Platform Monitor for live debugging and Azure Application Insights for durable telemetry, dashboards, and alerts.
What to capture every time
Capture a compact, consistent envelope so you can correlate everything later: request id (requests.id), correlation id (operation_Id, plus operation_ParentId where present), latency (duration), result code (resultCode or success), user (user_Id, user_AuthenticatedId), session (session_Id), environmentId, resourceId (flow id), signalCategory (for flows: runs, triggers, actions), operation_Name (flow display name or action name), and a small customDimensions record for business context like customer, feature, or screen.
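As a sketch of what that envelope looks like once flow runs are exported, a single projection pulls every field into one row (the customDimensions keys follow the export schema described below; verify casing against your own data):

requests
| project timestamp, id, operation_Id, operation_ParentId, duration, resultCode, success,
    user_Id, user_AuthenticatedId, session_Id, operation_Name,
    environmentId = tostring(customDimensions['environmentId']),
    resourceId = tostring(customDimensions['resourceId']),
    signalCategory = tostring(customDimensions['signalCategory'])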
Where data lands in Application Insights
For Power Automate, cloud flow runs arrive in the requests table and triggers/actions arrive in dependencies. For model‑driven apps and Dataverse, you get standard App Insights schema such as requests, dependencies, pageViews, and exceptions, with common fields like operation_Id, user_Id, and session_Id. Most Power Platform metadata is in customDimensions (for example environmentId, resourceId, signalCategory).
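A quick sketch to see which tables are receiving which signals (itemType distinguishes request rows from dependency rows):

union requests, dependencies
| where timestamp > ago(1d)
| summarize rows = count()
by table = itemType, signalCategory = tostring(customDimensions['signalCategory'])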
Monitor for live diagnostics (makers)
Canvas apps: in Power Apps Studio open Advanced tools → Open live monitor and play the app to stream events. Model‑driven apps: open the app and choose Monitor from the command bar or append &monitor=true to the app URL. Share the Monitor session to pair with a teammate. Filter by categories such as network or formula evaluation, drill into long operations, and export a snapshot for an incident ticket.
Add structured traces in Canvas apps so Monitor and Application Insights see the same context:
// Example: trace a checkout failure with business context
Trace(
    "CheckoutFailed",
    TraceSeverity.Error,
    {
        CorrelationId: GUID(),
        Screen: App.ActiveScreen.Name,
        CartTotal: Sum(colCart, Price),
        UserUPN: User().Email,
        ErrorCode: "PAYMENT_TIMEOUT"
    }
)
If the app is allowed to send telemetry to Application Insights, these traces also appear in your App Insights logs with the fields under customDimensions.
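A sketch of querying those traces back out of the traces table (the customDimensions key names mirror the record above; exact casing is worth checking against your data):

traces
| where timestamp > ago(1d)
| where message == "CheckoutFailed"
| project timestamp, severityLevel,
    screen = tostring(customDimensions['Screen']),
    errorCode = tostring(customDimensions['ErrorCode']),
    correlationId = tostring(customDimensions['CorrelationId'])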
Wire up end‑to‑end export
In the Power Platform admin center create a Data export → App Insights connection for your environment and select the sources to export (Dataverse diagnostics and performance, and Power Automate runs, triggers, actions). Map it to a workspace‑based Application Insights resource that your team owns. Export is supported for Managed Environments and delivers within a target window rather than instant streaming, so plan for up to a day for initial population. Validate by querying the requests and dependencies tables for your environmentId.
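A validation sketch that confirms both tables are populating for the environment and shows how fresh the feed is:

let env = "<environmentId>";
union requests, dependencies
| where timestamp > ago(1d)
| where tostring(customDimensions['environmentId']) == env
| summarize rows = count(), latest = max(timestamp) by itemType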
KQL you will actually use
1) Errors by flow, user impact, and code
let env = "<environmentId>";
requests
| where timestamp > ago(7d)
| where tostring(customDimensions['resourceProvider']) == 'Cloud Flow'
| where tostring(customDimensions['signalCategory']) == 'Cloud flow runs'
| where tostring(customDimensions['environmentId']) == env
| where success == false or toint(resultCode) >= 400
| summarize errors = count(), users = dcount(user_Id)
by operation_Name, flowId = tostring(customDimensions['resourceId']), resultCode
| order by errors desc
For action or trigger failures, swap to dependencies and set signalCategory to Cloud flow actions or Cloud flow triggers.
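For example, a sketch of the action-failure variant (in dependencies, the name column carries the action name):

let env = "<environmentId>";
dependencies
| where timestamp > ago(7d)
| where tostring(customDimensions['signalCategory']) == 'Cloud flow actions'
| where tostring(customDimensions['environmentId']) == env
| where success == false
| summarize errors = count()
by operation_Name, action = name, resultCode
| order by errors desc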
2) Slow runs versus SLO (P95)
let env = "<environmentId>";
let slo = 5000.0; // target P95 in milliseconds (requests.duration is in ms)
requests
| where timestamp > ago(7d)
| where tostring(customDimensions['resourceProvider']) == 'Cloud Flow'
| where tostring(customDimensions['environmentId']) == env
| summarize p95 = percentile(duration, 95), avg = avg(duration)
by operation_Name, flowId = tostring(customDimensions['resourceId'])
| where p95 > slo
| order by p95 desc
3) Usage trends and active users
let env = "<environmentId>";
requests
| where timestamp > ago(30d)
| where tostring(customDimensions['resourceProvider']) == 'Cloud Flow'
| where tostring(customDimensions['environmentId']) == env
| summarize runs = count(), users = dcount(user_Id)
by day = bin(timestamp, 1d), flow = operation_Name
| order by day asc
For model‑driven or custom pages, use pageViews for screen usage and exceptions to sample full stack traces.
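As a sketch, joining exceptions to their originating requests on operation_Id gives the pivot from a stack trace back to the run that produced it:

let env = "<environmentId>";
exceptions
| where timestamp > ago(24h)
| join kind=inner (
    requests
    | where tostring(customDimensions['environmentId']) == env
    | project operation_Id, operation_Name, duration, resultCode
) on operation_Id
| project timestamp, problemId, outerMessage, operation_Name, duration, resultCode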
A workbook layout that works
Use one workbook per environment. Top row: tiles for Error rate (last 24h), P95 run duration (last 24h), Users impacted (last 24h). Middle: left table of Top failing flows using the error query; right table of Slowest flows vs SLO using the P95 query. Bottom: time charts for Runs and unique users by day and a sample grid of Latest exceptions joined with operation_Id so on‑call engineers can pivot from an exception to the related run and dependencies.
Alerting and SLOs
Define simple service objectives first, then wire alerts that point to action groups.
SLO examples. Weekly success rate ≥ 99.5% for production flows. P95 run duration ≤ 5 s for user‑interactive flows, ≤ 30 s for back‑office flows. Trigger‑to‑first‑action P95 ≤ 3 s for critical event‑driven automations.
Alert rules. Failed request (flow run) rate above 1% for 15 minutes; dependency call (action) failure rate above 2% for 15 minutes; custom log search where percentile(duration, 95) > SLO for 3 consecutive evaluations; single‑flow hard failure: more than 5 failed runs in 15 minutes. Route to an action group that sends Teams and email notifications and triggers a webhook to a Power Automate “Notify On‑Call” flow.
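The single‑flow hard‑failure condition translates into a log search alert like this sketch (evaluate every 5 minutes and fire when the query returns any rows):

let env = "<environmentId>";
requests
| where timestamp > ago(15m)
| where tostring(customDimensions['resourceProvider']) == 'Cloud Flow'
| where tostring(customDimensions['environmentId']) == env
| where success == false
| summarize failedRuns = count()
by flowId = tostring(customDimensions['resourceId']), operation_Name
| where failedRuns > 5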
Escalation matrix. Sev2: on‑call maker or flow owner within 15 minutes; Sev1: platform admin within 15 minutes and product owner within 30 minutes; Major incident: after 60 minutes without recovery, add duty manager and customer comms lead. Record every incident link to the workbook query that shows impact and time‑to‑detect.
Deployment steps
Create a workspace‑based Application Insights resource and grant Contributor to the platform team. In Power Platform admin center, add Data export → App Insights for the environment and select Dataverse and Power Automate sources. Wait for initial data, then import the workbook below. Create alert rules for the four conditions described above and bind to an action group. In Canvas apps, add targeted Trace() calls at important boundaries such as screen load, save, and external API calls. In model‑driven custom code, use the platform’s telemetry integration or plugin ILogger for any bespoke steps. Bake environment ids and flow ids into tags or dimensions when you generate custom telemetry so filters remain simple.
Cost considerations
Application Insights is billed per GB ingested and by retention. Control cost by enabling sampling at the App Insights resource, limiting high‑cardinality fields in customDimensions, excluding non‑production environments from noisy alerts, and setting a daily cap with notifications. Dashboards are free; metric and log alerts incur small charges per rule and per evaluation period. The export feed is not a guaranteed real‑time stream, so plan operational responses accordingly and treat the flow run history as the transactional source of truth when investigating gaps.
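To see where the gigabytes go, the Log Analytics Usage table breaks billable ingestion down by table (Quantity is in MB; this assumes the workspace‑based resource recommended above):

Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize ingestedGB = sum(Quantity) / 1024.0 by DataType
| order by ingestedGB desc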
Workbook template and CTA
Copy the JSON below and import it in the Azure portal under Monitor → Workbooks → New → Advanced editor → Gallery template. Save it in a shared resource group. Then tailor the environmentId, time ranges, and SLO threshold.
{
  "$schema": "https://github.com/Microsoft/Application-Insights-Workbooks/blob/master/schema/workbook.json",
  "version": "Notebook/1.0",
  "items": [
    {
      "type": 1,
      "content": {
        "json": "# Power Platform – Flow Health\nThis workbook shows error rate, P95 duration, users impacted, and trends."
      }
    },
    {
      "type": 3,
      "content": {
        "version": "KqlItem/1.0",
        "query": "let env='YOUR-ENV-ID'; requests | where timestamp > ago(24h) | where tostring(customDimensions['resourceProvider'])=='Cloud Flow' and tostring(customDimensions['environmentId'])==env | summarize errors=countif(success==false or toint(resultCode)>=400), total=count() | project ErrorRate=todouble(errors)/todouble(total)",
        "size": 1,
        "visualization": "tiles",
        "tileSettings": {"titleContent": {"columnMatch": "ErrorRate", "formatter": "percentage", "formatOptions": {"maximumFractionDigits": 2}}}
      }
    },
    {
      "type": 3,
      "content": {
        "version": "KqlItem/1.0",
        "query": "let env='YOUR-ENV-ID'; requests | where timestamp > ago(24h) | where tostring(customDimensions['resourceProvider'])=='Cloud Flow' and tostring(customDimensions['environmentId'])==env | summarize p95=percentile(duration,95) | project P95Seconds=todouble(p95)/1000.0",
        "size": 1,
        "visualization": "tiles"
      }
    },
    {
      "type": 3,
      "content": {
        "version": "KqlItem/1.0",
        "query": "let env='YOUR-ENV-ID'; requests | where timestamp > ago(24h) | where tostring(customDimensions['resourceProvider'])=='Cloud Flow' and tostring(customDimensions['environmentId'])==env | summarize UsersImpacted=dcount(user_Id)",
        "size": 1,
        "visualization": "tiles"
      }
    },
    {
      "type": 3,
      "content": {
        "version": "KqlItem/1.0",
        "query": "let env='YOUR-ENV-ID'; requests | where timestamp > ago(7d) | where tostring(customDimensions['resourceProvider'])=='Cloud Flow' and tostring(customDimensions['environmentId'])==env and (success==false or toint(resultCode)>=400) | summarize errors=count(), users=dcount(user_Id) by operation_Name, flowId=tostring(customDimensions['resourceId']), resultCode | order by errors desc",
        "size": 2,
        "visualization": "table"
      }
    },
    {
      "type": 3,
      "content": {
        "version": "KqlItem/1.0",
        "query": "let env='YOUR-ENV-ID'; let slo=5000.0; requests | where timestamp > ago(7d) | where tostring(customDimensions['resourceProvider'])=='Cloud Flow' and tostring(customDimensions['environmentId'])==env | summarize p95=percentile(duration,95), avg=avg(duration) by operation_Name, flowId=tostring(customDimensions['resourceId']) | where p95 > slo | order by p95 desc",
        "size": 2,
        "visualization": "table"
      }
    },
    {
      "type": 3,
      "content": {
        "version": "KqlItem/1.0",
        "query": "let env='YOUR-ENV-ID'; requests | where timestamp > ago(30d) | where tostring(customDimensions['resourceProvider'])=='Cloud Flow' and tostring(customDimensions['environmentId'])==env | summarize runs=count(), users=dcount(user_Id) by day=bin(timestamp,1d) | order by day asc",
        "size": 2,
        "visualization": "timechart"
      }
    }
  ]
}