CloudRunbook | Practical Cloud Engineering
Defender for Cloud baseline: plan coverage, auto-provisioning, and turning recommendations into guardrails
A practical Defender for Cloud baseline for Azure landing zones: plan coverage decisions, auto-provisioning, and closing the loop with policy-backed guardrails.
- Treat Defender for Cloud as a platform service: plans, agents, contacts, and policies are all IaC-driven.
- Day 0: enable the mandatory plans, auto-provision AMA/agentless scanning, route telemetry to the agreed workspaces.
- Day 30: promote noisy “audit” recommendations to policy, integrate alerts with SOC tooling, prune exemptions.
- Quarterly: review plan drift, auto-provisioning toggles, cost dashboards, and KQL reports; redeploy via pipeline.
- Portal is the dashboard, not the control plane. Everything that matters lives in Git.
Why you should care
Defender for Cloud is part signal, part configuration drift detector. Without a baseline:
- workloads inherit random plan coverage
- recommendations never feed back into policy
- auto-provisioning toggles are left to subscription owners
- breaches go undetected because telemetry never lands in a Log Analytics workspace
A tidy baseline gives you predictable coverage, standard deployment of agents, and an audit trail you can trust when auditors or responders come calling.
What good looks like
- Plans – Servers, SQL, Storage, Containers, Key Vault, and CSPM enabled at every subscription via IaC.
- Auto-provisioning – AMA and agentless scanning on, legacy Log Analytics agent off, with notifications stored in source control.
- Telemetry – Diagnostic settings push all Defender data to the correct Log Analytics workspace(s) with named owners.
- Guardrails – Top recurring recommendations converted into initiatives and enforced at the LandingZones management group.
- Operations – SOC receives Defender alerts, exemptions have expiry dates, and cost dashboards show who pays.
Baseline decision points
- Plan coverage granularity – Decide which plans are mandatory vs optional. Tie costs to subscription metadata (env/criticality).
- Telemetry landing zone – Document the workspace hierarchy (e.g. law-security-prod, law-security-nonprod) and retention.
- Auto-provisioning scope – Agentless scanning, AMA, and Defender CSPM agents must be enforced centrally, not toggled by workloads.
- Recommendation routing – Assign owners per recommendation category (platform, database, SOC). Portal comments do not count.
- Guardrail feedback loop – Pick the “always ignored” recommendations and codify them as policy initiatives with deployIfNotExists/deny.
- Exemptions – Define request, approval, expiry, and review cadence. Include justification in source control.
- Cost transparency – Produce monthly dashboards so FinOps knows which subscription pays for which plan.
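One way to keep the mandatory-versus-optional decision reviewable is to write it down as data rather than prose. The Python sketch below is only an illustration: the `env` and `criticality` tag keys, their values, and the split between the two plan sets are assumptions made for the example, not anything Defender for Cloud prescribes. The plan names reuse the resource types from the Terraform later in this article.

```python
# Hypothetical baseline: which Defender plans a subscription must carry,
# driven by its env/criticality metadata. Adjust both sets to your own decisions.
MANDATORY = {"VirtualMachines", "StorageAccounts", "KeyVaults"}
PROD_ONLY = {"SqlServers", "Containers"}


def required_plans(tags: dict[str, str]) -> set[str]:
    """Return the plan names a subscription must enable, given its tags."""
    plans = set(MANDATORY)
    env = tags.get("env", "").lower()
    criticality = tags.get("criticality", "").lower()
    # Production or high-criticality subscriptions get the full set.
    if env == "prod" or criticality == "high":
        plans |= PROD_ONLY
    return plans
```

Feeding the result into the per-subscription IaC inputs keeps the cost conversation tied to the same metadata FinOps already reports on.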
Signal vs noise
- Enable now: Defender plans you can operate (Servers, SQL, Storage, Key Vault, Containers), AMA, agentless scanning, security contacts, diagnostic routing.
- Enable at Day 30: Advanced plans (APIM, Kubernetes data plane), auto-remediation policies that need SOC sign-off, integration with ticketing.
- Probably never: Enabling every plan “for coverage” without owners, leaving legacy MMA agents running forever, or turning on continuous export of every alert without throttling.
Phased rollout
- Day 0 baseline – Define plans, auto-provisioning, workspaces, contacts, and policy assignments in Git. Deploy to pilot subscriptions first.
- Day 30 hardening – Review alerts, turn recurring findings into policy, ensure SOC routing works, prune exemptions.
- Quarterly review – Use KQL to confirm coverage, audit auto-provisioning toggles, refresh cost dashboards, and test rollback.
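For the Day 30 step, picking which recurring findings deserve a policy is easier with a simple ranking. The sketch below assumes the findings have already been exported into a list of records; the `recommendation` and `subscription` field names are hypothetical and should be mapped to whatever your export produces. It ranks recommendations by how many distinct subscriptions they show up in.

```python
from collections import Counter


def top_recurring(findings: list[dict], min_subscriptions: int = 2) -> list[tuple[str, int]]:
    """Rank recommendations by the number of distinct subscriptions they appear in."""
    # De-duplicate so a noisy subscription does not inflate the count.
    seen = {(f["recommendation"], f["subscription"]) for f in findings}
    counts = Counter(rec for rec, _ in seen)
    ranked = [(rec, n) for rec, n in counts.items() if n >= min_subscriptions]
    # Most widespread first; ties broken alphabetically for stable output.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Anything near the top of the list for two reviews in a row is a reasonable candidate for a guardrail initiative.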
Runbook: Defender for Cloud baseline that sticks
- Inventory scope and workspaces
List every subscription, note the tenant/management group they sit under, and record the prod/non-prod workspaces they should write to. Capture workspace IDs—you will reuse them in Terraform and policy.
- Enable plan coverage via IaC
Use Terraform/Bicep to enable the required Defender plans per subscription. Do not rely on portal toggles or screenshots.
- Lock auto-provisioning
Turn on AMA, agentless scanning, and CSPM agents centrally. Disable legacy agents if you are migrating. Store the configuration in Git and redeploy on every new subscription.
- Route telemetry + contacts
Apply diagnostic settings and security contacts so telemetry flows into the correct workspaces and alerts reach the SOC/platform inboxes.
- Convert noise into policy
Choose the top recommendations (e.g., public Storage, SQL TDE) and create policy initiatives that either deploy fixes or deny non-compliant resources.
- Operationalise alerts
Integrate Defender alerts with Sentinel/ticketing/chat, create runbooks for frequent alerts, and document the exemption workflow.
- Review + cost
Nightly or monthly scripts should confirm coverage; dashboards should show cost per subscription. Raise drift issues via backlog, not chat.
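The coverage check in the last step can be a few lines of script. The Python sketch below assumes you have saved the JSON output of the plan listing command and that each entry carries `name` and `pricingTier` fields, as used in the CLI query later in this article. The expected set mirrors the Terraform plan list, and the function accepts either a bare list or a list wrapped in a `value` property, since the output shape may differ between CLI versions.

```python
import json

# Baseline plan names; mirrors the Terraform for_each list in this article.
EXPECTED = {"VirtualMachines", "SqlServers", "StorageAccounts", "KeyVaults", "Containers"}


def coverage_gaps(pricing_json: str, expected: set[str] = EXPECTED) -> list[str]:
    """Return expected plans that are missing or not on the Standard tier."""
    data = json.loads(pricing_json)
    # Accept either a bare list or an object wrapping the list in "value".
    items = data.get("value", []) if isinstance(data, dict) else data
    standard = {p["name"] for p in items if p.get("pricingTier") == "Standard"}
    return sorted(expected - standard)
```

A non-empty result is the drift signal: open a backlog item with the subscription and the missing plans rather than fixing it by hand in the portal.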
Before you enable plans en masse, apply this Terraform snippet per subscription (or through a module) to turn on the key Defender plans. The resource has no subscription argument; it acts on whichever subscription the azurerm provider is configured for, so use a provider alias or a separate pipeline run per subscription.
resource "azurerm_security_center_subscription_pricing" "plans" {
  # One pricing resource per Defender plan.
  for_each = toset([
    "VirtualMachines",
    "SqlServers",
    "StorageAccounts",
    "KeyVaults",
    "Containers"
  ])
  tier          = "Standard"
  resource_type = each.value
}
Lock auto-provisioning with Terraform as well. Update the name values if Microsoft adds more toggles.
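# Verify first: this toggle maps to the legacy Log Analytics (MMA) agent, not AMA,
# so the baseline keeps it "Off". Newer azurerm releases deprecate the resource.
resource "azurerm_security_center_auto_provisioning" "legacy_agent" {
  auto_provision = "Off"
}
# Agentless scanning is an extension on a pricing plan; azurerm_security_center_setting
# has no agentless toggle. Assumes the provider version supports extension blocks.
resource "azurerm_security_center_subscription_pricing" "cspm" {
  tier          = "Standard"
  resource_type = "CloudPosture"
  extension {
    name = "AgentlessVmScanning"
  }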
}
Apply this Bicep initiative template to codify recommendations you do not want to see again. Swap in your policy definition IDs.
// Deploy at management group scope, e.g. with az deployment mg create.
// The target management group of the deployment becomes the assignment scope.
targetScope = 'managementGroup'
// Region for the assignment's managed identity, needed by deployIfNotExists policies.
param location string
var recommendationSet = {
  displayName: 'Defender Guardrails'
  policyType: 'Custom'
  policyDefinitions: [
    {
      policyDefinitionReferenceId: 'deny-public-storage'
      policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/<storage-policy>'
    }
    {
      policyDefinitionReferenceId: 'deploy-sql-tde'
      policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/<sql-policy>'
    }
  ]
}
resource initiative 'Microsoft.Authorization/policySetDefinitions@2021-06-01' = {
  name: 'defender-guardrails'
  properties: recommendationSet
}
resource assignment 'Microsoft.Authorization/policyAssignments@2021-06-01' = {
  name: 'defender-guardrails'
  location: location
  // The identity still needs role assignments before remediation tasks can run.
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    displayName: 'Enforce Defender recommendations'
    policyDefinitionId: initiative.id
  }
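}
Use the Azure CLI to confirm auto-provisioning and plan coverage before you exit the change window. Replace the subscription ID with the one under test. Some CLI versions may wrap the pricing list in a value property; if the pricing query returns nothing, try value[] in place of [].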
az account set --subscription "<subscription-id>"
az security pricing list --query "[].{plan:name,tier:pricingTier}"
az security auto-provisioning-setting list --query "[].{name:name,setting:autoProvision}"
Validation checks
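The following KQL highlights subscriptions whose agents have stopped sending heartbeats for more than 24 hours. Run it against the security workspace. Subscriptions that have never reported will not appear, so compare the result with your subscription inventory.
Heartbeat
// Look back far enough to catch agents that went quiet; 7 days is an arbitrary choice.
| where TimeGenerated > ago(7d)
// Category identifies the agent type; this filter assumes AMA is the baseline agent.
| where Category == "Azure Monitor Agent"
| summarize lastSeen = max(TimeGenerated) by SubscriptionId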
| where lastSeen < ago(1d)
Validation checklist
- ✓ Every subscription shows the expected Defender plans enabled via IaC state.
- ✓ Auto-provisioning toggles (AMA, agentless scanning) match the documented baseline.
- ✓ Security contacts and notification channels receive Defender alerts during testing.
- ✓ Diagnostic settings push Defender data into the agreed Log Analytics workspace(s).
- ✓ Policy initiatives exist for the high-impact recommendations you care about.
- ✓ Exemptions are tracked with owners, expiry dates, and an audit trail.
- ✓ Operations teams confirm Defender alerts reach their tooling and have clear response steps.
- ✓ Cost dashboards or reports show Defender charges per subscription.
- ✓ The quarterly drift review is scheduled and the owner is documented.
Common pitfalls
- Coverage drifts within days. Lock every plan via Terraform/Bicep and audit nightly.
- Sprawling workspaces kill queries and double retention bills. Use a hub-and-spoke or prod/non-prod split.
- If Defender nags about the same thing twice, codify it as policy. Noise erodes trust.
- Temporary waivers become permanent blind spots. Force expiry dates and justification.
- Defender emails without SOC integration are just spam. Wire them into Sentinel/ticketing with owners.
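The expiry rule for waivers is easy to enforce mechanically once exemptions live in source control. The Python sketch below assumes a hypothetical record shape with a `name` and an optional ISO-formatted `expires` field; it is not the Azure Policy exemption schema, so map the fields to whatever your repository stores. It flags any exemption that has no expiry date or whose date has passed.

```python
from datetime import date


def expired_exemptions(exemptions: list[dict], today: date) -> list[str]:
    """Return names of exemptions that have no expiry date or are past it."""
    flagged = []
    for ex in exemptions:
        expires = ex.get("expires")  # ISO date string, e.g. "2025-06-30"
        # A missing expiry is treated as a failure, not as "valid forever".
        if not expires or date.fromisoformat(expires) < today:
            flagged.append(ex["name"])
    return flagged
```

Running a check like this in the pipeline, with `today` set to the run date, turns "force expiry dates" from a convention into a failing build.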
Rollback / back-out plan
- Plan rollback – revert the Terraform/Bicep change, redeploy the previous pricing tier, and document any alert gaps created.
- Auto-provisioning – switch off the setting in IaC, run remediation tasks to uninstall agents/extensions, and confirm via CLI.
- Telemetry routing – reapply the previous diagnostic settings template; be aware of ingestion costs when duplicating streams.
- Policy guardrails – disable the assignment instead of deleting definitions, fix parameters offline, then redeploy to a pilot scope.
Rollback is partial—telemetry already ingested stays, and automatic remediations may have mutated resources. Log every rollback in the platform change record.