CloudRunbook | Practical Cloud Engineering
Azure Landing Zones: the practical guide (secure-by-default, engineer-friendly)
A hands-on walkthrough of Azure Landing Zones (ALZ): what they are, why they matter, how the platform vs workload split works, and a runbook-style path to deploying a secure foundation that scales.
- Design the management group tree, policy packs, and logging story once, then force every subscription through it.
- Use Terraform or Bicep for everything, including policy initiatives and diagnostic settings, so drift is reviewable.
- Day 0: enforce the minimum viable baseline; Day 30: tighten the deny set; quarterly: prove it still works.
- Do not enable every Defender/Policy toggle on day one; pick the controls you can respond to.
- Treat the platform like a product: backlog, releases, telemetry, and rollback.
Why landing zones still matter
If you inherited an Azure estate where every subscription looks different, RBAC is welded to people, and nobody can prove logs land anywhere useful, you have already met the debt a landing zone avoids. A landing zone is the operating model plus the technical scaffolding that makes every new workload trustworthy by default. If vending a subscription takes a service request and a best-effort checklist, you do not have one yet.
What good looks like
- One management group hierarchy with clear owners: tenant root → Platform and LandingZones (siblings) → workload scopes.
- Subscription vending pipeline that assigns RBAC, policy, and diagnostics automatically based on metadata.
- Guardrails split into deny (foot-guns), deployIfNotExists (baseline config), and audit (future enforcement).
- Shared networking/DNS patterns owned by platform, not retrofitted per workload.
- Logging and security operations with agreed alert routing, playbooks, and break-glass accounts.
- Runbooks for change: Day 0 baseline, Day 30 hardening, quarterly review with measured drift.
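To make "based on metadata" concrete, here is a minimal Python sketch that turns a vending request into the management group, group-based RBAC bindings, and initiative the pipeline would apply. The persona table, the metadata keys (persona, owner_group, reader_group), and the mg-sandbox and alz-sandbox-guardrails names are hypothetical placeholders; this is not an ALZ or Azure API.

```python
from dataclasses import dataclass

# Hypothetical persona table: management group and initiative per persona.
PERSONAS = {
    "corp": ("mg-landingzones", "alz-lz-guardrails"),
    "sandbox": ("mg-sandbox", "alz-sandbox-guardrails"),
}


@dataclass(frozen=True)
class VendingPlan:
    management_group: str
    initiative: str
    role_bindings: tuple[tuple[str, str], ...]  # (Entra ID group, RBAC role)
    diagnostics_workspace: str


def plan_from_metadata(meta: dict[str, str], workspace_id: str) -> VendingPlan:
    """Derive what the pipeline applies from request metadata alone."""
    persona = meta["persona"]
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona}")
    management_group, initiative = PERSONAS[persona]
    # RBAC is bound to groups named in the request, never to individuals.
    bindings = (
        (meta["owner_group"], "Contributor"),
        (meta["reader_group"], "Reader"),
    )
    return VendingPlan(management_group, initiative, bindings, workspace_id)


plan = plan_from_metadata(
    {"persona": "corp", "owner_group": "grp-app1-owners", "reader_group": "grp-app1-readers"},
    workspace_id="law-placeholder-id",
)
print(plan.management_group, plan.initiative)  # mg-landingzones alz-lz-guardrails
```

Keeping the persona table in the platform repo means a new persona arrives as a reviewed pull request rather than a ticket.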
Baseline decision points
- Management group layout – Keep it shallow: Tenant Root → Platform for shared services, LandingZones for workloads, optional Sandbox for experiments.
- Subscription lifecycle – Who can request one, what metadata is mandatory, how long until retirement, and how vending is automated (GitHub Actions / Azure DevOps).
- Identity boundaries – Which groups map to which RBAC roles, how PIM and break-glass are handled, and where admin workstations sit.
- Policy scope – Decide which guardrails land at the LandingZones management group and which are delegated to workload owners. Avoid “deny everything” until telemetry proves it is safe.
- Networking + DNS – Hub and spoke vs vWAN, who owns private DNS zones, how Private Endpoint approvals work, and how shared firewalls are funded.
- Operations + alerts – Where Defender, Sentinel, and platform alerts go, who triages, and how exemptions expire.
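Identity boundaries are easiest to hold when they are checked mechanically. The sketch below flags the two patterns this guide rules out: Owner granted at tenant root and roles bound directly to users instead of groups. It assumes you have already exported role assignments (for example with az role assignment list) and mapped them into the hypothetical Assignment record; the break-glass allow-list is also an assumption.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    scope: str
    role: str            # role definition name, e.g. "Owner"
    principal_type: str  # "User", "Group", or "ServicePrincipal"
    principal: str


def identity_violations(assignments: list[Assignment], tenant_id: str,
                        break_glass: set[str]) -> list[str]:
    """Return findings; an empty list means the identity boundary holds."""
    root_scopes = {"/", f"/providers/microsoft.management/managementgroups/{tenant_id}".lower()}
    findings = []
    for a in assignments:
        if a.principal in break_glass:
            continue  # break-glass accounts are reviewed through their own process
        if a.role == "Owner" and a.scope.lower() in root_scopes:
            findings.append(f"Owner at tenant root: {a.principal}")
        if a.principal_type == "User":
            findings.append(f"direct user assignment: {a.principal} -> {a.role} on {a.scope}")
    return findings
```

Run it against each export and fail the review if the list is non-empty; the break-glass accounts are skipped here only because they should have their own, stricter audit.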
Signal vs noise
- Enable now: cost guardrails, public network denies, default diagnostic settings, Defender for Cloud plans you have owners for (Servers, SQL, Storage).
- Delay until Day 30: advanced Defender plans (Containers, APIs) if the SOC is not ready; aggressive TLS mandates if legacy workloads still exist; network micro-segmentation before hub routing is proven.
- Probably never: auto-enabling extensions nobody operates (legacy Log Analytics agents), enforcing niche policies because “the template had them,” or enabling Defender plans without budget. Every alert without an owner is noise.
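One way to hold the "owner or it stays off" line is a small decision matrix in version control that the pipeline reads before enabling anything. This is a hypothetical sketch: the plan names follow the lists above, while the owner, budget, and earliest_day fields are assumptions about how you might record the decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanDecision:
    plan: str              # Defender for Cloud plan name
    owner: Optional[str]   # team that triages the plan's alerts
    budget_approved: bool
    earliest_day: int      # 0 = Day 0 baseline, 30 = Day 30 hardening


def plans_to_enable(matrix: list[PlanDecision], day: int) -> list[str]:
    """Plans that are owned, funded, and due by the given rollout day."""
    return [d.plan for d in matrix
            if d.owner and d.budget_approved and d.earliest_day <= day]


# Hypothetical matrix mirroring the Day 0 / Day 30 split above.
MATRIX = [
    PlanDecision("Servers", "secops", True, 0),
    PlanDecision("SQL", "data-platform", True, 0),
    PlanDecision("Storage", "secops", True, 0),
    PlanDecision("Containers", "aks-platform", True, 30),
    PlanDecision("APIs", None, False, 30),  # no owner or budget yet: stays off
]

print(plans_to_enable(MATRIX, day=0))   # ['Servers', 'SQL', 'Storage']
print(plans_to_enable(MATRIX, day=30))  # ['Servers', 'SQL', 'Storage', 'Containers']
```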
Phased rollout
- Day 0 (Baseline) – Deploy management groups, core policies (deny obvious foot-guns, deploy diagnostics), subscription vending automation, and central logging.
- Day 30 (Hardening) – Review alert volume, promote high-confidence audits to deny, enable the next set of Defender plans, and extend diagnostics to tier-two services.
- Quarterly (Operate) – Measure drift (policy compliance, RBAC, diagnostics), prune unused subscriptions, update the initiative versions, and review exemptions.
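Drift for the quarterly review can be expressed as a set difference between what Git declares and what Azure reports. A minimal sketch, assuming both sides are already normalised to (scope, assignment name) pairs; how you export the live side is left open, and the sample names are hypothetical.

```python
Pair = tuple[str, str]  # (scope, assignment name)


def measure_drift(declared: set[Pair], live: set[Pair]) -> dict[str, set[Pair]]:
    """Compare assignments declared in Git with those Azure reports."""
    return {
        "missing": declared - live,    # in Git, not in Azure: deleted or never applied
        "unmanaged": live - declared,  # in Azure, not in Git: portal or script edits
    }


def drift_ratio(declared: set[Pair], live: set[Pair]) -> float:
    """Share of all known assignments that disagree; 0.0 means no drift."""
    known = declared | live
    if not known:
        return 0.0
    drift = measure_drift(declared, live)
    return (len(drift["missing"]) + len(drift["unmanaged"])) / len(known)


git = {("mg-landingzones", "alz-lz-guardrails"), ("mg-platform", "alz-platform")}
azure = {("mg-landingzones", "alz-lz-guardrails"), ("mg-landingzones", "manual-tweak")}
print(measure_drift(git, azure))
print(round(drift_ratio(git, azure), 2))  # 0.67
```

Tracking the two directions separately matters: missing entries usually mean a failed deployment, while unmanaged ones usually mean someone changed things outside the pipeline.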
Runbook: build an Azure landing zone you can scale
- Map the governance skeleton
Document the management group hierarchy, owner per scope, and where policy/role assignments will live. Keep the diagram in version control so reviews have a source of truth.
Use Terraform to pin the structure. The management group snippet after these steps shows the pattern; replace the names, parent ID, and object IDs with your own tenant's values.
- Choose IaC rails (Terraform and Bicep)
Standardise on Terraform for the hierarchy and vending, and allow Bicep for teams who prefer native Azure deployments. Everything goes through pull requests. No portal drift.
- Put privileged access on rails
Separate the platform admin identity from workload owners, enforce PIM, and document break-glass usage. Do not issue Owner at tenant root.
- Set guardrails (deny, deploy, audit)
Build initiatives for each landing zone persona (platform, mission-critical, sandbox). Start with denies for public IP foot-guns, deployIfNotExists to push diagnostic settings, and audits for controls you intend to enforce later.
- Standardise networking and DNS
Pick hub-and-spoke or vWAN, decide who approves Private Endpoints, and publish the DNS ownership model. Without it, Private Link will be a 02:00 call-out.
- Make logging non-negotiable
Push subscription activity and resource diagnostics to a central workspace, enforce RBAC for the operations team, and write down who owns the KQL queries and alert rules.
- Automate subscription vending
The pipeline must: create the subscription, move it under the right management group, assign RBAC via groups, attach policy initiative, and configure diagnostics. Fail the build if any step is skipped.
- Treat the landing zone as a product
Maintain a backlog, change log, and release notes. Pilots first, then progressive rollout via the vending pipeline. Track exemptions with expiry dates.
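The "fail the build if any step is skipped" rule from the vending step can be enforced by requiring every step to return a confirmation. In this hypothetical Python sketch the step names mirror that list and the bodies are stubs; in a real pipeline each would wrap a Terraform, Bicep, or Azure CLI call.

```python
import sys
from typing import Callable

# Order matters: later steps assume earlier ones succeeded.
REQUIRED_STEPS = (
    "create_subscription",
    "move_to_management_group",
    "assign_rbac_groups",
    "assign_policy_initiative",
    "configure_diagnostics",
)


def run_vending(steps: dict[str, Callable[[], str]]) -> dict[str, str]:
    """Run every required step in order; return step -> confirmation text.

    Raises RuntimeError if a step is missing or returns an empty confirmation,
    so the CI job fails instead of leaving a half-vended subscription.
    """
    confirmations: dict[str, str] = {}
    for name in REQUIRED_STEPS:
        if name not in steps:
            raise RuntimeError(f"step not implemented: {name}")
        result = steps[name]()
        if not result:
            raise RuntimeError(f"step returned no confirmation: {name}")
        confirmations[name] = result
    return confirmations


if __name__ == "__main__":
    # Stub steps for a dry run; each would wrap a real IaC or CLI call.
    stubs = {name: (lambda n=name: f"{n}: ok") for name in REQUIRED_STEPS}
    try:
        for line in run_vending(stubs).values():
            print(line)
    except RuntimeError as err:
        print(f"vending failed: {err}", file=sys.stderr)
        sys.exit(1)
```

Because confirmations are collected in order, the pipeline output doubles as evidence that RBAC, policy, and diagnostics were all applied.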
Before running the subscription vending pipeline, apply this Terraform snippet to create management groups and a dedicated logging workspace scope. Update the parent_management_group_id and role assignment object IDs to match your tenant.
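```hcl
variable "root_management_group_name" {
  description = "Parent for the platform groups, e.g. the tenant ID for Tenant Root."
  type        = string
}
variable "platform_team_object_id" {
  description = "Object ID of the Entra ID group that operates the platform."
  type        = string
}

data "azurerm_management_group" "root" {
  name = var.root_management_group_name
}

# Platform and LandingZones are siblings under the same parent.
resource "azurerm_management_group" "platform" {
  display_name               = "Platform"
  name                       = "mg-platform"
  parent_management_group_id = data.azurerm_management_group.root.id
}
resource "azurerm_management_group" "landingzones" {
  display_name               = "LandingZones"
  name                       = "mg-landingzones"
  parent_management_group_id = data.azurerm_management_group.root.id
}

resource "azurerm_resource_group" "operations" {
  name     = "rg-operations-core"
  location = "uksouth"
}
# Central workspace for activity logs and resource diagnostics.
resource "azurerm_log_analytics_workspace" "landing" {
  name                = "lz-law-plat"
  location            = azurerm_resource_group.operations.location
  resource_group_name = azurerm_resource_group.operations.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}
# Bind the role to the platform group, not to individuals.
resource "azurerm_role_assignment" "log_contributor" {
  scope                = azurerm_log_analytics_workspace.landing.id
  role_definition_name = "Log Analytics Contributor"
  principal_id         = var.platform_team_object_id
}
```

When creating initiatives, keep parameters lightweight so they can be reused. This Bicep skeleton creates an initiative plus an assignment at the LandingZones management group. Edit the policy definition references to match your catalogue.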
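```bicep
// Deploy at the LandingZones management group scope, for example:
// az deployment mg create --management-group-id mg-landingzones \
//   --location uksouth --template-file guardrails.bicep \
//   --parameters logAnalyticsWorkspaceId=<workspace-resource-id>
targetScope = 'managementGroup'

@description('Resource ID of the central Log Analytics workspace (lz-law-plat).')
param logAnalyticsWorkspaceId string

@description('Region for the assignment identity; deployIfNotExists needs one.')
param location string = 'uksouth'

resource initiative 'Microsoft.Authorization/policySetDefinitions@2021-06-01' = {
  name: 'alz-lz-guardrails'
  properties: {
    policyType: 'Custom'
    displayName: 'Landing Zone Guardrails'
    description: 'Baseline deny + deploy rules for workload subscriptions.'
    policyDefinitions: [
      {
        policyDefinitionReferenceId: 'deny-public-ip'
        policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/your-deny-id'
        parameters: {}
      }
      {
        policyDefinitionReferenceId: 'deploy-diagnostics'
        policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/your-deploy-id'
        parameters: {
          // The key must match the parameter name your diagnostics definition declares.
          logAnalytics: {
            value: logAnalyticsWorkspaceId
          }
        }
      }
    ]
  }
}

// The assignment takes its scope (mg-landingzones) from the deployment.
// Grant this identity the roles listed in the definition's roleDefinitionIds before remediating.
resource assignment 'Microsoft.Authorization/policyAssignments@2021-06-01' = {
  name: 'alz-lz-guardrails'
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    displayName: 'Apply baseline guardrails to LandingZones'
    policyDefinitionId: initiative.id
  }
}
```

Use the Azure CLI to verify that each subscription sends its activity log to your workspace. Replace the subscription placeholder before running, and check that the workspaceId in the output points at your central workspace.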
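```bash
az account set --subscription "<workload-subscription>"
# Subscription-level (activity log) diagnostic settings for the current subscription.
az monitor diagnostic-settings subscription list \
  --query "[?name=='send-to-law'].{name:name, workspaceId:workspaceId}" \
  --output table
```

Validation checks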
Use these KQL queries and governance checks to prove the landing zone is behaving.
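Run the KQL in your central Log Analytics workspace. The query lists subscriptions that sent activity logs in the last seven days but nothing in the last 24 hours. A subscription that has never sent any logs will not appear, so compare the result with your subscription inventory as well.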
```kusto
AzureActivity
| where TimeGenerated > ago(7d)
| summarize LastSeen = max(TimeGenerated) by SubscriptionId
| where LastSeen < ago(1d)
```
- ✓ All subscriptions sit under the intended management group with no manual assignments.
- ✓ Policy initiative compliance is ≥95% and exemptions have an expiry date.
- ✓ Log Analytics workspace receives Activity, Resource, and Defender signals within 5 minutes of change.
- ✓ Defender plans enabled match the documented matrix (Servers, SQL, Storage at Day 0; Containers from Day 30).
- ✓ Subscription vending pipeline output includes RBAC, policy assignment, and diagnostics confirmations.
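Two of these checks reduce to simple arithmetic that a scheduled job can assert. This sketch assumes you have already pulled compliant and total resource counts from your policy compliance export and a list of exemptions with optional expiry dates; the 95% threshold comes from the checklist, while the function names, exemption names, and record shapes are hypothetical.

```python
from datetime import date
from typing import Optional


def compliance_pct(compliant: int, total: int) -> float:
    """Percentage of evaluated resources that are compliant."""
    if total == 0:
        return 100.0  # nothing evaluated yet: treated as compliant (a design choice)
    return 100.0 * compliant / total


def exemption_problems(exemptions: dict[str, Optional[date]], today: date) -> list[str]:
    """Flag exemptions with no expiry date or an expiry already in the past."""
    problems = []
    for name, expires in sorted(exemptions.items()):
        if expires is None:
            problems.append(f"{name}: no expiry date")
        elif expires < today:
            problems.append(f"{name}: expired {expires.isoformat()}")
    return problems


def guardrails_healthy(compliant: int, total: int,
                       exemptions: dict[str, Optional[date]], today: date) -> bool:
    """True when compliance meets the 95% bar and every exemption is time-boxed."""
    return compliance_pct(compliant, total) >= 95.0 and not exemption_problems(exemptions, today)


today = date(2025, 1, 15)
exemptions = {"legacy-tls": None, "pip-app1": date(2025, 1, 1), "storage-fw": date(2025, 6, 1)}
print(exemption_problems(exemptions, today))
# ['legacy-tls: no expiry date', 'pip-app1: expired 2025-01-01']
```

Treating zero evaluated resources as compliant is a judgment call; flip it if an empty result should block a release.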
Common pitfalls
Landing zones are an operating model. If you do not plan for updates and drift control, your guardrails will decay.
Private connectivity is powerful, but it requires a clear DNS pattern and ownership model.
Over-modeling the org in management groups and policies makes governance brittle. Start simple, evolve intentionally.
Start with denies for clear foot-guns. Audit the rest, then tighten based on evidence.
Engineers tweaking RBAC or policy by hand will undo your work. Block portal edits outside the break-glass process.
Enabling every plan without owners creates noise and cost. Start with the ones you can respond to.
If teams cannot see what the platform changed last week, they bypass it. Publish release notes.
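The "audit first, tighten on evidence" advice can be written down as an explicit promotion rule. This sketch promotes an audit policy to deny only after a minimum observation window and when its non-compliant count is at or below a tolerance; both thresholds, the policy names, and the AuditEvidence shape are assumptions to tune, not ALZ defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvidence:
    policy: str
    days_observed: int
    non_compliant: int  # resources currently failing the audit


def promotion_candidates(evidence: list[AuditEvidence],
                         min_days: int = 30,
                         max_non_compliant: int = 0) -> list[str]:
    """Audit policies that are safe to switch to deny under the given thresholds."""
    return [
        e.policy
        for e in evidence
        if e.days_observed >= min_days and e.non_compliant <= max_non_compliant
    ]


evidence = [
    AuditEvidence("audit-public-ip", 45, 0),
    AuditEvidence("audit-tls12", 45, 12),   # legacy workloads still failing
    AuditEvidence("audit-classic", 10, 0),  # not observed long enough yet
]
print(promotion_candidates(evidence))  # ['audit-public-ip']
```

The default tolerance of zero means a deny only ships when it would block nothing that exists today.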
Rollback / back-out plan
- Policy releases – set the offending assignment's enforcement mode to DoNotEnforce (or delete the assignment); leave the definition in place so initiatives that reference it do not break. Remediate the affected resources separately.
- Vending pipeline failures – subscriptions created during failed runs should be moved to a quarantine management group, stripped of RBAC, then reprocessed once the pipeline is fixed.
- Networking/DNS rollbacks – revert to the previous hub configuration through IaC, publish the change window, and clean up stale records in the Private DNS zones after validation.
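- Logging changes – if diagnostic settings cause duplicate ingestion cost, roll back the deployIfNotExists policy assignment and reapply once the policy definition is corrected. Removing the assignment does not delete diagnostic settings it already deployed, so clean up the duplicates separately.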
Document every rollback in the platform change log and include the date you expect to re-attempt the change. Landing zones are only “done” when you can repeat them safely.
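If change-log entries are structured, the "record a re-attempt date" rule can be checked in CI before a rollback merges. A minimal sketch with a hypothetical RollbackEntry shape; the validation is the point, not the format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RollbackEntry:
    change: str                    # what was rolled back, e.g. an assignment name
    rolled_back_on: date
    reason: str
    reattempt_on: Optional[date]   # when the change will be tried again


def entry_errors(entry: RollbackEntry) -> list[str]:
    """Return reasons the entry is incomplete; an empty list means it can merge."""
    errors = []
    if not entry.reason.strip():
        errors.append("missing reason")
    if entry.reattempt_on is None:
        errors.append("missing re-attempt date")
    elif entry.reattempt_on <= entry.rolled_back_on:
        errors.append("re-attempt date must be after the rollback date")
    return errors


entry = RollbackEntry("alz-lz-guardrails", date(2025, 3, 1),
                      "duplicate diagnostics ingestion", date(2025, 3, 15))
print(entry_errors(entry))  # []
```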
Validation checks (what good looks like)
- ✓ A new subscription can be vended via pipeline and is compliant on day 0 (RBAC, policy, logging).
- ✓ Platform subscriptions have tighter controls than workload subscriptions.
- ✓ Central logging exists and core platform signals flow into it.
- ✓ Policy assignments are versioned and updated through Git.
- ✓ Networking has a defined standard and DNS has a clear owner.
- ✓ There is a process and cadence to keep the landing zone current.
A landing zone is how you make “secure by default” real. The fastest teams are not the ones with no controls. They are the ones with standard controls that do not require constant negotiation.