CloudRunbook | Practical Cloud Engineering
Secure Azure networking baseline: a practical foundation for landing zones
A runbook-style secure networking baseline for Azure: hub/spoke vs vWAN, DNS ownership, private endpoints, egress control, and inbound protection. Built to scale.
- Pick one topology (hub/spoke or vWAN) and document the contract: address space, DNS, egress, support model.
- Keep DNS, firewalls, and Private Endpoints in central subscriptions; workloads consume, they don’t invent.
- Day 0: deploy hubs, NAT/firewall, DNS resolver, and policy guardrails. Day 30: tighten denies and logging. Quarterly: review flow logs + exemptions.
- Collect flow logs, firewall logs, and DNS logs centrally or you can’t answer basic incident questions.
- Do not enable every Defender/WAF/Firewall SKU “just in case” unless someone owns the alerts and cost.
Why networking is the landing zone multiplier
- Connectivity decisions touch every workload, so bad ones scale faster than any policy.
- DNS and Private Endpoint ownership determines whether data services stay available.
- Egress and firewall controls are where real risk and cost live; without standards every team builds their own.
- A consistent network makes subscription vending predictable: NSGs, diagnostics, and routes arrive already compliant.
What good looks like
- One hub per region (or vWAN hub) owned by the platform team, with well-known subnets, firewall/NAT, and DNS resolvers.
- Private DNS zones centralised, linked via automation, with platform ownership and change control.
- Policy guardrails that block public exposure, force NSG baselines, and link spokes to approved hubs.
- Subscription vending that wires new spokes to the right hub, assigns NSG baselines, and ensures diagnostics are enabled.
- Operational runbooks for private endpoint onboarding, egress change requests, and incident response.
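As an illustration of "centralised, linked via automation", the sketch below keeps one Private DNS zone in a platform resource group and links it to every spoke passed in. It is a minimal example, not a full zone catalogue: the `spoke_vnet_ids` variable, the resource group name, and the choice of the blob zone are assumptions made for the example, and the `azurerm` provider is assumed to be configured against the platform subscription.

```hcl
# Hypothetical input: spoke name => VNet resource ID, supplied by subscription vending.
variable "spoke_vnet_ids" {
  description = "Map of spoke name to VNet resource ID."
  type        = map(string)
  default     = {}
}

# Platform-owned resource group for all Private DNS zones (name is an assumption).
resource "azurerm_resource_group" "dns" {
  name     = "rg-private-dns"
  location = "uksouth"
}

# One zone per Private Link service; blob storage is shown as an example.
resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = azurerm_resource_group.dns.name
}

# One link per spoke, so adding a spoke to the map is the only change needed.
resource "azurerm_private_dns_zone_virtual_network_link" "spokes" {
  for_each              = var.spoke_vnet_ids
  name                  = "link-${each.key}"
  resource_group_name   = azurerm_resource_group.dns.name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = each.value
  registration_enabled  = false # Private Link zones hold endpoint records, not VM auto-registrations
}
```

Because the links are driven by a map, removing a retired spoke from the input also removes its link on the next apply, which helps avoid dangling links.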
Baseline decision points
- Topology – Hub/spoke is simpler and suits most estates; vWAN only when you truly need global managed transit. Pick one and stick to it.
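- IP plan – Reserve blocks per region, per environment, and per hub. Avoid overlaps with on-premises or other cloud ranges if hybrid connectivity is likely, because overlapping ranges cannot be peered or routed cleanly.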
- DNS ownership – Decide where private zones live, who approves record changes, and how resolver rules are deployed.
- Egress strategy – NAT Gateway for simple outbound, Azure Firewall when you need filtering/logging, proxy only if compliance forces it.
- Inbound pattern – Front Door or Application Gateway with WAF; no ad-hoc public IPs.
- Segmentation – Subscriptions, VNets, NSGs. Decide how prod vs non-prod are separated and who can create virtual networks.
- Logging + monitoring – NSG flow logs, Azure Firewall logs, DNS logs all land in a central workspace with retention defined.
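The IP plan decision can be written down as code so that blocks are derived rather than hand-picked. The sketch below uses Terraform's built-in `cidrsubnet` function; the `10.100.0.0/14` supernet, the region list, and the hub/prod/non-prod split are assumptions chosen for illustration, not a recommended sizing.

```hcl
locals {
  # Hypothetical supernet reserved for Azure across the estate.
  azure_supernet = "10.100.0.0/14"

  # Each region gets a fixed index so its block never moves.
  region_index = {
    uksouth = 0
    ukwest  = 1
  }

  # One /15 per region: uksouth = 10.100.0.0/15, ukwest = 10.102.0.0/15.
  region_blocks = {
    for region, idx in local.region_index :
    region => cidrsubnet(local.azure_supernet, 1, idx)
  }

  # Split each region block into a hub /16 and two spoke /17 pools.
  ip_plan = {
    for region, block in local.region_blocks :
    region => {
      hub     = cidrsubnet(block, 1, 0) # uksouth: 10.100.0.0/16
      prod    = cidrsubnet(block, 2, 2) # uksouth: 10.101.0.0/17
      nonprod = cidrsubnet(block, 2, 3) # uksouth: 10.101.128.0/17
    }
  }
}

output "ip_plan" {
  description = "Reserved blocks per region and environment."
  value       = local.ip_plan
}
```

Deriving every block from one supernet makes overlaps structurally impossible inside Azure; overlaps with on-premises ranges still need to be checked against whatever IPAM record you keep.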
Signal vs noise
- Enable now: Controlled egress, NSG baselines, DNS logging, Defender for Cloud plan for networking (if you act on it), Azure Firewall logging.
- Enable later (Day 30): Micro-segmentation policies, advanced DDoS plans, Front Door Premium everywhere—only after Day 0 telemetry is stable.
- Maybe never: Full IDS/IPS on every workload subnet without owners, enabling every firewall SKU “for completeness,” or multi-layer proxies when a single firewall would do. Every log without an incident playbook is noise.
Phased rollout
- Day 0 baseline – Deploy regional hubs, NAT/firewall, DNS resolver, Private DNS zones, telemetry routing, and guardrail policies. Onboard a pilot workload.
- Day 30 hardening – Promote audit policies to deny, enable traffic analytics, integrate firewall alerts with SOC tooling, and validate DNS/Private Endpoint processes.
- Quarterly review – Run flow-log reviews, check Private Endpoint inventory vs DNS zones, renew firewall policies, and prune unused VNets/subscriptions.
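The Day 0 to Day 30 step of promoting audit policies to deny can be reduced to a one-variable change if the assignment takes its effect from a parameter. This is a sketch under assumptions: the variable names are hypothetical, and it only works for policy definitions that expose an `effect` parameter accepting `Audit` and `Deny`.

```hcl
variable "management_group_id" {
  description = "Resource ID of the landing zone management group."
  type        = string
}

variable "policy_definition_id" {
  description = "ID of a policy definition that exposes an 'effect' parameter (assumption)."
  type        = string
}

variable "guardrail_effect" {
  description = "Start with Audit on Day 0, switch to Deny once telemetry is clean."
  type        = string
  default     = "Audit"

  validation {
    condition     = contains(["Audit", "Deny"], var.guardrail_effect)
    error_message = "guardrail_effect must be Audit or Deny."
  }
}

resource "azurerm_management_group_policy_assignment" "guardrail" {
  name                 = "net-guardrail" # management group assignment names are limited to 24 characters
  management_group_id  = var.management_group_id
  policy_definition_id = var.policy_definition_id

  # The effect is data, so hardening is a reviewed variable change rather than a rewrite.
  parameters = jsonencode({
    effect = { value = var.guardrail_effect }
  })
}
```

Keeping the effect in a variable also gives the rollback path for free: reverting the variable returns the assignment to audit mode.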
Topology trade-offs
- Hub/spoke – Great for single/few regions, centralised control, and simpler routing. Works well for most landing zones.
- vWAN – Use only if you need multiple regions/branches, dynamic routing to SD-WAN/ExpressRoute, or simplified global connectivity.
Letting teams pick their own topology is how you end up debugging overlapping address spaces at 3am.
Runbook: secure networking baseline
- Publish the network contract
Document which topology you support, what an application team gets by default, how DNS/Private Endpoints work, and where requests go. Keep it to one page and version it.
- Build the regional hub
Use Terraform or Bicep to create the hub VNet, firewall/NAT, DNS resolver, logging workspace, and any shared bastion. Apply route tables and diagnostics upfront.
- Define spoke onboarding
Automate VNet creation, peering to the hub, NSG/ASG assignments, and Private Endpoint subnets. Subscription vending should trigger this flow.
- Standardise DNS
Store private DNS zones in the platform subscription, link them to spokes automatically, and document the change request path. If Private Endpoint requests arrive before DNS is ready, delay the workload.
- Control egress
Decide whether NAT Gateway or Azure Firewall handles outbound, then configure route tables and diagnostics so every spoke flows through it.
- Collect logs + alerts
Enable NSG flow logs, firewall diagnostics, and DNS logs to the same Log Analytics workspace. Wire alerts (high deny rates, unusual DNS lookups) into your SOC tooling.
- Keep it current
Track changes in source control, pilot major updates, and review exemptions quarterly. Networking is never “done,” but it should be boring.
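The spoke onboarding and egress steps above could be automated along the following lines once a hub exists. This is a sketch, not the vending pipeline itself: all variables, names, and the `10.101.0.0/24` range are hypothetical, and it assumes the hub and spoke are reachable with the same provider credentials (a hub in a separate subscription would need an aliased provider for the hub-side peering).

```hcl
variable "hub_vnet_id" { type = string }             # resource ID of the approved hub VNet
variable "hub_vnet_name" { type = string }           # hub VNet name, for the hub-side peering
variable "hub_resource_group_name" { type = string } # resource group that holds the hub
variable "firewall_private_ip" { type = string }     # private IP of the hub firewall

resource "azurerm_resource_group" "spoke" {
  name     = "rg-network-spoke-app1-uks"
  location = "uksouth"
}

resource "azurerm_virtual_network" "spoke" {
  name                = "vnet-spoke-app1-uks"
  address_space       = ["10.101.0.0/24"] # must not overlap the hub or other spokes
  location            = azurerm_resource_group.spoke.location
  resource_group_name = azurerm_resource_group.spoke.name
}

resource "azurerm_subnet" "workload" {
  name                 = "snet-workload"
  resource_group_name  = azurerm_resource_group.spoke.name
  virtual_network_name = azurerm_virtual_network.spoke.name
  address_prefixes     = ["10.101.0.0/26"]
}

# Peering has to exist in both directions before traffic flows.
resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  name                      = "peer-app1-to-hub"
  resource_group_name       = azurerm_resource_group.spoke.name
  virtual_network_name      = azurerm_virtual_network.spoke.name
  remote_virtual_network_id = var.hub_vnet_id
  allow_forwarded_traffic   = true # accept traffic the firewall forwards back
}

resource "azurerm_virtual_network_peering" "hub_to_spoke" {
  name                      = "peer-hub-to-app1"
  resource_group_name       = var.hub_resource_group_name
  virtual_network_name      = var.hub_vnet_name
  remote_virtual_network_id = azurerm_virtual_network.spoke.id
  allow_forwarded_traffic   = true
}

# Default route sends all outbound traffic through the hub firewall.
resource "azurerm_route_table" "spoke" {
  name                = "rt-spoke-app1-uks"
  location            = azurerm_resource_group.spoke.location
  resource_group_name = azurerm_resource_group.spoke.name

  route {
    name                   = "default-via-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = var.firewall_private_ip
  }
}

resource "azurerm_subnet_route_table_association" "workload" {
  subnet_id      = azurerm_subnet.workload.id
  route_table_id = azurerm_route_table.spoke.id
}
```

Creating the route table and its association in the same module as the peering is one way to avoid the mismatched-route-table problem that manual peering tends to cause.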
Before you wire the vending pipeline, lay out the hub using Terraform. Swap the address prefixes, resource group, and firewall SKU for your estate.
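# Resource group that holds the shared hub networking resources.
resource "azurerm_resource_group" "hub" {
  name     = "rg-network-hub-uks"
  location = "uksouth"
}

# Hub VNet; spokes peer to this network.
resource "azurerm_virtual_network" "hub" {
  name                = "vnet-hub-uks"
  address_space       = ["10.100.0.0/16"]
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name
}

# Azure Firewall requires a subnet named exactly "AzureFirewallSubnet", /26 or larger.
resource "azurerm_subnet" "firewall" {
  name                 = "AzureFirewallSubnet"
  resource_group_name  = azurerm_resource_group.hub.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.100.1.0/26"]
}

# Static Standard-SKU public IP referenced by the firewall's ip_configuration.
resource "azurerm_public_ip" "fw" {
  name                = "pip-fw-hub-uks"
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_firewall" "hub" {
  name                = "fw-hub-uks"
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name
  sku_name            = "AZFW_VNet"
  sku_tier            = "Standard"

  ip_configuration {
    name                 = "fw-ipconfig"
    subnet_id            = azurerm_subnet.firewall.id
    public_ip_address_id = azurerm_public_ip.fw.id
  }
}
Policy guardrails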
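Use Bicep to assign an initiative that requires every spoke to peer to the approved hub and deploys NSG diagnostics; a standard route table policy can be added to the same initiative in the same way. Replace the placeholder policy definition IDs with the ones for your environment, and deploy the file at the management group that holds your landing zones.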
// Policy set definitions and assignments here are deployed at management group scope.
// The management group targeted by the deployment is the assignment scope.
targetScope = 'managementGroup'

@description('Resource ID of the approved hub VNet that spokes must peer to.')
param hubResourceId string

resource initiative 'Microsoft.Authorization/policySetDefinitions@2021-06-01' = {
  name: 'alz-network-guardrails'
  properties: {
    policyType: 'Custom'
    displayName: 'Landing Zone Network Guardrails'
    description: 'Force spokes to peer to approved hubs and send diagnostics.'
    policyDefinitions: [
      {
        policyDefinitionReferenceId: 'require-hub-peering'
        policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/<policy-id-peer>'
        parameters: {
          hubVnetId: { value: hubResourceId }
        }
      }
      {
        policyDefinitionReferenceId: 'deploy-nsg-diagnostics'
        policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/<policy-id-nsg>'
        parameters: {}
      }
    ]
  }
}

// If the diagnostics policy uses deployIfNotExists, the assignment also needs
// a managed identity and a location so that remediation can run.
resource assignment 'Microsoft.Authorization/policyAssignments@2021-06-01' = {
  name: 'alz-network-guardrails'
  properties: {
    displayName: 'Apply network guardrails to LandingZones'
    policyDefinitionId: initiative.id
  }
}
Validation checks
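Audit routing, DNS, and flow logs regularly. This KQL snippet lists virtual networks whose most recent flow-log record is more than 24 hours old. Run it in the workspace that receives Traffic Analytics data, over a time range longer than one day (seven days, for example), and adjust the column name to match your schema. A spoke that has never sent flow logs will not appear at all, so compare the result against your VNet inventory.
// Latest flow-log record per virtual network, keeping only those silent for over a day.
AzureNetworkAnalytics_CL
| summarize LastSeen = max(TimeGenerated) by VirtualNetwork_s
| where LastSeen < ago(1d)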
- ✓ All spokes are peered to an approved hub and inherit the correct route tables.
- ✓ Private DNS zones are linked to active spokes; no dangling links from retired subscriptions.
- ✓ NSG flow logs and Azure Firewall logs arrive within 5 minutes of traffic.
- ✓ Defender for Cloud network alerts are triaged within the platform/SOC SLA.
- ✓ Subscription vending emits proof of DNS linking, NSG assignment, and diagnostics on completion.
Common pitfalls
- Private Endpoints fail silently when every team owns its own zone. Keep zones central and automate linking.
- Each bespoke firewall burns budget and creates conflicting rule sets. Centralise unless regulation forces otherwise.
- Flow logs and firewall logs labelled “later” never arrive. Enable them with the hub and bake into IaC.
- Manual peering with mismatched route tables causes asymmetric routing. Force automation through vending.
- Advanced Defender/WAF plans without SOC routing are just expensive noise. Wire alerts into tooling with clear owners.
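To keep logging from becoming a "later" task, the diagnostic setting can ship in the same module as the hub. The sketch below is one way to do that under stated assumptions: the variable names, workspace name, and 90-day retention are hypothetical, and the `enabled_log` block assumes a reasonably recent `azurerm` provider (3.x or later).

```hcl
variable "firewall_id" {
  description = "Resource ID of the hub Azure Firewall."
  type        = string
}

variable "logging_resource_group_name" {
  description = "Existing resource group for the central workspace (assumption)."
  type        = string
}

# Central workspace with retention defined up front.
resource "azurerm_log_analytics_workspace" "network" {
  name                = "log-network-uks"
  location            = "uksouth"
  resource_group_name = var.logging_resource_group_name
  sku                 = "PerGB2018"
  retention_in_days   = 90
}

# Send every firewall log category to the workspace as soon as the firewall exists.
resource "azurerm_monitor_diagnostic_setting" "firewall" {
  name                           = "diag-fw-to-law"
  target_resource_id             = var.firewall_id
  log_analytics_workspace_id     = azurerm_log_analytics_workspace.network.id
  log_analytics_destination_type = "Dedicated" # resource-specific tables

  enabled_log {
    category_group = "allLogs"
  }
}
```

The same pattern applies to other hub resources: a diagnostic setting declared next to the resource it targets cannot be forgotten without someone deleting code in review.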
Rollback / back-out plan
- Failed policy rollout – Disable the specific assignment, not the entire initiative. Fix definition parameters, re-run on a pilot subscription, then reapply.
- Firewall change gone wrong – Revert the IaC change, redeploy the previous version, and document the blocked traffic. Keep last-known-good configs in source control.
- DNS misconfiguration – Remove the faulty Private Endpoint link, flush caches, and validate before re-linking. Document the change window to avoid surprise outages.
- Spoke onboarding failure – Move the subscription to a quarantine management group, delete partially created VNets, and rerun the pipeline after the fix.
Networking baselines succeed when they are boring to operate. Make decisions once, automate them ruthlessly, collect the logs, and review them often.
If workloads can egress however they like, you lose control of data exfiltration paths and troubleshooting differs from spoke to spoke.
If public IPs are easy to create, they will be created. Use policy guardrails and standard ingress patterns.
Every exception becomes operational debt. Track them. Revisit them. Delete them where possible.
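The point about public IPs being too easy to create can be turned into a guardrail with a small custom policy. This is an illustrative sketch: the `landing_zone_mg_id` variable and the policy names are hypothetical, and the rule is deliberately blunt, denying every public IP address resource under the management group it is assigned to.

```hcl
variable "landing_zone_mg_id" {
  description = "Resource ID of the management group that contains workload subscriptions."
  type        = string
}

# Custom definition stored at the management group so child subscriptions can use it.
resource "azurerm_policy_definition" "deny_public_ip" {
  name                = "deny-public-ip"
  policy_type         = "Custom"
  mode                = "All"
  display_name        = "Deny creation of public IP addresses"
  management_group_id = var.landing_zone_mg_id

  policy_rule = jsonencode({
    if = {
      field  = "type"
      equals = "Microsoft.Network/publicIPAddresses"
    }
    then = {
      effect = "deny"
    }
  })
}

# Assign to workload landing zones only, not to the platform management group.
resource "azurerm_management_group_policy_assignment" "deny_public_ip" {
  name                 = "deny-public-ip"
  management_group_id  = var.landing_zone_mg_id
  policy_definition_id = azurerm_policy_definition.deny_public_ip.id
}
```

Scoping matters here: the hub firewall and standard ingress such as Application Gateway need public IPs of their own, so they either live outside the assigned scope or receive a tracked exemption.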
Success criteria for secure networking baseline
- ✓ A single standard topology exists (hub/spoke or vWAN) and is documented.
- ✓ DNS zones and private resolution have a clear owner and pattern.
- ✓ Private endpoints work consistently across workload VNets (zone linking is standardized).
- ✓ Egress is controlled and logged (NAT or firewall/proxy, but consistent).
- ✓ Inbound exposure uses standard WAF patterns; random public IP creation is restricted.
- ✓ Flow/security logs are centralized with defined retention.
Networking becomes “secure” when the safe path is the default path. Standardize the topology, own DNS, control egress, and the rest of your landing zone becomes easier to run.