Response Policy Zones (RPZ): Internal Overrides & Security
Why This Matters
Response Policy Zones (RPZ) provide DNS-level access control and security filtering. They let you intercept and modify DNS responses at the infrastructure level, before the application sees them.
Real scenarios:
Scenario 1: Malware Blocking
├── Query: malware.badsite.com
├── RPZ rule: malware.badsite.com → NXDOMAIN
└── Result: Query blocked at DNS level, before any connection is attempted
Scenario 2: Internal Service Redirect
├── Query: api.internal-test.example.com (from staging)
├── RPZ rule: api.internal-test.example.com → 10.0.2.100 (staging IP)
└── Result: Redirect to internal IP, bypass public endpoint
Scenario 3: Compliance Override
├── Query: external-service.com (not allowed in prod)
├── RPZ rule: external-service.com → NXDOMAIN
└── Result: Compliance enforced at DNS level
RPZ Fundamentals
What is RPZ?
RPZ is a policy layer that a recursive resolver applies on top of normal DNS answers. It can:
- Block queries (return NXDOMAIN)
- Redirect queries (return internal IP)
- Pass through (allow normally)
- Log queries (audit trail)
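The first three actions map onto standard RPZ record encodings: `CNAME .` for NXDOMAIN, an `A`/`AAAA` record for a redirect, and `CNAME rpz-passthru.` for pass-through. The sketch below renders those lines from a simple rule description. The function name `rpz_record` and the action labels are assumptions made for illustration, not part of any RPZ tooling.

```python
from ipaddress import ip_address
from typing import Optional


def rpz_record(domain: str, action: str, target: Optional[str] = None) -> str:
    """Render one RPZ zone-file line for a hypothetical (domain, action) rule.

    Owner names are written relative to the policy zone, so they carry
    no trailing dot.
    """
    name = domain.rstrip(".").lower()
    if action == "block":
        # "CNAME ." tells the resolver to answer NXDOMAIN.
        return f"{name} CNAME ."
    if action == "passthru":
        # "CNAME rpz-passthru." exempts the name from policy rewriting.
        return f"{name} CNAME rpz-passthru."
    if action == "redirect":
        if target is None:
            raise ValueError("redirect needs a target IP address")
        ip = ip_address(target)  # raises ValueError on malformed input
        rtype = "A" if ip.version == 4 else "AAAA"
        return f"{name} {rtype} {ip}"
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    rules = [
        ("malware.badsite.com", "block", None),
        ("api.internal-test.example.com", "redirect", "10.0.2.100"),
        ("legitimate-domain.com", "passthru", None),
    ]
    for domain, action, target in rules:
        print(rpz_record(domain, action, target))
```

Logging is not encoded in a record. Resolvers typically expose it as a per-zone setting instead.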
Normal Zone Flow:
Query: api.example.com
↓
Check records
↓
Return A 35.201.100.50
RPZ Flow:
Query: api.example.com
↓
Check RPZ rules first
↓
Rule matches: Return 10.0.1.100 (internal IP)
↓
Return 10.0.1.100
GCP Cloud DNS RPZ Support
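Important: GCP Cloud DNS does not consume standard RPZ zone files or feeds (as of writing). Its response policies feature offers comparable per-network overrides, so evaluate it before adopting one of the alternatives below.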
Alternatives in GCP:
- Custom Corefile in GKE (via CoreDNS plugins)
- Firewall rules (layer 4, not DNS)
- Cloud Armor (layer 7, not DNS)
- Unbound / custom DNS proxy (DIY approach)
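The DIY approach in the last bullet boils down to the policy-first flow described earlier: consult a policy table, and only forward upstream when no rule rewrites the answer. The following is a toy sketch of that decision logic, not a DNS server. `POLICY`, `UPSTREAM`, and the function names are hypothetical, and real RPZ supports more triggers and precedence rules than shown here.

```python
from typing import Optional, Tuple

# Hypothetical policy table: name -> (action, target).
# "*.zone" entries cover every subdomain of that zone.
POLICY = {
    "malware.badsite.com": ("nxdomain", None),
    "*.badsite.com": ("nxdomain", None),
    "safe.badsite.com": ("passthru", None),
    "api.example.com": ("redirect", "10.0.1.100"),
}

# Stand-in for the upstream resolver (documentation-range addresses).
UPSTREAM = {
    "api.example.com": "35.201.100.50",
    "safe.badsite.com": "203.0.113.7",
    "www.example.org": "203.0.113.10",
}


def match_policy(qname: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return the matching rule: exact name first, then closest wildcard."""
    name = qname.rstrip(".").lower()
    if name in POLICY:
        return POLICY[name]
    labels = name.split(".")
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in POLICY:
            return POLICY[wildcard]
    return None


def resolve(qname: str) -> Tuple[str, Optional[str]]:
    """Apply policy before the normal lookup; return (rcode, answer)."""
    rule = match_policy(qname)
    if rule is not None:
        action, target = rule
        if action == "nxdomain":
            return ("NXDOMAIN", None)
        if action == "redirect":
            return ("NOERROR", target)
        # "passthru" falls through to the normal lookup below.
    answer = UPSTREAM.get(qname.rstrip(".").lower())
    return ("NOERROR", answer) if answer else ("NXDOMAIN", None)


if __name__ == "__main__":
    for q in ("api.example.com", "c2.deep.badsite.com", "safe.badsite.com"):
        print(q, resolve(q))
```

Checking the exact name before wildcards is what lets a single pass-through entry carve an exception out of a broader block.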
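For on-premises BIND DNS with RPZ:
bind
# response-policy is only valid inside options {} or a view {};
# merge this into your existing options block.
options {
    # Apply RPZ to recursive queries
    response-policy {
        zone "rpz.internal";
    };
};
zone "rpz.internal" {
    type master;
    file "/etc/bind/rpz/internal-policy.zone";
};
zone "." {
    type hint;
    file "/etc/bind/db.root";
};
GKE Implementation via CoreDNS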
CoreDNS Rewrite Plugin (Closest to RPZ)
corefile
.:53 {
    cache 30
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    # Rewrite/redirect (similar to RPZ behavior)
    rewrite name regex (.*)\.internal-test\.example\.com {1}.internal.prod.example.com
    # Block a domain and its subdomains with NXDOMAIN.
    # Plugin syntax is: template CLASS TYPE [ZONE...]
    template IN ANY malware.badsite.com {
        rcode NXDOMAIN
    }
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
}
Custom DNS Proxy (Full RPZ)
Deploy custom DNS proxy with RPZ support:
bash
# Option 1: Unbound with RPZ
docker run -d \
    --name=dns-proxy \
    --net=host \
    --volume=/etc/unbound:/etc/unbound:ro \
    nlnetlabs/unbound:latest
# Option 2: Bind with RPZ
docker run -d \
    --name=dns-proxy \
    --net=host \
    --volume=/etc/bind:/etc/bind:ro \
    internetsystemsconsortium/bind9:latest
Production Patterns
Pattern 1: Malware Block List
Maintain list of known malware domains:
- ransomware.badguys.com
- c2.attacker.io
- phishing.site.com
RPZ Rule: Block all
Block any query to these domains → NXDOMAIN
Result:
✓ Prevents infection propagation
✓ Blocks C&C communication
✓ Audit trail of attempts
Pattern 2: Internal Service Redirect
Scenario: The staging environment must stay isolated from external APIs
Public: api.example.com → 35.201.100.50 (production)
Internal (staging): api.example.com → 10.0.2.100 (staging mock)
RPZ Rule:
From staging VPC: api.example.com → 10.0.2.100
From prod VPC: api.example.com → 35.201.100.50 (normal)
Result:
✓ Staging doesn't call prod APIs (safety)
✓ No code changes needed
✓ Enforced at DNS level
Pattern 3: Compliance Override
Requirement: Certain domains not accessible from prod
RPZ Rule:
social-media.com → NXDOMAIN (from prod)
news-site.com → NXDOMAIN (from prod)
Result:
✓ Blocks non-business services
✓ Audit trail for compliance
✓ Hard to bypass if clients cannot reach other resolvers (DNS-level)
Implementation Strategy
Strategy 1: DNS Firewall (Using Cloud Armor)
Note: Cloud Armor filters HTTP(S) requests at the load balancer and cannot match DNS query names, so it complements DNS filtering rather than replacing it.
bash
# Create security policy
gcloud compute security-policies create dns-policy \
    --description="DNS filtering policy"
# Allow requests that originate from the US
gcloud compute security-policies rules create 1000 \
    --security-policy=dns-policy \
    --action=allow \
    --expression='origin.region_code == "US"'
# Deny requests that match the preconfigured XSS rule
gcloud compute security-policies rules create 1001 \
    --security-policy=dns-policy \
    --action=deny-403 \
    --expression='evaluatePreconfiguredExpr("xss-stable")'
Strategy 2: GKE CoreDNS Rewrite
yaml
# Assumes the cluster's Corefile imports *.override files from this ConfigMap
# (for example via "import custom/*.override"); otherwise these rules are ignored.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  internal.override: |
    # Block external APIs from prod
    template IN ANY payment-processor.com {
      rcode NXDOMAIN
    }
    template IN ANY third-party-service.io {
      rcode NXDOMAIN
    }
    # Redirect staging queries
    rewrite name regex (.*)\.staging-api\.example\.com {1}.internal-staging.example.com
    # Custom responses
    template IN A compliance-check.internal.company.com {
      answer "{{ .Name }} 60 IN A 10.0.1.100"
    }
Strategy 3: Custom Proxy (Most Control)
dockerfile
# Dockerfile for custom DNS proxy
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y unbound && \
    rm -rf /var/lib/apt/lists/*
COPY unbound.conf /etc/unbound/
COPY rpz.zone /etc/unbound/
CMD ["unbound", "-d"]
Monitoring & Auditing
Enable Query Logging
bash
# Enable DNS query logging for the VPC network
gcloud dns policies create rpz-log \
    --description="Log DNS queries for policy auditing" \
    --networks=default \
    --enable-logging
# View queries answered with NXDOMAIN (Cloud DNS logs carry no "blocked" flag,
# so the response code is used as a proxy)
gcloud logging read 'resource.type="dns_query" AND jsonPayload.responseCode="NXDOMAIN"' \
    --limit=100 \
    --format=json | jq '.[] | {domain: .jsonPayload.queryName, rcode: .jsonPayload.responseCode}'
Alert on Policy Violations
bash
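# Create alert for high block rate (NXDOMAIN responses used as a proxy for blocks)
gcloud alpha monitoring policies create \
    --notification-channels=CHANNEL_ID \
    --display-name="High DNS block rate" \
    --combiner=OR \
    --condition-display-name="NXDOMAIN response count above threshold" \
    --condition-filter='resource.type="dns_query" AND metric.type="dns.googleapis.com/query/response_count" AND metric.labels.response_code="NXDOMAIN"' \
    --if="> 100" \
    --duration=300s
Troubleshooting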
Issue 1: RPZ Rule Not Blocking
bash
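# Debug:
# 1. Verify RPZ rule syntax: ensure the BIND config and policy zone are valid
named-checkconf /etc/bind/named.conf
named-checkzone rpz.internal /etc/bind/rpz/internal-policy.zone
# 2. Check that RPZ is applied on the right resolver:
#    the zone must be listed in the response-policy section
grep -A 5 "response-policy" /etc/bind/named.conf
# 3. Test the query against the RPZ resolver and check that NXDOMAIN is returned
#    (avoid +trace: it queries authoritative servers directly and bypasses RPZ)
dig @192.168.1.10 malware.badsite.com
# 4. Check logs
tail -f /var/log/syslog | grep -i rpz
Issue 2: False Positives (Blocking Legitimate Domain)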
bash
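# Symptom: Service broken after RPZ rule added
# Debug:
# 1. Identify which queries are blocked (NXDOMAIN used as a proxy for blocks)
gcloud logging read 'resource.type="dns_query" AND jsonPayload.responseCode="NXDOMAIN"' --limit=100
# 2. Verify the rule is correct
grep "legitimate-domain.com" /etc/bind/rpz/internal-policy.zone
# 3. Temporarily allow for debugging:
#    remove the rule, reload, test, and re-add if needed
# 4. Implement an allowlist:
#    add "legitimate-domain.com CNAME rpz-passthru." to a policy zone listed
#    BEFORE the blocklist zone in response-policy (zone order sets precedence)
Security Considerations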
Data Privacy
DNS query logs (including RPZ hit logs) reveal which domains internal systems access:
Queries logged: all domains accessed by internal systems
Risk: Sensitive queries could be exposed
Mitigation:
1. Restrict access to DNS logs
Only ops/security team can view
2. Anonymize/summarize
Log the registered domain only, not the full query name
3. Retention policy
Auto-delete logs after 30 days
RPZ Cache Poisoning
If a rewritten answer stays cached after its rule changes (a stale-cache problem rather than classic poisoning):
Query: api.example.com
RPZ Match: → Redirect to 10.0.1.100
Cache: 3600 seconds
If RPZ rule removed later:
Old cached result still used by downstream caches and clients
→ Query still redirects incorrectly until the TTL expires
Mitigation:
1. Lower the TTL on RPZ answers (60 seconds)
2. Flush cache when rules change
3. Monitor cache hit rates
Best Practices
- Whitelist-based approach (allow what's needed, block everything else)
- Separate RPZ rules by team/environment (easier audit)
- Regular review of RPZ rules (remove obsolete rules)
- Monitor block rates (a spike can indicate misconfiguration or an active infection)
- Document rationale for each block rule
- Test before production (RPZ impact can be severe)
- Logging + alerting (detect policy violations)
- Include bypass mechanism (emergency access)
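Several of these practices (documented rationale, regular review, separation by environment) can be checked mechanically if each rule carries a little metadata. Below is a minimal sketch under that assumption. The `Rule` fields, the `audit` function, and the 90-day review window are all hypothetical choices, not an established schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Rule:
    """Hypothetical metadata kept alongside each RPZ rule."""
    domain: str
    action: str          # e.g. "block", "redirect", "passthru"
    environment: str     # e.g. "prod", "staging"
    rationale: str       # why the rule exists
    last_reviewed: date  # when someone last confirmed it is still needed


def audit(rules: List[Rule], today: date, max_age_days: int = 90) -> List[str]:
    """Return human-readable findings for rules that need attention."""
    findings = []
    for rule in rules:
        label = f"{rule.environment}/{rule.domain}"
        if not rule.rationale.strip():
            findings.append(f"{label}: missing rationale")
        if today - rule.last_reviewed > timedelta(days=max_age_days):
            findings.append(f"{label}: review overdue")
    return findings


if __name__ == "__main__":
    rules = [
        Rule("c2.attacker.io", "block", "prod",
             "Known C&C endpoint", date(2024, 5, 1)),
        Rule("social-media.com", "block", "prod",
             "", date(2023, 12, 1)),
    ]
    for finding in audit(rules, today=date(2024, 6, 1)):
        print(finding)
```

Running a check like this in CI before publishing a zone is one way to keep obsolete or undocumented rules from accumulating.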