The point of policy-as-code is to encode the rules a senior engineer would catch in review, so they get caught at 2am when no senior engineer is looking. Done well it speeds teams up because they stop waiting on a human to approve obvious things. Done badly it is a black box that blocks a deploy with a message nobody can decode, and the team's response is to get the policy disabled for their pipeline. We have cleaned up both.
Run policy against the plan, not the code
Linting HCL catches style. It does not catch the security group that ends up open to the world after variable interpolation and module composition. Real guardrails evaluate the plan JSON, the output of `terraform show -json` run against a saved plan file. It lists the actual set of changes Terraform intends to make, and it is the only artifact that reflects what will really happen.
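    package main

    import rego.v1

    # Deny ingress rules whose port range covers SSH (22) for any IPv4 source.
    deny contains msg if {
        some rc in input.resource_changes
        rc.type == "aws_security_group_rule"
        rc.change.after.type == "ingress"
        "0.0.0.0/0" in rc.change.after.cidr_blocks
        rc.change.after.from_port <= 22
        rc.change.after.to_port >= 22
        msg := sprintf("SSH open to the world on %s", [rc.address])
    }

Three tiers: advisory, soft-mandatory, hard-mandatory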
Not every rule deserves to block a deploy. Sentinel's enforcement levels map well to how rules actually graduate, and we mirror the same idea with OPA by routing results into different gates.
- Advisory - prints a warning, never blocks. Where every new rule starts, so you can measure how often it would fire before you turn it on.
- Soft-mandatory - blocks, but a named approver can override with a logged reason. Good for cost ceilings and instance-size limits.
- Hard-mandatory - no override, ever. Reserve this for genuine security and compliance: no public S3 buckets, no unencrypted volumes, no IAM wildcards on admin actions.
Every new policy starts as advisory. You measure how often it would have fired before you let it block anyone.
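One way to sketch the three tiers in OPA is to publish one set of messages per tier and let the pipeline decide what each set means. This is a minimal illustration, not our production layout: the package name `terraform.tiers`, the rule names `advisory`, `soft_mandatory` and `hard_mandatory`, and the instance-type allow-list are all assumptions made for the example.

```rego
package terraform.tiers

import rego.v1

# Hypothetical allow-list; a real one would live in shared data or config.
allowed_instance_types := {"t3.micro", "t3.small", "t3.medium"}

# Advisory: reported to the user, never blocks.
advisory contains msg if {
    some rc in input.resource_changes
    rc.type == "aws_db_instance"
    rc.change.after.backup_retention_period == 0
    msg := sprintf("%s has automated backups disabled (advisory)", [rc.address])
}

# Soft-mandatory: blocks unless a named approver overrides with a logged reason.
soft_mandatory contains msg if {
    some rc in input.resource_changes
    rc.type == "aws_instance"
    itype := rc.change.after.instance_type
    not itype in allowed_instance_types
    msg := sprintf("%s requests instance type %s, which is not on the pre-approved list", [rc.address, itype])
}

# Hard-mandatory: blocks with no override.
hard_mandatory contains msg if {
    some rc in input.resource_changes
    rc.type == "aws_s3_bucket_acl"
    rc.change.after.acl in {"public-read", "public-read-write"}
    msg := sprintf("%s makes an S3 bucket public", [rc.address])
}
```

A pipeline step could then evaluate `data.terraform.tiers` against the plan JSON and route each set to its own gate: print `advisory`, require an approval record when `soft_mandatory` is non-empty, and fail outright when `hard_mandatory` is non-empty. Graduating a rule is then a matter of moving it from one set to the next.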
OPA or Sentinel?
If you are on Terraform Cloud or Enterprise, Sentinel is right there and the integration is clean. Everywhere else (self-hosted runners, GitLab, mixed tooling) we reach for OPA and Conftest, because Rego is portable and we can run the same policies against Kubernetes manifests and Dockerfiles too. Rego has a real learning curve: budget a week before your team is comfortable writing it, and keep a shared library of tested rules rather than letting every team reinvent the same S3 check.
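As a rough sketch of what "a shared library of tested rules" can look like, here is a single helper with unit tests beside it. The package name `lib.s3` and the helper name `is_public_acl` are hypothetical. The tests are ordinary rules prefixed with `test_`, which `opa test` discovers and runs.

```rego
package lib.s3

import rego.v1

# Shared helper: true when a planned bucket ACL grants public access.
is_public_acl(rc) if {
    rc.type == "aws_s3_bucket_acl"
    rc.change.after.acl in {"public-read", "public-read-write"}
}

# A public ACL must be flagged.
test_public_read_is_flagged if {
    is_public_acl({"type": "aws_s3_bucket_acl", "change": {"after": {"acl": "public-read"}}})
}

# A private ACL must pass.
test_private_is_not_flagged if {
    not is_public_acl({"type": "aws_s3_bucket_acl", "change": {"after": {"acl": "private"}}})
}

# A destroyed resource has a null "after" and must not trip the check.
test_deleted_resource_is_not_flagged if {
    not is_public_acl({"type": "aws_s3_bucket_acl", "change": {"after": null}})
}
```

Teams then import the helper rather than rewriting the check, and the edge cases (such as the null `after` on a destroy) are handled once, in one place.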
Write the error message for a tired human
The single highest-impact thing in any policy is the failure message. A good one names the resource, says exactly what rule it broke, and points to how to fix it or who can grant an exception. We treat a policy that fails with `denied by rule 47` as a bug. If the person hitting the guardrail at midnight cannot self-serve their way past it, they will find someone with admin to disable the whole thing, and now you have no guardrail at all.
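A message along those lines might be assembled as follows. Treat it as an illustration only: the rule ID `SEC-014` and the `#platform-security` contact are placeholders invented for the example, and the check itself (a publicly accessible RDS instance) is just a convenient stand-in.

```rego
package main

import rego.v1

# Placeholder metadata: substitute your own rule ID and exception contact.
rule_id := "SEC-014"

exception_contact := "#platform-security"

deny contains msg if {
    some rc in input.resource_changes
    rc.type == "aws_db_instance"
    rc.change.after.publicly_accessible == true

    # Name the rule and the resource, then say how to fix it and who can grant an exception.
    what := sprintf("[%s] %s sets publicly_accessible = true.", [rule_id, rc.address])
    how := sprintf("Fix: set it to false. Exception: ask %s.", [exception_contact])
    msg := concat(" ", [what, how])
}
```

For a resource at `aws_db_instance.main` this yields `[SEC-014] aws_db_instance.main sets publicly_accessible = true. Fix: set it to false. Exception: ask #platform-security.`, which tells the person at midnight what broke, where, and what to do next.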