Image by Brian McGowan via Unsplash (copyright-free)
Modern-day systems are complex: they have many components, and many moving parts within those components. To rationalise this complexity and protect the healthy state of the system, engineers or architects may choose to apply policies to individual components or entire systems.
To do this, we first need to know what a policy is. According to Wikipedia, a policy is a deliberate system of principles to guide decisions and achieve rational outcomes. Applied to a microservice architecture, policies can take the form of trusted services, rate limits, user access, API scopes, etc. These policies are often hard-coded in different parts of the stack, written in many different programming languages, updated through different methods, and owned by different parts of the company.
This looks like a mess, doesn't it? That is because it is, and the Open Policy Agent attempts to clean it up.
What is the Open Policy Agent?
The Open Policy Agent (often abbreviated to OPA, not that opa) is an open-source, general-purpose policy engine that enables unified, context-aware policy enforcement across the entire stack.
OPA will keep the policies consistent across our system, all while being fast, μs (microsecond) fast!
It is used by many famous names in the tech industry such as Netflix, Chef, Atlassian, Cloudflare, Pinterest, Goldman Sachs, etc.
How does it work?
At its core, the principle is simple. When a policy decision needs to be made, we query OPA by passing it a structured query as input (JSON). OPA evaluates that input against the Data (JSON) and the Policy (Rego) and returns a decision (JSON).

There are 4 main topics I want to cover in order to better understand OPA:
- Data
- Policy
- Query
- Decision
Data
Data must be provided to OPA as JSON, and it is cached in memory.
OPA imposes no rules on the structure of data, input, or output. It is recommended to structure the data in a way that makes it easy to write policy rules against it.
Example of Data that OPA will use to make Policy decisions:
{
  "alice": [
    "read",
    "write"
  ],
  "bob": [
    "read"
  ]
}
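As a rough sketch of how this document could be handed to OPA, the snippet below builds (but does not send) a request for OPA's REST Data API. It assumes an OPA server listening on `localhost:8181` and stores the document under `myapi/acl`, so that a policy can refer to it as `data.myapi.acl`. The helper name `build_data_request` is made up for illustration.

```python
import json
import urllib.request

# Assumption: a local OPA server, e.g. started with `opa run --server`.
OPA_URL = "http://localhost:8181"

ACL = {"alice": ["read", "write"], "bob": ["read"]}


def build_data_request(path, document, base_url=OPA_URL):
    """Build a PUT request that creates or overwrites the document at data.<path>."""
    return urllib.request.Request(
        url=f"{base_url}/v1/data/{path}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_data_request("myapi/acl", ACL)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` while the server is running.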
Policy
OPA policies are expressed in a high-level declarative language called Rego.
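Example of a Rego Policy that is mapped to the previously defined Data (assumed to be loaded under data.myapi.acl):
package myapi.policy

import data.myapi.acl
import input

# Deny unless a rule below says otherwise.
default allow = false

# Allow when the user's access list contains the requested access.
allow {
  access = acl[input.user]
  access[_] == input.access
}

# Set of all users whose access list contains the requested access.
whocan[user] {
  access = acl[user]
  access[_] == input.access
}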
Want to play around with OPA and Rego? Take a look at their interactive online playground.
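For readers new to Rego, here is a plain-Python approximation of what the two rules compute. It is only an illustration of the semantics, not how OPA evaluates policies internally; the function names simply mirror the rule names.

```python
ACL = {"alice": ["read", "write"], "bob": ["read"]}


def allow(acl, query):
    """Mirror of `allow`: true if the user's access list contains the requested access."""
    return query.get("access") in acl.get(query.get("user"), [])


def whocan(acl, query):
    """Mirror of `whocan`: every user whose access list contains the requested access."""
    return {user for user, access in acl.items() if query.get("access") in access}


print(allow(ACL, {"user": "alice", "access": "write"}))  # True
print(allow(ACL, {"user": "bob", "access": "write"}))    # False
print(sorted(whocan(ACL, {"access": "read"})))           # ['alice', 'bob']
```

An unknown user falls through to `False`, which matches the `default allow = false` line in the policy.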
Query & Decision
The query, just like the Data, must be JSON. It contains the values we want to check against the policy.
In the example below we are asking whether Alice has permission to write:
{
  "input": {
    "user": "alice",
    "access": "write"
  }
}
The Input above goes to OPA, which checks the Policy against the Data and gives us a Decision:
{
  "result": true
}
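A minimal sketch of issuing this query over OPA's REST Data API follows. It assumes a local OPA server on port 8181; the URL path is derived from the `myapi.policy` package and the `allow` rule, and the input uses the `access` key that the policy reads. The helper name `build_query_request` is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_query_request(user, access, base_url="http://localhost:8181"):
    """Build a POST request asking OPA to evaluate data.myapi.policy.allow."""
    payload = {"input": {"user": user, "access": access}}
    return urllib.request.Request(
        url=f"{base_url}/v1/data/myapi/policy/allow",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


query = build_query_request("alice", "write")
print(query.get_method(), query.full_url)

# Decoding a response body such as the Decision shown above:
decision = json.loads('{"result": true}')
print(decision["result"])
```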
Policy decisions are not limited to simple allow/deny answers. Like query inputs, policies can generate arbitrary structured data as output.
Note that the application still has to implement the enforcement of these decisions. If we ask OPA whether foo can access bar, and the answer is no, our application should reply with a 403 Forbidden.
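The enforcement step might look something like the sketch below. The function name `enforce` is made up for illustration. It treats a response without a `result` key as a denial, since OPA omits `result` when the queried rule is undefined.

```python
from http import HTTPStatus


def enforce(opa_response):
    """Map a decoded OPA response to the HTTP status the application should return."""
    # Only an explicit `true` grants access; false or a missing result both deny.
    allowed = opa_response.get("result") is True
    return HTTPStatus.OK if allowed else HTTPStatus.FORBIDDEN


print(enforce({"result": True}))
print(enforce({"result": False}))
print(enforce({}))
```

Failing closed on anything other than an explicit `true` keeps a misconfigured or missing policy from silently granting access.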
How can we use it?
There are several open-source projects that integrate with OPA to implement fine-grained access control. Some of interest to WorldRemit are SSH, Kong, Kafka, Istio (Envoy), and Kubernetes. For a full list, see the OPA Ecosystem.
We will focus on 2 DevOps use-cases as part of this post:
- Gatekeeper for Kubernetes
- Conftest for Configuration Files
Disclaimer!
There is no guarantee we will actually use these tools in this way. This is just me researching and playing around with them to see what they are capable of, ahead of advising on their use internally.
Gatekeeper
OPA can be leveraged in use cases beyond access control. Gatekeeper allows us to have fine-grained policies for Kubernetes compute/network/storage resources, for example:
- Limit the use of unsafe images
- Block public image registries
- Disallow certain Egress traffic rules
- Require CPU & memory limits
- Prevent Ingress conflicts
- You get the picture 🖼️
Gatekeeper deploys OPA as an Admission Controller for Kubernetes. Admission Controllers are plug-ins that intercept requests to the Kubernetes API server prior to persistence of a resource, but after the request is authenticated and authorized.

Gatekeeper makes it easy to write Admission Controllers, saving us a lot of hassle building and maintaining them. This is done by defining ConstraintTemplates, which describe both the Rego policy that enforces the constraint and the schema of the constraint.
The current plan is to use Gatekeeper as part of our overall Kubernetes Governance.
Example
Once Gatekeeper is installed on a cluster, applying Gatekeeper/OPA policies is simple. The example below lets us require the presence of certain labels before a resource is admitted to the cluster.
Create the ConstraintTemplate (CRD)
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
        listKind: K8sRequiredLabelsList
        plural: k8srequiredlabels
        singular: k8srequiredlabels
      validation:
        # Schema for the `parameters` field of the constraint
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
Specify required labels
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deploy-must-have-labels
spec:
  match:
    kinds:
      # Deployments live in the "apps" API group, not the core ("") group
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels: ["app"]
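To make the label check concrete, here is a small Python approximation of the set arithmetic in the `violation` rule. It is a sketch of the logic only, not of how Gatekeeper runs it. The function names are hypothetical and the message formatting differs slightly from Rego's `sprintf`.

```python
def missing_labels(review_object, required):
    """Return the required labels that the object does not carry."""
    provided = set(review_object.get("metadata", {}).get("labels") or {})
    return set(required) - provided


def violations(review_object, required):
    """Return one violation if any required label is missing, otherwise none."""
    missing = sorted(missing_labels(review_object, required))
    if not missing:
        return []
    return [{
        "msg": f"you must provide labels: {missing}",
        "details": {"missing_labels": missing},
    }]


unlabelled = {"kind": "Deployment", "metadata": {"name": "service-foo"}}
labelled = {"kind": "Deployment", "metadata": {"name": "service-baz", "labels": {"app": "baz"}}}

print(violations(unlabelled, ["app"]))
print(violations(labelled, ["app"]))
```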
Cool, isn't it? But what about previous deployments? Well, this is where the audit functionality comes in. It allows us to do periodic evaluations of replicated resources against the constraints enforced in the cluster to detect pre-existing misconfigurations. If we inspect the previously applied K8sRequiredLabels constraint and there are violations, we will see them under violations in the status field.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deploy-must-have-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels: ["app"]
status:
  auditTimestamp: "2020-08-31T08:32:12Z"
  byPod:
    - enforced: true
      id: gatekeeper-controller-manager-0
  violations:
    - enforcementAction: deny
      kind: Deployment
      message: 'you must provide labels: {"app"}'
      name: service-foo
    - enforcementAction: deny
      kind: Deployment
      message: 'you must provide labels: {"app"}'
      name: service-bar
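If we wanted to report on these audit results, a short script could summarise the `violations` list. The sketch below assumes the constraint has already been fetched and decoded into a Python dict (for example from `kubectl get ... -o json`). The function name `summarize_violations` is made up for illustration.

```python
# The `status` block of the constraint above, as a decoded dict.
STATUS = {
    "auditTimestamp": "2020-08-31T08:32:12Z",
    "violations": [
        {"enforcementAction": "deny", "kind": "Deployment",
         "message": 'you must provide labels: {"app"}', "name": "service-foo"},
        {"enforcementAction": "deny", "kind": "Deployment",
         "message": 'you must provide labels: {"app"}', "name": "service-bar"},
    ],
}


def summarize_violations(status):
    """Return one human-readable line per audit violation."""
    return [
        f'{v["kind"]}/{v["name"]}: {v["message"]}'
        for v in status.get("violations", [])
    ]


for line in summarize_violations(STATUS):
    print(line)
```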
Conftest
People overlook configuration files. But I would say they are just as important as access policies.
Scanning configuration files and denying insecure flags and options only meant for debugging could have prevented many Elasticsearch scandals (1, 2, 3, etc.).
Conftest allows us to do exactly this.
Conftest uses OPA to provide a user experience optimised for developers wanting to test all kinds of configuration files.
Now I know what you are thinking: how is this different from Gatekeeper? Well, Gatekeeper focuses on securing a Kubernetes cluster, while Conftest focuses on the upstream development process. Since both tools use the Open Policy Agent under the hood, using them together makes for a real end-to-end solution.
We could use Conftest to deny certain Docker images as part of our CI/CD:
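package main

image_denylist = [
  "openjdk"
]

# Deny any FROM instruction whose image name contains a denylisted string.
deny[msg] {
  input[i].Cmd == "from"
  val := input[i].Value
  # val[_] checks every element of Value; val[i] would reuse the instruction index
  contains(val[_], image_denylist[_])
  msg = sprintf("unallowed image found %s", [val])
}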
But why not take it a step further and deny specific tools?
package main

image_denylist = [
  "openjdk"
]

run_denylist = [
  "apk",
  "apt",
  "pip",
  "curl",
  "wget",
]

# Deny any FROM instruction whose image name contains a denylisted string.
deny[msg] {
  input[i].Cmd == "from"
  val := input[i].Value
  contains(val[_], image_denylist[_])
  msg = sprintf("unallowed image found %s", [val])
}

# Deny any RUN instruction that mentions a denylisted tool.
deny[msg] {
  input[i].Cmd == "run"
  val := input[i].Value
  contains(val[_], run_denylist[_])
  msg = sprintf("unallowed commands found %s", [val])
}
❯ conftest test Dockerfile
FAIL - Dockerfile - unallowed image found ["openjdk:8-jdk-alpine"]
FAIL - Dockerfile - unallowed commands found ["apk add --no-cache python3 python3-dev build-base && pip3 install awscli==1.18.1"]
2 tests, 0 passed, 0 warnings, 2 failures, 0 exceptions
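The two `deny` rules can be approximated in Python to show what they match. This sketch assumes the Dockerfile has already been parsed into a list of instructions with `Cmd` and `Value` fields, which is the shape the Rego above reads from `input`. The function name `deny_dockerfile` is hypothetical.

```python
IMAGE_DENYLIST = ["openjdk"]
RUN_DENYLIST = ["apk", "apt", "pip", "curl", "wget"]


def contains_any(values, denylist):
    """True if any value contains any denylisted string (substring match, like Rego's contains)."""
    return any(bad in value for value in values for bad in denylist)


def deny_dockerfile(instructions):
    """Return one message per instruction that uses a denied image or tool."""
    messages = []
    for instruction in instructions:
        cmd = instruction.get("Cmd")
        values = instruction.get("Value", [])
        if cmd == "from" and contains_any(values, IMAGE_DENYLIST):
            messages.append(f"unallowed image found {values}")
        if cmd == "run" and contains_any(values, RUN_DENYLIST):
            messages.append(f"unallowed commands found {values}")
    return messages


dockerfile = [
    {"Cmd": "from", "Value": ["openjdk:8-jdk-alpine"]},
    {"Cmd": "run", "Value": ["apk add --no-cache python3 python3-dev build-base && pip3 install awscli==1.18.1"]},
]

for message in deny_dockerfile(dockerfile):
    print(message)
```

Because the match is a plain substring test, a command containing a word such as "adapt" would also trip the "apt" entry, so a stricter tokenised match may be worth considering.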
What about blocking 0.0.0.0/0 in Security Groups and HTTP on an ALB in Terraform?
package main

has_field(obj, field) {
  obj[field]
}

deny[msg] {
  rule := input.resource.aws_security_group_rule[name]
  rule.type == "ingress"
  contains(rule.cidr_blocks[_], "0.0.0.0/0")
  msg = sprintf("ASG `%v` defines a fully open ingress", [name])
}

deny[msg] {
  proto := input.resource.aws_alb_listener[lb].protocol
  proto == "HTTP"
  msg = sprintf("ALB `%v` is using HTTP rather than HTTPS", [lb])
}
❯ conftest test main.tf
FAIL - main.tf - ASG `my-rule` defines a fully open ingress
FAIL - main.tf - ALB `my-alb-listener` is using HTTP rather than HTTPS
2 tests, 0 passed, 0 warnings, 2 failures, 0 exceptions
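The same checks can be sketched in Python against a parsed Terraform configuration. The nested dict layout below (`resource`, then resource type, then resource name) is an assumption inferred from the paths the Rego rules use, and `deny_terraform` is a hypothetical name.

```python
def deny_terraform(config):
    """Return messages for fully open ingress rules and plain-HTTP ALB listeners."""
    messages = []
    resources = config.get("resource", {})

    for name, rule in resources.get("aws_security_group_rule", {}).items():
        open_cidr = any("0.0.0.0/0" in cidr for cidr in rule.get("cidr_blocks", []))
        if rule.get("type") == "ingress" and open_cidr:
            messages.append(f"ASG `{name}` defines a fully open ingress")

    for name, listener in resources.get("aws_alb_listener", {}).items():
        if listener.get("protocol") == "HTTP":
            messages.append(f"ALB `{name}` is using HTTP rather than HTTPS")

    return messages


config = {
    "resource": {
        "aws_security_group_rule": {
            "my-rule": {"type": "ingress", "cidr_blocks": ["0.0.0.0/0"]},
        },
        "aws_alb_listener": {
            "my-alb-listener": {"protocol": "HTTP", "port": "80"},
        },
    }
}

for message in deny_terraform(config):
    print(message)
```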
Conclusion
This blog post was heavily inspired by several talks I attended (virtually) at KubeCon/CloudNativeCon Europe 2020, and I will be sure to append them to the post once they are released on YouTube sometime in September 2020.
We have only scratched the surface of what the Open Policy Agent and the tools built around it are capable of, and there will definitely be more as its popularity grows rapidly.