Open Policy Agent at WorldRemit

Image by Brian McGowan via Unsplash

Modern-day systems are complex: they have many components, and many moving parts within those components. To rationalise this complexity and protect the healthy state of the system, engineers or architects may choose to apply policies to individual components or to entire systems.

To do this, we first need to know what a policy is. According to Wikipedia, a policy is a deliberate system of principles that guide decisions and achieve rational outcomes. Applied to a microservice architecture, policies can take the form of trusted services, rate limits, user access, API scopes, etc. These policies are often hard-coded in different parts of the stack, written in many different programming languages, updated in different ways, and owned by different parts of the company.

This looks like a mess, doesn't it? That is because it is, and the Open Policy Agent attempts to clean it up.

What is the Open Policy Agent?

The Open Policy Agent (often seen as OPA, and no, not that opa) is an open-source, general-purpose policy engine that enables unified, context-aware policy enforcement across the entire stack.

OPA will keep the policies consistent across our system, all while being fast, μs (microsecond) fast!

It is used by many famous names in the tech industry, such as Netflix, Chef, Atlassian, Cloudflare, Pinterest, and Goldman Sachs.

How does it work?

At its core, the principle is simple: when a policy decision needs to be made, we query OPA with structured input (JSON), which it checks against the Data (JSON) and the Policy (Rego) to give us back a Decision (JSON).

There are 4 main topics I want to cover in order to better understand OPA:

  • Data
  • Policy
  • Query
  • Decision

Data
Data must be provided to OPA as JSON and is cached in memory.

OPA imposes no rules on how the data, input, or output are structured, but it is recommended to structure the data in a way that makes it easy to write policy rules against it.

Example of Data that OPA will use to make Policy decisions

{
  "alice": [
    "read",
    "write"
  ],
  "bob": [
    "read"
  ]
}
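
Just to make this concrete, here is one way that data could be loaded into a locally running OPA server over its REST API. This is only a sketch: it assumes the opa binary is installed, and that we want the ACL document to live under data.myapi.acl, which is the path the policy in the next section imports.

# In one terminal: start OPA in server mode (it listens on :8181 by default)
opa run --server

# In another terminal: load the ACL document under data.myapi.acl
curl -X PUT localhost:8181/v1/data/myapi/acl \
  -H 'Content-Type: application/json' \
  -d '{"alice": ["read", "write"], "bob": ["read"]}'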

Policy
OPA policies are expressed in a high-level declarative language called Rego.

Example of a Rego Policy that is mapped to the previously defined Data (loaded under data.myapi.acl as above)

package myapi.policy

import data.myapi.acl

default allow = false

# allow is true when the requested operation appears in the user's ACL entry
allow {
    access = acl[input.user]
    access[_] == input.operation
}

# whocan is the set of users who may perform the requested operation
whocan[user] {
    access = acl[user]
    access[_] == input.operation
}

Want to play around with OPA and Rego? Take a look at their interactive online playground.
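
Alternatively, to try the policy against the local OPA server started above rather than the playground, it can be pushed via OPA's Policy API. The file name policy.rego and the policy id myapi are just my own choices here.

# Upload the Rego module above (saved locally as policy.rego) under the id "myapi"
curl -X PUT localhost:8181/v1/policies/myapi \
  -H 'Content-Type: text/plain' \
  --data-binary @policy.rego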

Query & Decision
The query, just like the Data, must be JSON. It contains the values we want to check against the policy.

In the example below we are asking if Alice has permission to write:

{
  "input": {
    "user": "alice",
    "operation": "write"
  }
}

The Input above goes to OPA, which checks the Policy against the Data and gives us back a Decision:

{
  "result": true
}
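
With the data and policy loaded into a local OPA server as sketched earlier, the same decision can be requested over OPA's Data API; the URL path simply mirrors the myapi.policy package name.

# Ask OPA whether alice may perform the "write" operation
curl -s -X POST localhost:8181/v1/data/myapi/policy/allow \
  -H 'Content-Type: application/json' \
  -d '{"input": {"user": "alice", "operation": "write"}}'
# {"result": true}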

Policy decisions are not limited to simple allow/deny answers. Like query inputs, policies can generate arbitrary structured data as output.

Note that the application still has to implement the enforcement of these decisions. If we ask OPA if foo can access bar, and the answer is no, our application should reply with a 403 Forbidden.

How can we use it?

There are several open-source projects that integrate with OPA to implement fine-grained access control. Some of interest to WorldRemit are SSH, Kong, Kafka, Istio (Envoy), and Kubernetes. For the full list, see the OPA Ecosystem.

We will focus on 2 DevOps use-cases as part of this post:

  • Gatekeeper for Kubernetes
  • Conftest for Configuration Files

Disclaimer!

There is no guarantee we will actually use these tools in this way; this is just me researching and playing around with them to see what they are capable of, ahead of advising on how we might use them internally.

Gatekeeper

OPA can be leveraged in use cases beyond access control. Gatekeeper gives us fine-grained policies for Kubernetes compute/network/storage resources, for example:

  • Limit the use of unsafe images
  • Block public image registries
  • Disallow certain Egress traffic rules
  • Require CPU & memory limits
  • Prevent Ingress conflicts
  • You get the picture 🖼️

Gatekeeper deploys OPA as an Admission Controller for Kubernetes. Admission Controllers are plug-ins that intercept requests to the Kubernetes API server prior to the persistence of a resource, but after the request is authenticated and authorized.

Admission Controller Stages

Gatekeeper makes it easy to write Admission Controllers, saving us a lot of hassle building and maintaining them. This is done by defining ConstraintTemplates, which describe both the Rego policy that enforces the constraint and the schema of the constraint.

The current plan is to use Gatekeeper as part of our overall Kubernetes Governance.

Example
Once Gatekeeper is installed on a cluster, applying Gatekeeper/OPA policies is simple. The example below requires certain labels to be present before a resource is admitted to the cluster.

Create the ConstraintTemplate (CRD)

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
        listKind: K8sRequiredLabelsList
        plural: k8srequiredlabels
        singular: k8srequiredlabels
      validation:
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }

Create the Constraint and specify the required labels

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deploy-must-have-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels: ["app"]
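
Wiring this up on a cluster would look roughly like the commands below. The file names are my own, and the Gatekeeper release pinned in the URL is only an example, so check the Gatekeeper docs for the current install manifest.

# Install Gatekeeper (pin whichever release suits you; see the Gatekeeper docs)
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.1/deploy/gatekeeper.yaml

# Apply the ConstraintTemplate and the Constraint shown above
kubectl apply -f k8srequiredlabels_template.yaml
kubectl apply -f deploy_must_have_labels.yaml

# From now on, a Deployment without an "app" label should be rejected at admission time
kubectl apply -f deployment_without_app_label.yaml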

Cool, isn't it? But what about resources that were deployed before the constraint existed? Well, this is where the audit functionality comes in: it periodically evaluates replicated resources against the constraints enforced in the cluster to detect pre-existing misconfigurations. If we inspect the previously applied K8sRequiredLabels constraint and there are violations, we will see them listed under violations in the status field.
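
One way to inspect it, assuming kubectl access to the cluster and the constraint name from the example above:

kubectl get k8srequiredlabels deploy-must-have-labels -o yaml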

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deploy-must-have-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels: ["app"]
status:
  auditTimestamp: "2020-08-31T08:32:12Z"
  byPod:
  - enforced: true
    id: gatekeeper-controller-manager-0
  violations:
  - enforcementAction: deny
    kind: Deployment
    message: 'you must provide labels: {"app"}'
    name: service-foo
  - enforcementAction: deny
    kind: Deployment
    message: 'you must provide labels: {"app"}'
    name: service-bar
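
As a side note, if denying existing workloads feels too aggressive at first, Gatekeeper constraints can (as far as I can tell from the docs) be switched to a dryrun enforcement action, so violations are only recorded by the audit instead of requests being blocked.

# Switch the constraint to audit-only mode (dryrun) instead of denying requests
kubectl patch k8srequiredlabels deploy-must-have-labels --type merge \
  -p '{"spec": {"enforcementAction": "dryrun"}}'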

Conftest

People tend to overlook configuration files, but they are, I would say, just as important as access policies.

Scanning configuration files and denying insecure flags or options that are only meant for debugging could have prevented many of the Elasticsearch scandals (1, 2, 3, etc).

Conftest allows us to do exactly this.

Conftest uses OPA to provide a user experience optimised for developers wanting to test all kinds of configuration files.

Now I know what you're thinking: how is this different from Gatekeeper? Well, Gatekeeper focuses on securing a Kubernetes cluster, while Conftest focuses on the upstream development process. Since both tools use the Open Policy Agent under the hood, using them together makes for a real end-to-end solution.

We could use Conftest to deny certain Docker images as part of our CI/CD:

package main

image_denylist = [
  "openjdk"
]

deny[msg] {
  input[i].Cmd == "from"
  val := input[i].Value

  # flag the image if any part of the FROM value matches the denylist
  contains(val[_], image_denylist[_])

  msg = sprintf("unallowed image found %s", [val])
}

But why not take it a step further and deny specific tools as well?

package main

image_denylist = [
  "openjdk"
]

run_denylist = [
  "apk",
  "apt",
  "pip",
  "curl",
  "wget"
]

# deny any FROM instruction that pulls a denylisted base image
deny[msg] {
  input[i].Cmd == "from"
  val := input[i].Value
  contains(val[_], image_denylist[_])

  msg = sprintf("unallowed image found %s", [val])
}

# deny any RUN instruction that invokes a denylisted tool
deny[msg] {
  input[i].Cmd == "run"
  val := input[i].Value
  contains(val[_], run_denylist[_])

  msg = sprintf("unallowed commands found %s", [val])
}

❯ conftest test Dockerfile
FAIL - Dockerfile - unallowed image found ["openjdk:8-jdk-alpine"]
FAIL - Dockerfile - unallowed commands found ["apk add --no-cache python3 python3-dev build-base && pip3 install awscli==1.18.1"]

2 tests, 0 passed, 0 warnings, 2 failures, 0 exceptions
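
A small aside on how Conftest finds these rules: by default it looks for .rego files in a policy/ directory in the current working directory, and the rules must live in the main package (as above). A different location can be passed with the --policy flag; the directory name below is just an assumption.

# Default: policies are read from ./policy
conftest test Dockerfile

# Or point Conftest at a different policy directory
conftest test --policy docker-policies/ Dockerfile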

What about blocking 0.0.0.0/0 in Security Groups and HTTP on an ALB in Terraform?

package main

# helper: true when obj has the given field (useful when extending these rules)
has_field(obj, field) {
    obj[field]
}

# flag security group rules that allow ingress from anywhere
deny[msg] {
    rule := input.resource.aws_security_group_rule[name]
    rule.type == "ingress"
    contains(rule.cidr_blocks[_], "0.0.0.0/0")
    msg = sprintf("Security group rule `%v` defines a fully open ingress", [name])
}

# flag ALB listeners that still use plain HTTP
deny[msg] {
    proto := input.resource.aws_alb_listener[lb].protocol
    proto == "HTTP"
    msg = sprintf("ALB `%v` is using HTTP rather than HTTPS", [lb])
}

❯ conftest test main.tf
FAIL - main.tf - Security group rule `my-rule` defines a fully open ingress
FAIL - main.tf - ALB `my-alb-listener` is using HTTP rather than HTTPS

2 tests, 0 passed, 0 warnings, 2 failures, 0 exceptions
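
And because Conftest exits with a non-zero status whenever a deny rule fires, hooking it into a pipeline is a one-liner; the Terraform step below is just a sketch of how it could gate a deployment.

# Fail the pipeline before planning if any policy is violated
conftest test main.tf && terraform plan -out=plan.tfplan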

Conclusion

This blog post was heavily inspired by several lectures I attended (virtually) at KubeCon/CloudNativeCon Europe 2020 and I will be sure to append them to the post once they are released on YouTube sometime in September 2020.

We have also only just scratched the surface of what the Open Policy Agent and the tools built around it are capable of, and there will definitely be more to come as its popularity grows rapidly.