Check out Part 1 of this series for an introduction to Kubernetes operators.

Now that we've covered the basics of what a Kubernetes operator is, we can start to dig into the details of each operator component. Remember that a Kubernetes operator consists of 3 parts: a Custom Resource Definition, a Custom Resource, and a Controller. Before we can focus on the Controller, which is where an operator's automation takes place, we first need to define our Custom Resource Definition and use that to create a Custom Resource.

Example: RSS Feed Reader Application

Throughout this series, I will be using a distributed RSS feed reader application (henceforth known as "FeedReader") to demonstrate basic Kubernetes operator concepts. Let's start with a quick overview of the FeedReader design.

FeedReader Architecture

The architecture for our FeedReader application is split into 3 main components:

  1. A StatefulSet running our feed database. This holds both the URLs of feeds that we wish to scrape, as well as their scraped contents
  2. A CronJob to perform the feed scraping job at a regular interval
  3. A Deployment that provides a frontend to query the feed database and display RSS feed contents to the end user

FeedReader Architecture Diagram

The FeedReader application runs as a single golang binary that has 3 modes: web server, feed scraper, and embedded feed database. Each of these modes is designed to run in a Deployment, CronJob, and StatefulSet respectively.

Back to CRDs: Starting At the End

Just to recap, a Custom Resource Definition is the schema of a new Kubernetes resource type that you define. But instead of thinking in terms of an abstract schema, I believe that it's easier to start from your desired result (a concrete Custom Resource) and work backwards from there.

My very basic rule of thumb is that whenever you need a variable to describe an option in your application, you should consider defining a new field and setting sample value in a Custom Resource to represent it. Once you have a Custom Resource with fields and values that clearly explain your application's desired behavior, you can then start to generalize these fields into an abstract schema, or CRD.

Let's perform this exercise with the FeedReader.

FeedReader CRD

To keep the article at a readable length, I'm just going focus on the most critical aspects of the FeedReader. I've extracted two key variables that are required for the application to function at its most basic level:

  1. A list of RSS feed urls to scrape
  2. A time interval that defines how often to scrape the urls in the above list

Thus, I'll add 2 fields to my FeedReader CR:

  1. A list of strings that hold my feed urls
  2. A Golang duration string that specifies the scrape interval

Based on these two requirements, here's an example of a FeedReader Custom Resource that I would like to define:

---
apiVersion: crd.sklar.rocks/v1alpha1
kind: FeedReader
metadata:
  name: feed-reader-app
spec:
  feeds:
    - https://sklar.rocks/atom.xml
  scrapeInterval: 60m

Once created, the above resource should deploy a new instance of the FeedReader app, add one feed (https://sklar.rocks/atom.xml) to the database, and scrape that url every 60 minutes for new posts that are not currently in the feed database.

Note that I could use something like crontab syntax to define the scrape interval, but I find the golang human-readable intervals easier to work with, since I don't always have to cross-check with crontab.guru to make sure that I've defined the correct schedule.

Exercise

Can you think of any more options that the FeedReader would need? Personally, I would start with things like feed retention, logging, and scraping retry behavior, but the possibilities are endless!

From Instance to Schema

But before we can kubectl apply on the above CR yaml, we first need to define and apply a schema in the form of a Custom Resource Definition (CRD) for us to create an object of this type in our Kubernetes cluster. Once the CRD is known to our cluster, we can then create and modify new instances of that CRD.

If you've ever worked with OpenAPI, Swagger, XML Schema or any other schema definition, you probably know that in many cases, the schema of an object can be incredibly more verbose than an instance of the object itself. In Kubernetes, to define a simple Custom Resource Definition with the two fields that I outlined above, the yaml would look something like this:

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: feedreaders.crd.sklar.rocks
spec:
  group: crd.sklar.rocks
  names:
    kind: FeedReader
    listKind: FeedReaderList
    plural: feedreaders
    singular: feedreader
  scope: Namespaced
  versions:
    name: v1alpha1
    schema:
      openAPIV3Schema:
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            properties:
              feeds:
                description: List of feed urls to scrape at a regular interval
                items:
                  type: string
                type: array
              scrapeInterval:
                description: 'Interval at which to scrape feeds, in the form of a
                  golang duration string (see https://pkg.go.dev/time#ParseDuration for valid formats)'
                type: string
            type: object
          status:
            properties:
              lastScrape:
                description: Time of the last scrape
                format: date-time
                type: string
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}

Now that's a lot of yaml! This schema includes metadata like the resource's name, API group, and version, as well as the formal definitions of our custom fields and standard Kubernetes fields like "apiVersion" and "kind".

If you're thinking "do I actually have to write something like this for all of my CRDs?", the answer is no! This is where we can leverage the Kubernetes standard library and golang tooling when defining CRDs. For example, I didn't have to write a single line of the yaml in the example above! Here's how I did it.

Controller-gen to the Rescue

The Kubernetes community has developed a tool to generate CRD yaml files directly from Go structs, called controller-gen. This tool parses Go source code for structs and special comments known as "markers" that begin with //+kubebuilder. It then uses this parsed data to generate a variety of operator-related code and yaml manifests. One such manifest that it can create is a CRD.

Using controller-gen, here's the code that I used to generate the above schema. It's certainly much less verbose than the resulting yaml file!

package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type FeedReaderSpec struct {
	// List of feed urls to scrape at a regular interval
	Feeds []string `json:"feeds,omitempty"`
	// Interval at which to scrape feeds, in the form of a golang duration string
	// (see https://pkg.go.dev/time#ParseDuration for valid formats)
	ScrapeInterval string `json:"scrapeInterval,omitempty"`
}

type FeedReaderStatus struct {
	// Time of the last scrape
	LastScrape metav1.Time `json:"lastScrape,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

type FeedReader struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   FeedReaderSpec   `json:"spec,omitempty"`
	Status FeedReaderStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

type FeedReaderList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []FeedReaderList `json:"items"`
}

Here, we've defined a FeedReader struct that has a Spec and a Status field. Seem familiar? Almost every Kubernetes resource that we deal with has a .spec and .status field in its definition. But the types of our FeedReader's Spec and Status are actually custom types that we defined directly in our Go code. This gives us the ability to craft our CRD schema in a compiled language with static type-checking while letting the tooling emit the equivalent yaml CRD spec to use in our Kubernetes clusters.

If you compare the CRD yaml with the FeedReaderSpec struct, you can see that controller-gen extracted the struct fields, types, and docstrings into the emitted yaml. Also, since our FeedReader inherits from metav1.TypeMeta and metav1.ObjectMeta and is annotated with the //+kubebuilder:object:root=true comment, the fields from those interfaces (like apiVersion and kind) are included in the yaml as well.

But what about things like the API group, crd.sklar.rocks, and version v1alpha1? These are handled in a separate Go file in the same package:

// +kubebuilder:object:generate=true
// +groupName=crd.sklar.rocks
package v1alpha1

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/scheme"
)

var (
	// GroupVersion is group version used to register these objects
	GroupVersion = schema.GroupVersion{Group: "crd.sklar.rocks", Version: "v1alpha1"}

	// SchemeBuilder is used to add go types to the GroupVersionKind scheme
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}
)

func init() {
	SchemeBuilder.Register(&FeedReader{}, &FeedReaderList{})
}

Here, we initialize a GroupVersion instance that defines our API group (crd.sklar.rocks) and version (v1alpha1). We then use it to instantiate the SchemeBuilder, which is used to register the FeedReader and FeedReaderList structs from above. When controller-gen is run, it will load this Go package, check for any registered structs, parse them (and any related comment markers), and emit the CRD yaml that we can use to install the CRD in our cluster.

Assuming that our api definitions are in the ./api directory and we want the output in ./deploy/crd, all we need to do is run the following command to generate our FeedReader CRD's yaml definition:

$ controller-gen crd paths="./api/..." output:dir=deploy/crd

Installing our Custom Resource Definition

Now that we've generated our Custom Resource Definition, how do we install it in the cluster? It's actually very simple! A CustomResourceDefinition is its own Kubernetes resource type, just like a Deployment, Pod, StatefulSet, or any of the other Kubernetes resources that you're used to working with. All you need to do is kubectl create your CRD yaml, and the new CRD will be validated and registered with the API server.

You can check this by typing kubectl get crd, which will list all of the Custom Resource Definitions installed in your cluster. If you've copy-pasted the CRD yaml and installed the FeedReader CRD in your cluster, you should see the feedreader.crd.sklar.rocks in the list, under the v1alpha1 API version.

With our FeedReader CRD installed, you are free to create as many resources of kind: FeedReader as you wish. Each one of these represents a unique instance of the FeedReader application in your cluster, with each instance managing its own Deployment, StatefulSet and CronJob. This way, you can easily deploy and manage multiple instances of your application in your cluster, all by creating and modifying Kubernetes objects in the same way that you would for any other resource type.

What's Next?

Now that we have a Custom Resource Definition and a Custom Resource that defines our FeedReader application, we need to actually perform the orchestration that will manage the application in our cluster. Stay tuned for the next article in this series, which will discuss how to do that in a custom controller.

If you like this content and want to follow along with the series, I'll be writing more articles under the Operators 101 tag on this blog.