Using QuestDB to collect infrastructure metrics

Originally posted on the QuestDB Blog

Introduction

One of my favorite things about QuestDB is the ability to write queries in SQL against a high-performance time series database. Since I've been using SQL as my primary query language for basically my entire professional career, it feels natural for me to interact with data using SQL instead of other newer proprietary query languages. Combined with QuestDB's custom SQL extensions, its built-in SQL support makes writing complex queries a breeze.

In my life as a Cloud Engineer, I deal with time series metrics all the time. Unfortunately, many of today's popular metrics databases don't support the SQL query language. As a result, I've become more dependent on pre-built dashboards, and it takes me longer to write my own queries with JOINs, transformations, and temporal aggregations.

QuestDB can be a great choice for ingesting application and infrastructure metrics, it just requires a little more work on the initial setup than the Kubernetes tooling du jour. Despite this extra upfront time investment (which is fairly minimal in the grand scheme of things), I think that the benefits of using QuestDB for infrastructure metrics are worth it. With QuestDB, you get industry-leading performance and the ability to interact with the database in the most commonly-used query language in existence. We're even using QuestDB to display customer database metrics in our own QuestDB Cloud!

In this article, I will demonstrate how we use QuestDB as the main component in this new feature. This should provide enough information for you to also use QuestDB for ingesting, storing, and querying infrastructure metrics in your own clusters.

Architecture

Prometheus is a common time series database that is already installed in many Kubernetes clusters. We will be leveraging its remote write functionality to pipe data into QuestDB for querying and storage. However, since Prometheus remote write does not support the QuestDB-recommended InfluxDB Line Protocol (ILP) as a serialization format, we need to use a proxy to translate Prometheus-formatted metrics into ILP messages. We will be use Influx Data's Telegraf as this translation component.

Now, with our data in QuestDB, we can use SQL to query our metrics using any one of the supported methods: the Web Console, PostgreSQL wire protocol, or HTTP REST API.

Here's a quick overview of the architecture:

Infrastructure diagram

Prometheus remote write

While Prometheus operates on an interval-based pull model, it also has the ability to push metrics to remote sources. This is known as "remote write" capability, and is easily configurable in a yaml file. Here's an example of a basic remote write configuration:

remoteWrite:
  - url: http://default.telegraf.svc:9999/write
    name: questdb-telegraf
    remote_timeout: 10s

This yaml will configure Prometheus to send samples to the specified url with a 10 second timeout. In this case, we will be forwarding our metrics on to telegraf, with a custom port and endpoint that we can specify in the telegraf config (see below for more details). There are also a variety of other remote write options, allowing users to customize timeouts, headers, authentication, and additional relabling configs before writing to the remote data store. All of the possible options can be found on the Prometheus website.

QuestDB ILP and Telegraf

Now that we have our remote write configured, we need to set up its destination. Installing telegraf into a cluster is straightforward, just helm install its Helm chart.

We do need to configure telegraf to read from a web socket (where Prometheus is configured to writing to) and send to QuestDB for long-term storage. In a Kubernetes deployment, these options can be set in the config section of the telegraf Helm chart's values.yaml file.

Input Configuration

Since telegraf will be receiving metrics from Prometheus, we need to open a port that enables communication between the two services. Telegraf has an HTTP listener plugin that allows it to listen for traffic on a specified port. We also need to configure the path of the listener to match our Promtheus remote write url.

The HTTP listener (v2) supports multiple data formats to consume via its plugin architecture. A full list of options can be found in the telegraf docs. We will be using the Prometheus Remote Write Parser Plugin to accept our Prometheus messages.

Here is how this setup looks in the telegraf config:

[[inputs.http_listener_v2]]
  ## Address and port to host HTTP listener on
  service_address = ":9999"

  ## Paths to listen to.
  paths = ["/write"]

  ## Data format to consume.
  data_format = "prometheusremotewrite"

When passing these values to the Helm chart, you can use this yaml specification:

config:
  inputs:
    - http_listener_v2:
        service_address: ":9999"
        path: "/write"
        data_format: prometheusremotewrite

Output configuration

We recommend that you use the InfluxDB Line Protocol (ILP) over TCP to insert data into QuestDB. Luckily, telegraf includes an ILP output plugin! But unfortunately, this is not a plug-and-play solution. By default, all metrics will be written to a single measurement, prometheus_remote_write, with the individual metric's key being sent over the wire as a field. In practice, this means that all of your metrics will be written to a single QuestDB table, called prometheus_remote_write. There will then be an additional column for every single metric AND field that you are capturing. This leads to a large table, with potentially thousands of columns, that's difficult to work with and contains all sparse data, which could negatively impact performance.

To fix this problem, Telegraf provides us with a sample starlark script that transforms each measurement such that we will have a table-per-metric in QuestDB. This script will run for every metric that telegraf receives, so the output will be formatted correctly.

This is what telegraf's output config looks like:

[[outputs.socket_writer]]
  ## Address and port to write to
  address = "tcp://questdb.questdb.svc:9009"

[[processors.starlark]]
      source = '''
    def apply(metric):
      if metric.name == "prometheus_remote_write":
            for k, v in metric.fields.items():
                metric.name = k
                metric.fields["value"] = v
                metric.fields.pop(k)
      return metric
    '''

As an added benefit to using ILP with QuestDB, we don't even have to worry about each metric's fieldset. Over ILP, QuestDB automatically creates tables for new metrics. It also adds new columns for fields that it hasn't seen before, and INSERTs nulls for any missing fields.

Helm configuration

I've found that the easiest way to configure the values.yaml file is to mount the starlark script as a volume, and add a reference to it in the config. This way we don't need to deal with any whitespace-handling or special indentation in our ConfigMap specification.

The output and starlark Helm configuration would look like this:

# continued from above
# config:
outputs:
  - socket_writer:
      address: tcp://questdb.questdb.svc:9009
processors:
  - starlark:
      script: /opt/telegraf/remotewrite.star

We also need to add the volume and mount at the root level of the values.yaml:

volumes:
  - name: starlark-script
    configMap:
      name: starlark-script
mountPoints:
  - name: starlark-script
    mountPath: /opt/telegraf
    subpath: remotewrite.star

This volume references a ConfigMap that contains the starlark script from the above example:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: starlark-script
data:
  remotewrite.star: |
    def apply(metric):
      ...

Querying metrics with SQL

QuestDB has some powerful SQL extensions that can simplify writing time series queries. For example, given the standard set of metrics that a typical Prometheus installation collects, we can use QuestDB to not only find pods with the highest memory usage in a cluster (over a 6 month period), but also find the specific time period when the memory usage spiked. We can even access custom labels to help identify the pods with a human-readable name (instead of the long alphanumeric name assigned to pods by deployments or stateful sets). This is all performed with a simple SQL syntax using JOINs (enhanced by the ASOF keyword) and SAMPLE BY to bucket data into days with a simple line of SQL:

SELECT l.custom_label, w.timestamp, max(w.value / r.value) as mem_usage
FROM container_memory_working_set_bytes AS w
ASOF JOIN kube_pod_labels AS l ON (w.pod = l.pod)
ASOF JOIN kube_pod_container_resource_limits AS r ON (
  r.pod = w.pod AND
  r.container = w.container
)
WHERE custom_label IS NOT NULL
  AND r.resource = 'memory'
  AND w.timestamp > '2022-06-01'
SAMPLE BY 1d
ALIGN TO CALENDAR TIME ZONE 'Europe/Berlin'
ORDER BY mem_usage DESC;

Here's a sample output of that query:

custom_label	timestamp	mem_usage
keen austin	2022-07-04T16:18:00.000000Z	0.999853875401
optimistic banzai	2022-07-12T16:18:00.000000Z	0.9763028946
compassionate taussig	2022-07-11T16:18:00.000000Z	0.975367909527
cranky leakey	2022-07-11T16:18:00.000000Z	0.974941994418
quirky morse	2022-07-05T16:18:00.000000Z	0.95084235665
admiring panini	2022-06-21T16:18:00.000000Z	0.925567626953

This is only one of many ways that you can use QuestDB to write powerful time-series queries that you can use for one-off investigation or to power dashboards.

Metric retention

Since databases storing infrastructure metrics can grow to extreme sizes over time, it is important to enforce a retention period to free up space by deleting old metrics. Even though QuestDB does not support the traditional DELETE SQL command, you can still implement metric retention by using the DROP PARTITION command.

In QuestDB, data is stored by columns on-disk and optionally partitioned by a time duration. By default, when using ILP to ingest metrics, and a new table is automatically created, it is partitioned by DAY. This allows us to DROP PARTITIONs on a daily basis. If you need a different partitioning scheme, you can create the table with your desired partition period before ingesting any data over ILP, since ALTER TABLE does not support any changes to table partitioning. But since ILP does automatically add columns, the table specification can be very simple, with just the name and a timestamp column.

Once you've decided on your desired metric retention period, you can create a cron job that removes all partitions older than your oldest retention date. This will help keep your storage usage in check.

For more information about data retention in QuestDB, you can check out the docs.

Working example

I have created a working example of this setup in a repo, sklarsa/questdb-metrics-blog-post. The entire example runs in a local Kind cluster.

To run the example, execute the following commands:

git clone https://github.com/sklarsa/questdb-metrics-blog-post.git
cd questdb-metrics-blog-post
./run.sh

After a few minutes, all pods should be in ready with the following prompt:

You can now access QuestDB here: http://localhost:9000
Ctrl-C to exit
Forwarding from 127.0.0.1:9000 -> 9000
Forwarding from [::1]:9000 -> 9000

From here, you can navigate to http://localhost:9000 and explore the metrics that are being ingested into QuestDB. The default Prometheus scrape interval is 30 seconds, so there might not be a ton of data in there, but you should see a list of tables, one per each metric that we are collecting:

Metrics in the console

Once you're done, you can clean up the entire experiment by deleting the cluster:

./cleanup.sh

Conclusion

QuestDB can be a very powerful piece in the Cloud Engineer's toolkit. It grants you the ability to run complex time-series queries across multiple metrics with unparalleled speed in the world's most ubiquitous query language, SQL. Every second counts when debugging an outage at 2AM, and reducing the cognitive load of writing queries, as well as their execution time, is a game-changer for me.