Using QuestDB to collect infrastructure metrics

Originally posted on the QuestDB Blog

Introduction

One of my favorite things about QuestDB is the ability to write queries in SQL against a high-performance time series database. Since I've been using SQL as my primary query language for basically my entire professional career, it feels natural for me to interact with data using SQL instead of other newer proprietary query languages. Combined with QuestDB's custom SQL extensions, its built-in SQL support makes writing complex queries a breeze.

In my life as a Cloud Engineer, I deal with time series metrics all the time. Unfortunately, many of today's popular metrics databases don't support the SQL query language. As a result, I've become more dependent on pre-built dashboards, and it takes me longer to write my own queries with JOINs, transformations, and temporal aggregations.

QuestDB can be a great choice for ingesting application and infrastructure metrics; it just requires a little more work on the initial setup than the Kubernetes tooling du jour. Despite this extra upfront time investment (which is fairly minimal in the grand scheme of things), I think that the benefits of using QuestDB for infrastructure metrics are worth it. With QuestDB, you get industry-leading performance and the ability to interact with the database in the most commonly-used query language in existence. We're even using QuestDB to display customer database metrics in our own QuestDB Cloud!

In this article, I will demonstrate how we use QuestDB as the main component in this new feature. This should provide enough information for you to also use QuestDB for ingesting, storing, and querying infrastructure metrics in your own clusters.

Architecture

Prometheus is a common time series database that is already installed in many Kubernetes clusters. We will be leveraging its remote write functionality to pipe data into QuestDB for querying and storage. However, since Prometheus remote write does not support the QuestDB-recommended InfluxDB Line Protocol (ILP) as a serialization format, we need to use a proxy to translate Prometheus-formatted metrics into ILP messages. We will be using InfluxData's Telegraf as this translation component.

Now, with our data in QuestDB, we can use SQL to query our metrics using any one of the supported methods: the Web Console, PostgreSQL wire protocol, or HTTP REST API.

Here's a quick overview of the architecture:

Infrastructure diagram

Prometheus remote write

While Prometheus operates on an interval-based pull model, it also has the ability to push metrics to remote sources. This is known as "remote write" capability, and is easily configurable in a yaml file. Here's an example of a basic remote write configuration:

remoteWrite:
  - url: http://default.telegraf.svc:9999/write
    name: questdb-telegraf
    remote_timeout: 10s

This yaml will configure Prometheus to send samples to the specified url with a 10 second timeout. In this case, we will be forwarding our metrics on to telegraf, with a custom port and endpoint that we can specify in the telegraf config (see below for more details). There are also a variety of other remote write options, allowing users to customize timeouts, headers, authentication, and additional relabeling configs before writing to the remote data store. All of the possible options can be found on the Prometheus website.
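For example, if you only want a subset of metrics to reach QuestDB, you can drop noisy series before they are ever sent. Here's a sketch of what that might look like, assuming the Prometheus Operator's camelCase field names (a plain prometheus.yml would use write_relabel_configs instead), with a hypothetical rule that drops all Go runtime metrics:

remoteWrite:
  - url: http://default.telegraf.svc:9999/write
    name: questdb-telegraf
    remote_timeout: 10s
    # hypothetical rule: drop all go_* series before they leave Prometheus
    writeRelabelConfigs:
      - sourceLabels: [__name__]
        regex: "go_.*"
        action: drop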

QuestDB ILP and Telegraf

Now that we have our remote write configured, we need to set up its destination. Installing telegraf into a cluster is straightforward: just helm install its Helm chart.

We do need to configure telegraf to listen for incoming traffic over HTTP (on the endpoint that Prometheus is configured to write to) and to send the metrics on to QuestDB for long-term storage. In a Kubernetes deployment, these options can be set in the config section of the telegraf Helm chart's values.yaml file.

Input configuration

Since telegraf will be receiving metrics from Prometheus, we need to open a port that enables communication between the two services. Telegraf has an HTTP listener plugin that allows it to listen for traffic on a specified port. We also need to configure the path of the listener to match our Prometheus remote write URL.

The HTTP listener (v2) supports multiple data formats to consume via its plugin architecture. A full list of options can be found in the telegraf docs. We will be using the Prometheus Remote Write Parser Plugin to accept our Prometheus messages.

Here is how this setup looks in the telegraf config:

[[inputs.http_listener_v2]]
  ## Address and port to host HTTP listener on
  service_address = ":9999"

  ## Paths to listen to.
  paths = ["/write"]

  ## Data format to consume.
  data_format = "prometheusremotewrite"

When passing these values to the Helm chart, you can use this yaml specification:

config:
  inputs:
    - http_listener_v2:
        service_address: ":9999"
        path: "/write"
        data_format: prometheusremotewrite

Output configuration

We recommend that you use the InfluxDB Line Protocol (ILP) over TCP to insert data into QuestDB. Luckily, telegraf includes an ILP output plugin! But unfortunately, this is not a plug-and-play solution. By default, all metrics will be written to a single measurement, prometheus_remote_write, with each individual metric's key being sent over the wire as a field. In practice, this means that all of your metrics will be written to a single QuestDB table called prometheus_remote_write. There will then be an additional column for every single metric AND field that you are capturing. This leads to a very wide table, with potentially thousands of mostly-sparse columns, that is difficult to work with and could negatively impact performance.

To fix this problem, Telegraf provides us with a sample starlark script that transforms each measurement such that we will have a table-per-metric in QuestDB. This script will run for every metric that telegraf receives, so the output will be formatted correctly.

This is what telegraf's output config looks like:

[[outputs.socket_writer]]
  ## Address and port to write to
  address = "tcp://questdb.questdb.svc:9009"

[[processors.starlark]]
  source = '''
def apply(metric):
  if metric.name == "prometheus_remote_write":
    for k, v in metric.fields.items():
      metric.name = k
      metric.fields["value"] = v
      metric.fields.pop(k)
  return metric
'''

As an added benefit of using ILP with QuestDB, we don't even have to worry about each metric's fieldset. Over ILP, QuestDB automatically creates tables for new metrics. It also adds new columns for fields that it hasn't seen before, and inserts nulls for any missing fields.
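To make this concrete, here is roughly what the rewritten measurements look like as ILP messages on the wire (the metric names, tags, values, and timestamps below are purely illustrative; ILP timestamps are in nanoseconds):

container_memory_working_set_bytes,pod=telegraf-7d9fc,namespace=default value=104857600 1688901600000000000
kube_pod_container_resource_limits,pod=telegraf-7d9fc,resource=memory value=268435456 1688901600000000000

Each measurement name becomes its own table, each tag becomes a symbol column, and each field becomes a regular column.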

Helm configuration

I've found that the easiest way to configure the values.yaml file is to mount the starlark script as a volume, and add a reference to it in the config. This way we don't need to deal with any whitespace-handling or special indentation in our ConfigMap specification.

The output and starlark Helm configuration would look like this:

# continued from above
# config:
outputs:
  - socket_writer:
      address: tcp://questdb.questdb.svc:9009
processors:
  - starlark:
      script: /opt/telegraf/remotewrite.star

We also need to add the volume and mount at the root level of the values.yaml:

volumes:
  - name: starlark-script
    configMap:
      name: starlark-script
mountPoints:
  - name: starlark-script
    mountPath: /opt/telegraf/remotewrite.star
    subPath: remotewrite.star

This volume references a ConfigMap that contains the starlark script from the above example:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: starlark-script
data:
  remotewrite.star: |
    def apply(metric):
      ...

Querying metrics with SQL

QuestDB has some powerful SQL extensions that can simplify writing time series queries. For example, given the standard set of metrics that a typical Prometheus installation collects, we can use QuestDB to not only find pods with the highest memory usage in a cluster (over a 6 month period), but also find the specific time period when the memory usage spiked. We can even access custom labels to help identify the pods with a human-readable name (instead of the long alphanumeric name assigned to pods by deployments or stateful sets). All of this is expressed in plain SQL, using JOINs (enhanced by the ASOF keyword) and SAMPLE BY to bucket the data into days:

SELECT l.custom_label, w.timestamp, max(w.value / r.value) as mem_usage
FROM container_memory_working_set_bytes AS w
ASOF JOIN kube_pod_labels AS l ON (w.pod = l.pod)
ASOF JOIN kube_pod_container_resource_limits AS r ON (
  r.pod = w.pod AND
  r.container = w.container
)
WHERE custom_label IS NOT NULL
  AND r.resource = 'memory'
  AND w.timestamp > '2022-06-01'
SAMPLE BY 1d
ALIGN TO CALENDAR TIME ZONE 'Europe/Berlin'
ORDER BY mem_usage DESC;

Here's a sample output of that query:

custom_label            timestamp                       mem_usage
keen austin             2022-07-04T16:18:00.000000Z     0.999853875401
optimistic banzai       2022-07-12T16:18:00.000000Z     0.9763028946
compassionate taussig   2022-07-11T16:18:00.000000Z     0.975367909527
cranky leakey           2022-07-11T16:18:00.000000Z     0.974941994418
quirky morse            2022-07-05T16:18:00.000000Z     0.95084235665
admiring panini         2022-06-21T16:18:00.000000Z     0.925567626953

This is only one of many ways that you can use QuestDB to write powerful time-series queries that you can use for one-off investigation or to power dashboards.

Metric retention

Since databases storing infrastructure metrics can grow to extreme sizes over time, it is important to enforce a retention period to free up space by deleting old metrics. Even though QuestDB does not support the traditional DELETE SQL command, you can still implement metric retention by using the DROP PARTITION command.

In QuestDB, data is stored by columns on-disk and optionally partitioned by a time duration. By default, when a new table is automatically created during ILP ingestion, it is partitioned by DAY. This allows us to DROP partitions on a daily basis. If you need a different partitioning scheme, you can create the table with your desired partition period before ingesting any data over ILP, since ALTER TABLE does not support changing a table's partitioning. And since ILP automatically adds columns, the table specification can be very simple: just the table name and a designated timestamp column.
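As a minimal sketch, pre-creating an hourly-partitioned table for one metric could look like this (the table name is just an example; ILP will add the label and value columns as data arrives):

-- Pre-create the table so ILP ingestion does not default it to DAY partitions
CREATE TABLE container_memory_working_set_bytes (
  timestamp TIMESTAMP
) TIMESTAMP(timestamp) PARTITION BY HOUR;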

Once you've decided on your desired metric retention period, you can create a cron job that removes all partitions older than your oldest retention date. This will help keep your storage usage in check.
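As a sketch, assuming daily partitions and a 30-day retention period, the job could run a statement like this for each metric table:

-- Drop every partition older than 30 days for one metric table
ALTER TABLE container_memory_working_set_bytes
  DROP PARTITION WHERE timestamp < dateadd('d', -30, now());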

For more information about data retention in QuestDB, you can check out the docs.

Working example

I have created a working example of this setup in a repo, sklarsa/questdb-metrics-blog-post. The entire example runs in a local Kind cluster.

To run the example, execute the following commands:

git clone https://github.com/sklarsa/questdb-metrics-blog-post.git
cd questdb-metrics-blog-post
./run.sh

After a few minutes, all pods should be ready, and you will see the following prompt:

You can now access QuestDB here: http://localhost:9000
Ctrl-C to exit
Forwarding from 127.0.0.1:9000 -> 9000
Forwarding from [::1]:9000 -> 9000

From here, you can navigate to http://localhost:9000 and explore the metrics that are being ingested into QuestDB. The default Prometheus scrape interval is 30 seconds, so there might not be a ton of data in there yet, but you should see a list of tables, one for each metric that we are collecting:

Metrics in the console
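If you'd rather check from SQL than scroll through the console's table list, you can query the tables() function to see everything that ILP has created so far (the exact columns returned vary by QuestDB version):

-- List every table created by ILP ingestion
SELECT * FROM tables();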

Once you're done, you can clean up the entire experiment by deleting the cluster:

./cleanup.sh

Conclusion

QuestDB can be a very powerful piece in the Cloud Engineer's toolkit. It grants you the ability to run complex time-series queries across multiple metrics with unparalleled speed in the world's most ubiquitous query language, SQL. Every second counts when debugging an outage at 2AM, and reducing the cognitive load of writing queries, as well as their execution time, is a game-changer for me.

I wrote some Rust!

Thanks to a weekly Rust workshop hosted by one of my colleagues at QuestDB, along with some extra time spent in an airport lounge during a long layover, I was finally able to hack together some Rust that does something useful!

questdb-retention

The project I put together is questdb-retention, a small program that allows you to manage data retention in your own QuestDB instance.

Even though QuestDB does not support a traditional SQL DELETE statement (as of v7.0), you can still purge stale data from your database by using the DROP PARTITION command. Check out the docs for more information about the specifics around data retention.

The questdb-retention package allows you to drop old questdb partitions either interactively (on the command line) or through a yaml config file. Only tables that are partitioned are supported, for obvious reasons.

In my view, the best way to use this package is to write a yaml config, compile the executable, and add a cron job that runs the command at regular intervals. Here's an example of a yaml file that you can use:

---
conn_str: host=localhost user=admin password=quest port=8812
tables:
  my_partitioned_table_by_month: 5
  my_partitioned_table_by_day: 5

The config only specifies the number of partitions to keep per table, but since QuestDB supports partitioning by anywhere from HOUR to YEAR, it's difficult to tell how much data is being retained by just inspecting the config file. This is something that's worth improving in the future. Also, once the package reaches stability, I'll take a look at adding it to crates.io to make it official.

All in all, once I got the hang of some Rust basics, I really enjoyed working in the language! I find Rust's matching to be incredibly useful, and I'm starting to get comfortable with borrows, mutability, and common Rust types like Option and Result. I'm excited to learn more concepts like async and lifetimes, as well as starting to write some lower-level code that really can take advantage of all Rust has to offer.

My Fractal FM3 Setup

My Fractal FM3 Setup

I'm a longtime bass and guitar player who has lived in apartments for the better part of 15 years. Throughout this time, I've had a ton of "apartment friendly" guitar setups that used different combinations of effects pedals, amp simulators, headphones, low wattage amps, recording interfaces, and software plugins. Yet despite all of this tinkering, I've never been completely happy with my setup. Amps are too loud, "amp-like" pedals don't really sound like the real thing, and plugins lack the "feel" of a real amp.

When my family moved last year, I finally had my own office space! So of course, that meant also trying out a new guitar setup to work with the room. I figured that this time around, I would finally try an amp modeler. I wanted something portable with flexible I/O routing options, but the main criteria were tone, feel, and low volume.

I ended up choosing the Fractal Audio FM3, and for the first time in a long time, I am completely happy with the result!

I'm not going to go into any of the device's features in this article; there are enough resources out there for that. Instead, I'll describe my ideal tone and setup, and how I got there with the FM3.

The Idea

I'm a fan of the hot-rodded Marshall sound: modded JCM800s, Friedmans, and Bogners. A few years back, I found this video on r/toobamps and I was immediately hooked. This is my holy grail tone, and I would love to achieve something similar on the FM3.

Even though the FM3 is pretty much the most configurable guitar simulator around, I'm actually a fan of simplicity when it comes to my music setups. I'd prefer to just plug in and play instead of tweaking lots of meta-parameters searching for the perfect sound. Even my pedal setups were relatively simple; I gave up on menu-diving digital pedals a long time ago and stuck to simple designs with only a few knobs to tweak.

I decided that I wanted to create an amp + pedalboard-like experience with the FM3. Given that the pedal only has 3 footswitches, that seems like an impossible task without buying an extender like the FC-6 or FC-12. But that's not what I wanted to do. I wanted the whole thing to fit under my sit/standing desk, and also in a backpack for travel. Luckily for me, the FM3 provides significant levels of programmability for its footswitches, as you'll see later.

The Sound

At its core, the effects chain is relatively standard:

guitar -> wah -> drive -> modulation -> amp -> speakers -> delay -> reverb -> out

For the amp, I chose the FAS Hot Rod, a custom model based on an idealized hot-rodded EL34-powered amp. The speakers are two different mics (and positions) on a classic '68 Marshall 4x12 cab.

Here's a picture of the chain:

Signal Chain

The Controls

This is where things start to get interesting. The FM3 offers 9 "scenes" per preset, where you can toggle different combinations of effects (and channels). As long as you're not changing effect or amp "channels", there is no sound cutoff between scene changes. But even with the number of effects that I have in my chain, there are too many possible combinations to create a scene for each. It is also difficult to seamlessly switch between 9 scenes on the FM3, despite the rich footswitch programmability.

So I decided to go for more of a traditional pedalboard-like experience, using the footswitches to toggle effects. How can I do this using only 3 switches? Here's a picture of my "Performance" footswitch layout to see how I do it:

Footswitches

I use the first switch as a dedicated drive switch, since I toggle the drive fairly often. If you hold this switch, it brings you to a utility menu (basically my tuner, since I don't use tap-tempo). A second tap (or hold) of the switch brings you back to the drive.

I use the second switch to toggle my modulation effect of choice. If I want to change the modulation effect, I just hold the pedal down and it cycles to the next effect type. In actuality, holding down the middle switch toggles a new footswitch view. This view is exactly the same as the first view, just with a different modulation effect assigned to the second pedal. So in practice, holding down the pedal just cycles through the modulation effects!

This can get a bit tricky: it's possible to toggle an effect on, cycle to the next view where the effect is still on, and then be unable to turn it off, since the footswitch is no longer controlling it. Turning it off again requires more holds of the footswitch to cycle back to the effect in question. This isn't ideal, but I do my best to work with the limitation by keeping all of my modulation blocks on the unit's display as a visual indicator of what is on and off. I also try to use just one modulation type per song so I can reset all of the effects between songs.

The final switch toggles my delay. I can also toggle reverb by holding down the switch. That hold actually triggers a change to a new scene with reverb enabled, which creates another problem: when you change scenes, all of your effect settings (drive, modulation, and delay) are wiped out. This means that it's difficult to toggle reverb on and off in the middle of a song. This hasn't been a huge deal for me, since reverb is usually an always-on or always-off effect depending on the situation. If I'm playing through headphones during practice, I like the reverb on. But if I'm in a live room, I might not need it, since it can lead to a washed-out sound.

What about the wah? I have it set to be auto-enabled by an expression pedal. When the pedal is disengaged (below 5% of its max value), the wah is turned off. And as soon as I push the pedal up a bit, it engages the wah. I also like a "cocked wah" sound, where the wah is engaged at the low end of its range, which creates a cool filtered effect. I created a custom controller curve that gives me some more wiggle-room to hit that sweet spot before taking off into the higher filter ranges.

Wah Curve

And finally, what about cleans? At first I tried multiple amp-channel setups, but I found it tricky to control yet another variable in the setup, while maintaining the effect flexibility that I desired. Instead, I do this the old-fashioned way, by turning down the guitar volume knob! The Fractal models are incredibly amp-like, and respond very well to the guitar's volume knob. By setting the volume down to around 2-3 on the guitar and lightening my picking strength, I can get a very convincing "clean-enough" tone for my use.

Conclusion

Ok, so I actually did end up tweaking a bunch of meta-parameters to get a setup that I like. Almost every time I come home from a jam, I'm back at the computer playing around with the footswitch and effect settings in the FM3-Edit software. But recently, my changes have been getting smaller and smaller, like tweaking the drive control or delay feedback by a few percent. I guess this means that I'm almost happy with my tone? And the added benefit is that the entire setup (FM3, expression pedal, strap, and cables) fits in my normal-sized backpack!