Using Tansu with Tigris on Fly

Tansu is an Apache Kafka® compatible stateless broker. Rather than replicating data, Tansu relies on the durability of its underlying storage. For example, Amazon S3 is designed to exceed 99.999999999% (11 nines) of durability. Tansu can use any S3 compatible object store or a PostgreSQL database for storage. In this article we deploy Tansu on Fly using Tigris Data's S3 compatible storage. Using flycast we elastically scale up on demand and back to zero when quiescent. We do this without planning, reassigning and waiting to replicate data to other brokers. No waiting for Raft (or ZooKeeper) to reach... consensus about who is a leader or follower. All brokers are Spartacus.

All in about 40 lines of fly.toml configuration.

First, download and install the fly command line by following Fly's installation instructions.

Create a new directory called fly-tigris-demo and use fly launch to clone the tansu-io/fly-tigris-demo template (note the --no-deploy so that we can apply some tweaks first):

mkdir fly-tigris-demo
cd fly-tigris-demo
fly launch --from https://github.com/tansu-io/fly-tigris-demo --no-deploy

When asked Would you like to copy its configuration to the new app?, hit Y.

When asked Do you want to tweak these settings before proceeding?, hit N.

Output will look something like:

Launching from git repo https://github.com/tansu-io/fly-tigris-demo
Cloning into '.'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4 (delta 0), reused 4 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (4/4), done.
An existing fly.toml file was found for app tansu
? Would you like to copy its configuration to the new app? Yes
Using build strategies '[the "ghcr.io/tansu-io/tansu:pr-165" docker image]'. Remove [build] from fly.toml to force a rescan
Creating app in /Users/bickle/tmp/fly-tigris-demo
We're about to launch your app on Fly.io. Here's what you're getting:

Organization: Travis Bickle            (fly launch defaults to the personal org)
Name:         tansu                    (from your fly.toml)
Region:       London, United Kingdom   (this is the fastest region for you)
App Machines: shared-cpu-1x, 256MB RAM (from your fly.toml)
Postgres:     <none>                   (not requested)
Redis:        <none>                   (not requested)
Tigris:       <none>                   (not requested)

? Do you want to tweak these settings before proceeding? No
Created app 'tansu' in organization 'personal'
Admin URL: https://fly.io/apps/tansu
Hostname: tansu.fly.dev
Wrote config file fly.toml
Validating /Users/bickle/tmp/fly-tigris-demo/fly.toml
✓ Configuration is valid
Your app is ready! Deploy with `flyctl deploy`

Create a new S3 bucket on Tigris using:

fly storage create

Use the default name provided, or one of your choice:

? Choose a name, use the default, or leave blank to generate one: tansu
Your Tigris project (tansu) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/

Setting the following secrets on tansu:
AWS_ACCESS_KEY_ID: tid_YOUR_ACCESS_KEY_ID
AWS_ENDPOINT_URL_S3: https://fly.storage.tigris.dev
AWS_REGION: auto
AWS_SECRET_ACCESS_KEY: tsec_YOUR_SECRET_KEY_ID
BUCKET_NAME: tansu

Secrets are staged for the first deployment

You can verify the secrets that have been created with:

fly secrets list

The output lists each secret with its digest. Tansu automatically uses AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION and BUCKET_NAME:

NAME                    DIGEST                  CREATED AT 
AWS_ACCESS_KEY_ID       8cbc2a5f5f704ae7        1m12s ago       
AWS_ENDPOINT_URL_S3     85e8ac62d7de0c23        1m12s ago       
AWS_REGION              274e16452b90854d        1m12s ago       
AWS_SECRET_ACCESS_KEY   08469f68dfc80810        1m12s ago       
BUCKET_NAME             057611985ecb3ae3        1m12s ago

Our fly.toml is already set up to use these secrets to communicate with Tigris S3 using STORAGE_ENGINE = "s3://${BUCKET_NAME}":

[env]
RUST_LOG = 'warn,tansu_server=debug,tansu_storage=debug,tansu_schema_registry=debug'
AWS_ENDPOINT = "https://fly.storage.tigris.dev"
ADVERTISED_LISTENER_URL = "tcp://${FLY_APP_NAME}.flycast:9092/"
CLUSTER_ID = "tansu-fly-tigris"
STORAGE_ENGINE = "s3://${BUCKET_NAME}"

${FLY_APP_NAME} is part of the environment that all fly machines have. We advertise our address as tcp://${FLY_APP_NAME}.flycast:9092/. This is a flycast private IPv6 address, used to connect to the Tansu brokers, that we allocate using:

fly ips allocate-v6 --private
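To make the placeholders in the [env] section concrete, here is a small sketch that mimics the substitution in an ordinary shell. The default values (tansu for both the app and the bucket) are taken from the output earlier in this article and are assumptions for anything you named differently. How and when the placeholders are expanded on the machine itself is not shown here.

```shell
# Illustrative defaults only; on Fly these come from the machine
# environment and the staged secrets.
FLY_APP_NAME="${FLY_APP_NAME:-tansu}"
BUCKET_NAME="${BUCKET_NAME:-tansu}"

# The same templates as in the [env] section, expanded by the shell.
ADVERTISED_LISTENER_URL="tcp://${FLY_APP_NAME}.flycast:9092/"
STORAGE_ENGINE="s3://${BUCKET_NAME}"

echo "advertised listener: ${ADVERTISED_LISTENER_URL}"
echo "storage engine:      ${STORAGE_ENGINE}"
```

With the defaults above, the advertised listener resolves to tcp://tansu.flycast:9092/ and the storage engine to s3://tansu.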

Finally, deploy the application onto fly with:

fly deploy

Start an interactive shell using the Apache Kafka® Java client to connect to Tansu:

fly machine run --shell apache/kafka:3.9.0

Create a test topic. Note that our bootstrap-server is ${FLY_APP_NAME}.flycast:9092, using the flycast address allocated earlier:

/opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server ${FLY_APP_NAME}.flycast:9092 \
  --partitions=3 \
  --replication-factor=1 \
  --create \
  --topic test
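If you want to check the result, a small helper around the standard --describe option of kafka-topics.sh might look like the sketch below. The helper name describe_topic and the KAFKA_BIN and BOOTSTRAP variables are hypothetical conveniences, not part of the demo, and it assumes the broker answers the requests that --describe issues.

```shell
# Assumed locations; both can be overridden from the environment.
KAFKA_BIN="${KAFKA_BIN:-/opt/kafka/bin}"
BOOTSTRAP="${BOOTSTRAP:-${FLY_APP_NAME:-tansu}.flycast:9092}"

# describe_topic TOPIC: print the partition details for TOPIC.
describe_topic() {
  "$KAFKA_BIN/kafka-topics.sh" \
    --bootstrap-server "$BOOTSTRAP" \
    --describe \
    --topic "$1"
}
```

Calling describe_topic test in the interactive shell should list the three partitions created above.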

A demo isn't a demo without a Hello World! Let's produce a message:

echo "hello world" | \
/opt/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server ${FLY_APP_NAME}.flycast:9092 \
  --topic test

Fetch the message using a consumer group. Expect a short delay while the group is formed:

/opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server ${FLY_APP_NAME}.flycast:9092 \
  --consumer-property fetch.max.wait.ms=15000 \
  --group test-consumer-group \
  --topic test \
  --from-beginning \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.offset=true \
  --property print.partition=true \
  --property print.headers=true \
  --property print.value=true

The consumer should output:

CreateTime:SOME_TIME	Partition:1	Offset:0	NO_HEADERS	null	hello world
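The line is tab separated, with fields in the order timestamp, partition, offset, headers, key and value. As a rough sketch of how to pick it apart, the snippet below runs awk over a copy of the sample line above. The variable names are only for illustration, and the partition you see may differ because the message has no key.

```shell
# Sample line copied from the expected output above (tab separated).
line=$(printf 'CreateTime:SOME_TIME\tPartition:1\tOffset:0\tNO_HEADERS\tnull\thello world')

# Split on tabs and keep the partition, offset, key and value fields.
summary=$(printf '%s\n' "$line" |
  awk -F '\t' '{ printf "%s %s key=%s value=%s", $2, $3, $5, $6 }')

echo "$summary"
```

The key is null because the console producer was not asked to parse one from the input.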

Hit ^C a couple of times to exit the consumer.

If you wait a minute or so, the Tansu brokers will scale to zero, shutting down automatically. The brokers are stateless, without the overhead of distributed consensus. All data is persisted in S3 using optimistic concurrency control, made possible by the conditional write support offered by most S3 vendors.
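To get a feel for optimistic concurrency with conditional writes, here is a purely local analogy, not Tansu's implementation. A temporary directory stands in for the bucket, and the shell's noclobber option stands in for a "create only if absent" write. The function name append_record and the one-file-per-offset layout are invented for this sketch.

```shell
# A directory plays the role of the bucket in this analogy.
bucket=$(mktemp -d)

# append_record BUCKET PAYLOAD: claim the next free offset, retrying on conflict.
append_record() {
  offset=0
  # With noclobber set, the redirect fails if the file already exists,
  # just as a conditional write fails when another writer got there first.
  # On failure we move on and try the next offset.
  until ( set -o noclobber; printf '%s\n' "$2" > "$1/$offset" ) 2>/dev/null; do
    offset=$((offset + 1))
  done
  echo "$offset"
}

first=$(append_record "$bucket" "hello")
second=$(append_record "$bucket" "world")
echo "offsets: $first $second"
```

No writer takes a lock or asks a leader for permission. Each one simply attempts the write and retries if it lost the race, which is why stateless brokers can share the same storage.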

You can verify whether the brokers are still running using:

fly machines ls

Check that they are all stopped. If you now run:

/opt/kafka/bin/kafka-topics.sh --bootstrap-server ${FLY_APP_NAME}.flycast:9092 --list

A broker will automatically restart to handle this request, scaling back up from zero. There is a short delay while waiting for the broker to become ready. In environments where such a delay isn't acceptable, scaling to one may be appropriate (using fly scale). Brokers can also run in multiple regions without overhead because they are stateless.
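Should a first request time out while a broker is starting, a generic retry wrapper is one way to ride out the delay. This is a minimal sketch: the retry function is hypothetical, the attempt count and delay are arbitrary, and the Kafka clients already perform some retries of their own.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_count=1
  until "$@"; do
    if [ "$retry_count" -ge "$retry_attempts" ]; then
      echo "giving up after $retry_count attempts" >&2
      return 1
    fi
    retry_count=$((retry_count + 1))
    sleep "$retry_delay"
  done
}
```

For example, retry 5 3 /opt/kafka/bin/kafka-topics.sh --bootstrap-server ${FLY_APP_NAME}.flycast:9092 --list would try the listing up to five times, pausing three seconds between attempts.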

Tansu uses lightweight, super minimal Docker images built from scratch, containing only a static binary (plus some certificates because... SSL!). Our footprint is a few megabytes rather than the hundreds commonly found in Docker images. Fast first byte response times start with a lightweight container that can be deployed quickly to a new fly machine as we scale up on demand. Tansu is written in Rust, which doesn't have the startup overhead of a language VM runtime or a JIT compiler needing to warm up. All of which reminds me of Grace Hopper's nanoseconds:

Grace Murray Hopper: Visualizing Nanoseconds

Licensed under the GNU AGPL, Tansu is written in 100% safe 🦺 async 🚀 Rust 🦀 and is available on GitHub.