Small Kafka: Tansu + SQLite on a free t3.micro (AWS Free Tier)
AWS has a free tier that includes the t3.micro instance: 1 GiB of memory and an EBS baseline throughput of ~10 MB/s. Great for kick-starting an early-stage project. A maximum EBS throughput of ~260 MB/s creates some headroom to flex up. Keep a beady eye on those CPU credits, though!
I thought I'd try the Tansu broker on a t3.micro using the
embedded SQLite storage engine.
All metadata and message data is stored in this database.
Backing up or restoring an environment is as simple as copying tansu.db.
Let's spin up a t3.micro instance with Amazon Linux 2023 and set up the following:
- The Supplementary Packages for Amazon Linux (SPAL) package repository
- Docker Compose to run the Tansu broker docker image
- Enable the containerd (docker) service
- Add the docker group to the ec2-user so that we can run docker commands
- Reboot after installing the services
Using the following commands:
sudo dnf install -y spal-release
sudo dnf install -y docker-compose
sudo systemctl enable containerd.service
sudo usermod -a -G docker ec2-user
sudo /sbin/shutdown -r now
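Once you log back in, an optional sanity check confirms the docker group membership and that the Compose plugin is available:

```bash
id                        # ec2-user should now list the docker group
docker compose version    # runs client-side, no daemon needed
```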
Then create a compose.yaml that does the following:
- We are using the latest Tansu docker image from ghcr.io/tansu-io/tansu. The image is built from scratch, containing only the statically linked Tansu binary (well, maybe a LICENSE and some SSL certificates too!).
- Substitute the ADVERTISED_LISTENER_URL with the name of your instance
- Default logging to the warn level
- Use the SQLite STORAGE_ENGINE, using the /data directory in the container
- Map the /data directory in the container to the current directory on the host
- Expose the Kafka API on port 9092
My compose.yaml (make sure you change ADVERTISED_LISTENER_URL to the name of your instance):
---
services:
  tansu:
    image: ghcr.io/tansu-io/tansu
    environment:
      ADVERTISED_LISTENER_URL: tcp://ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092/
      RUST_LOG: ${RUST_LOG:-warn}
      STORAGE_ENGINE: "sqlite://data/tansu.db"
    volumes:
      - ./:/data/
    ports:
      - 9092:9092
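If you would rather not hard-code the hostname, one option (a sketch, not something the compose file above does; it assumes you change the ADVERTISED_LISTENER_URL value there to ${ADVERTISED_LISTENER_URL} so that Compose interpolates it) is to look the public hostname up from the EC2 instance metadata service when starting the broker:

```bash
# Hypothetical helper, run on the instance itself.
# IMDSv2: fetch a session token, then the instance's public hostname.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
HOST=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/public-hostname)

# Pass the value through to compose.yaml
# (assumes ADVERTISED_LISTENER_URL: ${ADVERTISED_LISTENER_URL} in the environment section)
ADVERTISED_LISTENER_URL="tcp://${HOST}:9092/" docker compose up -d
```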
Start tansu with:
docker compose up -d
Check the memory being used by the broker:
ps -p $(pgrep tansu) -o rss= | awk '{print $1/1024 " MB"}'
18.9336 MB
Create a test topic using the Tansu CLI:
docker compose exec tansu /tansu topic create test
Verify that the test topic has been created with:
docker compose exec tansu /tansu topic list | jq '.[].name'
Hopefully, at this point you see the name of the test topic.
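Before benchmarking, you could round-trip a message with any Kafka client as an end-to-end check; for example with kcat (assuming it is installed locally, and substituting your own instance name):

```bash
# produce a single message to the test topic
echo "hello tansu" | kcat -b ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092 -t test -P

# consume it back from the beginning and exit at the end of the partition
kcat -b ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092 -t test -C -o beginning -e
```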
I ran kafka-producer-perf-test on my local Mac Mini 4,
connecting to the ec2 instance in eu-west-2 over the internet using a
security group
to restrict access:
kafka-producer-perf-test \
--topic test \
--num-records 200000 \
--record-size 1024 \
--throughput 7000 \
--producer-props bootstrap.servers=ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092
34518 records sent, 6892.6 records/sec (6.73 MB/sec), 69.8 ms avg latency, 206.0 ms max latency.
35357 records sent, 7067.2 records/sec (6.90 MB/sec), 42.0 ms avg latency, 109.0 ms max latency.
35031 records sent, 7004.8 records/sec (6.84 MB/sec), 22.7 ms avg latency, 63.0 ms max latency.
35058 records sent, 7010.2 records/sec (6.85 MB/sec), 24.6 ms avg latency, 54.0 ms max latency.
35025 records sent, 7005.0 records/sec (6.84 MB/sec), 26.9 ms avg latency, 91.0 ms max latency.
200000 records sent, 6989.3 records/sec (6.83 MB/sec), 35.56 ms avg latency, 206.00 ms max latency, 26 ms 50th, 89 ms 95th, 163 ms 99th, 184 ms 99.9th.
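The producer figures above only exercise the write path; a matching consumer run with kafka-consumer-perf-test (from the same Kafka tools, not run as part of this article) would look something like:

```bash
kafka-consumer-perf-test \
  --bootstrap-server ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092 \
  --topic test \
  --messages 200000
```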
While the above is not going to beat any speed records, that's kind of the idea.
Kick start a project on low cost (virtual) hardware (in this case ~$0/hr).
Accumulate credit while the CPU is below baseline.
Spend those credits during periods of demand.
Scale up to bigger instances as demand grows, just by copying tansu.db onto the new instance, as in the sketch below.
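A minimal migration (or backup) sketch, assuming SSH access and that the broker is stopped first so the copy of tansu.db is consistent (the new-instance hostname is a placeholder):

```bash
# on the old instance: stop the broker so tansu.db is quiescent
docker compose down

# copy the database and compose.yaml to the new instance
scp tansu.db compose.yaml ec2-user@new-instance:~/

# on the new instance: update ADVERTISED_LISTENER_URL to the new hostname, then start the broker
docker compose up -d
```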
Breakdown of t3 instance types with an emphasis on CPU baseline and credit accumulation:
| name | vCPUs | baseline per vCPU | CPU credits earned/hr |
|---|---|---|---|
| t3.micro | 2 | 10% | 12 |
| t3.small | 2 | 20% | 24 |
| t3.medium | 2 | 20% | 24 |
| t3.large | 2 | 30% | 36 |
| t3.xlarge | 4 | 40% | 96 |
| t3.2xlarge | 8 | 40% | 192 |
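To put those numbers in context: one CPU credit is one vCPU running at 100% for one minute. A t3.micro therefore earns 12 vCPU-minutes per hour, i.e. 10% of each of its two vCPUs, while running both vCPUs flat out spends 120 vCPU-minutes per hour. A full credit bank (24 hours of accrual, 288 credits on a t3.micro) buys roughly two and a half hours of sustained full load before the instance drops back to baseline or, in the default unlimited mode, starts billing for surplus credits.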
Broker memory size:
[ec2-user@ip-172-31-47-213 ~]$ ps -p $(pgrep tansu) -o rss= | awk '{print $1/1024 " MB"}'
27.3008 MB
The broker is generally pretty frugal, with tuning to reduce allocations and CPU bottlenecks. It makes extensive use of the bytes crate, an efficient container for storing and operating on contiguous slices of memory.
Free and used memory in the system:
[ec2-user@ip-172-31-47-213 ~]$ free -m
Plenty of memory remains in reserve and for use as the page cache:
| total | used | free | shared | buff/cache | available |
|---|---|---|---|---|---|
| 916 MiB | 230 MiB | 96 MiB | 0 MiB | 589 MiB | 552 MiB |
Database size:
[ec2-user@ip-172-31-47-213 ~]$ ls -lh tansu.db
-rw-r--r--. 1 root root 265M Jan 9 11:43 tansu.db
tansu.db is a standard SQLite database file; you can use existing tools to inspect and update the data.
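For example, with the sqlite3 CLI on the host you can inspect it directly (read-only here; the schema itself is Tansu's and may change between releases):

```bash
sqlite3 tansu.db '.tables'    # list the tables Tansu has created
sqlite3 tansu.db '.schema'    # dump the schema
sqlite3 tansu.db 'PRAGMA page_count; PRAGMA page_size;'   # database size in pages
```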
Obviously, other cloud providers are available. Any provider that can run a docker image will work for this article. A statically linked Linux binary is also available with each release, for environments that can't run containers or if you prefer to run the broker directly.
Want to try it out for yourself? Clone (and ⭐) Tansu at https://github.com/tansu-io/tansu.
Tansu is an Apache-licensed, open source, Kafka-compatible broker, proxy and (early) client API written in async Rust, with multiple storage engines (memory, null, PostgreSQL, SQLite and S3).
Other articles include:
- Tuning the broker with the null storage engine using cargo flamegraph
- CPU bottlenecks: starting with a regular expression, stopping the copying of uncompressed data, and using a faster CRC32 implementation with the SQLite storage engine
- Route, Layer and Process Kafka Messages with Tansu Services, the composable layers that are used to build the Tansu broker and proxy
- Apache Kafka protocol with serde, quote, syn and proc_macro2, a walk through of the low level Kafka protocol implementation used by Tansu
- Effortlessly Convert Kafka Messages to Apache Parquet with Tansu: A Step-by-Step Guide, using a schema backed topic to write data into the Parquet open table format
- Using Tansu with Tigris on Fly, spin up (and down!) a broker on demand
- Smoke Testing with the Bash Automated Testing System 🦇, a look at the integration tests that are part of the Tansu CI system