Small Kafka: Tansu + SQLite on a free t3.micro (AWS Free Tier)
AWS has a free tier that includes the t3.micro instance: 1 GiB of memory and an EBS baseline throughput of ~10 MB/s. Great for kick-starting an early-stage project. A maximum EBS throughput of ~260 MB/s creates some headroom to flex up. Keep a beady eye on those CPU credits, though!
I thought I'd try the Tansu broker on a t3.micro using the
embedded SQLite storage engine.
All metadata and message data is stored in this database.
Backing up or restoring an environment is as simple as copying tansu.db.
Let's spin up a t3.micro instance with Amazon Linux 2023 and set up the following:
- The Supplementary Packages for Amazon Linux (SPAL) package repository
- Docker Compose to run the Tansu broker docker image
- Enable the containerd (docker) service
- Add the docker group to the ec2-user so that we can run docker commands
- Reboot after installing the services
Using the following commands:
sudo dnf install -y spal-release
sudo dnf install -y docker-compose
sudo systemctl enable containerd.service
sudo usermod -a -G docker ec2-user
sudo /sbin/shutdown -r now
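Once you log back in, an optional sanity check confirms the docker group membership and that the Compose plugin is available:

```bash
id                        # ec2-user should now list the docker group
docker compose version    # runs client-side, no daemon needed
```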
Then create a compose.yaml that does the following:
- We are using the latest Tansu docker image from ghcr.io/tansu-io/tansu. The image is built from scratch, containing only the statically linked Tansu binary (well, maybe a LICENSE and some SSL certificates too!).
- Substitute the ADVERTISED_LISTENER_URL with the name of your instance
- Default logging to the warn level
- Use the SQLite STORAGE_ENGINE, using the /data directory in the container
- Map the /data directory in the container to the current directory on the host
- Expose the Kafka API on port 9092
My compose.yaml (make sure you change ADVERTISED_LISTENER_URL to the name of your instance):
---
services:
  tansu:
    image: ghcr.io/tansu-io/tansu
    environment:
      ADVERTISED_LISTENER_URL: tcp://ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092/
      RUST_LOG: ${RUST_LOG:-warn}
      STORAGE_ENGINE: "sqlite://data/tansu.db"
    volumes:
      - ./:/data/
    ports:
      - 9092:9092
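If you would rather not hard-code the hostname, one option (a sketch, not something the compose file above does; it assumes you change the ADVERTISED_LISTENER_URL value there to ${ADVERTISED_LISTENER_URL} so that Compose interpolates it) is to look the public hostname up from the EC2 instance metadata service when starting the broker:

```bash
# Hypothetical helper, run on the instance itself.
# IMDSv2: fetch a session token, then the instance's public hostname.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
HOST=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/public-hostname)

# Pass the value through to compose.yaml
# (assumes ADVERTISED_LISTENER_URL: ${ADVERTISED_LISTENER_URL} in the environment section)
ADVERTISED_LISTENER_URL="tcp://${HOST}:9092/" docker compose up -d
```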
Start tansu with:
docker compose up -d
Check the memory being used by the broker:
ps -p $(pgrep tansu) -o rss= | awk '{print $1/1024 " MB"}'
18.9336 MB
Create a test topic using the Tansu CLI:
docker compose exec tansu /tansu topic create test
Verify that the test topic has been created with:
docker compose exec tansu /tansu topic list | jq '.[].name'
Hopefully, at this point you see the name of the test topic.
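Before benchmarking, you could round-trip a message with any Kafka client as an end-to-end check; for example with kcat (assuming it is installed locally, and substituting your own instance name):

```bash
# produce a single message to the test topic
echo "hello tansu" | kcat -b ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092 -t test -P

# consume it back from the beginning and exit at the end of the partition
kcat -b ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092 -t test -C -o beginning -e
```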
I ran kafka-producer-perf-test on my local Mac Mini 4,
connecting to the ec2 instance in eu-west-2 over the internet using a
security group
to restrict access:
kafka-producer-perf-test \
--topic test \
--num-records 200000 \
--record-size 1024 \
--throughput 7000 \
--producer-props bootstrap.servers=ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092
34518 records sent, 6892.6 records/sec (6.73 MB/sec), 69.8 ms avg latency, 206.0 ms max latency.
35357 records sent, 7067.2 records/sec (6.90 MB/sec), 42.0 ms avg latency, 109.0 ms max latency.
35031 records sent, 7004.8 records/sec (6.84 MB/sec), 22.7 ms avg latency, 63.0 ms max latency.
35058 records sent, 7010.2 records/sec (6.85 MB/sec), 24.6 ms avg latency, 54.0 ms max latency.
35025 records sent, 7005.0 records/sec (6.84 MB/sec), 26.9 ms avg latency, 91.0 ms max latency.
200000 records sent, 6989.3 records/sec (6.83 MB/sec), 35.56 ms avg latency, 206.00 ms max latency, 26 ms 50th, 89 ms 95th, 163 ms 99th, 184 ms 99.9th.
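The producer figures above only exercise the write path; a matching consumer run with kafka-consumer-perf-test (from the same Kafka tools, not run as part of this article) would look something like:

```bash
kafka-consumer-perf-test \
  --bootstrap-server ec2-35-179-120-103.eu-west-2.compute.amazonaws.com:9092 \
  --topic test \
  --messages 200000
```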
While the above is not going to beat any speed records, that's kind of the idea.
Kick start a project on low cost (virtual) hardware (in this case ~$0/hr).
Accumulate credit while the CPU is below baseline.
Spend those credits during periods of demand.
Scale up to bigger instances as demand grows, just by copying tansu.db onto the new instance, as in the sketch below.
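A minimal migration (or backup) sketch, assuming SSH access and that the broker is stopped first so the copy of tansu.db is consistent (the new-instance hostname is a placeholder):

```bash
# on the old instance: stop the broker so tansu.db is quiescent
docker compose down

# copy the database and compose.yaml to the new instance
scp tansu.db compose.yaml ec2-user@new-instance:~/

# on the new instance: update ADVERTISED_LISTENER_URL to the new hostname, then start the broker
docker compose up -d
```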
Breakdown of t3 instance types with an emphasis on CPU baseline and credit accumulation:
| name | vCPUs | baseline per vCPU | CPU credits earned/hr |
|---|---|---|---|
| t3.micro | 2 | 10% | 12 |
| t3.small | 2 | 20% | 24 |
| t3.medium | 2 | 20% | 24 |
| t3.large | 2 | 30% | 36 |
| t3.xlarge | 4 | 40% | 96 |
| t3.2xlarge | 8 | 40% | 192 |
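To put those numbers in context: one CPU credit is one vCPU running at 100% for one minute. A t3.micro therefore earns 12 vCPU-minutes per hour, i.e. 10% of each of its two vCPUs, while running both vCPUs flat out spends 120 vCPU-minutes per hour. A full credit bank (24 hours of accrual, 288 credits on a t3.micro) buys roughly two and a half hours of sustained full load before the instance drops back to baseline or, in the default unlimited mode, starts billing for surplus credits.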
Broker memory size:
[ec2-user@ip-172-31-47-213 ~]$ ps -p $(pgrep tansu) -o rss= | awk '{print $1/1024 " MB"}'
27.3008 MB
The broker is generally pretty frugal, with tuning to reduce allocations and CPU bottlenecks. It makes extensive use of the bytes crate, an efficient container for storing and operating on contiguous slices of memory.
Free and used memory in the system:
[ec2-user@ip-172-31-47-213 ~]$ free -m
Plenty of memory remains in reserve and for use as the page cache:
| total | used | free | shared | buff/cache | available |
|---|---|---|---|---|---|
| 916 MiB | 230 MiB | 96 MiB | 0 MiB | 589 MiB | 552 MiB |
Database size:
[ec2-user@ip-172-31-47-213 ~]$ ls -lh tansu.db
-rw-r--r--. 1 root root 265M Jan 9 11:43 tansu.db
tansu.db is a standard SQLite database file; you can use existing tools to inspect and update the data.
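For example, with the sqlite3 CLI on the host you can inspect it directly (read-only here; the schema itself is Tansu's and may change between releases):

```bash
sqlite3 tansu.db '.tables'    # list the tables Tansu has created
sqlite3 tansu.db '.schema'    # dump the schema
sqlite3 tansu.db 'PRAGMA page_count; PRAGMA page_size;'   # database size in pages
```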
Obviously, other cloud providers are available. Any provider that can run a docker image will work for this article. A statically linked Linux binary is also available with each release, for environments that can't run containers or if you prefer to run the broker directly.
Want to try it out for yourself? Clone (and ⭐) Tansu at https://github.com/tansu-io/tansu.
Tansu is an Apache-licensed, open source, Kafka-compatible broker, proxy and (early) client API written in async Rust, with multiple storage engines (memory, null, PostgreSQL, SQLite and S3).
Other articles include:
- Tuning the broker with the null storage engine using cargo flamegraph
- CPU bottlenecks: starting with a regular expression, stopping the copying of uncompressed data, and using a faster CRC32 implementation with the SQLite storage engine
- Route, Layer and Process Kafka Messages with Tansu Services, the composable layers that are used to build the Tansu broker and proxy
- Apache Kafka protocol with serde, quote, syn and proc_macro2, a walk through of the low level Kafka protocol implementation used by Tansu
- Effortlessly Convert Kafka Messages to Apache Parquet with Tansu: A Step-by-Step Guide, using a schema backed topic to write data into the Parquet open table format
- Using Tansu with Tigris on Fly, spin up (and down!) a broker on demand
- Smoke Testing with the Bash Automated Testing System 🦇, a look at the integration tests that are part of the Tansu CI system