SQLite storage for Tansu
In this article we will use Tansu, an Apache licensed Kafka compatible streaming platform, to generate test data for a Protobuf backed topic, storing the data in a SQLite database. Using the Tansu CLI we will fetch the encoded messages and display them as JSON without any additional tooling.
SQLite uses a single file for storage, making it ideal for local development, for creating data that can be reused repeatably in a test environment, and for smaller scale production deployments.
Let's get straight into using a broker with the SQLite storage engine and generating some Protobuf schema backed messages using the Tansu CLI:
tansu broker \
--storage-engine=sqlite://tansu.db \
--schema-registry=file://./etc/schema
The tansu broker will use a SQLite database file called tansu.db in the current directory to store Kafka messages and other broker metadata (topics, consumer groups, message offsets, etc.). When the broker starts it creates the necessary database schema if it does not already exist.
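Because the storage engine is just a single SQLite file, you can poke at it directly with the stock sqlite3 shell. A minimal sketch, assuming sqlite3 is installed; the tables it lists are internal to Tansu and may change between releases:

# list the tables the broker created for topics, records and other metadata
sqlite3 tansu.db '.tables'

# check how much space the broker is using on disk
ls -lh tansu.db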
Tansu supports optional broker validation of Avro, Protobuf or JSON Schema messages. In this example we are using a customer topic, so the broker will search for customer.avsc, customer.proto or customer.json files in the ./etc/schema directory (S3 is also supported as a schema registry).
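For the customer topic used in this article that means a single schema file named after the topic:

# the broker matches the schema to the topic by file name
ls ./etc/schema
# customer.proto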
When a schema is present for a topic, the broker will reject message batches that are invalid for the schema with an InvalidRecord error. Validation is purely on the broker and no client changes are necessary.
Validation also unlocks the real-time conversion of data into Apache Parquet, Apache Iceberg or Delta Lake open table formats for downstream processing.
Create a customer topic using the Tansu CLI:
tansu topic create customer
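With a schema in place for the topic, the broker-side validation described above can be checked from any standard Kafka client. A minimal sketch using the stock Apache Kafka console producer; localhost:9092 is an assumption about where the broker is listening, adjust it to your configuration:

# plain text is not a valid Protobuf encoded customer message,
# so the broker should reject the batch as an InvalidRecord
echo 'not a customer' | kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic customer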
The customer topic is backed by a Protobuf schema. Tansu uses FieldOption metadata to embed rhai scripts in Protobuf schemas to generate fake data for a topic. A customer has an Address that uses the building_number, street_name, city_name, post_code and country_name functions to generate a fake address. A customer is represented by a Value message with an email address, a full name (combining first_name with last_name), a home address and a random list of between 1 and 3 industries:
message Address {
  string building_number = 1 [(generate).script = "building_number()"];
  string street_name = 2 [(generate).script = "street_name()"];
  string city = 3 [(generate).script = "city_name()"];
  string post_code = 4 [(generate).script = "post_code()"];
  string country_name = 5 [(generate).script = "country_name()"];
}

message Value {
  string email_address = 1 [(generate).script = "safe_email()"];
  string full_name = 2 [(generate).script = "first_name() + ' ' + last_name()"];
  Address home = 3;
  repeated string industry = 4 [(generate).repeated = {script: "industry()", range: {min: 1, max: 3}}];
}
The generate_message_kind function in Tansu shows how simple it is to register the fake data functions with the rhai scripting engine. The full schema used by the customer topic is available here.
Leaving the broker running, in another terminal we generate some test data using the Tansu CLI generator command:
tansu generator --schema-registry=file://./etc/schema \
--per-second=160 \
--producers=8 \
--batch-size=20 \
--duration-seconds=180 \
customer
The generator uses the generic cell rate algorithm from the governor crate to limit the rate of message generation. In this example it generates 160 messages per second, using 8 producers with a batch size of 20 messages, for a duration of 3 minutes.
You can fetch messages from topics using the tansu cat consume command, decoding Avro, Protobuf or JSON Schema messages into JSON using the schema registry:
tansu cat consume --schema-registry=file://./etc/schema customer
This will return a series of JSON encoded customer messages showing the generated fake data:
[{"key":null,
"value": {
"emailAddress":"dedric@example.org",
"fullName":"Shawn Auer",
"home": {
"buildingNumber":"108",
"city":"Howell view",
"countryName":"Saint Martin",
"postCode":"66180-9718",
"streetName":"Huel Green"},
"industry":["Restaurants","Newspapers"]}}]
In this article we have used Tansu, a single statically linked ~150MB binary containing a broker with schema validation for Avro, JSON Schema and Protobuf, a message generator (producer), a consumer, and a topic management CLI.
Tansu offers the same Kafka compatible API whichever storage engine you choose, with options that suit your development and testing process through to production:
- SQLite (via libSQL)
- Turso Database, an in-process SQL database written in Rust, compatible with SQLite (alpha: currently feature locked)
- memory (for ephemeral environments)
- S3
- PostgreSQL
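Switching between them is a matter of changing the --storage-engine URL passed to the broker. A hedged sketch; the sqlite:// form is the one used in this article, while the URL forms shown for the other engines are assumptions, so check the Tansu documentation for the exact syntax:

# SQLite file in the current directory (as used in this article)
tansu broker --storage-engine=sqlite://tansu.db

# PostgreSQL and S3 backed storage; URL forms shown are illustrative
tansu broker --storage-engine=postgres://postgres:postgres@localhost
tansu broker --storage-engine=s3://tansu/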
Other articles in this series include: