SQLite storage for Tansu
In this article we will use Tansu, an Apache licensed Kafka compatible streaming platform, to generate test data for a Protobuf backed topic, storing the data in a SQLite database. Using the Tansu CLI we will fetch the encoded messages and display them as JSON without any additional tooling.
SQLite uses a single file for storage, making it ideal for local development, for creating data that can be reused repeatably in a test environment, and for smaller scale production deployments.
Let's get straight into using a broker with the SQLite storage engine and generating some Protobuf schema backed messages using the Tansu CLI:
tansu broker \
--storage-engine=sqlite://tansu.db \
--schema-registry=file://./etc/schema
The tansu broker will use a SQLite database file called tansu.db in the current directory to store Kafka messages and other broker metadata (topics, consumer groups, message offsets, etc.). When the broker starts it creates the necessary database schema if it does not already exist.
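Because the storage engine is just a single SQLite file, you can poke at it directly with the stock sqlite3 shell. A minimal sketch, assuming sqlite3 is installed; the tables it lists are internal to Tansu and may change between releases:

# list the tables the broker created for topics, records and other metadata
sqlite3 tansu.db '.tables'

# check how much space the broker is using on disk
ls -lh tansu.db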
Tansu supports optional broker validation of Avro, Protobuf or JSON Schema messages. In this example we are using a customer topic, so the broker will search for customer.avsc, customer.proto or customer.json files in the ./etc/schema directory (S3 is also supported as a schema registry).
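For the customer topic used in this article that means a single schema file named after the topic:

# the broker matches the schema to the topic by file name
ls ./etc/schema
# customer.proto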
When a schema is present for a topic, the broker will reject message batches that are invalid for the schema with an InvalidRecord error. Validation is purely on the broker and no client changes are necessary.
Validation also unlocks the real-time conversion of data into Apache Parquet, Apache Iceberg or Delta Lake open table formats for downstream processing.
Create a customer topic using the Tansu CLI:
tansu topic create customer
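With a schema in place for the topic, the broker-side validation described above can be checked from any standard Kafka client. A minimal sketch using the stock Apache Kafka console producer; localhost:9092 is an assumption about where the broker is listening, adjust it to your configuration:

# plain text is not a valid Protobuf encoded customer message,
# so the broker should reject the batch as an InvalidRecord
echo 'not a customer' | kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic customer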
The customer topic is backed by a Protobuf schema. Tansu uses FieldOption metadata to embed rhai scripts in Protobuf schemas to generate fake data for a topic. A customer has an Address that uses the building_number, street_name, city_name, post_code and country_name functions to generate a fake address. A customer is represented by a Value message with an email address, a full name (combining first_name with last_name), a home address and a random list of between 1 and 3 industries:
message Address {
  string building_number = 1 [(generate).script = "building_number()"];
  string street_name = 2 [(generate).script = "street_name()"];
  string city = 3 [(generate).script = "city_name()"];
  string post_code = 4 [(generate).script = "post_code()"];
  string country_name = 5 [(generate).script = "country_name()"];
}

message Value {
  string email_address = 1 [(generate).script = "safe_email()"];
  string full_name = 2 [(generate).script = "first_name() + ' ' + last_name()"];
  Address home = 3;
  repeated string industry = 4 [(generate).repeated = {script: "industry()", range: {min: 1, max: 3}}];
}
The generate_message_kind function in Tansu shows how simple it is to register the fake data functions with the rhai scripting engine. The full schema used by the customer topic is available here.
Leaving the broker running, in another terminal we generate some test data using the Tansu CLI generator command:
tansu generator --schema-registry=file://./etc/schema \
--per-second=160 \
--producers=8 \
--batch-size=20 \
--duration-seconds=180 \
customer
The generator uses the generic cell rate algorithm from the governor crate to limit the rate of message generation. In this example it generates 160 messages per second, using 8 producers with a batch size of 20 messages, for a duration of 3 minutes.
You can fetch messages from topics using the tansu cat consume command, decoding Avro, Protobuf or JSON Schema messages into JSON using the schema registry:
tansu cat consume --schema-registry=file://./etc/schema customer
This will return a series of JSON encoded customer messages showing the generated fake data:
[{"key":null,
"value": {
"emailAddress":"dedric@example.org",
"fullName":"Shawn Auer",
"home": {
"buildingNumber":"108",
"city":"Howell view",
"countryName":"Saint Martin",
"postCode":"66180-9718",
"streetName":"Huel Green"},
"industry":["Restaurants","Newspapers"]}}]
In this article we have used Tansu, a single statically linked ~150MB binary containing a broker with schema validation for Avro, JSON Schema and Protobuf, a message generator (producer), a consumer, and a topic management CLI.
Tansu offers the same Kafka compatible API whichever storage engine you choose, with options that suit your development and testing process through to production:
- SQLite (via libSQL)
- Turso Database, an in-process SQL database written in Rust, compatible with SQLite (alpha: currently feature locked)
- memory (for ephemeral environments)
- S3
- PostgreSQL
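Switching between them is a matter of changing the --storage-engine URL passed to the broker. A hedged sketch; the sqlite:// form is the one used in this article, while the URL forms shown for the other engines are assumptions, so check the Tansu documentation for the exact syntax:

# SQLite file in the current directory (as used in this article)
tansu broker --storage-engine=sqlite://tansu.db

# PostgreSQL and S3 backed storage; URL forms shown are illustrative
tansu broker --storage-engine=postgres://postgres:postgres@localhost
tansu broker --storage-engine=s3://tansu/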
Other articles in this series include: