June 22, 2025

Kautos Data Platform

Status

In Progress

Core Components

  • 🔄 Ingestion with dlt
    • Pulls structured entities (e.g., events, cultures, gods) from the World Anvil API
    • Handles pagination, auth, schema evolution, and re-ingestion with minimal friction (see the sketch after this list)
  • 💽 Storage in DuckDB
    • Local analytical store for fast querying, schema experimentation, and historical lookbacks
    • Lightweight setup with full SQL support for entity joins and exploratory analysis
  • 🌐 Orchestration with Dagster
    • Uses asset-based orchestration to define and visualize entity-level lineage
    • Enables selective re-materialization and testing for individual lore assets
    • Provides a clean abstraction layer to scale up if/when the dataset grows
  • 🧠 Conceptual Impact
    • Introduced a lineage-first mental model for all future pipeline design
    • Reinforced the idea that even non-production projects can teach scalable infrastructure patterns
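
To make the ingestion and storage bullets concrete, here is a minimal sketch of the dlt-to-DuckDB flow. The World Anvil endpoint path, auth header, and response shape are assumptions standing in for the real API; the dlt calls themselves are standard.

```python
import dlt
import requests

API_BASE = "https://www.worldanvil.com/api"  # hypothetical base URL

@dlt.resource(name="lore_entities", write_disposition="merge", primary_key="id")
def lore_entities(api_key: str = dlt.secrets.value, world_id: str = dlt.secrets.value):
    """Yield lore entities page by page; dlt infers and evolves the schema."""
    offset, limit = 0, 50
    while True:
        resp = requests.get(
            f"{API_BASE}/world/{world_id}/articles",    # hypothetical endpoint
            headers={"x-auth-token": api_key},          # hypothetical header name
            params={"offset": offset, "limit": limit},
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json().get("entities", [])          # assumed payload key
        if not page:
            break
        yield page                                      # dlt loads this batch
        offset += limit

# The "duckdb" destination writes to kautos_lore.duckdb in the working directory.
pipeline = dlt.pipeline(
    pipeline_name="kautos_lore",
    destination="duckdb",
    dataset_name="world_anvil",
)
print(pipeline.run(lore_entities()))
```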


Project Details

Premise:
What started as a quirky lore-management experiment became a personal turning point in how I approach data engineering. I built a data platform to ingest and transform constructed-world (conworld) data for my fantasy setting, Kautos, pulling from the World Anvil API into DuckDB using dlt, orchestrated by Dagster. It began as an exercise in organizing fictional timelines and divine genealogies, but it unlocked something deeper: a new mental model for pipeline design rooted in assets, lineage, and clarity.
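
As a taste of what organizing fictional timelines and divine genealogies looks like once the data lands, here is the kind of exploratory DuckDB query the platform enables. The table and column names are hypothetical; the real schema is whatever dlt infers from the World Anvil payloads.

```python
import duckdb

# Open the file the dlt pipeline writes to (name follows dlt's default).
con = duckdb.connect("kautos_lore.duckdb")

# Hypothetical tables and columns, shown only to illustrate the entity joins.
rows = con.execute("""
    SELECT e.title AS event,
           e.year  AS year,
           g.title AS presiding_god
    FROM world_anvil.events AS e
    LEFT JOIN world_anvil.gods AS g ON e.god_id = g.id
    ORDER BY e.year
""").fetchall()

for event, year, god in rows:
    print(f"{year}: {event} ({god or 'no god recorded'})")
```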

Why it mattered:
This project wasn’t just about managing fictional data; it was about discovering the power of orchestration as a paradigm. Dagster’s asset graph didn’t just help me build a lore pipeline; it reframed how I thought about dependencies, visibility, and reproducibility. That conceptual clarity carried directly into my real-world work: the same patterns I explored here became the backbone of production ML pipelines I later implemented at Givzey. By starting with fake gods and apocalyptic events, I ended up building infrastructure that could support the real thing.

What I tried:
I built a pipeline that ingests structured lore data from the World Anvil API using dlt, stages it in DuckDB, and allows me to track lineage, recompute assets, and iterate quickly on data modeling. Dagster’s asset graph gave me an intuitive way to visualize how different entities (gods, events, timelines, locations) related to each other across refresh cycles. I ran daily refreshes, tested schema changes, and experimented with contextual transformations to keep lore data queryable and internally consistent. This became my sandbox for developing ETL discipline, without the pressure of production but with all the complexity of world-scale storytelling.
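
A stripped-down sketch of what that asset graph looks like in Dagster. The asset names mirror the lore entities but are illustrative, and the bodies are elided; the real assets wrap the dlt pipeline and the DuckDB transforms.

```python
from dagster import Definitions, asset

@asset
def raw_world_anvil() -> None:
    """Pull the latest lore snapshot (wraps the dlt pipeline)."""
    ...

@asset(deps=[raw_world_anvil])
def gods() -> None:
    """Conform god entities in DuckDB."""
    ...

@asset(deps=[raw_world_anvil])
def events() -> None:
    """Conform event entities in DuckDB."""
    ...

@asset(deps=[gods, events])
def timelines() -> None:
    """Join gods to events into queryable timelines."""
    ...

defs = Definitions(assets=[raw_world_anvil, gods, events, timelines])
```

Declaring the graph this way is what makes selective re-materialization fall out for free: a single stale asset, say timelines, can be rebuilt on its own from the Dagster UI or CLI without re-running the whole pipeline.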
