Talk "From Chaos to Control: Automating BI Tools with Pydantic and Python"
Patricia (Wemolo) presents three automated data governance solutions for Metabase: HR-driven permission sync via Personio & Dagster, dbt YAML-based documentation sync, and schema-change-resilient query updates using memoized API calls — all built with Python.
The talk was presented at Munich Datageeks - February Edition 2026
Abstract
This talk by Patricia, a data/analytics engineer at Wemolo (a Munich-based startup), covers practical approaches to automating data governance within a BI tooling stack. With 250 employees sharing a single Metabase instance and only one data analyst in the company, manual governance is not viable. Patricia presents three automation solutions built on top of their data stack (Personio → Python/Dagster → BigQuery → Metabase): automated permission management based on HR data, documentation synchronization via dbt YAML files, and automated query adjustment when schema changes occur. The talk is grounded in real production systems and highlights both the technical implementation and the pragmatic reasoning behind each solution, including one clever use of memoization to work around Metabase API limitations.
About the Speaker
Patricia is a data and analytics engineer at Wemolo, a startup based in Munich with around 250 employees. Originally from Brazil, she has lived in Germany for many years and completed her master's degree there. Her work focuses on the data engineering side of analytics, including building and maintaining automated governance systems on top of the company's internal data stack.
Transcript Summary
The Data Stack at Wemolo
Wemolo's data stack ingests data from multiple sources including PostgreSQL, Salesforce, and Personio. Python scripts and Dagster (an orchestration tool comparable to Apache Airflow) are used to sync this data into BigQuery as the central data warehouse. Metabase serves as the BI layer, exposed to all internal users for self-service analytics. Because the company has only one data analyst for 250 employees, Metabase plays a central role in enabling non-technical staff to independently explore data and build dashboards.
Why Automate Governance?
The combination of a large user base and a minimal data team makes manual governance impractical. Key challenges include:
- Ensuring new employees get the right data access automatically upon joining
- Updating permissions when employees change roles or teams
- Keeping documentation consistent across all tools without manual duplication
- Keeping BI queries intact when the underlying data schema evolves
1. Automated Permission System
Data Source
Employee data is pulled from Personio, Wemolo's HR system. The relevant attributes are: name, email, department/team, position, and active status. This data flows into BigQuery and is then used to drive permission updates in Metabase.
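The attributes above can be modelled as a Pydantic class, in line with the talk's title. This is an illustrative sketch: the field names and the raw payload shape are assumptions, not the actual Personio API schema or Wemolo's connector code.

```python
from pydantic import BaseModel


class Employee(BaseModel):
    """Illustrative model of the Personio attributes used for permissioning.

    Field names are assumptions; Pydantic validates types when parsing
    the raw API payload into this model.
    """
    name: str
    email: str
    department: str
    position: str
    active: bool


# A hypothetical raw record as it might arrive from the API
raw = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "department": "Sales",
    "position": "Team Lead",
    "active": True,
}

employee = Employee(**raw)  # raises a ValidationError on malformed input
print(employee.email)
```

Parsing into a typed model up front means every downstream step (the BigQuery sync, the permission logic) can rely on validated fields instead of raw dictionaries.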
Implementation
- A Python connector abstracts the Personio API (v1), modelling the response as a stream and defining an Employees class that points to the /company/employees endpoint.
- A separate Metabase resource module connects to Metabase via an API key, retrieves current users, and issues POST requests to assign users to the appropriate permission groups.
- A scheduler (running every 5 minutes via Dagster) joins Personio employee records with Metabase user records by email address (as the unique key) and updates group memberships accordingly.
Example
A new user joining the Sales department as a lead would automatically receive editor and view permissions on the Sales collection in Metabase. A non-lead in the same department would receive view-only access. Changes in role or team trigger corresponding permission adjustments.
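The rule in this example can be sketched as a small pure function. The group names and the lead/non-lead distinction below are assumptions drawn from the Sales example above, not Wemolo's actual group configuration.

```python
def target_groups(department: str, position: str) -> set[str]:
    """Map an employee's HR attributes to Metabase permission groups.

    Per the example above: leads get editor + view access on their
    department's collection, everyone else gets view-only access.
    Group names are hypothetical.
    """
    groups = {f"{department} Viewers"}
    if "lead" in position.lower():
        groups.add(f"{department} Editors")
    return groups


def groups_for(personio_by_email: dict, email: str) -> set[str]:
    """Join a Metabase user against Personio records by email (the unique
    key, as in the scheduler above) and return the groups they belong in."""
    employee = personio_by_email.get(email)
    if employee is None or not employee["active"]:
        return set()  # unknown user or leaver: no group memberships
    return target_groups(employee["department"], employee["position"])


personio = {
    "jane@example.com": {"department": "Sales", "position": "Team Lead", "active": True},
    "max@example.com": {"department": "Sales", "position": "Account Executive", "active": True},
}
print(groups_for(personio, "jane@example.com"))  # lead: editor + viewer groups
print(groups_for(personio, "max@example.com"))   # non-lead: viewer group only
```

Keeping the mapping as a pure function makes it trivial to test; the scheduled Dagster job would then diff these target groups against Metabase's current memberships and POST only the changes.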
2. Documentation Synchronization
Approach
Wemolo uses dbt (data build tool) for all data models, and dbt natively stores metadata in YAML files. This existing structure is leveraged to create a single source of truth for column and table descriptions. Documentation is version-controlled in GitLab.
Synchronization to Metabase
A library (not built in-house) handles syncing dbt YAML metadata into Metabase automatically. Any description written in the YAML file appears in Metabase with consistent formatting: the column name is displayed without underscores and with an initial capital letter, while the description text is identical to what is in the YAML file.
Extended Metadata
Beyond basic descriptions, the YAML files can also carry additional Metabase-specific metadata such as caveats (e.g., noting that a dataset is updated daily or every 10 minutes). These are also synchronized automatically, ensuring end users always see accurate context alongside the data.
3. Query Adjustment on Schema Changes
The Problem
Metabase questions (saved queries) are built through a point-and-click interface — users select a table, apply filters, choose aggregation columns, and set sorting. These selections are stored by reference to specific column and table names in BigQuery.
When a column is renamed in BigQuery, Metabase marks the corresponding field as "unknown" in any affected questions. When a table is renamed, Metabase goes further and deletes all filters, summarizations, and sorting configurations from the question, requiring users to rebuild from scratch. With 13,000 saved questions in Metabase across 250 users, doing this manually is not feasible.
Implementation
The same Metabase API client used for permission management is reused here. However, the Metabase API does not expose a direct endpoint to query individual column names — only a full-database dump is available. For a database with 400 tables and roughly 100 columns each, loading this in full for every rename operation would take hours.
Memoization as a Solution
To address this, a memoization (caching) strategy is applied: the full database metadata is loaded once on the first rename operation, and the result is cached in memory. Subsequent rename operations within the same run reuse the cached data, making them significantly faster. The first call is slow, but all following calls are near-instant.
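The caching strategy can be sketched with functools.lru_cache from the standard library. The fetch function below is a stand-in for the slow full-database metadata dump, with fake data in place of the real Metabase API call; it is not the actual client code.

```python
from functools import lru_cache

CALLS = 0  # track how often the expensive dump actually runs


@lru_cache(maxsize=1)
def database_metadata(database_id: int) -> dict:
    """Stand-in for the slow full-database metadata dump.

    In production this would be one large GET against the Metabase API;
    here it returns a fake table -> columns mapping for illustration.
    """
    global CALLS
    CALLS += 1
    return {
        "orders": ("id", "created_at", "amount"),
        "sessions": ("id", "duration_s"),
    }


def column_exists(database_id: int, table: str, column: str) -> bool:
    """Each rename check reuses the cached dump instead of re-fetching it."""
    return column in database_metadata(database_id).get(table, ())


column_exists(1, "orders", "id")
column_exists(1, "sessions", "duration_s")
column_exists(1, "orders", "renamed_field")
print(CALLS)  # the dump was loaded only once, despite three lookups
```

Because lru_cache keys on the arguments, the dump is fetched once per database per run, matching the behaviour described above: the first call is slow, every subsequent lookup is near-instant.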