Munich Datageeks e.V.
Talk "From Bias to Bots"
Karim on stage for his talk "From Bias to Bots"

Talk "From Bias to Bots"

Felix Reuthlinger

Karim built a bot to test housing applications in Munich. Personas with Western-sounding names got four times more responses than his own applications. Nationwide tests confirmed the bias was systemic. A decade later, he warns that modern AI has made algorithmic discrimination far more sophisticated and opaque.

In this talk, Karim Jedda builds on his previous talks "Finding a flat in Munich using Machine Learning" (unfortunately not recorded) and "Finding Your Next Series to Watch Through Data Analysis". The talk was recorded at the special 100th Munich Datageeks Meetup in October.

Abstract

In 2014-2015, Karim, a French data engineer living in Munich, struggled to find an apartment despite having stable employment. After submitting hundreds of manual applications with no responses, he built an automated bot to test whether his name was the problem. The experiment revealed stark discrimination: personas with Western-sounding names like Hannah received four times more responses than applications submitted under his own name, with all other variables kept constant. Scaling the experiment nationwide with Bayerischer Rundfunk and Spiegel Online confirmed this was a systemic pattern across Germany, not just Munich-specific.

Ten years later, Karim revisited this project to explore how technology has evolved. Modern AI tools make such experiments trivially easy to replicate, but also enable more sophisticated and opaque forms of algorithmic discrimination. Platforms now collect massive amounts of data to combat bots, creating invisible scoring systems that affect access to housing, employment, healthcare, and other opportunities. This leads to "math washing" (justifying discrimination as data-driven decisions) and "social cooling" (people self-censoring to optimize for algorithms).

The talk advocates for legal protections including algorithmic transparency, the right to contest automated decisions, and human involvement in important choices. Karim also promotes privacy-preserving technologies like zero-knowledge proofs that can verify necessary information without excessive data collection. He warns that without intervention, society risks a future where everyone needs bots to access basic opportunities, with discrimination compounding across generations through permanent data storage and AI-generated synthetic datasets.

About the speaker

Karim Jedda is a French data engineer who has lived in Germany for approximately 10 years. He currently serves as Director of Product Engineering at Parity Technologies, where he leads a team of 50 people developing various products. His previous experience includes roles at ProSieben and Audi.

Karim is the creator of "Data with Rust," a free educational website where he teaches data engineering using the Rust programming language. He also maintains a blog where he writes about technical projects and data-related topics.

What began as a personal struggle to find housing in Munich in 2014 led him to combine his data engineering skills with investigative journalism, conducting a groundbreaking experiment on housing discrimination that gained national attention in Germany. His work on algorithmic bias and discrimination has made him an advocate for privacy-preserving technologies and fair algorithmic systems. At Parity Technologies, he works on advanced technologies including zero-knowledge proofs to build more equitable digital systems.

Transcript summary

The Initial Housing Search Problem (2014-2015)

When Karim arrived in Germany in 2014 with only a backpack, he faced unexpected difficulties finding accommodation in Munich. Despite having a stable job and income, he spent years searching for an apartment. His daily routine involved filling out dozens of application forms on platforms like WG-Gesucht, ImmobilienScout24, and ImmoWelt, but he received almost no responses. One memorable incident involved an elderly landlady who took his ID and disappeared for an hour, leaving him anxious in her kitchen.

First Technical Solution: Automation

Initially theorizing that he was simply too slow to respond to listings in Munich's competitive housing market, Karim developed a bot to automate the application process. This system scraped websites, automatically filled out forms with his information, and sent applications in a continuous loop. Over several weeks, he submitted 200-300 applications but still received virtually no responses. The automation captured email responses and stored data, but it only succeeded in efficiently collecting rejection emails.
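
A minimal sketch of what such a scrape-and-apply loop could look like is shown below. The portal URL, CSS selector, and form field names are invented for illustration; they do not correspond to the actual platforms or to Karim's implementation.

```python
# Minimal sketch of a scrape-and-apply loop, assuming a hypothetical portal.
# URL, selectors and form fields are invented for illustration only.
import time
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://example-portal.de/munich/rentals"   # hypothetical
APPLICANT = {"name": "Karim Jedda", "email": "karim@example.com",
             "message": "Hello, I am very interested in the apartment ..."}

def fetch_new_listings(seen):
    """Scrape the search results page and return listing URLs not seen before."""
    html = requests.get(SEARCH_URL, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = {a["href"] for a in soup.select("a.listing-link")}  # hypothetical selector
    return links - seen

def apply_to(listing_url):
    """Submit the contact form of a single listing with the applicant's data."""
    requests.post(listing_url + "/contact", data=APPLICANT, timeout=30)

if __name__ == "__main__":
    seen = set()
    while True:                  # continuous loop, as described in the talk
        for url in fetch_new_listings(seen):
            apply_to(url)
            seen.add(url)
        time.sleep(300)          # poll every five minutes
```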

The Experimental Pivot

After multiple data points suggested a pattern, Karim decided to run a controlled experiment rather than continue assuming he was the problem. He modified his bot to submit applications using different personas with varying names and backgrounds while keeping all other variables constant. The goal was to determine whether his name was influencing response rates.

Experimental Methodology

The technical implementation involved using a fake-name-generator website to create different personas, each with its own email account, and spin syntax (a templating technique) to generate the application messages. Critically, the same grammatical mistakes were included in all messages regardless of which persona sent them, ensuring the only variables were the name and associated background. When a persona other than his own received a positive response, he did not follow through with a viewing; he simply documented the result.
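
For readers unfamiliar with spin syntax, the toy expander below shows the idea: every {a|b|c} group in a template is resolved to one of its alternatives, so a single template yields many surface variants of the same message. The template wording and the persona surname are invented; only the deliberate reuse of the same grammatical mistake mirrors the approach described above.

```python
# Toy illustration of "spin syntax": each {a|b|c} group is replaced by one of
# its alternatives; the same mistake ("I has") is kept in every variant on purpose.
import random
import re

TEMPLATE = ("{Hello|Good day|Dear landlord}, my name is {name} and I am "
            "{very interested|highly interested} in your apartment. "
            "I has a stable job and can provide all documents.")

def spin(template, **fields):
    """Resolve each {a|b|c} group randomly, then fill in the named fields."""
    spun = re.sub(
        r"\{([^{}]+)\}",
        lambda m: random.choice(m.group(1).split("|")) if "|" in m.group(1) else m.group(0),
        template,
    )
    return spun.format(**fields)

print(spin(TEMPLATE, name="Hannah Becker"))   # persona surname is illustrative
```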

The system automatically found new apartments, applied with different profiles, analyzed email responses, and matched responses back to specific applications. He stored the data in MongoDB. One technical challenge involved solving CAPTCHAs, for which he used TensorFlow to train a model that could recognize patterns in the older Java-based CAPTCHA libraries some platforms used.
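
A hedged sketch of how applications and incoming replies could be stored and matched in MongoDB is shown below; the collection names, fields, and matching key are assumptions, not the original schema.

```python
# Sketch: store outgoing applications and match e-mail replies back to them,
# keyed on the persona's reply-to address. Schema is assumed, not original.
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["housing_experiment"]

def record_application(listing_id, persona, reply_to):
    """Store one outgoing application; reply_to is the persona's inbox address."""
    db.applications.insert_one({
        "listing_id": listing_id,
        "persona": persona,            # e.g. "Hannah" or "Karim"
        "reply_to": reply_to,
        "sent_at": datetime.now(timezone.utc),
    })

def record_reply(to_address, subject, body):
    """Match an incoming e-mail back to the application it answers."""
    application = db.applications.find_one({"reply_to": to_address})
    db.replies.insert_one({
        "application_id": application["_id"] if application else None,
        "subject": subject,
        "body": body,
        "received_at": datetime.now(timezone.utc),
    })

def response_rate(persona):
    """Fraction of a persona's applications that received at least one reply."""
    sent = db.applications.count_documents({"persona": persona})
    ids = [a["_id"] for a in db.applications.find({"persona": persona})]
    answered = len(db.replies.distinct("application_id", {"application_id": {"$in": ids}}))
    return answered / sent if sent else 0.0
```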

Initial Results

The experiment revealed significant discrimination based on names. A persona with a Western-sounding name like Hannah had four times higher chances of receiving responses than Karim's own applications. Even a persona he called Hans (which, he later noticed, he had misspelled with a Z instead of an S) had better success rates. These findings suggested a kind of nominative determinism: the name on an application played a real role in housing opportunities.

Public Response and Scaling

Karim shared his findings online through forum posts, which gained attention on Hacker News and other platforms. The response was mixed, with some people questioning his methodology or accusing him of fabricating data. Despite proving discrimination through data, he still remained without an apartment and was living on a friend's couch.

Bayerischer Rundfunk and Spiegel Online approached him to scale the experiment nationwide across Germany. Though the compensation was modest, the collaboration provided legitimacy and resources. The expanded experiment used a similar technical setup with added components: Celery for task queuing and retry logic, and continued use of TensorFlow for CAPTCHA solving.
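
As an illustration of the queuing-and-retry part, the sketch below wires one Celery task per application attempt with automatic retries. The broker URL, task body, and helper function are assumptions, not the setup used in the investigation.

```python
# Hedged sketch of a Celery-based task queue: one task per application attempt,
# with automatic retries on transient failures. All names are illustrative.
import requests
from celery import Celery

app = Celery("housing_bot", broker="redis://localhost:6379/0")

def apply_to_listing(listing_url, persona):
    """Placeholder for the actual form submission (see the scraper sketch above)."""
    requests.post(listing_url + "/contact", data=persona, timeout=30)

@app.task(bind=True, max_retries=3, default_retry_delay=120)
def submit_application(self, listing_url, persona):
    """Apply to one listing as one persona; retry up to three times on errors."""
    try:
        apply_to_listing(listing_url, persona)
    except Exception as exc:
        raise self.retry(exc=exc)

# Enqueued from the scraper, e.g.:
# submit_application.delay("https://example-portal.de/listing/123",
#                          {"name": "Hannah", "email": "hannah@example.com"})
```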

Nationwide Findings

The larger-scale experiment confirmed that the discrimination pattern was not Munich-specific but rather a nationwide phenomenon. The same biases appeared consistently across German cities including Frankfurt, indicating systemic issues in housing access based on names and perceived backgrounds.

Personal Positive Outcome

While the experiment did not directly help Karim find an apartment, his original talk at Munich Datageeks led to a connection when Linus Blomquist introduced him to someone at Audi, resulting in a good job opportunity. This demonstrated that speaking out about discriminatory experiences could lead to unexpected positive outcomes.

Modern Context: How the Experiment Could Be Done Today

Ten years later, the technical landscape has changed dramatically. Today, the same experiment could be implemented in an afternoon using large language models to generate personas and data, browser automation agents to apply to listings, and AI systems to solve CAPTCHAs automatically. This ease of implementation means anyone can now run similar experiments or use bots for scalping (being first to access high-demand items like concert tickets or apartments).
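
A rough sketch of such an afternoon build might look as follows: an LLM generates persona data and a short cover message, and a headless browser fills in the listing's contact form. The portal URL and selectors are invented, the model is only an example, and this is not a description of Karim's setup or any real system.

```python
# Illustrative modern re-implementation: LLM-generated personas plus browser
# automation. Portal URL and CSS selectors are hypothetical.
import json
from openai import OpenAI
from playwright.sync_api import sync_playwright

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_persona():
    """Ask an LLM for a JSON persona with a name, email and application message."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Return JSON with fields 'name', 'email' and "
                              "'message' for a fictional rental applicant."}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def apply_with(persona, listing_url):
    """Drive a headless browser to submit the listing's contact form."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(listing_url)
        page.fill("#contact-name", persona["name"])      # selectors are hypothetical
        page.fill("#contact-email", persona["email"])
        page.fill("#contact-message", persona["message"])
        page.click("button[type=submit]")
        browser.close()

if __name__ == "__main__":
    apply_with(generate_persona(), "https://example-portal.de/listing/123")
```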

The Arms Race: Platforms vs. Bots

From the platform perspective, the proliferation of bots creates constant challenges. Platforms respond by collecting more data to distinguish legitimate users from automated systems. This data gets stored and cross-referenced to make access decisions, creating what Karim calls a "bias roulette". The result is an ongoing arms race in which more data collection inevitably means more potential for baked-in bias.

Algorithmic Discrimination and Math Washing

Modern systems can now automate discrimination before applications are even submitted. Based on names, social activity, and numerous other parameters, algorithms can predict whether someone will have access to opportunities. Karim introduced the concept of math washing (the laundering of biased decisions through math), where discriminatory decisions are justified as data-driven and algorithmic rather than human choices. Because modern systems use multivariate models with numerous inputs, it becomes nearly impossible to identify which single variable caused someone to be denied access, as the toy example below illustrates.
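
The weights, features, and threshold below are invented, but they show how several weakly weighted proxy variables can jointly reject an applicant without any single variable being decisive.

```python
# Toy illustration of the attribution problem in multivariate scoring: no
# individual weight is decisive, yet correlated proxy features can jointly
# encode bias. All features, weights and the threshold are invented.
WEIGHTS = {
    "income_to_rent_ratio": 0.35,
    "years_at_current_job": 0.15,
    "email_domain_score":   0.15,   # proxy: "reputable" mail providers
    "device_score":         0.10,   # proxy: iPhone vs. older Android
    "district_score":       0.15,   # proxy: previous address
    "name_match_score":     0.10,   # proxy: similarity to "expected" names
}
THRESHOLD = 0.6

def decide(applicant: dict) -> bool:
    """Accept the application if the weighted score clears the threshold."""
    score = sum(WEIGHTS[f] * applicant[f] for f in WEIGHTS)
    return score >= THRESHOLD

applicant = {"income_to_rent_ratio": 0.9, "years_at_current_job": 0.8,
             "email_domain_score": 0.3, "device_score": 0.4,
             "district_score": 0.2, "name_match_score": 0.1}
print(decide(applicant))  # False: rejected, yet no single feature "caused" it
```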

The Segmentation Problem

Websites segment users based on numerous behaviors: reading habits, travel patterns, device usage (iPhone vs Android), and countless other factors. With sufficient segmentation, individuals can be uniquely identified, similar to how the Akinator game can identify anything with just 20 yes-or-no questions. This creates invisible scores that function like names but cannot be seen or contested by the affected individuals.
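
A quick back-of-the-envelope calculation illustrates why: each binary attribute roughly halves the pool of candidates, so a small number of innocuous signals is enough to single out one person among billions. The attribute names below are examples.

```python
# Each binary attribute halves the candidate pool: 20 questions distinguish
# about a million profiles, ~33 are enough for everyone on the planet.
import math

attributes = ["uses_iphone", "reads_at_night", "travels_monthly",
              "pays_by_card", "blocks_ads"]          # 5 binary signals
print(2 ** len(attributes))                          # 32 distinct profiles
print(2 ** 20)                                       # 1,048,576 profiles
print(math.ceil(math.log2(8_000_000_000)))           # 33 questions for 8 billion people
```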

Social Cooling Effect

The awareness that algorithms are making decisions about life opportunities leads to self-censorship and conformity. People rationally optimize their behavior to pass algorithmic filters, avoiding certain topics or reactions to stay out of problematic classification spaces. This phenomenon, called social cooling, results in invisible control where platforms and algorithms shape opportunities without transparency.

Current Examples of Algorithmic Bias

The speaker referenced several contemporary examples: hiring platforms using video analysis that can eliminate candidates based on factors like how they smile during an interview, and platforms implementing filters on characteristics that arguably should not serve as filters. These systems learn to discriminate based on biased training data, perpetuating and potentially amplifying existing inequalities.

Data Permanence and Future Implications

Data storage is now so cheap that virtually nothing gets deleted. Even if data is not accessible today, it could become accessible tomorrow through policy changes like chat control legislation. This raises questions about whether past actions could affect not only individuals but potentially their relatives in the future, as connecting people through data relationships is technically trivial.

Generative AI Amplification

Generative AI makes these problems significantly worse by enabling the creation of synthetic datasets that can poison training data. Someone could generate false information about a person they dislike and contaminate datasets without needing to prove accuracy. This represents a new vector for discrimination that is difficult to detect or combat.

Proposed Rights and Legal Protections

Karim outlined four essential rights for democratic societies:

  1. The right to know which algorithms govern your life
  2. The right to understand how decisions affecting you were made (transparency and auditability)
  3. The right to contest your score or classification
  4. The right to human involvement in important decisions rather than purely automated systems

He noted that the EU AI Act of 2024 includes a right to explanation, which is a positive first step, though not sufficient on its own.

Technical Solutions: Privacy-Preserving Technology

The speaker advocated for privacy-preserving technologies that can verify necessary information without collecting excessive data. For housing, essential questions might include citizenship status and sufficient income to pay rent. Zero-knowledge proofs and similar advanced mathematical techniques can prove required facts without revealing unnecessary personal information. Karim mentioned that his company, Parity Technologies, works on building such systems.
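
To make the "prove without revealing" idea concrete, here is a toy, non-interactive Schnorr proof of knowledge: the prover convinces a verifier that they know a secret behind a public value without disclosing the secret itself. This is only an illustration of the principle, not anything Parity Technologies builds; real statements such as "my income exceeds the rent" need range proofs and production-grade parameters, and the tiny prime used here is not secure.

```python
# Toy non-interactive Schnorr proof of knowledge (Fiat-Shamir transform).
# Proves knowledge of x with y = g^x mod p without revealing x.
# Parameters are deliberately tiny and NOT secure.
import hashlib
import secrets

p = 2039                 # safe prime p = 2q + 1 (toy size)
q = 1019                 # prime order of the subgroup
g = 4                    # generator of the order-q subgroup

def prove(x):
    """Prover: commit, derive the challenge via Fiat-Shamir, answer."""
    y = pow(g, x, p)                                   # public value
    k = secrets.randbelow(q - 1) + 1                   # random nonce
    r = pow(g, k, p)                                   # commitment
    c = int.from_bytes(hashlib.sha256(f"{g}{y}{r}".encode()).digest(), "big") % q
    s = (k + c * x) % q                                # response
    return y, r, s

def verify(y, r, s):
    """Verifier: recompute the challenge and check g^s == r * y^c (mod p)."""
    c = int.from_bytes(hashlib.sha256(f"{g}{y}{r}".encode()).digest(), "big") % q
    return pow(g, s, p) == (r * pow(y, c, p)) % p

secret = 777                                           # never sent to the verifier
print(verify(*prove(secret)))                          # -> True
```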

Systemic Responsibility

The speaker argued against architecting systems that entrench systemic bias merely to sell more advertisements or optimize short-term metrics. He posed fundamental questions: Should unfair systems be allowed to operate? Should society accept these biases as inevitable, or should there be active resistance and alternative development?

Broader Applications Beyond Housing

While the original experiment focused on housing discrimination, the same methodology and concerns apply to employment, finance, healthcare, information access, and any opportunity accessed through digital means. The speaker questioned whether similar experiments have been conducted in these other domains, suggesting that many cases of discrimination are lost in the noise because there is not always someone like him documenting the problem with bots and data.

Generational Concerns

The talk raised the prospect that what started as name-based discrimination in housing could evolve into a future where everyone faces algorithmic barriers to opportunities based on digital histories and behaviors. The question of whether today's online decisions could affect relatives in the future highlights how discrimination can compound across generations.

Call to Action

Karim provided specific action items:

  • Demand explanations for algorithms making decisions about your life
  • Maintain skepticism around data despite working in data-driven fields
  • Resist optimizing yourself into conformity
  • Build alternative technologies that shape a fairer future
  • Consider whether you need your own bot to access opportunities fairly

He emphasized that hope is not a strategy, which is why he continues to speak about these issues. He expressed a desire that in another 12 years, at the 200th edition of Munich Datageeks, this talk would be obsolete because society would have built systems preventing bias from creeping in, and children would not need bots to get fair chances in life.

Closing Vision

The speaker painted a dystopian image of a future with people using bots against other people's bots, all watched by bots, merely trying to survive and access basic opportunities. He contrasted this with an aspirational vision of fair, privacy-preserving systems that enable greater equality in society, urging the audience to actively work toward the latter outcome.