Talk "The Design Document - A Dull Beginning for a Drama Free Ending"
ML project failures—from misaligned goals to ignored baselines—stem from one root cause: lack of design documentation. Six real-world cases demonstrate how two weeks of upfront doc writing prevents months of wasted effort and embarrassing failures.
This talk was presented at Munich Datageeks - January Edition 2026
Abstract
This talk examines the critical role of design documentation in machine learning projects through the lens of failure. Drawing from real-world cases across banking, car rental, and startup environments, the speaker demonstrates how six different project failures—ranging from misaligned goals to inadequate monitoring—can be traced back to a single root cause: lack of proper design documentation. Despite widespread agreement on its importance, documentation is often neglected due to delivery pressure, perceived tedium, and lack of incentive structures. The presentation advocates for a documentation-first approach, recommending a two-week investment in creating comprehensive design docs before coding begins. This structured approach addresses common pitfalls including goal misalignment, measurement mechanism failures, communication breakdowns, wrong metric optimization, skipping baselines, and inadequate processes. The speaker provides practical guidance on implementing design docs, emphasizing cross-team collaboration, problem-space thinking, and embracing iterative refinement over perfectionism.
About the Speaker
Mark is a long-time attendee of Munich Datageeks meetups, having participated for approximately five years before giving his first presentation. His professional background involves working with machine learning projects and data science teams, where he has directly experienced and observed numerous project failures. His expertise lies in understanding the practical challenges of ML project management and the common pitfalls that teams encounter when deploying models to production. Mark's approach emphasizes learning from failures rather than only celebrating successes, positioning failure stories as valuable teaching moments for professional growth in the data science community.
Transcript Summary
The Documentation Paradox
The fundamental challenge in data science projects stems from a universal contradiction: while most practitioners agree that documentation is useful and important, nearly everyone dislikes writing it. This resistance creates a critical vulnerability at the start of projects that often leads to failure down the line.
Three primary reasons drive this documentation avoidance:
Pressure to deliver represents the most significant barrier. Stakeholders typically want results immediately, and in machine learning environments, writing documentation is rarely perceived as progress toward project goals.
Perceived monotony creates another obstacle. Data science was famously labeled the sexiest job of the 21st century, an image that conflicts sharply with the reality of documentation work.
Lack of incentives completes the trifecta of discouragement. The speaker notes never having encountered anyone who received a promotion or raise specifically for producing excellent design documentation, raising questions about why individuals should invest effort without recognition.
Case Study 1: The Banking Conversion Rate Confusion
A bank developed a system to optimize debt collection reminders by predicting conversion rates—the probability that debtors would repay after receiving reminders. The system performed well technically and impressed stakeholders to the point where the vice president requested a personal presentation.
During this high-stakes presentation, a critical misalignment emerged. The VP interrupted to ask not about who would pay back, but why people were paying back. This seemingly minor distinction revealed a fundamental problem: the question of who will pay versus why they pay represents entirely different analytical challenges requiring completely different approaches.
This goal misalignment, established at the project's inception, forced the team to completely restart their work. The failure occurred because the team had never properly documented and validated their project objectives with stakeholders before investing significant development time.
Case Study 2: The Measurement Mechanism Failure
The same banking project encountered an earlier failure that proved equally instructive. The model showed exceptional offline performance, achieving a conversion rate of 0.9 on historical data—far surpassing the baseline of 0.5. This impressive result suggested the system would deliver substantial value in production.
However, when deployed, the system produced a conversion rate of only 0.35—worse than the 0.5 baseline and far short of the 0.9 achieved offline. This catastrophic performance gap stemmed from how the conversion rate was measured.
The data science team remained unaware of a critical operational detail: a three-day delay existed between debt repayment and that information appearing in the company database. This lag created scenarios where debtors who had already paid received reminders, ignored them as irrelevant, and were incorrectly classified as false negatives in the conversion rate calculation.
This failure highlighted the essential need for data teams to communicate with infrastructure teams about how metrics are measured and how data flows through systems.
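A minimal sketch of how such a reporting lag can distort a measured conversion rate; the column names, dates, and the simple post-reminder attribution rule are illustrative assumptions, not details from the original project:

```python
# Illustrative only: a 3-day lag between repayment and its appearance in the
# database means already-paid debtors still receive reminders and are then
# counted as failures in the live conversion rate.
import pandas as pd

events = pd.DataFrame({
    "debtor_id":        [1, 2, 3, 4, 5],
    "reminder_sent_at": pd.to_datetime(["2025-03-10"] * 5),
    # Actual repayment dates; NaT means the debt was never repaid.
    "paid_at": pd.to_datetime(["2025-03-09", "2025-03-08", "2025-03-12", None, "2025-03-13"]),
})

# Conversion as naively measured: a payment dated after the reminder,
# over all debtors who received one.
converted = events["paid_at"] > events["reminder_sent_at"]
naive_rate = converted.mean()

# Debtors 1 and 2 had already paid, but their payments were not yet visible
# when the reminders went out. Excluding them from the denominator gives the
# rate the reminder could actually influence.
already_paid = events["paid_at"] < events["reminder_sent_at"]
corrected_rate = converted[~already_paid].mean()

print(f"naive conversion rate:     {naive_rate:.2f}")      # 0.40
print(f"corrected conversion rate: {corrected_rate:.2f}")  # 0.67
```

Nothing about the model changes between the two numbers; only the measurement does, which is exactly the kind of detail a design doc's measurement section forces onto the table.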
Case Study 3: The Ignored Recommendations
A car rental company sought to optimize its landing page to maximize conversions. The specific focus involved the top row displaying three vehicles—the prime positions where customer attention peaks and every placement decision carries significant weight.
The team developed and deployed a machine learning system to make optimal recommendations for these crucial positions. Initially, deployment appeared successful. However, within weeks, a troubling pattern emerged: the system's recommendations were frequently ignored and not displayed to customers.
Investigation revealed the cause: the business department had independently decided that certain vehicles were prohibited from appearing in the top row for business reasons. This restriction had never been communicated to the ML team during development.
The failure demonstrated how communication breakdowns between departments can render technically sound solutions practically worthless. Despite countless books and resources emphasizing the importance of cross-functional communication, such disconnects continue occurring in organizations.
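One way such a restriction could have been made explicit is to encode it as a hard constraint applied to the model's output before display. The blocklist, vehicle IDs, and function below are hypothetical, a sketch rather than the project's actual pipeline:

```python
# Hypothetical sketch: business rules applied as an explicit post-filter on the
# ranked recommendations, instead of living only in another department's head.
BUSINESS_BLOCKLIST = {"van_xl", "luxury_coupe"}   # vehicles banned from the top row

def top_row(ranked_vehicle_ids, blocklist=BUSINESS_BLOCKLIST, slots=3):
    """Return the first `slots` recommended vehicles that business rules allow."""
    allowed = [v for v in ranked_vehicle_ids if v not in blocklist]
    return allowed[:slots]

# Model's ranking, best first (illustrative IDs).
ranking = ["luxury_coupe", "compact_a", "van_xl", "suv_b", "compact_c"]
print(top_row(ranking))   # ['compact_a', 'suv_b', 'compact_c']
```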
Case Study 4: The Wrong Metrics Optimization
The same car rental company undertook another project: optimizing staff planning for customer-facing branches where renters collect vehicles. The challenge involved balancing agent numbers—too many agents represent wasted labor costs, while too few create negative customer experiences through excessive wait times.
The solution required forecasting expected demand at specific branches and times. The team developed a forecasting system that performed well on paper but poorly in production, similar to the banking case but for different underlying reasons.
This failure stemmed from optimizing standard regression metrics like root mean square error and mean absolute error. While these metrics are mathematically sound and computationally convenient, they often fail to reflect actual business needs. Improving these metrics does not necessarily translate to practical business improvements.
Teams gravitate toward standard metrics because they are fast and easy to implement. However, this convenience comes at the cost of misalignment with real-world requirements and business value.
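A sketch of what a business-aligned metric might look like for this staffing problem, assuming purely for illustration that understaffing costs three times as much as overstaffing; two forecasts with identical RMSE can then carry very different business cost:

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def staffing_cost(y_true, y_pred, under_cost=3.0, over_cost=1.0):
    """Average cost per branch-hour; missing agents hurt more than idle ones."""
    diff = y_pred - y_true
    under = np.clip(-diff, 0, None)   # demand left uncovered
    over = np.clip(diff, 0, None)     # agents standing idle
    return float(np.mean(under_cost * under + over_cost * over))

actual     = np.array([12, 8, 15, 6])   # customers per branch-hour (illustrative)
forecast_a = np.array([10, 6, 13, 4])   # consistently 2 too low
forecast_b = np.array([14, 10, 17, 8])  # consistently 2 too high

print(rmse(actual, forecast_a), rmse(actual, forecast_b))                     # 2.0 2.0
print(staffing_cost(actual, forecast_a), staffing_cost(actual, forecast_b))  # 6.0 2.0
```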
Case Study 5: The Baseline That Nobody Ran
A company attempted to improve an existing travel time estimation model through a sophisticated approach: stacking a second model on top of the original to produce corrections to the first model's predictions. This approach seemed to demand complex algorithms, numerous features, and advanced techniques.
The team invested months developing and deploying this complicated solution. Only after deployment did someone suggest testing a simple baseline: using the median of the target variable from the training set as the prediction.
This trivial baseline produced results statistically equivalent to the sophisticated stacked model. Months of work had been invested in recreating performance that could be achieved with basic statistical calculations.
The lesson proves fundamental: always establish baselines before pursuing complex solutions. Skipping this step represents wasted effort and missed opportunities to identify simpler, equally effective approaches.
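The check itself is only a few lines. A minimal sketch, with synthetic travel times standing in for the real data and a commented-out placeholder where the stacked model's predictions would go:

```python
import numpy as np

rng = np.random.default_rng(0)
y_train = rng.gamma(shape=2.0, scale=10.0, size=5_000)  # synthetic stand-in for travel times
y_test  = rng.gamma(shape=2.0, scale=10.0, size=1_000)

# Baseline: predict the training-set median for every trip.
baseline_pred = np.full_like(y_test, np.median(y_train))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

print("median baseline MAE:", round(mae(y_test, baseline_pred), 2))
# print("stacked model MAE:  ", round(mae(y_test, stacked_model.predict(X_test)), 2))
# If the two numbers are statistically indistinguishable, the stacked model has
# not earned its complexity.
```

Running this before any modelling costs minutes and sets the bar every later model has to clear.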
Case Study 6: The Monitoring and Process Failure
A small startup with a limited data team operated several models in production, including one critical revenue-generating model. Improving this model became a top priority, consuming most of the data department's time and resources as they conducted extensive A/B testing to beat the champion model.
Eventually, the team achieved what appeared to be a breakthrough. Company-wide announcements were made, meetings scheduled, and expectations built for bonuses and recognition from leadership.
Days before the planned presentation to the CTO, someone double-checked the predictions. This verification revealed a devastating truth: the impressive results were illusory. Due to inadequate monitoring and reporting procedures, earlier iterations of the challenger model had overwritten the champion model's results.
The team had not beaten the current champion model—they had only outperformed a flawed version of their own challenger model. The breakthrough was entirely artificial, created by poor process management rather than genuine improvement.
This failure demonstrated that success in ML requires more than technical prowess and mathematical sophistication. Proper processes, monitoring systems, and verification procedures are equally essential.
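A hedged sketch of the kind of guardrail that would have caught the overwrite: every logged prediction carries the version of the model that produced it, and the comparison refuses to run when the recorded versions do not match the experiment plan. The table layout and version strings are illustrative, not taken from the startup's actual system:

```python
import pandas as pd

predictions = pd.DataFrame({
    "arm":           ["champion", "champion", "challenger", "challenger"],
    "model_version": ["challenger-v7", "challenger-v7", "challenger-v9", "challenger-v9"],
    "converted":     [0, 1, 1, 1],
})

expected = {"champion": "champion-v3", "challenger": "challenger-v9"}
mismatched = predictions[predictions["arm"].map(expected) != predictions["model_version"]]

if not mismatched.empty:
    raise RuntimeError(
        f"{len(mismatched)} predictions were logged under the wrong model version; "
        "the A/B comparison cannot be trusted."
    )
```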
The Common Thread: Design Documentation
Despite their diversity, all six failures share a common prevention mechanism: proper design documentation created at project inception. Each failure corresponds to a specific section in a standard design doc template that would have forced teams to address these issues before they became problems.
The failures were not sophisticated, impossible-to-anticipate challenges. They were simple mistakes that occurred because teams juggling technical requirements, business needs, and multiple competing concerns simply forgot to consider certain aspects. A structured documentation process would have prevented these oversights.
Design docs bring rigid structure to inherently difficult and chaotic processes. They force teams to think through critical aspects of projects and record those thoughts, dramatically reducing the probability of walking into preventable failures.
Practical Implementation Guidance
Documentation-First Methodology
The primary principle involves creating documentation before writing code. Teams should develop comprehensive blueprints before building, ensuring functional, purpose-driven solutions rather than creating work that must be discarded due to misalignment or lack of planning.
Time Investment Framework
A common concern involves how much time to dedicate to documentation when facing pressure from management. Two weeks represents a reasonable baseline investment for most projects.
While this may seem lengthy, it should be viewed as a time investment rather than bureaucratic overhead. The choice becomes two weeks of upfront planning versus potentially months or years of redoing work due to foundational failures. This represents an exceptionally favorable trade-off.
Inclusive Collaboration
Everyone involved in or affected by the project should contribute to or at minimum proofread the design document. This includes all stakeholders across all departments and functions.
Context becomes key at this stage. Teams must understand why they are building what they are building and how they can achieve their goals. These two factors—the why and the how—fundamentally shape what ultimately gets built.
Problem Space Before Solution Space
The documentation process should begin in the problem space and only then transition to the solution space. This concept, popular in software engineering, emphasizes understanding the problem thoroughly before jumping to solutions.
This approach prevents premature commitment to specific technical approaches before fully comprehending the business problem, user needs, and contextual constraints.
Embracing Change Over Perfection
Design documents will change significantly throughout a project's lifecycle. Teams should not pursue perfection or attempt to create definitive, unchanging documents.
The typical design doc lifecycle involves substantial rewriting as initial plans encounter reality. Whatever gets written initially will likely be mostly rewritten when confronted with practical implementation challenges and evolving understanding.
Exclusive Insights from Industry Experts
The speaker reached out to the authors of a key reference book on design documentation for advice specifically tailored to those beginning to write design docs.
Valerii's Perspective on Early Termination
Valerii emphasized that a true mark of good design documentation is killing doomed projects early, thereby wasting paper instead of time and money. This insight carries profound implications.
Many machine learning projects currently in production probably should not be there. They represent overly complicated solutions to problems that could be solved without machine learning at all.
Starting with documentation increases the likelihood of recognizing either that a project should not be undertaken because it is too complicated relative to its value, or that a project cannot be undertaken because it requires technology or resources the organization lacks.
Discovering these issues during the first couple of weeks of project consideration proves far more valuable than making such discoveries after months of implementation effort.
Arseny's Three-Part Guidance
Arseny provided extensive advice synthesized into three key recommendations.
First, use design docs as communication tools rather than bureaucratic checkboxes. Documentation should facilitate understanding and alignment, not merely satisfy procedural requirements.
Second, maintain problem space versus solution space thinking as a productive mindset when solving problems. This mental framework helps teams stay focused on understanding needs before jumping to technical implementations.
Third, prioritize error analysis as a specific documentation section. This involves examining the errors models produce, identifying patterns, and understanding why models behave as they do. This analytical approach proves more efficient than simply tweaking hyperparameters, yet the speaker notes rarely seeing teams actually implement systematic error analysis in practice.
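As an illustration of what such a section might contain, the sketch below slices errors by a hypothetical categorical segment and looks for groups the model consistently gets wrong; the column names and values are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "segment": ["airport", "airport", "city", "city", "station", "station"],
    "y_true":  [40, 55, 20, 25, 30, 35],
    "y_pred":  [30, 42, 21, 24, 29, 37],
})
df["abs_error"] = (df["y_true"] - df["y_pred"]).abs()
df["bias"]      = df["y_pred"] - df["y_true"]

# Segments sorted by how badly the model does on them.
report = (df.groupby("segment")[["abs_error", "bias"]]
            .mean()
            .sort_values("abs_error", ascending=False))
print(report)
# A large, consistent negative bias in one segment (here "airport") usually
# points to a missing feature or a data issue -- a cheaper fix than another
# round of hyperparameter tuning.
```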