Talk "Using uncertainty estimation to build reliable ML systems for smart manufacturing"
Presented by Lukas Lodes at Munich Datageeks - March Edition 2025
Abstract
Modern production lines generate large amounts of data that can be used to increase the efficiency of both equipment and materials. This includes, of course, predictive maintenance of machines based on recorded data, but also, more recently, predicting product quality as early as possible in the manufacturing process. However, a major concern - especially in SMEs - is the reliability of the underlying ML systems. This talk first explores the terms "trustworthy" and "reliable" AI, and then focuses on realizing reliable AI through uncertainty estimation. The talk concludes with an example of how uncertainty estimation techniques in ML and DL can be used to build a cascade of classifiers that delegates predictions to models of different sizes and compute requirements according to how hard individual instances are to predict.
About the speaker
Lukas Lodes studied computer science at the University of Augsburg and is currently a Ph.D. student at the Technical University of Ingolstadt. His research interests are in the area of Industry 4.0 and Smart Production, with a focus on achieving reliable ML systems through uncertainty estimation.
Transcript Summary
Introduction and Background
Lukas has been a Ph.D. student at the Technical University of Ingolstadt since April 2021, working under the supervision of Alexander Schindler. He holds a master's degree in computer science from the University of Augsburg. His research focuses on reliable AI systems for smart manufacturing, with a dissertation topic centered on reliable AI systems in the context of Industry 4.0.
Research Context: Smart Manufacturing
The research considers Industry 4.0 enabled factories where multiple machines perform production steps and generate data throughout the process. These factories require comprehensive sensor infrastructure to capture data points such as temperature curves and pressures. This data must be collected and stored in data warehouses or management systems. The primary use cases include quality predictions during production processes, defect detection, predictive end-of-line testing, and predictive maintenance. While these machine learning applications are theoretically straightforward and do not require large language models, practical implementation faces significant challenges.
Challenges in Manufacturing AI Adoption
Digitalization and Data Collection Issues
Many manufacturing companies, particularly small and medium-sized enterprises, are still undergoing digitalization. These organizations struggle to gather labeled data in sufficient quantity and quality. Setting up proper data pipelines remains a substantial obstacle.
Data Quality and Cultural Barriers
Even when companies establish data warehouses or cloud systems, data and label quality often suffer significantly. Workers on production lines lack data science education and awareness about the importance of high-quality data. Management frequently fails to establish a culture that values data quality and recognizes its potential for generating business value.
Reliability Requirements
The resulting models often lack sufficient reliability for real-world deployment. In industries like automotive manufacturing, where components such as brake parts have strict safety requirements, reliability becomes critical. Parts must be completely defect-free, making model trustworthiness essential.
Reliable AI versus Trustworthy AI
While these terms are often used interchangeably, reliable AI represents a subset of trustworthy AI. Reliable AI encompasses well-calibrated models that avoid overconfidence, dependable systems, robustness against slight data changes, continuous learning capabilities, and uncertainty communication when predictions are uncertain.
Trustworthy AI includes additional human-centered aspects such as fairness, privacy preservation, bias mitigation, and explainability. Although explainability is frequently requested in manufacturing contexts, the focus of this research remains on reliability because it directly integrates into production pipelines without requiring human feedback for every uncertain prediction.
Approaches to Achieving Reliable Machine Learning
Several key strategies enable reliable machine learning systems:
- Understanding data and label quality to identify issues such as mislabeled samples in both defect and non-defect categories
- Ensuring thorough model testing under all real-world deployment conditions
- Developing comprehensive understanding of model performance across different conditions, particularly important in manufacturing where processes depend heavily on environmental factors like seasonal temperature variations affecting processes such as gluing
- Ensuring proper model calibration to prevent overconfidence
- Implementing uncertainty estimation to quantify and communicate prediction confidence
Methods for Uncertainty Estimation
Bayesian Methods
Bayesian approaches build inherently probabilistic models, such as Gaussian processes or Bayesian networks, using probability distributions instead of fixed weights to produce output distributions rather than single point estimates. However, these methods remain in relatively early development stages with limited well-functioning frameworks and computational feasibility challenges.
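As a minimal sketch of the idea (using scikit-learn's Gaussian process classifier on toy data, not the speaker's actual setup), note how the output is a probability distribution over classes, with values near 0.5 signaling an uncertain instance:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Toy 2-D data: class determined by which side of the line x1 + x2 = 0
# an instance falls on.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(1.0)).fit(X, y)

# First point sits exactly on the class boundary (uncertain, near 0.5);
# the second sits deep inside class 1 (confident).
proba = gpc.predict_proba([[0.0, 0.0], [1.5, 1.5]])
```

Each row of `proba` is a full class-probability distribution rather than a single point estimate, which is what distinguishes this family of methods.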
Conformal Prediction
This approach constructs prediction regions around point predictions. In multiclass classification, it outputs sets of classes guaranteed to contain the correct class with a specified probability (e.g., 95%). Uncertainty is measured by region size, with larger regions indicating higher uncertainty. However, this method has limited applicability to binary defect detection compared to problems with many classes, such as medical diagnosis with hundreds of diseases or ImageNet with 1,000 classes.
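A minimal sketch of split conformal prediction for classification (the function and data here are illustrative, not from the talk): calibrate a score threshold on held-out data, then include in each prediction set every class whose score clears it.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    # Nonconformity score: 1 - probability the model assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    # Prediction set: every class whose score falls below the threshold.
    return [set(np.where(1.0 - p <= q)[0]) for p in test_probs]

# Toy calibration data: 20 examples, 3 classes; the model's probability for
# the true class (class 0 here) varies from 0.3 to 0.9.
p_true = np.linspace(0.3, 0.9, 20)
cal_probs = np.stack([p_true, (1 - p_true) / 2, (1 - p_true) / 2], axis=1)
cal_labels = np.zeros(20, dtype=int)

test_probs = np.array([[0.90, 0.07, 0.03],   # confident instance
                       [0.40, 0.35, 0.25]])  # ambiguous instance
sets = conformal_sets(cal_probs, cal_labels, test_probs)
```

The ambiguous instance receives a larger prediction set, which is exactly the region-size notion of uncertainty described above.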
Ensemble-Based Methods
These methods either build or simulate model ensembles, such as random forests or Monte Carlo Dropout, calculating variance across ensemble members to estimate uncertainty. When individual ensemble members disagree significantly in their predictions, this signals higher uncertainty about the instance.
Monte Carlo Dropout
Monte Carlo Dropout is one of the most widely used uncertainty estimation techniques for deep neural networks. Proposed by Yarin Gal and Zoubin Ghahramani in 2016, it provides robust uncertainty estimation and works reliably in practice. Since most neural networks already train with dropout, implementation is straightforward.
The technique keeps dropout layers activated during evaluation rather than turning them off as usual. Since dropout randomly drops connections between layers, running the same instance through the model multiple times with active dropout creates slightly different model architectures. This simulates an ensemble of different architectures without explicitly training multiple models, avoiding the high computational cost of training multiple neural networks while still achieving ensemble benefits.
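In PyTorch this amounts to leaving the dropout modules in training mode at inference time; a minimal sketch with a toy untrained model (not the talk's architecture):

```python
import torch
import torch.nn as nn

# Illustrative classifier with a dropout layer.
model = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 2)
)

def mc_dropout_predict(model, x, n_samples=30):
    # Switching to train mode keeps dropout active at inference time, so
    # each forward pass samples a different "thinned" network.
    model.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    # Mean = simulated-ensemble prediction; std = per-class uncertainty.
    return probs.mean(dim=0), probs.std(dim=0)

x = torch.randn(4, 8)
mean, std = mc_dropout_predict(model, x)
```

High standard deviation across the sampled passes indicates the simulated ensemble disagrees about that instance.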
Case Study: Model Cascade for Reduced Cloud Computing
The research presents a practical application of uncertainty estimation to reduce reliance on large models through a cascade architecture.
Computing Infrastructure Levels
Smart factories typically have three abstract levels of computing devices with varying capabilities:
- Edge devices: Located close to machines with limited computational power
- Intermediate layer: Workstations with solid computational power, though potentially insufficient for all tasks at optimal quality
- Cloud environment: Either local data centers or hyperscaler contracts providing nearly unlimited computational power for running the best possible models, but raising privacy concerns and facing increasing costs
Research Questions
The study addresses two main questions: How can compute tasks be assigned to minimize large model usage while maintaining predictive quality? And how can the AI system remain reliable throughout this process?
Cascade Architecture Implementation
Basic Concept
The cascade separates instances into easy or hard to classify categories relative to each computing tier. Easier instances can be reliably classified by smaller models, eliminating the need for more powerful computing resources.
Workflow
The process begins at the edge layer with the smallest device. When an instance is easy to classify and the edge device's computational power suffices, the prediction is accepted immediately without delegating to upper tiers. When the edge model is uncertain, the prediction moves to the next tier. This pattern repeats: certain predictions are accepted, uncertain ones escalate. Even the most powerful model may be uncertain about some instances, requiring manual inspection by skilled workers or special quality control measures.
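The escalation logic above can be sketched as a short loop over tiers (all names here are illustrative, not the speaker's actual interface):

```python
def cascade_predict(instance, tiers, fallback):
    """tiers: (model, is_certain) pairs ordered edge -> intermediate -> cloud."""
    for model, is_certain in tiers:
        prediction = model(instance)
        if is_certain(prediction):
            return prediction          # accept at this tier, stop escalating
    return fallback(instance)          # even the largest model was uncertain

# Toy example: the edge model is unsure (confidence 0.60), so the instance
# escalates and is accepted by the cloud model (confidence 0.95).
tiers = [
    (lambda x: ("defect", 0.60), lambda p: p[1] >= 0.9),  # edge
    (lambda x: ("defect", 0.95), lambda p: p[1] >= 0.9),  # cloud
]
result = cascade_predict("image-123", tiers,
                         lambda x: ("manual inspection", None))
```

The fallback branch corresponds to routing residual hard instances to skilled workers or special quality control.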
Selective Classification
This approach uses selection functions specific to each model type (support vector machines, neural networks, etc.). These functions observe models during prediction and determine certainty or uncertainty. Different model types require different selection functions due to their distinct behavioral characteristics.
Technical Infrastructure
Apache Kafka Integration
Apache Kafka serves as the interconnect between models deployed on different compute devices in physically separate locations. Kafka was chosen for its speed, open-source nature, and widespread use in manufacturing for transmitting sensor data to data warehouses.
State transitions between cascade nodes use Kafka producers for sending states and consumers for receiving states, connected via topics. Producers write data to topics when ready, while consumers (such as next-tier models) continuously listen for new data.
Implementation Details
Images are serialized using pickle. Individual applications (edge, intermediate, cloud devices, evaluation, and data production) are asynchronous applications implemented using the FastStream library, an open-source tool with approximately 3,000 GitHub stars that simplifies building asynchronous applications.
Each inference application follows this process: reads from the topic, deserializes the image, applies necessary transformations based on the model, performs inference, executes uncertainty estimation, and then either writes to the next topic for the subsequent model or sends the final decision based on the uncertainty estimation result.
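That per-node loop can be sketched with in-memory queues standing in for Kafka topics (a simplified stand-in, not the actual asynchronous FastStream application):

```python
import pickle
import queue

topic_edge = queue.Queue()    # stands in for the edge node's Kafka topic
topic_cloud = queue.Queue()   # next-tier topic
decisions = []                # final accepted predictions

def edge_node(predict, is_certain):
    # Read from the topic, deserialize, infer, estimate uncertainty,
    # then either emit the final decision or escalate to the next topic.
    payload = topic_edge.get()
    instance = pickle.loads(payload)        # images are pickled in this setup
    label, confidence = predict(instance)
    if is_certain(confidence):
        decisions.append((instance, label))  # accept at this tier
    else:
        topic_cloud.put(payload)             # delegate to the next tier

# Toy model: "certain" only for even-numbered instances.
topic_edge.put(pickle.dumps(4))
topic_edge.put(pickle.dumps(7))
for _ in range(2):
    edge_node(lambda x: ("ok", 0.99 if x % 2 == 0 else 0.5),
              lambda c: c >= 0.9)
```

In the real system the queues are Kafka topics on separate machines and the model-specific transformations run before inference, but the control flow per node is the same.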
Experimental Evaluation
Dataset and Use Case
The evaluation used a prototypical industrial defect detection case with solar cell images. The task involved predicting whether images contain defects, such as visible cracks in solar cells. This dataset was selected for its representativeness of industrial quality assurance applications.
Model Selection for Different Tiers
Edge Device: Support vector machine with the DAISY descriptor from the scikit-image library. Despite being classical machine learning, SVMs work well for resource-constrained devices, and appropriate image descriptors are still effective. The accuracy loss was smaller than expected.
Medium Model: LeNet-style convolutional neural network with five convolutional layers and two fully connected layers, keeping the architecture relatively small and basic given the problem's moderate complexity. Implementation used PyTorch with PyTorch Lightning for training.
Large Model: Pre-trained MobileNetV3 (large variant) from the torchvision model suite, simulating the most powerful model for this problem.
Selection Function Details
Support Vector Machine: Uses a mechanism similar to embedding database distance calculations. The SVM fits a hyperplane between two classes, maximizing the distance from the plane to each class. Instances far from the decision plane are more certain than those close to it. Thresholds were determined experimentally.
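A sketch with scikit-learn's LinearSVC on toy data (the model, data, and threshold here are illustrative; the talk only states that thresholds were tuned experimentally):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Two well-separated toy classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
svm = LinearSVC().fit(X, y)

def svm_is_certain(x, threshold=1.0):
    # decision_function gives the signed distance (up to scaling) from the
    # separating hyperplane; instances far from the plane count as certain.
    return bool(abs(svm.decision_function(x.reshape(1, -1))[0]) >= threshold)
```

An instance deep inside one class clears the threshold, while one near the hyperplane does not and would be escalated.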
Neural Networks: Applied Monte Carlo Dropout, aggregating ensemble member predictions and measuring variance combined with raw model confidence.
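Combining the two signals for the accept-or-escalate decision might look like this (thresholds are illustrative, not the experimentally tuned values):

```python
import numpy as np

def nn_is_certain(mc_probs, var_thresh=0.01, conf_thresh=0.9):
    # mc_probs: (n_forward_passes, n_classes) softmax outputs collected
    # from repeated MC Dropout passes over the same instance.
    mean, var = mc_probs.mean(axis=0), mc_probs.var(axis=0)
    # Accept only if the simulated ensemble agrees (low variance) AND the
    # averaged prediction itself is confident.
    return bool(var.max() <= var_thresh and mean.max() >= conf_thresh)

agree = np.tile([0.95, 0.05], (30, 1))              # all passes agree
split = np.tile([[0.9, 0.1], [0.1, 0.9]], (15, 1))  # passes disagree
```

The first instance would be accepted at this tier; the second, where the ensemble members contradict each other, would be escalated.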
Results and Performance
Cloud Usage Reduction
With 522 test instances, only 47% of predictions required the largest model, representing over 50% reduction in large model usage. This translates to halving AWS instance utilization and associated costs.
Accuracy Trade-offs
The accuracy loss was approximately 1.5%, which is acceptable for this use case. While some challenges existed with defect detection specifically, this was not the primary project goal. The results demonstrate that accuracy can be maintained at reasonable levels without excessive reliability sacrifices.
Latency and Throughput
A proof-of-concept setup used basic equipment including a laptop with limited capabilities, two workstations, and standard university network infrastructure. Despite this non-industrial setup, the system processed approximately 10 images per second, a practical value for many production lines. This throughput matched performance when sending everything to the largest model, showing no latency sacrifice.
Conclusion
The research successfully demonstrates that utilizing uncertainty estimation in a cascade architecture can significantly reduce large model usage (by over 50%) while making only reasonable accuracy sacrifices (approximately 1.5%). This approach maintains practical throughput levels and provides a viable path toward more reliable and cost-effective machine learning deployment in manufacturing environments.