Munich Datageeks e.V.
Talk "From Paper to Insight - Medical Document Processing on AWS with Generative AI"
Screenshot from the start of the talk


Felix Reuthlinger

Storm Reply and AWS present a serverless AWS pipeline for extracting clinical entities from German healthcare documents, combining Textract, Claude Sonnet, Comprehend Medical, and Claude Opus, cutting clinicians' pre-consultation document review to under two minutes.

The talk was presented at the Munich Datageeks meetup in March 2026.

Abstract

This talk presents a real-world customer use case developed by Storm Reply in collaboration with AWS, focused on intelligent document processing (IDP) for a healthcare company. The solution extracts relevant medical entities from German healthcare documents — specifically Arztbriefe (doctor's letters) and Befunde (medical findings reports) — to reduce the time clinicians spend reviewing documents before consultations. Built entirely on AWS serverless architecture, the pipeline combines AWS Step Functions, Lambda, Amazon Textract, Amazon Comprehend Medical, and Anthropic's Claude models via Amazon Bedrock. The talk walks through the three pipeline stages — document classification, OCR, and entity extraction — with particular attention to the OCR comparison between AWS Textract, Bedrock Data Automation, and a custom two-step solution using Textract refined by Claude Sonnet. The session closes with broadly applicable lessons learned around engineering pragmatism, iterative development, and domain-expert collaboration.


About the Speakers

The talk is co-presented by three speakers. Nico is a cloud and AI consultant at Storm Reply, an AWS Premier Partner specializing in cloud and AI solutions, part of the wider Reply network of over 200 specialized companies. He leads the framing of the use case, the OCR deep-dive, and the lessons learned. Alena is a Senior Solutions Architect at AWS, providing an overview of the relevant AWS services for intelligent document processing. Hussein is a colleague of Nico's at Storm Reply and presents the architecture and implementation details of the IDP pipeline.


Transcript Summary

Context and Company Background

Reply is an international consulting company with around 17,000 employees, approximately 9,000 of whom are based in Italy, with around 3,000 in Germany. It operates as a network of over 200 specialized companies under one umbrella brand. Storm Reply is the AWS-focused entity within this network and has held AWS Premier Partner status since 2014. This partnership involves active collaboration with AWS on customer projects, maintaining certified staff, and leveraging the full breadth of AWS services — including AI/ML offerings.

The Problem: Intelligent Document Processing

Alena from AWS opened with the observation that 80% of enterprise data still exists in unstructured form, including documents, audio, and video. Manual data extraction from documents is widespread but inefficient. Intelligent Document Processing (IDP) addresses this through three pillars:

  • Core extraction — classification, summarization, and entity extraction from documents
  • Search and knowledge — chat-with-document capabilities, knowledge bases, and indexing
  • Agentic AI — empowering users through AI agents embedded in document workflows

Most organizations, particularly in regulated industries, begin with the core extraction pillar, which is the focus of this talk.

AWS Services for IDP

Alena outlined two approaches available on AWS:

Managed Solutions

  • Amazon Textract — a mature, non-generative ML service for OCR tasks. Handles printed and handwritten text, forms, tables, signatures, and custom queries. Particularly strong at returning bounding boxes for extracted content, which generative models currently handle less reliably.
  • Bedrock Data Automation (BDA) — a managed service within Amazon Bedrock that orchestrates multiple foundation models behind a single API. Handles full document processing pipelines including selective summarization (e.g., extracting just a summary of an anamnesis section rather than the full text). Abstracts model selection and orchestration from the developer.
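
Textract's bounding-box strength mentioned above is easy to consume in a few lines of Python. The sketch below parses a DetectDocumentText-style response into lines with their bounding boxes; the sample response fragment is illustrative, not real service output.

```python
def extract_lines(textract_response: dict) -> list:
    """Collect LINE blocks from an Amazon Textract DetectDocumentText
    response, keeping each line's text and its bounding box (the
    capability generative models currently handle less reliably)."""
    lines = []
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            lines.append({
                "text": block.get("Text", ""),
                "bbox": block.get("Geometry", {}).get("BoundingBox"),
            })
    return lines

# Illustrative response fragment (not real service output):
sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Arztbrief",
         "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.05,
                                      "Width": 0.3, "Height": 0.04}}},
    ]
}
print(extract_lines(sample))
```

In a real pipeline the response would come from a boto3 `textract` client call rather than a hand-written dict.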

Custom Solutions via Amazon Bedrock

For use cases requiring more control over prompting, model selection, or output format, Amazon Bedrock provides direct access to a range of foundation models from providers including Anthropic (Claude), Amazon (Nova family), and others — many supporting multimodal input. This path trades simplicity for flexibility and precision.

Developer Tooling

Kiro, AWS's AI coding assistant with a free tier, was briefly mentioned as a tool that supports end-to-end development workflows — from planning and design through implementation and testing — including a CLI mode and the ability to invoke agents for tasks such as generating Terraform deployment scripts.


The Use Case: Healthcare Document Processing Pipeline

Business Goal

The customer — a healthcare company — needed a solution to help clinicians quickly extract relevant information from incoming medical documents before consultation sessions. The pipeline targets two document types: Arztbriefe (doctor's letters) and Befunde (medical findings/lab reports). The goal is to reduce document review time and surface the most relevant clinical information.

Architecture Overview

The pipeline is built on AWS Step Functions, a serverless workflow orchestration service, and operates across three stages:

  1. Document Classification
  2. OCR (Optical Character Recognition)
  3. Entity Extraction

All processing is serverless (Lambda-based), operates on a pay-as-you-go model, and allows documents to be processed in parallel — bringing total end-to-end processing time to under two minutes.
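
A Step Functions workflow of this shape can be sketched in Amazon States Language. The definition below is a placeholder sketch, not the customer's actual state machine: the state names and Lambda ARNs are assumptions, and real deployments would add retries, error handling, and a Map state for parallel document processing.

```python
import json

# Hypothetical ASL definition for the three-stage pipeline.
# State names and Lambda ARNs are placeholders.
pipeline_definition = {
    "Comment": "IDP pipeline: classify -> OCR -> entity extraction",
    "StartAt": "ClassifyDocument",
    "States": {
        "ClassifyDocument": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:classify",
            "Next": "RunOcr",
        },
        "RunOcr": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:ocr",
            "Next": "ExtractEntities",
        },
        "ExtractEntities": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:extract",
            "End": True,
        },
    },
}
print(json.dumps(pipeline_definition, indent=2))
```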


Stage 1: Document Classification

The pipeline uses Amazon Nova Lite, a lightweight model from Amazon's Nova family, to classify incoming documents. A structured prompt is used that describes the distinguishing features of each document type — for example, the specific letterhead structure of an Arztbrief, or the table-heavy layout of a Befund. If a document matches either category, it proceeds through the pipeline. If not, it is set aside but its metadata is retained for potential clinical reference.
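
A classification prompt along these lines can be built as a small helper. The wording below is hypothetical (the talk does not show its exact prompt); it follows the described pattern of naming each document type's distinguishing features and asking the model for a single label.

```python
def build_classification_prompt(document_text: str) -> str:
    """Build a hypothetical classification prompt in the style the
    talk describes: distinguishing features per type, single-label
    answer. Sent to a lightweight model such as Amazon Nova Lite."""
    return (
        "Classify the following German medical document as exactly one of:\n"
        "- ARZTBRIEF: letterhead with sender and recipient addresses, "
        "salutation, free-form letter body\n"
        "- BEFUND: table-heavy layout with lab values and reference ranges\n"
        "- OTHER: anything else\n\n"
        "Answer with only the label.\n\n"
        f"Document:\n{document_text}"
    )

prompt = build_classification_prompt("Sehr geehrte Kollegin ...")
print(prompt.splitlines()[0])
```

The prompt string would then go to Bedrock via a boto3 `bedrock-runtime` client, with documents classified as OTHER set aside as described above.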


Stage 2: OCR

OCR is the most time-consuming stage and the one examined in greatest detail. The input is a scanned image of a document (potentially degraded by blur, rotation, or physical artifacts like coffee stains). The pipeline uses a two-step approach:

  1. AWS Textract performs an initial text extraction pass.
  2. Claude Sonnet (Anthropic, mid-tier model) refines the Textract output — correcting formatting issues specific to German documents (e.g., commas as decimal separators instead of periods), fixing column misalignment in tables, and ensuring extracted values conform to expected formats.
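
The kind of German-specific formatting fix the refinement step targets can be illustrated deterministically. The helper below (an illustrative sketch, not code from the talk) normalizes German number formatting, where the comma is the decimal separator and the period groups thousands.

```python
import re

def normalize_german_decimal(value: str) -> float:
    """Convert a German-formatted number such as '1.234,56'
    (period as thousands separator, comma as decimal separator)
    into a float, the kind of value-format correction applied
    to raw OCR output."""
    cleaned = value.strip().replace(".", "").replace(",", ".")
    if not re.fullmatch(r"-?\d+(\.\d+)?", cleaned):
        raise ValueError(f"not a German-formatted number: {value!r}")
    return float(cleaned)

print(normalize_german_decimal("5,4"))       # 5.4
print(normalize_german_decimal("1.234,56"))  # 1234.56
```

A rule like this covers only one known error class; the Sonnet pass generalizes across table misalignment and other formatting issues that are hard to enumerate by hand.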

OCR Comparison

Three solutions were evaluated using a synthetically generated sample Arztbrief:

  • AWS Textract. Strengths: fast, reliable baseline. Weaknesses: formatting errors in tables; misreads characters (e.g., "I" as "1"); missing slashes.
  • Bedrock Data Automation (BDA). Strengths: detects embedded images and tags them for downstream use; added a column for arrows in tables. Weaknesses: similar formatting errors to Textract; arrows were noted but misclassified.
  • Custom (Textract + Claude Sonnet). Strengths: best accuracy; correct formatting; handles nested/lineless tables well. Weaknesses: higher cost; requires custom code; slightly higher latency.

The document was analyzed in sections to illustrate where each solution struggled:

  • Header — dense, highly formatted, with names and addresses that may or may not be related to each other
  • Tabular metadata — patient name, date, and other structured patient info
  • Free-form text — body of the letter; less problematic for modern OCR
  • Nested tables without visible borders — the most challenging section; spanning across the full document width, potentially rotated
  • Footer — low information density; mainly page number

The custom two-step solution outperformed both managed alternatives, particularly on the nested table section, though with acknowledged trade-offs in cost and complexity.
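
The talk does not name the metric behind this comparison, but one common way to quantify OCR accuracy against a ground-truth transcription is the character error rate: Levenshtein edit distance divided by reference length.

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance between the
    ground-truth text and the OCR output, divided by the length
    of the ground truth. Computed with a standard dynamic
    programming table, one row at a time."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(m, 1)

# One "I"-misread-as-"1" in a 10-character line -> CER 0.1
print(char_error_rate("ICD-10 G44", "1CD-10 G44"))
```

Scoring each solution's output this way per document section would turn the qualitative comparison above into the kind of measured result the speakers advocate basing decisions on.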


Stage 3: Entity Extraction

After OCR, structured text is passed to Amazon Comprehend Medical, an AWS service purpose-built for medical NLP. It extracts clinical entities such as diagnoses, medications, and ICD-10 codes from the text.
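
Comprehend Medical returns entities with categories and confidence scores; a minimal post-processing step might group them by category and drop low-confidence hits. The sketch below parses a DetectEntitiesV2-style response; the sample is an illustrative fragment, not real service output.

```python
from collections import defaultdict

def group_entities(response: dict, min_score: float = 0.5) -> dict:
    """Group entities from a DetectEntitiesV2-style response by
    category, discarding entries below a confidence threshold."""
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        if entity.get("Score", 0.0) >= min_score:
            grouped[entity.get("Category", "UNKNOWN")].append(entity["Text"])
    return dict(grouped)

# Illustrative response fragment (not real service output):
sample = {
    "Entities": [
        {"Text": "Metoprolol", "Category": "MEDICATION", "Score": 0.98},
        {"Text": "Migräne", "Category": "MEDICAL_CONDITION", "Score": 0.91},
        {"Text": "evtl.", "Category": "MEDICAL_CONDITION", "Score": 0.12},
    ]
}
print(group_entities(sample))
```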

A final refinement pass is then performed using Claude Opus (Anthropic's highest-capability model at the time of the project). Opus was chosen for its reasoning capabilities and is used to:

  • Ensure output consistency and correct any syntax issues
  • Associate related entities with each other (e.g., linking a lab result to its corresponding diagnosis)
  • Produce a coherent, ranked output that helps the clinician understand which entities are most relevant and how they relate

The final structured output (JSON) is stored in Amazon S3 (object storage) and can be displayed in a UI within the overall sub-two-minute processing window.


Lessons Learned

The presenters closed with a set of broadly applicable engineering and project management lessons:

  • Keep it simple from the start. Avoid over-engineering early, especially in proof-of-concept phases. Get a working solution quickly and iterate.
  • Design components to be interchangeable. The pipeline was built so that any stage — including the OCR module — can be swapped out if a managed service eventually outperforms the custom solution, reducing long-term complexity.
  • Start small, scale where necessary. Technology evolves; in the best case, a future managed service will replace custom code entirely and remove complexity.
  • Good enough is often good enough. A 90–95% accurate solution may fully satisfy the customer's requirements. Pushing toward 100% accuracy may not be worth the added cost and complexity.
  • Base decisions on data, not gut feeling. Especially in a consulting context, evaluating solutions objectively and presenting measured results builds customer trust — but ensure the metrics you choose are meaningful.
  • Involve domain experts. Technical expertise does not substitute for domain knowledge. In this case, clinicians were consulted to validate whether extracted medications and ICD-10 codes were correct and relevant — feedback that cannot come from engineers alone.