Munich Datageeks e.V.
Talk "Backpropagation Boys"

Felix Reuthlinger

"Backpropagation Boys: How to burn money with AI-generated music" by Torsten Schön was presented at the Munich Datageeks September Edition 2025 meetup.

Abstract

This talk is about how someone with absolutely no musical talent managed to release eight albums within a year. From a first attempt and a somewhat crazy idea came various songs that help students better remember the most important points from a lecture, released under the band name Backpropagation Boys. True to the motto: too lazy to study, better listen to music. Now you can do both. In this talk, I explain how AI music generation works in principle and what the workflow looks like to turn an idea into a song and then release it. You can get an impression of the result here:

About the speaker

Torsten Schön is a research professor for Computer Vision for Intelligent Mobility Systems at the Technische Hochschule Ingolstadt and has terrible taste in music. Since no one wanted to listen to his deathcore metal albums and his students weren't always highly motivated to work through hundreds of lecture slides, he started creating a song for each lecture that captured the key points in the lyrics. In addition to soccer, drawing, and comedy, he has now added music production to his résumé as another extremely unsuccessful hobby.

Transcript summary

Introduction and Motivation

The speaker, who describes himself as completely unmusical and unable to read music or play instruments, became interested in AI music generation in 2024 when platforms like Udio and Suno emerged. The initial curiosity was whether AI could enable anyone to create professional-sounding music. The first experiment was a death metal album, Venom of Aldes, released under the band name Carnage Compiler. All tracks focused on AI taking over the world and destroying humanity.

While the music quality was acceptable, the speaker realized death metal had a limited audience and there were already many excellent bands in the genre, making it impractical to pursue commercially.

The Failed Business Model

The speaker initially developed a plan to create random pop music, flood Spotify with AI-generated hits, and become wealthy. This plan failed due to market saturation. According to a Guardian report from April 2025, 18% of all music uploaded to Deezer was AI-generated, amounting to 20,000 AI-generated songs uploaded daily to streaming platforms. Additionally, major companies and established bands already dominate the music industry with superior marketing capabilities and production quality, making it nearly impossible for AI-generated music to gain traction or generate meaningful income.

The Educational Pivot: Backpropagation Boys

After abandoning commercial ambitions, the speaker's colleague Mika suggested a practical educational use case: helping students remember algorithms and concepts from lectures by embedding them in song lyrics. This led to the creation of the Backpropagation Boys.

The first albums, released in early 2025, focused on computer vision concepts, with each song explaining one lecture topic. To accommodate diverse musical preferences, three versions of each album were produced: rock, pop, and hip-hop, all with identical lyrics. A second album series on neural networks followed for the winter semester, also in three genre versions.

Technical Background: How AI Music Generation Works

The speaker explained three main approaches to representing music for AI generation:

Symbolic Representation: Music is translated into symbolic tokens describing individual components such as vocals and guitar, which can then be generated auto-regressively with transformers, much like text generation.
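
As a purely invented illustration (this token vocabulary is an assumption, not taken from any specific model), such a symbolic encoding might look like an event stream:

    [BAR] [GUITAR] NOTE_ON:E2 DUR:1/8 [VOCALS] NOTE_ON:A3 DUR:1/4 [BAR] ...

A transformer trained on such sequences predicts the next event from the previous ones, exactly as a language model predicts the next word.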

Spectrogram-Based Approach: Audio is transformed into mel spectrograms via a windowed (short-time) Fourier transform. Unlike a standard Fourier transform, which assumes a periodic signal, the windowed transform analyzes frequency content over successive short time windows, which suits music's non-stationary nature. The resulting spectrograms resemble images and can be processed by image encoders.
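
A minimal sketch of this step in Python, assuming librosa is installed and a placeholder input file song.wav exists:

    import librosa
    import numpy as np

    # Load the audio (librosa resamples to 22,050 Hz by default)
    y, sr = librosa.load("song.wav")

    # Windowed Fourier transform -> mel-scaled spectrogram.
    # n_fft is the window length, hop_length the step between windows.
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128
    )

    # Convert power to decibels: the image-like array that encoders consume
    mel_db = librosa.power_to_db(mel, ref=np.max)
    print(mel_db.shape)  # (n_mels, n_frames), i.e. a 2-D "image" of the song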

Raw Audio Data: Direct processing of the raw waveform, an extremely high-dimensional sequence (a three-minute song at CD quality is roughly eight million 16-bit samples), which makes this approach difficult to scale.

Most generation systems use either auto-regressive transformers (similar to GPT models) or latent diffusion models. In diffusion-based systems, prompts can be text descriptions, lyrics, or audio signals that are encoded and fed into a sequence model to generate mel spectrograms, which are then converted back to audio using vocoders.
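
As a toy sketch of the auto-regressive variant (everything here is a stand-in: the model is untrained and the token ids are invented), the core generation loop is next-token sampling over a codebook of discrete audio tokens:

    import torch

    VOCAB = 1024  # size of the discrete audio-token codebook (assumed)

    # Stand-in for a trained transformer: embeds the prefix and predicts
    # a distribution over the next audio token.
    class ToyModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = torch.nn.Embedding(VOCAB, 64)
            self.head = torch.nn.Linear(64, VOCAB)

        def forward(self, tokens):
            h = self.emb(tokens).mean(dim=0)  # crude summary of the prefix
            return self.head(h)               # logits over the next token

    model = ToyModel()
    tokens = torch.tensor([0])  # start token (id 0 is an assumption)
    for _ in range(100):        # generate 100 audio tokens, one at a time
        logits = model(tokens)
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, 1)
        tokens = torch.cat([tokens, nxt])
    # A real system would now decode `tokens` back to a waveform with its
    # codec/vocoder; here they are random because the model is untrained.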

Available Open Source Models and Their Limitations

Several open-source models exist but none compete with commercial offerings:

  • Jukebox (OpenAI, 2020): Uses vector-quantized variational autoencoders to compress audio and auto-regressive transformers over the resulting codes, but has limited quality and scalability issues
  • Riffusion: Applies Stable Diffusion to spectrogram images and converts them back to audio using vocoders
  • MusicGen (Meta, 2023): Auto-regressive transformer over discrete audio tokens produced by a neural audio codec (see the sketch after this list)
  • Other models: AudioLDM 2, SongGen, JEN-1 Composer, and YuE, from various sources including Chinese research institutions
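
For anyone curious to try one of these, a minimal sketch of running MusicGen through the Hugging Face transformers library, following the library's documented usage (the model choice and prompt text are my own):

    from transformers import AutoProcessor, MusicgenForConditionalGeneration
    import scipy.io.wavfile

    processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
    model = MusicgenForConditionalGeneration.from_pretrained(
        "facebook/musicgen-small"
    )

    # Text-conditioned generation; ~256 new tokens is about 5 seconds of audio
    inputs = processor(
        text=["upbeat pop song about gradient descent"],
        padding=True,
        return_tensors="pt",
    )
    audio = model.generate(**inputs, max_new_tokens=256)

    # MusicGen's audio codec runs at 32 kHz
    rate = model.config.audio_encoder.sampling_rate
    scipy.io.wavfile.write("sample.wav", rate=rate, data=audio[0, 0].numpy())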

The speaker identified several reasons why no competitive open-source music generation models exist:

Limited Training Data: Unlike text or images, which can easily be scraped from the internet, music requires subscriptions to streaming platforms, whose terms explicitly forbid redistribution and use for training purposes.

Complex Structure: Music contains long-term, hierarchical structure operating at multiple levels, where the beginning must fit the end. These structures are more complicated than those of text.

Diversity Requirements: Ideal AI-generated music should sound impressive while avoiding similarity to known artists; prompting for specific artist styles is forbidden on commercial platforms to avoid copyright infringement lawsuits.

Limited User Base: Unlike text or image generation tools used daily for work and presentations, people rarely need custom-generated music. This reduces the potential market and profit expectations, discouraging major companies from investing despite having access to necessary data and resources.

The Production Workflow Reality

The speaker's initial vision of simply prompting songs and automatically uploading them to Spotify proved unrealistic. The actual process required extensive human intervention:

Lyric Creation: Detailed instructions were provided to ChatGPT specifying what content should appear in each verse and chorus. For example, the regularization song explicitly requested L1 in verse one, L2 in verse two, dropout in verse three, early stopping in verse four, and all four techniques in the chorus. The initial AI-generated lyrics were often too verbose, requiring iterative refinement to make them shorter and more song-friendly. This created a constant trade-off between technical precision and musical quality.
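
The speaker worked in the ChatGPT interface; purely as an illustration, the same kind of structured instruction could be scripted against the OpenAI API (the model name and prompt wording below are assumptions, not the speaker's actual prompt):

    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    # Hypothetical reconstruction of the per-section instructions
    # described in the talk; not the speaker's actual prompt.
    prompt = (
        "Write song lyrics about regularization in deep learning.\n"
        "Verse 1: L1 regularization. Verse 2: L2 regularization.\n"
        "Verse 3: dropout. Verse 4: early stopping.\n"
        "Chorus: name all four techniques. Keep lines short and catchy."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)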

Style Engineering: More precise style prompts yielded better results. The speaker experimented extensively with different style descriptions, eventually crafting very specific prompts to achieve desired sounds, such as attempting to recreate the late-1990s Limp Bizkit style for the Transformer song (artist names cannot be used directly in prompts).

Generation and Selection: Suno generates two versions per prompt. The speaker listened to multiple generations, verifying that all lyrics were accurately included in the songs. Some songs required 10-12 generation attempts before producing acceptable results. The version 3 model sometimes made errors, forgetting final verses or singing words unintelligibly even after numerous attempts.

Album Production: Album covers were generated with Midjourney using custom prompts, but generating legible text within the images didn't work. The speaker manually created background images, added the Backpropagation Boys silhouette (also generated with Midjourney), manually added shadows, and placed all text and descriptions by hand.

Distribution Setup: Direct upload to Spotify is impossible; a distributor is required. The speaker used DistroKid, one of the cheapest options. For each release, the process involved uploading the album cover, then uploading MP3 files one by one (not in batch), and filling out 10-20 information fields per song including guitarist, vocalist, drummer, production rights holder, and explicit content warnings.

Lyrics Synchronization: Adding synchronized lyrics (the feature visible in Spotify) is not automatic. DistroKid provides a website where users play the music and manually hit the space bar to mark when each line starts and ends. This process took many hours and was frequently interrupted by the speaker's three children, requiring multiple restarts.
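
DistroKid's internal format isn't public, but the widely used LRC format illustrates what the result of this manual timing work looks like (timestamps and lines below are invented for illustration):

    [00:12.40] L1 keeps the weights sparse
    [00:15.10] L2 keeps them small
    [00:17.85] Dropout drops a neuron
    [00:20.30] Early stopping ends it all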

The entire process took approximately three weeks (120 hours) of work for the first three albums. After uploading, publication on Spotify takes about a week, while synchronized lyrics can take two to three weeks to appear.

Financial Analysis

Costs:

  One-time (approximately €330 total):
  • Suno subscription: €10 per month for 10 months = €100
  • ChatGPT Plus: €23 per month, roughly €230 over the same period (company-paid here, but necessary for independent creators)

  Recurring (approximately €58 per year):
  • DistroKid distribution: €42.99 per year
  • Synchronized lyrics: €14.99 per band per year

  Time investment: 120 hours

Revenue Reality: Spotify pays $0.003 per stream, but only once a song reaches at least 1,000 streams per year from different users; below that threshold, nothing is paid out. Other streaming providers pay out at lower thresholds, which is where the speaker's $142 in earnings originated.

To break even on yearly costs alone through Spotify, the speaker would need 64 streams per day for at least one song. Current performance is nowhere near this target.
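
That target follows from the numbers above: 64 streams per day × 365 days ≈ 23,360 streams per year, and 23,360 × $0.003 ≈ $70, roughly covering the ~€58 in annual recurring costs (and comfortably above Spotify's 1,000-stream payout threshold).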

Student Reception and Educational Success

Of the 32 students who attended the computer vision course (31 completed the post-exam questionnaire), nine listened to the Backpropagation Boys to prepare for the exam. Of these nine, five agreed that the music helped them acquire knowledge, two disagreed, and two were uncertain.

The speaker considers this a success because the revised intention was educational rather than commercial. The songs helped motivate students to remember important facts from lectures. Interestingly, some exam questions could be answered correctly simply by knowing the chorus lyrics, such as listing the four regularization techniques: L1, L2, dropout, and early stopping.

Song Examples and Styles

The presentation included several musical examples demonstrating different approaches:

OpenPose: A catchy track from the first album about multi-person pose estimation, considered one of the best songs from the version 3 model generation.

Hyperparameter Optimization (hip-hop version): A dance-floor-ready track covering learning rates, batch sizes, and dropout.

Attention (pop version): A song explaining the attention mechanism in neural networks with the memorable chorus emphasizing that attention is all you need.

Transformer (rock version): An attempt to capture the late-1990s Limp Bizkit style, demonstrating how detailed style engineering can produce a specific musical aesthetic.

LSTM: Given that LSTM was invented by Hochreiter and Schmidhuber in Munich, this song was given a Bavarian touch, covering concepts like forget gates and memory cells.

Critical Considerations and Limitations

Immutability: Released music cannot be changed. Unlike a website, which can be updated, a published album is permanent: it can be deleted, but not modified. The speaker noted one song that ends abruptly without a proper ending and cannot be fixed.

Precision vs. Quality Trade-off: Technical accuracy often conflicts with good song lyrics. Overly precise explanations result in lengthy, uncatchy songs. The speaker prioritized memorable choruses containing main lecture concepts over complete technical precision.

Copyright Status: AI-generated music cannot be copyrighted in the United States. Europe has no official blanket statement yet, but a Czech Republic case established that AI-generated content cannot be copyrighted. There is a potential exception: lyrics written by humans could be considered a creative contribution eligible for copyright protection, though the music itself would not be.

Ethical Boundary: In keeping with the Munich Datageeks rule that members should not use the community for personal benefit or financial gain, listeners were humorously asked not to stream any single song more than 63 times per day (just below the 64-stream break-even point), to avoid generating unintended revenue.