Bespoke Software in the Age of AI: Orchestrating Intelligence with Phonosyne
Phonosyne is my working system for sound design. Give it a text prompt, get back structured, validated .wav files. Not a plugin, not a product — a demonstration that software can be handcrafted around intent rather than built line-by-line for a market. Orchestrated through multi-agent AI. Tuned to a maker, not a demographic.

What Is Phonosyne?
Phonosyne is a collaboration. A group of AI agents working together to turn musical ideas into actual sound. A prompt like "distorted pirate radio broadcast intercepted mid-transmission" gets expanded, mapped, synthesized, and rendered into .wav files without manual coding.
Three dedicated agents handle the core work. The Designer expands the prompt into a structured plan. The Analyzer deepens that plan into synthesis recipes. The Compiler translates those recipes into audio using SuperCollider. Each agent is specialized and modular, coordinated through a central Orchestrator that handles sequencing, parallelization, and error recovery.
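For the shape of it, here's a minimal sketch of that pipeline. The agent internals are stubbed (in the real system, each is an LLM call, and the Compiler drives SuperCollider), and none of these names are Phonosyne's actual API; only the sequencing is the point.

```python
# A hypothetical sketch of the Designer -> Analyzer -> Compiler flow.
# Agent internals are stubbed; only the orchestration shape matters here.
from dataclasses import dataclass, field

@dataclass
class Plan:
    brief: str                                        # Designer: structured expansion
    recipes: list[str] = field(default_factory=list)  # Analyzer: synthesis recipes

def designer(prompt: str) -> Plan:
    # Stub for an LLM call that expands the prompt into a structured plan.
    return Plan(brief=f"plan for {prompt}")

def analyzer(plan: Plan) -> Plan:
    # Stub for an LLM call that deepens the plan into per-sound recipes.
    plan.recipes = [f"recipe {i} from {plan.brief}" for i in (1, 2)]
    return plan

def compiler(recipe: str) -> str:
    # Stub for the agent that emits SuperCollider code and renders audio.
    return recipe.replace(" ", "_") + ".wav"

def orchestrate(prompt: str) -> list[str]:
    """Sequence the agents; the real Orchestrator also parallelizes and recovers."""
    plan = analyzer(designer(prompt))
    return [compiler(r) for r in plan.recipes]

if __name__ == "__main__":
    print(orchestrate("distorted pirate radio broadcast intercepted mid-transmission"))
```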
But if the system does the work, who’s actually creating the sound?
Building With Behavior, Not Code
The Role of the Orchestrator
The harder part of building Phonosyne wasn’t generating sound — it was designing how the agents behave when things get vague, fail, or go off-script.
In a multi-agent system, you define the intent, not each step. You assign roles. You set the conditions for collaboration: agents that can negotiate, retry, escalate, and adapt. The code stays in dialogue with itself — shaped by the ongoing relationship between intent and execution.
“Developers are increasingly defining what they want the system to achieve, leaving the how to the emergent, collaborative intelligence of the orchestrated agent teams.”
Plaat et al., Agentic LLMs (arXiv, 2025)
That’s the Orchestrator role: shaping interactions between autonomous workers rather than writing lines of logic. Defining how agents perceive, plan, and respond when things go sideways.
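One way to picture that: the Orchestrator wraps each worker in a recovery policy rather than in task logic. A hedged sketch, with invented names and an invented escalation path:

```python
# A sketch of orchestration as policy, not task logic: what happens when a
# worker fails. The names and the escalation path are illustrative, not
# Phonosyne's actual code.
def with_recovery(task, *, retries=2, escalate=None):
    """Run task(); on failure, retry, then hand off to an escalate handler."""
    last_error = None
    for _attempt in range(retries + 1):
        try:
            return task()
        except RuntimeError as err:  # e.g. a malformed synthesis recipe
            last_error = err
    if escalate is not None:
        return escalate(last_error)  # e.g. ask the Analyzer to revise the recipe
    raise last_error
```

The point is the shape: the worker stays dumb about failure, and the relationships between workers carry the recovery.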
Co-Creation Through Debugging
Phonosyne didn’t come from a static design doc. It came from long loops of failure and refinement. I’d sketch a behavior — “retry rendering if the waveform is silent” — and the Compiler would misread it. I’d revise the prompt, adjust the retry logic, watch it run again. Over time, the system started handling my edge cases, and I got better at speaking clearly to a team that wasn’t human, but was learning.
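That silence check is simple enough to show. A minimal version, assuming numpy and soundfile are available; the threshold and the render callable are placeholders, not Phonosyne's actual values:

```python
# "Retry rendering if the waveform is silent," reduced to a sketch.
import numpy as np
import soundfile as sf

SILENCE_RMS = 1e-4  # below this RMS, treat the render as effectively silent

def is_silent(path: str) -> bool:
    audio, _sr = sf.read(path)
    return float(np.sqrt(np.mean(np.square(audio)))) < SILENCE_RMS

def render_with_retry(recipe: str, render, max_attempts: int = 3) -> str:
    """Call render(recipe) until it yields a non-silent .wav, up to max_attempts."""
    for _attempt in range(max_attempts):
        path = render(recipe)
        if not is_silent(path):
            return path
    raise RuntimeError(f"recipe still silent after {max_attempts} renders")
```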
What kind of authorship emerges when you debug through conversation?
Creative Systems, Not Creative Outputs
Multi-Agent Collaboration in Practice
Most models respond to prompts one at a time. Multi-agent architectures distribute that work — planners that sketch intent, analyzers that add detail, renderers that produce results, critics that refine and respond.
Projects like LVAS-Agent use this structure to break long-form video dubbing into storyboard segmentation, script synthesis, sound layering, and final mix. Audio-Agent pairs LLMs with diffusion engines to turn descriptions into editable audio, atomized by task. SonicRAG, the closest to Phonosyne, lets agents retrieve and blend samples from a library, mixing language and signal as modular inputs. These systems don't just generate; they collaborate.
Specialization Over Generalization
Phonosyne shares their DNA but doesn’t try to cover every use case. It doesn’t generalize. It was built for intimacy — for creative work where the system learns my aesthetic logic, adapts to my pacing, renders sound in a way that fits how I hear.
Designing for One
Most software is built to scale. Phonosyne was built to fit. There's no GUI, no knob trees or parameter menus. You describe the sound: sonically, metaphorically, spatially. The agents interpret that into action.
What makes it bespoke is the alignment. Each agent is prompted with my aesthetic values. The Designer knows how I outline sonic ideas. The Analyzer knows which timbres I chase and which I avoid. The Compiler knows when to let a shimmer through, and when to try again.
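Concretely, that alignment lives in the system prompts. These strings are invented stand-ins for the idea, not Phonosyne's actual prompts:

```python
# Each agent's system prompt carries aesthetic values, not just a job
# description. Invented examples of the pattern, not the real prompts.
AGENT_PROMPTS = {
    "Designer": (
        "Expand sound prompts into structured plans. Favor narrative framing: "
        "every sound implies a place, a distance, a degraded transmission."
    ),
    "Analyzer": (
        "Deepen plans into synthesis recipes. Chase tape saturation, detuned "
        "drones, granular smear; avoid clean FM bells and stock risers."
    ),
    "Compiler": (
        "Render recipes in SuperCollider. Keep accidental shimmer and noise "
        "floor artifacts; re-render only if the result is silent or clipped."
    ),
}
```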
Authorship in the Age of Agents
Who made the sound?
That's the question people keep asking. But it sidesteps something more interesting. The sounds Phonosyne creates aren't composed in any traditional sense: not played, not programmed. They're orchestrated through intent, system behavior, and a back-and-forth between me and a machine ensemble trained to speak my sonic language.
“The intentionality gap between human creators and AI-generated content forces a critical reevaluation of authorship itself.”
— Harvard Law Review, Artificial Intelligence and the Creative Double Bind
Exactly. That gap is where authorship lives now.
This isn’t like debates over AI image generation — consent, appropriation, stolen style. Sound, especially in experimental and electronic music, has always been a collage: samples, algorithms, found noise. Reuse is the baseline. What’s unusual isn’t that I use machine-generated samples. It’s that I use them intentionally, inside a system I built to reflect my aesthetic.
Phonosyne’s outputs aren’t precious or sacred. They’re raw material — structured enough to feel like memory and flexible enough to break apart. What matters isn’t who technically generated them. It’s what happens next.
And what happens next is: I play.
Phonosyne feeds my live rig: loopers, samplers, granular tools. That’s where meaning takes shape — in the way a warped radio fragment catches on tape heads, in the moment a failed synth glitch becomes the emotional center of a set. That’s not prompt engineering. That’s instrumental authorship.
Most generative music tools aren’t made for that. They’re built for clean outputs — one prompt, one product. Phonosyne comes from a different lineage: algorithmic composition, procedural sound, interactive systems. A spiritual cousin to Xenakis, Oval, Autechre, algorave. Process over product. Performance over artifact.
I didn’t write every waveform.
But I built the ensemble. Trained its behavior. Tuned it to my taste. Fed it my metaphors. Pushed it to fail in interesting ways. And from that, shaped something playable, personal — mine.
That’s authorship. That’s agency.
That’s the whole fucking point.
