Bespoke Software in the Age of AI: Orchestrating Intelligence with Phonosyne
Software used to be about reuse. Build a module once, apply it everywhere. But the next wave isn’t about general-purpose tools; it’s about deeply personal systems. Not just customizable interfaces, but workflows encoded with your taste, your rhythm, your constraints. Tools that don’t just wait for commands but anticipate and adapt: intelligent collaborators that move with you.
Phonosyne is my personal experiment in this direction. It’s not a plugin or product, it’s a working system for sound design that interprets my text prompts and returns structured, validated .wav
files. But more than that, it’s a philosophical statement: a demonstration that software can be handcrafted again, not line-by-line, but by intent. Orchestrated through multi-agent AI. Tuned not to a market, but to a maker.
“Agentic AI is proactive, pursuing objectives end-to-end… where generative AI waits for prompts, agentic systems book the studio, mix the track, and release the album.”
Aalpha, Agentic AI vs. Generative AI
What Is Phonosyne?
Phonosyne isn’t just code, it’s a collaboration. A group of AI agents working together to turn musical ideas into actual sound. A prompt like “distorted pirate radio broadcast intercepted mid-transmission” becomes not just interpretable but actionable: mapped, expanded, synthesized, and rendered into .wav files with no manual coding.
The core system is composed of three dedicated agents. The Designer expands the prompt into a structured plan. The Analyzer deepens that plan into synthesis recipes. The Compiler translates those recipes into audio using SuperCollider. Each agent is specialized, modular, and coordinated through a central Orchestrator that handles sequencing, parallelization, and error recovery.
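To make the shape of that pipeline concrete, here is a minimal sketch of how the handoff might be wired together in Python. The names, data structures, and stubbed agent bodies are illustrative assumptions, not Phonosyne’s actual code: in the real system each function wraps an LLM call or a SuperCollider render rather than returning placeholders.

```python
# Minimal sketch of a Designer -> Analyzer -> Compiler pipeline coordinated by
# an orchestrator. All agent internals are stubbed; names are illustrative.
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor


@dataclass
class SoundSpec:
    name: str
    description: str   # expanded prose plan from the Designer
    recipe: str = ""    # synthesis recipe from the Analyzer
    wav_path: str = ""  # rendered file from the Compiler


def designer(prompt: str) -> list[SoundSpec]:
    """Expand a single text prompt into a structured plan of sounds."""
    return [SoundSpec(name=f"sound_{i}", description=f"{prompt} (variation {i})")
            for i in range(3)]


def analyzer(spec: SoundSpec) -> SoundSpec:
    """Deepen one planned sound into a concrete synthesis recipe."""
    spec.recipe = f"// sclang recipe for: {spec.description}"
    return spec


def compiler(spec: SoundSpec, max_retries: int = 2) -> SoundSpec:
    """Render the recipe to a .wav, retrying if the render step fails."""
    for attempt in range(max_retries + 1):
        try:
            # placeholder: the real step shells out to SuperCollider and can raise
            spec.wav_path = f"out/{spec.name}.wav"
            return spec
        except RuntimeError:
            if attempt == max_retries:
                raise
    return spec


def orchestrate(prompt: str) -> list[SoundSpec]:
    """Sequence the agents and fan the per-sound work out in parallel."""
    plan = designer(prompt)
    with ThreadPoolExecutor() as pool:
        analyzed = list(pool.map(analyzer, plan))
        rendered = list(pool.map(compiler, analyzed))
    return rendered


if __name__ == "__main__":
    for spec in orchestrate("distorted pirate radio broadcast intercepted mid-transmission"):
        print(spec.name, "->", spec.wav_path)
```

The point of the sketch is the shape, not the stubs: the Designer fans one prompt out into many sounds, each downstream agent sees only its own slice, and the Orchestrator is the only piece that knows the whole sequence.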
The agents don’t just relay information, they adapt. They coordinate, troubleshoot, and validate their own work. Phonosyne isn’t a plugin you tweak. It’s not a tool you operate. It’s a purpose-built team of collaborators, designed not to be flexible for everyone, but to be fluent in the specific, evolving vocabulary of my sound design practice.
But if the system does the work, who’s actually creating the sound?
Building With Behavior, Not Code
The Role of the Orchestrator
The challenge in building Phonosyne wasn’t generating the sounds, it was designing how the agents behave when things get vague, fail, or go off-script. That’s not traditional coding. That’s orchestration.
In a multi-agent system, you’re not specifying what to do step by step. You define the intent. You assign roles. You set the conditions for collaboration: agents that negotiate, retry, escalate, and adapt. The code isn’t static, it’s dialectical, shaped by an ongoing conversation between intent and execution.
“Developers are increasingly defining what they want the system to achieve, leaving the how to the emergent, collaborative intelligence of the orchestrated agent teams.”
Plaat et al., Agentic LLMs (arXiv, 2025)
System Design Through Prompts
This is the Orchestrator role: shaping interactions between autonomous workers, not lines of logic. You’re not implementing features, you’re building relationships under pressure. This is where prompt engineering becomes a form of systems design: not just writing the right string of text, but shaping how your agents perceive, plan, and respond.
Prompts become architecture. A role prompt sets the scope of agency. A system prompt defines the collective mission. Retry logic isn’t just a failsafe, it’s a contract negotiation. Debugging becomes less about syntax and more about resolving misunderstandings between cooperating minds.
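As a hedged illustration of what “prompts become architecture” can look like in practice, the snippet below writes role prompts, a shared mission, and a retry contract as plain configuration. The wording and fields are hypothetical, not Phonosyne’s actual prompts; the point is that system behavior lives in editable text and data rather than in control flow.

```python
# Hypothetical illustration of prompts-as-architecture: role prompts, a shared
# mission, and retry behavior expressed as configuration rather than code paths.
SYSTEM_PROMPT = (
    "You are part of a three-agent sound design ensemble. "
    "Your collective mission: turn one text prompt into validated .wav files."
)

ROLE_PROMPTS = {
    "designer": "Expand the user's prompt into a numbered plan of distinct sounds. "
                "Do not write synthesis code; describe intent, texture, and duration.",
    "analyzer": "Take one planned sound and produce a concrete synthesis recipe. "
                "Stay within the timbral palette described in the plan.",
    "compiler": "Translate a recipe into SuperCollider code and render it. "
                "If the render fails or the output is silent, report why and retry.",
}

# Retry logic as a contract: what counts as failure, how many attempts are
# allowed, and who gets the problem when an agent cannot recover on its own.
RETRY_CONTRACT = {
    "max_attempts": 3,
    "failure_conditions": ["render error", "silent waveform", "clipped output"],
    "escalate_to": "orchestrator",
}
```

Debugging, in this framing, means revising these strings and structures until the agents’ behavior converges on what you meant.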
Co-Creation Through Debugging
Phonosyne didn’t emerge from a static design doc. It came from iteration, long loops of failure and refinement. I’d sketch a behavior: “Retry rendering if the waveform is silent.” The Compiler would misread it. I’d revise the prompt. Adjust the retry logic. Watch again. Eventually, the system learned my edge cases, and I learned how to speak clearly to a team that wasn’t human, but was learning.
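For a sense of what one of those behaviors looks like once it stabilizes, here is a minimal sketch of the silent-render check, assuming the Compiler writes its output to disk as a .wav file and that numpy and soundfile are available. The threshold, retry count, and render_recipe() stub are assumptions for illustration, not the system’s actual implementation.

```python
# A sketch of "retry rendering if the waveform is silent": check the rendered
# file's peak level and re-render with a note appended to the recipe if it is
# effectively silent. Libraries, threshold, and the render stub are assumptions.
import numpy as np
import soundfile as sf


def render_recipe(recipe: str, out_path: str) -> None:
    """Hypothetical stand-in for the Compiler's SuperCollider render step."""
    sf.write(out_path, np.zeros(44100), 44100)  # placeholder: writes one second of silence


def is_effectively_silent(path: str, threshold_db: float = -60.0) -> bool:
    """Return True if the file's peak level is below the given dBFS threshold."""
    audio, _sr = sf.read(path, always_2d=True)
    peak = float(np.max(np.abs(audio)))
    if peak == 0.0:
        return True
    return 20.0 * np.log10(peak) < threshold_db


def render_with_silence_retry(recipe: str, out_path: str, attempts: int = 3) -> str:
    """Ask the Compiler to re-render when the result comes back silent."""
    for attempt in range(1, attempts + 1):
        render_recipe(recipe, out_path)
        if not is_effectively_silent(out_path):
            return out_path
        # feed the failure back into the recipe before trying again
        recipe += f"\n// attempt {attempt}: output was silent, raise gain or fix routing"
    raise RuntimeError(f"Render stayed silent after {attempts} attempts: {out_path}")
```

The interesting part isn’t the decibel math; it’s that each failed attempt feeds a note back into the recipe, which is exactly the kind of behavior that had to be negotiated through prompt revision rather than written once and trusted.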
What kind of authorship emerges when you debug through conversation instead of code?
Creative Systems, Not Creative Outputs
Why One Model Isn’t Enough
Even solo creative work is a form of internal collaboration, between drafts, tools, layers, and decisions. Composing music, designing sounds, or shaping a story isn’t a straight line. It loops back, forks off, and accumulates nuance through revision. These workflows are inherently complex, not because they’re inefficient, but because they’re alive.
That complexity is where traditional generative AI starts to fall apart. Most models are built to respond to prompts one at a time: one model, one input, one output. But real creative work isn’t a transaction. It’s a system.
Multi-Agent Collaboration in Practice
Multi-agent architectures offer something closer to that system logic. Instead of collapsing everything into one model, they distribute roles: planners who sketch intent, analyzers who add detail, renderers who produce results, critics who refine and respond.
Projects like LVAS-Agent use this structure to break down long-form video dubbing into storyboard segmentation, script synthesis, sound layering, and final mix. Audio-Agent pairs LLMs with diffusion engines to turn descriptions into editable audio, atomized by task. SonicRAG, the closest to Phonosyne, lets agents retrieve and blend samples from a library, mixing language and signal as modular inputs. These systems don’t just generate, they collaborate.
Specialization over Generalization
Phonosyne shares their DNA but leans in a different direction. It isn’t trying to cover every use case. It doesn’t generalize. It specializes. It’s built not for scalability but for intimacy, for the kind of creative work where the system learns your aesthetic logic, adapts to your pacing, and renders sound in a way that feels like your hands.
Are we still designing software, or assembling teams?
Designing for One: Phonosyne as Bespoke Tool
What Makes It Personal
Most software is built to scale. Phonosyne was built to fit.
It wasn’t made for “audio professionals.” It doesn’t support every DAW, format, or genre. It was designed around one person: me. The workflows mirror how I think. The defaults reflect what I prefer. The language model is tuned to how I describe sound. Not adaptable, but accurate. Not general-purpose, but deeply personal.
Language-Native, Not GUI-First
There’s no GUI. No knobs, no sliders, no parameter trees. You don’t sculpt the sound by hand, you describe it. Sonically, metaphorically, spatially. And the agents interpret that into action. This isn’t GUI-first design. It’s language-native orchestration.
Judgment Over Flexibility
What makes it bespoke isn’t the interface so much as the alignment. Each agent is trained or prompted with my aesthetic values. The Designer knows how I outline sonic ideas. The Analyzer knows which timbres I chase and which I avoid. The Compiler knows when to let a shimmer through, and when to try again. They don’t just follow instructions, they share my judgment.
If a system reflects one person’s taste perfectly, is that still “software”, or something else entirely?
Authorship in the Age of Agents
Who made the sound?
That’s the question people keep asking. But it misses the point. The sounds Phonosyne creates aren’t composed in any traditional sense. They’re not played or programmed. They’re orchestrated through intent, system behavior, and a back-and-forth between me and a machine ensemble trained to speak my sonic language.
“The intentionality gap between human creators and AI-generated content forces a critical reevaluation of authorship itself.”
— Harvard Law Review, Artificial Intelligence and the Creative Double Bind
Exactly. That gap is where authorship lives now.
This isn’t like AI image generation, where debates revolve around consent, appropriation, or stolen style. Sound, especially in experimental and electronic music, has always been a collage: samples, algorithms, found noise. Reuse is the baseline. What’s unusual isn’t that I use machine-generated samples. It’s that I use them intentionally, within a system I built to reflect my aesthetic.
Phonosyne’s outputs aren’t precious or sacred. They’re raw material, structured enough to feel like memory and flexible enough to break apart. What matters isn’t who technically generated them. It’s what I do with them.
And what I do is play.
Phonosyne doesn’t generate “songs.” It’s not trying to impress anyone with end-to-end genre emulation. It feeds my live rig: loopers, samplers, granular tools. That’s where meaning takes shape. In the way a warped radio fragment catches on tape heads. In the moment a failed synth glitch becomes the emotional center of a set. That’s not prompt engineering. That’s instrumental authorship.
Most generative music tools aren’t made for that. They’re built for clean outputs—one prompt, one product. But Phonosyne comes from a different lineage: algorithmic composition, procedural sound, interactive systems. It’s a spiritual cousin to Xenakis, Oval, Autechre, algorave. It’s about process, not product. Performance, not artifact.
So no, I didn’t write every waveform.
But I built the ensemble. I trained its behavior. Tuned it to my taste. Fed it my metaphors. Pushed it to fail in interesting ways. And from that, I shaped something playable, personal—mine.
That’s not just authorship. That’s agency.
That’s orientation.
That’s the whole fucking point.
Beyond Tools: A Personal Philosophy
Orchestration as Identity
Phonosyne isn’t just a sound design system, it’s a way of thinking about what software can become.
It says software doesn’t have to be universal to be powerful. It can be particular. Personal. It can fit the shape of one person’s thinking, not just the contours of a market. When language becomes the interface, taste becomes the architecture. Your sensibilities aren’t settings, they’re the system.
I didn’t build this to solve sound design. I built it to see what happens when the way I think creatively becomes executable. Not metaphorically, but literally. Phonosyne doesn’t care about the average user. It isn’t built to scale or generalize. It’s built to resonate, with one mind, one voice, one set of strange, recursive instincts.
There are no knobs or sliders here. No menus. No DAW integration roadmap. Just a group of agents who speak in my dialect of desire. One expands my prompt into a structure. Another deepens it into synthesis instructions. Another renders it into sound. Not just AI as collaborator, AI as ensemble. A system that knows how I mishear, how I revise, how I get it wrong on purpose.
So listen. Listen to what they reveal about the system that made them. Imagine a future where your tools don’t just take instructions. They learn your voice.
Not just UI. Not just UX. You.