Jaycee Lydian

Intersecting AI, community, and creativity

Bespoke Software in the Age of AI: Orchestrating Intelligence with Phonosyne

Phonosyne is my working system for sound design. Give it a text prompt, get back structured, validated .wav files. Not a plugin or a product — a demonstration that software can be handcrafted again, not line-by-line, but by intent. Orchestrated through multi-agent AI. Tuned not to a market, but to a maker.

An open mouth mid-scream, surrounded by swirling purple and magenta water.

What Is Phonosyne?

Phonosyne is a collaboration. A group of AI agents working together to turn musical ideas into actual sound. A prompt like "distorted pirate radio broadcast intercepted mid-transmission" becomes actionable: mapped, expanded, synthesized, and rendered into .wav files with no manual coding.

The core system is composed of three dedicated agents. The Designer expands the prompt into a structured plan. The Analyzer deepens that plan into synthesis recipes. The Compiler translates those recipes into audio using SuperCollider. Each agent is specialized, modular, and coordinated through a central Orchestrator that handles sequencing, parallelization, and error recovery.
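Stripped to its skeleton, the hand-off looks something like the sketch below. The class names and stubbed logic are illustrative assumptions, not the actual Phonosyne code; in the real system each agent wraps an LLM call and the Compiler drives SuperCollider.

```python
# Minimal sketch of the Designer -> Analyzer -> Compiler hand-off.
# Names and fields are illustrative; the real agents are LLM-backed and
# the real Compiler renders audio through SuperCollider.
from dataclasses import dataclass


@dataclass
class SoundPlan:
    prompt: str
    sections: list[str]            # structured expansion of the original idea


@dataclass
class SynthRecipe:
    section: str
    parameters: dict               # synthesis details the Compiler can act on


class Designer:
    def expand(self, prompt: str) -> SoundPlan:
        # Stand-in for an LLM call that turns a loose idea into a structured plan.
        return SoundPlan(prompt, sections=[f"{prompt} (layer {i})" for i in range(3)])


class Analyzer:
    def deepen(self, plan: SoundPlan) -> list[SynthRecipe]:
        # Each section of the plan becomes a concrete synthesis recipe.
        return [SynthRecipe(s, {"duration_s": 4.0, "texture": "noisy"}) for s in plan.sections]


class Compiler:
    def render(self, recipe: SynthRecipe) -> str:
        # Stand-in for the SuperCollider render; returns the output path.
        return f"out/{abs(hash(recipe.section)) & 0xFFFF}.wav"


class Orchestrator:
    """Sequences the agents and collects the rendered files."""

    def __init__(self) -> None:
        self.designer, self.analyzer, self.compiler = Designer(), Analyzer(), Compiler()

    def run(self, prompt: str) -> list[str]:
        plan = self.designer.expand(prompt)
        recipes = self.analyzer.deepen(plan)
        return [self.compiler.render(r) for r in recipes]


if __name__ == "__main__":
    print(Orchestrator().run("distorted pirate radio broadcast intercepted mid-transmission"))
```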

But if the system does the work, who’s actually creating the sound?

Building With Behavior, Not Code

The Role of the Orchestrator

The challenge in building Phonosyne wasn’t generating the sounds, it was designing how the agents behave when things get vague, fail, or go off-script. That’s not traditional coding. That’s orchestration.

In a multi-agent system, you’re not specifying what to do step by step. You define the intent. You assign roles. You set the conditions for collaboration: agents that negotiate, retry, escalate, and adapt. The code isn’t static, it’s dialectic, shaped by an ongoing conversation between intent and execution.

“Developers are increasingly defining what they want the system to achieve, leaving the how to the emergent, collaborative intelligence of the orchestrated agent teams.”

Plaat et al., Agentic LLMs (arXiv, 2025)

System Design Through Prompts

This is the Orchestrator role: shaping interactions between autonomous workers, not lines of logic. You’re not implementing features, you’re building relationships under pressure — defining how agents perceive, plan, and respond when things go sideways.
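To make "relationships under pressure" a little more concrete, here is a rough sketch of role-shaping as declaration rather than implementation. The prompt text and policy fields are invented stand-ins; the real prompts are far longer and far more specific.

```python
# Roles declared as prompts plus a failure policy, rather than coded as functions.
# All prompt text and policy names below are illustrative, not Phonosyne's actual config.
AGENT_ROLES = {
    "designer": {
        "system_prompt": (
            "Expand a short sonic idea into a structured plan: name each layer, "
            "its mood, and how it evolves over time."
        ),
        "on_failure": "retry_with_feedback",    # what to do when output fails validation
        "max_retries": 2,
    },
    "analyzer": {
        "system_prompt": (
            "Turn each layer of the plan into a synthesis recipe: sources, "
            "envelopes, effects, duration."
        ),
        "on_failure": "escalate_to_designer",   # push an unworkable layer back upstream
        "max_retries": 3,
    },
    "compiler": {
        "system_prompt": (
            "Translate a recipe into SuperCollider code and render it to a .wav file."
        ),
        "on_failure": "retry_with_error_log",   # feed the render error back into the prompt
        "max_retries": 5,
    },
}
```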

Co-Creation Through Debugging

Phonosyne didn’t emerge from a static design doc. It came from iteration, long loops of failure and refinement. I’d sketch a behavior: “Retry rendering if the waveform is silent.” The Compiler would misread it. I’d revise the prompt. Adjust the retry logic. Watch again. Eventually, the system learned my edge cases, and I learned how to speak clearly to a team that wasn’t human, but was learning.
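That one behavior, the silent-render retry, eventually settled into something like the sketch below. The threshold and the render callback are stand-ins for the Orchestrator's actual validation loop around the Compiler.

```python
# Sketch of "retry rendering if the waveform is silent."
# The threshold and the render() signature are assumptions for illustration.
import wave
from array import array

SILENCE_PEAK = 200    # assumed peak level (16-bit PCM) below which a render counts as silent
MAX_ATTEMPTS = 3


def is_silent(path: str) -> bool:
    # Read every sample and check whether anything rises above the noise floor.
    with wave.open(path, "rb") as wf:
        samples = array("h", wf.readframes(wf.getnframes()))   # assumes 16-bit PCM
    return max((abs(s) for s in samples), default=0) < SILENCE_PEAK


def render_with_retry(recipe, render) -> str:
    """Call render(recipe, feedback) until the output actually contains sound."""
    feedback = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        path = render(recipe, feedback)
        if not is_silent(path):
            return path
        feedback = f"Attempt {attempt} rendered silence; raise the gain or fix the envelope."
    raise RuntimeError("Compiler produced silence on every attempt")
```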

What kind of authorship emerges when you debug through conversation instead of code?

Creative Systems, Not Creative Outputs

Multi-Agent Collaboration in Practice

Most models respond to prompts one at a time. Multi-agent architectures distribute that work — planners who sketch intent, analyzers who add detail, renderers who produce results, critics who refine and respond.

Projects like LVAS-Agent use this structure to break down long-form video dubbing into storyboard segmentation, script synthesis, sound layering, and final mix. Audio-Agent pairs LLMs with diffusion engines to turn descriptions into editable audio, atomized by task. SonicRAG, the closest to Phonosyne, lets agents retrieve and blend samples from a library, mixing language and signal as modular inputs. These systems don’t just generate, they collaborate.

Specialization over Generalization

Phonosyne shares their DNA but isn’t trying to cover every use case. It doesn’t generalize. It’s built not for scalability but for intimacy — for creative work where the system learns my aesthetic logic, adapts to my pacing, renders sound in a way that fits how I hear.

Designing for One

Most software is built to scale. Phonosyne was built to fit. It wasn’t made for “audio professionals.” It doesn’t support every DAW, format, or genre. It was designed around one person. The workflows mirror how I think. There’s no GUI, no knobs or parameter trees — you describe the sound, sonically, metaphorically, spatially, and the agents interpret that into action.

What makes it bespoke, more than the interface, is the alignment. Each agent is prompted with my aesthetic values. The Designer knows how I outline sonic ideas. The Analyzer knows which timbres I chase and which I avoid. The Compiler knows when to let a shimmer through, and when to try again.
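Mechanically, that alignment is closer to a taste profile folded into each agent's prompt than to a setting in a menu. The fields and values below are invented for illustration, not the actual profile.

```python
# Illustrative taste profile injected into each agent's prompt; values are examples only.
AESTHETIC_PROFILE = {
    "chase": ["tape saturation", "detuned drones", "granular smears"],
    "avoid": ["clean supersaw leads", "quantized EDM builds"],
    "pacing": "slow evolutions, sudden ruptures",
}
```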

Authorship in the Age of Agents

Who made the sound?

That’s the question people keep asking. But it misses the point. The sounds Phonosyne creates aren’t composed in any traditional sense. They’re not played or programmed. They’re orchestrated through intent, system behavior, and a back-and-forth between me and a machine ensemble trained to speak my sonic language.

“The intentionality gap between human creators and AI-generated content forces a critical reevaluation of authorship itself.”
Harvard Law Review, Artificial Intelligence and the Creative Double Bind

Exactly. That gap is where authorship lives now.

This isn’t like AI image generation, where debates revolve around consent, appropriation, or stolen style. Sound, especially in experimental and electronic music, has always been a collage: samples, algorithms, found noise. Reuse is the baseline. What’s unusual isn’t that I use machine-generated samples. It’s that I use them intentionally, within a system I built to reflect my aesthetic.

Phonosyne’s outputs aren’t precious or sacred. They’re raw material, structured enough to feel like memory and flexible enough to break apart. What matters isn’t who technically generated them. It’s what I do with them.

And what I do is play.

Phonosyne doesn’t generate “songs.” It’s not trying to impress anyone with end-to-end genre emulation. It feeds my live rig: loopers, samplers, granular tools. That’s where meaning takes shape. In the way a warped radio fragment catches on tape heads. In the moment a failed synth glitch becomes the emotional center of a set. That’s not prompt engineering. That’s instrumental authorship.

Most generative music tools aren’t made for that. They’re built for clean outputs—one prompt, one product. But Phonosyne comes from a different lineage: algorithmic composition, procedural sound, interactive systems. It’s a spiritual cousin to Xenakis, Oval, Autechre, algorave. It’s about process, not product. Performance, not artifact.

So no, I didn’t write every waveform.

But I built the ensemble. I trained its behavior. Tuned it to my taste. Fed it my metaphors. Pushed it to fail in interesting ways. And from that, I shaped something playable, personal—mine.

That’s not just authorship. That’s agency.
That’s orientation.
That’s the whole fucking point.
