Bespoke Software in the Age of AI: Orchestrating Intelligence with Phonosyne
Phonosyne is my working system for sound design. Give it a text prompt, get back structured, validated .wav files. Not a plugin, not a product — a demonstration that software can be handcrafted around intent rather than built line-by-line for a market. Orchestrated through multi-agent AI. Tuned to a maker, not a demographic.

What Is Phonosyne?
Phonosyne is a collaboration. A group of AI agents working together to turn musical ideas into actual sound. A prompt like "distorted pirate radio broadcast intercepted mid-transmission" gets expanded, mapped, synthesized, and rendered into .wav files without manual coding.
Three dedicated agents handle the core work. The Designer expands the prompt into a structured plan. The Analyzer deepens that plan into synthesis recipes. The Compiler translates those recipes into audio using SuperCollider. Each agent is specialized and modular, coordinated through a central Orchestrator that handles sequencing, parallelization, and error recovery.
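For the shape of it, here's a minimal sketch of that pipeline. The agent internals are stubbed (in the real system, each is an LLM call, and the Compiler drives SuperCollider), and none of these names are Phonosyne's actual API; only the sequencing is the point.

```python
# A hypothetical sketch of the Designer -> Analyzer -> Compiler flow.
# Agent internals are stubbed; only the orchestration shape matters here.
from dataclasses import dataclass, field

@dataclass
class Plan:
    brief: str                                        # Designer: structured expansion
    recipes: list[str] = field(default_factory=list)  # Analyzer: synthesis recipes

def designer(prompt: str) -> Plan:
    # Stub for an LLM call that expands the prompt into a structured plan.
    return Plan(brief=f"plan for {prompt}")

def analyzer(plan: Plan) -> Plan:
    # Stub for an LLM call that deepens the plan into per-sound recipes.
    plan.recipes = [f"recipe {i} from {plan.brief}" for i in (1, 2)]
    return plan

def compiler(recipe: str) -> str:
    # Stub for the agent that emits SuperCollider code and renders audio.
    return recipe.replace(" ", "_") + ".wav"

def orchestrate(prompt: str) -> list[str]:
    """Sequence the agents; the real Orchestrator also parallelizes and recovers."""
    plan = analyzer(designer(prompt))
    return [compiler(r) for r in plan.recipes]

if __name__ == "__main__":
    print(orchestrate("distorted pirate radio broadcast intercepted mid-transmission"))
```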
But if the system does the work, who’s actually creating the sound?
Building With Behavior, Not Code
The Role of the Orchestrator
The harder part of building Phonosyne wasn’t generating sound — it was designing how the agents behave when things get vague, fail, or go off-script.
In a multi-agent system, you define the intent, not each step. You assign roles. You set the conditions for collaboration: agents that can negotiate, retry, escalate, and adapt. The code stays in dialogue with itself — shaped by the ongoing relationship between intent and execution.
“Developers are increasingly defining what they want the system to achieve, leaving the how to the emergent, collaborative intelligence of the orchestrated agent teams.”
Plaat et al., Agentic LLMs (arXiv, 2025)
That’s the Orchestrator role: shaping interactions between autonomous workers rather than writing lines of logic. Defining how agents perceive, plan, and respond when things go sideways.
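One way to picture that: the Orchestrator wraps each worker in a recovery policy rather than in task logic. A hedged sketch, with invented names and an invented escalation path:

```python
# A sketch of orchestration as policy, not task logic: what happens when a
# worker fails. The names and the escalation path are illustrative, not
# Phonosyne's actual code.
def with_recovery(task, *, retries=2, escalate=None):
    """Run task(); on failure, retry, then hand off to an escalate handler."""
    last_error = None
    for _attempt in range(retries + 1):
        try:
            return task()
        except RuntimeError as err:  # e.g. a malformed synthesis recipe
            last_error = err
    if escalate is not None:
        return escalate(last_error)  # e.g. ask the Analyzer to revise the recipe
    raise last_error
```

The point is the shape: the worker stays dumb about failure, and the relationships between workers carry the recovery.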
Co-Creation Through Debugging
Phonosyne didn’t come from a static design doc. It came from long loops of failure and refinement. I’d sketch a behavior — “retry rendering if the waveform is silent” — and the Compiler would misread it. I’d revise the prompt, adjust the retry logic, watch it run again. Over time, the system started handling my edge cases, and I got better at speaking clearly to a team that wasn’t human, but was learning.
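That silence check is simple enough to show. A minimal version, assuming numpy and soundfile are available; the threshold and the render callable are placeholders, not Phonosyne's actual values:

```python
# "Retry rendering if the waveform is silent," reduced to a sketch.
import numpy as np
import soundfile as sf

SILENCE_RMS = 1e-4  # below this RMS, treat the render as effectively silent

def is_silent(path: str) -> bool:
    audio, _sr = sf.read(path)
    return float(np.sqrt(np.mean(np.square(audio)))) < SILENCE_RMS

def render_with_retry(recipe: str, render, max_attempts: int = 3) -> str:
    """Call render(recipe) until it yields a non-silent .wav, up to max_attempts."""
    for _attempt in range(max_attempts):
        path = render(recipe)
        if not is_silent(path):
            return path
    raise RuntimeError(f"recipe still silent after {max_attempts} renders")
```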
What kind of authorship emerges when you debug through conversation?
Creative Systems, Not Creative Outputs
Multi-Agent Collaboration in Practice
Most models respond to prompts one at a time. Multi-agent architectures distribute that work — planners that sketch intent, analyzers that add detail, renderers that produce results, critics that refine and respond.
Projects like LVAS-Agent use this structure to break long-form video dubbing into storyboard segmentation, script synthesis, sound layering, and final mix. Audio-Agent pairs LLMs with diffusion engines to turn descriptions into editable audio, atomized by task. SonicRAG, the closest to Phonosyne, lets agents retrieve and blend samples from a library, mixing language and signal as modular inputs. These systems don't just generate; they collaborate.
Specialization Over Generalization
Phonosyne shares their DNA but doesn’t try to cover every use case. It doesn’t generalize. It was built for intimacy — for creative work where the system learns my aesthetic logic, adapts to my pacing, renders sound in a way that fits how I hear.
Designing for One
Most software is built to scale. Phonosyne was built to fit. There's no GUI, no knob trees or parameter menus. You describe the sound: sonically, metaphorically, spatially. The agents interpret that into action.
What makes it bespoke is the alignment. Each agent is prompted with my aesthetic values. The Designer knows how I outline sonic ideas. The Analyzer knows which timbres I chase and which I avoid. The Compiler knows when to let a shimmer through, and when to try again.
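Concretely, that alignment lives in the system prompts. These strings are invented stand-ins for the idea, not Phonosyne's actual prompts:

```python
# Each agent's system prompt carries aesthetic values, not just a job
# description. Invented examples of the pattern, not the real prompts.
AGENT_PROMPTS = {
    "Designer": (
        "Expand sound prompts into structured plans. Favor narrative framing: "
        "every sound implies a place, a distance, a degraded transmission."
    ),
    "Analyzer": (
        "Deepen plans into synthesis recipes. Chase tape saturation, detuned "
        "drones, granular smear; avoid clean FM bells and stock risers."
    ),
    "Compiler": (
        "Render recipes in SuperCollider. Keep accidental shimmer and noise "
        "floor artifacts; re-render only if the result is silent or clipped."
    ),
}
```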
Authorship in the Age of Agents
Who made the sound?
That's the question people keep asking. But it sidesteps something more interesting. The sounds Phonosyne creates aren't composed in any traditional sense: not played, not programmed. They're orchestrated through intent, system behavior, and a back-and-forth between me and a machine ensemble trained to speak my sonic language.
“The intentionality gap between human creators and AI-generated content forces a critical reevaluation of authorship itself.”
— Harvard Law Review, Artificial Intelligence and the Creative Double Bind
Exactly. That gap is where authorship lives now.
This isn’t like debates over AI image generation — consent, appropriation, stolen style. Sound, especially in experimental and electronic music, has always been a collage: samples, algorithms, found noise. Reuse is the baseline. What’s unusual isn’t that I use machine-generated samples. It’s that I use them intentionally, inside a system I built to reflect my aesthetic.
Phonosyne’s outputs aren’t precious or sacred. They’re raw material — structured enough to feel like memory and flexible enough to break apart. What matters isn’t who technically generated them. It’s what happens next.
And what happens next is: I play.
Phonosyne feeds my live rig: loopers, samplers, granular tools. That’s where meaning takes shape — in the way a warped radio fragment catches on tape heads, in the moment a failed synth glitch becomes the emotional center of a set. That’s not prompt engineering. That’s instrumental authorship.
Most generative music tools aren’t made for that. They’re built for clean outputs — one prompt, one product. Phonosyne comes from a different lineage: algorithmic composition, procedural sound, interactive systems. A spiritual cousin to Xenakis, Oval, Autechre, algorave. Process over product. Performance over artifact.
I didn’t write every waveform.
But I built the ensemble. Trained its behavior. Tuned it to my taste. Fed it my metaphors. Pushed it to fail in interesting ways. And from that, shaped something playable, personal — mine.
That’s authorship. That’s agency.
That’s the whole fucking point.
