MyStation
A browser-based conversational DJ that uses voice commands and multimodal synthesis to generate personalised, real-time audio Flows.
Loom Video
Project Description
MyStation Conversational DJ Agent is a voice-first AI agent that generates personalised audio “Flows” based on the user’s spoken request and optional context they choose to provide. The user can simply say: “Give me a morning brief with my saved articles,” or “Create a 25-minute deep work session with today’s calendar.”
The agent interprets the voice command, summarises the user-selected context, generates narration in natural ElevenLabs voices, composes background music with ElevenLabs Music, and streams a seamless audio experience back to the browser.
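As an illustration of the interpretation step, a request such as “Create a 25-minute deep work session with today’s calendar” might be reduced to a small structured intent before anything is generated. The sketch below assumes the LLM is asked to reply with JSON; the `FlowIntent` shape, its field names, the moment labels, and the default values are all hypothetical and not taken from the actual prototype.

```typescript
// Hypothetical intent shape; the field names and moment labels are assumptions.
type MomentType = "morning_brief" | "deep_work" | "other";

interface FlowIntent {
  moment: MomentType;
  durationMinutes: number;
  topics: string[];
  mood: string;
}

const MOMENTS: readonly MomentType[] = ["morning_brief", "deep_work", "other"];

// Turn untrusted LLM output (expected to be a JSON object) into a safe
// FlowIntent, falling back to defaults when a field is missing or malformed.
function normaliseIntent(llmJson: string): FlowIntent {
  let raw: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(llmJson);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      raw = parsed as Record<string, unknown>;
    }
  } catch {
    // The model returned something that is not JSON: keep the defaults.
  }

  const rawMoment = raw["moment"];
  const moment: MomentType =
    MOMENTS.indexOf(rawMoment as MomentType) !== -1 ? (rawMoment as MomentType) : "other";

  // Default to 10 minutes, then clamp to a 1..60 minute range (assumed limits).
  const rawMinutes = raw["durationMinutes"];
  const minutes = typeof rawMinutes === "number" && isFinite(rawMinutes) ? rawMinutes : 10;
  const durationMinutes = Math.min(60, Math.max(1, Math.round(minutes)));

  // Keep only string topics; anything else is dropped.
  const rawTopics = raw["topics"];
  const topics = Array.isArray(rawTopics)
    ? rawTopics.filter((t): t is string => typeof t === "string")
    : [];

  const rawMood = raw["mood"];
  const mood = typeof rawMood === "string" && rawMood.trim() !== "" ? rawMood.trim() : "neutral";

  return { moment, durationMinutes, topics, mood };
}
```

Validating the model's reply this way means a malformed response degrades to a generic ten-minute Flow rather than failing the whole request.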
Working Prototype
The prototype runs end-to-end:
The browser captures the user’s voice.
The agent parses intent (moment type, duration, topics, mood).
Optional user context is included — e.g., text from a reading list item, saved link, note, or calendar event.
The system generates a structured script using an LLM.
Narration is produced using ElevenLabs TTS (SSML-based, multi-voice support).
Background tracks are generated using ElevenLabs Music Generation.
Narration and music are mixed into a unified “Flow” and streamed to the browser.
This entire process can be shown in a live three-minute demo.
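The final mixing step above can be sketched in a few lines. This is a minimal illustration, assuming both narration and music have already been decoded to mono floating-point samples at the same sample rate; the function name, option names, and default values are hypothetical, and the prototype may well mix differently (for example with the Web Audio API or a server-side tool). The music is lowered ("ducked") while narration is audible, using a simple peak envelope so the gain does not flicker at every zero crossing of the voice.

```typescript
// Hypothetical options; the names and defaults are illustrative assumptions.
interface MixOptions {
  musicGain?: number;  // music level when nobody is speaking
  duckedGain?: number; // music level under narration
  threshold?: number;  // envelope level above which narration counts as audible
  release?: number;    // per-sample decay of the envelope (0..1)
}

// Mix mono narration over a mono music bed, ducking the music under speech.
// Inputs are assumed to be samples in [-1, 1] at the same sample rate.
function mixWithDucking(
  narration: Float32Array,
  music: Float32Array,
  options: MixOptions = {},
): Float32Array {
  const musicGain = options.musicGain ?? 0.5;
  const duckedGain = options.duckedGain ?? 0.15;
  const threshold = options.threshold ?? 0.02;
  const release = options.release ?? 0.9995;

  const length = Math.max(narration.length, music.length);
  const out = new Float32Array(length);
  let envelope = 0;

  for (let i = 0; i < length; i++) {
    // Past the end of either input, treat the missing sample as silence.
    const voice = narration[i] ?? 0;
    const bed = music[i] ?? 0;

    // Peak follower: jumps up with the voice, decays slowly afterwards.
    envelope = Math.max(Math.abs(voice), envelope * release);
    const gain = envelope > threshold ? duckedGain : musicGain;

    // Sum and hard-clip to the valid sample range.
    out[i] = Math.max(-1, Math.min(1, voice + bed * gain));
  }
  return out;
}
```

A production mixer would normally ramp the gain smoothly instead of switching between two levels, but the hard switch keeps the idea visible.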
Technical Complexity & Integration
The system coordinates:
Browser voice capture
LLM-driven intent parsing
Optional user-selected context ingestion (e.g., reading list snippets, meeting notes, calendar summaries, saved URLs)
Script generation + planning using an LLM
ElevenLabs TTS for narration (including SSML and multi-speaker support)
ElevenLabs Music Generation for mood-aligned ambient tracks
Audio mixing and client-side playback
Real-time progress communication to the UI
A lightweight orchestration layer sequences tasks, handles fallbacks, and returns a playable audio file.
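One way such an orchestration layer could be structured is shown below. It is a sketch under stated assumptions rather than the project's actual code: the `FlowServices` interface, the stage names, and the idea of falling back to a pre-made stock track when music generation fails are all hypothetical. The services are injected, so the sequencing logic makes no real LLM or ElevenLabs calls here.

```typescript
type Status = "started" | "done" | "fallback";
type Progress = (stage: string, status: Status) => void;

// Hypothetical service boundary; real implementations would wrap the LLM,
// the TTS and music APIs, and the mixer.
interface FlowServices {
  writeScript(request: string): Promise<string>;
  narrate(script: string): Promise<Uint8Array>;
  composeMusic(mood: string): Promise<Uint8Array>;
  stockMusic(mood: string): Promise<Uint8Array>; // assumed fallback source
  mix(voice: Uint8Array, music: Uint8Array): Promise<Uint8Array>;
}

// Run one stage, reporting progress; if the primary task throws and a backup
// is supplied, report the fallback and run the backup instead.
async function runStage<T>(
  stage: string,
  onProgress: Progress,
  primary: () => Promise<T>,
  backup?: () => Promise<T>,
): Promise<T> {
  onProgress(stage, "started");
  let result: T;
  try {
    result = await primary();
  } catch (err) {
    if (!backup) throw err;
    onProgress(stage, "fallback");
    result = await backup();
  }
  onProgress(stage, "done");
  return result;
}

// Sequence the stages and return the playable audio bytes.
async function buildFlow(
  request: string,
  mood: string,
  services: FlowServices,
  onProgress: Progress,
): Promise<Uint8Array> {
  const script = await runStage("script", onProgress, () => services.writeScript(request));
  const voice = await runStage("narration", onProgress, () => services.narrate(script));
  const music = await runStage(
    "music",
    onProgress,
    () => services.composeMusic(mood),
    () => services.stockMusic(mood),
  );
  return runStage("mix", onProgress, () => services.mix(voice, music));
}
```

The `onProgress` callback is where real-time updates to the UI would hook in; how those events reach the browser is left open here.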
Innovation & Creativity
Instead of answering questions like a chatbot, this project turns the agent into a personal radio producer. It transforms scattered inputs — such as saved articles, notes, or daily plans — into a continuous, ambient audio show.
It offers a new way to experience personal information: not through reading, scrolling, or checking apps, but through a hands-free, AI-generated radio format.
Real-World Impact
People accumulate newsletters, links, and notes they rarely have time to read. This agent turns that digital backlog into a simple listening experience:
“Turn my reading list into a 10-minute morning brief.”
or
“Use today’s calendar to shape my focus session.”
For knowledge workers, commuters, and busy creators, it reduces cognitive load by delivering a personalised audio session on demand.
Theme Alignment
This project strongly embodies the Conversational Agents theme:
Listens to voice
Understands natural-language intent
Uses LLM reasoning + tool use
Generates narration through ElevenLabs
Creates custom music using ElevenLabs Music
Produces a fully autonomous audio response
The entire workflow is agentic: listen → understand → plan → generate → speak back.
Technologies
TypeScript / Next.js for the frontend
Firebase for orchestration
LLM for intent parsing, summarisation, and planning
ElevenLabs TTS for narration
ElevenLabs Music Generation for ambient or energetic background
Browser Web APIs for speech input and streaming playback
Deployment on Vercel + Firebase Hosting
Prior Work
Before the hackathon, we had a few generic text templates that we had used to experiment with LLM content formatting (e.g., “news summary” and “music style” prompt examples). These were static text snippets, not connected to any system. We also had some default music templates that we had used with Suno and ElevenLabs.