The vocals make or break an AI-generated song. You can nail the genre, the instruments, the tempo — but if the vocal performance sounds flat, robotic, or generic, the whole track falls apart.
Most SUNO users treat vocals as an afterthought. They pick a genre and let the AI decide how to sing. That's leaving the most emotionally powerful element of your song to chance.
We didn't want to guess at vocals. So we mapped the entire vocal space systematically: 8 voice types, 44 tones, 31 techniques, and dozens of phonetic tricks — all tested and verified. This article covers the most impactful findings from that research.
The Vocal Control Map
SUNO processes vocal instructions from two places, and they do different things:
Style field → Sets the global vocal character. Voice type, overall tone, and general delivery style. Think of it as casting the singer.
Lyrics field → Controls moment-to-moment delivery. Bracket tags change how specific sections or lines are performed. Think of it as directing the singer between takes.
Understanding this split is essential. Putting `whispered` in the Style field makes the entire song whispered. Putting a `[whispered]` bracket tag before Verse 2 makes only that section whispered, while the rest of the song is sung normally.
The most powerful vocal arrangements use both: a global voice type in the Style field, with section-specific delivery instructions in the Lyrics field.
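As a rough illustration of this split, the sketch below models the two fields as plain strings. The `Section` class and `build_prompt` function are hypothetical helpers invented for this example, not part of any SUNO API; they only assemble text that you would paste into the Style and Lyrics fields yourself.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One lyric section with optional delivery tags (hypothetical helper)."""
    name: str                                           # e.g. "Verse 2"
    lines: list[str]
    delivery: list[str] = field(default_factory=list)   # e.g. ["whispered"]

    def render(self) -> str:
        out = [f"[{self.name}]"]
        if self.delivery:
            # Delivery tags share one bracket, comma-separated.
            out.append("[" + ", ".join(self.delivery) + "]")
        out.extend(self.lines)
        return "\n".join(out)


def build_prompt(style_tokens: list[str], sections: list[Section]) -> dict[str, str]:
    """Return the two text fields: global Style and per-section Lyrics."""
    return {
        "style": ", ".join(style_tokens),
        "lyrics": "\n\n".join(s.render() for s in sections),
    }


prompt = build_prompt(
    ["Neo-Soul", "warm", "smooth female vocals"],
    [
        Section("Verse 1", ["I found your letter in the drawer"]),
        Section("Verse 2", ["You know what's funny?"], delivery=["whispered"]),
    ],
)
print(prompt["style"])
print(prompt["lyrics"])
```

Here the voice type lives only in the style string, while `whispered` is attached to Verse 2 alone, mirroring the casting-versus-directing distinction.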
The 8 Voice Types
SUNO recognizes these core voice types as Style field tokens:
| Voice Type | What You Get | Best For |
|---|---|---|
| `Male vocalist` | Standard adult male voice | Rock, Pop, Hip-Hop |
| `Female vocalist` | Standard adult female voice | Pop, R&B, Country |
| `Deep male voice` | Low baritone/bass register | Blues, Soul, Spoken Word |
| `High female voice` | Soprano-range female | Opera, Power Ballads, Pop |
| `Androgynous vocals` | Gender-ambiguous voice | Indie, Art Pop, Electronic |
| `Children's choir` | Young group singing | Holiday music, Folk, Soundtrack |
| `Raspy vocals` | Gravelly, textured voice | Blues Rock, Grunge, Country |
| `Smooth vocals` | Clean, polished delivery | R&B, Jazz, Synthpop |
Pro tip: Voice type should be one of the last tags in your Style field, not one of the first. Genre and mood carry more weight and should anchor the prompt. Voice type is a modifier, not a foundation.
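One way to keep that ordering consistent is to assemble the Style field from named parts. This is a minimal sketch, assuming a hypothetical `order_style` helper; the slot order used here (genre, tone, extras, voice, BPM) is one reading of the tip, not a verified SUNO rule.

```python
def order_style(genre, tone, voice, extras=(), bpm=None):
    """Join style tokens so genre and tone lead and the voice type comes late.

    Hypothetical helper: the slot order is an assumption based on the tip above.
    """
    parts = [genre, tone, *extras, voice]
    if bpm is not None:
        parts.append(f"{bpm} BPM")
    # Skip empty slots so optional parts can be passed as "" or None.
    return ", ".join(p for p in parts if p)


style = order_style("Grunge", "raw", "raspy male vocals", extras=["distorted"], bpm=128)
print(style)
```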
Combining Voice Types
You can combine voice type with a modifier for more specificity:
Style: Neo-Soul, warm, smooth female vocals, intimate, 85 BPM
Style: Grunge, raw, raspy male vocals, distorted, 128 BPM
Style: Art Pop, airy, androgynous vocals, ethereal, 100 BPM
The 44 Tones: Emotional Color for Vocals
Tone descriptors modify the emotional quality of the vocal delivery. We verified 44 tones that SUNO consistently responds to. Here are the most impactful ones, organized by emotional territory:
Warm/Positive:
warm, tender, joyful, uplifting, bright, hopeful, playful, sweet, soulful, passionate
Dark/Intense:
dark, haunting, menacing, brooding, aggressive, fierce, ominous, gritty, raw, tortured
Sad/Reflective:
melancholic, wistful, nostalgic, bittersweet, longing, somber, mournful, vulnerable, fragile, aching
Cool/Detached:
ethereal, dreamy, airy, distant, spacious, hypnotic, trance-like, meditative
Energetic:
anthemic, energetic, triumphant, defiant, euphoric, empowering, rebellious
The one-tone rule: Default to one tone descriptor per prompt. Two tones can complement each other (warm + nostalgic), but three or more start to dilute the signal. When in doubt, pick the one word that best captures the emotional center of your song.
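The rule is easy to check mechanically before you paste a prompt. The sketch below counts tone words in a Style string using the tone lists from this section. `tones_in_style` and `tone_verdict` are hypothetical names, and the word matching is deliberately naive: a genre such as "Dark Wave" would be counted as the tone "dark".

```python
# Tone words taken from the lists above; extend or trim as needed.
TONES = {
    "warm", "tender", "joyful", "uplifting", "bright", "hopeful", "playful",
    "sweet", "soulful", "passionate", "dark", "haunting", "menacing",
    "brooding", "aggressive", "fierce", "ominous", "gritty", "raw", "tortured",
    "melancholic", "wistful", "nostalgic", "bittersweet", "longing", "somber",
    "mournful", "vulnerable", "fragile", "aching", "ethereal", "dreamy",
    "airy", "distant", "spacious", "hypnotic", "trance-like", "meditative",
    "anthemic", "energetic", "triumphant", "defiant", "euphoric",
    "empowering", "rebellious",
}


def tones_in_style(style: str) -> list[str]:
    """Return the tone words found in a comma-separated Style string."""
    found = []
    for token in style.split(","):
        for word in token.strip().lower().split():
            if word in TONES:
                found.append(word)
    return found


def tone_verdict(style: str) -> str:
    """Apply the one-tone rule: one is ideal, two is acceptable, more dilutes."""
    tones = tones_in_style(style)
    if len(tones) <= 1:
        return "ok"
    if len(tones) == 2:
        return "ok: two tones, check that they complement each other"
    return "warning: " + ", ".join(tones) + " may dilute the signal"


print(tone_verdict("Art Pop, airy, androgynous vocals, ethereal, 100 BPM"))
```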
The 31 Techniques: Performance Instructions
Technique tags are bracket instructions that control specific vocal behaviors. These go in the Lyrics field and affect individual sections.
The Essential 10
These are the 10 most reliable vocal technique tags, ranked by consistency:
- `[whispered]` — Soft, breathy, intimate delivery. One of the most reliable tags in SUNO.
- `[belted]` — Full-power, chest-voice singing. Great for choruses and climactic moments.
- `[spoken word]` — Switches from singing to speech. SUNO maintains the backing music.
- `[rap]` — Rhythmic spoken delivery. Works even in non-hip-hop genres for interesting contrasts.
- `[falsetto]` — High, airy vocal register. Works better with male voice types.
- `[harmonized]` — Adds vocal harmony layers. SUNO generates actual complementary notes.
- `[a cappella]` — Vocals only, no instruments. Extremely reliable.
- `[humming]` — Closed-mouth melody. Beautiful for intros and transitions.
- `[growling]` — Guttural vocal technique. Perfect for metal and hardcore genres.
- `[chanting]` — Repetitive, rhythmic group vocal. Great for ritual or tribal feels.
Section-Specific Technique Stacking
The real power of technique tags comes from varying them across sections to create emotional arcs:
[Intro]
[humming]
Mmm, mmm, mmm...
[Verse 1]
[whispered]
I found your letter in the drawer
The ink was fading, words unsure
You wrote goodbye like it was nothing
But the paper's creased from all my reading
[Pre-Chorus]
[building, emotional]
And I can feel it rising in my chest
[Chorus]
[belted, powerful]
Don't tell me that it's over!
Don't say we're done!
I'll carry this forever
Under the weight of the sun!
[Verse 2]
[spoken word]
You know what's funny?
I still set two places at the table.
Force of habit, I guess.
Or maybe hope.
[Bridge]
[falsetto, vulnerable]
Ooh... if you could see me now...
[Final Chorus]
[belted, harmonized]
Don't tell me that it's over!
Don't say we're done!
Notice how the vocal delivery progresses: humming > whispered > building > belted > spoken word > falsetto > belted with harmonies. This arc takes the listener on an emotional journey, and it's all controlled through bracket tags.
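To review an arc like this at a glance, you can pull the section and delivery tags back out of a lyrics draft. The following is a minimal sketch, assuming the layout used in the example above (a section tag on one line, an optional delivery tag on the next); `delivery_arc` is a hypothetical helper and the list of section prefixes is an assumption.

```python
import re

# A line that consists of exactly one bracket tag, e.g. "[Verse 1]".
TAG_LINE = re.compile(r"\[([^\[\]]+)\]")
# Assumed section-name prefixes; anything else in brackets counts as delivery.
SECTION_PREFIXES = ("intro", "verse", "pre-chorus", "chorus", "bridge", "final", "outro")


def delivery_arc(lyrics: str) -> list[tuple[str, str]]:
    """Return (section, delivery) pairs in the order they appear."""
    arc = []
    for line in lyrics.splitlines():
        match = TAG_LINE.fullmatch(line.strip())
        if not match:
            continue  # ordinary lyric line
        tag = match.group(1).strip()
        if tag.lower().startswith(SECTION_PREFIXES):
            arc.append([tag, "(untagged)"])
        elif arc:
            arc[-1][1] = tag  # delivery tag for the current section
    return [tuple(item) for item in arc]


draft = """[Intro]
[humming]
Mmm, mmm, mmm...
[Verse 1]
[whispered]
I found your letter in the drawer
[Chorus]
[belted, powerful]
Don't tell me that it's over!"""

for section, delivery in delivery_arc(draft):
    print(f"{section}: {delivery}")
```

Sections without a delivery tag show up as `(untagged)`, which makes flat stretches in the arc easy to spot.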
Duets: Two Voices in One Song
Creating duets in SUNO requires combining tags in both the Style field and the Lyrics field:
Style field:
Style: Pop Ballad, duet, male and female vocals, emotional, 72 BPM
Lyrics field:
[Verse 1 - Male Singer]
I've been standing at your door
Wondering if you're still inside
[Verse 2 - Female Singer]
I've been hiding in the dark
Wondering if you'd ever find me
[Chorus - Both]
But here we are, beneath the stars
Two broken pieces fitting perfectly
Key requirements for duets:
- The word "duet" must appear in the Style field; this is the trigger
- Use `[Male Singer]` and `[Female Singer]` bracket tags in the Lyrics field
- Mark shared sections with `[Both]` or `[Together]`
- Keep verse structure balanced, with a similar length for each singer
Duets don't work perfectly every time — SUNO sometimes defaults to a single voice despite the tags. Success rate improves significantly when you include "male and female vocals" in the Style field alongside "duet."
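The duet requirements above can be turned into a quick pre-flight check. This is only a sketch: `check_duet` is a hypothetical helper, and the balance tolerance (parts may differ by at most one line) is an assumption, not a tested threshold.

```python
def check_duet(style: str, lyrics: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    style_lower = style.lower()
    if "duet" not in style_lower:
        problems.append('Style field is missing "duet"')
    if "male and female vocals" not in style_lower:
        problems.append('Style field is missing "male and female vocals"')

    line_counts = {"Male Singer": 0, "Female Singer": 0}
    current = None
    for raw in lyrics.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            tag = line.lower()
            # Test the female tag first: "male singer" is a substring of it.
            if "female singer" in tag:
                current = "Female Singer"
            elif "male singer" in tag:
                current = "Male Singer"
            else:
                current = None  # shared or untagged section, e.g. [Chorus - Both]
        elif line and current:
            line_counts[current] += 1

    for singer, count in line_counts.items():
        if count == 0:
            problems.append(f"No lines tagged for [{singer}]")
    # Assumed tolerance: the two parts may differ by at most one line.
    if abs(line_counts["Male Singer"] - line_counts["Female Singer"]) > 1:
        problems.append("Singer parts are unbalanced")
    return problems


duet_lyrics = """[Verse 1 - Male Singer]
I've been standing at your door
Wondering if you're still inside
[Verse 2 - Female Singer]
I've been hiding in the dark
Wondering if you'd ever find me
[Chorus - Both]
But here we are, beneath the stars"""

print(check_duet("Pop Ballad, duet, male and female vocals, emotional, 72 BPM", duet_lyrics))
```

Passing the checks does not guarantee two voices in the output; it only confirms the prompt contains the cues described here.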
Crowd Singing: The Stadium Effect
One of the most emotionally impactful vocal effects is crowd singing — the sound of a massive audience singing along. Here's how to trigger it:
Style field:
Style: Anthemic Rock, live recording at a concert, crowd singing, powerful, 128 BPM
Lyrics field:
[Chorus]
[crowd sings]
We are the ones who never quit!
We are the fire that never dies!
Oh-oh, oh-oh-oh!
Oh-oh, oh-oh-oh!
The keys to crowd singing:
- "Live recording at a concert" in the Style field sets the spatial environment: room ambiance, audience noise, stadium reverb
- `[crowd sings]` in the Lyrics field triggers the crowd vocal effect in that specific section
- Simple, repetitive lyrics work best for crowd sections, since real crowds sing simple melodies
- "Oh-oh" patterns strongly reinforce the crowd singing effect by giving the AI a clear, simple melodic pattern for group vocals
This technique is remarkably effective for anthemic rock, pop, and sports-style music. The crowd effect adds a sense of scale and emotion that no other vocal technique can match.
Phonetic Tricks: Writing for the Voice, Not the Page
These tricks exploit how SUNO turns written text into a vocal performance. They work by manipulating the spelling and punctuation of the lyrics themselves rather than using descriptive tags.
Stuttering
I-I-I can't believe you're gone
W-w-wait, don't leave me here
Repeating the first sound of a word with hyphens creates a natural-sounding stutter. Effective for emotional vulnerability or nervousness. Keep it to 2-3 repetitions maximum.
Stretching
I'm fa-a-a-alling for you
So-o-o-o deep in my mind
Extending vowels with hyphens creates sustained, melismatic phrases. The singer holds the note longer, creating emphasis and emotional weight.
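Both the stutter and the stretch are simple string transformations, so they can be generated rather than typed by hand. The helpers below are a minimal sketch with hypothetical names (`stutter`, `stretch`). Capping `repeats` at 3 is one reading of the "2-3 repetitions" guidance, and stretching only the first vowel is a simplification.

```python
import re


def stutter(word: str, repeats: int = 2) -> str:
    """Prefix a word with hyphenated repeats of its first letter."""
    if not word:
        raise ValueError("word must not be empty")
    if not 1 <= repeats <= 3:
        raise ValueError("repeats should stay between 1 and 3")
    return "-".join([word[0]] * repeats + [word])


def stretch(word: str, extra: int = 3) -> str:
    """Repeat the first vowel of a word, hyphen-separated, to sustain it."""
    match = re.search(r"[aeiouAEIOU]", word)
    if match is None:
        return word  # nothing to stretch
    i = match.start()
    vowel = word[i].lower()
    return word[: i + 1] + f"-{vowel}" * extra + word[i + 1:]


print(stutter("I"), "can't believe you're gone")   # I-I-I can't believe you're gone
print("I'm", stretch("falling"), "for you")        # I'm fa-a-a-alling for you
```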
Sobbing Effect
[whispered, emotional]
And I... I just... I can't anymore
*sniff*
Why did you... have to go?
The combination of ellipses (...), fragmented phrases, and the whispered tag creates a realistic sobbing/crying vocal effect. The broken sentence structure makes the AI pause and hesitate naturally.
Laughing
Ha ha ha, you thought I'd cry?
He he, no no no
I'm laughing at the mess we made, ha!
Written laughter (ha ha, he he) in the lyrics triggers actual laughing-while-singing delivery. Works best in upbeat or ironic contexts.
Screaming
[belted, aggressive]
I WON'T BE SILENT ANYMORE!
NO! NO! NO!
ALL CAPS text combined with an aggressive delivery tag (`[belted]` or `[screaming]`) triggers a shouted/screamed vocal. Exclamation marks and short, punchy phrases reinforce the effect.
The Whisper-to-Scream Progression
[whispered]
They told me to be quiet
(softly)
They told me to sit down
They told me to behave
[belted, powerful]
BUT I WON'T! I WON'T! I WON'T BACK DOWN!
This is one of the most dramatic effects you can create in SUNO: the contrast between whispered delivery and a sudden belted explosion. The dynamic shift creates genuine emotional impact.
The Whisper Effect: Short Lines Create Intimacy
Here's a subtle trick most people miss: line length affects delivery intensity.
Short lines of one or two words naturally produce a more intimate, whispered-quality delivery, even without a `[whispered]` tag:
[Verse]
Come closer.
Hold tight.
Stay here.
Tonight.
Compare with a normal-length verse:
[Verse]
Come a little closer and hold me tight tonight
Stay right here beside me until the morning light
The first version sounds more intimate and controlled. The second sounds more like normal singing. Short lines give the AI less syllabic material per phrase, naturally producing a more restrained, intimate delivery.
Combine this with an actual whisper tag for maximum effect:
[Verse]
[whispered]
Just us.
Right here.
No words.
Just fear.
This produces an extremely quiet, breathy, ASMR-like vocal that's perfect for intimate ballads, ambient tracks, or dramatic spoken-word sections.
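Since the effect depends on line length, it can help to measure a verse before generating. The sketch below counts words per lyric line and flags sections short enough to lean intimate. `words_per_line` and `looks_intimate` are hypothetical helpers, and the two-word cutoff is taken from the examples here rather than from any measured threshold.

```python
def words_per_line(section: str) -> list[int]:
    """Count words on each lyric line, skipping blank lines and bracket tags."""
    counts = []
    for raw in section.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue
        counts.append(len(line.split()))
    return counts


def looks_intimate(section: str, max_words: int = 2) -> bool:
    """True when every lyric line has at most `max_words` words (assumed cutoff)."""
    counts = words_per_line(section)
    return bool(counts) and max(counts) <= max_words


short_verse = "[Verse]\nCome closer.\nHold tight.\nStay here.\nTonight."
long_verse = (
    "[Verse]\n"
    "Come a little closer and hold me tight tonight\n"
    "Stay right here beside me until the morning light"
)

print(words_per_line(short_verse), looks_intimate(short_verse))
print(words_per_line(long_verse), looks_intimate(long_verse))
```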
Emotional Delivery Per Section: The Blueprint
Here's a complete section-by-section vocal blueprint for a power ballad, using everything we've covered:
| Section | Voice Tag | Tone | Technique | Lyrics Style |
|---|---|---|---|---|
| Intro | `[humming]` | — | Humming | No words, melodic "mmm" |
| Verse 1 | `[soft, intimate]` | Vulnerable | Natural phrasing | Medium-length lines |
| Pre-Chorus | `[building]` | Hopeful | Increasing intensity | Slightly longer lines |
| Chorus 1 | `[belted]` | Passionate | Full power | Short, punchy phrases |
| Verse 2 | `[spoken word]` | Reflective | Speaking | Conversational rhythm |
| Pre-Chorus 2 | `[emotional, building]` | Desperate | Cracking voice | Fragmented lines |
| Chorus 2 | `[belted, harmonized]` | Triumphant | Harmony layers | Same lyrics + ad-libs |
| Bridge | `[falsetto, fragile]` | Vulnerable | High register | Very short lines |
| Final Chorus | `[belted, crowd sings]` | Euphoric | Full production | Simplified hook |
This blueprint creates a complete emotional arc using only vocal technique tags. No changes to the Style field needed — all the variation comes from the Lyrics field brackets.
The Full Vocal Control Panel
We've covered the highlights, but the complete picture is larger: 8 voice types, 44 verified tones, 31 techniques, and dozens of phonetic formatting tricks. Together they form a vocal control system with hundreds of possible combinations.
The challenge isn't knowing that these options exist — it's knowing which combinations work for your specific genre, mood, and song structure. A "whispered, falsetto" combination works beautifully in ambient but fails in metal. "Belted, aggressive" is perfect for punk but ruins a jazz ballad.
AceTagGen's Questionnaire handles this matching automatically. Select your genre and mood, and the vocal options adapt — showing only the voice types, tones, and techniques that work for your specific combination. One-click tags handle the formatting so you never have to remember bracket syntax.
Stop leaving vocals to chance. Start directing every note — try the Questionnaire now.