SUNO Vocal Performance: Ad-libs, Screams, Stutters Guide

SUNO is commonly described as a "text-to-song" model, but that framing misses what actually makes it powerful. The vocal model isn't just singing your lyrics — it's performing them. If you hand it a flat line, you get a flat performance. If you direct it with performance cues, stutters, ad-libs, emotional notes, and phonetic acting, you get a song that sounds like a real singer made interpretive choices.

The catch is that almost none of this is documented in SUNO's official UI. The entire performance layer lives in side-channel syntax — parentheses, curly braces, hyphens, ALL CAPS phonetic spellings, and bracket tags — that you have to know about to use.

This article is the complete field manual. Every technique is community-verified across NotebookLM extractions, YouTube tutorials, and real production testing.

The Three Bracket Types: What Each One Actually Does

Before any specific technique, you need to understand the syntax system. SUNO uses three different bracket types in the Lyrics field, and each one has a completely different purpose:

Syntax	Purpose	Gets Sung?
`[Square brackets]`	Director instructions, section tags	No — invisible to singer
`(Parentheses)`	Performed content: ad-libs, delivery cues	Yes — sung as ad-lib or performed
`{Curly braces}`	Sound effects and backup vocals	No — triggers sound effect only

Mix these up and your song breaks. A short delivery cue like (whispered) works as intended: the vocal whispers. A long instruction like (building intensity over 8 bars and then landing on a final high note) does not, because SUNO sings that entire phrase as lyrics.

Rule of thumb: parentheses are for things that should be performed. Keep them short — 1 to 5 words. Longer instructions go in square brackets.
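The rule of thumb is mechanical enough to check in code. The Python sketch below is a hypothetical helper (wrap_cue is not part of SUNO or any SUNO API): it wraps a cue in parentheses when it has five words or fewer and in square brackets otherwise. The 1-to-5-word threshold from the rule above is its only assumption.

```python
# Threshold taken from the 1-to-5-word rule of thumb (an assumption, not a SUNO limit).
MAX_PAREN_WORDS = 5


def wrap_cue(cue: str) -> str:
    """Wrap a delivery cue for the Lyrics field (hypothetical helper).

    Short cues go in parentheses so they are performed; longer ones go in
    square brackets so they are treated as director instructions instead
    of being sung as lyrics.
    """
    text = cue.strip()
    if not text:
        raise ValueError("cue must not be empty")
    if len(text.split()) <= MAX_PAREN_WORDS:
        return f"({text})"
    return f"[{text}]"


print(wrap_cue("whispered"))  # (whispered)
print(wrap_cue("building intensity over 8 bars and then landing on a final high note"))
```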


Ad-libs and Layered Harmonies

Ad-libs are SUNO's bread and butter for professional-sounding vocals. You can place them inline or on their own line:

Missing you every single day (every day)
Waiting for the sun to rise
(whoa oh oh)
Nothing ever feels the same

The content in parentheses is sung as a background layer — a harmony, an echo, or an ad-lib. Short syllables like (oh), (yeah), (uh) work especially well because they resemble natural vocal improvisation.

Warning: this is "hit or miss." SUNO sometimes sings the parenthesized text as a regular line instead of an ad-lib. If it does, try moving the ad-lib to its own line below, or use shorter content. One-word or two-word ad-libs are the most reliable.

Curly Braces for Sound Effects

If you want sound effects — vinyl scratching, crowd noise, explosions, records spinning — use curly braces:

{record scratching}
{backup singers}

The curly-brace syntax triggers sound-effect generation that is not sung. The record-scratch cue is the most dependable of these: it "works almost every time. You can have it actually scratch a record like a DJ would."

Use curly braces for effects that shouldn't be part of the vocal line but should exist in the audio — backing vocals, crowd sounds, environmental atmosphere, DJ scratches.

Stuttering: The Hyphen Syntax

SUNO performs natural-sounding stutters if you write them phonetically in the lyric line with hyphens between the repeated syllables:

I-I-I never thought you'd leave

The verified example from community research is "I-I-I love you" with a trap/EDM style: SUNO performed a real vocal stutter rather than reading the hyphens as a typo. The technique is especially effective in Trap, EDM, Phonk, and modern pop, where vocal stuttering is a stylistic choice.

The number of repetitions matters. I-I-I is noticeable but natural; I-I-I-I-I becomes over-the-top. Two or three repetitions of the syllable (I-I or I-I-I) is the sweet spot.
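If you generate lyrics programmatically, the hyphen syntax is easy to produce. This is a minimal Python sketch with hypothetical helper names (stutter, stutter_first_word). It repeats the whole word, as in the I-I-I examples, and warns when the count goes past three; for longer words you may prefer to repeat only the first syllable, which this sketch does not attempt.

```python
import warnings


def stutter(word: str, repeats: int = 3) -> str:
    """Join `repeats` copies of a word with hyphens: stutter("I") -> "I-I-I"."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if repeats > 3:
        # Past three repetitions the stutter tends to sound over-the-top.
        warnings.warn(f"{repeats} repetitions may sound over-the-top; 2-3 is the sweet spot")
    return "-".join([word] * repeats)


def stutter_first_word(line: str, repeats: int = 3) -> str:
    """Stutter only the first word of a lyric line, leaving the rest untouched."""
    first, sep, rest = line.partition(" ")
    return stutter(first, repeats) + sep + rest


print(stutter_first_word("I never thought you'd leave"))  # I-I-I never thought you'd leave
```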

Vowel Stretching (Melisma)

You can force SUNO to stretch a single syllable over multiple notes — the technique known musically as melisma — by placing hyphens inside a word:

I lo-o-o-ve you forever

Or after a word to stretch it across a long note:

I love--- you forever

The hyphens signal to SUNO that the vocalist should sustain the vowel across multiple melodic notes, much as melismatic runs in R&B, Gospel, and Soul are often transcribed in lyric sheets. Use it sparingly: one or two melisma moments per verse is impactful, while stretching every word becomes exhausting to listen to.
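Both hyphen forms can be generated the same way. In this hypothetical Python sketch, melisma inserts extra copies of a word's first vowel (love becomes lo-o-o-ve) and sustain appends trailing hyphens (love becomes love---). Stretching the first vowel is an assumption that fits single-syllable words like love; for longer words you would choose the syllable yourself.

```python
VOWELS = "aeiouAEIOU"


def melisma(word: str, extra: int = 2) -> str:
    """Stretch the first vowel over extra notes: melisma("love") -> "lo-o-o-ve"."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            head = word[: i + 1]  # everything up to and including the vowel
            tail = word[i + 1 :]  # the rest of the word
            stretched = head + f"-{ch}" * extra
            return f"{stretched}-{tail}" if tail else stretched
    return word  # no vowel found: leave the word unchanged


def sustain(word: str, dashes: int = 3) -> str:
    """Mark a word to be held across a long note: sustain("love") -> "love---"."""
    return word + "-" * dashes


line = "I love you forever"
print(line.replace("love", melisma("love"), 1))  # I lo-o-o-ve you forever
print(line.replace("love", sustain("love"), 1))  # I love--- you forever
```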

Crying, Sobbing, and Choked-Up Delivery

For emotional vulnerability, use ellipses inside the lyric line combined with a bracketed delivery cue:

[sobbing voice, choked up]
I... I miss you so much

The ellipses prompt SUNO to perform audible breaths and hesitation around the repeated word. The bracketed [sobbing voice, choked up] tag primes the vocal delivery to include cracks, catches, and breath-heavy phrasing. Combined, these two techniques produce one of SUNO's most surprisingly convincing emotional effects.

Key rule: the bracketed cue must be on its own line immediately before the lyric it affects. SUNO has a short attention span for bracket instructions, so proximity matters.

Laughing

To get SUNO to laugh mid-song, you need both phonetic acting and a tag:

haha [laugh]
Running on the beach with you

The haha is the phonetic spelling of the laugh sound, which SUNO performs as an actual laugh. The [laugh] tag signals to the model that this is an intentional laugh, not a lyric word. Without the phonetic haha, SUNO often ignores the [laugh] tag, so both are required.

Related variations: heh heh [chuckle], hahaha [giggle], haaaa [laugh-sigh]. Write the sound phonetically, label it with a bracketed tag.

Screaming

Screams need three elements: ALL CAPS, phonetic spelling, and a bracketed tag.

We don't need a reason
AAAA-AAAHHH [scream]

The community-verified approach: spell out the scream phonetically (AAAA-AAAHHH), use all caps for volume emphasis, and add the [scream] tag on the same line or immediately after. SUNO performs the scream as a real vocal scream, not a sung-word approximation.

This is most effective in Rock, Metal, Punk, and high-energy EDM drops. Placement matters: a scream is best at the end of a chorus, the peak of a bridge, or the climax of a buildup — not randomly mid-verse.

Crowd Vocals

One of the most cinematic techniques: make the lyrics sound like a live crowd is singing along:

[crowd sings]
Oh oh oh we are the ones

For a cheering crowd effect instead of singing:

[crowd yells]

Required combo: you also need "Live recording at a concert" in your Style field. Without that anchor, [crowd sings] is sometimes ignored because SUNO doesn't think there's a crowd in the scene. With it, SUNO generates a convincing crowd-singing-along effect, perfect for anthemic choruses, football songs, or live-feel recordings.

Per-Section Emotions

You can assign a different emotional character to each section of the song using bracketed emotion tags:

[Verse 1: sad, melancholic, slow]
Empty teacup on the windowsill
Thinking about the way you used to be

[Chorus: defiant, powerful, anthemic]
But I won't stay here in the dark
I'm coming back into the light

This technique creates a clear emotional arc across the song. Verses can be sad and slow; choruses can be defiant and powerful; bridges can drop into introspection before the final explosive chorus. This is how real songs work, and SUNO handles per-section emotion surprisingly well.

Alternative syntax: put each emotion cue on its own tag line before the section tag:

[melancholic]
[slow]
[Verse 1]
Empty teacup...

The inline colon syntax ([Verse 1: sad, melancholic]) is slightly more reliable in V5; the separate-line syntax is slightly more reliable in V4.5. Both work.
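The two syntaxes differ only in layout, so a small formatter can emit either one. This Python sketch is hypothetical (section_header is not a SUNO feature); its inline flag simply chooses between the colon form and the separate-line form shown above, and it makes no claim about which one a given SUNO version prefers.

```python
def section_header(section: str, emotions: list[str], inline: bool = True) -> str:
    """Build a section tag carrying per-section emotion cues.

    inline=True  -> "[Verse 1: sad, melancholic, slow]"
    inline=False -> one tag line per emotion, followed by the section tag
    """
    if inline:
        if not emotions:
            return f"[{section}]"
        return f"[{section}: {', '.join(emotions)}]"
    lines = [f"[{emotion}]" for emotion in emotions]
    lines.append(f"[{section}]")
    return "\n".join(lines)


print(section_header("Verse 1", ["sad", "melancholic", "slow"]))
print(section_header("Verse 1", ["melancholic", "slow"], inline=False))
```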

The Backup Singers Technique

To add explicit backup vocal layers, combine a curly-brace trigger with the backup content in parentheses:

[Chorus]
{backup singers}
Main lyric line that leads the chorus
(yeah yeah yeah)

The {backup singers} triggers the backing-vocal layer generation. The parenthesized (yeah yeah yeah) becomes the actual backup vocal content. Use this in choruses, bridges, and outros where you want full-arrangement vocals.

Whispered Delivery

For whispered or ASMR-style vocals:

[whispered vocals]
(softly)
Just between you and me

The [whispered vocals] tag on its own line, combined with (softly) as a delivery cue, produces a convincing whispered performance. An additional trick: short lines (2-3 words) get whispered more reliably than long ones. Long lyric lines tend to drift back into singing regardless of the tag, apparently because the vocal model can't sustain whisper dynamics across a full phrase.

The Male/Female Singer Swap

For duets or mid-song vocal changes:

[female singer]
Morning breaks over the skyline

[male singer]
Shadows fade into the light

[duet]
Together we watch the day arrive

Required combo: include "duet" in your Style field. Without it, the singer swap is hit-or-miss; SUNO sometimes treats the tags as stylistic hints rather than actual voice changes. With the style anchor, it becomes much more reliable.

Full Example: An Emotionally Directed Verse

Here's a verse that stacks multiple performance techniques — each one doing a specific job:

[Verse 1: intimate, confessional]
[whispered vocals]

I... I thought you'd stay (stay)
Waiting up until midnight
I-I-I never saw the sign

[Pre-Chorus: building intensity]
[Crescendo]
But now I know (now I know)
Everything changed (changed)

[Chorus: defiant, anthemic]
AAAH [scream]
I'm running through the flames
haha [laugh]
Never looking back again
(never again)

Read the layers: an intimate whispered verse with broken delivery on I... I, an ad-lib echo with (stay), and a stutter for vulnerability on I-I-I; a pre-chorus crescendo into the anthemic chorus; and a chorus that opens with a vocal scream on AAAH, drops into defiant laughter with haha [laugh], and closes with an ad-lib echo with (never again). Every bracket is doing work that a professional vocalist would do instinctively, but SUNO needs you to spell it out.

What Not to Do

A few common mistakes to avoid:

Long parenthesized instructions: SUNO sings them. Keep parentheses to 1-5 words.
Too many bracket tags per section: SUNO tends to ignore everything after the first two to four, so limit each section to about two and pick the most important ones.
Performance cues without a style anchor: [duet] without "duet" in the Style field, or [crowd sings] without "Live recording at a concert" in the Style field. Always anchor the global style.
Conflicting delivery cues in the same section: (whispered) and (belted) together. Pick one.
Artist names: "sing like Beyoncé" triggers rejection filters. Describe the vocal style instead: "powerful female vocals, vocal runs, R&B phrasing."
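Most of these mistakes can be caught before you paste lyrics into SUNO. The Python sketch below is a hypothetical pre-flight linter, not a SUNO tool. Its assumptions are labeled in the code: a section starts at a tag such as [Verse 1] or [Chorus: ...], the tag limit counts only tags after the section tag, and the style-anchor table covers just the pairings this article names. It does not check for artist names, which would need a list of names to match against.

```python
import re

SQUARE = re.compile(r"\[([^\]]+)\]")  # [director tags]
PAREN = re.compile(r"\(([^)]+)\)")    # (performed cues)

# Assumption: these words mark the start of a new song section.
SECTION_WORDS = ("intro", "verse", "pre-chorus", "chorus", "bridge", "outro")
MAX_PAREN_WORDS = 5
MAX_TAGS_PER_SECTION = 2  # assumption: counts tags after the section tag itself
# Tags that need a matching phrase in the Style field (pairings from this article).
STYLE_ANCHORS = {
    "crowd sings": "live recording at a concert",
    "crowd yells": "live recording at a concert",
    "duet": "duet",
    "male singer": "duet",
    "female singer": "duet",
}
CONFLICTS = [("whispered", "belted")]


def is_section_tag(tag: str) -> bool:
    """True for tags like "Verse 1" or "Chorus: defiant, anthemic"."""
    name = tag.split(":", 1)[0].strip().lower()
    return name.startswith(SECTION_WORDS)


def split_sections(lyrics: str) -> list[tuple[str, list[str]]]:
    """Group lyric lines under the most recent section tag."""
    sections: list[tuple[str, list[str]]] = [("(before first section)", [])]
    for line in lyrics.splitlines():
        tags = SQUARE.findall(line)
        if tags and is_section_tag(tags[0]):
            sections.append((tags[0], []))
        else:
            sections[-1][1].append(line)
    return sections


def lint(lyrics: str, style: str) -> list[str]:
    """Return human-readable warnings for the common mistakes listed above."""
    problems: list[str] = []
    style_lower = style.lower()
    for header, body in split_sections(lyrics):
        tags = [t.strip().lower() for line in body for t in SQUARE.findall(line)]
        cues = [c.strip().lower() for line in body for c in PAREN.findall(line)]
        if len(tags) > MAX_TAGS_PER_SECTION:
            problems.append(f"{header}: {len(tags)} bracket tags after the section tag; some may be ignored")
        for cue in cues:
            if len(cue.split()) > MAX_PAREN_WORDS:
                problems.append(f"{header}: ({cue}) is long enough to be sung as lyrics")
        delivery = tags + cues
        for a, b in CONFLICTS:
            if any(a in d for d in delivery) and any(b in d for d in delivery):
                problems.append(f"{header}: conflicting delivery cues '{a}' and '{b}'")
        for tag in tags:
            anchor = STYLE_ANCHORS.get(tag)
            if anchor and anchor not in style_lower:
                problems.append(f"[{tag}] needs \"{anchor}\" in the Style field")
    return problems


lyrics = "[Chorus]\n[crowd sings]\nOh oh oh we are the ones"
print(lint(lyrics, "anthemic rock"))
print(lint(lyrics, "anthemic rock, live recording at a concert"))  # []
```

Matching is deliberately loose (case-insensitive substring checks), so treat its output as a checklist to review rather than a verdict on what SUNO will do.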

The Core Mental Model

SUNO's vocal model is essentially a session singer who reads your notes. Vague notes produce a generic performance. Specific notes — "whisper this line, stutter on the first word, laugh at the end, scream in the bridge" — produce a specific performance.

The tags in this article are that notation system. They look like syntax, but they're really performance direction. Every bracket, parenthesis, and hyphen is telling the vocal model which interpretive choice to make.

Most SUNO users never touch this layer. They write plain lyrics, get generic vocal performances, and wonder why their songs sound like they were generated by a computer. The ones who do use the performance layer — ad-libs, stutters, crying, laughing, screaming, whispering, per-section emotions — get vocal performances that feel human.

Try the Questionnaire to see how AceTagGen applies these techniques automatically based on what you describe.
