Pillar Guide | May 10, 2026 | 24 min read

The Complete Guide to Mixing Vocals: Recording, Editing, Tuning, Processing, and Print

If you only ever read one guide on mixing vocals, this is the one to read. It is a comprehensive end-to-end walkthrough, from microphone choice through the final print, covering every stage of the vocal mixing pipeline that determines whether a vocal sounds professional or amateur. Vocals are the most important element in the large majority of modern mixes, and they are the element where the difference between competent and expert is most audible. The good news is that vocal mixing is a largely solved problem: the same techniques work across pop, rock, country, folk, R&B, and singer-songwriter styles, with only modest variations between genres.

This guide assumes you are mixing recorded vocals (not live sound, which has different constraints). It assumes you are working in a DAW with standard plugins. It does not require you to own expensive gear: the techniques here work with stock plugins in any DAW. What this guide will not do is hand you presets. Presets do not work across recordings because every voice, room, microphone, and song is different. What this guide gives you is the underlying mental model and the parameter ranges that every preset starts from. Once you understand the why, the what becomes obvious.

We will walk through the pipeline in order: pre-mix preparation, tuning and timing, gain staging, subtractive EQ, compression (multiple stages), additive EQ, de-essing, saturation, time-based effects (reverb and delay), backing vocal treatment, bus processing, and finalization. At each stage we will cover the specific moves, the parameter ranges, the common mistakes, and how to know when the stage is finished. By the end you should have a complete working framework you can apply to any vocal recording you have.

PRE-MIX: WHAT HAPPENS BEFORE YOU OPEN A PLUGIN. The single biggest determinant of how good your vocal mix sounds is how good the recording is. If the recording is excellent, you can produce a great mix with three plugins. If the recording is poor, no amount of mixing will fully fix it. The recording stage is therefore the highest-leverage step in the entire pipeline, and it deserves at least as much attention as the mixing stage itself.

Microphone choice matters less than amateurs assume. A $100 microphone recorded well in a treated room sounds dramatically better than a $3000 microphone recorded poorly in an untreated room. Pick a large-diaphragm condenser for most pop, country, and folk vocals (AT2020, Rode NT1, Lewitt LCT 440 Pure). Pick a dynamic microphone for loud vocalists or untreated rooms (Shure SM7B, or SM58 if budget is tight). The room matters more than the mic.

Mic technique matters far more than mic choice. The singer should be 6 to 12 inches from the microphone, on-axis (the mic pointing at the singer's mouth). A pop filter is essential — it removes plosives that no amount of mixing can perfectly fix. The singer should not move significantly during a take; consistent distance produces consistent tone. The mic should be on a sturdy stand that does not transmit floor vibrations.

Room treatment matters most. The room contributes more to the recorded sound than any plugin you will apply later. At minimum, treat the singer's first reflection points with absorption (foam panels, blankets, mattresses, anything dense). A small treated room (4'x4') outperforms a large untreated room every time. If you cannot treat the room, sing into a closet with hanging clothes — it is acoustically excellent for vocals.

Take selection is the next high-leverage step before you open any plugin. Comp together the best phrases from multiple takes. The goal is a single performance track with the best emotional moment from take 1's verse, the best chorus from take 3, the best ad-lib from take 6. Comping is where amateur engineers shortcut and where pros invest hours. Time spent comping pays compound dividends through every downstream stage.

TUNING AND TIMING. Pitch correction goes at the front of the chain. Apply it before any dynamics processing because the compressor will react to dynamics that the pitch correction will alter, compounding artifacts. Most modern pop is heavily tuned; most country, rock, and folk is lightly tuned to taste. Auto-Tune (or its equivalents — Melodyne, Waves Tune) operates in two modes: graphical (you correct each note manually for transparency) and automatic (the plugin corrects in real time, producing the audible effect popularized by T-Pain).

For modern pop, automatic mode with a retune speed of around 5-15 ms produces the standard Auto-Tune sound. For transparent correction in country or rock, graphical mode is the answer: you adjust only the notes that need adjusting and leave human imperfection in everything else. For absolutely transparent correction (where listeners cannot tell the vocal was tuned), manual note-by-note editing in Melodyne is the gold standard; use its Melodic algorithm for a solo vocal, since DNA (Direct Note Access) is its polyphonic mode. Time spent in Melodyne is the difference between a vocal that sounds 'fixed' and a vocal that sounds 'right.'

Timing edits matter as much as pitch edits. If a phrase comes in slightly late, drag it forward. If a syllable lands slightly behind the beat, slide it. The standard pop convention is to align every consonant attack to the grid; the standard folk/country convention is to leave human timing alone. Pick the genre's convention and commit. Doing pitch and timing edits well takes hours per song; this is normal and worth the time.

GAIN STAGING. Gain staging is the most-skipped step in amateur mixing, and skipping it is often the reason your compressor sounds wrong no matter what settings you use. Gain staging means setting the level of the vocal track so that the loudest peaks hit around -6 dBFS and the body of the performance averages around -18 to -12 dBFS RMS. If your singer was hot and the level is too high, bring it down with clip gain (Pro Tools), a gain plugin at the top of the chain, or gain automation ahead of the inserts (fader automation is post-insert in most DAWs and does not change what the plugins receive). If your singer was quiet, ride quiet sections up the same way.

The deeper insight: every plugin in your chain has a sweet spot for input level. Compressors expect input around -18 dBFS to behave like the analog gear they are emulating. EQs and saturators have similar level expectations. Feeding any of them with too-hot or too-cold input produces unpredictable results. Gain staging gives every downstream plugin clean, consistent input and dramatically improves the predictability of every move you make afterward.

Practical workflow: insert a gain plugin (or use clip gain) as the first plugin on the vocal track. Play a typical vocal phrase, watch your meter, and set the gain so that the vocal averages around -18 dBFS RMS. Now you have a clean, consistent signal that every subsequent plugin will react to predictably.
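
The trim amount in that workflow is simple arithmetic, and it can help to see it written out. The following is a minimal Python sketch, not any DAW's metering code. It assumes the audio is already available as a list of floats where 1.0 is full scale, and the function names (rms_dbfs, peak_dbfs, trim_gain_db, db_to_linear) are hypothetical. It uses the plain RMS definition, under which a full-scale sine reads about -3 dBFS; some meters add 3 dB, so treat the readings as relative to your own meter.

```python
import math

def rms_dbfs(samples):
    """RMS level in dBFS of float samples where 1.0 is full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def peak_dbfs(samples):
    """Sample-peak level in dBFS."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0.0 else 20.0 * math.log10(peak)

def trim_gain_db(samples, target_rms_dbfs=-18.0):
    """Gain in dB that moves the measured RMS onto the target."""
    return target_rms_dbfs - rms_dbfs(samples)

def db_to_linear(db):
    """Convert a dB gain to a linear multiplier."""
    return 10.0 ** (db / 20.0)

# Stand-in for a vocal phrase: one second of a 440 Hz sine at half scale.
phrase = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(48000)]
trim = trim_gain_db(phrase)                      # negative: this phrase is too hot
staged = [s * db_to_linear(trim) for s in phrase]
print(round(rms_dbfs(staged), 2), round(peak_dbfs(staged), 2))
```

After the trim, check that the peak reading still leaves headroom; a very dynamic take can average -18 dBFS and still peak well above -6 dBFS, which is a sign to even it out with clip gain first.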

SUBTRACTIVE EQ. The first EQ move in the chain is subtractive — removing what does not belong. The goal is to clean up the recording before the compressor reacts to its problems. The standard subtractive moves are a high-pass filter, a mud cut, and a honk cut. Each addresses a specific problem zone and each is worth applying to almost every vocal recording.

High-pass filter at 80-130 Hz. The exact frequency depends on the singer (lower for male voices, higher for female voices) and the song (higher cuts in dense mixes, lower in sparse mixes). The HPF removes mic stand thumps, plosives that survived the pop filter, low-end mud that has nothing to do with the voice, and HVAC bleed. There is essentially no musical content in a vocal below 80 Hz; everything down there is contamination.
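
To see what a high-pass at these frequencies does to the low end, here is a hedged sketch of a second-order (12 dB per octave) high-pass using the widely published Audio EQ Cookbook biquad formulas. This is an illustration under assumptions: your plugin's filter may use a different slope or design, and the function names and the 100 Hz / 48 kHz values are chosen only for the example.

```python
import cmath
import math

def highpass_biquad(cutoff_hz, sample_rate_hz, q=0.7071):
    """Second-order high-pass coefficients (Audio EQ Cookbook form).

    Returns (b, a) normalized so that a[0] == 1.0.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def magnitude_db(b, a, freq_hz, sample_rate_hz):
    """Filter gain in dB at one frequency, evaluated on the unit circle."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv * z_inv
    return 20.0 * math.log10(abs(numerator / denominator))

b, a = highpass_biquad(100.0, 48000.0)
for freq in (25.0, 50.0, 100.0, 200.0, 1000.0):
    print(freq, round(magnitude_db(b, a, freq, 48000.0), 1))
```

With Q near 0.707 the filter is about 3 dB down at the cutoff itself, so a 100 Hz setting already shaves a little off content just above 100 Hz. That is one reason to set the cutoff by ear on the specific singer rather than by number.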

Mud cut at 200-400 Hz. Sweep with a narrow boosted band of 4-6 dB through this range until you find the most boxy, hollow, cardboard-sounding frequency. That is your problem frequency. Cut by 2-4 dB with a moderate Q. Almost every vocal benefits from this cut. The vocal will sound thinner in solo, but it will sit dramatically better in the mix and the body of the voice will become audible without the mud.

Honk cut at 700 Hz - 2 kHz (optional). Some voices develop nasal honkiness in this range. Sweep with a narrow boosted band; if you find an offending frequency, cut by 2-3 dB. If the voice sounds natural in this range, leave it alone — the honk cut should fix a problem, not create one.

FIRST-STAGE COMPRESSION. The first compressor is doing slow, musical level control. The classic choice is an opto-style compressor (Teletronix LA-2A emulation), characterized by slow attack and slow release. Set the ratio to 2:1 or 3:1, attack as slow as the plugin allows (or 10-30 ms on a flexible compressor), and release such that the gain reduction needle pumps with the syllables. Aim for 3-5 dB of gain reduction on the loudest words.

What this stage accomplishes: it makes the vocal sit consistently in the mix. Loud words come down; quiet words become more present (relatively). The compressor is doing what a fader rider would do manually but doing it automatically and faster than a human can. The LA-2A is the workhorse here because its slow time constants are ideal for musical leveling.
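
The relationship between threshold, ratio, and gain reduction can be sketched with a static hard-knee curve. This is a simplified model under stated assumptions: it ignores attack, release, and the program-dependent behavior of a real opto compressor (an LA-2A exposes a single peak-reduction control rather than threshold and ratio), and the function names are hypothetical.

```python
def gain_reduction_db(input_db, threshold_db, ratio):
    """Static hard-knee curve: dB of gain reduction (>= 0) at a given input level."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1")
    overshoot = input_db - threshold_db
    if overshoot <= 0.0:
        return 0.0
    return overshoot * (1.0 - 1.0 / ratio)

def threshold_for_reduction(peak_input_db, ratio, target_reduction_db):
    """Threshold that gives the target reduction on the loudest input level."""
    if ratio <= 1.0:
        raise ValueError("ratio must be greater than 1")
    overshoot = target_reduction_db / (1.0 - 1.0 / ratio)
    return peak_input_db - overshoot

# Loudest words reach -8 dBFS; we want about 4 dB of reduction on them at 3:1.
threshold = threshold_for_reduction(-8.0, 3.0, 4.0)
print(threshold)                                   # -14.0
print(gain_reduction_db(-8.0, threshold, 3.0))     # reduction on the loudest words
print(gain_reduction_db(-20.0, threshold, 3.0))    # quiet words are untouched: 0.0
```

At 3:1 the loudest words have to exceed the threshold by about 6 dB to produce 4 dB of reduction, which is why low ratios need a fairly low threshold before the meter moves at all.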

If you only own one vocal compressor plugin, get an LA-2A clone. Universal Audio makes the gold standard ($299), Waves makes a budget alternative ($30 on sale), and IK Multimedia bundles a serviceable one in their package. Stock DAW compressors can also do this work — set them with the parameters above and they will sound similar.

SECOND-STAGE COMPRESSION. The second compressor catches the peaks that the slow first compressor missed. The classic choice is a fast FET compressor (Universal Audio 1176 emulation). Set the ratio to 4:1 or 8:1, attack as fast as needed to catch peaks (1-5 ms), release fast (50-200 ms), and aim for 2-4 dB of gain reduction on top of what the first compressor already did.

Stacking two compressors with different time constants is the open secret of modern vocal mixing. Slow musical leveling plus fast peak catching, without either compressor having to work hard enough to sound bad. A single compressor doing 8 dB of reduction sounds squashed; two compressors each doing 4 dB sound musical and natural. This serial-compressor approach is used on essentially every modern pop vocal and most rock, country, and folk vocals.
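
A rough way to see how two stages share the work is to chain two static curves. The sketch below is an assumption-laden illustration: it models only the level math, not the attack and release differences that make the two stages sound different, and the thresholds are hypothetical values picked so the reductions land inside the ranges given above.

```python
def compress_db(input_db, threshold_db, ratio):
    """Static hard-knee curve: output level in dB for a given input level."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

def serial_chain_db(input_db, stages):
    """Pass a level through each (threshold_db, ratio) stage in order."""
    level = input_db
    for threshold_db, ratio in stages:
        level = compress_db(level, threshold_db, ratio)
    return level

stages = [(-12.0, 3.0), (-14.0, 4.0)]   # slow leveler first, fast peak catcher second

peak_in = -6.0
after_first = compress_db(peak_in, *stages[0])
after_both = serial_chain_db(peak_in, stages)
print(peak_in - after_first)       # reduction from stage one: 4.0 dB
print(after_first - after_both)    # reduction from stage two: 3.0 dB
print(serial_chain_db(-20.0, stages))  # quiet material passes unchanged: -20.0
```

Note that above both thresholds the slopes multiply (3:1 into 4:1 behaves like 12:1 on paper), so the loudest peaks are held firmly even though neither meter shows more than a few dB of reduction.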

ADDITIVE EQ. After compression, the dynamic range is controlled, so adding presence and air no longer creates painful spikes. The standard additive moves are a presence boost in the upper-mid range and an air shelf in the high range.

Presence boost at 3-5 kHz. A narrow boost of 1-3 dB makes the vocal cut through the mix on any speaker. This is the band where consonants and intelligibility live. Be careful — every dB you add at 3-5 kHz also adds harshness; the de-esser will need to work harder if you boost aggressively here.

Air shelf at 10-15 kHz. A high shelf boost of 2-4 dB at 10-15 kHz adds the polished, expensive sound that defines modern pop. This boost lifts the breathiness of the vocal and gives it a sheen that translates well to streaming. Be careful with extremes — too much air boost adds noise floor and digital harshness.

DE-ESSING. Sibilant frequencies (5-9 kHz typically) need targeted reduction. A de-esser is a frequency-targeted compressor that engages only when sibilance is present. Set the threshold so it kicks in only on the harsh 's,' 'sh,' and 't' sounds, not on the body of the vowels. 3-6 dB of reduction on offending consonants is usually enough.

If your de-esser is constantly engaged, your additive EQ is too aggressive — pull back the high shelf or the presence boost. The de-esser is fixing a problem your earlier moves created; if you tune the earlier moves better, the de-esser does less work.

Modern de-essers offer two modes: split (the offending frequency is dynamically attenuated) and wideband (the entire signal is compressed when sibilance is detected). Split is more transparent. Wideband sounds more 'pulled back' on sibilant words. Split is the default for transparent vocals; wideband is occasionally useful for very aggressive sibilance.

SATURATION. Light tape, tube, or transformer saturation adds harmonic richness without obvious distortion. A 1-3% drive on a tape emulation, or the 'low' setting on a tube preamp emulation, is enough. Saturation makes vocals sound expensive and finished. Skip this if the vocal already sounds rich and characterful; add it if the vocal sounds clinical and digital.

The reason saturation matters: digital recordings are harmonically pure. Analog recordings (tape, tubes, transformers) add subtle harmonics that human ears interpret as 'warmth.' Saturation plugins recreate this analog-style harmonic content. The result is a vocal that sounds like it was recorded on real gear, even when it was recorded into a clean digital signal chain.
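
The idea of "adding harmonics" can be made concrete with a simple waveshaper. The sketch below is an assumption, not how any particular tape or tube plugin works: it uses a normalized tanh curve, the drive value is an arbitrary parameter that does not map to a plugin's percent control, and the helper names are hypothetical. Because tanh is an odd-symmetric curve it generates only odd harmonics; real tube and transformer stages typically add even harmonics as well.

```python
import math

def soft_saturate(sample, drive=1.5):
    """Normalized tanh waveshaper: a full-scale input maps to a full-scale output."""
    if drive <= 0.0:
        raise ValueError("drive must be positive")
    return math.tanh(drive * sample) / math.tanh(drive)

def harmonic_amplitude(samples, cycles, harmonic):
    """Amplitude of one harmonic via a single-bin DFT.

    Assumes the buffer holds a whole number of cycles of the fundamental.
    """
    n = len(samples)
    k = cycles * harmonic
    re = sum(s * math.cos(2.0 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

n, cycles = 4800, 10
clean = [0.5 * math.sin(2.0 * math.pi * cycles * i / n) for i in range(n)]
warm = [soft_saturate(s, drive=1.5) for s in clean]

fundamental = harmonic_amplitude(warm, cycles, 1)
third = harmonic_amplitude(warm, cycles, 3)
print(round(20.0 * math.log10(third / fundamental), 1))  # third harmonic, dB below fundamental
```

Lowering the drive or the input level pushes the added harmonics further below the fundamental, which is another reason the gain staging step matters before a saturator.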

REVERB SENDS. Use sends, not inserts. The reverb bus lives separately so multiple sources can share it and so you can EQ the reverb return without affecting the dry vocal. Set the vocal's send level to roughly -15 to -25 dB.

Plate reverb (1.5 to 2.5 seconds decay) is the safest choice for most styles. Pre-delay of 30-50 ms keeps the dry vocal forward while still adding space. EQ the reverb return: high-pass at 300 Hz, low-pass at 8 kHz. This keeps mud out of the low end and harshness out of the top, leaving a band of 'air' that sits behind the dry vocal without competing with it.

Genre-specific choices: pop and singer-songwriter use plate (1.5-2.5s); ballad pop uses hall (3-5s); hip-hop and modern R&B usually use room or no reverb at all (the vocal stays dry and immediate); country uses plate or hall (1.5-3s). Match the genre's convention before deviating.

DELAY SENDS. Delay is where modern vocal production gets interesting. Two standard delay treatments are widely used:

Quarter-note delay synced to tempo, sent 20 to 30 dB below the dry vocal, adds depth without being audible as a discrete echo. The delay essentially fills the spaces between phrases.

Slap delay (80-120 ms, single repeat, no feedback) adds vintage rock-and-roll character. Famous on Elvis recordings and used widely in country and rockabilly to this day.

Both can be sent to the reverb afterward for a delay-then-reverb texture that adds depth without adding obvious wetness. This stacking is the secret weapon of modern vocal production.
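
If your delay plugin does not sync to tempo, the delay time is easy to work out by hand. This is a small Python sketch of that arithmetic; it assumes the tempo counts quarter notes per minute, and the function names are hypothetical.

```python
def note_delay_ms(bpm, beats=1.0):
    """Delay time in milliseconds for a note lasting `beats` quarter notes."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return 60000.0 / bpm * beats

def ms_to_samples(delay_ms, sample_rate_hz=48000):
    """Nearest whole number of samples for a delay time."""
    return round(delay_ms * sample_rate_hz / 1000.0)

print(note_delay_ms(120))          # quarter note at 120 BPM: 500.0
print(note_delay_ms(120, 0.5))     # eighth note: 250.0
print(note_delay_ms(120, 0.75))    # dotted eighth: 375.0
print(ms_to_samples(note_delay_ms(120)))  # 24000 samples at 48 kHz
```

A slap delay of 80-120 ms is usually set by ear rather than synced, but the same helper shows when a subdivision happens to land in that range: a sixteenth note at 150 BPM is 100 ms.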

BACKING VOCAL TREATMENT. Backing vocals are the difference between a verse that sounds modest and a chorus that sounds enormous. The treatment is fundamentally different from lead vocal treatment: backing vocals get less low-end, less presence, more compression, more reverb, and they sit further back in the mix. The lead vocal must always be the most forward, driest, most present element.

Three categories of backing vocals: unison doubles (the lead vocal recorded again, panned hard left and right, 12 to 18 dB below the lead, EQ'd more aggressively); harmonies (a third or fifth above or below the lead, panned 60-80% wide, 8 to 12 dB below the lead); ad-libs (the responsive lines that fill space, panned where the lead is not, EQ'd with character).

Modern pop chorus stacks have 8-16 backing vocal tracks total. Each pair is panned wide. Each layer is heavily EQ'd to take up only its specific frequency band, so 16 voices fit in the same space that 4 voices would otherwise occupy. The backing vocal stack creates the cinematic stadium-chorus effect that defines modern pop production.

BUS PROCESSING ON THE VOCAL BUS. Route the lead vocal and all backing vocals to a single stereo aux. On that bus, apply light bus compression (2:1 ratio, slow attack, 1-2 dB reduction) to glue the vocal stack together. Optional: a saturator on the bus adds harmonic richness across all voices simultaneously. Optional: a parallel compressor (a slammed copy blended back in 10 to 15 dB below the dry signal) adds presence and excitement without compressing the dry vocal further.

Bus processing is what makes a stack of separately-processed vocals sound like a single coherent vocal performance. Without it, each voice sounds processed individually; with it, they sound like they belong together.

REFERENCE TRACKS BY GENRE. The single most-skipped pro workflow is reference tracks. Load 2-3 finished, mastered vocals from songs in the same genre and emotional zone as your song. Level-match (LUFS or RMS) so neither is louder than the other. A/B during mixing decisions. Pop reference: Billie Eilish 'bad guy', Adele 'Hello', Taylor Swift 'cardigan'. Rock: Foo Fighters 'Best of You'. Country: Chris Stapleton 'Tennessee Whiskey'. Singer-songwriter: Phoebe Bridgers 'Motion Sickness'. Hip-hop: Kendrick Lamar 'HUMBLE.'. Folk: Bon Iver 'Holocene'.

Listen for: tonal balance (is the reference brighter, darker, mid-heavier than yours?); vocal placement (is the lead more forward or further back?); compression depth (does the reference's vocal feel more or less dynamic?); reverb amount (how dry or wet is the reference?); air content (how much top-end does the reference have?). Each of these tells you what to adjust in your mix.

COMMON MISTAKES THAT KILL AMATEUR VOCAL MIXES. The patterns that separate amateur from pro vocal mixes are almost always the same: too much compression early in the chain (kills the life), no high-pass filter (muddy mix), no de-essing (painful highs), reverb without high-pass on the return (smears mud across the mix), trying to fix tuning with EQ (use Melodyne or Auto-Tune, not EQ), no reference tracks (drift compounds across the mix session), and too many plugins doing too little each (six EQs each cutting 0.5 dB is worse than one EQ doing the same total work).

The single biggest improvement most amateur mix engineers can make is to high-pass the vocal at 80-130 Hz, cut the 200-400 Hz mud zone by 2-4 dB, compress with two stages (3-5 dB, then 2-4 dB) instead of one (8 dB), and reference against a finished vocal in the same genre. Doing those four things alone produces a dramatically more professional vocal sound.

MIXING FOR TRANSLATION. A vocal mix that sounds great on studio monitors but disappears on a phone speaker has failed. Vocals must be audible and intelligible on every playback system: studio monitors, accurate headphones, AirPods, phone speakers, car stereos, Bluetooth speakers. The single biggest factor in vocal translation is the presence band (3-5 kHz). If the vocal has enough energy in this band, it will cut through on small speakers; if it doesn't, the vocal will feel buried on phone playback.

Always check your mix on the actual playback systems your audience uses. After mixing on monitors, check on AirPods, phone speakers, and your car. Identify what gets lost on each system and decide whether to fix the mix or accept the limitation. Most home producers mix exclusively on one system and ship mixes that don't translate; reference checking on multiple systems is the cheapest improvement available.

WHEN TO STOP. The last skill in vocal mixing is knowing when to stop. Every vocal mix can be tweaked indefinitely. The discipline is to recognize when additional changes stop improving the mix and start risking it. The marker: when you make a change, A/B with bypass, and you cannot tell which is better. At that point, stop. Print the mix. Move on.

The best vocal mixes feel inevitable rather than processed. The vocal sits where it must sit in the mix. The dynamics feel natural. The character of the singer is enhanced rather than masked. Achieving this takes practice, reference listening, and the discipline to leave well enough alone. The pipeline in this guide gives you the technical framework; the artistic judgment of when to stop comes only with thousands of hours of focused listening.

Apply this pipeline to one vocal you have on hand. Print the result. Wait 24 hours. Listen again. Compare to a reference. Note what is missing or excessive. The gap between what you produced and the reference is your next month's practice list.
