Throughout most of the 70s and early 80s his downtown Boston facility was New England's top advertising shop, with three mix rooms, a large support staff, and a steady stream of work from national clients like McDonald's and Miller Beer as well as local advertisers and politicians. He sold the studio in 1984 and went to work for an even larger film/video/sound company, Century III (now the post-production facility at Universal Studios Florida). In 1988 he left Century to build his personal-use advertising studio, the Digital Playroom, which is where we caught up with him. We'll describe some of the Playroom's toys later in this article.
Jay's won hundreds of major ad awards (including a dozen Clios), helped Eventide and Orban fine-tune audio production products, and been an officer of the AES. He agreed to this interview to share some of his techniques and tell Recording's readers how a small studio owner/engineer can succeed in the ad business.
RM Let's start with technique. What's the most important part of a good commercial production? Mic technique? Processing?
JR I'll treat that like a trick question. The most important part of a good spot is the communication. Everything else, and I include the writing and performance as well as the mix, is secondary. If you can link the product's message to what the listener already believes -- something pioneer sound philosopher Tony Schwartz calls striking a responsive chord -- you've got a good ad. Otherwise it's just another commercial on the radio.
But once you buy into that, the engineering implications are obvious. Mic technique and processing are there to help you get the words across more clearly. So is directing the talent, but that depends on the situation. And of course there are specific techniques for the mix, but let's talk about all of that in a moment. By far, the most important skill the engineer brings to an ad session is the ability to edit a voice track.
RM Because sometimes the announcer makes a mistake?
JR If they're any good, they can read sixty seconds and get all the words right. But a sixty second commercial has to take the listener on a journey. There are a lot of concepts and emotions to get across. No human being can read each one perfectly, each time. So you have to take the best parts from each reading and join them together in a coherent whole. I'll use as many as a dozen different takes in the same spot, sometimes three different takes in the same sentence, and occasionally three different takes in the same word.
This isn't a knock on the actors -- I'm more likely to do this kind of intense editing with a James Earl Jones than with a local car dealer, because the dealer is consistently awful. Think of it as the logical extension of Glenn Gould's ideas about recorded vs live performance. You do the multiple takes, and make notes on the script where each take has a golden moment. Then you splice the golden moments together and you've got a golden reading.
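The golden-reading workflow amounts to an edit decision list: for each stretch of the script, note which take has the golden moment and where it sits, then splice the pieces in order. The sketch below assumes each take is already in memory as a list of mono samples; the take names, sample positions, and the `assemble` helper are hypothetical illustrations, not how the Orban or any other workstation stores edits.

```python
from typing import Dict, List, Tuple

# (take name, first sample, end sample): one entry per "golden moment"
Edit = Tuple[str, int, int]


def assemble(takes: Dict[str, List[float]], edits: List[Edit]) -> List[float]:
    """Splice the marked region of each take, in script order, into one reading."""
    reading: List[float] = []
    for take, start, end in edits:
        source = takes[take]
        if not 0 <= start < end <= len(source):
            raise ValueError(f"region {start}:{end} is outside {take}")
        reading.extend(source[start:end])
    return reading


# Hypothetical takes a few samples long, standing in for real recordings
takes = {"take3": [0.3] * 10, "take7": [0.7] * 10, "take12": [0.12] * 10}
edits = [("take7", 0, 4), ("take3", 2, 6), ("take12", 5, 10)]
golden = assemble(takes, edits)
```

Keeping the edit list separate from the audio means nothing is thrown away: a different combination of takes is just a different list.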
You also edit to make a perfect reading even better. A good commercial has a definite rhythm, and the rhythm is determined by the words, not by the music. With editing, you can stretch or close pauses to make the rhythm work better. A thirtieth of a second between words can make a world of difference. Try it some time: Cut the pause and the glottal shock -- the little 'click' sound -- between two words and you'll turn a stiff reading into a friendly one. Add a similar pause between two words and you'll stress the second one... even though it isn't any louder. I'll often add a pause right before the client's name.
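To put "a thirtieth of a second" in workstation terms: at an assumed 48 kHz sample rate it is 1600 samples. This minimal sketch opens or closes a pause at a chosen sample index; the function names are hypothetical, and the inserted gap is plain digital silence for simplicity.

```python
from typing import List


def seconds_to_samples(seconds: float, sample_rate: int = 48000) -> int:
    return round(seconds * sample_rate)


def insert_pause(samples: List[float], index: int, seconds: float,
                 sample_rate: int = 48000) -> List[float]:
    """Open a gap of `seconds` before sample `index` (e.g. just before the client's name)."""
    gap = [0.0] * seconds_to_samples(seconds, sample_rate)
    return samples[:index] + gap + samples[index:]


def close_pause(samples: List[float], start: int, end: int) -> List[float]:
    """Cut samples[start:end], e.g. a pause plus the glottal click between two words."""
    return samples[:start] + samples[end:]


# A thirtieth of a second is 1600 samples at 48 kHz
print(seconds_to_samples(1 / 30))  # 1600
```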
There's also the nasty habit announcers have of breathing. In normal conversation you're not aware of the breaths. But an announcer, knowing he has to cover three or four lines of script, will take a big suck of air. Even the catch-breaths in the middle of a sentence are unnatural. Besides, no matter how subtle the breaths are in a raw take, they'll turn into hurricanes by the time you've finished compressing. Editing breaths out also helps the pacing: you can usually replace a breath with a pause that's only two-thirds of its length. The sentence sounds exactly the same, but the spot moves better.
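The breath rule of thumb is simple arithmetic: replace the breath with a pause two-thirds as long, and every breath edit tightens the spot by a third of that breath. The sketch below assumes the breath boundaries have already been marked by ear and that the audio is a mono sample list at 48 kHz; `replace_breath` is a hypothetical helper, not a workstation function.

```python
from typing import List


def replace_breath(samples: List[float], start: int, end: int,
                   ratio: float = 2 / 3) -> List[float]:
    """Swap the breath in samples[start:end] for silence `ratio` times as long."""
    if not 0 <= start < end <= len(samples):
        raise ValueError("breath region out of range")
    pause = [0.0] * round((end - start) * ratio)
    return samples[:start] + pause + samples[end:]


# A 0.45 s breath at 48 kHz (21600 samples) becomes a 0.3 s pause (14400 samples)
take = [0.5] * 4800 + [0.05] * 21600 + [0.5] * 4800
edited = replace_breath(take, 4800, 4800 + 21600)
print(len(take) - len(edited))  # 7200 samples, i.e. 0.15 s saved
```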
Back in the days of quarter-inch tape, I'd spend twenty minutes editing and repacing a spot. This could be as many as forty splices over the course of a sixty-second reading. Even though the agency folks are impatient by nature, they'd sit back and read their newspapers because they could hear how much it improved the reading. Of course, now I do as many edits on the Orban [DSE 7000 workstation] in less than five minutes.
RM So what's your advice on editing?
JR Two things: first, don't waste a generation recording voice tracks directly to a multitrack. It's tempting to have the announcer nicely lined up against the music track, but you will have to edit. Even if you're using a DAW, it's a good idea to record everything to a more stable medium like DAT. Then transfer the pieces you want to edit; it'll take only a minute or so. This isn't to guard against system crashes -- most of today's DAWs are rock-solid and will run forever -- but in case you or the client decides to try a different approach next week. One of my agency-producer friends described a session where the engineer, raised on music production, naturally recorded the announcer to multitrack. As the talent was packing up, the producer told the engineer to "use the first paragraph from take 7, the middle from take 3, and the ending from take 12." "Oh, you wanted me to save all those passes?" Needless to say, this was both the engineer's and the producer's final session at that studio.
The second thing is to build up your editing skills. Practice, and practice some more. Grab a newscast or political speech, and edit it to say something totally different. Learn how to edit inside words -- look for hard consonants to hide your edit points -- and train yourself to listen to the melody of each word so you can tell in advance which syllables can fit together and which ones won't. By the way, this kind of precise editing is incredibly difficult if you're using software or a workstation that doesn't let you scrub through the voice smoothly. It's probably faster to edit a 1/4" dub, and then transfer it to the computer.
RM How about the voice recording itself? Any special techniques to share?
JR You want it as clean and as dry as possible. My favorite announce booths have reverb times under a half second. The back and sides are soft, and the front has only a moderate reflection (to reassure the announcer that he still has a voice). Then I place the mic -- usually a small-capsule cardioid like a CK1, so the announcer isn't tempted to work too intimately with a big fuzzball -- slightly above and at an angle from the mouth. Maybe a small foam screen on the mic, but no nylon mesh. The dead side of the mic points to that slightly live side of the room. If the studio doesn't have ideal acoustics (my present Playroom is a little too wet, since I use it both for recording and for mixing) I'll use the same technique with a short shotgun. You don't need the warmth and deep bass of a large capsule, since you're going to toss those frequencies out anyway. But you do need a crisp high end, so dynamic mics aren't usable.
But there's nothing magic about my mic technique. The important thing is to have a technique, and be able to use it quickly. Agency producers have no patience for long setups. You can see how tall the talent is, and can get a sense of how he'll stand, while he's warming up. As soon as he's ready to read, be ready to record. On the other hand, be ready to stop the session if the sound isn't good: the agency won't forgive you if they discover a problem after the talent leaves. If the talent has gone off-mic, fix it on the next take. If you start hearing dry-mouth snaps, offer to get some water.
RM "Dry-mouth snaps?"
JR Little clicks buried in the words, often in consonants like /l/ where the tip of the tongue has to snap against part of the mouth. It happens in dry climates, and during the winter when central heating removes moisture. What happens -- and this is slightly gross -- is that saliva starts to thicken up. As the mouth moves, it stretches and snaps. The mic and subsequent processing magnify the sound, and since it's a fairly serious click you can't filter it out. You can remove occasional snaps with a workstation, but if the talent's really dry it's a lot faster to give them a drink.
RM You keep talking about processing. Do you have a standard setup for that too?
JR Yes and no. I have a few standard things I'll do to every voice track, and some other things I do to the entire mix, but it's always tweaked to the individual announcer and spot.
For the voices, I'll start with a sharp rolloff around 90 Hz. Sounds below that are just wasting power without contributing to intelligibility, and the rolloff also acts as a pop filter. But it has to be sharp, or else you start thinning out the voice and everything sounds bad. Most console filters are too gentle, and you can forget about trying a cutoff with a graphic equalizer. Back in the analog days, I ran voices through a Crown VFX -- it's a sound reinforcement crossover, with 18 dB/octave filters. I'd set it as if I were biamping at 90 Hz, but never hook up a woofer. These days I use a filter section in an Eventide DSP4000 UltraHarmonizer. The chassis says "Harmonizer", but it's also a very flexible equalizer and dynamics control.
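An 18 dB/octave slope corresponds to a third-order filter. The sketch below assumes a Butterworth alignment, which may not match what the Crown or the Eventide actually use: a first-order section cascaded with a Q = 1 second-order section, both designed with the bilinear transform and prewarped to 90 Hz at an assumed 48 kHz sample rate. The biquad follows the high-pass formula in Robert Bristow-Johnson's Audio EQ Cookbook; the function names are hypothetical.

```python
import cmath
import math
from typing import List, Tuple

# Coefficients (b0, b1, b2, a1, a2) of one section, normalized so a0 == 1
Section = Tuple[float, float, float, float, float]


def highpass_18db(fc: float = 90.0, fs: float = 48000.0) -> List[Section]:
    """Third-order Butterworth high-pass: a 1st-order section times a Q = 1 biquad."""
    k = math.tan(math.pi * fc / fs)  # bilinear-transform prewarping
    first = (1 / (1 + k), -1 / (1 + k), 0.0, (k - 1) / (k + 1), 0.0)

    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * 1.0)  # Q = 1.0 for the Butterworth pole pair
    cosw = math.cos(w0)
    a0 = 1 + alpha
    second = ((1 + cosw) / 2 / a0, -(1 + cosw) / a0, (1 + cosw) / 2 / a0,
              -2 * cosw / a0, (1 - alpha) / a0)
    return [first, second]


def gain_db(sections: List[Section], f: float, fs: float = 48000.0) -> float:
    """Magnitude response of the cascade at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = 1 + 0j
    for b0, b1, b2, a1, a2 in sections:
        h *= (b0 + b1 * z1 + b2 * z1 ** 2) / (1 + a1 * z1 + a2 * z1 ** 2)
    return 20 * math.log10(abs(h))


def process(sections: List[Section], samples: List[float]) -> List[float]:
    """Filter a block of samples through each section in turn (Direct Form I)."""
    out = list(samples)
    for b0, b1, b2, a1, a2 in sections:
        x1 = x2 = y1 = y2 = 0.0
        for i, x in enumerate(out):
            y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, x
            y2, y1 = y1, y
            out[i] = y
    return out


hpf = highpass_18db()
for f in (45, 90, 1000):
    print(f, round(gain_db(hpf, f), 1))
```

The response should come out at about -3 dB at 90 Hz, roughly -18 dB an octave lower at 45 Hz, and essentially flat through the voice range, which is the "sharp" corner described above.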
Once the voice is filtered, I'll add a tiny amount of eq if necessary. Never more than 3 dB, just a gentle warming around 200 Hz and some extra intelligibility around 1.75 kHz. Then I'll crunch, first with a very slow look-ahead AGC (it delays the signal 2/3 second and ramps the gain to accommodate peaks before they happen), then a 10:1 compressor with its threshold around -15 dB and some extra de-essing in the sidechain, and then a very fast hard limiter at -2 dB. When you do all this right, you're not aware of any compression at all. There's none of that "AM Disk Jockey" sound, just an overall loudness. I've built all these processes into a standard voice processing patch that I keep in the Eventide, and I save different versions for different announcers.
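The chain can be sketched in two parts: static curves for the 10:1 compressor at -15 dB and the -2 dB limiter, and a look-ahead gain rule standing in for the AGC. This is a simplified illustration under stated assumptions, not the Eventide patch: the compressor is a hard-knee curve, the limiter is ideal, `makeup_db` is a hypothetical gain stage between them, the de-essing sidechain is not modeled, and the ramp rule in `lookahead_gains` is one plausible way to anticipate peaks. At 48 kHz, a 2/3-second look-ahead would be `lookahead=32000`.

```python
from typing import List


def compressor_db(level_db: float, threshold_db: float = -15.0,
                  ratio: float = 10.0) -> float:
    """Hard-knee static curve: above threshold, output rises 1 dB per `ratio` dB in."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def limiter_db(level_db: float, ceiling_db: float = -2.0) -> float:
    """Ideal hard limiter: nothing passes above the ceiling."""
    return min(level_db, ceiling_db)


def chain_db(level_db: float, makeup_db: float = 0.0) -> float:
    # makeup_db is a hypothetical gain stage between compressor and limiter
    return limiter_db(compressor_db(level_db) + makeup_db)


def lookahead_gains(x: List[float], lookahead: int, ceiling: float,
                    step: float) -> List[float]:
    """Per-sample gain that keeps |x * gain| <= ceiling.

    The gain starts ramping down up to `lookahead` samples before a peak and
    climbs back by at most `step` per sample afterwards. Offline we simply read
    ahead; a real-time box gets the same effect by delaying the audio.
    """
    need = [min(1.0, ceiling / abs(s)) if s else 1.0 for s in x]
    gains: List[float] = []
    prev = 1.0
    for i in range(len(x)):
        g = 1.0
        # Every peak still ahead in the window caps the gain, plus whatever
        # ramp distance remains before that peak arrives.
        for j in range(i, min(len(x), i + lookahead + 1)):
            g = min(g, need[j] + step * (j - i))
        g = min(g, prev + step)  # slow recovery after a peak
        gains.append(g)
        prev = g
    return gains


print(chain_db(-5.0))                 # -14.0
print(chain_db(0.0, makeup_db=12.0))  # -2.0
```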
This much processing requires a very clean signal path. Compression tends to make a distorted track sound even more distorted, and while a certain amount of fuzziness can be nice in music it never helps a commercial. Both the Eventide DSP4000 and the Orban workstation run at 24 bits, and I pass signals around as AES/EBU, so there aren't any extra conversions to add distortion. If I'm getting the announcer via ISDN or DAT, the signal never enters the analog domain at all (except for my monitors). This extra cleanliness means I can add more processing without a fatiguing, squashed sound.
Of course, not everyone values a clean signal. The operations manager of one of the larger Boston ad studios was once quoted in the trade press as saying "Don't waste time chasing down that last percent or two of distortion. It's a lot more important to have a neatly-typed label." I don't agree... but you can't deny they're making a lot of money.
RM I want to get on to neatly typed labels and agency psychology later. But first, let's deal with the production process. What happens after you've got a good voice track?
JR You need music. This usually doesn't mean a jingle, since most of the music used in commercials is instrumental. Surprisingly, most of that instrumental music -- even under national spots -- is from stock libraries. It's been carefully edited to fit the vo [voice-over] and the picture, but it's not written for the spot.
RM Why so much reliance on existing music?
JR It's a matter of economics. I can grab a CD with any style from grunge to country to symphonic, with clever arrangements and hot session players (some of the symphonic cuts were recorded by the London Philharmonic), and license the rights for a hundred dollars or so. Add half an hour of editing time, and maybe a string pad or some drum hits from the Kurzweil, and it's completely customized. It sounds like it was written for the spot. It's hard for a MIDI musician to compete with that.
RM So the library business has gotten big?
JR It's grown enormously over the past few years, driven both by the production needs of cable TV and multimedia, and by the explosion in low-cost digital audio technology. This can mean an opportunity for the small studio composer/engineer.
Stock music houses follow two business models: needle-down and buyout. The better needle-down libraries try for the best productions, charge a small amount for the discs, and make their money with a license fee every time you use a cut. Buyout libraries, on the other hand, sell the rights along with the disc. For an average of $75 -- less than a single needle-down license -- you can use any cut on the CD, in any production you do.
But there's no free lunch. Needle-down discs are usually jammed with interesting and usable cuts, because the publishers want as much chance as possible to sell you something you'll like. On the other hand, buyout discs are often padded with broadcast-length edits of longer songs, and the longer songs are mostly repeated choruses. I don't see any reason to pay for a :60 version of a piece, when you can edit your own in just a few minutes and fit the words better.
Both kinds of libraries often buy music from freelance composer/producers, sometimes as a buyout and sometimes on a percentage of the licenses sold. Competition is stiff -- a lot of national jingle and film scoring composers recycle their tracks through needle-down libraries -- but if your stuff is very good you can make some money this way. There's a listing of a couple of dozen music libraries with contact information available through my web page [www.tiac.net/users/jcrose]. But don't send them demos. Libraries are in the business of buying and selling finished tracks, not developing new artists. You'll have to produce the music yourself. You can also follow in the footsteps of a lot of MIDI musicians and publish your own buyout library, advertising it in the back of the radio and video production magazines.
RM How about sound effects?
JR You may need them, depending on the script. While you can always go out and record them to order, every studio should have at least a few sound effects discs standing by. Again, it's economics: it's a lot faster to cue up and play a car horn than to find a quiet exterior and record a friend's car. You should also gather some noisemaking gadgets -- hinges and ratchets, random pieces of wood and metal, a couple of cups -- for foleys [creating a sound to match on-screen action].
Professional sound effects CDs are a bargain. You pay between $50 and $100 per disc, but a disc can include hundreds of cleanly-recorded sounds. And the price includes a perpetual license to use the effects in your productions. You can also buy effects CDs for $15 at your local music store, but these are often of marginal quality and usually don't include rights. A few are even bad repressings of effects stolen from commercial vinyl libraries.
You can also download some effects from the internet, usually posted by movie enthusiasts who have digitized them off-air or from VHS copies. Aside from the quality issue, they're a definite infringement. Sound effects have been protectable by copyright since 1978. I have to assume that Recording's readers, particularly those who hope to make a living with their own creativity, respect the rights of other creative people.
RM Okay. So you've got the edited voice track, edited music, and sound effects. Any secrets for mixing a commercial?
JR You're mixing for a very tiny window. FM and TV transmitters cut off at 15 kHz, but most sets are useless above 12 kHz. Car radios rarely go beyond 9 or 10 kHz, unless they've been customized. The bass can be halfway decent, but there's no intelligibility down there. So you're really just mixing for the midrange.
Then you have to deal with the dynamics. Broadcasters are notorious for squeezing the signal into a 20 dB range, but they don't do it just to destroy the music. 80% of adults listen to radio in their cars, where the sound is mixed with traffic and highway noises. TV sets are typically in an echoey corner of the room, and compete not only with noise but with conversations. Without processing, most programs wouldn't be heard at all.
So you have to make the most of this tiny window. This dictates not only the processing but also the mix technique, and how you move the faders. Broadcast mixes are usually more active than music ones. The music has to be loud enough to be heard against all this competition, but it can't obscure anything the announcer says. So you have to constantly adjust its level, often dipping the music on a syllable-by-syllable basis. Sound effects have to pop up since they're part of the message, but backgrounds can't get in the way of the voice or music, even if they're setting a scene. Your fingers are always moving: I'd say there are more fader moves in a good :60 spot than in most three-minute songs.
Music mixes build up from the bass, but a spot mix builds down from the voice. Make the voice sound right by itself, adding whatever compression and eq is necessary for strength. Then adjust the other elements so they complement it. I think it's a mistake to add reverb to the announcer track, unless you're looking for a special effect. Verb adds distance, not size... and you don't want to send the most important element to the background. This is totally different from a music mix, where you process the voice to make it sound nicer. Advertising is about communicating a message, and sounding nice is incidental.
Then compress the music or pull its mids out so it doesn't interfere with the voice. I'll sometimes bring music up on two sets of faders. One set is processed, squashed and with no midrange, to run under the voice. The other faders carry the full-fidelity music, so I can bring them up during holes in the voice. Keep those fingers moving.
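Jay rides the two sets of music faders by hand. As a rough automated approximation, and not his method, the sketch below follows the voice with a peak envelope and crossfades between the squashed "under" music and the full-range music depending on whether the announcer is talking. All names and defaults are hypothetical: `threshold` is a linear level, `release` is a per-sample decay (0.999 is a time constant of roughly 20 ms at an assumed 48 kHz), and `step` limits how fast the crossfade moves so it doesn't click.

```python
from typing import List


def envelope(x: List[float], release: float) -> List[float]:
    """Peak follower: jumps up instantly, decays by `release` per sample."""
    env: List[float] = []
    level = 0.0
    for s in x:
        level = max(abs(s), level * release)
        env.append(level)
    return env


def two_fader_mix(voice: List[float], full: List[float], under: List[float],
                  threshold: float = 0.05, release: float = 0.999,
                  step: float = 0.01) -> List[float]:
    """Crossfade between the squashed 'under' music and the full-range music.

    w = 1 means only the under-voice faders are up; w = 0 means only the
    full-range faders are up. w slews by `step` per sample so the move doesn't click.
    """
    out: List[float] = []
    w = 0.0
    for v, f, u, e in zip(voice, full, under, envelope(voice, release)):
        target = 1.0 if e > threshold else 0.0
        if w < target:
            w = min(target, w + step)
        elif w > target:
            w = max(target, w - step)
        out.append(v + w * u + (1.0 - w) * f)
    return out
```

A short release lets the music come back up in the holes between phrases, which is the syllable-by-syllable dipping described earlier; a longer one holds the music down through brief gaps.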
RM What about monitoring?
JR Some people say you should mix commercials on awful speakers, since that's what the listeners use. But I've always found that leads to awful mixes. You tend to compensate for the specific monitor's problems, and end up with a mix that sounds even worse on other brands of awful speaker. You also pump a lot into the extremes of the band, where it's wasted. Worst case, you've got so much extra energy in the top and bottom octave that the station's limiters push your whole mix down.
I mix at normal listening levels, on a pair of JBL 4410s in the nearfield. Then I check the mix on Auratones. Then I check it on Auratones again, but at the softest volume I can hear. If the mix is good, there isn't any change in the voice when you switch speakers. The only differences should be that the music is slightly louder on the fullrange boxes, and that it really goes away (thanks to Fletcher-Munson) when you listen very softly. As a final check, I listen to the whole mix in mono. If the mix is good, the only difference is the size of the field.
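The mono check is done by ear, but a rough numeric counterpart is to compare the level of the mono fold-down with the average level of the two channels. This sketch assumes an (L+R)/2 fold-down and mono-compatible material as the goal; `mono_change_db` is a hypothetical helper. It reads 0 dB when both channels are identical (a centered announcer), about -3 dB for a source panned hard to one side or for uncorrelated channels, and heads toward minus infinity as the channels approach polarity inversion, which is the cancellation the mono check is meant to catch.

```python
import math
from typing import List


def mono_change_db(left: List[float], right: List[float]) -> float:
    """Level of the mono fold-down (L+R)/2 relative to the average stereo level."""
    n = len(left)
    stereo_ms = (sum(l * l for l in left) + sum(r * r for r in right)) / (2 * n)
    if stereo_ms == 0:
        raise ValueError("both channels are silent")
    mono_ms = sum(((l + r) / 2) ** 2 for l, r in zip(left, right)) / n
    if mono_ms == 0:
        return float("-inf")  # total cancellation
    return 10 * math.log10(mono_ms / stereo_ms)
```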
Actually, there's one more way I check the mix. As much as possible, I'll listen to the spot on the air... even calling stations to get the broadcast schedule. If it sounds clean, strong, and full-range on the air then I've done my job right.
RM How do you get the final mix to the stations?
JR Most stations want a 7.5 ips open-reel dub, no n/r, no lineup tones, just a spoken slate with the name of the client, agency, and spot title, its running length, and whether it's stereo. Five-inch large hub reels are customary. Don't try to force extra quality on them: many stations don't want to bother with 15 ips, and DATs are very hard to cue. To make it worse, your tape doesn't go on the air. They take your 7.5 ips copy, and dub it to a continuous-loop cartridge or onto a compressed hard disk playback system. So I'll make those 7.5 ips tapes in realtime, directly from my digital master, on 456 or better tape, and on a well-aligned deck.
If there are a lot of stations I'll send a DAT to a dub house, and they'll run the open-reel tapes or ship it out via a private modem network like Digital Generations or Digital Courier International. If there are a heck of a lot of stations and a reasonable deadline I'll make a CDR and send it to a replicator.
TV tracks often go back to the video house for layback and duplication, so timecode DAT is the accepted medium.
RM Let's get to the toys in your Digital Playroom. You've mentioned the Orban DSE7000 workstation and Eventide DSP4000. Are those your primary tools?
JR The Orban is the most important piece of gear because it lets me work so quickly. It has eq and compression, and Lexicon reverb, built in. I have two DSP4000s, which may be overkill, but I wrote a lot of their production software so one of them serves as a development machine. Between those three devices I'm covered for just about any processing or trick sound I can imagine. Other digital tools include an Otari DTR90 timecode DAT and the ubiquitous Panasonic SV3700, a Marantz varispeed CD player and one of their CDRs, and a Zephyr. That last item is an MPEG device that compresses audio into low bitrate (128 kbps) data: I can call an announcer or radio station, and transfer 20 kHz stereo in realtime via ISDN. The quality is good enough to put on the air, and I've used Zephyr tracks on CDs. The digital signals are all switched through a Lighthouse 16x16 AES/EBU matrix.
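To see why 128 kbps counts as "low bitrate," compare it with uncompressed stereo. The sketch assumes a 16-bit linear PCM source at 44.1 or 48 kHz, which the interview doesn't specify. 128 kbps also happens to equal the two 64 kbps B channels of a basic-rate ISDN line.

```python
def pcm_kbps(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Bit rate of uncompressed linear PCM in kilobits per second."""
    return sample_rate_hz * bits * channels / 1000


ISDN_KBPS = 128  # the Zephyr rate quoted in the interview

for rate in (44100, 48000):
    pcm = pcm_kbps(rate, 16, 2)
    print(rate, pcm, round(pcm / ISDN_KBPS, 1))
```

Under those assumptions the MPEG coder is squeezing the audio by roughly 11:1 to 12:1.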
There's also a bunch of analog processors from my past lives, but these days I use them just for the synths: a Kurzweil K1000 and AX+, a Roland JX8P for wooshes and laser effects, and a bunch of general MIDI modules controlled by an ancient, steam-driven Mac IIfx. I've also got BetaSP and U-matic video decks, and some other timecode equipment. The video sync generator is routed to the Orban and the Otari, so everybody stays nicely phase-locked.
Add about 400 library music and sound effect CDs, a few comfortable chairs and a spare phone for the agency folks, and you're in business.