But while Dan's hearing extends almost half an octave higher than mine, it doesn't make any difference in the productions we mix. Video sound just isn't a high-quality medium. It's limited both by technology and by the way the industry operates. Understanding these limitations will do more to improve your tracks than having super-sensitive hearing. In fact, some of the assumptions made by the golden-eared hi-fi crowd are just plain wrong for today's media production.
This is not an abstract issue confined to broadcasters or set manufacturers. Any video producer who's creating for broadcast has to consider it as well. The problem is that conventional TV transmitters filter out signals above 15 kHz to protect their stereo subcarriers. But these filters aren't perfect. If there's a lot of high frequency audio in your mix, it can slip past and cause momentary interference. Most modern sets -- even mono ones -- mute the audio when this happens. Bottom line: a very bright track may have dropouts whenever there is sibilance, a cymbal crash, or any other sudden high-frequency burst. You won't hear this problem in the studio because it's actually happening in the home receiver.
So even if you know that your programs will eventually be broadcast digitally, you have to consider analog now or risk problems on the air. This wasn't a consideration until recently, because conventional on-line production naturally loses highs over multiple analog generations. But now that NLEs and audio workstations are capable of much better audio, you have to be extra careful. If you don't have a spectrum analyzer (see last September's Audio Solutions), use your equalizer very carefully and avoid any boosts above 12 kHz or so.
Most people -- even among the hi-fi crowd -- have no idea how high that really is. Download the tiny "Hi_end.zip" file. It contains a high-quality .wav file with about four seconds each of tones at 7.5 kHz, 10 kHz, 12.5 kHz, and 15 kHz. Glance at it in a waveform editor, and you'll see that the tones are equally pure and at the same volume. Then play it on the best speakers you own. Chances are, if you hear anything at all at the higher frequencies, it'll be just the soft buzzing caused by distortion in your playback system.
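If you can't get the download, a similar test file can be sketched in a few lines of Python. This is not the original "Hi_end" file, just a rough equivalent: the 48 kHz sample rate, the quarter-scale amplitude, and the function names are all assumptions made for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 48000   # assumed rate; comfortably above twice the 15 kHz top tone
TONE_SECONDS = 4      # "about four seconds each"
FREQUENCIES_HZ = [7500, 10000, 12500, 15000]
AMPLITUDE = 0.25      # assumed level, identical for every tone


def make_tone(freq_hz, seconds=TONE_SECONDS, rate=SAMPLE_RATE, amplitude=AMPLITUDE):
    """Return 16-bit integer samples of a pure sine tone."""
    count = int(seconds * rate)
    peak = amplitude * 32767
    return [int(round(peak * math.sin(2 * math.pi * freq_hz * n / rate)))
            for n in range(count)]


def write_test_file(path, frequencies=FREQUENCIES_HZ):
    """Write the tones back to back into a mono 16-bit .wav file."""
    samples = []
    for freq in frequencies:
        samples.extend(make_tone(freq))
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)            # 2 bytes = 16 bits per sample
        out.setframerate(SAMPLE_RATE)
        out.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return len(samples)


if __name__ == "__main__":
    write_test_file("hi_end_sketch.wav")  # hypothetical file name
```

Because every tone is generated with the same amplitude, any difference in loudness you hear is coming from your playback chain and your ears, not from the file.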
[By the way, the file is zipped because test tones are extremely redundant, which makes them particularly friendly to that kind of compression. Normal audio signals don't zip or stuff very well at all.]
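You can see the redundancy effect with a quick experiment. The Python sketch below compresses one second of a pure tone and one second of random noise with zlib, which uses the same general family of compression as zip. The noise is only a crude stand-in for complex program material (real audio usually lands somewhere in between), and the sample rate and levels are assumptions.

```python
import math
import random
import struct
import zlib

SAMPLE_RATE = 48000  # assumed sample rate


def pack16(samples):
    """Pack integer samples as little-endian 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(samples), *samples)


def tone_bytes(freq_hz, seconds=1.0):
    """One pure sine tone: the same short cycle repeated over and over."""
    count = int(seconds * SAMPLE_RATE)
    return pack16([int(round(8000 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)))
                   for n in range(count)])


def noise_bytes(seconds=1.0, seed=0):
    """Random samples: no repetition for the compressor to exploit."""
    rng = random.Random(seed)
    count = int(seconds * SAMPLE_RATE)
    return pack16([rng.randint(-8000, 8000) for _ in range(count)])


def compressed_fraction(data):
    """Compressed size divided by original size (smaller means it packs better)."""
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    print("tone :", round(compressed_fraction(tone_bytes(7500)), 3))
    print("noise:", round(compressed_fraction(noise_bytes()), 3))
```

The tone should shrink to a small fraction of its original size, while the noise barely shrinks at all.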
Even if your audience is purely web-based or uses other digital media, extreme high frequencies are often more trouble than they're worth. They can cause problems in some equipment used for transmission and DVD mastering. And they can actually make compression algorithms like .mp3 and AAC sound worse at a given bitrate, by robbing data from the more important mid-range.
Some of my friends in music and film mixing have the luxury of working almost exclusively in surround. For broadcast projects, stereo is usually the nominal standard. But you should avoid the wide soundstage preferred by my golden-ear friends: good TV stereo is actually glorified mono.
The problem, again, is those darned viewers. Many don't have stereo receivers at all. Those that do often have the set off to one side of the viewing position, or sit so far away from the speakers that any stereo effect is destroyed. Some small home-theater setups have the channels reversed, and I've seen a few where one channel was completely missing. Stereo reception may be impossible in fringe areas, and some cable systems carry stereo satellite channels as mono.
For these viewers, some stereo sounds will seem softer than they did in your studio. A few may disappear entirely if there are phase errors. Either way, what they hear will be different from what you mixed. And your track can turn unintentionally funny if you're panning sounds across the screen to match visuals, but the channels are reversed at the home set or have been accidentally flipped at the station or dub house.
Two things will help you avoid all these problems. First, adopt the networks' attitude to stereo: all dialog and plot-critical effects are centered mono (large effects can be stereo, but still must be centered). Usually the background ambiences, musical score, and crowd reactions are the only elements in true stereo. If you're mixing surround, don't send anything important to the rear: it will disappear for most listeners.
Second, check your mix frequently in mono. This not only alerts you to phase problems, it also keeps you aware of how the balance will be affected by width (a voice-over may sound fine against a wide stereo score in the studio, but be too soft when the image narrows to mono). Every professional mixing suite has a mono button in the monitor circuit. If you're mixing on a desktop, you can add one. Instructions are in my September 1998 column.
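To put rough numbers on what the mono button reveals, here is a minimal Python sketch. It assumes a simple fold-down that averages the two channels, (L + R) / 2; real monitor circuits and receivers may sum and scale differently, and the function names are made up for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def to_db(ratio):
    """Convert an amplitude ratio to decibels; silence becomes -infinity."""
    return 20 * math.log10(ratio) if ratio > 0 else float("-inf")


def mono_sum(left, right):
    """Assumed fold-down: average the two channels sample by sample."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def mono_level_change_db(left, right):
    """Level of the mono fold-down relative to the louder stereo channel."""
    reference = max(rms(left), rms(right))
    if reference == 0:
        return 0.0  # both channels silent, so nothing changes
    return to_db(rms(mono_sum(left, right)) / reference)


if __name__ == "__main__":
    voice = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
    silence = [0.0] * len(voice)
    flipped = [-x for x in voice]
    print("centered    :", mono_level_change_db(voice, voice))
    print("hard-panned :", mono_level_change_db(voice, silence))
    print("out of phase:", mono_level_change_db(voice, flipped))
```

Under that assumption a centered sound keeps its level, a hard-panned sound drops about 6 dB relative to it, and a sound with one channel inverted cancels completely. That is the softer-or-missing effect a mono listener hears.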
There's one area where the golden-ears types and I agree. When it comes to bits, more is better. The number of bits determines how much subtle detail can be recorded without being lost to system noise: each bit pushes the noise back another 6 decibels, which can translate to louder playback levels without noticeable hiss.
This makes the choice of 12-bit recording in your camera a thorny issue. Many cameras have a mode that shaves a few bits (and also lowers the sample rate, which is more benign) to accommodate 4-track recording. The extra tracks can seem attractive if you need to use extra mics but don't have anyone to control them at the shoot. With each mic on its own track, mixing can be postponed until post-production.
But it's self-defeating, because it's almost impossible to get good 12-bit sound without constantly tweaking the levels during recording. Figure 1 shows why.
Figure 1: Dialog can actually be much softer than its nominal meter reading, and approach the noise floor if you're using 12-bit mode.
Twelve bits represents 72 dB between the loudest possible sound (zero dBfs, where anything louder is just distortion) and noise. Camera manufacturers recommend staying below -12 dBfs as a safety margin. Most people record dialog with the meter reading around -18 dBfs for even more safety. That's still 54 dB louder than the 12-bit noise level, which is moderately acceptable.
But dialog doesn't stay at one level. Even in a reasonably controlled performance, important parts of some words can be 40 dB softer than others in the same sentence. That puts them around -58 dBfs, barely two bits above the 12-bit noise floor. When a viewer turns up the volume to hear those words (or you use a level compressor on the mix, which accomplishes the same thing), the noise will be overwhelming. Only a true 16-bit recording would let you capture those sounds cleanly.
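The arithmetic is easy to check for yourself. This short Python sketch uses the 6 dB-per-bit rule of thumb and the levels quoted above; the function names are just illustrative.

```python
DB_PER_BIT = 6               # rule of thumb: each bit pushes noise down 6 dB
NOMINAL_DIALOG_DBFS = -18    # typical meter reading for dialog
SOFT_SYLLABLE_DROP_DB = 40   # how far soft parts of words can fall


def noise_floor_dbfs(bits):
    """Approximate noise floor for a given word length, relative to full scale."""
    return -DB_PER_BIT * bits


def margin_db(level_dbfs, bits):
    """How far a signal level sits above the noise floor, in dB."""
    return level_dbfs - noise_floor_dbfs(bits)


def margin_bits(level_dbfs, bits):
    """The same margin expressed as a number of bits."""
    return margin_db(level_dbfs, bits) / DB_PER_BIT


if __name__ == "__main__":
    soft = NOMINAL_DIALOG_DBFS - SOFT_SYLLABLE_DROP_DB
    for bits in (12, 16):
        print(bits, "bits: nominal margin", margin_db(NOMINAL_DIALOG_DBFS, bits),
              "dB, soft-syllable margin", margin_db(soft, bits), "dB")
```

At 12 bits the nominal level has 54 dB of margin, but the soft syllables are left with only 14 dB, a little over two bits. At 16 bits those same syllables sit 38 dB above the noise.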
Twenty-four bits is becoming a standard at film shoots, even though the final release format is often 16 bits, because there are advantages to using the highest possible word length whenever possible. It's a good plan for video production as well: if you absolutely must shoot in 12-bit mode (and I don't recommend it), convert the clip to 16 bits as soon as it enters your NLE. If you can't do that in software, consider redigitizing with a good sound card or external converter. It won't eliminate the noise -- virtually nothing can -- but will prevent things from getting worse when you equalize, add fades, or mix the track.
Jay Rose (firstname.lastname@example.org) is a Clio- and Emmy-winning sound designer and author of Producing Great Sound for Digital Video. Read about it and his other audio antics at www.JayRose.com. Jay's formerly-golden ears are now barely silver, matching clumps of his hair.