How digital audio ate itself and the music industry
Part One: The birth of a new science
The numbers game
U-matic companion: Sony's PCM-1630 adapter
With digital, go too loud and you’ve run out of road. Excessive peaks can’t be digitally encoded accurately because the A/D converter simply hasn’t enough numbers to do it, and so it just flatlines at the top level. There are no harmonic pleasantries to be had up there, just a disturbing clipping effect. Yet go too quiet, and you get granulation noise: the A/D conversion can’t detect significant changes in level, so the binary signal stays at a high or low state for "unnatural" durations, which delivers erroneous tones in the quiet passages. Sounds like a problem or two there.
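To put some numbers on both ends of that problem, here is a minimal sketch of a 16-bit quantiser. The function name, the 32-sample test tones and the 0.6 LSB level are illustrative choices, not anything taken from a real converter.

```python
import math

def quantise(sample, bits=16):
    """Map a sample in the nominal range -1.0..1.0 to a signed integer code.

    Anything beyond full scale is pinned to the largest or smallest code,
    which is the flat-topped clipping described above.
    """
    top = 2 ** (bits - 1) - 1       # 32767 for 16 bits
    bottom = -(2 ** (bits - 1))     # -32768 for 16 bits
    code = round(sample * top)
    return max(bottom, min(top, code))

FULL_SCALE = 2 ** 15 - 1

# One cycle of a sine wave pushed 50 per cent past full scale.
loud = [quantise(1.5 * math.sin(2 * math.pi * n / 32)) for n in range(32)]

# One cycle of a sine wave whose peak is only 0.6 of the smallest step (LSB).
quiet = [quantise(0.6 / FULL_SCALE * math.sin(2 * math.pi * n / 32))
         for n in range(32)]

print(max(loud), loud.count(max(loud)))  # top code, and how many samples sit on it
print(sorted(set(quiet)))                # the only codes the quiet tone can reach
```

The loud tone spends several consecutive samples stuck on the top code, while the quiet tone is reduced to a crude three-level staircase that bears little resemblance to a sine wave.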
Well, the first one is fixed easily: don’t record too loud, or have a limiter (dynamics processor) in tow to flatten any wayward peaks. Going from 16-bit to 24-bit converters allows significantly more headroom too. The second issue, with low-level recording, is fixed by introducing a randomising element, so those quiet passages don’t sound weird. And you know what that randomising element is? Noise. Low-level noise, called dither, fixes the problem at the other end of the dynamic range. So don’t ever let anyone tell you digital systems aren’t noisy – digital audio depends on analogue noise patterns to mask the presence of its own artifact, granulation noise.
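A rough sketch of why the noise helps, working in units of one quantisation step (LSB). The triangular (TPDF) dither used here is one common choice and is an assumption on our part, as are the helper names, the 0.4 LSB tone and the fixed random seed.

```python
import math
import random

def quantise_lsb(value_in_lsb, rng=None):
    """Round a value (expressed in LSB units) to the nearest integer code.

    If rng is given, TPDF dither (the sum of two uniform values, each
    within +/-0.5 LSB) is added before rounding.
    """
    if rng is not None:
        value_in_lsb += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(value_in_lsb)

def tone_level(codes, period):
    """Estimate, by correlation, the amplitude of the sine with this period."""
    n = len(codes)
    return 2 / n * sum(c * math.sin(2 * math.pi * i / period)
                       for i, c in enumerate(codes))

PERIOD = 48
N = 48_000  # a whole number of periods
tone = [0.4 * math.sin(2 * math.pi * i / PERIOD) for i in range(N)]

rng = random.Random(1)  # fixed seed so the run is repeatable
plain = [quantise_lsb(v) for v in tone]
dithered = [quantise_lsb(v, rng) for v in tone]

print(set(plain))                    # without dither every code is 0
print(tone_level(dithered, PERIOD))  # with dither the estimate lands near 0.4
```

Without dither, a tone this quiet never crosses a rounding threshold and simply disappears. With dither, the individual codes jump about at random, but on average they track the original tone, so it survives underneath a bed of hiss.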
As for the sample rates, have you ever wondered why the seemingly arbitrary 44.1kHz and 48kHz were chosen? Let’s start off with the upper limit of human hearing, say 20kHz. Now let’s double it, because A/D conversion needs at least two samples per cycle to capture both the peak and the trough of a sine wave at this frequency, and let’s add a bit more top end for good measure.
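The doubling rule can be checked with a few lines of arithmetic. In this sketch (the function name and the 25kHz test tone are our own picks), a tone above half the sample rate produces exactly the same samples as a lower tone inside the audio band, which is why the converter needs that margin, plus a filter ahead of it, to keep such frequencies out.

```python
import math

def sample_tone(freq_hz, rate_hz, count):
    """Return `count` samples of a unit cosine at freq_hz, sampled at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE = 44_100
nyquist = RATE / 2  # 22,050Hz: the highest frequency captured unambiguously

above = sample_tone(25_000, RATE, 64)         # a tone above that limit
alias = sample_tone(RATE - 25_000, RATE, 64)  # 19,100Hz, well within hearing

# Largest difference between the two sets of samples (rounding error only).
worst = max(abs(a - b) for a, b in zip(above, alias))
print(nyquist, worst)
```

Once sampled, the 25kHz tone is indistinguishable from a 19.1kHz one, so anything above half the sample rate has to be removed before conversion rather than after.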
Now, what can we record all this data on, given the technology available back in the late 1970s? Enter Sony’s trusty U-matic video recorder.
Before DAT, Sony migrated its U-matic-based digital recording to Betamax.
Rather than have tape spinning at enormous speeds, you could use a rotary head machine and spray the stereo digital audio data as a multiplexed video signal diagonally onto tape. Both the 44.1kHz and 48kHz sampling frequencies were chosen because they were mathematically convenient for both PAL and NTSC U-matic recorders. For early CD mastering, these video tape recorders were fitted with PCM-1610/1630 (pulse code modulation) adaptors. The fact remains that the 44.1kHz sample rate of most of the music we listen to today has its origins in a video recording system that went into production exactly 40 years ago.
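The article doesn’t spell out the sums, but the commonly cited arithmetic for 44.1kHz runs as follows. The figures of three samples per video line, 245 usable lines per field at 60 fields per second and 294 at 50 fields per second are the usual textbook values rather than anything stated above, so treat them as assumptions.

```python
SAMPLES_PER_LINE = 3  # assumed: audio samples stored on each video line

def pcm_sample_rate(usable_lines_per_field, fields_per_second):
    """Audio samples per second that fit into the usable video lines."""
    return usable_lines_per_field * fields_per_second * SAMPLES_PER_LINE

ntsc = pcm_sample_rate(245, 60)  # 525-line system, nominal 60 fields per second
pal = pcm_sample_rate(294, 50)   # 625-line system, 50 fields per second

print(ntsc, pal)  # 44100 44100
```

Both television systems land on the same figure, which is what made it such a convenient number for a recorder that had to work on either side of the Atlantic.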
Still, the ideas behind the U-matic (and later, Betamax variants) as the CD mastering machine were not without merit and RDAT, in essence, was just a miniaturised version. And whereas the analogue VCR had led the way to provide a storage mechanism for digital recordings, the RDAT recorder evolved into a convenient storage system for back-up and archiving computer data. Sony only discontinued RDAT in 2005. Eighteen years is a pretty good innings for a format.
Perhaps more interesting still are the parallels between the first analogue noise reduction systems from the likes of Dolby and dbx and the later compression techniques of the MP3 era that are still in use today.
Dolby B-type Signetics IC from 1973
Back in the day, just about every pre-recorded cassette tape had Dolby B stamped on the cover. The Dolby NR (noise reduction) technology owed a lot to a process called "compansion" – compression on recording and expansion on playback – with a few clever tweaks. Compression meant attenuating the loud bits and leaving the quiet passages unaffected. It has many creative uses, and crucially with analogue recording it enabled quieter sections to be recorded at higher levels (minimising the noise) because the louder sounds were being levelled out automatically, thus avoiding overloading.
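The compress-then-expand idea can be sketched in a few lines. Note that this uses a plain 2:1 square-root law across the whole signal, which is closer in spirit to dbx than to Dolby B's level-dependent, high-frequency processing, and the constant hiss value is a made-up figure for illustration only.

```python
import math

def compress(x, ratio=2.0):
    """Record side: squeeze the dynamic range. x is in the range -1.0..1.0."""
    return math.copysign(abs(x) ** (1 / ratio), x)

def expand(y, ratio=2.0):
    """Playback side: the exact inverse of compress()."""
    return math.copysign(abs(y) ** ratio, y)

TAPE_HISS = 0.01  # hypothetical noise the tape adds to whatever is recorded

quiet = 0.01  # a quiet passage, 40dB below full scale

direct = quiet + TAPE_HISS                       # recorded as-is
companded = expand(compress(quiet) + TAPE_HISS)  # compress, record, expand

error_direct = abs(direct - quiet)
error_companded = abs(companded - quiet)
print(error_direct, error_companded)
```

Because the quiet passage goes onto tape at a higher level, the hiss added along the way is pushed back down when the signal is expanded on playback, leaving a much smaller error than recording it straight.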