AUDIO BASICS
The characteristics of any wave, and therefore any sound, can be roughly described using two simple variables: frequency and amplitude.
Frequency
Frequency is a measure of how frequently a wave cycle repeats, counted in cycles per second, or Hertz (Hz) for short. Frequency is related to pitch, our perception of whether a sound is a low rumble or a high squeal. The frequency range of human hearing theoretically runs from 20 Hz at the low end to 20,000 Hz at the high end (your actual mileage may vary). You'll often see high-frequency numbers written in kilohertz (kHz), which is metric-speak for 'thousand Hertz'.
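To make the numbers concrete, here is a minimal Python sketch (assuming NumPy is available) that computes one second of a 440 Hz sine wave; the frequency variable is exactly the cycles-per-second figure described above, and the 44.1 kHz sample rate is explained later in this chapter:

    import numpy as np

    SAMPLE_RATE = 44_100        # samples per second (see the digital audio section below)
    frequency = 440.0           # Hz, i.e. cycles per second; 440 Hz is concert pitch A
    duration = 1.0              # seconds

    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    tone = np.sin(2 * np.pi * frequency * t)   # one full cycle repeats 440 times per second

Doubling the frequency to 880.0 yields the same note an octave higher.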
Amplitude
The second variable that describes a wave is amplitude, a measure of the wave's energy level. Amplitude relates to our perception of volume or loudness. Big waves with lots of energy are high-amplitude and sound loud; small waves with little energy are low-amplitude and sound soft or quiet. Amplitude is measured in decibels (dB). Decibels are talked about in a couple of different ways. When measuring sound pressure levels (dB SPL), zero dB is defined as the effective bottom limit of human hearing, the point of silence. 120 dB is the effective top limit of human hearing, the veritable threshold of pain, exemplified by the sound of a jet aircraft (or a Who concert). Another common use of decibels is the full-scale measurement. For example, the dynamic range of a compact disc is 96 dB, with 0 dB representing maximum loudness and -96 dB representing silence. From here on we'll dispense with the dB SPL measurement; assume that we're talking about dBFS (decibels Full Scale).
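The dB arithmetic is easy to verify yourself. Here is a small sketch in Python (standard library only); the 96 dB figure for CD audio falls straight out of its 16-bit resolution, roughly 6 dB per bit:

    import math

    def to_dbfs(amplitude, full_scale=1.0):
        """Convert a linear amplitude to dBFS (0 dBFS = full scale)."""
        return 20 * math.log10(amplitude / full_scale)

    print(to_dbfs(1.0))              # 0.0 dBFS: maximum loudness
    print(to_dbfs(0.5))              # about -6 dBFS: half the amplitude
    print(20 * math.log10(2 ** 16))  # about 96.3 dB: the dynamic range of a 16-bit CD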
Pitch
As we have seen, pitch is the term for how high or low a sound is perceived by the human ear. It is determined by a sound's frequency. Middle C on the piano, for example, vibrates at about 261 cycles per second (261 Hz). The higher the frequency, the higher the pitch. But most sounds are a mixture of waves at various frequencies, and musical tones always contain many frequency components, known as harmonics.
Here is a harmonic series (N, 2N, 3N, 4N, according to Fourier's theorem); a short synthesis sketch follows the list:
· 200 Hz: fundamental
· 400 Hz: second harmonic
· 600 Hz: third harmonic
· 800 Hz: fourth harmonic, etc.
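To see how such a series combines into a single tone, here is a minimal additive-synthesis sketch in Python (assuming NumPy; the 1/n amplitude roll-off is an illustrative choice, not part of the theorem):

    import numpy as np

    SAMPLE_RATE = 44_100
    fundamental = 200.0                        # Hz, as in the series above
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE   # one second of time stamps

    # Sum N, 2N, 3N, 4N at decreasing amplitudes: per Fourier, a periodic tone
    # is a sum of sinusoids at integer multiples of the fundamental.
    tone = sum(np.sin(2 * np.pi * n * fundamental * t) / n for n in range(1, 5))
    tone /= np.max(np.abs(tone))               # normalize to full scale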
The various frequencies that make up a sound can be amplified or reduced with equalization to change the sound's overall tone and character.
Timbre
This notion is difficult to quantify. Timbre is defined as the tone, color, or texture of a sound. It enables the brain to distinguish one type of instrument from another.
Effects
Sound waves reflect and disperse off the various surfaces in our environment, such as the walls of a concert hall. We rarely hear the pure direct vibration of a sound wave before it is masked or altered by the coloration of thousands of small reflections. Sounds are also colored by the materials and substances they travel through. Changing the environment changes the tone quality, equalization, and timbre of a sound. By using audio effects, you create these changes yourself.
EQUALIZATION, GATES & DYNAMICS PROCESSING
Equalization
Equalization (EQ for short) is best known from the bass and treble knobs found on any home stereo. In the most basic scenario, a filter divides the range of frequencies across the audible spectrum into two bands: one band contains the low end (bass) and the other contains the upper range (treble). You use the bass and treble controls to boost or cut the level of the signal within each band.
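Here is a hedged sketch of that two-band idea in Python (assuming NumPy and float samples in the -1..1 range): a one-pole low-pass filter extracts the bass band, the treble band is whatever is left over, and each band gets its own gain, just like the two tone knobs. The smoothing coefficient alpha is an illustrative choice, not a calibrated crossover:

    import numpy as np

    def two_band_eq(x, bass_gain=1.0, treble_gain=1.0, alpha=0.05):
        """Crude bass/treble control: split with a one-pole low-pass, re-mix with gains."""
        bass = np.empty_like(x)
        state = 0.0
        for i, sample in enumerate(x):      # y[i] = y[i-1] + alpha * (x[i] - y[i-1])
            state += alpha * (sample - state)
            bass[i] = state
        treble = x - bass                   # the residual: everything the low band missed
        return bass_gain * bass + treble_gain * treble

With both gains at 1.0 the two bands sum back to the original signal unchanged.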
Graphic EQ
A more complicated type of equalizer you may have seen is the graphic EQ. The typical graphic EQ filters the frequency spectrum into many bands, perhaps ten or twenty, so you can make more precise adjustments to the sound by boosting or cutting the volume level of narrow frequency ranges.
Gates & Dynamics Processing
A gate is a common audio circuit that lets you turn the flow of a signal on or off. The gate continuously measures the signal being fed to it. If the input signal is at a low amplitude (quiet), the gate stays shut, allowing no signal to pass. If the amplitude of the input signal rises above a set line (i.e., is 'loud enough'), the gate opens and passes the signal to its output. This 'loud enough' line, which triggers the gate's opening and closing, is known as the threshold.
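A noise gate can be sketched in a couple of lines. This minimal Python version (assuming NumPy and float samples) works sample by sample and omits the attack/release smoothing a real gate would add to avoid clicks:

    import numpy as np

    def gate(x, threshold=0.02):
        """Hard gate: pass samples at or above the threshold, silence the rest."""
        return np.where(np.abs(x) >= threshold, x, 0.0)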
Downward Compression
A simple use of this threshold mechanism is the process known as limiting. A limiter measures an input signal; when the amplitude is below the threshold, the signal passes untouched. As the input amplitude rises above the threshold, attenuation (cut) is applied to the signal, reducing unwanted peaks in the audio material. This is also commonly known as downward compression.
Upward Compression
This is a process similar to limiting, except in this case gain (volume boost) is applied to signals which fall below the threshold. This increases the volume level of soft passages; signal that exceeds the threshold is passed unamplified.
The Compressor/Limiter
A common studio tool is the compressor/limiter, typically a hardware device that combines the two functions described above. Imagine that you're watching a volume meter with your hand on a volume knob: when the signal is low you crank it up; when the signal is too hot, you turn it down. Thus soft program material is boosted in volume, loud program material is dropped in volume, and the dynamic range (the difference between softest and loudest) of the signal is reduced. Compressor/limiters are very useful for smoothing out uneven volume levels in recordings.
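The hand-on-the-knob behaviour boils down to a gain curve: above the threshold the output level rises only 1/ratio as fast as the input (the downward compression described above), and a make-up gain can then lift the whole signal, boosting the soft passages. Here is a hedged, static sketch in Python (assuming NumPy); real units smooth the gain over time with attack and release controls, which this omits:

    import numpy as np

    def compress(x, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
        """Static downward compression: levels above the threshold rise 1/ratio as fast."""
        eps = 1e-12                                        # avoid log10(0) on silent samples
        level_db = 20 * np.log10(np.abs(x) + eps)          # instantaneous level in dBFS
        over = np.maximum(level_db - threshold_db, 0.0)    # dB above the threshold
        gain_db = -over * (1.0 - 1.0 / ratio) + makeup_db  # cut the excess, then make up
        return x * 10 ** (gain_db / 20)

With ratio=4 and a threshold of -20 dB, a peak at 0 dBFS comes out at -15 dBFS: the 20 dB of excess has been squeezed down to 5 dB.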
The Expander
An expander is essentially the opposite of a compressor/limiter; it expands the dynamic range by exaggerating the differences between soft and loud passages. Expanders attenuate (cut) the volume of low-amplitude signals and/or add gain to (boost) the volume of high-amplitude signals. The process of attenuating low-amplitude signals is called downward expansion; the corresponding process of adding more gain to signal peaks is called upward expansion. Downward expansion is helpful for noise reduction.
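Downward expansion is the same dB-domain trick in reverse: levels below the threshold fall faster than they came in, pushing noise even further down. A hedged sketch along the lines of the compressor above:

    import numpy as np

    def expand_downward(x, threshold_db=-40.0, ratio=2.0):
        """Static downward expansion: levels below the threshold fall ratio times as fast."""
        eps = 1e-12
        level_db = 20 * np.log10(np.abs(x) + eps)
        under = np.maximum(threshold_db - level_db, 0.0)   # dB below the threshold
        gain_db = -under * (ratio - 1.0)                   # extra cut for quiet samples
        return x * 10 ** (gain_db / 20)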
DIGITAL AUDIO, SAMPLE RATE AND RESOLUTION
From Analog to Digital
As you know, a sound wave is a series of periodic vibrations. A microphone, for example, 'translates', or converts, these acoustic waves into electrical ones. In this analog state, every new conversion degrades the sound a little more: even the smallest amplitude variation distorts the signal, and every copy brings a flattening, a loss of dynamics, more background noise, and so on. With digital sound, on the contrary, making a copy means copying a list of numbers, a trifle for the computer.
The most common format for the digital representation of an audio signal is PCM (Pulse Code Modulation): sound waves are translated into a series of numbers. When we use a mike to convert sound into an electrical signal, that signal is then translated into numeric values by an ADC (A/D converter, Analog-to-Digital Converter). And, since it is impossible in the digitizing process to record the infinite amount of data that characterizes a sound wave, samples are taken at regular intervals, like 'snapshots' of sound, with the sample rate corresponding to the number of samples per second. The digital signal is therefore discontinuous: it is defined neither at every moment nor for every amplitude, so the computer has to reconstruct the waveform by stringing the samples back together; more precisely, by calculating the most likely curve between two samples.
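The snapshot idea fits in a few lines of Python (standard library only). Here the continuous input wave is played by an ordinary function, and PCM is simply 'measure it at regular intervals and round each measurement to an integer':

    import math

    SAMPLE_RATE = 8_000                 # snapshots per second
    BITS = 16                           # resolution, discussed below
    FULL_SCALE = 2 ** (BITS - 1) - 1    # 32767, the largest 16-bit signed sample

    def analog(t):                      # stand-in for the continuous acoustic wave
        return math.sin(2 * math.pi * 440.0 * t)

    # One second of PCM: a snapshot every 1/8000 s, rounded to the nearest integer step.
    pcm = [round(analog(n / SAMPLE_RATE) * FULL_SCALE) for n in range(SAMPLE_RATE)]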
Sample Rate
Sample rate has a direct bearing on two things: audio quality and file size. So, when sound is being converted into digital information, the number of samples has to be considered. And that's where the Nyquist–Shannon theorem comes in: the sample rate must be equal to or greater than twice the maximum frequency of a given signal. Why? Because the sample rate defines an audio file's upper frequency limit. As we have seen, the human ear perceives sounds up to about 20,000 Hz, which means the sample rate should be at least 40,000 Hz. Luckily, many applications can get by with relatively low sample rates. The human voice, for example, contains frequencies up to around 10 kHz, so it theoretically needs a sample rate of 20 kHz. Nevertheless, even limited to 4 kHz, which means a sample rate of 8 kHz, the human voice is still comprehensible, and this is what the telephone uses for long-distance transmission. But sometimes you're in for a surprise, so always make some tests and systematically listen to the results. When the sample rate is too low with regard to the frequencies of the original audio signal, you get aliasing, a special sort of background noise / distortion.
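Aliasing is easy to demonstrate numerically. In this sketch (assuming NumPy), a 5 kHz tone is sampled at 8 kHz, well short of the 10 kHz the theorem demands, and the samples that come out are identical to those of a 3 kHz tone (8 kHz - 5 kHz): the too-high frequency folds back into the audible range as a phantom:

    import numpy as np

    SAMPLE_RATE = 8_000                    # Nyquist limit: 4 kHz
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE

    high = np.cos(2 * np.pi * 5_000 * t)   # 5 kHz input, above the Nyquist limit
    alias = np.cos(2 * np.pi * 3_000 * t)  # 3 kHz = 8 kHz - 5 kHz

    print(np.allclose(high, alias))        # True: the two tones are indistinguishable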
Some standard sample rates:
· 32 kHz: digital FM radio (bandwidth limited to 15 kHz)
· 44.1 kHz: professional audio and audio CD
· 48 kHz: recording standard for MiniDisc and some DATs, as well as some professional digital multitrack recorders
· 96 kHz, and up to 192 kHz (2 × 96): DVD
Bit Resolution
Bit resolution is another key factor in defining digital audio quality. It is the number of bits used to represent each sample, and it determines how precisely a sound's dynamic range is represented. To understand this notion, a little detour on the binary system's wild side... The binary system is based on two values only, 0 and 1. Binary coding produces a digital signal composed of a series of numbers called bits (short for binary digits), organized in a very specific way. An 8-bit series is called a byte; it has 2^8 (or 256) possible combinations, from 0 to 255 (from '00000000' to '11111111'). 16 bits have 2^16 (65,536) combinations, and 24 bits have 2^24 (16,777,216) combinations, 256 times more than 16 bits! That's why resolution is essential for sound quality. Remember, an audio CD uses 16 bits. Practically speaking, 16-bit files have a better signal-to-noise ratio than 8-bit files, which means they have much less audible noise. But the lower the sample rate and the resolution, the smaller the audio file in terms of memory. And this is where the dilemma begins... the choice of sample rate and bit resolution drastically defines sound quality.
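The value counts above, and the rule of thumb that each extra bit buys roughly 6 dB of signal-to-noise ratio, can be checked in a few lines of Python (standard library only; real converters fall a little short of these ideal figures):

    import math

    for bits in (8, 16, 24):
        values = 2 ** bits                  # distinct amplitude steps
        snr_db = 20 * math.log10(values)    # ideal dynamic range: about 6.02 dB per bit
        print(f"{bits:2d} bits: {values:>10,} values, ~{snr_db:.0f} dB")

    # 8 bits: 256 values, ~48 dB; 16 bits: 65,536, ~96 dB; 24 bits: 16,777,216, ~144 dB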
Keep in mind:
· Sample rate, expressed in kHz, corresponds to the number of samples per second. It has to be equal to or greater than twice the signal's maximum frequency.
· Digital resolution, expressed in bits, determines the number of values available to represent digital data.
· The quality of digital audio is defined by both sample rate and bit resolution.
Digital Audio Compression
A codec (coder/decoder) corresponds to a whole set of compression and decompression algorithms. There is a surprising number of codecs; it would be difficult to list them all. The compression bit rate corresponds to the number of bits that one second of data occupies in the compressed file. You can also talk about the compression ratio, expressed like this: 10:1, 12:1, etc. Digital compression techniques are closely related to the methods used for A/D conversion.
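As a worked example of a compression ratio, here is a back-of-the-envelope calculation in Python (the 128 kbps MP3 setting is just a common illustrative choice): one second of CD audio occupies 44,100 samples × 16 bits × 2 channels, about 1,411 kilobits, so a 128 kbps MP3 compresses it by roughly 11:1:

    cd_bitrate = 44_100 * 16 * 2    # samples/s * bits/sample * channels = 1,411,200 bits/s
    mp3_bitrate = 128_000           # a common MP3 setting, in bits per second
    print(f"compression ratio ~ {cd_bitrate / mp3_bitrate:.1f}:1")   # about 11.0:1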
There are two types of digital compression:
· Destructive, or lossy, compression, with loss of data: it eliminates bits, sometimes without audibly losing quality.
· Non-destructive, or lossless, compression, with no data loss: a set of algorithms that preserve the original data through the compression / decompression process.
Destructive compression is based on the fact that humans almost never hear frequencies above 20 kHz; that’s why it’s also called perceptual encoding. It also takes advantage of the fact that certain frequencies are masked by others. For comparison, the JPEG format used for images is based on destructive compression.
Compression technologies can be standard or proprietary. The Moving Picture Experts Group works under the co-direction of the ISO (International Organization for Standardization) and the IEC (International Electrotechnical Commission) to establish video and audio compression standards. The MPEG format is a type of audio compression based on the perceptual encoding techniques mentioned above. In the audio field, the most popular format (and one of the most powerful within the MPEG family) is MPEG Layer 3, MP3 for short, developed in 1987 by the Fraunhofer Institute. Layer 3 allows a reduction to as little as 1/12th of the original size without sacrificing much quality. But be careful: once again, everything depends on the original signal. An example: MP3 is excellent for electronic music, but much less so for jazz, classical, and other acoustic music, which generates many harmonics, and harmonics, as you know, determine the timbre and tone color of an instrument. MP3 digests dense harmonic series very badly and may turn them into a sort of 'mash' in songs with many acoustic instruments.