Digital Sampling and Fidelity Issues
From DSPWiki
Analog vs. Digital
The difference between analog and digital recording is that analog recording stores a continuous representation of the signal: the voltage carried along a wire represents the signal directly. Digital recording stores and transmits the signal as 1s and 0s. The signal exists as a voltage only before the ADC (Analog to Digital Converter) and after the DAC (Digital to Analog Converter).
Analog: voltage in a wire represents the signal
Digital: 10100101001001
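To make the "1s and 0s" idea concrete, here is a minimal sketch (not from the original page) that samples one cycle of a sine wave and stores each sample as a 4-bit two's-complement word. The function name `sample_and_encode`, the 8-point sampling, and the 4-bit word size are illustrative assumptions.

```python
import math

def sample_and_encode(num_samples: int, bits: int) -> list[str]:
    """Sample one cycle of a sine wave and store each sample as a two's-complement bit string."""
    full_scale = 2 ** (bits - 1) - 1   # largest positive code, e.g. 7 for 4 bits
    mask = (1 << bits) - 1             # keep only the low `bits` bits (two's complement)
    words = []
    for n in range(num_samples):
        voltage = math.sin(2 * math.pi * n / num_samples)  # the "analog" value at sample n
        code = round(voltage * full_scale)                  # nearest integer level
        words.append(format(code & mask, f"0{bits}b"))      # written as 1s and 0s
    return words

print(" ".join(sample_and_encode(8, 4)))  # 0000 0101 0111 0101 0000 1011 1001 1011
```

Between the ADC and the DAC, only these bit patterns exist; the voltages are gone.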
Positive and Negative Frequency
All real signals have equal amounts of positive and negative frequencies: every component at a positive frequency f has a mirror-image component at -f with the same magnitude. According to Jorge Costa, some guy I met who's writing a Reaktor-like program called Outsim (http://www.outsim.com), "One way to realize that the negative spike exists is to shift it up in frequency until it becomes another positive spike. That can be done by multiplying two sine waves. In general, when we multiply a wave by a sine wave, we're convolving the two frequency domain transforms which is the same as shifting all the wave's frequencies so they become centered around the frequency of the sine wave rather than the zero frequency origin. That way we can understand that the negative frequencies exist".
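The quote can be checked numerically. This sketch uses a plain DFT (written out so no libraries are needed) on a 64-sample real tone at bin 5, then multiplies it by a carrier at bin 20. Cosines (sine waves shifted by 90 degrees) are used so the spikes land cleanly in single bins; the bin numbers and helper names are arbitrary choices for illustration.

```python
import cmath
import math

def dft(x):
    """Plain discrete Fourier transform (slow, but needs no libraries)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def peak_bins(x, threshold=1.0):
    """DFT bins whose magnitude exceeds the threshold."""
    return [k for k, X in enumerate(dft(x)) if abs(X) > threshold]

N = 64
tone = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]       # real tone at bin 5
carrier = [math.cos(2 * math.pi * 20 * n / N) for n in range(N)]   # carrier at bin 20
product = [a * b for a, b in zip(tone, carrier)]

print(peak_bins(tone))     # [5, 59]: bin 59 is the negative frequency -5
print(peak_bins(product))  # [15, 25, 39, 49]: -5 + 20 = 15 is now a positive spike
```

The spike at 15 is the tone's negative-frequency component, shifted up by the carrier until it shows up as a positive frequency, which is exactly the effect the quote describes.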
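Nyquist Frequency: The Nyquist frequency is half the sampling rate. It is the highest frequency that can be represented at that sampling rate, so the sampling rate must be chosen so that the Nyquist frequency lies above the highest frequency that needs to be captured for an adequate representation of the sound.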
Fidelity Issues
The main fidelity issues in recording digital audio are the sampling rate, the bit depth, and the encoding scheme.
Sampling Rate:
Frequency Limits of Human Hearing: 20-20,000 Hz
Sampling Theorem: In order to create an adequate digital representation of an analog signal, the sample rate must be more than twice the frequency of the highest component to be sampled.
Typical rates are:
8 kHz (dark and dull, but sufficient for speech intelligibility)
11.025 kHz (often called 11 kHz)
22.05 kHz (often called 22 kHz)
44.1 kHz (bright and clean)
48 kHz
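A quick way to see why only the higher rates count as full-bandwidth is to compute the Nyquist frequency of each one and compare it with the 20 kHz limit of hearing. The helper names are hypothetical; 11,025 Hz and 22,050 Hz are the exact values of the rates usually called 11 kHz and 22 kHz.

```python
HEARING_LIMIT_HZ = 20_000  # upper limit of human hearing, from the text above

def nyquist_hz(sample_rate_hz: float) -> float:
    """The Nyquist frequency is half the sampling rate."""
    return sample_rate_hz / 2

def covers_hearing(sample_rate_hz: float) -> bool:
    """True if every frequency up to the hearing limit lies below the Nyquist frequency."""
    return nyquist_hz(sample_rate_hz) > HEARING_LIMIT_HZ

for rate in (8_000, 11_025, 22_050, 44_100, 48_000):
    print(f"{rate} Hz: Nyquist = {nyquist_hz(rate):g} Hz, covers 20 kHz: {covers_hearing(rate)}")
```

Only 44.1 kHz and 48 kHz put the Nyquist frequency above 20 kHz, which matches the "bright and clean" label on 44.1 kHz.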
Fletcher and Munson Curve:
The Fletcher and Munson Curve shows that, at the same physical intensity, low-frequency sounds (< 500 Hz) and high-frequency sounds (> 10,000 Hz) are harder to hear than sounds in between. Fletcher and Munson performed these tests at Bell Labs by asking listeners to judge when pure tones of two different frequencies sounded equally loud.
Standardization of CD Format:
Why 44.1 kHz?
Philips and Sony met with lawyers to standardize and cross-license the CD format. The 44.1 kHz sampling rate was a compromise, but also a typo in the document. It was easier to just use 44.1 kHz than to change the documents. --Sony VP
From John Watkinson, The Art of Digital Audio, 2nd edition, pg. 104:
"In 60 Hz video, there are 35 blanked lines, leaving 490 lines per frame or 245 lines per field, so the sampling rate is given by :
60 × 245 × 3 = 44.1 kHz
In 50 Hz video, there are 37 lines of blanking, leaving 588 active lines per frame, or 294 per field, so the same sampling rate is given by
50 × 294 × 3 = 44.1 kHz.
The sampling rate of 44.1 kHz came to be that of the Compact Disc. Even though CD has no video circuitry, the equipment used to make CD masters is video based and determines the sampling rate."
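The arithmetic in the quote can be reproduced step by step. This sketch assumes the standard totals of 525 lines for 60 Hz video and 625 lines for 50 Hz video (these totals are not stated in the quote), halves the active lines for the two interlaced fields, and treats the factor of 3 as the number of samples stored per line. The function name is hypothetical.

```python
def pcm_adaptor_rate(field_rate_hz, total_lines, blanked_lines, samples_per_line=3):
    """Audio sampling rate when samples are stored in the active lines of a video signal."""
    active_per_frame = total_lines - blanked_lines   # lines left after vertical blanking
    active_per_field = active_per_frame // 2         # two interlaced fields per frame
    return field_rate_hz * active_per_field * samples_per_line

print(pcm_adaptor_rate(60, 525, 35))  # 44100: 60 Hz (525-line) video
print(pcm_adaptor_rate(50, 625, 37))  # 44100: 50 Hz (625-line) video
```

Both video standards land on the same 44,100 samples per second, which is why one master format could serve both.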
Foldover in Time Domain:
Foldover, which is also called aliasing, is caused by undersampling: sampling a signal whose frequency is higher than one-half the sample rate (SR/2). During playback, the reconstructed signal will sound at a different pitch than the original waveform, which causes catastrophic problems in recording. As long as there are more than two samples per period of the original waveform, the reconstructed waveform will have the same frequency. If a signal between SR/2 and SR has been undersampled, the frequency of the resulting wave can be found with the formula: new frequency = sampling frequency - original frequency.
Foldover/aliasing problems lead to the Nyquist Theorem, which states: in order to be able to reconstruct a signal, the sampling frequency must be more than twice the highest frequency in the signal being sampled.
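The formula above can be checked directly: a tone above half the sampling rate produces exactly the same samples as its alias. This sketch extends the formula by first wrapping the frequency into one sampling period, so it also works for tones above the sampling rate. The helper name is hypothetical. Cosines are used because a sampled sine at the original frequency comes out as an inverted sine at the alias frequency (same pitch, flipped phase).

```python
import math

def alias_frequency(f: float, fs: float) -> float:
    """Frequency that a tone of f Hz appears at after sampling at fs Hz (folded into 0..fs/2)."""
    f = f % fs                       # replicas repeat every fs
    return fs - f if f > fs / 2 else f

fs = 8_000
original = 5_000                     # above fs/2 = 4 kHz, so it folds over
alias = alias_frequency(original, fs)

# The two tones produce exactly the same samples, so playback cannot tell them apart.
a = [math.cos(2 * math.pi * original * n / fs) for n in range(16)]
b = [math.cos(2 * math.pi * alias * n / fs) for n in range(16)]
print(alias, all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b)))  # 3000 True
```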
Digital Spectra:
This image shows, in the frequency domain, that when a signal is sampled digitally, there is a replica of that signal centered around every integer multiple of the sampling rate. This isn't a problem if the sampling rate is more than twice the highest frequency in the signal, that is, if signals above the Nyquist frequency are filtered out before recording. However...
...this diagram shows Foldover in the Frequency Domain, which results when the original signal is not lowpass filtered to remove content above the Nyquist frequency. Signals that would normally be above the audible spectrum (20+ kHz) become audible at the wrong frequencies when they fold over.
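The replicas in the diagram sit at k × fs - f and k × fs + f for every whole number k. This sketch lists them for a single tone at fs = 44.1 kHz and keeps the ones that land between 0 and the Nyquist frequency, which are the ones you actually hear on playback. The helper names and the cutoff at k = 2 are illustrative choices.

```python
def spectral_images(f, fs, max_k=2):
    """Frequencies of the replicas of a tone f around multiples of fs: k*fs - f and k*fs + f."""
    images = set()
    for k in range(max_k + 1):
        for freq in (k * fs - f, k * fs + f):
            if freq >= 0:
                images.add(freq)
    return sorted(images)

def audible_components(f, fs):
    """Replicas that land in the baseband 0..fs/2, which is what gets played back."""
    return [x for x in spectral_images(f, fs) if x <= fs / 2]

fs = 44_100
print(audible_components(15_000, fs))  # [15000]: the tone plays back at its own pitch
print(audible_components(25_000, fs))  # [19100]: an unfiltered 25 kHz tone folds over to 19.1 kHz
```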
Bit Depth
Since samples are represented as integers, amplitude values must be quantized to fit into the discrete numbers allowed by the bit depth. This rounding error causes digital noise, which is called quantization noise.
Distribution of Error:
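A common model for quantization error, assumed here rather than taken from the original figure, is that for a busy signal the error is spread evenly between -1/2 and +1/2 of a step, which gives a noise power of step²/12. This sketch quantizes an 8-bit sine wave (full scale = 1.0, clipping omitted) and compares the measured error with that model; the tone frequency is an arbitrary choice.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round x (full scale = 1.0) to the nearest level allowed by the bit depth. Clipping omitted."""
    step = 1 / 2 ** (bits - 1)    # one quantization step (1 LSB)
    return round(x / step) * step

bits = 8
step = 1 / 2 ** (bits - 1)
signal = [0.9 * math.sin(2 * math.pi * 0.01234 * n) for n in range(10_000)]
errors = [quantize(x, bits) - x for x in signal]

worst = max(abs(e) for e in errors)
noise_power = sum(e * e for e in errors) / len(errors)
print(worst <= step / 2)                 # the error never exceeds half a step
print(noise_power / (step ** 2 / 12))    # close to 1 if the error is uniformly distributed
```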
Signal to Noise Ratio (S/N):
For every bit in the sampling bit depth, you get about 6 dB of signal to noise ratio, so:
maximum dynamic range in decibels ≈ number of bits × 6
$\text{dB} = 20 \log_{10}\left(\frac{2^N}{1}\right) = 20N\log_{10}2 \approx 6.02\,N$
N = number of bits in the bit depth
Example signal to noise ratios:
2 bits = 12 dB
4 bits = 24 dB
8 bits = 48 dB (suitable for voice)
12 bits = 72 dB
16 bits = 96 dB (CD quality)
24 bits = 144 dB (pro audio)
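The table uses the 6 dB-per-bit rule of thumb. Computing 20 log10(2^N) exactly shows how close that rule is: each bit is worth about 6.02 dB. The function name is illustrative.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of an N-bit word: 20*log10(2**N), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

for n in (2, 4, 8, 12, 16, 24):
    print(f"{n:2d} bits: {dynamic_range_db(n):6.2f} dB (rule of thumb: {6 * n} dB)")
```

For 16 bits this gives about 96.3 dB, and for 24 bits about 144.5 dB, so the rounded figures in the table hold.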
Why it's a good idea to use 24 bit audio:
If the original recording is made at 16 bits, we have a 96 dB signal to noise ratio at peak amplitudes. If you record a weak signal that only uses 8 bits, the signal to noise ratio is reduced to 48 dB. If that signal is then filtered, the filter's gain may reduce the signal enough that only 4 bits are being used to store it, and the resulting signal to noise ratio is 24 dB.
If we had processed the signal at 24 bits, the signal to noise ratio would have stayed at 48 dB, because there is still room below the attenuated signal for all 8 bits of the original recording. You don't gain any information working in 24 bits, but you don't lose any either.
Fractional Accuracy: The peak voltage (full scale) of 16-bit and 24-bit audio is the same, so the extra 8 bits add no headroom; they are just fractions of one 16-bit step.
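The 24-bit argument can be simulated with integers. This sketch models the filter as a pure gain of 1/16 (4 bits, about 24 dB, matching the example above), stores the result in either a 16-bit or a 24-bit word, and then applies the opposite gain to see what survived. The helper name and the gain-only "filter" are simplifying assumptions. Samples are measured in 16-bit steps, so the 24-bit word has 8 extra bits below the 16-bit LSB.

```python
def attenuate_and_restore(samples, word_bits, shift=4):
    """Attenuate by 2**shift (about 6 dB per step of shift), round into a word_bits integer,
    then apply the opposite gain. Samples are integers in 16-bit units."""
    frac_bits = word_bits - 16                    # bits available below the 16-bit LSB
    stored = [round(s * 2 ** frac_bits / 2 ** shift) for s in samples]  # filter output in the word
    return [v * 2 ** shift / 2 ** frac_bits for v in stored]            # gain back up, 16-bit units

quiet = list(range(-128, 128))   # a weak signal that uses only 8 of the 16 bits
print(attenuate_and_restore(quiet, 16) == quiet)  # False: detail below the 16-bit LSB was rounded away
print(attenuate_and_restore(quiet, 24) == quiet)  # True: the extra 8 bits held the fractions
```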
Dynamic Range Problem
Unlike analog recording, in a digital environment the signal to noise ratio isn't constant across intensities. At 16 bits, the 96 dB S/N only applies to loud recordings that use all 16 bits. In quiet passages, such as those common in classical music, the recording doesn't use all 16 bits, and the S/N is much worse. This digital noise is the quantization noise described above: the sampled signal cannot take on every conceivable value, because each amplitude is rounded to one of the discrete integer levels allowed by the bit depth.
At low amplitude levels, the lack of sufficient bit depth distorts the waveform and adds harmonics, and any added harmonics above the Nyquist frequency fold over (which is the reason behind the noise on my synth).
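The added harmonics are easy to measure. This sketch quantizes a sine wave whose peak is only 1.4 steps (so only the levels -1, 0 and 1 survive) and uses a plain DFT to find how much of the energy falls outside the fundamental. The sizes and helper name are illustrative choices; the tone is kept well below Nyquist here, so this shows the harmonics being created, not the later foldover.

```python
import cmath
import math

def harmonic_fraction(x, fundamental_bin):
    """Share of the signal's energy that lies outside the fundamental (DFT bins k and N-k)."""
    N = len(x)
    mags = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N)]
    total = sum(m * m for m in mags)
    fundamental = mags[fundamental_bin] ** 2 + mags[N - fundamental_bin] ** 2
    return (total - fundamental) / total

N, k0 = 64, 4
quiet = [1.4 * math.sin(2 * math.pi * k0 * n / N) for n in range(N)]  # peak of only 1.4 steps
coarse = [round(v) for v in quiet]                                    # only -1, 0 and 1 survive

print(harmonic_fraction(quiet, k0))   # essentially 0: a pure tone
print(harmonic_fraction(coarse, k0))  # roughly 0.1: quantization added harmonics
```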
Dither
Dithering addresses the dynamic range problem by adding low-level random noise to the signal before it is quantized, which decorrelates the quantization error from the signal. The noise creates randomness that our auditory system perceives as unrelated to the signal, and we ignore it. The problem with undithered digital noise is that the noise and the signal sound like they're related.
A constant level of random noise is better because it is not correlated with the signal, and we *perceive* it as better fidelity.
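A simple way to see what dither buys is a signal quieter than one step. Without dither, every sample rounds to zero and the signal disappears; with dither, the output is noisy but still carries the signal on average. This sketch assumes triangular (TPDF) dither, one common choice; the page does not specify a dither type, and the sizes and seed are arbitrary.

```python
import math
import random

def quantize(x, rng=None):
    """Round to the nearest integer level (1-step increments), optionally adding TPDF dither first."""
    if rng is not None:
        # Triangular (TPDF) dither: the sum of two independent uniform values, +/- 1 step in total
        x += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(x)

N, PERIOD = 4096, 32
signal = [0.4 * math.sin(2 * math.pi * n / PERIOD) for n in range(N)]  # peak of only 0.4 steps

plain = [quantize(x) for x in signal]
rng = random.Random(1)
dithered = [quantize(x, rng) for x in signal]

def amplitude_at_signal_frequency(y):
    """Project y onto the original sine to estimate how much of the signal survived."""
    return 2 / N * sum(v * math.sin(2 * math.pi * n / PERIOD) for n, v in enumerate(y))

print(set(plain))                               # {0}: without dither the quiet signal vanishes
print(amplitude_at_signal_frequency(dithered))  # close to 0.4: the signal survives under the noise
```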
Dithering may not be necessary for 20- or 24-bit recording, because the S/N is high enough that the error in the least significant bit is hard to hear, but dithering is necessary when converting to 16 bits from a higher bit depth.
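Converting 24-bit samples to 16 bits means dropping 8 bits, so one 16-bit step equals 256 steps of the 24-bit word. This is a minimal sketch of that conversion with TPDF dither spanning plus or minus one 16-bit step, plus clipping to the 16-bit range. The function name, dither type, and seed are assumptions for illustration; real mastering tools often add noise shaping on top.

```python
import random

def to_16_bit(samples_24, rng):
    """Requantize 24-bit integer samples to 16 bits, adding TPDF dither first (a sketch)."""
    lsb = 256                                          # one 16-bit step = 2**8 steps of the 24-bit word
    out = []
    for s in samples_24:
        dither = (rng.random() - rng.random()) * lsb   # triangular, spans +/- one 16-bit step
        v = round((s + dither) / lsb)
        out.append(max(-32768, min(32767, v)))         # clip to the 16-bit range
    return out

rng = random.Random(7)
quiet = [100] * 20_000                                 # 100/256, about 0.39 of a 16-bit step
print(round(100 / 256))                                # 0: without dither the detail is simply lost
print(sum(to_16_bit(quiet, rng)) / len(quiet))         # about 0.39 on average with dither
```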
Signal Reconstruction
An ideal reconstruction filter cannot be built in practice, and a filter with such a sharp transition (a brick wall filter) near the upper end of human hearing causes audible artifacts and phase distortion. These brick wall filters were blamed for early complaints that digital recordings sounded "harsh". No analog filter can be both extremely steep and phase linear around the cutoff point.
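In the time domain, the ideal reconstruction filter corresponds to sinc interpolation (the Whittaker-Shannon formula), whose impulse response extends forever in both directions. This sketch measures time in sample periods and truncates the sum to the samples it has; that truncation is why any real filter can only approximate the ideal. The tone frequency and window size are arbitrary choices.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_index, t):
    """Whittaker-Shannon interpolation truncated to the available samples (t in sample periods)."""
    return sum(x * sinc(t - (first_index + i)) for i, x in enumerate(samples))

f = 0.1                 # tone frequency in cycles per sample, below the Nyquist limit of 0.5
first = -2000
samples = [math.sin(2 * math.pi * f * n) for n in range(first, 2001)]

t = 0.5                 # halfway between two samples
print(reconstruct(samples, first, t), math.sin(2 * math.pi * f * t))  # nearly equal
```

Shrinking the window (fewer samples on each side) makes the error between samples grow, which is the time-domain face of a less-than-ideal filter.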
Oversampling: Oversampling solves many of the problems caused by the lowpass filters used in signal reconstruction.
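-Multibit Oversampling: Convert 44.1 kHz to 176.4 kHz (4x) or 352.8 kHz (8x)
-1-bit Oversampling: 1 bit at a high rate (128 × 1 bit = 8 × 16 bits, i.e., the same bit rate as 16-bit samples at 8x oversampling). Also called "sigma-delta"; "MASH" (multi-stage noise shaping) is one such design.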
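The 1-bit approach can be sketched as a first-order sigma-delta loop: an integrator accumulates the difference between the input and the previous 1-bit output, and the output is just the sign of the integrator. Averaged over many samples, the bit stream tracks the input. This is the simplest textbook form, offered as an illustration; commercial converters (including MASH designs) use higher-order noise shaping, and the helper name is hypothetical.

```python
def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: each input sample becomes a single bit (+1 or -1)."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback                    # accumulate input minus previous output
        feedback = 1.0 if integrator >= 0 else -1.0   # 1-bit quantizer
        bits.append(feedback)
    return bits

stream = sigma_delta_1bit([0.3] * 1000)   # constant input, |x| < 1
print(sum(stream) / len(stream))          # the average of the bits tracks the input, about 0.3
```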
With oversampling, the steep filtering can be done digitally, and only a gentle analog filter is needed, because the first spectral replica now sits far above the audio band.
The resulting in-band quantization noise is lower (each doubling of the rate lowers it by about 3 dB, before any noise shaping):
4x -> 6 dB
8x -> 9 dB
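These figures follow from a simple model, assumed here: the total quantization noise power is fixed and spread evenly from 0 to half the sampling rate. Oversampling by a factor R spreads it over R times the bandwidth, and the digital filter then removes everything above the audio band, leaving 1/R of the noise. Under that model each doubling of the rate buys about 3 dB; larger gains need noise shaping, as in sigma-delta converters.

```python
import math

def oversampling_gain_db(ratio: float) -> float:
    """In-band quantization-noise reduction from plain oversampling (white noise, no shaping)."""
    return 10 * math.log10(ratio)

for ratio in (2, 4, 8, 16):
    print(f"{ratio:2d}x -> {oversampling_gain_db(ratio):.1f} dB")
```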
However, in a computer, poor ground isolation lets digital noise from the surrounding circuitry overwhelm everything else. Outboard gear keeps its analog circuitry isolated from the digital circuitry.
