Accelerate Your Mac!
Bring in the Noise
by Thad Brown
THE LONG AWAITED MPEG LAYER-3 AUDIO SPECIAL REPORT
Well, folks, at long last, a bit of hopefully useful information about MPEG-3 audio. The first thing to note is that I have already used incorrect terminology: MPEG-3 doesn't exist, but MPEG Layer-3 does. The second thing is, get a life. I'm going to be pretty loose with terminology; audio compression is not simple stuff, and if you don't get too picky I won't either. There are some folks where I work who are writing video compression schemes on SGI boxes, and I am sure they could set this all straight better than I, but they don't work for free and I do. So, here we go.
A LITTLE HISTORY
You may not know this, but MPEG audio wasn't invented so kids on Hotline could have the same freakin Portishead and Korn songs on all their servers. Nope, MPEG stands for Moving Picture Experts Group, and they are a bunch of very serious white lab coat types who set standards for the ISO. The original plan, as you might glean from their name, was to get together a coding and decoding system (henceforth codec) for broadcast video. My satellite dish (which lets me watch up to thirteen NFL football games every weekend in the fall) uses an MPEG codec to give me rather smashing sound and picture. Thanks to MPEG Layer 2, I get not just the crisp picture of some behemoth of a linebacker hitting some poor guy, but also the sound of the crunch of the pads, the smash of the helmets, and the whimpering sounds that follow. The audio end of MPEG was originally mostly to accompany the video, though now there are other uses and potential uses for audio only and voice only transmission.
The whole point of doing this compression is that with current technology it is not possible to send enough bits over most of the available transfer schemes to play back broadcast video or stereo audio. One minute of CD quality stereo audio is roughly 10MB of data, and under optimal conditions, a 28.8 modem will take a bit under an hour to download that file. Put another way, with two 28.8 modems dedicated specifically to transmitting the audio on one full CD, it would take more than a day to download. It's useless to go into specific data throughput rates, but suffice it to say that while a DVD can store over 4GB of data, for a movie to fit on DVD it STILL has to be compressed with an MPEG codec (MPEG-2, I believe) to get the data off of the disc and onto a screen. MPEG codecs are most likely going to be the backbone of digital audio and video delivery for a huge number of applications, everything from digital voice transmission to DVD to live web streaming to who knows what else.
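If you want to check the file size and download arithmetic yourself, here is a rough sketch in Python. The assumptions are mine, not gospel: CD quality audio (44.1kHz, 16 bit, stereo), a 74 minute disc, and a modem that really delivers its full 28,800 bits per second with no overhead, which never happens in real life.

```python
SAMPLE_RATE = 44100      # samples per second, per channel (CD audio)
BYTES_PER_SAMPLE = 2     # 16 bit samples
CHANNELS = 2             # stereo
MODEM_BPS = 28800        # ideal 28.8 modem, no protocol overhead (assumption)
CD_MINUTES = 74          # assumed length of a full CD


def audio_bytes(minutes):
    """Size in bytes of uncompressed audio of the given length."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * 60 * minutes


def download_seconds(num_bytes, modems=1):
    """Ideal transfer time over one or more modems running in parallel."""
    return num_bytes * 8 / (MODEM_BPS * modems)


one_minute = audio_bytes(1)
print(one_minute, "bytes per minute")                       # 10584000, about 10MB
print(download_seconds(one_minute) / 60, "minutes")         # 49.0
cd_hours = download_seconds(audio_bytes(CD_MINUTES), modems=2) / 3600
print(round(cd_hours, 1), "hours for a full CD over two modems")
```

Under those assumptions a minute of audio takes 49 minutes to arrive, and the full CD takes a little over 30 hours. Real connections would be slower.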
Now you get a bit better of an idea why it is so important that QuickTime is going to be part of MPEG-4. Bet El Presidente for Life Steve "Estebahn" Jobs is a bit psyched about that little fact, eh? Anyway, while the folks in the lab coats were doing this MPEG stuff, two things were happening. First, some equally serious people trying to get audio and video streaming over the web saw these codecs as potentially good for them. Second, that both terrifying and incredibly productive creature known as the 15 year old American Male was doing what he always does--trying to break stuff, not pay for things, and collect porn. Under the category of "not pay for things" our intrepid boys discovered that you could code MPEG audio and trade it with your friends over the web. It's illegal, and so is taping a record for your friends or recording radio broadcasts. These two facts have led to the boom in consumer MPEG players and coders.
SO CUT THE TALK, HOW DO I START
All right, the rest of the boring (but very important if you are serious) details will go on the bottom of this page. The best resource I know for info on Mac MPEG players is RAUM, a web site run by an MPEG enthusiast. For players, you basically choose between SoundApp and MacAMP, though there are others. Both are cool and work well. MacAMP probably looks cooler, but SoundApp can translate back to an AIFF or .WAV file, which can be useful. Encoders are more difficult. I am lucky enough to do some contract work with people who have big time Director rigs, so I use SoundEdit as my MPEG encoder by saving sound files as .swa files. The Shockwave audio format is MPEG Layer-3 with a funny name. This encoder is simply the best there is, but it isn't cheap, and Macromedia has shown less and less interest in audio with each passing hour, so who knows what they plan to do with SE. I can't say that I would miss SoundEdit for any other reason, but for MPEG audio it's just the way to go; it even has a batch converter plug-in.
There are other freeware and shareware encoders, and the page link above has some info about all of them. Don't forget, most encoders won't play back the resulting files, so you will need to get both an encoder and a player.
Once you have your tools, all you need to do is get a good quality audio file to squash, and have at it. Go in with an audio editor and chop off the extra bits so you don't have too much silence or noise at the start and end of your audio file (after all we want it as small as possible), and it will wind up MUCH smaller than the original.
HOW IT WORKS
Beats the hell out of me. OK, seriously, nobody who is very important and good at designing these codecs is talking much. MPEG is a set of standards, but it leaves a lot of room for development and innovation (see my previous note about how Capitalism Works). Nobody is going to give their great ideas away, at least not to some weirdo with a pony tail who writes web articles, even if they are on Michael "Maximum Impact" 's site. What you can find out about is audio compression in general, and Sony will blabber at the mouth like Larry Ellison about ATRAC compression. All audio compression exploits the following fact. If you DON'T feed a signal into the input of a card or Mac, and hit "record" in your audio app, the resulting file will be exactly as large as if you fed it a pristine recording of a full orchestra playing Beethoven's Fifth. Doesn't make intuitive sense, now does it? A minute of recorded silence takes up 10MB, just like a minute of music, even though there is no real audio data to record.
The resolution of digital audio storage is governed by two factors, sample rate and word length. Frank Zappa once said that music is "wiggling air molecules," and he was exactly right. The faster they are wiggling back and forth, the higher the pitch, and the farther the swing back and forth, the louder the sound. That's it folks. The first is frequency and the second is amplitude. When you take a very complex set of interacting audio waves and capture them with your PlainTalk mic or with a $10,000 tube microphone, you turn those wigglers into an electrical current, a representation of which can be stored digitally. Later it is turned back into an electrical signal, amplified and played through whatever speakers or headphones you choose.
The frequency resolution of the audio is determined by the sample rate which, for reasons I won't bore you with, has to be at least twice the highest frequency it is representing. The upper limit of human hearing is generally considered to be 20,000 cycles per second (Hz), so CD audio samples at 44.1kHz and DAT samples at 48kHz. The trickiest part of digital audio is the word length, or how many bits you record for each sample. In the simplest terms, the greater the word length, the more distinct values you can describe. If you are recording 8 bit audio you have 256 possible amplitudes that can be described; when you record 16 bit you have 65,536. 16 bits was ambitious and impressive when CDs were first released, but seems sort of pedestrian these days. 16 bit doesn't suck, but the dynamic range, or difference between the loudest and softest sound that can be accurately recorded, is not nearly as good as very high end analog audio equipment.
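For the numerically inclined, the two knobs can be sketched in a few lines of Python. The function names are made up for illustration, and the dynamic range figure is the usual textbook estimate of about 6 dB per bit, which ignores dither and the quality of the converters.

```python
import math


def max_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def amplitude_levels(bits):
    """Number of distinct amplitude values a word of this length can hold."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Textbook estimate of dynamic range: ratio of largest to smallest step, in dB."""
    return 20 * math.log10(2 ** bits)


for name, rate in (("CD", 44100), ("DAT", 48000)):
    print(name, "tops out at", max_frequency(rate), "Hz")

for bits in (8, 16):
    print(bits, "bit:", amplitude_levels(bits), "levels, about",
          round(dynamic_range_db(bits)), "dB")
```

Both CD and DAT leave a little headroom above the 20,000Hz limit of hearing, and each extra bit of word length doubles the number of amplitude steps.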
The layman's terminology used to explain video compression is that it records the differences between frames, not each frame in its entirety. The RESULT is the same for audio (smaller files), but it is achieved very differently. Audio compression requires very complex analysis of the audio spectrum to discern what is and is not necessary for the perception of the sound. Once the audio is coded and decoded, the standard measurement tools will tell you that the result does not look at all like the original, but in theory a listener could not tell the difference.
MPEG is too much of a money maker to get too much information about specifics, but as I said, Sony is all too happy to talk about ATRAC compression. ATRAC is an audio only codec used in MiniDisc players. It only achieves a 5:1 compression ratio, but to my ears, under optimal circumstances, I can't tell the difference between a recording made on a recent MD recorder and an original. When audio is "ATRACed," the audio file is cut into pieces 512 samples long, and then split into many frequency bands. Each band is analyzed for what amount of data needs to be encoded, and "extraneous" data is ignored while necessary data is kept. Let's look at a stupidified example. We take one big fat bass pluck from the incomparable Ray Brown. Let's say Ray is playing a REALLY fast bop tune, so it's exactly 512 samples (.0116 seconds) long. Our very simple codec would look at that note and divide the audio spectrum into two bands, high and low. Our software would see that there was lots of information in the lower band and essentially no information in the high band. Our audio codec could store this file in roughly half the space of the original because half of the audio spectrum needs 0 bits to describe it. All audio compression works on this principle. ATRAC looks at each individual set of 512 samples, splits it up into many bands, and constantly modifies the word length to allow for the necessary information to be stored while ignoring silence and unnecessary sound.
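The stupidified two band codec can be sketched in Python. To be loud about it: this is NOT how ATRAC or MPEG actually work. The band split here is a crude pairwise average and difference rather than a real filter bank, the bass note is a plain 60Hz sine wave, and the FLOOR threshold is a number I picked out of the air to stand in for "too quiet to matter."

```python
import math

BLOCK = 512           # samples per block, as in the example above
SAMPLE_RATE = 44100   # CD sample rate in Hz
FULL_BITS = 16        # word length of the uncompressed audio
FLOOR = 256           # hypothetical level below which a band is treated as silent


def bass_note(freq_hz=60.0, level=0.9):
    """One block of a pure low tone as 16 bit integers (a stand-in for the bass pluck)."""
    peak = level * 32767
    return [round(peak * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(BLOCK)]


def split_bands(block):
    """Crude two band split: pairwise averages (low) and pairwise half differences (high)."""
    low = [(block[i] + block[i + 1]) / 2 for i in range(0, len(block), 2)]
    high = [(block[i] - block[i + 1]) / 2 for i in range(0, len(block), 2)]
    return low, high


def word_length(band):
    """Bits per value for a band: 0 if it never reaches FLOOR, else sign plus magnitude bits."""
    peak = max(abs(v) for v in band)
    if peak < FLOOR:
        return 0
    return 1 + math.ceil(math.log2(peak + 1))


low, high = split_bands(bass_note())
low_bits, high_bits = word_length(low), word_length(high)
original_bits = BLOCK * FULL_BITS
compressed_bits = len(low) * low_bits + len(high) * high_bits
print("low band:", low_bits, "bits, high band:", high_bits, "bits")
print("compressed size is", compressed_bits / original_bits, "of the original")
```

With these made up numbers the low band keeps its full 16 bits, the high band gets 0, and the block comes out at half its original size. A real codec uses many more bands and a far smarter idea of what the ear can do without.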
WHAT'S SPECIAL ABOUT MPEG-3 COMPRESSION?
According to what I have been led to believe, MPEG compression is similar to what I described above. The big difference between Layer-3 on the one hand, and Layer-2 and ATRAC compression on the other, is that Layer-3 was designed from the ground up as a low bit rate codec. Earlier MPEG Layers were designed to compress as much as possible without compromising quality too much. Layer-3 was designed as a low bandwidth transmission codec, to carry audio over the web, over phone lines and over other low bandwidth systems. So, you get your compression no matter what, at a heady 12:1 ratio.
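To put the 12:1 figure in terms of bits on the wire, here is a quick back of the envelope calculation in Python. It assumes the source is CD quality audio (44.1kHz, 16 bit, stereo) and takes the 12:1 ratio at face value; actual encoders let you pick the bit rate, so treat this as a sketch.

```python
SAMPLE_RATE = 44100   # samples per second, per channel
BITS = 16             # bits per sample
CHANNELS = 2          # stereo
RATIO = 12            # the 12:1 compression ratio quoted above

cd_bps = SAMPLE_RATE * BITS * CHANNELS      # uncompressed bit rate
compressed_bps = cd_bps / RATIO             # bit rate after 12:1 compression
bytes_per_minute = compressed_bps * 60 / 8  # size of one compressed minute

print(cd_bps, "bits per second uncompressed")        # 1411200
print(compressed_bps, "bits per second compressed")  # 117600.0
print(bytes_per_minute, "bytes per minute")          # 882000.0
```

So a minute of music drops from roughly 10MB to a bit under 1MB, still far too much for a single 28.8 modem to play in real time.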
HOW DOES MPEG-3 SOUND?
Now here is the kicker: not too bad. Not great, under most circumstances, and positively awful if not done right. Not surprisingly, a clean original is absolutely necessary. Your original file should be recorded very well, and have as little noise and hiss as possible. If you are starting with a noisy 8 bit file recorded with a cheap mic (or even a really good mic) don't even try to compress it, you'll just make yourself unhappy. Recently I was demonstrating some of this to a friend and potential client. We took a Neil Young tune that he knew well, and I ripped the audio directly off of the CD with Bias Peak. We then opened the file with SoundEdit 16 and used its Shockwave (MPEG-3) compressor to make it an MPEG-3. This let me run the squashed copy in MacAMP and the original in Peak at the same time, and play them both back on the same system, through my very good (not great) sounding Tannoy studio monitors. The result was that in this ideal situation, the MPEG sounded almost as good as the original. Not quite, but very close.
The main way these codecs can make files so much smaller is by finding the places where there is not much information in the very high end of the audio spectrum. If I play a note at 100Hz, the octave of that note will be at 200Hz, or double the original. The standard range given for human hearing is 20-20,000Hz. By doing your math, you will find that the highest octave that can be represented by CD quality audio takes up half of the spectrum, 10,000Hz to 20,000Hz. In fact, lots of multimedia production starts by unceremoniously lopping off almost all of that by not recording anything over 12kHz. This is OK for playing back audio over the internal speaker of your Mac (which probably can't play above 12kHz anyway), but is still not a great thing. High frequencies are what provide our perception of location for audio. Very low frequencies are almost completely non-directional; you can place a true subwoofer anywhere in a room and it will sound the same. Remember junior high science class where we learned that bats navigate through the dark by using sound? Bats can "hear" far beyond 20,000Hz (NB: don't use MPEG compression for anything you intend to play for bats), and if they tried to move around their caves by bouncing bass frequencies off of the walls, they would run into LOTS of stuff.
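The octave math is easy to check. This little Python sketch walks down from 20,000Hz, halving each time, and works out what share of the 20-20,000Hz range the top octave covers. The helper name is mine, made up for the example.

```python
LOW_HZ = 20       # bottom of the standard hearing range
HIGH_HZ = 20000   # top of the standard hearing range


def octaves_down(high_hz, low_hz):
    """Full octave bands from high_hz downward, each one half the frequency of the last."""
    bands = []
    top = high_hz
    while top / 2 >= low_hz:
        bands.append((top / 2, top))
        top = top / 2
    return bands


bands = octaves_down(HIGH_HZ, LOW_HZ)
top_low, top_high = bands[0]
top_share = (top_high - top_low) / (HIGH_HZ - LOW_HZ)

for low, high in bands:
    print(low, "to", high, "Hz")
print("full octaves:", len(bands))
print("top octave share of the range:", round(top_share, 3))
```

Nine full octaves fit in the range, and the top one alone spans about half of the Hz, which is why trimming the extreme high end saves so much space.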
The perceptual loss from the compression seems to be almost exclusively in that general range. MPEGs lose some of the stereo image, and some of the "air" present in a very good recording. Don't mistake me, they can sound very good if done right, and for the web they seem to be the only way to go, but they won't make you throw out your CD player, I can assure you.
There are plenty of resources on the web for MPEG audio. A few good places to start: Berkeley MPEG Research; a company called Heuris, which has an excellent page with explanations and information; and Columbia, which has a table of different audio codecs with some interesting links.
Last, don't rip off musicians. I know it's not what people want to hear, but it's a lot of work to make good music, and little pay in return for almost everyone. Listen to your music and public domain things all you want, but we need the cash, believe me.