Ubuntu - Sound - Introduction to mp3
The high-level structure of an MP3 file is quite simple: a sequence of blocks of data, each of them with a well defined function and format.
Some of the blocks are formally known as “headers” or “tags”. Here's the list of common blocks (also called “streams” below):
- MPEG audio stream. This is the most important, because it contains the actual audio data. Normally it's also the biggest. MP3 files should contain “MPEG 1 Layer III” data, but sometimes they have something different (e.g. “MPEG 1 Layer II”).
- ID3V1 tag. This is an old and quite limited tag that stores track information (artist, title, …)
- ID3V2 tag. Also stores track information, but it is a lot more flexible than ID3V1 and allows storing many things that can't be stored in ID3V1, like cover art, rating, or composer. An ID3V2 tag is made up of a header and a number of frames, each frame storing a particular value (artist, title, …). There are several versions of it, mainly ID3V2.3.0 (which seems to be the most popular) and ID3V2.4.0.
- APE tag. Another tag that stores track information. This tag is also used to store normalization info.
- Lyrics tag. Stores lyrics, basic track information such as artist name or track title, and more.
- Xing header. Information about the audio stream (size, quality, …). Some players get confused by VBR files that don't have a Xing header, so it should be added if it's missing.
- LAME header. An improved version of the Xing header, which is found in files created with the popular LAME encoder.
- VBRI header. Also for handling VBR files, but many tools don't recognize it, so switching to a Xing header is usually a good idea.
- Null stream. A sequence of zeroes that doesn't belong to other blocks. MP3 files shouldn't contain null streams.
There shouldn't be multiple instances of a stream type in a file (one audio stream, at most one ID3V1 tag, at most one ID3V2 tag, …).
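As a rough illustration of how the two ID3 tags can be recognized, here is a minimal Python sketch. It relies on two signatures from the ID3 formats: an ID3V2 tag starts with the bytes `ID3`, followed by a version and a 4-byte "syncsafe" size, and an ID3V1 tag occupies the last 128 bytes of the file and starts with `TAG`. The function names are made up for this example, and the sketch only looks at the start and end of the file (it ignores an ID3V2.4.0 tag placed at the end, as well as the optional ID3V2.4.0 footer).

```python
import io


def syncsafe_to_int(data: bytes) -> int:
    """Decode a syncsafe integer: only the low 7 bits of each byte are used."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def find_id3_tags(stream) -> dict:
    """Look for an ID3V2 tag at the start and an ID3V1 tag at the end of a binary stream."""
    result = {"id3v2": None, "id3v1": False}

    stream.seek(0)
    header = stream.read(10)
    if len(header) == 10 and header[:3] == b"ID3":
        result["id3v2"] = {
            "version": f"2.{header[3]}.{header[4]}",
            # 10-byte tag header plus the frames that follow it
            "size": 10 + syncsafe_to_int(header[6:10]),
        }

    stream.seek(0, io.SEEK_END)
    if stream.tell() >= 128:
        stream.seek(-128, io.SEEK_END)
        result["id3v1"] = stream.read(3) == b"TAG"
    return result
```

With a real file this would be called as `find_id3_tags(open("song.mp3", "rb"))`, where `song.mp3` is a placeholder name.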
There are some restrictions regarding the order in which these streams may come. Among them:
- ID3V2.3.0 must be at the beginning of the file
- ID3V2.4.0 may be at the beginning or at the end of the file
- The Xing, LAME and VBRI headers should be located right before the audio stream (so there may be only one of them in a file)
An MPEG audio stream is divided into thousands of frames, each frame containing the audio data for a small part of the whole song.
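To make the idea of a frame more concrete, the sketch below decodes the 4-byte header that starts each frame. It is a simplified example that handles only “MPEG 1 Layer III” and rejects everything else. The lookup tables and the frame length formula (144 × bitrate / sample rate, plus one padding byte) are the standard ones for that layer, while the function name and the returned dictionary are invented for this illustration.

```python
# Lookup tables for MPEG 1 Layer III; None marks "free" or invalid entries.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]
SAMPLES_PER_FRAME = 1152  # fixed for MPEG 1 Layer III


def parse_frame_header(header: bytes):
    """Decode a 4-byte MPEG 1 Layer III frame header; return None for anything else."""
    if len(header) < 4:
        return None
    b0, b1, b2, b3 = header[:4]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:  # 11 sync bits, all set
        return None
    if ((b1 >> 3) & 0x03) != 3:  # version bits: 3 means MPEG 1
        return None
    if ((b1 >> 1) & 0x03) != 1:  # layer bits: 1 means Layer III
        return None

    bitrate = BITRATES_KBPS[b2 >> 4]
    sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (b2 >> 1) & 0x01

    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "channel_mode": CHANNEL_MODES[b3 >> 6],
        # Size of the whole frame in bytes, header included
        "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
        "duration_ms": 1000 * SAMPLES_PER_FRAME / sample_rate,
    }
```

For the common header bytes `FF FB 90 00` this reports 128 kbps at 44100 Hz and a frame length of 417 bytes. Each such frame covers about 26 ms of sound, which is why a song of a few minutes contains thousands of frames.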
One important characteristic of an MPEG audio stream is its bitrate, meaning the amount of memory or disk space that is allocated to a second of sound. Bitrates are normally measured in kilobits per second (kbps), and values from 128 to 256 or 320 are usually used in MP3 files. Roughly speaking, the higher the bitrate, the better the quality, but there's more to it. There are various encoders (programs or libraries that create MP3 audio), and some of them are better than others at the same bitrate: given the same input (usually an audio CD or a WAV file), two encoders produce files of a similar size, but one may sound better. An encoder may accept parameters that define a tradeoff between the quality of the MP3 it creates and the time it takes to create it. Also, some encoders may be better than others only for some kinds of music.
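Since the bitrate is simply the number of bits spent on each second of sound, the size of the audio stream can be estimated with a bit of arithmetic. The small sketch below shows this, assuming 1 kbps means 1000 bits per second and ignoring tags and headers. The function name is made up for the example.

```python
def audio_size_bytes(bitrate_kbps: int, duration_seconds: float) -> int:
    """Approximate audio stream size: bits per second times seconds, divided by 8."""
    return round(bitrate_kbps * 1000 * duration_seconds / 8)


# A 4-minute song at the bitrates mentioned above
for kbps in (128, 256, 320):
    size_mb = audio_size_bytes(kbps, 4 * 60) / 1_000_000
    print(f"{kbps} kbps, 4 minutes: about {size_mb:.2f} MB")
```

A 4-minute song therefore takes about 3.84 MB at 128 kbps and about 9.60 MB at 320 kbps.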
Another thing worth mentioning is that there are 2 kinds of audio: constant bitrate (CBR) and variable bitrate (VBR). With CBR all the frames are compressed with the same bitrate, while with VBR the bitrate may differ from one frame to another. The idea of doing VBR comes from the observation that some parts of a song may sound pretty good when they are compressed at 128kbps, while other parts need 320kbps. There are 2 conflicting needs: on the one hand, we want the files encoded at a high bitrate, so they sound better, yet on the other hand we want them at a low bitrate, so they take less space. VBR offers a nice compromise, by encoding various parts of the file at various bitrates, so on average the whole file has a lower bitrate than a similar-sounding CBR file.
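The saving can be illustrated with a toy calculation. Because every frame of an “MPEG 1 Layer III” stream lasts the same amount of time, the average bitrate of a VBR file is just the mean of the per-frame bitrates. The frame mix below is invented for the example and not measured from any real file.

```python
def average_bitrate_kbps(frame_bitrates_kbps):
    """Mean bitrate of a stream whose frames all last the same amount of time."""
    return sum(frame_bitrates_kbps) / len(frame_bitrates_kbps)


# Hypothetical song: three quarters of the frames are fine at 128 kbps,
# while the remaining quarter needs 320 kbps.
vbr_frames = [128] * 750 + [320] * 250

vbr_average = average_bitrate_kbps(vbr_frames)
saving = 1 - vbr_average / 320  # compared with a CBR file at 320 kbps
print(f"average: {vbr_average:.0f} kbps, {saving:.0%} smaller than CBR at 320 kbps")
```

In this made-up case the VBR file averages 176 kbps, so it is 45% smaller than a 320 kbps CBR file while still giving the demanding parts the full 320 kbps.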
While VBR files are nice, they introduce another opportunity for encoders and other tools to mess things up, and some of them gladly take advantage of the offer. While the VBR audio data might be OK, many players need a VBR header in order to play that audio correctly and will misbehave if that header is missing or incorrect. Symptoms may include showing an incorrect duration for a song, being unable to seek, stopping playback before the song finishes, or waiting a long time before playing the next song. However, other players handle the same files without any problem (because figuring out what should have been in the header isn't that hard), so you may rip a CD, try the MP3s on one player and think that everything went OK, and then try another player only to find out that it doesn't like them.
There are 3 kinds of VBR headers: Xing, LAME, and VBRI. A VBR MP3 file should not have an incorrect or missing VBR header.
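A quick way to see which VBR header a file has is to look inside its first audio frame. The sketch below is a simplified example that makes several assumptions: the stream is “MPEG 1”, so the Xing marker sits 36 bytes (stereo) or 21 bytes (mono) from the start of the frame, and the VBRI marker sits 36 bytes from the start. It also assumes that the `Info` marker LAME writes in place of `Xing` for CBR files counts as a Xing header, and that a LAME header shows up as the text `LAME` shortly after the Xing marker. The function name is hypothetical.

```python
def detect_vbr_header(first_frame: bytes):
    """Return "LAME", "Xing", "VBRI" or None for the first audio frame of a file."""
    # Xing follows the 4-byte frame header and the side info
    # (32 bytes for stereo, 17 bytes for mono in MPEG 1).
    for offset in (36, 21):
        if first_frame[offset:offset + 4] in (b"Xing", b"Info"):
            # Assumption: the LAME extension is recognizable by its
            # encoder name in the bytes after the Xing fields.
            if b"LAME" in first_frame[offset + 4:offset + 160]:
                return "LAME"
            return "Xing"
    # VBRI always follows the frame header by 32 bytes.
    if first_frame[36:40] == b"VBRI":
        return "VBRI"
    return None
```

A result of `None` for a file known to be VBR is the case described above, where some players will show a wrong duration or fail to seek.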
Another issue with CDs and audio files in general is that different CDs might have quite different perceived loudness. This gets especially annoying when playing in shuffle / random mode, where you constantly have to reach for your volume control to turn it up or down. This can be nicely taken care of in MP3s by using “normalization”, a process through which the perceived loudness of a file can be adjusted to some standard level (this can be done one file at a time or one album at a time). It's a good idea to normalize your collection, to eliminate the need to turn the volume up and down all the time.
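The idea behind normalization can be sketched in a few lines. The example below is a deliberately crude illustration and not how any particular tool works: it uses the RMS level of decoded samples as a stand-in for perceived loudness (real tools use a more elaborate loudness model), and the target level of -18 dB is an arbitrary choice for the example. All function names are made up.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        raise ValueError("cannot measure the level of pure silence")
    return 10 * math.log10(mean_square)


def track_gain_db(samples, target_dbfs=-18.0):
    """Gain that brings one track to the target level."""
    return target_dbfs - rms_dbfs(samples)


def album_gain_db(tracks, target_dbfs=-18.0):
    """One gain for the whole album, so that the tracks keep their relative loudness."""
    all_samples = [s for track in tracks for s in track]
    return target_dbfs - rms_dbfs(all_samples)


def apply_gain_db(samples, gain_db):
    """Scale samples by a gain in dB (+6 dB roughly doubles the amplitude)."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]
```

Normalizing one file at a time uses `track_gain_db` for each track separately, while normalizing one album at a time applies the single value from `album_gain_db` to every track, so that quiet songs on an album stay quieter than loud ones.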