Databending Part 2—Hacking MP3s
All posts in this series: Part 1, Part 2, Part 3, Part 4, Part 5
Update: see part 3 for Python code to speed this process up!
I recently posted about “databending,” which includes importing raw data into Audacity to make glitchy noises, changing the data in an image using a text/hex editor, and many other ways of creatively reinterpreting/damaging data. Since writing that post, I've learned some more fun ways of creating glitchy sounds with data, and I'll be discussing that today.
Hacking MP3s
Composer Yasunao Tone has a series of “MP3 Deviation” pieces that I like. Because his previous work with damaged CDs (YouTube link) derives its sounds directly from digital audio errors, I was expecting these pieces to do the same. As the Bandcamp page above notes, however, Tone found errors between the MP3 encoder and decoder to be “not satisfactory,” so these “MP3 Deviation” pieces instead use different MP3 errors to trigger different sample playback lengths. The pieces are full of cool sounds, but the process is not particularly connected to MP3s, and I've remained interested in hearing what's possible by listening directly to MP3 errors.
I've done some reading off and on about MPEG compression (i.e., the family of formats that includes MP3) from a few DSP textbooks [1] [2] and managed to get the GSM 06.10 2G cell phone codec running in a plugin, but I hadn't previously figured out how to get the MP3 codec to glitch. It turned out to be easier than I assumed! After having Nicolas Collins' book for a while, I recently realized that Nick Briz has a chapter in it on databending, [3] and among other things, Briz writes about hacking MP3s in a hex editor. To explain how this works, I'll first cover a bit of background about MP3s.
How MP3s Work
MP3s use lossy compression. Compression summarizes the data to reduce storage space, and lossy compression additionally discards or makes an inexact summary of some of the data that is less perceptually relevant. An MP3 breaks down the audio into short chunks or “frames”; analyzes the frequencies present in each frame; determines which frequencies are “masked” by others, and thus less perceptually relevant; and based on which sounds are most relevant, allocates different numbers of bits to represent the loudness of each frequency.
For glitching an MP3, the most important detail is that each frame begins with a header that contains data about the file and must be left intact for the file to remain readable. Additionally, there is a larger header at the beginning of the file, which must also be left intact. This is where the Nick Briz article comes in. He goes into detail about what the header looks like, how to find it, and different ways the header may vary.
Before we get into how to do this, we need some background. First, because we will be looking at the raw binary data of the MP3 file, it's best to use a hex editor. This is an editor that represents raw binary data in hexadecimal or base-16. I use Hex Fiend on Mac, and ImHex is a popular one that works on Mac, Windows, and Linux. If you're familiar with hexadecimal, you can skip the next paragraph. Otherwise, let's take a moment to explain the hexadecimal number system.
For reference, the number “100” in base-10 (i.e., the usual number system we use) means one in the hundreds place (10^2), zero in the tens place (10^1), and zero in the ones place (10^0). In hexadecimal, “100” is equivalent to 256 in base-10—one in the 256s place (16^2), zero in the 16s place (16^1), and zero in the ones place (16^0). To represent values greater than 9, hexadecimal adds the letters A-F to represent 10-15. For example, “FF” is equivalent to 255 in base-10—“F” (or 15) in the 16s place, and “F” (or 15) in the ones place. Hexadecimal is a useful way to work with binary values since each hexadecimal digit always represents 4 binary digits—half a byte, sometimes called a “nibble”—making numbers much easier to read.
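The conversions above can be checked in a few lines of Python. This is only an illustrative sketch using the built-in `int()` and `format()` functions; the variable names are arbitrary.

```python
# Parse hexadecimal strings into ordinary (base-10) integers.
one_hundred_hex = int("100", 16)   # 1*16^2 + 0*16^1 + 0*16^0
ff_hex = int("FF", 16)             # 15*16^1 + 15*16^0

print(one_hundred_hex)  # 256
print(ff_hex)           # 255

# Each hex digit maps to exactly four binary digits (one nibble).
for digit in "FFFB":
    print(digit, format(int(digit, 16), "04b"))
```

The loop prints each hex digit next to its four-bit binary form, which is the same translation used when reading an MP3 header nibble by nibble.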
Now that we know what these values mean, let's take a look at the hex values in a typical MP3 header. The following table is taken from the Nick Briz article, [4] and it shows the meaning of the example header FF FB A0 40:
| Hex | Binary | Meaning |
|---|---|---|
| F | 1111 | All four bits are part of the MP3 sync code (used to find the header). |
| F | 1111 | All four bits are part of the MP3 sync code. |
| F | 1111 | The first three bits are part of the MP3 sync code. The last bit, in combination with the next bit below (i.e., 11), tells us which MPEG version this was encoded with. In this case 11 translates to MPEG version 1. |
| B | 1011 | The first bit is used to determine the MPEG version (see prior), the second and third bits tell us the layer (i.e., 01, which is Layer 3), and the last bit tells us if there is copy protection (i.e., 1). In this case there is no protection; if there were, the last bit would be a 0. |
| A | 1010 | This byte [ed: should say “nibble”] (all four bits) tell us the bitrate; in this case 1010 is a bitrate of 160 kbps. |
| 0 | 0000 | This byte tells us the sample rate, in this case 0000 is a sample rate of 44,100 Hz. Had it been 0100, this would be a sample rate of 48,000 Hz, or 1000 would be a sample rate of 32,000 Hz. |
| 4 | 0100 | The first two bits contain channel information; in this case 01 means Joint Stereo. When set to Joint Stereo (like this example), the latter two bits tell us the mode of joint stereo. |
| 0 | 0000 | The first bit tells us if the MP3 file has a copyright (0 means it does not), the next bit tells us if it’s a copy of the original file or not (0 means it is). The last two bits tell us if there are emphasized frequencies (00 means there are not). |
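To make the table concrete, here is a rough Python sketch that unpacks the same example header, FF FB A0 40, with bit shifts. The function name and dictionary keys are my own inventions, and the lookup tables only cover MPEG version 1, Layer 3 (the common FFFB case), so treat it as an illustration under those assumptions and not a complete parser. One detail worth noting: in the standard header layout, only the first two bits of the sixth nibble select the sample rate; the other two are a padding flag and a private bit.

```python
# Lookup tables for MPEG version 1, Layer 3 only (an assumption of this sketch).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_frame_header(header: bytes) -> dict:
    """Unpack the 32 bits of a frame header such as FF FB A0 40."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    bits = int.from_bytes(header, "big")
    if bits >> 21 != 0x7FF:                    # first 11 bits: sync code
        raise ValueError("sync code not found")
    version_bits = (bits >> 19) & 0b11         # 0b11 = MPEG version 1
    layer_bits = (bits >> 17) & 0b11           # 0b01 = Layer 3
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("tables here only cover MPEG version 1, Layer 3")
    return {
        # Protection bit: 1 means no CRC checksum follows the header.
        "crc_protected": not bool((bits >> 16) & 1),
        "bitrate_kbps": BITRATES_KBPS[(bits >> 12) & 0xF],
        "sample_rate_hz": SAMPLE_RATES_HZ[(bits >> 10) & 0b11],
        "padding": bool((bits >> 9) & 1),
        "channel_mode": CHANNEL_MODES[(bits >> 6) & 0b11],
        "mode_extension_bits": (bits >> 4) & 0b11,
        "copyright": bool((bits >> 3) & 1),
        "original": bool((bits >> 2) & 1),
        "emphasis_bits": bits & 0b11,
    }


fields = parse_frame_header(bytes.fromhex("FFFBA040"))
for name, value in fields.items():
    print(f"{name}: {value}")
```

For the example header this reports 160 kbps, 44,100 Hz, and joint stereo, matching the table.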
Most MP3 headers will begin with FFFB, as shown here. Briz gives a long list of alternative (and rarer) headers, but in short, all begin with either FFF or FFE. The important things to do when hacking an MP3 are:
- Find the second header in the file (e.g., use ctrl + F/cmd + F). It will be 8 hex digits long and will start with FFF or FFE. The first header will be the one for the entire file. It will (as far as I understand) start with the same values as the frame header, but the data after these 4 hex digits is also important and should be left alone.
- Do not alter the header or change the number of hex values in a frame. Instead replace values outside the header with an equal number of hex values.
- Repeat this process for each subsequent frame header.
One thing to note is that some MP3s are variable bitrate, and since the 5th hex digit in the header tells the bitrate, this digit will change between headers. On one file I tested, this tripped me up for a little while.
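The rules above (skip the first header, leave each header alone, and replace values with an equal number of values) can be sketched in Python. Everything here is hypothetical: the function names are mine, the "file" is a made-up 24-byte stand-in, and the search only looks for the two bytes FF FB so that a changing bitrate digit does not matter. Those two bytes can also turn up by chance inside audio data, so some matches in a real file may be false positives.

```python
def find_header_offsets(data: bytes, sync: bytes = b"\xff\xfb") -> list[int]:
    """Return every offset where the two sync bytes occur.

    Caveat: the same bytes can appear by chance inside audio data,
    so not every hit is guaranteed to be a real frame header.
    """
    offsets = []
    pos = data.find(sync)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(sync, pos + 1)
    return offsets


def overwrite(data: bytearray, offset: int, replacement: bytes) -> None:
    """Overwrite bytes in place so the total length never changes."""
    end = offset + len(replacement)
    if end > len(data):
        raise ValueError("replacement would run past the end of the data")
    data[offset:end] = replacement


# Made-up stand-in for a file: two tiny "frames" whose headers differ
# only in the bitrate digit (A vs. 9), as in a variable bitrate file.
fake = bytearray.fromhex(
    "fffba040" "11223344aabbccdd"
    "fffb9040" "5566778899aabbcc"
)
headers = find_header_offsets(fake)

# Skip the first header, then edit 4 bytes just past the second header.
overwrite(fake, headers[1] + 4, bytes.fromhex("01020304"))
print(headers)     # [0, 12]
print(fake.hex())
```

Because the slice being assigned has the same length as the replacement, the data never grows or shrinks, which is the property the hand-editing rules are protecting.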
Notes on the Process
Here are some of my observations:
- It's usually enough to replace 1 string of 8 characters (i.e., 4 bytes) per frame, and it's not necessary to mangle each frame. Jumping around, listening to the result, and returning to a spot if it needs more glitching keeps me from getting bogged down.
- It tends to produce better results to use smaller numbers in each byte (i.e., each pair of digits). Smaller numbers correlate to lower amplitudes for the frequencies in each frame, and this tends to sound like watery burbling. Higher numbers tend to give bursts of white noise.
- It doesn't seem to matter too much if you repeat strings of numbers. A strategy that's worked well is to make a sequence of 8 hexadecimal digits with mostly smaller values in each byte and repeatedly paste it in as a replacement for one string with higher values every few frames.
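The last strategy could be sketched like this in Python. The names `low_value_patch` and `paste_patch` are hypothetical, as are the defaults: a ceiling of 0x1F for "smaller values," pasting every third frame, and starting 8 bytes into the frame are all arbitrary guesses, not tested settings. The sketch also assumes frames are long enough that a patch never reaches the next header, which should hold for real frames of several hundred bytes.

```python
import random


def low_value_patch(length: int = 4, max_value: int = 0x1F, rng=None) -> bytes:
    """Build a short byte string (8 hex digits by default) of small values."""
    if rng is None:
        rng = random.Random()
    return bytes(rng.randint(0, max_value) for _ in range(length))


def paste_patch(data: bytearray, header_offsets, patch: bytes,
                every: int = 3, skip: int = 8) -> int:
    """Paste the same patch into every Nth frame, `skip` bytes past the header.

    Starts from the second header so the first one is left alone.
    Returns the number of frames that were patched.
    """
    pasted = 0
    for offset in header_offsets[1::every]:
        start = offset + skip
        if start + len(patch) <= len(data):
            data[start:start + len(patch)] = patch   # same length in, same length out
            pasted += 1
    return pasted


# Made-up data: four 16-byte "frames" (real frames are far longer).
header = bytes.fromhex("fffba040")
fake = bytearray((header + b"\xee" * 12) * 4)
offsets = [0, 16, 32, 48]

patch = low_value_patch(rng=random.Random(0))
count = paste_patch(fake, offsets, patch, every=2)
print(patch.hex(), count, len(fake))
```

Passing a seeded `random.Random` makes a run repeatable, which helps when a particular patch sounds good and is worth reusing.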
My Results
First, here is a short “un-glitched” MP3 file I used as the source for this process:
Here is the same file after glitching:
I like the weird bubbling quality, and especially the high chirps and clicks. As far as I can tell, editing near the beginning or end of the frame should get different frequencies, and I may try getting more of those high chirps using this information.
Improving the Process
My results here are extremely short, and the process of doing this by hand makes me feel like Ben Wyatt making claymation. For a 160kbps MP3, not counting the headers, there should be 40,000 hex digits per second (160,000 bits divided by 4 bits per hex digit), so editing these by hand is beyond tedious. In addition, it would be nice to be able to audition a few different glitched versions of a file and pick the best one—this process feels like poking around in the dark since I don't fully know what will happen until I work for a while and listen back to my results. It would be great if I were able to automate some of this.
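For a sense of scale, the arithmetic can be written out in Python. The figures assume the example header from earlier (MPEG version 1, Layer 3, 160 kbps, 44,100 Hz); the 1,152 samples per frame and the 144 × bitrate ÷ sample rate frame size are the standard values for that format, and the variable names are my own.

```python
bitrate_bps = 160_000
sample_rate_hz = 44_100
samples_per_frame = 1152   # fixed for MPEG version 1, Layer 3

# 4 bits per hex digit.
hex_digits_per_second = bitrate_bps // 4

# How many frames (and therefore headers) go by each second.
frames_per_second = sample_rate_hz / samples_per_frame

# Frame size in bytes; one extra byte is added when the padding bit is set.
bytes_per_frame = 144 * bitrate_bps // sample_rate_hz

print(hex_digits_per_second)          # 40000
print(round(frames_per_second, 1))    # 38.3
print(bytes_per_frame)                # 522
```

So each frame is roughly 1,044 hex digits, of which only 8 belong to the header, and there are about 38 frames to visit for every second of audio.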
I had a look around, and it looks like it isn't too hard to work with binary data as hex in Python. This Stack Overflow answer mentions that the built-in bytes object has a .hex() method and suggests the following code. This example is for opening genome data, but I imagine something similar could work for an MP3:
# Read the whole file as bytes and convert it to one long hex string.
with open('data.geno', 'rb') as f:
    hexdata = f.read().hex()
I haven't done too much with Python—most of my coding is with JS, C++, or a bit of Rust—so if anyone has suggestions on working with MP3s I would love to hear them! My general plan is as follows:
- Import the MP3 as hexadecimal (as shown above).
- Split up the hex data at the frame headers and put the frames into an array.
- For each frame, start after the header and randomly replace values. It might be nice to change the probability of replacing a value based on how far through the frame I am—as noted before, this should make the glitches tend to be higher or lower in pitch.
- Reassemble the frames and export as an MP3 again.
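As a rough illustration of this plan, here is one way it might look in Python. This is a sketch under stated assumptions and not the code from Part 3. All function names are hypothetical. It works on raw bytes instead of a hex string (a swapped-in choice, so that the pattern FFFB cannot match across a byte boundary), it only looks for the FF FB sync bytes, and it makes no attempt to weed out chance matches inside the audio data. The 2% ceiling on replacement probability and the 0x1F cap on new values are arbitrary.

```python
import random

SYNC = b"\xff\xfb"   # assumption: MPEG version 1, Layer 3 frames
HEADER_LEN = 4


def split_frames(data: bytes) -> list[bytes]:
    """Split data into chunks that each start at a sync pattern.

    The first chunk is whatever precedes the first sync (possibly empty).
    """
    offsets = []
    pos = data.find(SYNC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(SYNC, pos + 1)
    if not offsets:
        return [data]
    chunks = [data[:offsets[0]]]
    for start, end in zip(offsets, offsets[1:] + [len(data)]):
        chunks.append(data[start:end])
    return chunks


def glitch_frame(frame: bytes, rng: random.Random,
                 max_prob: float = 0.02, favor_end: bool = True) -> bytes:
    """Randomly replace bytes after the header, keeping the length the same.

    The replacement probability ramps linearly from 0 to max_prob across
    the frame body (or the reverse when favor_end is False).
    """
    out = bytearray(frame)
    body_len = len(frame) - HEADER_LEN
    for i in range(HEADER_LEN, len(frame)):
        position = (i - HEADER_LEN) / max(body_len - 1, 1)   # 0.0 to 1.0
        weight = position if favor_end else 1.0 - position
        if rng.random() < max_prob * weight:
            out[i] = rng.randint(0, 0x1F)   # favor small values
    return bytes(out)


def glitch_mp3(data: bytes, seed=None, **kwargs) -> bytes:
    """Split, glitch every frame after the first, and reassemble."""
    rng = random.Random(seed)
    chunks = split_frames(data)
    # chunks[0] precedes the first sync; chunks[1] is the first frame,
    # which is left untouched along with its header.
    kept, rest = chunks[:2], chunks[2:]
    return b"".join(kept + [glitch_frame(f, rng, **kwargs) for f in rest])


def glitch_file(src_path: str, dst_path: str, seed=None, **kwargs) -> None:
    """Read an MP3, glitch it, and write the result to a new file."""
    with open(src_path, "rb") as f:
        original = f.read()
    with open(dst_path, "wb") as f:
        f.write(glitch_mp3(original, seed=seed, **kwargs))
```

A call such as `glitch_file("input.mp3", "glitched.mp3", seed=1)` (both file names are placeholders) would write one glitched version; changing the seed gives a different version to audition, and `favor_end=False` shifts the damage toward the start of each frame.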
I will have a go at this soon, and if I get anywhere, I will do another writeup of my results. I hope to see you then, and I would love to hear if you try any of this!
[1] Udo Zölzer, Digital Audio Signal Processing (John Wiley & Sons, 2022).
[2] Li Tan and Jean Jiang, Digital Signal Processing: Fundamentals and Applications (Academic Press, 2018).
[3] Nick Briz, “Data Hacking: The Foundations of Glitch Art,” in Handmade Electronic Music, 3rd ed. (Routledge, 2020), 377–90.
[4] Ibid., 386.