Last month, we introduced our Digital Audio Recorder by looking at the basics of timer interrupts and how the Arduino Uno’s ATMEGA328P controller can be programmed to create precise timing intervals ideal for audio sampling. We also looked at the ‘double buffering’ technique that allows us to capture a new sample every 45 microseconds while simultaneously saving data to the flash card.
Well, a month is a long time in tech and we’ve made some major changes and improvements – including boosting the sample rate maximum from 22.05kHz to as high as 48kHz (reliant on the microSD card). We’ll explain how, but first, we’ll look at the FAT filesystem and how we implemented Microsoft’s WAV file structure to enable the recorded file to play almost anywhere.
Like many audio file formats, WAV has a structure designed so that when the file is loaded into any suitable media player, it knows how to play it. Otherwise, the file is just random data. The key is the initial 44-byte data block known as the ‘WAV file header’, which identifies the file and holds important information on the sample rate, bit depth, number of channels and so on. This is what our recorder must create as it begins recording.
Last month, we looked at the microSD card interface and basic flash card storage alignment, but before we get to creating the WAV header, we first have to understand how data is arranged on and written to the card.
Every storage device needs a file system, a way of knowing what’s stored where, and arguably the most common is Microsoft’s File Allocation Table filesystem, or FAT. Arduino supports file storage with the SD library using FAT16 or FAT32 structures, but to improve performance, we’re using the new-and-improved sdfatlib library.
There are two standard ways to write data to storage – you can write it as ASCII text, where each character requires one byte of storage; or you can write it in more efficient binary form. To write a WAV file, we need this second form, but before we can get started, there are other limitations we need to get around.
If we look back at the WAV file header, it contains small memory blocks or ‘chunks’, with each one defining a parameter of the audio inside, whether it’s the sample rate, file size, bit depth or so on. Some of those blocks are two-bytes wide, most are four-bytes. The problem we have with the sdfatlib library is that while it supports both ASCII and binary writing, it can only write in binary format one byte per command, so before we can write these WAVE blocks as datatypes, we have to convert them to bytes and write them to the card one byte at a time.
WAV file format
So let’s go back to the WAV file format and that 44-byte header. The wave header is written to the card in the writeWavHeader() method with the basic structure you can see in the block diagram. Here it is in detail, starting with the file ID:
- Offset 0 (4 bytes length) – ChunkID. This contains the four ASCII characters ‘RIFF’ (Resource Interchange File Format) and identifies the file.
- Offset 4 (4) – FileSize. The size of the file minus the eight bytes taken up by the ChunkID and FileSize fields themselves. Four bytes (32 bits) gives us a 4GB maximum file size (written after the recording is completed).
- Offset 8 (4) – Format. The four ASCII characters ‘WAVE’ indicate standard WAV format.
The next 24-bytes tell us specific parameters of the audio:
- Offset 12 (4) – SubID. This contains the characters ‘fmt ’ (including the trailing space).
- Offset 16 (4) – SubSize. Sets the size of the rest of this format chunk to 16 bytes for PCM wave format.
- Offset 20 (2) – AudioFormat. Although we use two bytes here, this is set to 0x0001 indicating non-compressed PCM/WAV audio follows.
- Offset 22 (2) – Channels. The number of audio channels (in our case, one).
- Offset 24 (4) – SampleRate. The audio sample rate, written as binary.
- Offset 28 (4) – DataRate. Sets how fast data needs to be streamed, in bytes per second, and is calculated by the equation: DataRate = samplerate x channels x (bitspersample / 8)
- Offset 32 (2) – BlockAlign. The number of bytes in one sample across all channels (channels x bitspersample / 8, which is one in our case), so that the player knows which bytes belong to which channel.
- Offset 34 (2) – SampleBits. The number of bits in each sample (we set this to ‘8’).
The last 8-bytes set up the rest of the file:
- Offset 36 (4) – SubID2. This contains the ASCII characters ‘data’ indicating the audio data begins here.
- Offset 40 (4) – Sub2Size. The size of the audio data following, set by the equation: Size = no. of samples x no. of channels x no. of bytes per sample.
Again, this is written after recording is completed.
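To make the layout concrete, here is a minimal sketch in standard C++ that fills a 44-byte array with the fields listed above. It is an illustration only, not the recorder’s actual writeWavHeader() code: buildWavHeader, putLE16 and putLE32 are hypothetical names, and the sketch assumes multi-byte values are stored least-significant byte first, as the WAV format expects.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a 16-bit value least-significant byte first.
static void putLE16(uint8_t* p, uint16_t v) {
    p[0] = static_cast<uint8_t>(v & 0xFF);
    p[1] = static_cast<uint8_t>(v >> 8);
}

// Hypothetical helper: store a 32-bit value least-significant byte first.
static void putLE32(uint8_t* p, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        p[i] = static_cast<uint8_t>((v >> (8 * i)) & 0xFF);
    }
}

// Build the 44-byte PCM header. dataBytes is the size of the audio data;
// pass 0 as a placeholder when the final size is not yet known.
std::array<uint8_t, 44> buildWavHeader(uint32_t sampleRate, uint16_t channels,
                                       uint16_t bitsPerSample, uint32_t dataBytes) {
    std::array<uint8_t, 44> h{};
    const uint16_t blockAlign = static_cast<uint16_t>(channels * (bitsPerSample / 8));
    const uint32_t dataRate = sampleRate * blockAlign;

    std::memcpy(h.data() + 0, "RIFF", 4);   // ChunkID
    putLE32(h.data() + 4, 36 + dataBytes);  // FileSize: whole file minus 8 bytes
    std::memcpy(h.data() + 8, "WAVE", 4);   // Format
    std::memcpy(h.data() + 12, "fmt ", 4);  // SubID, note the trailing space
    putLE32(h.data() + 16, 16);             // SubSize: 16 bytes for PCM
    putLE16(h.data() + 20, 1);              // AudioFormat: 1 = uncompressed PCM
    putLE16(h.data() + 22, channels);       // Channels
    putLE32(h.data() + 24, sampleRate);     // SampleRate
    putLE32(h.data() + 28, dataRate);       // DataRate in bytes per second
    putLE16(h.data() + 32, blockAlign);     // BlockAlign
    putLE16(h.data() + 34, bitsPerSample);  // SampleBits
    std::memcpy(h.data() + 36, "data", 4);  // SubID2
    putLE32(h.data() + 40, dataBytes);      // Sub2Size
    return h;
}
```

Calling buildWavHeader(22050, 1, 8, 0) gives a header for 8-bit mono audio with the data size left at zero, ready to be corrected once recording stops.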
Having to write these values a byte at a time means the order in which the bytes are stored (called the ‘byte order’) is critical. WAV expects the least-significant byte first; get it backwards and a value of ‘1’ may end up becoming ‘16777216’ if we’re not careful. Another thing we must do is keep track of the number of samples we capture, so this info can be written back to the file header after the capture is completed.
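The byte-order trap is easy to demonstrate. The sketch below, in standard C++ with hypothetical function names, splits a 32-bit value into four bytes with the least-significant byte first, then reassembles those bytes in both the right and the wrong order.

```cpp
#include <cstdint>

// Split a 32-bit value into four bytes, least-significant byte first.
void toLittleEndian(uint32_t value, uint8_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<uint8_t>(value >> (8 * i));
    }
}

// Reassemble treating the first byte as least significant (the WAV convention).
uint32_t fromLittleEndian(const uint8_t in[4]) {
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value |= static_cast<uint32_t>(in[i]) << (8 * i);
    }
    return value;
}

// Reassemble treating the first byte as most significant (the wrong way round here).
uint32_t fromBigEndian(const uint8_t in[4]) {
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value = (value << 8) | in[i];
    }
    return value;
}
```

Pass the value 1 through toLittleEndian() and the four bytes are 01 00 00 00. Read back with fromLittleEndian() you get 1 again, but fromBigEndian() sees 0x01000000, which is 16777216.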
There’s a lot to do, so here’s how it’s done. First, the only header fields we don’t know are the file size and data chunk size, so we start the recording process by opening the file ‘REC00000.WAV’, writing the header info we do know and dummy-filling the spots we don’t. Once that’s done, we start capturing the audio and storing it away. Since each sample is one byte (8 bits), the number of samples we capture is also the number of bytes we store. We keep track of this number in a long-integer variable called ‘byteWriteCount’, which is incremented every sample.
When we stop capturing, we use the sdfatlib’s ‘seekSet’ command to go back to the file header, write in the filesize and datasize information and finally, close the file.
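The patch-up step can be sketched as follows. To keep the example runnable on a PC, a std::vector stands in for the file on the card and an array index stands in for the position you would pass to seekSet. The names finaliseWav and patchLE32 are hypothetical and are not sdfatlib functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kHeaderBytes = 44;

// Overwrite four bytes at 'offset' with 'value', least-significant byte first.
static void patchLE32(std::vector<uint8_t>& file, std::size_t offset, uint32_t value) {
    for (std::size_t i = 0; i < 4; ++i) {
        file[offset + i] = static_cast<uint8_t>(value >> (8 * i));
    }
}

// Fill in the two size fields that were dummy-filled at the start of recording.
// byteWriteCount is the number of audio bytes written after the header.
bool finaliseWav(std::vector<uint8_t>& file, uint32_t byteWriteCount) {
    if (file.size() < kHeaderBytes) {
        return false;                         // no complete header to patch
    }
    patchLE32(file, 4, 36 + byteWriteCount);  // FileSize: whole file minus 8 bytes
    patchLE32(file, 40, byteWriteCount);      // Sub2Size: audio data only
    return true;
}
```

On the recorder itself, each patchLE32() call would become a seek to offset 4 or 40 followed by four single-byte writes, with the file closed afterwards.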
After all that is done, the file on the card is now a genuine WAV file playable on any compatible device, including phones, tablets and Windows Media Player on your PC or notebook.
Controlling the recorder
To keep things really simple, our digital audio recorder has just two buttons – Record and Stop. The Arduino continually monitors two of its digital input lines (D5 and D6) and when the Record button (D6) is pressed, the recording function begins. When the Stop button (D5) is pressed, we deactivate the timer interrupt that launches the interrupt service routine (ISR) capturing the samples, write out the information to the WAV file header and close the file.
To give you some basic indication of what the recorder is doing, two LEDs light up, one for ‘recording’ and the other for ‘stopped’. Any problem with the card on boot results in the LEDs flashing.
We’ve used the Arduino DIY prototyping shield to house the microSD card module, buttons, LEDs and audio input circuitry. It’s a tight squeeze but keeps things compact.
Again to simplify things, the audio input circuit just contains a resistor divider to bias the ADC input to half the supply rail. This ensures that we accurately capture both the positive (samples 128 to 255) and negative (samples 128 down to 0) halves of the audio waveform. You’ll need to feed it with a high-level audio signal, such as your PC’s line output or your phone/tablet’s headphone socket.
We’ve used a 3.5mm panel socket here and a 3.5mm male-to-male cable will connect those devices to the recorder.
Modifying the buffer
The microSD flash card reader module is a new release you’ll find on eBay for around $5 – it supports any SDHC card up to 32GB and contains built-in level translation to convert the 5V signals from the Arduino to 3.3V level for the flash card.
From last month, you’ll remember that we’re forced to use the microSD card’s one-bit SPI (serial peripheral interface) bus rather than the faster four-bit parallel mode, simply because that’s all the Arduino Uno supports. We implemented this alongside a double 512-byte buffering system that records samples into one buffer while the other buffer is written to the card.
However, we found this month that we were temporarily losing samples after 40 seconds or so of recording. The reason wasn’t the recording mechanism itself, but the microSD card’s SPI bus speed and how data is written to the flash storage.
With a 22.05kHz sample rate, the two 512-byte buffer blocks each provide 512/22050 or 23.2 milliseconds of audio storage. We benchmarked the write time for 512-byte blocks to the card and on average, it took just three milliseconds (3ms), so there was no problem. Or so we thought.
However, on occasions, we discovered it was taking the microSD card reader as long as 65 milliseconds to write the same-sized block. That’s nearly three times as long as a single buffer can cover, and that’s where the samples were being lost.
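The arithmetic behind those figures is worth spelling out. This short sketch in standard C++ (the function names are hypothetical) works out how long a buffer lasts and how many samples arrive while a card write is stalled, assuming one byte per sample as in our 8-bit mono recordings.

```cpp
#include <cmath>
#include <cstdint>

// Milliseconds of audio held by a buffer, at one byte per sample.
double bufferMillis(uint32_t bufferBytes, uint32_t sampleRateHz) {
    return 1000.0 * bufferBytes / sampleRateHz;
}

// Samples (and so bytes) that arrive while a card write is stalled,
// rounded up to a whole sample.
uint32_t bytesDuringStall(double stallMillis, uint32_t sampleRateHz) {
    return static_cast<uint32_t>(std::ceil(stallMillis * sampleRateHz / 1000.0));
}
```

At 22.05kHz, bufferMillis(512, 22050) comes to about 23.2ms, while bytesDuringStall(65.0, 22050) returns 1434. In other words, a single 65ms write lets nearly three 512-byte buffers’ worth of samples arrive.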
But we also found that if we dropped the buffer size down to 128 bytes, the average write time dropped to just 0.5 milliseconds, with some as fast as 88 microseconds. Unfortunately, the occasional write still took up to 65 milliseconds to complete, but there were far fewer of these.
So the solution was to redesign the two 512-byte buffer blocks into an eight-block ring buffer of 128-byte blocks. That still gave us the same 1024 bytes of total buffer space, but with improved write speeds.
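An eight-block ring buffer along these lines might look like the sketch below. It is written in standard C++ so it can be tried on a PC, and it is not lifted from the recorder’s source: SampleRing and its members are hypothetical names. On the Arduino, push() would be called from the sampling ISR and the other two methods from the main loop, so the shared counters would also need to be declared volatile and protected from being changed mid-update.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlockBytes = 128;  // size of each block written to the card
constexpr std::size_t kBlockCount = 8;    // 8 x 128 = 1024 bytes in total

struct SampleRing {
    uint8_t data[kBlockCount][kBlockBytes] = {};
    std::size_t fillBlock = 0;   // block the sampler is currently filling
    std::size_t fillPos = 0;     // next free byte in that block
    std::size_t drainBlock = 0;  // oldest full block, next to go to the card
    std::size_t fullBlocks = 0;  // completed blocks waiting to be written
    uint32_t dropped = 0;        // samples lost because every block was full

    // Store one sample. Returns false if it had to be dropped.
    bool push(uint8_t sample) {
        if (fullBlocks == kBlockCount) {
            ++dropped;
            return false;
        }
        data[fillBlock][fillPos++] = sample;
        if (fillPos == kBlockBytes) {  // block complete, move to the next one
            fillPos = 0;
            fillBlock = (fillBlock + 1) % kBlockCount;
            ++fullBlocks;
        }
        return true;
    }

    // Oldest full block, or nullptr if nothing is ready to write.
    const uint8_t* nextFullBlock() const {
        return fullBlocks != 0 ? data[drainBlock] : nullptr;
    }

    // Call once the block returned by nextFullBlock() is safely on the card.
    void release() {
        if (fullBlocks == 0) {
            return;
        }
        drainBlock = (drainBlock + 1) % kBlockCount;
        --fullBlocks;
    }
};
```

The dropped counter is there purely for diagnostics: if it ever rises above zero, the card fell behind and the recording will contain a skip.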
Setting the sample rate
Not only that, the faster performance means we’ve been able to incorporate an adjustable sample rate, up from the original locked 22.05kHz to as high as you can get away with depending on your microSD card (48kHz maximum). You set it in the source code (it’s pretty obvious where) before you flash your Arduino Uno board with it.
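As a rough guide to what changing the sample rate involves, the sketch below works out a timer compare value for a given rate. It assumes the Uno’s 16MHz clock and a 16-bit timer counting every clock cycle and resetting when it reaches the compare value. The recorder’s source may set its timer up differently, and compareValueFor and actualRateHz are hypothetical names.

```cpp
#include <cstdint>

constexpr uint32_t kClockHz = 16000000UL;  // Arduino Uno system clock

// Compare value for a timer that counts from 0 up to this value and then
// resets, so one sample is taken every (value + 1) clock cycles.
uint16_t compareValueFor(uint32_t sampleRateHz) {
    const uint32_t ticks = (kClockHz + sampleRateHz / 2) / sampleRateHz;  // rounded
    return static_cast<uint16_t>(ticks - 1);
}

// The sample rate a given compare value actually produces.
double actualRateHz(uint16_t compareValue) {
    return static_cast<double>(kClockHz) / (compareValue + 1u);
}
```

Under these assumptions, compareValueFor(22050) gives 725 and compareValueFor(48000) gives 332. Because the clock doesn’t divide evenly, the true rates are slightly off nominal: actualRateHz(332) is about 48048Hz. At that rate there are only 333 clock cycles between samples, which shows how little headroom is left.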
Because of variations in microSD cards, the maximum sample rate you can achieve depends more on the card than on the ATMEGA328P controller. We’ve tested it and captured sample-stable recordings up to 48kHz.
But here’s the key – if the card’s SPI bus write speed is too slow, you’ll hear ‘skips’ in your recording where samples haven’t been written to the card in time. Unfortunately, we can’t tell you which cards are faster since it depends on the card’s controller and flash write speed. You have to ‘suck it and see’.
However, we’ve found in general, that smaller-capacity cards formatted with a larger ‘sector’ or allocation unit size (AUS) seem to offer faster performance in SPI mode. We tested this with 4GB and 8GB cards formatted with 1KB and 32KB AUSs – the 32KB AUS-formatted option delivered faster write speeds all-round. Formatting the card just before use can also make a difference.
Given how few resources we have, and now with higher sample rates, the audio quality is surprisingly good! With only 8-bit sample depth, the signal-to-noise ratio (SNR) is roughly 40dB (about the same as cassette tape), so there’s a bit of background hiss, but if you record audio that has serious dynamic range compression, you don’t notice it (AC/DC’s Back in Black, for example, sounds great).
Sure, it’s not CD-quality, but the techniques here are just the same – and that still makes this a seriously good way to learn how digital audio works, from analog sampling through to file storage.
Really, the only reason we run into any issues at all is simply the ATMEGA328P’s lack of RAM. The chip is fast enough, but it just doesn’t have enough RAM to ride out the occasional slow card block-write. There are ways to solve this, though.
One simple fix is to reduce the sample rate, albeit at the cost of audio quality. A more complex lossless option is adding external RAM, such as a 23K640 64Kbit (8KB) SPI RAM chip. We could also swap the Arduino Uno for the up-rated Arduino Mega2560 with its 8KB of RAM.
Obviously we are right on the edge of the ATMEGA328P’s capabilities here, but slow that sample rate right down and there’s an application that’s right in the Arduino’s wheelhouse – data logging. That data can be anything from weather measurements to a digital voltmeter or even a security system. The ATMEGA328P’s 10-bit ADC range means you can capture analog data down to 0.1% of full scale, or you can just record changes on any of the digital inputs.
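For data logging, the raw ADC reading usually needs converting into something meaningful. Here is a minimal sketch in standard C++, assuming the reading is a 10-bit value (0 to 1023) and that you know the ADC reference voltage; adcToVolts and stepPercent are hypothetical names.

```cpp
#include <cstdint>

// Convert a raw 10-bit ADC reading (0 to 1023) to volts.
// referenceVolts is the ADC reference, assumed here to be known (e.g. 5.0).
double adcToVolts(uint16_t reading, double referenceVolts) {
    return reading * referenceVolts / 1024.0;
}

// Size of one ADC step as a percentage of full scale for a given bit depth.
double stepPercent(unsigned bits) {
    return 100.0 / static_cast<double>(1UL << bits);
}
```

With a 5V reference, a reading of 512 works out to 2.5V, and stepPercent(10) comes to just under 0.1%, which is where the ‘0.1% of full scale’ figure comes from.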
And that’s what makes Arduino so cool – it has so many toys to play with, there’s always a new idea just around the corner.
As usual, you’ll find the source code for our Digital Audio Recorder at our Arduino webpage. While it has been tested, the code comes ‘as is’ with no warranty of any kind.