A super efficient audio routine written in pure PIC12F508 MPASMX assembly! It uses the power of a PIC12F508, a 24LC256 EEPROM, and a few nights of staying up 'til 1am.
- Runs off a 4MHz PIC12F508 at 3.3V
- 12.5kHz 4-bit PCM audio
- Stored on a 32KiB I2C EEPROM (24LC256)
- Max ~5.24s of audio (65,536 4-bit samples at 12.5kHz)
- Interleaved, unrolled PCM+I2C read loop: each PWM cycle takes 5 instructions, 3 for PWM and 2 for the rest of the logic
- Literally the most efficient code I've ever written, and the most I've ever planned a project. Seriously, it ran first try!
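The numbers in the list above can be sanity-checked with a bit of arithmetic. This is a rough sketch rather than anything taken from the actual firmware: it assumes a 4-bit sample is played as 16 PWM cycles and that the whole EEPROM holds audio, and all the constant names are made up for illustration.

```python
# Back-of-the-envelope timing check (a sketch, not part of the firmware).
CLOCK_HZ = 4_000_000             # oscillator frequency
CLOCKS_PER_INSTRUCTION = 4       # each instruction takes 4 clock cycles
INSTRUCTIONS_PER_PWM_CYCLE = 5   # 3 for PWM + 2 for everything else
PWM_CYCLES_PER_SAMPLE = 16       # assumption: 4-bit sample -> 2**4 PWM cycles
EEPROM_BYTES = 32 * 1024         # 24LC256 capacity
BITS_PER_SAMPLE = 4

instructions_per_second = CLOCK_HZ // CLOCKS_PER_INSTRUCTION
instructions_per_sample = INSTRUCTIONS_PER_PWM_CYCLE * PWM_CYCLES_PER_SAMPLE
sample_rate_hz = instructions_per_second / instructions_per_sample

# Assumption: every byte of the EEPROM holds audio (two samples per byte).
total_samples = EEPROM_BYTES * 8 // BITS_PER_SAMPLE
max_seconds = total_samples / sample_rate_hz

print(f"{instructions_per_second} instructions/s")
print(f"{sample_rate_hz:.0f} Hz sample rate")
print(f"{total_samples} samples, {max_seconds:.2f} s max")
```

Under those assumptions the sample rate comes out at 12.5kHz and the maximum clip length at a little over 5.2 seconds.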
So I was watching a funny Bringus Studios video where he made a Half-Life bottle opener based on a meme (see Inspirations for more detail), and I noticed he was using a full ESP32. After seeing that, I remarked to myself that an ESP32 is wayyyy overpowered for something as basic as playing crappy audio off a button press, and that it could be done with a way less capable MCU. Then, after I got my hands on some PIC12F508s for fun, the idea suddenly came to me: make a microcontroller with only 512 instructions of program memory, 25 bytes of RAM and a 4MHz clock play crappy audio. Challenge accepted.
From then on, I read the datasheet for the 12F508 a bunch: understanding how the assembly language worked, realizing that I actually only have 1 million instructions per second instead of 4 million because each instruction takes 4 clock cycles, and even writing an emulator for it. I also studied up on how audio is stored and played. After all that, I was able to sketch out how I'd handle the code.
My initial naive approach was to separate it into different functions, one for reading the next sample from the EEPROM and one for playing the sample. This was dumb and bad; my mind wasn't thinking Low Level enough. First of all, there's only a 2-deep hardware call stack, so splitting stuff into functions really wouldn't work well, since I'd likely overflow the stack and get lost. Second of all, and most damning, the 12F508 is both slow and lacks hardware I2C and PWM support, meaning both would have to be bit-banged. This means that while I'm reading the next sample, the current one isn't being played, and while I'm playing the current sample, the next one isn't being read, leading to a ton of wasted time and probably really, really bad audio. After that, I kinda dropped the project; uni was picking up and my mind was elsewhere. Then, suddenly, I had a really funny idea while I was studying CPU architecture in an elective: what if I interleave the EEPROM reading and PWM cycles, so we don't waste time waiting?
From then on, I was very focused on the project; it finally felt like I had a concrete way of getting this done. I spent a few nights up quite late (well, 1am) hammering out the design of how I'd actually make this. Thankfully, Microchip are the best ever and make really good instruction sets! I was able to sketch out an unrolled loop of 32 PWM cycles, each taking 5 instruction cycles: 3 spent incrementing and doing logic for the PWM, and the other 2 either padding for consistent timing or other control logic. Some PWM cycles clock and pulse in the next bit, some wait, and some switch to the next half-sample, but each one does it all in 5 instruction cycles, giving me an audio sampling rate of 12,500Hz.
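To make the shape of that loop a bit more concrete, here's a tiny Python model of the PWM side of it. It's only a sketch under a few assumptions that aren't spelled out above: that each 4-bit half-sample gets 16 of the 32 PWM cycles, that the pin is high for as many cycles as the sample value, and that the high nibble plays first. The function names are hypothetical, and the real assembly will look nothing like this.

```python
# Model of software PWM for one EEPROM byte (two 4-bit half-samples).
PWM_CYCLES_PER_SAMPLE = 16   # assumption: 2**4 PWM cycles per 4-bit sample
INSTRUCTIONS_PER_PWM_CYCLE = 5


def pwm_levels(sample):
    """Return the pin level (1 or 0) for each PWM cycle of one 4-bit sample."""
    if not 0 <= sample < PWM_CYCLES_PER_SAMPLE:
        raise ValueError("sample must fit in 4 bits")
    # Pin stays high while the PWM counter is below the sample value.
    return [1 if step < sample else 0 for step in range(PWM_CYCLES_PER_SAMPLE)]


def play_byte(byte):
    """Pin levels for all 32 PWM cycles of one byte (high nibble first, assumed)."""
    high, low = byte >> 4, byte & 0x0F
    return pwm_levels(high) + pwm_levels(low)


levels = play_byte(0xC3)
instruction_cycles = len(levels) * INSTRUCTIONS_PER_PWM_CYCLE
print(sum(levels[:16]), sum(levels[16:]), instruction_cycles)
```

In this model, one pass through the unrolled loop covers one EEPROM byte, so the 8 bits of the next byte have to be clocked in somewhere among the spare 2-instruction slots of those 32 PWM cycles.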
Part of the logic that really stumped me at one point was finding an efficient way to read the next bit into the buffer; I just couldn't get it down to 2 cycles to fit inside one loop cycle. And then I remembered the rotate instructions. They're kinda strange. I always thought they'd be better off replaced by regular shift instructions, and that the carry mechanic would just be annoying to work with, but this time they were my saving grace! If I wire the EEPROM's SDA line to the lowest bit of the GPIO (GP0), I can rotate the port right to read that bit into carry, then rotate the buffer left to push carry into its lowest bit. Eureka! From there, the implementation went pretty smoothly, bar a couple of hitches. One was that I had trouble incrementing the W register in one cycle, as there's no instruction to increment W, so with all odds against me I decided to store 1 as a constant in memory and use the ADDWF instruction instead. That then led to a second problem, where I accidentally used 0x1 as the memory address instead of storing the value 0x1, meaning it was actually incrementing the sample by the value of Timer0 (which lives at address 0x01), absolutely destroying the audio.
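Here's a small Python model of both the rotate trick and the ADDWF mix-up, in case it helps to see them spelled out. It's a sketch, not the real code: the helpers mimic the PIC's rotate-through-carry instructions (RRF and RLF) and ADDWF with W as the destination, the address `0x07` for the constant is a hypothetical pick, and it assumes the port rotate puts its result in W so the port itself isn't written back.

```python
def rrf(value, carry):
    """Rotate an 8-bit value right through carry; returns (result, new_carry)."""
    return ((carry << 7) | (value >> 1)) & 0xFF, value & 1


def rlf(value, carry):
    """Rotate an 8-bit value left through carry; returns (result, new_carry)."""
    return ((value << 1) | carry) & 0xFF, (value >> 7) & 1


def shift_in_bit(gpio, buffer, carry=0):
    """Two-instruction bit read: GP0 -> carry, then carry -> bit 0 of buffer."""
    _, carry = rrf(gpio, carry)          # like RRF GPIO, W (result discarded)
    buffer, carry = rlf(buffer, carry)   # like RLF buffer, F
    return buffer, carry


# Clock in one byte MSB first (I2C bit order), with SDA on GP0.
incoming = 0xA5
buffer, carry = 0, 0
for i in range(7, -1, -1):
    bit = (incoming >> i) & 1
    gpio = 0b110100 | bit                # other pins hold arbitrary levels
    buffer, carry = shift_in_bit(gpio, buffer, carry)
received = buffer

# The ADDWF mix-up: adding the register AT address 0x01 vs a stored value 1.
TMR0_ADDR = 0x01   # Timer0 sits at file address 0x01
ONE_ADDR = 0x07    # hypothetical address holding the constant 1
regs = {TMR0_ADDR: 0x9C, ONE_ADDR: 0x01}   # 0x9C: whatever the timer reads


def addwf_to_w(regs, addr, w):
    """Like ADDWF addr, W: add the file register at addr to W."""
    return (w + regs[addr]) & 0xFF


good = addwf_to_w(regs, ONE_ADDR, 0x04)    # W + 1
bad = addwf_to_w(regs, TMR0_ADDR, 0x04)    # W + whatever Timer0 holds
print(hex(received), hex(good), hex(bad))
```

The levels on the other GPIO pins never reach the buffer, since only bit 0 of the port falls into carry.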
[ TODO, along with the rest of the readme... ]