Monday, December 8, 2014

Cricket part 2: Robot --> drum synth

One of my earlier posts detailed my abandoned plans for a kids' robot I called Cricket. But the brain module worked just fine and featured a bunch of pots for real-time control, so why not turn it into a drum synthesizer?? It already had a sound system (amp and speaker) and a big yellow arcade button as a trigger.

Cricket the drum synth: The Cricket brain module with more pots, no motors.

The brain module was packed full of cables, controls, and LED diffusion pods, so my first goal was not to modify any circuitry. (I added pots and switches to the front panel, but they plugged into existing headers.)

Second, I didn't want it to take forever, so I kept the feature list simple:
  • Two digital oscillators with selectable waveforms (saw/triangle/noise/50% square) and independent pitch controls
  • Two-stage (attack and release) amplitude envelope
  • Global pitch LFO with controls for speed, depth, and waveform (same choices as the oscillators)
  • Osc 2 -> Osc 1 frequency modulation, with adjustable depth and a high/low pitch range switch for Osc 2
  • Selectable AND/OR/XORing of the oscillators with each other
  • Digital wrapping/clipping distortion with adjustable depth  
(Yes, there's no filter. Deal with it. :) 
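
For anyone curious how the less common items on that list behave, here's a rough Python sketch of the ideas, not the dsPIC firmware. It assumes unsigned 8-bit samples centred on 128 and a linear envelope shape; the function names (`combine`, `distort`, `envelope`) are made up for illustration.

```python
def combine(a, b, mode):
    """Bitwise-combine two unsigned 8-bit oscillator samples."""
    if mode == "and":
        return a & b
    if mode == "or":
        return a | b
    if mode == "xor":
        return a ^ b
    raise ValueError("unknown mode: %r" % mode)


def distort(sample, gain, wrap):
    """Boost a sample around mid-scale (128), then wrap or clip to 0..255.

    Wrapping keeps only the low 8 bits, so an overflowing peak folds
    back to the other end of the range; clipping flattens it instead.
    """
    boosted = int((sample - 128) * gain + 128)
    if wrap:
        return boosted & 0xFF
    return max(0, min(255, boosted))


def envelope(n, attack, release):
    """Two-stage envelope level (0.0..1.0) at sample n after the trigger.

    Assumed linear: ramp up over `attack` samples, down over `release`.
    """
    if n < attack:
        return n / attack
    if n < attack + release:
        return 1.0 - (n - attack) / release
    return 0.0
```

With a gain of 2, a sample of 200 clips to 255 but wraps to 16, which is where the harsher, buzzier character of the wrapping mode comes from.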

I had a switch left over, so it selects the direction the LEDs light up while Cricket is running. There's also an "in" jack that I might use in the future as a footswitch or external trigger input. (The MIDI jack isn't functional.)

I'm really pleased with the sound! It's digital and raw, but also organic and surprisingly varied. You can get kicks, snares, metallic plinks, noise bursts, bass sounds, and even vocal-like screams and yawps. I got what I wanted and then some... pretty good for 8-bit waveforms pumped out of a single pin of the dsPIC using 6-bit PWM. The video below is pure multitracked Cricket -- no processing except for a bit of autopanning:


Here's another video with a more detailed exploration of the features and sound:


Thursday, June 5, 2014

The magic of speech synthesis: linear predictive coding

Growing up in the '80s and '90s, I had a pretty decent idea how a lot of tech around me worked. Maybe I couldn't actually fix a TV with a blown tube or swap out a dead (soldered) CPU on a motherboard yet, but I knew how the big pieces fit together, what they were supposed to do, and what might happen if a given piece went kaput.

Speech synthesizers were not in that category.

When I first encountered a Speak 'N' Spell, it seemed like magic. The voice was so crude and inhuman it was obviously computer-generated (i.e., not recorded). It was halting and seemingly stitched together from scraps of speech, but I'd never even heard of phonemes, let alone a process by which a chip like the one I found inside could spit out words and phrases.

For a long time, I had an inordinate fascination with the SnS, the General Instrument SP0256-AL2, and the speech synthesis cartridges for the TI-99/4A and TRS-80. (Wasn't there a C64 speech cartridge too?) I never did find out much about how they worked, though, or get my hands on hardware to experiment.

Linear Predictive Coding: Speech Analysis, Synthesis, Compression

Fast-forward 20 years or so to DSP class... and it turns out that most of those devices, along with a healthy amount of speech synthesis today, are based on variants of the linear predictive coding (LPC) technique. For my class project, I worked up an LPC example in Matlab to peek under the hood.

LPC models the human vocal tract as a medium-order time-varying filter (typically 10th-order) excited by pitched and unpitched (noise) impulses created by the diaphragm and vocal cords. A speech sequence (e.g., a word) is created from a train of impulses filtered with changing filter coefficients and gain.
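
The synthesis half of that model fits in a few lines. This is a Python sketch rather than my Matlab project code, and it assumes the predictor is written as A(z) = 1 + a1 z^-1 + ... + ap z^-p so that the vocal-tract filter is gain / A(z). The name `synthesize_frame` and its parameters are hypothetical.

```python
import random


def synthesize_frame(coeffs, gain, n_samples, pitch_period=None,
                     history=None, rng=None):
    """Run one frame of excitation through an all-pole filter.

    s[n] = gain * e[n] - sum(coeffs[k-1] * s[n-k] for k = 1..p)

    pitch_period: samples between pulses for a voiced frame,
                  or None for an unvoiced (noise) frame.
    history:      the last p output samples, most recent first,
                  carried over from the previous frame.
    """
    rng = rng or random.Random(0)
    history = list(history) if history else [0.0] * len(coeffs)
    out = []
    for n in range(n_samples):
        if pitch_period:
            # Voiced: unit pulse once per pitch period (restarts each frame).
            e = 1.0 if n % pitch_period == 0 else 0.0
        else:
            # Unvoiced: white noise.
            e = rng.uniform(-1.0, 1.0)
        s = gain * e - sum(a * h for a, h in zip(coeffs, history))
        out.append(s)
        history = [s] + history[:-1]
    return out, history
```

Feeding the returned `history` into the next call keeps the filter state continuous across frames while the coefficients change. Restarting the pulse phase at every frame boundary is a simplification.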

LPC discretizes speech into overlapping frames of 10-20 ms, where the filter coefficients, gain, and impulse type and pitch are constant for a given frame.
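
As a sketch of the framing step in Python: the 8 kHz sample rate in the usage note and the 50% overlap default are assumptions for illustration, not values from my project, and `split_frames` is a hypothetical name.

```python
def split_frames(signal, sample_rate, frame_ms=20.0, overlap=0.5):
    """Cut a signal into fixed-length overlapping frames.

    Any trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]
```

At 8 kHz, a 20 ms frame is 160 samples and a 50% overlap gives a hop of 80 samples, so each sample in the middle of the signal lands in two frames.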

LPC is most commonly used as a compression scheme: speech is analyzed to estimate frame parameters, the frame parameters are transmitted using far fewer bits than the original speech, and the parameters are applied to a filter and impulse train in the receiver to synthesize output speech.
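
For the analysis side, one common textbook route is the autocorrelation method solved with the Levinson-Durbin recursion. Here's a Python sketch of that route. It is not necessarily what my Matlab code does, and the function names are made up. It uses the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns ([a1, ..., ap], residual_energy).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

As a sanity check, analyzing the impulse response of a one-pole filter with its pole at 0.5 should give back a1 close to -0.5 and a second coefficient close to zero. The residual energy is what a frame gain is usually derived from.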

The figure shows data from the whole process. From the top, there's the filtered input audio, the detected pitch period in samples for each frame, the resulting excitation signals (pulse trains in green, noise in blue) and gains, and the final synthesized output.


Basic LPC turned out to be easier and more interesting to implement than I expected... considering that I didn't write custom code for everything and that I did leave out quite a bit of work that would normally be required to tune up the sound quality, optimize computing time, and/or achieve compression specs. (Here's a great writeup on all the work that went into the Speak 'N' Spell.)

A few samples of the output:

It's pretty cool to be able to pull speech apart, in a sense, and put it back together any way you like. I'm interested in experimenting with my code to create interesting musical textures, including vocoding by replacing the impulse train with audio from a musical instrument.

Code is here!

Friday, January 24, 2014

Lego Segway with minimal-order observer control

Self-balancing Lego robots are nothing new, but everyone uses PID controllers. I wanted to implement an observer controller to do something new and flex my controls muscles. 

I built a Mindstorms robot that uses a light sensor to measure light reflected off the floor and thereby the robot's tilt. This turned out to be finicky since I had to set the zero point manually, and ambient light variations screwed things up fairly often. It worked well enough in the end though.

Controller Design
A full-order observer controller uses a model of the system in the control loop, which allows us to observe state information that would otherwise be hidden in the actual system. We can then use that state info in the feedback to reduce the error, which now incorporates both the system and model outputs. This can be a robust way to control high-dimensional systems while also being able to inspect the (estimated) states for useful insights.
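
To make the idea concrete, here's a toy full-order (Luenberger) observer in Python. It is not the robot's model: the plant is an assumed double integrator (position and velocity) with only position measured, and the gain L is hand-picked to put both observer poles at z = 0.5. All names and numbers are illustrative.

```python
DT = 0.1
A = [[1.0, DT], [0.0, 1.0]]   # state: [position, velocity]
B = [0.5 * DT * DT, DT]       # input: acceleration
C = [1.0, 0.0]                # only position is measured
L = [1.0, 2.5]                # puts eig(A - L*C) at 0.5, 0.5


def plant_step(x, u):
    """True system: x[k+1] = A x[k] + B u[k]."""
    return [sum(A[i][j] * x[j] for j in range(2)) + B[i] * u
            for i in range(2)]


def observer_step(x_hat, u, y):
    """Model copy plus correction: x_hat[k+1] = A x_hat + B u + L (y - C x_hat)."""
    innovation = y - sum(C[j] * x_hat[j] for j in range(2))
    return [sum(A[i][j] * x_hat[j] for j in range(2)) + B[i] * u
            + L[i] * innovation
            for i in range(2)]


def run_demo(steps=40):
    """Start the estimate at zero and let it converge on the true state."""
    x, x_hat = [1.0, -0.5], [0.0, 0.0]
    for _ in range(steps):
        u = 0.1
        y = sum(C[j] * x[j] for j in range(2))
        x_hat = observer_step(x_hat, u, y)
        x = plant_step(x, u)
    return x, x_hat
```

The estimation error follows e[k+1] = (A - L C) e[k], so with both poles at 0.5 the velocity estimate locks onto the hidden true velocity within a few dozen steps even though velocity is never measured.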

However, we may not actually need all the state information. A minimal-order observer (aka functional observer) still uses a model, but requires fewer poles to be chosen than a full-order observer. That simplifies design and eliminates the need to compute state-space transformation matrices.

The figure shows the minimal-order observer, with the controller elements labeled as psi 0 and psi 1. In the lower diagram, psi 0 is algebraically combined with the summation block to simplify coding. As noted, each psi function is a ratio of (simple) Z-domain transfer polynomials.
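
In code, a ratio of Z-domain polynomials turns into a difference equation with a short memory of past inputs and outputs. Below is a generic Python sketch of one such block. The class name `ZTransfer` and the coefficients in the usage note are placeholders, not the robot's actual psi functions or my RobotC code.

```python
class ZTransfer:
    """One block H(z) = (b0 + b1 z^-1 + ...) / (a0 + a1 z^-1 + ...).

    Each call to step() evaluates
        y[k] = (sum(b_i * u[k-i]) - sum(a_i * y[k-i], i >= 1)) / a0
    """

    def __init__(self, num, den):
        self.num = list(num)
        self.den = list(den)
        self.u_hist = [0.0] * len(self.num)        # u[k], u[k-1], ...
        self.y_hist = [0.0] * (len(self.den) - 1)  # y[k-1], y[k-2], ...

    def step(self, u):
        self.u_hist = [u] + self.u_hist[:-1]
        acc = sum(b * x for b, x in zip(self.num, self.u_hist))
        acc -= sum(a * y for a, y in zip(self.den[1:], self.y_hist))
        y = acc / self.den[0]
        if self.y_hist:
            self.y_hist = [y] + self.y_hist[:-1]
        return y
```

For example, `ZTransfer([0.5], [1.0, -0.5])` fed a constant input of 1.0 produces 0.5, 0.75, 0.875, and so on. Two blocks like this, wired together as in the diagram, would make up one pass of the control loop.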

Minimal-order diagram in Simulink. In the actual system, the real robot takes the place of the "Linearized Model".

I coded the observer controller in RobotC with the help of a couple of Matlab scripts to choose poles and calculate the coefficients of the transfer polynomials. I could have put more work into accurately modeling the robot (weighing it properly, etc.), but as you can see, it works well enough.

The video's a bit long, to show the balancing stability - skip to 1:30 to see me driving the robot with a joystick over Bluetooth. Driving could use some smoothing, but it's fun.

Code is here.