Thursday, June 5, 2014

The magic of speech synthesis: linear predictive coding

Growing up in the '80s and '90s, I had a pretty decent idea how a lot of tech around me worked. Maybe I couldn't actually fix a TV with a blown tube or swap out a dead (soldered) CPU on a motherboard yet, but I knew how the big pieces fit together, what they were supposed to do, and what might happen if a given piece went kaput.

Speech synthesizers were not in that category.

When I first encountered a Speak 'N' Spell, it seemed like magic. The voice was so crude and inhuman it was obviously computer-generated (i.e., not recorded). It was halting and seemingly stitched together from scraps of speech, but I'd never even heard of phonemes, let alone a process by which a chip like the one I found inside could spit out words and phrases.

For a long time, I had an inordinate fascination with the SnS, the General Instrument SP0256-AL2, and the speech synthesis cartridges for the TI-99/4A and TRS-80. (Wasn't there a C64 speech cartridge too?) I never did find out much about how they worked, though, or get my hands on hardware to experiment with.

Linear Predictive Coding: Speech Analysis, Synthesis, Compression

Fast-forward 20 years or so to DSP class... and it turns out that most of those devices, along with a healthy amount of speech synthesis today, are based on variants of the linear predictive coding (LPC) technique. For my class project, I worked up an LPC example in Matlab to peek under the hood.

LPC models the human vocal tract as a medium-order time-varying filter (typically 10th order) excited by either pitched impulses from the vocal cords or unpitched noise from turbulent airflow. A speech sequence (e.g., a word) is created by driving the filter with a train of these excitations while the filter coefficients and gain change over time.
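To make that concrete, here's a minimal Matlab sketch of the synthesis half: an impulse train at the pitch period driving an all-pole "vocal tract" filter. The formant frequencies and bandwidths are illustrative guesses, not values from my project, and it's only 6th order instead of the usual 10th.

```matlab
% Minimal sketch: a static vowel-ish tone from an impulse train driving
% an all-pole filter. Formants/bandwidths are illustrative guesses.
fs = 8000;                          % sample rate (Hz)
F  = [700 1200 2600];               % formant center frequencies (Hz)
B  = [130 70 160];                  % formant bandwidths (Hz)
r  = exp(-pi*B/fs);                 % pole radii from the bandwidths
w  = 2*pi*F/fs;                     % pole angles from the frequencies
a  = real(poly([r.*exp(1i*w), r.*exp(-1i*w)]));  % A(z): 6th-order all-pole
T0 = round(fs/100);                 % pitch period for a 100 Hz voice
e  = zeros(1, round(0.3*fs));       % 300 ms of excitation
e(1:T0:end) = 1;                    % glottal impulse train
s  = filter(1, a, e);               % the "vocal tract" filter
soundsc(s, fs);                     % listen: a buzzy, vowel-like tone
```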

LPC discretizes speech into overlapping frames of 10-20 ms; within a given frame, the filter coefficients, gain, excitation type, and pitch are held constant.

LPC is most commonly used as a compression scheme: speech is analyzed to estimate frame parameters, the frame parameters are transmitted using far fewer bits than the original speech, and the parameters are applied to a filter and impulse train in the receiver to synthesize output speech.
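Here's roughly what that round trip looks like in Matlab, stripped to the skeleton. It assumes x is a column vector of speech samples at rate fs, uses lpc() from the Signal Processing Toolbox, and skips the parts a real codec sweats over: no frame overlap, no quantization, no voiced/unvoiced decision, and a flat placeholder pitch.

```matlab
% Bare-bones LPC round trip: analyze each frame, keep only (coefficients,
% gain, pitch), and resynthesize. Assumes x is a column vector at rate fs.
p  = 10;                             % filter order
N  = round(0.02*fs);                 % 20 ms frames
T0 = round(fs/100);                  % placeholder: flat 100 Hz pitch
y  = zeros(size(x));
for k = 1:floor(length(x)/N)
    idx = (k-1)*N + (1:N);
    frm = x(idx) .* hamming(N);      % windowed frame
    a   = lpc(frm, p);               % estimate the all-pole coefficients
    res = filter(a, 1, frm);         % residual via the inverse filter
    exc = zeros(N, 1);
    exc(1:T0:N) = 1;                 % impulse-train excitation
    exc = exc * norm(res)/norm(exc); % gain: match the residual's energy
    y(idx) = filter(1, a, exc);      % resynthesize the frame
end
soundsc(y, fs);
```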

The figure shows data from the whole process. From the top, there's the filtered input audio, the detected pitch period in samples for each frame, the resulting excitation signals (pulse trains in green, noise in blue) and gains, and the final synthesized output.

 
Conclusions

Basic LPC turned out to be easier and more interesting to implement than I expected... considering that I didn't write custom code for everything and that I did leave out quite a bit of work that would normally be required to tune up the sound quality, optimize computing time, and/or achieve compression specs. (Here's a great writeup on all the work that went into the Speak 'N' Spell.)

A few samples of the output:


video

It's pretty cool to be able to pull speech apart, in a sense, and put it back together any way you like. I'm keen to experiment with my code to create interesting musical textures, including vocoding by replacing the impulse train with audio from a musical instrument.

Code is here!

Friday, January 24, 2014

Lego Segway with minimal-order observer control

Self-balancing Lego robots are nothing new, but everyone uses PID controllers. I wanted to implement an observer controller to do something new and flex my controls muscles. 

I built a Mindstorms robot that uses a light sensor to measure light reflected off the floor, and thereby estimate the robot's tilt. This turned out to be finicky: I had to set the zero point manually, and ambient light variations screwed things up fairly often. It worked well enough in the end, though.


Controller Design
A full-order observer controller runs a model of the system inside the control loop, which lets us estimate state information that would otherwise be hidden in the actual system. The estimated states feed the controller, and the estimate itself is continually corrected using the error between the real system's output and the model's output. This can be a robust way to control high-dimensional systems while also letting us inspect the (estimated) states for useful insights.
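If you haven't seen one, the whole full-order (Luenberger) observer idea fits in a few lines of Matlab. The two-state plant and pole locations below are made up for illustration, and place() is from the Control System Toolbox:

```matlab
% Toy discrete-time full-order observer + state feedback.
% The 2-state plant and pole choices are invented for illustration.
A = [1 0.01; 0.5 1];  B = [0; 0.01];  C = [1 0];
K = place(A, B, [0.90 0.85]);        % state-feedback poles
L = place(A', C', [0.50 0.40])';     % observer poles (faster than the loop)
x    = [0.1; 0];                     % true state (simulated plant)
xhat = [0; 0];                       % the observer's estimate
for k = 1:500
    u    = -K*xhat;                  % feed back the *estimated* state
    y    = C*x;                      % measured output of the real system
    xhat = A*xhat + B*u + L*(y - C*xhat);  % model + output-error correction
    x    = A*x + B*u;                % plant update (simulation stand-in)
end
```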

However, we may not actually need all the state information. A minimal-order observer (aka a functional observer) still uses a model, but requires choosing fewer poles than a full-order design. That simplifies the design and eliminates the need to compute state-space transformation matrices.

The figure shows the minimal-order observer, with the controller elements labeled as psi 0 and psi 1. In the lower diagram, psi 0 is algebraically combined with the summation block to simplify coding. As noted, each psi function is a ratio of (simple) Z-domain transfer polynomials.

Minimal-order diagram in Simulink. In the actual system, the real robot takes the place of the "Linearized Model".
Results
I coded the observer controller in RobotC with the help of a couple of Matlab scripts to choose poles and calculate the coefficients of the transfer polynomials. I could have put more work into accurately modeling the robot (weighing it properly, etc.), but as you can see, it works well enough.
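In case it's not obvious how a Z-domain block like psi 1 turns into robot code: each one is just a difference equation run once per control tick. Here's the pattern, written in Matlab for consistency with the rest of this post; the coefficients are placeholders (the real ones come from the pole-placement scripts), and read_error()/set_motor() are hypothetical stand-ins for the RobotC sensor and motor calls.

```matlab
% One first-order block, psi(z) = (b0 + b1*z^-1)/(1 + a1*z^-1), as a
% per-tick difference equation. Coefficients are placeholders, and
% read_error()/set_motor() are hypothetical I/O stand-ins.
b0 = 2.0;  b1 = -1.8;  a1 = -0.5;
e_prev = 0;  u_prev = 0;
while true
    e = read_error();                  % e[k]: error measured this tick
    u = b0*e + b1*e_prev - a1*u_prev;  % u[k] = b0 e[k] + b1 e[k-1] - a1 u[k-1]
    set_motor(u);
    e_prev = e;  u_prev = u;           % carry state to the next tick
    pause(0.01);                       % ~100 Hz control loop
end
```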

video

The video's a bit long so you can see the balancing stability - skip to 1:30 to see me driving the robot with a joystick over Bluetooth. The driving could use some smoothing, but it's fun.

Code is here.
            

Sunday, June 16, 2013

Tunes are go!

Holy Roland, I can't believe it's taken me until 2013 to move my music hosting off MySpace! The only thing more embarrassing is that people have invested money in MySpace in the interim... good luck Justin.

Anyway, I've got three albums up: Singlestar (the latest) along with collections of tracks for both film music and older stuff. The site is here, but I've also embedded players below. Enjoy!

Krylenko (Bandcamp)

Singlestar


Composed - Music for Film


Collected 1999-2009

Wednesday, June 12, 2013

Remixing a mixer

Is there a recording musician who hasn't owned a Behringer mixer? They're cheap as chips and do what they say on the tin.

I'm surprised my current model is only the second I've owned in 15 years of mucking about with music gear. It's a tiny thing, but just about perfect for the space I have and inputs I need. That said, it didn't come with an aux send. Those are super-useful, especially with my new spring reverb, so I decided to add one.

A bit of parts diving, soldering, and gluing later, and I've got a mono-out, stereo-return aux bus. I had to scrap the tape I/O, but I don't think I'll miss it. Here's a pic of this truly classic Junkbox Raider mod:

I could have made it uglier, but I ran out of time.

 

Monday, January 28, 2013

I broke a what?!?!

I've busted a lot of stuff over the years - mostly the poorly constructed and therefore delicate projects I'd built, but also plenty of electronic components, hardware, circuit boards, etc. I've even broken and bent a few small tools.

But until yesterday I'd never snapped off half a pair of needlenose pliers so cleanly it looked like they'd been sawed apart. How's that even possible? (Sure, my finger strength is unparalleled, but I wield it gently. :)

I'd post a picture, but I can't be bothered to dig 'em out from under the pile of Robosapien discards clogging up the trash. Trust this random guy on the Internet, though - it really happened.

Maximum information, minimum post

I've been planning for a while to write up some research I worked on in 2011 involving intrinsic "motivation" for robots. We got a workshop paper out of it, and I presented the results to the ECE department last year. I also planned to extend it into my thesis project.

But... the lab went through some advisor round-robin and the project fell apart, and I just don't feel like writing it up into a full post anymore.

In a nutshell, our robot learned a policy for a partially observable Markov decision process (POMDP): it explored objects in a space by manipulating them with its arm and assigning classification probabilities to each object, with the Shannon information gain across all objects as the learning reward.
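To give a feel for the reward, here's a toy Matlab computation of the information gained about a single object from one action. The belief vectors are invented, not data from the project, and the actual reward summed this gain across all objects.

```matlab
% Toy reward: Shannon information gained about one object's class.
% Belief vectors are invented for illustration; the project summed this
% quantity across all objects in the scene.
H = @(p) -sum(p(p>0) .* log2(p(p>0)));  % Shannon entropy in bits
b_before = [0.25 0.25 0.25 0.25];       % uniform belief over 4 classes
b_after  = [0.70 0.10 0.10 0.10];       % belief after poking the object
gain = H(b_before) - H(b_after)         % ~0.64 bits gained
```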

Here's the AAAI workshop abstract, with a link to the full PDF:
http://www.aaai.org/ocs/index.php/WS/AAAIW11/paper/view/3960

Here's a fun picture of the robot!

Sunday, December 18, 2011

Cricket, the toy robot that never was

I decided earlier this year to build a robot as a gift for a young relative. I've always found Braitenberg vehicles interesting and wanted to create a mobile robot with simple sensors and the ability to switch among several Braitenberg-type "personalities" (light-following, sound-avoiding, etc.).
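For the unfamiliar, each "personality" is just a different wiring of sensors to motors. A toy Matlab sketch of the classic ones (values and names are illustrative, not Cricket's actual firmware):

```matlab
% Classic Braitenberg wirings: two sensors map to two motor speeds [L R].
% sL/sR are normalized stimulus readings in 0..1; all values illustrative.
sL = 0.8;  sR = 0.2;          % e.g., a light source off to the left
coward    = [sL, sR];         % uncrossed, excitatory: steers away
aggressor = [sR, sL];         % crossed, excitatory: charges the source
lover     = [1-sL, 1-sR];     % uncrossed, inhibitory: slows as it closes in
fprintf('aggressor motors [L R] = [%.1f %.1f]\n', aggressor);
```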

Thus Cricket was born.

Cricket with, well, some things working.

Turns out I underestimated the chaos of the target environment, with multiple even younger siblings running around. Only a totally bombproof gift would work - which Cricket is not.

That, plus some irritating bugs I don't feel like ironing out, means Cricket is now abandonware. But not forgotten!


Testing the light pods.

Full details and more pics after the jump.