This tutorial shows you how to incorporate basic gesture recognition into your Arduino projects using the ESP system. For example, you might recognize different tennis gestures like a forehand, backhand, and serve; elements of a dance routine; weight-lifting gestures; etc. The gestures are sensed using an accelerometer and sent to the ESP application running on your computer. ESP uses a simple machine learning algorithm to match the live accelerometer data to recorded examples of different gestures, sending a message back to the Arduino when it recognizes a gesture similar to one of the examples. The system only recognizes individual occurrences of discrete gestures; it doesn't provide information about how the gesture is performed. Still, it can be used for a wide range of interactive applications.
Download the ESP gesture recognition application:
- Windows: ESP-Gestures-Win-20161028.zip. You may also need the Visual C++ Redistributable (Update 3) from Microsoft.
- Example Gestures: ForehandBackhandServe.grt
If you're using the Arduino 101, which has a built-in accelerometer, you can skip this step. Otherwise, you'll need to connect your accelerometer. To do this, first solder male header pins onto the breakout board, if you haven't done so already. Then you'll wire up the accelerometer to the Arduino.
As a shortcut, you can plug the accelerometer breakout directly into the analog input pins of the Arduino Uno (or other Arduino with the same form factor). Then in the Arduino code, you can configure the appropriate pins to provide power and ground to the accelerometer.
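A sketch fragment along these lines can supply the breakout. The pin assignments here are hypothetical (match them to your breakout's actual pinout), and the small block of Arduino stand-ins at the top exists only so the fragment can be checked off-board; on a real Arduino, delete it and the core provides those functions.

```cpp
#include <map>

// --- Stand-ins for the Arduino core, for checking this fragment off-board.
// --- On a real Arduino, remove this section; the core defines all of it.
enum PinLevel { LOW = 0, HIGH = 1 };
enum PinDir   { INPUT = 0, OUTPUT = 1 };
const int A2 = 16, A3 = 17;
std::map<int, int> pinDirs, pinLevels;
void pinMode(int pin, int dir)        { pinDirs[pin] = dir; }
void digitalWrite(int pin, int level) { pinLevels[pin] = level; }

// Hypothetical assignments: the breakout is plugged straight into the
// analog header so that its GND pin lands on A2 and its VCC pin on A3.
const int groundPin = A2;
const int powerPin  = A3;

void setup() {
  pinMode(groundPin, OUTPUT);
  digitalWrite(groundPin, LOW);   // this pin now sinks current, acting as ground
  pinMode(powerPin, OUTPUT);
  digitalWrite(powerPin, HIGH);   // this pin now sources current, acting as supply
}
```

The trick works because an output pin driven HIGH or LOW can source or sink the few milliamps a typical analog accelerometer draws.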
Alternatively, you can plug your accelerometer into a breadboard and wire it to the Arduino, connecting its power and ground pins to the 5V and GND pins of the Arduino, and its X-, Y-, and Z-axis pins to three analog inputs of the Arduino board.
You can use one of the Arduino programs below to read data from the accelerometer and send it over serial (USB) to the computer. First, check that the pins specified in the Arduino program match the way you've wired up your accelerometer (e.g. that xpin corresponds to the analog input pin that's connected to the X-axis pin of your accelerometer). (This doesn't apply if you're using the Arduino 101, where the accelerometer is connected internally.) Then select the appropriate board and serial port from the Arduino tools menu and upload the Arduino sketch.
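Whichever sketch you use, the wire format it produces is simple: one line of text per reading, with the three axis values separated by whitespace. Here that formatting step is pulled out as a plain C++ function so it's easy to see and check; the tab separator and the function name are assumptions, not ESP's actual code — on the Arduino itself this would be a series of Serial.print calls.

```cpp
#include <string>

// Formats one accelerometer reading as a single tab-separated line,
// mirroring what the Arduino sketch sends over serial, e.g.:
//   Serial.print(x); Serial.print('\t');
//   Serial.print(y); Serial.print('\t');
//   Serial.println(z);
std::string formatReading(int x, int y, int z) {
    return std::to_string(x) + "\t" +
           std::to_string(y) + "\t" +
           std::to_string(z);
}
```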
Open the Arduino serial monitor, set it to 9600 baud, and check that you're getting accelerometer data from your Arduino. You should see three columns of numbers that change when you move the accelerometer. Use the Arduino serial plotter to see a graph of these numbers.
Be sure to close the serial monitor and serial plotter before continuing, as otherwise they'll block the ESP application from talking to your Arduino.
Select the serial port corresponding to your Arduino board from the configuration menu. (Click the "Select a serial port" heading to open the list of ports.)
You should see live data streaming in on the "Raw Data" plot. The three lines on the plot correspond to the three axes of your accelerometer: the red line corresponds to the X-axis, green to the Y-axis, and blue to the Z-axis.
To allow the ESP application to understand what range of values to expect from your accelerometer and Arduino, you'll need to record a calibration sample. Place your accelerometer on a flat surface, with the Z-axis facing upwards. Press and hold the "1" key for a second or so to record the "Upright" calibration sample. You should see a plot of the sample appear. Then, flip your accelerometer upside down and, keeping it flat and still, hold the "2" key to record the upside-down calibration sample. The ESP system uses this data to figure out which numerical values correspond to 0g of acceleration (on the X- and Y-axes) and which correspond to 1g of acceleration (on the Z-axis).
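The arithmetic behind this calibration can be sketched as follows. With the board flat, the Z-axis reads +1g; flipped over, it reads -1g; so the midpoint of the two raw readings is the 0g value and half their difference is the scale per g. The helper names and the raw ADC values in the comments are hypothetical, not ESP's actual code.

```cpp
// Recovers the 0g offset and per-g scale from the two calibration
// samples, then converts raw readings into units of g. For example,
// a 3.3V analog accelerometer read by a 5V Uno's 10-bit ADC might
// give raw Z values of roughly 401 (flat) and 267 (flipped).
struct Calibration {
    double zeroG;  // raw reading corresponding to 0g
    double perG;   // raw counts per 1g of acceleration
};

Calibration calibrate(double uprightRaw, double upsideDownRaw) {
    double zeroG = (uprightRaw + upsideDownRaw) / 2.0;  // midpoint = 0g
    double perG  = (uprightRaw - upsideDownRaw) / 2.0;  // half the spread = 1g
    return { zeroG, perG };
}

double toG(double raw, const Calibration& c) {
    return (raw - c.zeroG) / c.perG;
}
```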
Proceed to the "Training" tab of the ESP application by clicking on it or by typing a capital "T". This tab allows you to record or load examples of the gestures you want the ESP system to recognize. You can record examples of up to nine different gestures.
To record a gesture example, make the gesture while pressing the key corresponding to the label you want to associate with the gesture. For instance, to record a gesture with label 1, hold the "1" key on your keyboard while making the gesture. (Alternatively, you can load the example tennis gestures in ForehandBackhandServe.grt.)
Be sure to record the example gestures with the accelerometer in the same configuration as it will be later, when you want the system to recognize the gestures. For instance, you might hold the accelerometer in your hand with a particular orientation, or attach it to an object that you'll hold with a particular orientation.
A good sample contains the data corresponding to the whole gesture, but without much additional baseline data at either the start or the end. That is, the sample should start and end with a short period of relatively flat lines, neither too long nor missing altogether. In addition, if your gesture ends in a different place than it starts, be sure not to record the time when you're bringing the accelerometer back to the starting position. For instance, if you were recording a swipe right gesture, you'd want to record only the part of the gesture when your hand is moving from left to right, not the time when your hand is returning to its initial position.
Each additional example you record is another sample that the machine learning algorithm can match against when it's recognizing gestures. That means that if you want the system to recognize different variations of a gesture (e.g. the different ways in which it is made by different people), it may help to record samples of each variation. On the other hand, if you have bad samples, they may confuse the system; more samples isn't necessarily better. In general, we've had good luck recording somewhere around 5 to 10 samples for each gesture, although again, the quality of the individual samples is more important than their quantity.
If you don't like a sample (e.g. because you pressed the key at the wrong time and missed the data corresponding to part of the gesture), you can delete it by clicking in the box next to the word "delete" below the sample. You can trim a sample by clicking and dragging on the plot of the sample to select the part of the sample you want to keep, then clicking the box labelled "trim". You can navigate between the different samples in a class by clicking the arrow icons beneath the plot for the sample. If you recorded a sample in the wrong class, you can move it by clicking the "re-label" button and then pressing the key corresponding to the label to which you want to assign the sample. To name a gesture, click the "rename" button, type the name and press enter.
Once you've recorded a few example gestures, you can train the ESP system to recognize those gestures from your examples. Press the "t" key on your keyboard to train the system. You should see the message "training successful" appear at the bottom of the window. Now, when you make a gesture similar to one of your recorded examples, you should see its name appear on the plot of live sensor data.
The system may not work well the first time you train it. It's helpful to train and test the system often as you record your example gestures, so you can get a sense of how it's behaving.
In particular, if the system isn't recognizing gestures you think it should, you may want to record additional examples of that gesture. If the system is recognizing gestures when it shouldn't, you may want to delete or trim examples that look different than the others, or that contain long periods of relatively flat lines. Be sure to press "t" after you modify your examples to retrain the system.
While modifying your training examples is probably the most important means of helping the system perform correctly, ESP also allows you to configure some underlying system parameters. To do this, click the "click to open configuration" label. You should see two parameters: variability and timeout. Note that after changing these parameters, you need to retrain the system by pressing "t".
The variability parameter controls how different a gesture can be from one of the recorded examples and still be recognized. The higher the number, the more different it can be. If the system seems to require your gestures to be overly similar to your recorded examples, you can try increasing this number. (You can also try recording additional examples.) If the system recognizes spurious gestures, you can try lowering this number, although you might also try deleting any bad-seeming training examples.
The timeout parameter controls how long after recognizing a gesture the system waits before recognizing a new one. It's measured in milliseconds (thousandths of a second). If the system seems to be missing gestures made in close succession to another gesture, try lowering this number. Be careful, though, because if you make this number too low, the system may recognize a single gesture multiple times. If the system seems to be recognizing multiple gestures when you only make one gesture (e.g. if it sees a forehand followed by a backhand when you only made a forehand), you might try increasing this parameter.
When the ESP system makes a prediction, it sends a message to your Arduino with the number of the gesture that it recognized (as ASCII text followed by a newline, e.g. "1\n"). By reading these predictions on the serial port, you can make your Arduino respond to the gesture in various ways.
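On the Arduino side, you might collect each incoming line with Serial.readStringUntil('\n') and then parse out the gesture number. Here's a hedged sketch of that parsing step as a plain C++ function; parsePrediction is a made-up name for illustration, not part of ESP or the Arduino API.

```cpp
#include <string>
#include <cstdlib>

// Parses one prediction message from ESP: the gesture number as ASCII
// text, e.g. "1\n". Returns the gesture number (1-9, matching the nine
// possible gesture labels), or -1 if the line isn't a valid prediction.
int parsePrediction(const std::string& line) {
    if (line.empty()) return -1;
    char* end = nullptr;
    long value = std::strtol(line.c_str(), &end, 10);
    if (end == line.c_str() || value < 1 || value > 9) return -1;
    return static_cast<int>(value);
}
```

Your sketch's loop() could then switch on the returned number to light LEDs, play sounds, or otherwise respond to each gesture.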
The predictions are also sent over TCP to a server listening on localhost port 5204 (in the same format as to the Arduino). The server could be, for example, a game written in Processing or some other program. Make sure the TCP server is running before starting the ESP application.
This particular ESP application uses an algorithm called Dynamic Time Warping (or DTW). This algorithm warps the live sensor signal in time, allowing individual readings to be skipped or repeated, to find the alignment that makes it most similar to each recorded sample. The algorithm looks for the training sample that's closest to the live sensor data. If the difference between the two is less than a certain threshold, it considers it a match and outputs a prediction corresponding to the training class containing that sample. You can tune the distance required using the "variability" parameter in the configuration drop-down menu.
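For intuition, here's a minimal, textbook implementation of the DTW distance between two one-dimensional series. ESP's actual implementation comes from GRT and handles multi-axis sensor data; this sketch only shows the core dynamic-programming idea.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>
#include <limits>

// Classic DTW distance: d[i][j] is the cost of the best alignment of
// the first i readings of a with the first j readings of b, where each
// step may advance either series or both (i.e. repeat or skip readings).
double dtwDistance(const std::vector<double>& a, const std::vector<double>& b) {
    size_t n = a.size(), m = b.size();
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> d(n + 1, std::vector<double>(m + 1, INF));
    d[0][0] = 0.0;
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            double cost = std::fabs(a[i - 1] - b[j - 1]);
            d[i][j] = cost + std::min({d[i - 1][j],       // skip a reading of b
                                       d[i][j - 1],       // skip a reading of a
                                       d[i - 1][j - 1]}); // advance both
        }
    }
    return d[n][m];
}
```

Note that a series with a duplicated reading, such as {1, 2, 2, 3} versus {1, 2, 3}, still has distance zero: the warping matches the repeated reading at no extra cost. That's what makes DTW tolerant of gestures performed faster or slower than the recorded examples.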
Gesture recognition is only one domain to which the ESP system can be applied. ESP is built on top of the Gesture Recognition Toolkit (GRT), which, despite its name, actually contains a wide range of machine learning algorithms that can be applied to many real-time sensing applications. ESP takes code for a particular application and translates it into a customized user interface for working with that machine learning pipeline. These application-specific programs include a GRT machine learning pipeline, a specification of the sensor inputs, a definition of the calibration process, and specifications for the tunable parameters. We've built ESP examples for color sensing, pose detection using accelerometers, and simple audio recognition. See the ESP GitHub for more information.