Saturday, August 15, 2015

How to build a mobile robot with less than 200 €

Many of you already know the revolution that the Raspberry Pi has brought us in recent years, allowing everyone to develop low-cost pervasive computing applications. In this post I would like to show you how to build a mobile robot for less than 200 €, presenting the personal project I'm working on, called ARPA (Autonomous Raspberry Pi Agent). You can fork the repo here: https://github.com/cecchisandrone/raspberrypi.

Demo video



The main features this robot has are:
  • RaspberryPi powered :)
  • Four wheels chassis
  • Manual control through Xbox 360 wireless controller
  • Simple obstacle avoiding using sonar sensors and magnetometer
  • Front camera with pan-tilt, led lights, face tracking and remote web interface
  • Microphone and active speaker to allow both voice recognition and TTS functionalities

Parts list

I bought the majority of the robot parts from Amazon and eBay, spending a total of more or less 200 €. This is the complete list:

  • Chassis: 26.10 €
  • Raspberry Pi B+: 34.99 €
  • Sonar: 1.78 €
  • Pan-tilt camera mount: 2.99 €
  • Servo: 6.00 €
  • WiFi USB dongle: 7.90 €
  • PiCamera: 26.19 €
  • Motor driver: 4.48 €
  • MicroSD: 4.50 €
  • Audio cable: 1.88 €
  • Breakout kit: 8.00 €
  • Speaker: 5.00 €
  • USB microphone: 1.50 €
  • Magnetometer: 1.41 €
  • DSI cable: 7.00 €
  • Mount: 6.00 €
  • USB battery: 15.00 €
  • Various cables & components: 5.00 €

Total: 165.72 €




Implementation details

My background is mainly in Java and Web technologies, so at first I wanted to reuse that knowledge to get something working fast. And indeed that's the case: most of the code is written in Java, with a small part for face tracking written in Python (because the OpenCV Python bindings perform better than the Java ones). I won't describe all my design and implementation decisions in detail (you can always have a look at the code :P), but I will comment on the main choices I made to implement the robot.

Manual control

When I received the motor driver and the chassis, the first thing I wanted to try was to implement an RC car powered by the RPi. I had an Xbox 360 wireless controller for that purpose, and after a quick test I saw that it's really well supported on Raspbian thanks to the xboxdrv userspace driver. Once the driver is correctly loaded, I read the device file /dev/input/js0 in Java and abstracted the necessary API to manage buttons, analog sticks, etc. Basically, for every new event (e.g. a button press) a fixed-size binary record with the new input value is written to the device file; see https://github.com/cecchisandrone/raspberrypi/blob/master/raspio/src/main/java/com/github/cecchisandrone/raspio/input/JoypadController.java for more details about this.
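The project reads the joystick in Java, but the kernel's joystick interface is language-agnostic: each event is an 8-byte record (timestamp, value, type, number). A minimal Python sketch of the same idea, for illustration:

```python
import struct

# Linux joystick API: each event is a fixed 8-byte record,
# struct js_event { __u32 time; __s16 value; __u8 type; __u8 number; }
JS_EVENT_FORMAT = "<IhBB"
JS_EVENT_SIZE = struct.calcsize(JS_EVENT_FORMAT)  # 8 bytes
JS_EVENT_BUTTON = 0x01
JS_EVENT_AXIS = 0x02

def parse_event(raw):
    """Decode one 8-byte joystick event into a small dict."""
    time_ms, value, ev_type, number = struct.unpack(JS_EVENT_FORMAT, raw)
    kind = "button" if ev_type & JS_EVENT_BUTTON else "axis"
    return {"time": time_ms, "value": value, "kind": kind, "number": number}

def read_events(device="/dev/input/js0"):
    """Yield events from the joystick device file (blocks between events)."""
    with open(device, "rb") as f:
        while True:
            raw = f.read(JS_EVENT_SIZE)
            if len(raw) < JS_EVENT_SIZE:
                return
            yield parse_event(raw)
```

On the robot you would loop over `read_events()` and dispatch axis events to the motors, button events to camera or light toggles.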
Regarding the motor interface, I used Pi4J with PWM; you can see the details here. It was quite easy to implement thanks to the library. After this part, ARPA was capable of being remotely controlled; yes, it's not yet a robot, but this can be very useful when its autonomous decisions are not good :)
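The bridge between the two parts is a small mapping from a signed joystick axis value to a motor direction and PWM duty cycle. A sketch of that mapping (the dead zone and scaling are my assumptions, not taken from the project code):

```python
def axis_to_motor(value, max_axis=32767, max_duty=100):
    """Map a signed joystick axis value (-32767..32767) to a
    (direction, duty%) pair for the motor driver.
    direction: +1 forward, -1 backward, 0 stop.
    A ~5% dead zone around the center avoids drift from a
    slightly off-center stick."""
    dead_zone = 0.05 * max_axis
    if abs(value) < dead_zone:
        return 0, 0
    direction = 1 if value > 0 else -1
    duty = round(abs(value) / max_axis * max_duty)
    return direction, min(duty, max_duty)
```

The direction would select which driver input pins are raised, while the duty cycle feeds the PWM output.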

Camera pan-tilt control

In order to implement face tracking properly, I wanted to install a pan-tilt mount for the PiCamera, moved by a couple of servo motors. These motors are driven with PWM, and even though they are really cheap, they are precise enough for this kind of application. I preferred ServoBlaster here because of its better performance with software-managed PWM (the Raspberry Pi has one hardware PWM channel, but using it can conflict with the 3.5 mm audio output). ServoBlaster is started at boot (from /etc/rc.local) and I send values to the motors with string commands like echo 0=50% > /dev/servoblaster.
I have wrapped also this device control with Java, you can check the code here.

The very cheap servo motor I used

Face tracking

When I started to think about this feature, I immediately remembered a library used at university during some Computer Vision experiments: OpenCV. It is a C++ library, but it also has good ports for Python and Java. I decided to use the Python implementation mainly for its better performance; doing CV on a Raspberry Pi is not trivial, mainly because of its low clock frequency and limited memory. For the tracking algorithm, I took inspiration from many examples found on the web, based on Haar cascade classifiers. You can have a look at the Python code here.
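The part of the loop that turns a detected bounding box into servo movement is a simple proportional controller: the further the face center is from the frame center, the bigger the correction step. A sketch of that logic (OpenCV's `detectMultiScale` with a frontal-face Haar cascade would supply the `(x, y, w, h)` boxes; the gain and dead zone below are my assumptions):

```python
def pan_tilt_correction(face_box, frame_size=(320, 240), gain=0.05, dead_px=15):
    """Given a face bounding box (x, y, w, h) and the frame size,
    return (pan_step, tilt_step) in degrees that would re-center the
    face. A small pixel dead zone stops the servos from jittering
    when the face is already roughly centered."""
    x, y, w, h = face_box
    cx, cy = x + w / 2, y + h / 2          # face center
    err_x = cx - frame_size[0] / 2         # + means face is to the right
    err_y = cy - frame_size[1] / 2         # + means face is lower
    pan = -gain * err_x if abs(err_x) > dead_px else 0
    tilt = gain * err_y if abs(err_y) > dead_px else 0
    return pan, tilt
```

Each step would then be added to the current servo angles and sent to ServoBlaster, so the camera drifts toward the face frame by frame.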

ARPA face tracking web interface
As a plus, I've implemented a web interface with Flask and jQuery to show the face-tracking bounding box and to toggle tracking on and off (to save some CPU power when it's not needed). It is also useful as a remote camera-streaming interface to watch what the robot is seeing :)
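Browser-viewable camera streaming of this kind is usually done with an MJPEG stream: a `multipart/x-mixed-replace` HTTP response where each part is one JPEG frame. The framing itself is simple to build; a sketch of the per-frame wrapper (the boundary name is arbitrary and must match the one declared in the response's `Content-Type` header):

```python
BOUNDARY = b"frame"

def mjpeg_part(jpeg_bytes):
    """Wrap one JPEG frame as a part of a multipart/x-mixed-replace
    stream, the format browsers render as a live MJPEG feed."""
    return (b"--" + BOUNDARY + b"\r\n"
            b"Content-Type: image/jpeg\r\n"
            b"Content-Length: " + str(len(jpeg_bytes)).encode() + b"\r\n\r\n"
            + jpeg_bytes + b"\r\n")
```

In a Flask route you would yield `mjpeg_part(frame)` for each captured frame inside a `Response` with `mimetype="multipart/x-mixed-replace; boundary=frame"`.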

Autonomous navigation

When you talk about mobile robots, the first need is to make them move autonomously. Obviously it's a tricky task, since it depends a lot on the sensors and hardware you have. The chassis I bought is not that big, and I quickly realized that the weight of all the devices, batteries and sensors is too much for it. Also, the encoders on this type of wheels are not precise enough to build a navigation map. So I chose to rely mainly on two sensors: the HC-SR04 sonar and the HMC5883L magnetometer (connected to the I2C pins), the first to measure the distance in centimeters from the head of the robot to obstacles, and the second to get its heading in degrees. The algorithm I implemented is pretty straightforward: it's just an obstacle-avoidance algorithm that picks a random angle when it finds an obstacle; really dumb, but enough for my tests. You can find the code for both here.
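The two sensor readings and the random-angle policy boil down to a few lines of arithmetic. A sketch (the 30 cm threshold and the 90-270 degree turn range are my assumptions; the HC-SR04 conversion and the heading wrap-around are standard):

```python
import random

SPEED_OF_SOUND_CM_S = 34300  # at roughly 20 degrees C

def echo_to_cm(echo_seconds):
    """HC-SR04: the echo pulse width covers the out-and-back path,
    so halve the distance the sound traveled."""
    return echo_seconds * SPEED_OF_SOUND_CM_S / 2

def heading_error(current, target):
    """Smallest signed difference between two compass headings, in
    degrees, handling the 359 -> 0 wrap-around."""
    return (target - current + 180) % 360 - 180

def next_heading(current, distance_cm, threshold_cm=30):
    """Random-angle obstacle avoidance: keep the current heading while
    the path is clear, otherwise turn away by a random amount."""
    if distance_cm > threshold_cm:
        return current
    return (current + random.uniform(90, 270)) % 360
```

The control loop would read the sonar, call `next_heading`, and then turn the wheels until `heading_error` between the magnetometer reading and the target falls inside a small tolerance.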

VTT and TTS engines

You can't consider a robot complete if it doesn't interact with humans in their natural language. Implementing VTT (voice-to-text) and TTS (text-to-speech) algorithms from scratch is really complex, and there are decades of research behind them. So, first, I wanted to find an affordable solution that could fit the performance available on the RPi.

VTT
I first tested the Google Voice engine, which is really good but whose free tier allows no more than 50 requests per day. Then I discovered http://wit.ai, a startup recently acquired by Facebook that provides a rich API for voice recognition. In fact it doesn't just transform your voice into text: it builds an abstraction layer on top. You can link every sentence to an intent (for example 'Turn off the light') and attach intent attributes to it (in this case 'off' is the parameter). If you say 'Switch off the light', that expresses the same intent and can be matched with the previous one. This learning process is managed through the Wit console, where you have a log of all the recordings and can associate them with intents and extract variables. Every recognition result comes with a confidence factor that helps you decide whether the sentence has been recognized correctly. The good thing about Wit is that, thanks to this learning process, you don't have to map every different sentence to an intent yourself: the platform does it for you. It's really impressive.
Once you have trained the platform a bit, you can query it through a REST API, sending a recording and getting back the recognition result as JSON. Simple, right?
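Concretely, the query is an HTTP POST of the raw WAV audio with a bearer token. A minimal sketch with `urllib` (the `/speech` endpoint and `audio/wav` content type match the Wit HTTP API as it worked at the time; the token is a placeholder, and the API has been versioned since):

```python
import urllib.request

WIT_TOKEN = "YOUR_WIT_ACCESS_TOKEN"  # placeholder, from the Wit console

def build_wit_request(wav_bytes, token=WIT_TOKEN):
    """Build the HTTP request that ships a WAV recording to Wit's
    speech endpoint; the JSON response carries the recognized text,
    the matched intent and a confidence factor."""
    return urllib.request.Request(
        "https://api.wit.ai/speech",
        data=wav_bytes,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
```

Sending it is then `json.load(urllib.request.urlopen(build_wit_request(audio)))`, after which you branch on the intent and its confidence.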

TTS
There are a lot of TTS implementations, both online and offline; you can find a list here. I tested Google, eSpeak and PicoTTS, and I preferred the last one since it works offline, the voice is not so metallic, and overall its performance on the RPi seemed better. The engine takes the sentence and the language as parameters (it supports the six major languages) and writes a .WAV file containing the synthesized voice.
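Since PicoTTS ships as the `pico2wave` command-line tool (`-l` selects the language variant, `-w` the output WAV file), driving it from code is just a subprocess call. A sketch, assuming `pico2wave` and `aplay` are installed on the Pi:

```python
import subprocess

def pico_tts_command(text, lang="en-US", wav_path="/tmp/tts.wav"):
    """Build the pico2wave invocation: -l picks the language variant,
    -w names the output WAV file, the text comes last."""
    return ["pico2wave", "-l", lang, "-w", wav_path, text]

def speak(text, wav_path="/tmp/tts.wav"):
    """Synthesize the sentence and play it through the active speaker
    (aplay assumed as the player on the Pi's 3.5 mm output)."""
    subprocess.check_call(pico_tts_command(text, wav_path=wav_path))
    subprocess.check_call(["aplay", wav_path])
```

Passing the arguments as a list (rather than a shell string) keeps arbitrary sentences safe from shell quoting issues.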

Some photos of ARPA

Overview of ARPA

Top view - Raspberry PI and motor driver

Rear view - Active mini speaker and battery compartment

Front view - Sonar sensor and pan-tilt camera
Xbox 360 controller used to send remote commands to the robot
