Objectives
Experiment with speech recognition in order to control an electronic device remotely over the Internet, and demonstrate the new functionality of the Google Chrome 11 browser, which supports voice-to-text and HTML5.
Introduction
In April 2011, Google released version 11 of the Chrome browser. One of the most important additions in this version is support for speech recognition. One can enable speech recognition with a simple tag: <input type="text" x-webkit-speech />.
Chrome uses the WebKit layout engine. The browser was first released as a beta version for Microsoft Windows in September 2008, and the public stable release followed in December 2008.
The electronic setup
An AXE050 board with a Picaxe-18M microcontroller is used. Picaxe products are manufactured by Revolution Education. Here is a picture of the AXE050:
Note: An AXE027 programming serial cable is required. The Picaxe-18M uses the same port for programming and for serial communication with the PC.
The firmware in the AXE050 is the same as the one we used in Internet Control of a Picaxe (check the link). The idea behind using the same firmware without modifications is to reuse it in future voice operated projects (if I get time for them…)
The main server
A Windows-based XAMPP server is installed on the PC. The server is connected to a local network with a fixed local IP. The network has no fixed public IP, hence a DynDNS service is used. The router forwards all external requests to the fixed local IP.
Windows XP and later do not allow easy RS232 control over the Internet. To overcome this obstacle, RS232 control is done through DOS, using Kermit for DOS. Kermit scripts are run by PHP to control the AXE050. Kermit’s executable MSK316.exe is installed in the same directory as all the other files, which is c:\xampp\htdocs (rename the server’s default index page to another name).
Note: COM2 is used at a baud rate of 4800, set up in the Control Panel of Windows XP.
LED ON Kermit script (name it ledon.ksc):
set port 2
set baud 4800
output c=n
exit 0
This script is called in a BATCH file (name it ledon.bat):
@echo off
c:\xampp\htdocs\msk316 take ledon.ksc
exit
LED OFF Kermit script (name it ledoff.ksc):
set port 2
set baud 4800
output c=f
exit 0
This script is called in a BATCH file (name it ledoff.bat):
@echo off
c:\xampp\htdocs\msk316 take ledoff.ksc
exit
All batch files are then converted into .exe files using Bat_to_Exe_Converter. So we will have two executables, ledon.exe and ledoff.exe, in the htdocs folder.
Finally, the HTML and PHP code…
The HTML code is based on Romin Irani’s Voice Enabled Web Applications via x-webkit-speech example, which has been adapted to suit the requirements of this project.
index.html
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-language" content="en-US">
<meta charset="utf-8">
<title>Voice Operated Control </title>
<script type="text/javascript">
// Warn the user if the browser lacks the x-webkit-speech input.
function checkSpeechSupport() {
  var element = document.createElement("input");
  if (!('webkitSpeech' in element)) {
    alert("Sorry! Your browser does not support the Speech Input");
  }
}
// Called when the recognised speech has been placed in the input field.
function checkanswer() {
  var answer = document.getElementById('q2answer').value;
  if (answer == 'light on') {
    window.location.href = "http://localhost/processing.php?action=on";
  } else if (answer == 'light off') {
    window.location.href = "http://localhost/processing.php?action=off";
  } else {
    // Unrecognised command: open the spoken "oops! try again" prompt.
    window.open('http://translate.google.com/translate_tts?tl=en&q=oops!%20try%20again');
  }
}
</script>
</head>
<body onLoad="checkSpeechSupport()">
<form id="speechform">
<fieldset id="inputs">
<font face="arial, verdana"><legend><h2>Voice Control of Picaxe AXE-050U:</h2></legend>
<label for="q2answer">Say "light on" or "light off" command:</label></font>
<input id="q2answer" name="q2answer" type="text" x-webkit-speech onwebkitspeechchange="checkanswer()"/><br/>
<img src="http://localhost:8888/out.jpg">
</fieldset>
</form>
</body>
</html>
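The page compares the recognised text against the exact strings 'light on' and 'light off'. If the recogniser returns the phrase with different capitalisation or extra spaces, that comparison fails. A minimal sketch of a more tolerant match follows; commandToAction is a made-up helper name, not part of the original page, and whether the recogniser ever varies its output this way is an assumption.

```javascript
// Hypothetical helper (not in the original page): maps recognised speech
// to an action name, tolerating case differences and stray whitespace.
function commandToAction(text) {
  // Lower-case, trim, and collapse runs of whitespace to single spaces.
  var normalized = String(text).toLowerCase().trim().replace(/\s+/g, " ");
  if (normalized === "light on") { return "on"; }
  if (normalized === "light off") { return "off"; }
  return null; // anything else counts as "not recognised"
}
```

Inside checkanswer() the result could then be appended to the processing.php?action= URL when it is not null, with the "try again" prompt used otherwise.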
processing.php
<?php
// Check the GET "action" variable to see if something needs to be done.
if (isset($_GET['action'])) {
    // An action has been requested: run the matching executable for the Picaxe.
    if ($_GET['action'] == "on") {
        // Turn LED on, then return to the main page after 2 seconds.
        $page = "index.html";
        header("Refresh: 2; URL=\"" . $page . "\"");
        echo "<font face=\"arial, sans-serif\"><b>LIGHT ON</b></font>";
        exec("ledon.exe");
    } else if ($_GET['action'] == "off") {
        // Turn LED off, then return to the main page after 2 seconds.
        $page = "index.html";
        header("Refresh: 2; URL=\"" . $page . "\"");
        exec('ledoff.exe');
        echo "<font face=\"arial, sans-serif\"><b>LIGHT OFF</b></font>";
    }
}
?>
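processing.php only ever runs one of two fixed executables, so the value of the action parameter never has to reach the command line itself. That idea can be sketched as a lookup table. The sketch is in JavaScript only to stay in the same language as the page script; the names ACTION_EXECUTABLES and executableForAction are made up for illustration, and the PHP above remains what actually runs on the server.

```javascript
// Illustrative only: fixed table of allowed actions and their executables.
var ACTION_EXECUTABLES = { on: "ledon.exe", off: "ledoff.exe" };

// Returns the executable name for a known action, or null for anything else.
function executableForAction(action) {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (Object.prototype.hasOwnProperty.call(ACTION_EXECUTABLES, action)) {
    return ACTION_EXECUTABLES[action];
  }
  return null;
}
```

Any request whose action is not in the table simply maps to nothing, which mirrors the way the PHP script ignores unknown values.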
We are almost ready. We need to add video to the main index.html page. In the form section you will see http://localhost:8888/out.jpg. In fact we need to run a video server. YAWCAM is chosen to show a live image of the status of our AXE050 board.
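As written, the page loads out.jpg once per page load, so the picture updates when the browser returns from processing.php. If a continuously updating picture is wanted, one option is to reload the image on a timer with a changing query string so the browser does not reuse a cached copy. This is a sketch under assumptions: cacheBustedUrl and startImageRefresh are made-up names, and it assumes the webcam server ignores the extra t parameter.

```javascript
// Hypothetical helper: appends a changing "t" parameter to defeat caching.
function cacheBustedUrl(url, stamp) {
  var separator = url.indexOf("?") === -1 ? "?" : "&";
  return url + separator + "t=" + stamp;
}

// Browser-only: re-requests the image every intervalMs milliseconds.
// Returns the timer id so the refresh can be stopped with clearInterval.
function startImageRefresh(img, url, intervalMs) {
  return setInterval(function () {
    img.src = cacheBustedUrl(url, Date.now());
  }, intervalMs);
}
```

In the page this might be called as startImageRefresh(imgElement, "http://localhost:8888/out.jpg", 1000), where imgElement is the img element of the form.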
YAWCAM
In order to view the LED switching on and off, a webcam setup is required. The YAWCAM freeware is advised. Provided the requirements listed on the YAWCAM website are met, one can easily set up a streaming server on port 8888. If the PC is behind a router, the video streaming server port must be opened on the router.
Once YAWCAM is working you will need to enable Http and Stream in the Control Panel of YAWCAM, as the following image shows:
Trying out
We are now ready to open index.html. Point your Chrome 11 browser to http://localhost/index.html. You should now see the following (after you have pointed your webcam at the AXE050 board):
Notice the little microphone shown at the end of the input text field. You will need a properly set up microphone on your PC (I have used a USB microphone). Do not write anything in the input field. Simply click on the microphone and this should prompt you to speak as shown below:
Once the browser records your voice input, it contacts Google’s servers to convert it into text, and the result is put in the text field. Say the words “light on”; if the command is recognised, the processing.php link will be opened:
In the background, ledon.exe will run for a short duration and an LED on the AXE050 will switch on. The processing page will be automatically redirected to the index.html page, which will now show a live image of the AXE050 with the LED switched on:
If the command is not recognised you will be prompted to say it again, as you will hear the robotic voice from Google say “Oops! Try again!”. You can disable popups in your Chrome 11 browser if you don’t want popups that you have to close.
Now we are ready to switch off the LED with the voice command “light off”. If the command is recognised, processing.php will be opened again, and the following page will be shown:
In the background, ledoff.exe will run for a short duration and the LED on the AXE050 will switch off. The processing.php page will be redirected to the index.html page, which will now show a live image of the AXE050 with the LED off, as in our initial state.
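The spoken prompt is just a URL with the phrase passed in the q parameter. Building that URL with encodeURIComponent, rather than encoding the phrase by hand, makes it easier to change the wording. A minimal sketch, assuming the translate_tts address used in index.html; ttsUrl is a made-up helper name, and whether the service still answers at that address is not something this sketch can guarantee.

```javascript
// Hypothetical helper: builds the text-to-speech URL used by the page.
// lang is the language code (the page uses "en"); phrase is the text to speak.
function ttsUrl(phrase, lang) {
  return "http://translate.google.com/translate_tts?tl=" +
    encodeURIComponent(lang) + "&q=" + encodeURIComponent(phrase);
}
```

For example, ttsUrl("oops! try again", "en") produces the address with q=oops!%20try%20again, since encodeURIComponent leaves the exclamation mark as it is and encodes the spaces.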
Conclusion
We have experimented with voice (speech) recognition. The technology has not come of age yet. People have different accents and pronunciations (mine is a big mixture 😉 ), and recognising all the differing human voices is a huge task. The interaction with devices adds an extra dimension to voice recognition. The Chrome 11 browser’s support for speech-to-text is not well documented, but gradually its potential is being discovered and shared.
NOTE: Check part II of this project!
Acknowledgements
Special thanks to Max Carter for sharing the original PICAXE firmware, and to Romin Irani for his above-mentioned index.html code (adapted for the purpose of this project).