Solving Voice Recognition Issues & Moving Far Beyond

Mobile phones are powerful pieces of technology, and they’ve reached only a fraction of their true potential. Some phones today are faster than the computers we were using just a few years ago. The current trend is a move from single-core to dual-core processors running at up to 1 GHz, with more RAM. Even though the technology is developing very rapidly, we’re seeing some issues in one of its fastest-growing areas: voice recognition. Once we move past these issues, the capabilities of mobile phones will exceed our wildest imagination.

In addition to Apple and Google, we know of a number of smaller companies working on voice recognition and text-to-speech technology. To bring these two technologies to a global audience, though, the challenges of different languages and accents must be addressed. This issue reminds us of an old Clint Eastwood movie, Firefox, in which the Soviets had a plane controlled by the pilot’s thoughts. To fly it, Eastwood’s character had to be able to think in Russian rather than English. The movie came out in 1982, but voice recognition still faces the same problem today. Our early testing with French, Italian and a few other languages revealed translation issues, but the technology will get there. Once it does, there won’t be a need for large screens. Voice recognition and text-to-speech will allow phones to become a lot smaller in the next two or three years.

Even more exciting, once today’s technologies are brought together, the mobile phone may be able to sense and interpret your brain waves. Some of the recent research we looked at showed monkeys controlling robotic arms with technology that senses what the monkey is thinking about. When the monkey wanted to eat a banana, it made the robotic arm reach for the banana and peel it open simply by thinking about it. We already carry our phones with us all the time. Imagine the possibilities when that phone can sense what you’re thinking and knows what’s happening around you.

With a phone that’s always listening and acting on your speech, brain waves, body temperature and other inputs, the mobile device becomes a super-assistant. Today’s voice recognition works through keyword recognition: it’s designed to link particular buzzwords, like “Mom,” with an action, like “send flowers.” Your super-assistant will speak into your ear, reading documents or emails to you, while sensing when there’s a turn coming up. Right now Siri can handle certain tasks, but in the future it will be able to guide and help you with many more things, some of which we have yet to imagine.

Someday, your phone may even save your life by detecting a significant change in your body temperature or heart rate. For example, it could sense heart fibrillation and blood pressure fluctuations in cardiac patients. It may also be able to transfer medical data to hospitals and physicians. Phones could monitor blood sugar and blood oxygen levels, which could be lifesaving.

Not long ago, these ideas were the stuff of science fiction, but they’ll soon be reality. The mobile industry is the one to watch for enterprise applications, and a huge stream of add-in options is already on its way. Working on these projects today gives Vensi a huge edge, and we’re very happy to be in this space.

Building on Siri – The Future of Mobile Software Development

Siri, the voice of the latest iPhone, is a great start for Apple. It’s been successful, probably beyond Apple’s expectations. That success has made Apple even more protective in its drive to prevent “jailbreaks,” which is what the development world calls efforts to open up, or gain access to, the operating system so that users can run software and do other things not authorized by Apple. With so much at stake, and with the efforts to break through its defenses so relentless, Apple is vigorously defending its technology, using every tool and technique available. For example, Apple recently filed suit against Samsung for infringing on four patents, including Siri’s “voice search.” Sooner or later, though, Apple will have to yield and open up a bit. When it does, developers will rush to add Siri’s voice recognition features to their applications.

That voice recognition capability is part of a field called natural language processing (NLP), and its potential goes far beyond the usefulness we’ve seen to date. The way it works is straightforward in principle, if not in programming. What happens when you say something like “Please call my Mom,” or “Call Mom,” or something similar? The software recognizes the words “call” and “Mom” and links them. The same thing happens if you say, “Dial Mom.” The software recognizes nouns, verbs and their synonyms, and responds by taking a specific action.
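To make the idea concrete, here is a minimal Python sketch of linking a verb with a noun through synonym tables. It is only an illustration under stated assumptions: the speech has already been transcribed to text, and `VERB_SYNONYMS`, `NOUN_SYNONYMS`, `first_match` and `parse_command` are hypothetical names of ours. It is not how Siri is actually implemented.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical synonym tables: each canonical word maps to the spoken
# words treated as equivalent to it.
VERB_SYNONYMS: Dict[str, Set[str]] = {
    "call": {"call", "dial", "phone", "ring"},
    "text": {"text", "message"},
}
NOUN_SYNONYMS: Dict[str, Set[str]] = {
    "mom": {"mom", "mother", "mum"},
    "office": {"office", "work"},
}


def first_match(words: List[str], table: Dict[str, Set[str]]) -> Optional[str]:
    """Return the canonical key for the first word found in any synonym set."""
    for word in words:
        for key, synonyms in table.items():
            if word in synonyms:
                return key
    return None


def parse_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Link the first recognized verb with the first recognized noun.

    Filler words such as "please" and "my" are simply ignored. Returns
    None when either the verb or the noun is missing.
    """
    words = re.findall(r"[a-z']+", utterance.lower())
    verb = first_match(words, VERB_SYNONYMS)
    noun = first_match(words, NOUN_SYNONYMS)
    if verb is None or noun is None:
        return None
    return verb, noun


for phrase in ["Please call my Mom", "Call Mom", "Dial Mom", "What time is it?"]:
    print(f"{phrase!r} -> {parse_command(phrase)}")
```

The first three phrases all resolve to the same `("call", "mom")` pair, which the phone could then map to a specific action such as placing a call; the last phrase matches nothing and returns `None`. A production NLP system would rely on far richer grammar and statistical models than a lookup table.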

There are many ways to expand on this idea. Right now you have to press a button to activate the software, but you could have a sound-activated system that monitors a room and takes a specific action whenever a certain word or sound is heard. It could be in a secure area, a nursery, or a conference room. The system could be flexible enough to handle a complicated spoken command such as “Listen for the phrase xyz, then calculate the latest values for factors 1, 2, and 3. Display the results on the conference room monitor.” The possibilities are endless. Think of Star Trek and the voice-controlled systems on the Enterprise: they’re always on, always listening, and always responding.
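An always-on listener can be sketched in a similar way, assuming an upstream speech-to-text engine already delivers the room audio as a stream of text chunks (simulated here with a list). The trigger phrase “xyz” comes from the example command above; `monitor` and `show_on_monitor` are hypothetical names, and real systems typically spot wake words in the audio itself rather than in transcribed text.

```python
import re
from typing import Callable, Iterable, Iterator


def monitor(transcripts: Iterable[str], trigger: str,
            on_trigger: Callable[[str], str]) -> Iterator[str]:
    """Scan a stream of transcribed speech for a trigger phrase.

    Chunks without the trigger are ignored. When a chunk contains the
    trigger as a whole word or phrase, the text that follows it is
    treated as the command and handed to on_trigger, whose result is
    yielded. Because this is a generator, it also works on an endless
    stream.
    """
    pattern = re.compile(r"\b" + re.escape(trigger.lower()) + r"\b")
    for chunk in transcripts:
        text = chunk.lower()
        match = pattern.search(text)
        if match is None:
            continue  # nothing to do; keep listening
        command = text[match.end():].strip(" ,.")
        yield on_trigger(command)


def show_on_monitor(command: str) -> str:
    # Hypothetical action: a real system would compute the requested values
    # and push them to the conference room display.
    return f"DISPLAY: {command}"


room_audio = [
    "good morning, everyone",
    "let's get started. xyz calculate factors 1, 2, and 3",
    "thanks, that's all",
]
print(list(monitor(room_audio, "xyz", show_on_monitor)))
```

Only the second chunk contains the trigger, so only its command reaches the handler. The word-boundary pattern keeps a word like “xyzzy” from setting the system off by accident.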

You can extend the notion even further. Cell phones were getting smaller until touch displays and video cameras came along; then they had to get bigger to provide a usable display. But suppose you didn’t have to scroll through apps to find the one you wanted. You could just tell your phone what you wanted or where you wanted to go, and it would sort through the hundreds of apps, find the one that meets your needs, activate it, and report or display the results. It could also keep track of your location and surroundings, give you a running commentary, and make informed suggestions based on your transaction history and interests. Face it: sooner or later your phone will shrink to pen size or smaller, maybe even becoming part of your body as an implanted micro device.
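Finding the right app among hundreds could start as simply as scoring each app against the words in the request. The sketch below is a toy illustration under our own assumptions: each installed app is assumed to advertise a set of keywords, and `APP_KEYWORDS` and `route_request` are hypothetical. A real assistant would need much richer language understanding than word overlap.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical registry: each installed app advertises keywords it can handle.
APP_KEYWORDS: Dict[str, Set[str]] = {
    "Weather": {"weather", "rain", "forecast", "temperature"},
    "Maps": {"directions", "navigate", "route", "drive"},
    "Stocks": {"stock", "stocks", "price", "shares"},
}


def route_request(request: str,
                  apps: Dict[str, Set[str]] = APP_KEYWORDS) -> Optional[str]:
    """Pick the app whose keywords overlap most with the spoken request.

    Returns None when no app shares a keyword with the request. On a tie,
    the app listed first in the registry wins.
    """
    words = set(re.findall(r"[a-z]+", request.lower()))
    best_app: Optional[str] = None
    best_score = 0
    for app, keywords in apps.items():
        score = len(words & keywords)
        if score > best_score:
            best_app, best_score = app, score
    return best_app


for request in ["Will it rain? Check the forecast",
                "Give me directions to the airport",
                "Tell me a joke"]:
    print(f"{request!r} -> {route_request(request)}")
```

Once an app is chosen, the phone would activate it and report or display its results; when nothing matches, as with the last request, it could ask a follow-up question instead of guessing.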

Or you might combine NLP with very flexible screens descended from the flexible displays that companies like Nokia and Samsung are working on. Then you’d have a screen you could unroll when you need a face-to-face conversation with someone or want to view content the way you would on a TV, in a magazine, or in a newspaper. It’d be something like the old-fashioned scrolls you see in period pieces, only even more compact when it’s time to put it away. That would fit right in with the pen analogy.

So what’s your take on the future of mobile development? Why don’t you write to us at info@vensi.com and tell us? If your ideas are interesting enough, maybe we can make them happen…

Siri, the Imperfect Siren

Siri is like a siren: an irresistible temptress with a fascinating voice and a wealth of knowledge, one who answers your questions, guides you to your destination, and even suggests activities for you. But like a siren, she is not infallible. And like a siren, she has a history, and sisters who are trying to outdo her, so far with limited success.

At its core, Siri is a very sophisticated voice recognition program that shares a technological history with other voice recognition platforms such as OnStar. Siri works worldwide with over a dozen applications, such as stocks and weather, plus applications such as maps that (so far) work only in English in the US. There are imperfections, however, that can cause frustration and dissatisfaction.

Misunderstanding the speaker is one of them, and it’s going to happen with any voice recognition application. Imagine that a real person gives you a phone number over the phone. Would you get it on the first try? Isn’t it likely that you’d ask whoever you’re talking with to repeat it at least once? You probably would, for the simple reason that it’s hard for anyone to understand what another person is saying with 100% accuracy 100% of the time, whatever the topic. Everyone has different speech patterns, accents, tonal qualities and so on, which make occasional misunderstandings inevitable. A speech recognition program is no different in that respect: it can misunderstand despite the developer’s best efforts.

With Siri, the problem is that people have unrealistic expectations and simply assume Apple will always get it 100% right. Sure, Siri gets better as it gets to know you and builds its knowledge base. It’s unlikely, however, that Siri will ever reach 100% accuracy with its current software, or even with periodic upgrades. But that’s okay, because 100% is really not necessary.

The same holds true for any technology, even Vensi’s. Because software development is complex and constantly evolving, not all of our projects are going to be 100% perfect. We are able to meet our clients’ expectations for functionality, delivery timetables and cost without chasing the ambitious goal of perfection. We can get pretty close, though, by partnering with our clients so that we really understand their needs, goals and objectives. We make sure that we satisfy every requirement, and we deliver on time and on budget. We strive for perfection while balancing the real-world limitations of a competitive marketplace. That’s what Apple did with Siri: it set the bar at the best it could accomplish and launched when it felt the product was good enough. Like every technology and software company out there, Apple goes to market as early as possible, knowing it will keep making improvements and understanding that it will probably never get all the way to 100%.

Sometimes people expect perfection, but more often than not, our clients understand that perfection is not always possible or even necessary. Even when it is possible, many clients don’t really want it, because the cost and effort needed to reach those last few percentage points far outweigh any value they might add.

Here at Vensi, we understand Pareto’s Law, also known as the 80/20 rule: if we can deliver 80% of the functionality in 20% of the time, the remaining 20% of the functionality will take the other 80% of the time. Our goal is to balance functionality with our ability to deliver high-quality, cost-effective mobile and web solutions to our clients in a timely manner. This approach works.
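Taking those figures at face value, a quick calculation shows why the last stretch is so expensive. Let T stand for the total project time (our notation, for illustration only) and compare the time spent per percentage point of functionality:

```latex
\text{First 80\%:}\quad \frac{0.2\,T}{80} = 0.0025\,T \text{ per point}
\qquad
\text{Last 20\%:}\quad \frac{0.8\,T}{20} = 0.04\,T \text{ per point}
\qquad
\frac{0.04\,T}{0.0025\,T} = 16
```

Under the 80/20 split, each percentage point in the final 20% takes 16 times as long as one in the first 80%, which is why chasing the last few points rarely pays off.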