Editor's note: Tareq Ismail is the UX lead at Maluuba, a personal assistant app for Android and Windows Phone that was a Battlefield participant at TechCrunch Disrupt SF 2012. Follow him on Twitter @tareqismail.
The release of Facebook's Graph Search has raised much discussion among technology pundits and investors. One of the biggest questions surrounding the highly anticipated feature is its availability on mobile.
After all, Facebook CEO Mark Zuckerberg has said on a number of occasions that Facebook is a mobile company. "On mobile we are going to make a lot more money than on desktop," he said at TechCrunch Disrupt SF 2012, adding "a lot more people have phones than computers, and mobile users are more likely to be daily active users." Facebook understands mobile's importance, so why wouldn't it offer Graph Search for Android and iPhone from the start?
It's simple: Graph Search for mobile would need to incorporate speech, which is a different beast altogether.
Many of the examples given during the Graph Search keynote contained long sentences, which are not easy to type on a mobile device. Think of the example "My college friends who like roller blading that live in Palo Alto." Search engines like Google get around this on mobile by offering autofill suggestions, but their suggestions are drawn from billions of queries. Facebook's search, by contrast, is based on hundreds of individual values, like "fencing" or "college friends," that are specific to each user rather than shared across a group, so autofill suggestions will often not be useful, or worse, will require a lot of tapping and swiping to drill down to the full request.
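To make the contrast concrete, here is a minimal sketch of what per-user autofill looks like: suggestions can only come from the handful of graph values attached to one account, not from a global log of billions of queries. All the values and names below are invented for illustration.

```python
# Sketch: prefix autofill over a single user's graph values,
# as opposed to ranking suggestions mined from billions of global queries.
user_values = [
    "college friends",
    "coworkers",
    "fencing",
    "roller blading",
    "Palo Alto",
]

def suggest(prefix, values):
    """Return the user's graph values that start with the typed prefix."""
    prefix = prefix.lower()
    return [v for v in values if v.lower().startswith(prefix)]

print(suggest("co", user_values))  # matches only this user's values
```

With so few candidate values per user, a two-letter prefix like "co" still leaves multiple matches, and composing a full query such as "my college friends who like roller blading" means repeating this drill-down for every clause.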
What's more, Graph Search queries are designed to be written out naturally in full-form sentences with verbs, pronouns, etc., which is something that keyword search engines like Google do not need. If you're looking for sushi places to eat on Google, it's a five-character search for the keyword "sushi." With Graph Search, Facebook wants to show you sushi results refined by a group of your friends, so the same search would require writing out "sushi restaurants my friends have been to" or "sushi restaurants my friends like." That's a lot more typing.
It's clear that on mobile, Graph Search would need to be powered by speech to be effective. No one will want to type out such long sentences. Not to mention, with services like Google Now and Siri, people will come to expect control through speech.
Supporting speech is a different problem altogether from what Facebook has solved so far, and they'll have to do a lot more work before it's available on any major mobile platform. Here are four reasons why.
Speech Recognition Doesn't Come Cheap
If time is money, then speech recognition is very expensive. It's well known that it requires a considerable amount of investment to develop, and no one knows this better than Apple and Google.
Apple chose not to build its own speech recognition, instead licensing Nuance's technology for Siri. Nuance has spent over 20 years perfecting its speech recognition; it's not an easy task, and the company has had to acquire a number of companies along the way.
Google, on the other hand, chose to develop its own speech recognition and needed a clever system to collect the data required to catch up to Nuance. That system, GOOG-411, set up a phone number people could call from landlines and feature phones to ask for local results. Once Google had the data it needed, it shut down the service and used the data to build its recognition system. It has taken a company like Google, which has mastered search, over three years to get where it is now with speech recognition.
Even if it takes Facebook half as long to come up with a similarly clever solution, they'll need to start soon for it to be released any time in the next year.
Names Are Facebook's Strength And Speech Recognition's Weakness
One of Facebook's early successes has been names. The company's algorithms for returning the most relevant person when searching for a friend played a key role in its early growth. People are accustomed to saying "add me on Facebook" without needing to specify a username or handle, an advantage that makes the company's entry into speech that much harder.
Names are speech recognition's biggest challenge. Speech recognition relies on a dictionary, or lexicon, of expected words that are paired with sample voice data given to the system. That's why most engines do very well recognizing common English words but have such a hard time with out-of-the-norm names and varying pronunciations. Facebook has hundreds of thousands of names to deal with, and they're a key part of its experience, so it will need to master the domain for speech to be useful to its users. Now, one could argue that having access to all these names may give Facebook an edge in solving this problem, but it will need to work on a solution for some time before it becomes anywhere near acceptable.
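The lexicon problem can be shown with a toy example. A recognizer can only map audio to words it has pronunciations for, so an uncommon personal name simply falls out of vocabulary. The entries below are invented for this sketch; real lexicons hold hundreds of thousands of phonetic entries.

```python
# Toy pronunciation lexicon: common English words have phonetic entries,
# but an uncommon personal name does not, so it cannot be recognized.
lexicon = {
    "sushi":   ["s", "uw", "sh", "iy"],
    "friends": ["f", "r", "eh", "n", "d", "z"],
    "college": ["k", "aa", "l", "ih", "jh"],
}

def can_recognize(word, lexicon):
    """A word is recognizable only if the lexicon has a pronunciation for it."""
    return word.lower() in lexicon

print(can_recognize("sushi", lexicon))  # True
print(can_recognize("Tareq", lexicon))  # False: out-of-vocabulary name
```

Every name missing from the lexicon is a query the system cannot transcribe at all, which is why name-heavy search is so much harder for speech than for text.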
Supporting Natural Language Isn't Easy
The final piece of the puzzle may be the most difficult: supporting natural language is really, really hard. Working at natural language processing company Maluuba, I can attest to just how hard a problem this is to solve. Natural language processing is the ability to understand and extract meaning from naturally formed sentences.
This also includes pairing sentences that have the same meaning but are said differently. For example, with Graph Search, I can type "friends that like sushi" and it shows a list of my friends who have identified sushi as an interest, but if I type "friends that like eating sushi" it looks for the interest "eating sushi" (which none of my friends have listed) and returns zero results. In reality, both sentences mean the same thing but are worded differently. Understanding natural language means understanding the real intent behind a request, not just its literal wording.
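A crude way to collapse such paraphrases is to strip filler verbs before matching interests. The following is a hand-written rule-based sketch, invented for illustration, and nothing like how Facebook actually handles this; as the article notes, real systems rely on large datasets and machine learning rather than rules like these.

```python
import re

# Sketch: normalize two differently worded queries to the same interest.
# The filler-verb list and query pattern are invented for illustration.
FILLER_VERBS = {"eating", "doing", "playing", "watching"}

def extract_interest(query):
    """Pull the interest phrase out of 'friends that like ...' queries,
    dropping filler verbs so paraphrases map to the same interest."""
    m = re.search(r"friends that like (.+)", query.lower())
    if not m:
        return None
    words = [w for w in m.group(1).split() if w not in FILLER_VERBS]
    return " ".join(words)

print(extract_interest("friends that like sushi"))         # sushi
print(extract_interest("friends that like eating sushi"))  # sushi
```

Even this tiny rule shows the trap: stripping "eating" fixes the sushi case but would mangle a genuine interest like "competitive eating," which is exactly the false-positive risk discussed below.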
On a desktop browser, Facebook may be able to get users to learn to search in specific sentence templates, especially with the help of autofill suggestions. But for speech that's nearly impossible. People ask for things differently almost every time; even the same person can phrase the same request differently when speaking. Ask 10 of your friends how they would search for nearby sushi restaurants. I have no doubt most, if not all, responses will differ from one another.
Now, Facebook could fix the sushi example I gave earlier, but that may cause false positives elsewhere in the system. Understanding natural language requires large data sets and complex machine learning to get right, something Facebook's Graph Search team may be investigating but will not be able to master any time soon. It's just not a simple problem to solve. That's why Apple jumped into a bidding war to buy Siri, which at its core is a natural language processor. To put the difficulty into perspective, Siri spun out of a DARPA project that took over five years to build, with over 300 top researchers from the best universities in the country.
Languages, Languages, Languages
Facebook has over a billion users who collectively speak hundreds of different languages. Facebook has said it is beginning the launch with English. How long until all billion users' languages are supported on the desktop? And since speech is significantly harder, how long until those users are supported on mobile? It's one thing to support hundreds of languages through text and a much harder thing to support them through speech. This will be the problem Facebook faces for the next decade.
Facebook acknowledges that its future lies in mobile. Mobile begs for Graph Search to be powered by speech, something Facebook simply cannot do yet. I have no doubt it will, but it most definitely won't be at any acceptable quality anytime soon. Facebook has taken the first step, but it has a long journey ahead.
Maluuba's mission is to empower people with the ability to find exactly what they want by speaking to their smartphone. Maluuba's proprietary, patent-pending engine provides superior capabilities to traditional voice recognition systems. Asking a question like "what movies are playing nearby?" enables users to buy tickets, find theater directions, and share search results on social platforms such as Facebook and Twitter. The Maluuba language engine is a product of two years of advanced research in artificial intelligence, machine learning...