September 29, 2010
By: Leslie Horn
Google recently added a predictive feature to its search engine called Google Instant, which displays search results as you type, before you finish entering a query.
“It’s search at the speed of thought,” Marissa Mayer, Google’s vice president of search products and user experience, said at a press conference in San Francisco earlier this month.
But what if those thoughts aren’t entirely appropriate? What words are not included in Google Instant?
The tool omits certain terms that it deems offensive. When you type one of these words, the instant feature disappears. Google's Johanna Wright told CNN that the constraints are in place to protect children. Although Google hasn't released the list of terms itself, 2600.com has compiled one from user submissions.
The list reveals stark inconsistencies in the words with which Google Instant takes issue. For example, "lesbian" is blocked, but "gay" is not. "Cocaine" is blocked, while "crack" and "heroin" sail through. Many seemingly innocuous words are blocked, too. "Scat," as in the type of vocal improvisation often used by jazz musicians, is one of the stranger exclusions, most likely because of its NSFW double meaning. The word "hate" is also blocked, as is "Lolita," the title of the classic novel by Vladimir Nabokov.
The Web site 4chan is also not a Google Instant favorite: according to 2600.com, suggestions vanish if a user types "B" or "Y" after the site's name.
If you notice a word blocked by Google Instant that is not on 2600.com's list, the site asks that you e-mail it with submissions. Of course, you can still search these terms; they just won't be predicted as you type them.
A Google spokesman said the company has a narrow set of removal policies for pornography, violence, and hate speech, but the issue is complex.
“It’s important to note that removing queries from autocomplete is a hard problem, and not as simple as blacklisting particular terms and phrases,” he said in a statement. “In search, we get more than one billion searches each day. Because of this, we take an algorithmic approach to removals, and just like our search algorithms, these are imperfect.”
Those algorithms look not only at specific words but also at phrases, in multiple languages.
“So, for example, if there’s a bad word in Russian, we may remove a compound word including the transliteration of the Russian word into English,” he said. “We also look at the search results themselves for given queries. So, for example, if the results for a particular query seem pornographic, our algorithms may remove that query from Autocomplete, even if the query itself wouldn’t otherwise violate our policies.”
He acknowledged that the system is “neither perfect nor instantaneous,” and said Google continues to “work to make it better.”