Reverse Engineering Google Suggest “No Fly” List (Who is Google Protecting?)
Okay, so now you’ve read about it here, at Being Amber Rhea, and at Bacchus’s Eros Blog. Google Suggest will autofill some things, like [stormfront] or [comstock films podcast], but not others, like [being amber rhea] or [violet blue]. Let’s call it the Google Suggest No Fly List.
This is a fact. A lot of what people say about Google is rumor-mongering and tinfoil-hat conspiracy paranoia. But the quirks in Google Suggest’s autofill (that [peggy comstock] will autofill at the ‘m’, but [tony comstock] will not) are not something somebody said might be so on some underground blackhat SEO bulletin board. They are a fact. You can call up Google.com right now and try it for yourself.
How and why Google does this gets more into Reynolds Wrap territory, and unless Google decides to tell us how and why they constrain the Google Suggest autofill, all we can do is speculate. But since speculating about Google is even more popular than downloading dirty pictures, here’s a little of mine.
Given what’s been observed by Amber Rhea and Bacchus, it seems a fairly reasonable inference that it has something to do with sexuality. It would also seem to have something to do with Google’s concept of “safety”; that is to say, protecting Google from returning search results that people might find offensive. Since, as far as we know, Google can only deal with sexuality algorithmically, it also seems reasonable to investigate Google’s construct of “safety.”
(If you’d like to open another browser tab and switch your Google “SafeSearch” preferences to “strict filtering” you can play along at home.)
Open up google.com. Enter [nude] into the search. You’ll have to enter the entire string, because Google Suggest won’t autofill [nude]. Hit the “Google Search” button. The results? No returns. Under Google’s “strict filtering” there are no returns for the search [nude]. Now toggle your preferences to “moderate filtering”. What do you see now in the top results? nude.hu, paradisenudes.com, freshnudes.com, etc; sites that are not appropriate or even legal to display to minors.
Now try searching [sex] under Google’s “strict filtering” setting. Again, you’ll have to enter the entire string because Google Suggest will not autofill [sex].
Unlike the “strict filtering” search for [nude], a “strict filtering” search for [sex] does deliver returns: wikipedia.org, webmd.com, amazon.com, and others. Perhaps some people might be upset by having their children wander into WebMD’s “6 Mistakes Men Make Having Sex”, but the important thing is that WebMD uses only proper medical terminology and euphemisms. And no “nude”. We know that WebMD doesn’t use the word “nude” in the article; otherwise it wouldn’t show up in a Google “strict filtering” return.
Now toggle to “moderate filtering” and search [sex] again. The number 1 return? Pornhub.com. My best guess is that this is why, although there are “strict filtering” returns for [sex], Google Suggest does not autofill [sex]. If there are high-ranking returns for a string that would be filtered out by Google’s “strict filtering” setting, then Google Suggest will not autofill the string, regardless of how the user has set their Google SafeSearch filter preferences.
Let’s test the hypothesis.
Toggle back to “strict filtering” and search [peggy comstock]. Google Suggest will autofill this string at [peggy com ]. Now look at the returns. Since we’ve done a “strict filtering” search, there’s nothing in the returns containing [nude] or any of the other words that deliver no returns under “strict filtering” ([fuck], [tits], [bastard], to name a few strings).
Now toggle to “moderate filtering” and search [peggy comstock] again.
Like before Google Suggest autofills the string at [peggy com ]. And like before, the results are “strict filtering” compatible.
Another odd thing is that for some strings, Google Suggest will autofill right past the “root” string (a string whose returns won’t pass SafeSearch “strict filtering”) and autofill longer derivative strings whose returns will pass Google’s “strict filtering”. So Google will suggest [comstock films podcast] but not [comstock films]. Google will suggest [video video download], but not [video video]; [sex offender], but not [sex].
But wait, before you go off and tell everyone what a genius I am for figuring out how and why Google Suggest autofills some search strings and not others, try this one. Toggle your settings back to “strict filtering” and enter [bastard]. Google Suggest will autofill the string at [bast ]. Now hit search.
Now isn’t that weird. Google Suggest autofills the string [bastard], but Google’s SafeSearch “strict filtering” doesn’t deliver any returns. Hmmm. Let’s toggle back to “moderate filtering” and search [bastard] again.
Double hmmmm. wikipedia.org, reference.com, and bastards.org, an adult adoptee rights advocacy organization. Google Suggest autofills [bastard], but Google’s “strict filtering” doesn’t return results, and I can’t see the how or why of that, at least not algorithmically. What is it about these results, which pass the “moderate filtering” setting, that gets them trapped by the “strict filtering” setting? And with no “strict filtering” results, why does Google Suggest autofill the string [bastard]? Maybe the Google algorithm that controls these things is more sophisticated than I first thought!
Let’s try another!
Back to “strict filtering”. Now type in n i g g e r. Yep, I’m dropping the n-bomb. Let’s do a Google SafeSearch “strict filtering” search for [nigger]. You’ll have to type the whole thing because Google Suggest will not autofill the string.
Now hit search.
Hmmm, ummm, okay. Top result? Wikipedia’s entry for nigger. Next result? Niggermania.com, which bills itself as “the best site for nigger jokes, rants, and racist humor.” Well, as long as they don’t use the word “nude”, I guess it’s okay… Let’s toggle back to “moderate filtering” and try it again. Pretty much the same results. This is kind of the reverse of [bastard]: no on the Google Suggest autofill, but yes on the SafeSearch “strict filtering” returns. There’s no way I can arrange overlapping circles that explains this!
So here’s what we have so far:
Sometimes it appears that Google Suggest won’t autofill a search string because there are no “strict filtering” search returns for the string. [nude] and [fuck] are two such strings.
Sometimes it appears that Google Suggest won’t autofill a search string because the top returns for the string under “moderate filtering” are results that “strict filtering” will not allow to pass. The way Google handles the search string [sex] is an example of this.
(Doing a Venn diagram in my head, I don’t see a situation where these two conditions would be incompatible, but perhaps a reader with an information science degree and access to a whiteboard can either confirm this or correct me.)
But then we have [bastard], which Google Suggest will autofill at [bast ], but which has no “strict filtering” returns. Furthermore, when we look at the Google SafeSearch “moderate filtering” results, it’s not clear why any of them would fail to pass the “strict filtering” SafeSearch setting. On [bastard], the apparent link between Google Suggest autofill and search returns breaks.
Adding to the sense that things are more complicated than they first appear is Google’s treatment of the search [nigger]. Google Suggest will not autofill the string, but Google’s SafeSearch “strict filtering” delivers returns. Somehow Google Suggest has identified [nigger] as potentially offensive using something more sophisticated than simply checking whether the search results for the string pass the “strict filtering” setting. My Venn diagram approach is starting to look insufficient to explain what’s going on at Google.
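For readers who would rather see the Venn diagram as logic, here’s a minimal Python sketch that folds the two conditions above into a single hypothetical rule (autofill only if “strict filtering” has returns AND the top “moderate filtering” returns look like they would also pass “strict filtering”) and checks it against the observations reported in this post. The field names and the `predicts_autofill` rule are my own illustrative framing, not anything Google has published; the data are only what’s described above, including my eyeball judgment of whether the moderate results would pass strict filtering. The n-bomb query is labeled “n-word”.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    query: str
    autofilled: bool                # did Google Suggest complete the string?
    strict_has_results: bool        # any returns under "strict filtering"?
    moderate_top_strict_safe: bool  # eyeball judgment: would the top "moderate" returns pass "strict"?


# Observations as reported in this post; nothing here was re-measured.
OBSERVATIONS = [
    Observation("nude", False, False, False),
    Observation("sex", False, True, False),
    Observation("peggy comstock", True, True, True),
    Observation("bastard", True, False, True),
    Observation("n-word", False, True, True),
]


def predicts_autofill(obs: Observation) -> bool:
    """Hypothetical rule: either condition alone is enough to block autofill."""
    return obs.strict_has_results and obs.moderate_top_strict_safe


def unexplained(observations: list[Observation]) -> list[str]:
    """Queries where the hypothetical rule disagrees with what was observed."""
    return [o.query for o in observations if predicts_autofill(o) != o.autofilled]


if __name__ == "__main__":
    for o in OBSERVATIONS:
        verdict = "fits" if predicts_autofill(o) == o.autofilled else "BREAKS"
        print(f"[{o.query}] predicted={predicts_autofill(o)} observed={o.autofilled} -> {verdict}")
    print("Unexplained:", unexplained(OBSERVATIONS))
```

Framed this way, the two conditions aren’t incompatible at all: either one blocks the autofill, and [nude] trips both. The rule fits [nude], [sex], and [peggy comstock], and breaks on exactly the two strings discussed above, [bastard] and [n-word].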
Now I suppose if a Google engineer, with her Stanford/MIT/Cal Tech PhD, is reading this, she’s chuckling at my simplistic attempt to explain the behavior of the Googlebot. That’s fine. I’m not a computer scientist. I’m a filmmaker and a businessman, and if my understanding of Google and search and the internet is simplistic, or skewed by my own self-interest, or just dead wrong, it’s not something I’m ashamed of. If Google or anyone else wants to lay it all out for me, I’m a quick study. I’ll sort it out and adjust accordingly. Or not. My parents played “Man of La Mancha” over and over again when I was a toddler, and I’m pretty sure that gave my brain a quixotic bent.
Now let’s look at one more.
Toggle back to “strict filtering” and enter [sex education] in the search box. Again, you’ll have to type the whole thing because Google Suggest won’t do it for you. Now search.
Not surprisingly, the results are pretty benign-looking. Nothing on the first page of the “strict filtering” results raises a “won’t someone please think of the children!” flag. But that was (mostly) true of [sex], too. It wasn’t until we looked at the “moderate filtering” results that we got sites that were not appropriate for minors. Go ahead, toggle to “moderate filtering”.
Hmmm. I don’t know what to say. Granted, I come from a pretty liberal point of view about children and sex information, but if “6 Mistakes Men Make Having Sex” passes Google’s “strict filtering” algorithm, I’m not seeing anything here that raises an obvious red flag. Just for kicks, let’s toggle our SafeSearch settings to “do not filter my results” and see what happens. I know, I know, “do not filter my results” isn’t supposed to be about text, just images. But let’s give it a try anyway.
Well hmmm and hummm and hrrrm. Again, using “6 Mistakes Men Make Having Sex” as a benchmark (remember, that was a page 1 “strict filtering” return for [sex]), there’s nothing on a “do not filter my results” search for [sex education] that waves a big “won’t someone please think of the children” flag. So why doesn’t Google Suggest autofill [sex education]? Who would be against sex education?
Oops.
I think we just arrived back at Google’s concept of safety, and it’s starting to look like it’s not really about protecting children, is it?