This article was last updated on June 18, 2022
When I started this little endeavour, I had the intention of clearly and accurately presenting some pure unadulterated facts. I didn’t quite grasp the magnitude of this exercise. I have waxed enthusiastic about the Internet and how all of us are contributing our knowledge to the sum total of the knowledge of the entire planet, and about Google (see my blog Google) and how, by organizing all this information, it allows all of us to actually find stuff in amongst those zillion pages. Despite all that, I have discovered that misinformation is still rampant and we have a long way to go before we eradicate ignorance, if not stupidity.
In my blog Pornography: Statistics Laundering I discuss some of the misinformation, exaggerations and outright self-serving lies which are floating around relating to this topic. I am somewhat outraged to find that many of the doomsday stats presented by what appear to be right-wing conservative religious groups seem to be in no way representative of reality. In fact, the gross exaggerations of the situation seem to be designed to scare the electorate into voting for public policies which would greatly curtail our freedoms. Far be it from me to dictate what you do in the privacy of your own home between consenting adults.
A little warning: Readers of this blog will note that I usually write out profanities as f**k. This isn’t out of prudishness; this is deliberate on my part because I think it’s a bit funnier just in the same way I find Jon Stewart of the Daily Show being bleeped on regular television a bit funnier than actually hearing the word. [laughs] Okay, maybe that does make me prudish? In any case, for the sake of clarity in the lists below, I spelled out the words in full: a break with policy. See my blog I suck; you suck; we all suck. What!?!
The size of the Internet: 24 billion pages
First of all, the following is probably not the be-all and end-all. I am a devoted user of Google but that doesn’t mean that Yahoo or Bing is without merit. Due to time constraints, I have to restrict myself to something and I think choosing the mother of all search engines seems like not an incorrect choice.
Oddly enough or maybe not oddly enough, there is quite a bit of opinion on the size of the Internet. Once again I am thwarted in going to a single definitive source of information and must make do with an estimate which I’m sure will be contested.
Size of the Internet: 24 billion web pages
When you know, for example, that the word ‘the’ is present in 67.61% of all documents within the corpus, you can extrapolate the total size of the engine’s index by the document count it reports for ‘the’. If Google says that it found ‘the’ in 14,100,000,000 web pages, an estimated size of the Google’s total index would be 23,633,010,000.
This gentleman seems to present a compelling calculation; however, what catches my eye right off the bat is that my query on the word "the" matches what he says in his article, with slight variances from day to day. I’m not 100% sure that his statement is true, that the word "the" appears in 67.61% of all web pages, but a cursory test of a couple of web sites seems to yield similar percentages. However, I note that 14.1 billion is 59.66% of the author’s roughly 23.6 billion number, not 67.61%. Whatever. For the purposes of the following calculations, I am going to go with 59.66%. Since I just queried "the" and ended up with 14,470,000,000 hits, I will calculate the total pages as being 24,253,167,000.
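For anyone who wants to retrace the arithmetic above, here is a minimal Python sketch. It is an illustration only, not the spreadsheet actually used: the function names are made up for this example, and it assumes the "the" query returned roughly 14.47 billion hits, which is the count consistent with the 24,253,167,000 total. It first backs out the share implied by the quoted figures (14.1 billion out of 23,633,010,000), then divides the fresh hit count by that share.

```python
# Figures taken from the post; the names are hypothetical, chosen for this sketch.
QUOTED_THE_HITS = 14_100_000_000    # hits for "the" in the quoted article
QUOTED_INDEX_SIZE = 23_633_010_000  # index size the quoted article arrives at
MY_THE_HITS = 14_470_000_000        # assumed hit count for "the" on the day of the query


def implied_share(hits: int, index_size: int) -> float:
    """Fraction of the index that contains the word, given both counts."""
    return hits / index_size


def estimate_index_size(hits: int, share: float) -> float:
    """Extrapolate the total index size from one word's hit count and its share."""
    return hits / share


share = implied_share(QUOTED_THE_HITS, QUOTED_INDEX_SIZE)
total = estimate_index_size(MY_THE_HITS, share)

print(f"Implied share of pages containing 'the': {share:.2%}")
print(f"Estimated total pages: {total:,.0f}")
```

Dividing the same 14.1 billion by the stated 67.61% would instead give roughly 20.9 billion pages, so the choice of percentage shifts the total but not its order of magnitude.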
The amount of pornography on the Internet: less than 1%
I used an Excel spreadsheet to track this information and calculate the percentages. The method is simple: go to Google; type in the word and hit Enter; note down the number of results shown under the Search box; then calculate those results as a percentage of the estimated total pages indexed by Google. Here I am using the number of 24,253,167,000 as the total number of pages. FYI: These numbers change from moment to moment, day to day, so I doubt you will get exactly the same numbers but they will be similar.
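The same calculation can be sketched in a few lines of Python rather than Excel. This is a hedged illustration: `percent_of_total` and `is_under_one_percent` are hypothetical helper names, the hit counts still have to be read off the Google results page by hand, and the only figures used are ones already given in this post (with the "the" count assumed to be about 14.47 billion).

```python
# Estimated number of pages indexed by Google, as used in this post.
ESTIMATED_TOTAL_PAGES = 24_253_167_000


def percent_of_total(hits: int, total_pages: int = ESTIMATED_TOTAL_PAGES) -> float:
    """Express a keyword's hit count as a percentage of the estimated index."""
    return hits / total_pages * 100


def is_under_one_percent(hits: int, total_pages: int = ESTIMATED_TOTAL_PAGES) -> bool:
    """True when the keyword accounts for less than 1% of the estimated index."""
    return percent_of_total(hits, total_pages) < 1.0


# Sanity check with the word "the" (assumed hit count of about 14.47 billion).
the_hits = 14_470_000_000
print(f'"the": {the_hits:,} hits, {percent_of_total(the_hits):.2f}% of total')
```

To check any other keyword, pass the hit count noted from the results page to `percent_of_total`; since the counts drift from day to day, the percentages will drift slightly too.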
|word||# of hits||% of total|
|word(s)||# of hits||% of total|
The "quantity" of pornography on the Internet seems to be less than 1% of the total web pages published.
I’m sure that others may object to this rather simplistic methodology, but from the table above, each of the porno keywords is clearly coming in at less than 1%. I find it hard to believe that web sites and web pages offering pornography, whether it be pictures, movies or erotic stories, are not in some form using one or more of the above keywords.
The above searches all pages, that is, any language. I did experiment with Advanced Search, trying to zero in just on English pages, but found the results a tad odd. The word sex normally returns around 570,000,000 hits but when I selected English only in Advanced Search, I ended up with 615,000,000. Hmmm, I would have expected a lower number. Beats me why, but I don’t think this affects the outcome of my unscientific scientific research.
We can debate the total number of pages; however, the overall results will be the same. The porno words just show up as a low percentage.
Click HERE to read more from William Belle
my blog: Pornography: Statistics Laundering
The Straight Dope
How much of all Internet traffic is pornography? – October 7, 2005
This gentleman gave me some of the ideas for doing my investigation. While his numbers are five years old, he concluded his study about the amount of pornography on the Internet by stating, "I’d say we’ve got an unremarkable list of life’s little pleasures, whether online or off."
Indexable or number of web pages on Internet
This gentleman presents 3 estimates. As I said, method #2 is confirmable.
- Supposedly from Google dated 2008: 1 trillion
- The method I used above: here around 27 billion; I used 24.
- His own method combining 1 and 2 to arrive at 48 billion.
According to our urban legends debunkers, this particular phrase was never uttered by Sergeant Joe Friday. Apparently he said, "All we want are the facts, ma’am." but over the years, with the line being repeated and accidentally modified by so many people, the popular version has become the staple of folklore.