
Internet Search Engines and Pornography

In the ongoing debate on Internet pornography, there has been a plethora of anecdote and a paucity of data. Two recent studies give some indication of the extent of the problem. A recent survey (1) classified 2,500 random servers into categories. About 1.5% of servers were classed as pornographic. While a small number, it is larger than the number of government sites. Since "the publicly indexable World Wide Web now contains about 800 million pages" (2), and if the same share holds for pages as for servers, about 12 million of those pages are pornography.
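The 12 million figure is a simple extrapolation. A minimal sketch of the arithmetic follows; it assumes that the 1.5% share measured for servers carries over to pages, which the cited survey does not itself establish, and the variable names are only illustrative.

```python
# Rough extrapolation from the two figures quoted above.
# Assumption: the share of pornographic servers also applies to pages.
total_pages = 800_000_000   # publicly indexable pages, per reference (2)
porn_share = 0.015          # share of servers classed as pornography, per reference (1)

estimated_porn_pages = total_pages * porn_share
print(f"{estimated_porn_pages:,.0f}")
```

If pornographic servers host more or fewer pages than the average server, the true page count would differ accordingly.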

Chicago Public Library recently announced the results of a three-month study of sites retrieved on their public Internet workstations. Less than 5% (apparently meaning between 4% and 5%) of their overall results were considered pornographic, with the number dropping to less than 2% (again, apparently meaning between 1% and 2%) on their children's workstations.

This survey seeks to further illuminate the debate by examining how common pornography is in actual searches. Search tools are some of the busiest and most profitable sites on the Web. About 85% of users go to a search tool when they want to locate something (3), so this is an important place to gather data. How accurate their search algorithms are and how well their relevancy rankings perform determine how much unwanted pornography shows up. Searches performed by random users were analyzed to determine how common searches for pornography are and how common "incidental" pornography is.

Methodology
A total of 207 searches were collected from the Magellan search engine using Magellan Voyeur (4), a feature of the Magellan site that displays random searches currently being performed. Aptly named, it allows the viewer to follow any of the displayed searches. No identification of the original searcher can be made. The displayed searches can also be copied and saved, as was done for this survey.

Though the searches displayed are random, the searches saved were not random. Data was collected on weekday nights between 7:00 p.m. and 10:00 p.m. (CST) and on Saturdays between 5:00 p.m. and 6:00 p.m. This was done to minimize searches from classrooms and businesses and maximize searches by individuals. Only English-language searches were included in the survey, since foreign-language searches were difficult to evaluate for intent. Magellan is not heavily advertised and does not command a large market share. Users of the Magellan site have sought out a well-regarded but not well-known tool. While the searchers in the study sample cannot be identified, the time and language constraints make it likely that they are from either the United States or Canada, are not at work or school, and may be slightly more sophisticated than average.

The searches in the sample were re-entered in Hotbot and the top ten responses were checked for pornography. This study used Hotbot to search for pornography since it reflects the Internet's overall content better than does the smaller and cleaner world of Magellan. A site was considered pornographic only if it declared itself unsuitable for minors, whether through use of Adult Check, by requiring a credit card, or by some other statement. Several sexually related or fetish sites turned up but were not counted, as they did not meet the above criteria.
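The counting rule can be written down as a short sketch. The survey was evaluated by hand, so the record type, field names, and function names below are hypothetical and serve only to make the rule explicit: a site counts only if it declares itself unsuitable for minors in one of the three ways named above.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical record of what a reviewer observed on one result."""
    uses_adult_check: bool = False
    requires_credit_card: bool = False
    states_unsuitable_for_minors: bool = False


def is_pornographic(site: Site) -> bool:
    """Apply the survey's rule: the site must declare itself unsuitable for minors."""
    return (
        site.uses_adult_check
        or site.requires_credit_card
        or site.states_unsuitable_for_minors
    )


def count_pornographic(results: list[Site]) -> int:
    """Count qualifying sites among the first ten results of one search."""
    return sum(is_pornographic(site) for site in results[:10])
```

Under this rule a sexually related site that makes no such declaration is recorded as `Site()` and is not counted, which matches the treatment of the fetish sites described above.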

Magellan (5)  is similar to the better known Yahoo in that it is a subject directory. It selects sites and arranges them into subject categories. It is those selected sites and categories that are searched, as opposed to search engines such as Alta Vista or Hotbot, which try to cover wider portions of the Internet. Since it is selective, it indexes less pornography than a general tool such as Hotbot and even includes a "green light" feature, which searches only sites with no content intended for mature audiences.

Hotbot (6) is a full-featured tool and one of the most popular sites on the web. It covers a relatively large percentage of the web and has fewer invalid links than most search engines (7). Unlike Magellan, it does not select sites for inclusion, but relies on web bots and spiders to identify and index sites.

Searches for Pornography

Only seven searches appeared to be seeking pornography (8). These searches also seemed to be very successful, with at least eight of the top ten returns meeting the study's definition of pornography. Identifying searches for pornography is a subjective task since the searcher's intent is unknowable. Nevertheless, searches such as +porncast, free downblouse and preteenporn must be expected to retrieve pornography.

Incidental Pornography
The 200 non-pornographic searches were examined for pornographic sites in the first ten results. The Hotbot default is to display the ten items it considers most relevant on the first screen. Six (9) of these searches (3%) turned up one to three pornography hits in their top ten. These searches did not appear to be seeking pornography, but found it anyway. In the 200 searches in the sample, a total of 1,634 hits were examined. Only ten of these hits, a tiny 0.61%, were incidental pornography.
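Both rates follow directly from the counts given in the paragraph above. A minimal sketch, with illustrative variable names:

```python
# Counts as reported in the text for the non-pornographic searches.
non_porn_searches = 200
searches_with_incidental = 6
hits_examined = 1_634
incidental_hits = 10

# Share of searches that returned any incidental pornography in the top ten.
search_rate = searches_with_incidental / non_porn_searches * 100

# Share of all examined hits that were incidental pornography.
hit_rate = incidental_hits / hits_examined * 100

print(f"{search_rate:.1f}% of searches, {hit_rate:.2f}% of hits")
# prints "3.0% of searches, 0.61% of hits"
```

The total of 1,634 hits is below the 2,000 that 200 searches of ten results each would give, presumably because some searches returned fewer than ten results, though the text does not say so.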

Credit must be given to the programmers at Hotbot and other search tools. It is not uncommon for a pornography site to have thousands of words or even entire dictionaries in its keyword fields. The algorithms used are now sophisticated enough to prevent unintended pornography from showing up in all but a small percentage of searches, despite the best efforts of some sites. While search tools are usually able to deal with deceptive keywords, this will require a continued effort on their part, since no rules to prevent deceptive sites seem possible or effective. In the freewheeling, supranational and market-driven world of the Internet, deceptive advertising and marketing are almost beyond policing.

This survey provides some data for the ongoing debate. Searches for pornography on the Internet make up only a small percentage of all searches, but their absolute number is very large. While pornography is not common, those who use search engines regularly will encounter it from time to time. When retrieved, pornography is easily identified, which makes it simple to avoid while simultaneously attracting attention to it.

                    Searches for    Searches that found         Searches that did not
                    pornography     "incidental" pornography    seek or find pornography
Searches (n=207)    3.4%            2.9%                        93.7%

Table 1 - Searches and Pornography
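The percentages in Table 1 can be recomputed from the counts reported in the text: 207 searches in all, seven that sought pornography, and six that found it incidentally. The remaining count is derived here by subtraction and does not appear in the text. This is a sketch for checking the arithmetic, with illustrative names.

```python
total_searches = 207
sought = 7        # searches that appeared to seek pornography
incidental = 6    # non-pornographic searches that found pornography anyway
neither = total_searches - sought - incidental  # derived, not stated in the text

table_1 = {
    "sought": round(sought / total_searches * 100, 1),
    "incidental": round(incidental / total_searches * 100, 1),
    "neither": round(neither / total_searches * 100, 1),
}
print(table_1)
```

Note that the incidental share is 2.9% here because the base is all 207 searches; against the 200 non-pornographic searches alone it is 3%.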

                                                   Sought         "Incidental"    Non-pornography
                                                   pornography    pornography
All results (n=1704)                               3.8%           0.58%           95.6%
Results from non-pornographic searches (n=1634)    -              0.61%           99.39%

Table 2 - Search Results and Pornography

References
1.  Steve Lawrence and C. Lee Giles, "Accessibility of information on the web" Nature v. 400 (July 1999): 107-109.
2.  Ibid.
3.  Ibid.
4.  The Magellan Voyeur site is located at http://voyeur.mckinley.com/cgi-bin/vogeur.cgi
5.  The Magellan subject directory is located at http://magellan.excite.com
6.  Hotbot is located at http://www.hotbot.com
7.  Steve Lawrence and C. Lee Giles, "Accessibility of information on the web" Nature v. 400 (July 1999): 107-109.
8.  The searches were: +cyper +sleaze; sable photos; skirt+thumb; +porncast; free downblouse; preteenporn; pregnant jpeg.
9.  The searches were: "spring break"; amanda robbins, N.A.T.O.; www.aaaadv.com; +free +gif.'s; backdoor innocent angels.
