In the ongoing debate on Internet pornography, there has been a plethora of anecdote and a paucity of data. Two recent studies give some indication of the extent of the problem. A recent survey (1) classified 2,500 random servers into categories. About 1.5% of servers were classed as pornography. While a small number, it is larger than the number of government sites. Since "the publicly indexable World Wide Web now contains about 800 million pages" (2), and assuming pages are distributed in roughly the same proportion as servers, about 12 million of those pages are pornography.
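The 12 million figure is a simple extrapolation, sketched below for readers who want to check it. The function and variable names are illustrative only, and the calculation assumes that the share of pornographic servers carries over unchanged to the share of pages, which neither cited figure guarantees.

```python
# Figures quoted in the paragraph above.
INDEXABLE_PAGES = 800_000_000   # "about 800 million pages" (2)
PORN_SERVER_SHARE = 0.015       # about 1.5% of the sampled servers (1)


def estimate_pages(total_pages: int, share: float) -> int:
    """Scale a share of servers up to a page count.

    Assumption: servers in the category host an average number of pages,
    so the server share can stand in for the page share.
    """
    return round(total_pages * share)


estimated_porn_pages = estimate_pages(INDEXABLE_PAGES, PORN_SERVER_SHARE)
print(f"{estimated_porn_pages:,} pages")
```

If pornographic servers host more or fewer pages than average, the estimate shifts in proportion.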
Chicago Public Library recently announced the results of a three-month study of sites retrieved on their public Internet workstations. Less than 5% (apparently meaning between 4% and 5%) of their overall results were considered pornographic, with the number dropping to less than 2% (again, meaning between 1% and 2%) on their children's workstations.
This survey seeks to further illuminate the debate by examining how common pornography is in actual searches. Search tools are some of the busiest and most profitable sites on the Web. About 85% of users go to a search tool when they want to locate something (3), so this is an important place to gather data. The accuracy of their search algorithms and the performance of their relevancy rankings determine how much unwanted pornography shows up. Searches performed by random users were analyzed to determine how common searches for pornography are and how common "incidental" pornography is.
Methodology
A total of 207 searches were collected from the Magellan search engine
using Magellan Voyeur (4), a feature of the Magellan
site that displays random searches currently being performed. Aptly named, it
allows the viewer to follow any of the displayed searches. No identification of
the original searcher can be made. The displayed searches can also be copied and
saved, as was done for this survey.
Though the searches displayed are random, the searches saved were not random. Data was collected on weekday nights between 7:00 p.m. and 10:00 p.m. (CST) and on Saturdays between 5:00 p.m. and 6:00 p.m. This was done to minimize searches from classrooms and businesses and maximize searches by individuals. Only English-language searches were included in the survey, since foreign-language searches were difficult to evaluate for intent. Magellan is not heavily advertised and does not command a large market share. Users of the Magellan site have sought out a well-regarded but not well-known tool. While the searchers in the study sample cannot be identified, the time and language constraints make it likely that they are from either the United States or Canada, are not at work or school and may be slightly more sophisticated than average.
Each search in the sample was re-entered in Hotbot and the top ten responses were checked for pornography. Hotbot was used for this step since it reflects the Internet's overall content better than does the smaller and cleaner world of Magellan. A site was considered pornographic only if it declared itself unsuitable for minors, whether through use of Adult Check, by requiring a credit card or by some other statement. Several sexually related or fetish sites turned up but were not counted, as they did not meet these criteria.
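The counting rule can be sketched in a few lines. This is only an illustration of the criteria as stated above, not the procedure actually used (the survey was tallied by hand); the `Result` record, its field names and the sample URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One search result, with the self-declaration signals named in the text.

    Hypothetical record: the survey itself recorded these by inspection.
    """
    url: str
    uses_adult_check: bool = False
    requires_credit_card: bool = False
    states_unsuitable_for_minors: bool = False


def is_pornographic(result: Result) -> bool:
    # A site counts only if it declares itself unsuitable for minors.
    return (result.uses_adult_check
            or result.requires_credit_card
            or result.states_unsuitable_for_minors)


def count_pornographic(results: list[Result], top_n: int = 10) -> int:
    # Only the first screen of results (the top ten) is examined.
    return sum(is_pornographic(r) for r in results[:top_n])


sample = [
    Result("http://example.com/a"),
    Result("http://example.com/b", uses_adult_check=True),
    # Sexually related, but makes no self-declaration: not counted.
    Result("http://example.com/c"),
]
print(count_pornographic(sample))
```

Relying on self-declaration keeps the rule objective, at the cost of undercounting sites that carry no warning.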
Magellan (5) is similar to the better known Yahoo in that it is a subject directory. It selects sites and arranges them into subject categories. It is those selected sites and categories that are searched, as opposed to search engines such as Alta Vista or Hotbot, which try to cover wider portions of the Internet. Since it is selective, it indexes less pornography than a general tool such as Hotbot and even includes a "green light" feature, which searches only sites with no content intended for mature audiences.
Hotbot (6) is a full-featured tool and one of the most popular sites on the web. It covers a relatively large percentage of the web and has fewer invalid links than most search engines (7). Unlike Magellan, it does not select sites for inclusion, but relies on web bots and spiders to identify and index sites.
Searches for Pornography
Only seven searches appeared to be seeking pornography (8). These searches also seemed to be very successful, with at least eight of the top ten returns meeting the study's definition of pornography. Identifying searches for pornography is a subjective task since the searcher's intent is unknowable. Nevertheless, searches such as +porncast, free downblouse and preteenporn must be expected to retrieve pornography.
Incidental Pornography
The 200 non-pornographic searches were examined for pornographic sites in the
first ten results. The Hotbot default is to display the ten items it considers
most relevant on the first screen. Six (9) of these searches
(3%) turned up one to three pornography hits in their top ten. These searches
did not appear to be seeking pornography, but found it anyway. In the 200
searches in the sample, a total of 1,634 hits were examined. Only ten,
a tiny .61%, of these hits were incidental pornography.
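The percentages above follow directly from the counts given in this section and can be rechecked with a short calculation. The helper name `pct` and the variable names are illustrative only.

```python
# Counts reported in this section.
NON_PORN_SEARCHES = 200        # searches not seeking pornography
SEARCHES_WITH_INCIDENTAL = 6   # searches with 1-3 pornography hits in the top ten
HITS_EXAMINED = 1634           # total hits examined across the 200 searches
INCIDENTAL_HITS = 10           # hits that were incidental pornography


def pct(part: int, whole: int, digits: int = 2) -> float:
    """Return part/whole as a percentage, rounded to `digits` places."""
    return round(100 * part / whole, digits)


search_share = pct(SEARCHES_WITH_INCIDENTAL, NON_PORN_SEARCHES)
hit_share = pct(INCIDENTAL_HITS, HITS_EXAMINED)
hits_per_search = HITS_EXAMINED / NON_PORN_SEARCHES

print(f"Searches with incidental pornography: {search_share}%")
print(f"Hits that were incidental pornography: {hit_share}%")
print(f"Average hits examined per search: {hits_per_search:.2f}")
```

The average comes out below ten hits per search, which suggests that some searches returned fewer than ten results.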
Credit must be given to the programmers at Hotbot and other search tools. It is not uncommon for a pornography site to have thousands of words or even entire dictionaries in its keyword fields. The algorithms used are now sophisticated enough to prevent unintended pornography from showing up in all but a small percentage of searches, despite the best efforts of some sites. While search tools are usually able to deal with deceptive keywords, this will require a continued effort on their part, since no rules to prevent deceptive sites seem possible or effective. In the freewheeling, supranational and market-driven world of the Internet, deceptive advertising and marketing are almost beyond policing.
This survey provides some data for the ongoing debate. Searches for
pornography on the Internet make up only a small percentage of all
searches, but their absolute number is very large. While pornography is not
common, it will be encountered on a regular, but infrequent, basis by those who
use search engines. When retrieved, pornography is easily identified, which
makes it simple to avoid while simultaneously attracting attention to it.
| Searches for pornography | Searches that found "incidental" pornography | Searches that did not seek or find pornography |
Searches (n=207) | 3.4% | 2.9% | 93.7% |
Table 1 - Searches and Pornography
| Sought pornography | "Incidental" pornography | Non-pornography |
All results (n=1704) | 3.8% | .58% | 95.6% |
Results from non-pornographic searches (n=1634) | | .61% | 99.39% |
Table 2 - Search Results and Pornography
References
1. Steve Lawrence and C. Lee Giles, "Accessibility
of information on the web" Nature v. 400 (July 1999): 107-109.
2. Ibid.
3. Ibid.
4. The Magellan Voyeur site is located at http://voyeur.mckinley.com/cgi-bin/vogeur.cgi
5. The Magellan subject directory is located at http://magellan.excite.com
6. Hotbot is located at http://www.hotbot.com
7. Steve Lawrence and C. Lee Giles, "Accessibility
of information on the web" Nature v. 400 (July 1999): 107-109.
8. The searches were: +cyper +sleaze; sable photos;
skirt+thumb; +porncast; free downblouse; preteenporn; pregnant jpeg.
9. The searches were: "spring break"; amanda
robbins; N.A.T.O.; www.aaaadv.com; +free +gif.'s; backdoor innocent angels.