Publication: The Demographics of Web Search
Authors: Weber, I.; Castillo, C.
Source: SIGIR, ACM Press, Geneva, Switzerland (2010)
Abstract: How does the web search behavior of "rich" and "poor" people differ? Do men and women tend to click on different results for the same query? What are some queries almost exclusively issued by African Americans? These are some of the questions we address in this study. Our research combines three data sources: the query log of a major US-based web search engine, profile information provided by 28 million of its users (birth year, gender and ZIP code), and US census information including detailed demographic information aggregated at the level of ZIP code. Through this combination we can annotate each query with, e.g., the average per-capita income in the ZIP code it originated from. Though conceptually simple, this combination immediately creates a powerful demographic profiling tool. The main contributions of this work are the following. First, we provide a demographic description of a large sample of search engine users in the US and show that it agrees well with the distribution of the US population. Second, we describe how different segments of the population differ in their search behavior, e.g., with respect to the diversity of formulated queries or with respect to the clicked URLs. Third, we explore applications of our methodology to improve web search and, in particular, to help users issue query reformulations. These results enable the creation of a powerful tool for improved user modeling in practice, with many applications including improving web search and advertising. For instance, advertisements for "family vacations" could be adapted to the (expected) income of the person issuing the query, or search suggestions shown to users could be adapted to items that are more interesting given their particular characteristics.
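The annotation step described in the abstract (linking each query to the average per-capita income of the ZIP codes it originated from) can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout, the function name `annotate_queries`, and all values below are hypothetical, chosen only to show the three-way join between a query log, user profiles, and ZIP-level census data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data; identifiers and values are illustrative only.
QUERY_LOG = [                      # (user id, query) pairs from a query log
    ("u1", "family vacations"),
    ("u2", "family vacations"),
    ("u3", "family vacations"),
    ("u1", "stock quotes"),
]
USER_ZIP = {"u1": "10001", "u2": "60601", "u3": "99999"}   # self-reported profiles
ZIP_INCOME = {"10001": 50000.0, "60601": 30000.0}          # census per-capita income


def annotate_queries(query_log, user_zip, zip_income):
    """Return the mean per-capita income of the ZIP codes each query came from.

    Query instances whose user has no ZIP code, or whose ZIP code has no
    census record, are skipped (an assumption of this sketch).
    """
    incomes = defaultdict(list)
    for user_id, query in query_log:
        zip_code = user_zip.get(user_id)
        income = zip_income.get(zip_code)
        if income is not None:
            incomes[query].append(income)
    return {query: mean(values) for query, values in incomes.items()}


annotated = annotate_queries(QUERY_LOG, USER_ZIP, ZIP_INCOME)
```

On this toy input, "family vacations" is annotated with 40000.0 (the mean over the two users whose ZIP codes have census records) and "stock quotes" with 50000.0. The same pattern would extend to other ZIP-level census attributes, although how the study itself handles missing or unmatched records is not stated in the abstract.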
ACM COPYRIGHT NOTICE. Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or permissions@acm.org