Science Watch

Researchers have estimated the world's data storage capacity at 295 exabytes—enough information to fill a pile of CDs that would stretch beyond the moon. That vast pile of information is only getting vaster: It increases by a factor of 10 every five years, according to researchers.

Searching through the enormous amount of information that's available online can be daunting. Looking for the nugget of knowledge you need amid thousands of irrelevant websites can make a person feel like a carnivore stalking its game in a crowded, distracting forest.

To Peter Pirolli, PhD, that analogy is more than just a metaphor. Pirolli, a cognitive psychologist and expert on human-computer interaction, has spent the past two decades studying the strategies people use to navigate and gather information online—and many times, he's taken his inspiration from ecology.

In the 1990s and early 2000s, he and his colleagues at Xerox's Palo Alto Research Center (PARC) found that the same mathematical models that describe how animals seek out food can also predict how people click their way through the Web in search of information.

More recently, Pirolli and others have turned their attention to Web 2.0—platforms like Twitter, Facebook, Del.icio.us, Wikipedia and others that encourage people to collaborate and share information. Their goal now is to figure out how users navigate through this new, socially networked space.

They hope that by doing so, they'll pave the way for websites that allow people to find, share and synthesize knowledge more effectively—a goal that grows more relevant each year, as people begin to use social networks not just for leisure but also to share health-care information online, manage work systems, and complete other complex tasks. And once again, the researchers are finding clues in the natural world that can help explain how the socially networked world works.

Rise of the 'informavore'

Pirolli and his colleague Stuart Card, PhD, first began developing their information foraging theory in the early 1990s, the dawn of the Internet age. They found their inspiration in biological foraging theory, which biologists and ecologists had developed decades earlier to help explain how animals decide when and where to look for food.

Simply put, foraging theory says that animals try to maximize the amount of food they take in during a given amount of time. When, for example, a bird finds an abundant patch of seeds, it begins to eat as much as it can in that area. Eventually it will deplete the seeds there and will need to move on, but doing so is risky since it will have to spend time and energy searching through scarce patches to find another rich one. Foraging theory says that the bird will move on when the gains it can expect from leaving outweigh the gains it can expect from staying. And ecologists, over the years, developed mathematical models that could predict, based on specific characteristics of the animal and its environment, when it would reach that moving-on point.
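
The moving-on rule described above can be sketched in a few lines of Python. This is a minimal illustration, not one of the ecologists' published models: the diminishing-returns gain curve, the parameter values and the function names (gain, overall_rate, best_residence_time) are all assumptions chosen for the example. The forager picks the stay time that maximizes its long-run intake rate, counting the travel time between patches.

```python
import math

def gain(t, max_gain=100.0, depletion_rate=0.5):
    """Cumulative food gained after t time units in one patch.

    Assumed shape: returns diminish as the patch is depleted.
    """
    return max_gain * (1.0 - math.exp(-depletion_rate * t))

def overall_rate(t, travel_time, **kw):
    """Long-run intake rate if the forager always stays t units per patch."""
    return gain(t, **kw) / (travel_time + t)

def best_residence_time(travel_time, t_max=50.0, steps=5000, **kw):
    """Grid-search the stay time that maximizes the long-run intake rate."""
    candidates = [t_max * i / steps for i in range(1, steps + 1)]
    return max(candidates, key=lambda t: overall_rate(t, travel_time, **kw))

if __name__ == "__main__":
    for travel in (1.0, 5.0, 20.0):
        t_star = best_residence_time(travel)
        print(f"travel time {travel:5.1f} -> leave after about {t_star:.2f}")
```

Under these assumptions, the longer it takes to reach the next patch, the longer it pays to stay in the current one, which is the trade-off the bird in the example faces.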

Pirolli's insight was to realize that people's information landscape is, like the natural world, a patchy place. Modern humans are "informavores"—we gobble up information the way animals gobble up food. On the Internet, some websites are rich with the information morsels that we're seeking, and others are not. So models like those that predict when a bird will move to new hunting grounds can also predict when a person sitting at his or her computer will give up on one Web page and move to another.

Key to those models is Pirolli's concept of "information scent." Animals follow a physical scent to look for food. Humans, similarly, follow an online scent trail that is made up of key words, images and other cues. When those words and images are closely related to the information we're seeking, the scent is strong and we continue on that path. More distantly related words indicate we're on less fruitful foraging ground.
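
As a rough illustration of the idea, scent can be approximated by how many words a link shares with the information being sought. The sketch below is a deliberately crude stand-in and not the researchers' method: the word-overlap score, the function names and the sample link labels are all hypothetical.

```python
import re

def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def scent(goal, link_text):
    """Toy scent score between 0 and 1: shared words over all words (Jaccard overlap)."""
    goal_words, link_words = words(goal), words(link_text)
    if not goal_words or not link_words:
        return 0.0
    return len(goal_words & link_words) / len(goal_words | link_words)

def rank_links(goal, links):
    """Order links from strongest to weakest scent for the given goal."""
    return sorted(links, key=lambda link: scent(goal, link), reverse=True)

if __name__ == "__main__":
    goal = "download expense report form"
    # Hypothetical link labels, invented for this example.
    links = ["Cafeteria menu", "Expense report forms and downloads", "Staff directory"]
    for link in rank_links(goal, links):
        print(f"{scent(goal, link):.2f}  {link}")
```

Exact word matching misses the "more distantly related words" the theory cares about (here, "form" and "forms" do not even count as a match), so a realistic model would need some measure of how strongly words are associated with one another.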

These might seem like abstract concepts, but Pirolli and Wai-Tat Fu, PhD, a cognitive psychologist at the University of Illinois at Urbana–Champaign, have used them to develop a computer program that can simulate the way human users are likely to navigate a website. In a 2007 paper published in the journal Human-Computer Interaction, they showed that a computer model called SNIF-ACT could predict fairly accurately how users would navigate through two sample websites: Yahoo's help section and the PARC internal staff website.

The researchers looked at data from 74 human participants, each of whom was given eight pieces of information to track down on one of the two sites—queries including "Where can you download an expense report on the PARC website?" or "What is the playing season for fantasy football?" on Yahoo. Then the researchers set the computer model loose to find the same information.

They found that the computer model's path through the website mirrored that of the human participants. For example, the researchers looked at the number of times participants chose to give up on a particular Web page and go back to the previous one. They found that the model could explain 73 percent of the variation in human users' choices on the PARC website and 80 percent on Yahoo.

Using such models, Fu says, could help Web designers come up with more user-friendly site designs.

"We can give the model some sample tasks and see how well it is able to find information," he says. "And then use that as a metric to decide whether a site is well designed. If the model has trouble finding information, we believe that a real human will, too."

The social Web

Nowadays, though, many Web users are less interested in simply seeking information on sites that were produced by one authoritative source. Instead, they spend their time joining in conversations and sharing their own thoughts on Facebook and Twitter, tagging content they think other people might find interesting on Digg or Del.icio.us, or adding their knowledge to the worldwide collaborative encyclopedia Wikipedia.

So, Pirolli and others are working to extend their information foraging model to the world of online social networks. In some ways, he says, his work has actually come full circle. When he first began thinking about how people seek information, he began in the context of social networks in the physical, not virtual, world. In one early study on information seeking, published in Psychological Review in 1999, he and Card examined the work of analysts at a company that published technical newsletters. Each analyst was responsible for combing through about 50 magazines and journals each month, and consolidating the relevant information into a concise newsletter on market trends in the very specific market subfield to which he or she was assigned. But as it turns out, the analysts also made a point of passing on relevant articles to their colleagues who covered other, related subfields but who they knew didn't receive that particular magazine.

"That started to drive me early on into thinking about information foraging from a network perspective—what happens when people are connected to one another either through online networks or the real world," Pirolli says.

What often happens is that people can produce, learn and synthesize more useful knowledge in groups than any one of them could alone—and that has an analogy in the natural world. Birds, lions and other animals live and forage in groups. They do so because the advantages of sharing information about a patch of foraging land can sometimes outweigh the disadvantages of having to share your food with others.

"Group foraging can have a variety of advantages, one of which is that it can provide much more rapid development of precise information about an environment," says Marc Mangel, PhD, a theoretical ecologist at the University of California–Santa Cruz, who has spent two decades studying mathematical models of wildlife populations.

But, Mangel notes, animal groups won't grow without limit, because at a certain point the disadvantages of sharing food will, to an individual member, outweigh the advantages of sharing information. Instead, groups will expand until they reach an optimal size, and then level off.

Pirolli and Mangel, working with colleagues at PARC and UC–Santa Cruz, have recently been exploring an analogue of this behavior in the online user-generated encyclopedia Wikipedia. Wikipedia began in 2001, and until about 2007 the site grew exponentially, as measured by the number of people who contributed to it by writing and editing entries, and by the number of total edits per month. But around 2007, the growth began to slow down and level off.

In not-yet-published work, Mangel is looking at a series of population models that ecologists have developed to describe how wildlife populations grow, fluctuate and stabilize over time. He's studying how well these models fit observed data about the growth of Wikipedia.

"In some sense, the rate of growth of any social network is a balance between new individuals coming in and individuals leaving for all sorts of reasons," Mangel says. "In biology, we'd say the rate of growth of a population is the balance between births and deaths. We don't exactly have births and deaths here, but we do have new editors who are attracted for some reason, and then we have a variety of processes that could cause people to leave."

Figuring out which models apply, and which processes are most important, could suggest new ways to get more people to participate in the site, Mangel says.
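
The births-and-deaths balance Mangel describes can be sketched as a simple simulation. The article does not say which population models are being tested, so everything here is an assumption made for illustration: the crowding term, the parameter values and the function name simulate_editors are hypothetical and are not fitted to Wikipedia data.

```python
def simulate_editors(n0=10.0, join_rate=0.10, leave_rate=0.02,
                     crowding=1e-5, months=200):
    """Track a community's size as a monthly balance of joining and leaving.

    All parameters are illustrative assumptions, not measured values.
    """
    history = [n0]
    n = n0
    for _ in range(months):
        joining = join_rate * n                      # newcomers drawn in by the existing community
        leaving = leave_rate * n + crowding * n * n  # departures rise as the group gets crowded
        n = max(n + joining - leaving, 0.0)
        history.append(n)
    return history

if __name__ == "__main__":
    sizes = simulate_editors()
    for month in (0, 50, 100, 150, 200):
        print(f"month {month:3d}: about {sizes[month]:.0f} editors")
```

With these made-up numbers the community grows almost exponentially at first, then levels off where joining and leaving balance, at (join_rate - leave_rate) / crowding. Swapping in different assumptions about why people leave changes the shape of the curve, which is the kind of comparison against observed data that the paragraph above describes.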

Complicated networks

As Pirolli has turned his attention to the social Web, his research interests have also moved beyond purely ecological models of human behavior to a more general interest in how information travels through social networks.

"Like a lot of analogies, [foraging theory] gives you a lot of initial insight … but at some point the domain itself starts to shape your theorizing," he says.

And the domain of the social Web introduces some complications to theories of how people look for information online.

"When you go into the social realm, now you have a whole set of other judgments you have to make," Pirolli says. "It's more important to worry about credibility. There are multiple people giving you information, and you have to be in a position to judge … who they are, what are their biases, can I trust them?"

In one study, published in the proceedings of the 2011 IEEE International Conference on Social Computing, Pirolli and his colleagues examined how Twitter users decide which sources of information to trust. The researchers asked 98 participants to follow tweets from 60 different Twitter accounts. Of the 60 accounts, 10 each demonstrated expertise in cars, investing, wine, fantasy football or dating. The other 10 accounts weren't experts in any particular area. Also, half of the accounts were "high-status"—they had at least 10,000 followers on Twitter—while the other half were "low-status," with fewer than 200 followers.

Next, the researchers gave the participants information about a used car and asked them to judge its value. Then, they told the participants that one of the Twitter users had appraised it at a particular price. Finally, they asked the participants to estimate the car's value again. The difference between the initial and final estimate could be used as a measure of the participant's confidence in the credibility of the Twitter account.
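
One way to turn that before-and-after difference into a single number is to ask how much of the gap between the first estimate and the appraisal the participant closed. The article mentions only the difference between the two estimates; dividing by the gap is an added assumption, and the function name advice_shift and the dollar figures below are hypothetical.

```python
def advice_shift(initial, final, advice):
    """Fraction of the gap between the first estimate and the advice that was closed.

    0.0 means the participant ignored the appraisal; 1.0 means they adopted it outright.
    Returns None when the appraisal equals the first estimate, since no shift can be measured.
    """
    gap = advice - initial
    if gap == 0:
        return None
    return (final - initial) / gap

if __name__ == "__main__":
    # Hypothetical figures: first guess $5,000, appraisal $7,000, revised guess $6,500.
    print(advice_shift(5000, 6500, 7000))  # prints 0.75
```

A larger shift toward the appraisal would then be read as greater trust in the account that supplied it.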

The researchers found that credibility ratings were influenced by both the content of the users' tweets and their social status in the network.

Finally, Pirolli and his colleagues used their data to develop a model that could automatically rank the credibility of a Twitter user on a particular subject, using a combination of key words in his or her tweets, the number of followers he or she had, and whether those followers were also interested in the relevant subject.

Such a model could be used to help people narrow down their sources of information in a social network and find the most useful ones, according to Pirolli.

"One of the objectives we assume people have is that they want to get the best information possible in the shortest amount of time," he says.

Helping people do just that, Pirolli believes, is where psychologists have more to contribute to the field of information systems than many realize.

"More and more often, these technologies are being appropriated for things like health-care systems and work productivity systems. This is where I think psychologists need to have more impact. Otherwise it's going to be built only by engineers. It's the classic problem with software—people will build it without regard to how people actually use it."