Barely noticeable on the floor of Zachary Steinert-Threlkeld’s sixth-floor office sits a black box, about the size of a compact suitcase, tucked under a small conference table next to his stand-up desk. That box, a computer actually, is the core of his research and is quietly collecting 5 million tweets each day, about 1 percent of the world’s daily output.
Steinert-Threlkeld, an assistant professor of public policy at UCLA, uses big data, primarily from Twitter, to understand protest dynamics. He uses social media to explore the relationship between online behavior and real-world action.
“Originally, I started with how one’s social network – not Twitter, not Facebook, but actually your friends – influences your decision to protest,” he said. “Since then, I’ve done work on natural-language processing, and I’m starting to use images that people share on social media to understand protest dynamics.”
The study of political protests is not new. But through his computational research, Steinert-Threlkeld has found that observing social media dynamics can provide a higher level of understanding than traditional research methods, such as surveys that rely on people to explain how they feel and act.
The Twitter data Steinert-Threlkeld is collecting can be used for many purposes, but his immediate focus is on two trailblazing projects. The first involves the creation of a giant media database that will pull together multiple data sources: social media data from Twitter and from the Chinese social media site Sina Weibo, radio broadcasts and newspaper reports, as well as local, national and international television newscasts – some going back to the 1970s. In a second research project, Steinert-Threlkeld is also collecting and studying images delivered through social media to better understand protest mobilization. Both projects aim to illuminate the genesis and growth of political ideas.
Late last year, the National Science Foundation awarded a $944,182 grant to Steinert-Threlkeld and three UCLA colleagues: his research partner Junseock Joo, an assistant professor of communication studies and the principal investigator of the database project, and communication professors Tim Groeling (also profiled in this issue of Blueprint) and Francis Steen. The grant will pay to merge text, images, audio and video into a single place, along with data from China collected by Jennifer Tan of Stanford and other data from research partners around the world. Having this “multimodal” information in one dataset will allow researchers to examine how different events are portrayed and communicated across time and platforms.
What they are doing is unique in academia, Joo said, because the multimodal, international project integrates several forms of communication. “Usually the collection process maintained by academics focuses on a specific type of media. For instance, researchers focus on social media exclusively but not on the other media types, the traditional mass media,” he said, emphasizing that previous research does not always reflect how people communicate in the real world. “The news and information flow isn’t just locked within one system; it goes out of the scope of one media type and interacts with other types of media.”
Steinert-Threlkeld opted to focus on Twitter rather than Facebook or other social media platforms because most people on Twitter keep their accounts public, making it a more accessible source for researchers. “I can probably get richer data on Facebook from the public accounts, it’s just that very few are public,” he said, adding that he would have to work through Facebook to get data from private accounts. “I felt that using Facebook to study protests could scare Facebook… and at any point they could pull the plug.”
About one-third to one-half of the tweets Steinert-Threlkeld collects contain GPS coordinates, and of that number, about 10 percent contain images. Steinert-Threlkeld and Joo are training machine-learning algorithms that identify patterns and categorize these images, allowing them to analyze protesters by gender, race and age.
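To give a sense of scale, the figures quoted in this article can be combined into a rough daily estimate. The short Python sketch below is only back-of-the-envelope arithmetic using the article's round numbers (5 million collected tweets a day, one-third to one-half with GPS coordinates, about 10 percent of those with images). The function and variable names are illustrative and are not taken from the researchers' own code.

```python
# Back-of-the-envelope estimate built only from the round figures in the article.
TWEETS_PER_DAY = 5_000_000                     # tweets collected each day
GPS_SHARE_LOW, GPS_SHARE_HIGH = 1 / 3, 1 / 2   # share of tweets with GPS coordinates
IMAGE_SHARE = 0.10                             # share of GPS-tagged tweets with images


def daily_estimates(tweets_per_day, gps_share, image_share):
    """Return (GPS-tagged tweets, GPS-tagged tweets with images) per day."""
    geotagged = tweets_per_day * gps_share
    with_images = geotagged * image_share
    return round(geotagged), round(with_images)


low = daily_estimates(TWEETS_PER_DAY, GPS_SHARE_LOW, IMAGE_SHARE)
high = daily_estimates(TWEETS_PER_DAY, GPS_SHARE_HIGH, IMAGE_SHARE)
print("GPS-tagged tweets per day:", low[0], "to", high[0])
print("of which with images:", low[1], "to", high[1])
```

Under these assumptions, the collection would yield roughly 1.7 million to 2.5 million GPS-tagged tweets a day, of which about 167,000 to 250,000 carry images.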
Through their image study, which measures protests in South Korea, Hong Kong, Venezuela, Russia and Spain, as well as the Women’s March in the U.S., they hope to understand protest dynamics at a level of detail not available through newspapers, the traditional data source for social scientists.
“Newspaper-based datasets will say, ‘authorities reported 200,000’ or ‘estimates ranged from hundreds to thousands.’ Sometimes they’ll give specific numbers, but it’s all third-hand at that point because the newspaper is reporting from an authority figure; the newspaper doesn’t record it itself,” Steinert-Threlkeld said. “Whereas we actually take protest photos and count the number of faces. It’s a direct measure.”
In addition, they hope to gain deeper insight into the relationship between the demographics of protesters and bystanders and their decisions to join in an event. In other words, once this data is in hand, researchers will be able to tell more about who takes to the streets, how many gather and what drives them to violence – questions that have long intrigued political scientists and historians.
“There have been many people who are doing similar work on text data but not with images,” Joo said. “This is on the frontier – it’s cutting-edge work.”
For someone who spends most of his time immersed in Twitter data, Steinert-Threlkeld is only a casual user of the social networking service created in 2006.
“I started around 2008. Twitter was new and exciting then,” he said. “I was 22, and it was a cool new thing. A lot of my early posts were like, ‘I’m at this coffee shop’ or, you know, not professional. Social.”
Today his account holds just over 800 tweets, most of them on academic topics.
It’s not hard to believe Steinert-Threlkeld, then, when he says he fell into this area of research by happenstance. “Honestly, I thought I was going to study Turkish political development,” he said. “I was really interested in Turkey, but UC San Diego was the only school I applied to that didn’t offer a Turkish language, and that’s where I ended up. So that was out the window.”
With his mop of curly hair, Harry Potter-style glasses and relaxed attire, Steinert-Threlkeld could easily blend in with the students on campus. “Oh, that’s because I shaved last night,” he said, laughing. “I have to teach today.”
Thoughtful and earnest, the married 32-year-old professor often pauses, considering his words before speaking, like a statesman dealing with the media. Raised by a mother and father who started their careers in journalism, he grew up in Texas before moving to Connecticut as a teen with his parents and younger brother.
As an undergraduate, he studied anthropology and economics at Washington University in St. Louis, then worked for two years in Minneapolis as a systems integration analyst at the management consulting firm Accenture.
He dabbled a bit with computers and computer programming but didn’t start working with data collection in earnest until he went to graduate school for his PhD. It was there he met James Fowler, a political science professor and Guggenheim fellow whom many consider one of the top experts in social networks.
“He had weekly meetings with students, a seminar-type thing,” Steinert-Threlkeld said. “It was really grad students and sometimes guest professors presenting their own work.”
Then he picked up “The Information: A History, A Theory, A Flood,” the James Gleick bestseller that examines the history of information and how it has shaped the world. “I was reading the book in 2013 and starting to think about dissertation ideas, and I realized that I should use big data,” he said. “If every generation grows in the amount of data it deals with, I should work on being on that frontier.”
He eventually settled on the 2010 Arab Spring, which fortuitously ended just before he began his dissertation research. Steinert-Threlkeld was intrigued by how protesters during the pro-democracy uprisings in the Middle East and North Africa were using social media. Using text data collected from 13.8 million tweets, filtered by hashtags and geolocators, he was able to determine patterns in crowd behavior.
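The article does not describe how the hashtag and location filtering was carried out, so the Python sketch below is only an illustration of the general idea. It keeps a tweet when its text contains one of a set of hashtags and its coordinates fall inside a bounding box. The `Tweet` class, the hashtags and the rough bounding box for Egypt are all hypothetical choices made for this example, and requiring both conditions at once is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Tweet:
    """Hypothetical, minimal tweet record used only for this illustration."""
    text: str
    coords: Optional[Tuple[float, float]]  # (latitude, longitude), or None if absent


# Illustrative filter settings; not the values used in the actual research.
HASHTAGS = {"#jan25", "#egypt"}
EGYPT_BOX = (22.0, 31.7, 24.7, 36.9)  # lat_min, lat_max, lon_min, lon_max (approximate)


def matches(tweet: Tweet, hashtags: Set[str],
            box: Tuple[float, float, float, float]) -> bool:
    """Keep a tweet only if it has a wanted hashtag AND lies inside the box."""
    # Lowercase each word and trim trailing punctuation before comparing.
    words = {w.lower().rstrip(".,!?") for w in tweet.text.split()}
    if not words & hashtags:
        return False
    if tweet.coords is None:
        return False
    lat, lon = tweet.coords
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


sample = [
    Tweet("Heading to Tahrir #Jan25", (30.04, 31.24)),  # hashtag, inside box
    Tweet("Coffee time", (30.04, 31.24)),               # no hashtag
    Tweet("#Jan25 solidarity", None),                   # no coordinates
    Tweet("Watching from afar #jan25!", (48.85, 2.35)), # outside box
]
kept = [t for t in sample if matches(t, HASHTAGS, EGYPT_BOX)]
print(len(kept), "of", len(sample), "sample tweets kept")
```

A real pipeline would also have to handle retweets, several languages and tweets that give a place name rather than exact coordinates, none of which this sketch attempts.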
Surprisingly, he found that participants on the periphery of a protest often had more of an impact than organizers. “People are more likely to protest when they learn about an event from people who are like them, such as their friends,” he said. That’s particularly true in authoritarian settings because authoritarian governments are notoriously intolerant of antigovernmental organizations.
By contrast, in a more open society, “you’re more likely to see organizations and leaders mattering because they’re allowed to exist in the first place. There’s less fear of repression, so you’re less reliant on the safety in numbers you get from talking to your friends,” he said. “You’re more likely to go alone in a democracy, so you’re more likely to listen to the central leader or organizer than in an autocracy.”
In the brief time since he completed his dissertation, the impact of social media on protests has changed. That means, he said, that in an era of “fake news,” bots and trolls, researchers need to think carefully about collecting unbiased data through social media, while also respecting individual privacy laws.
“During the Arab Spring, there’s a lot of good evidence that the leaders didn’t pay attention to social media or thought of it as a small thing,” said Steinert-Threlkeld, who also studied the 2013-14 Euromaidan uprising in Ukraine. “Today, governments realize that social media is a politically important space and are more likely to act repressively.”
Steinert-Threlkeld stands at his desk, demonstrating how the computer recognizes images pulled from Twitter and automatically files them by category. He clicks on a folder, looking for fire. “These are pretty good,” he said, peering at thumbnails. “It recognizes torches, it gets candles. … There are many variables at play that I want to start looking at that I haven’t done in the past. But I’m hoping in the next year or two I can start.”
There’s much more he wants to do. For example, he’d like to explore what makes people decide to participate in protests in the U.S. Are dissenters more liberal? Better educated? Are they concentrated in certain parts of the country? “If you live in a precinct where many people voted for Hillary Clinton, maybe you’re more likely to go protest. Or education level might matter more because you think you can make a difference in the political system.”
Restlessly imagining more possibilities, he ticks off a number of other possible uses for the data he’s collecting. Perhaps a deep dive into the duration of protests – why a protest may last longer in California than in Texas. Did that change after the midterms? Maybe bringing in city-level information will raise more questions, he said.
“Yesterday and today, I was trying to figure out how to work with Census data and other datasets,” he said, noting with a laugh that the complexity of his interests “is making me pull out my hair.”
Few social scientists are attempting the type of research he is doing, whether in academia or the private sector. “I guess it’s not as widespread as I thought it would be at this point,” he said, “which either means I’m on to something or on to nothing.”