Inspiration:

Today, quantitative efforts to understand the information environment almost all rely on one or more of the following:

  1. Bespoke or short-term data collection isolated within an institution;
  2. Platform-provided data via dashboards, researcher APIs, or downloadable data sets (all of which are brittle and can be unreliable); or
  3. Commercial social listening services with opaque data collection and inclusion criteria and limited platform coverage.

Mission Statement:

The mission of the Accelerator is to power policy-relevant research by building shared infrastructure. 

Through a combination of data collection, analysis, tool development, and engagement, the Accelerator will support the international community working to understand today’s information environment – the space where cognition, technology, and content converge.

We will:

  • Maintain data and technical resources which support a wide range of analyses;
  • Speed academic research by building accessible data pipelines;
  • Advance policy debates around managing online commons;
  • Develop international multi-sectoral capacity for evidence-based stewardship of the information environment; and 
  • Improve policy-making by enabling a richer understanding of the information environment and its impact on society.

Our place in the community:

Throughout the scoping of the Accelerator, we have benefited from extensive consultations across the community of scholars who share the common objective of enabling evidence-based policymaking on the information environment. The Accelerator will support existing and developing efforts while fulfilling a unique need.

The Accelerator is an international, multi-institutional consortium with an administrative and incubational home at Princeton as a special initiative within the School of Public and International Affairs.

Joint Scoping Effort:

A joint effort between Princeton University’s Empirical Study of Conflicts Project and the Carnegie Endowment for International Peace’s Partnership for Countering Influence Operations sought to evaluate the need for large-scale research infrastructure and the feasibility of overcoming critical barriers. Over the course of a year, the team conducted interviews and meetings with more than 240 researchers and commissioned 13 exploratory studies with 20 partners from 17 institutions.

These studies:

  • examined the research process to identify the kinds of infrastructure that could speed discovery;
  • reviewed the design space on research administration and funding models; and
  • analyzed how analogous institutions handle privacy and ethical considerations.

Collectively the studies provide a rich evidence base for understanding how to best move forward.

Overall, most papers study single social media platforms, focus on text, and examine the US or the EU:

We analyzed 3,923 academic papers on the information environment published from 2017 to 2021 in the top ten journals by impact factor in five academic fields (Communications, Computer Science, Economics, Political Science, and Sociology), plus the top six general-interest science journals. Of the total, only 169 used social media data. We found that:

  • Twitter was the most studied platform, appearing in 59% of papers, followed by Facebook (26%) and Reddit (7%). 46% of the papers used Twitter data exclusively.
  • 65% of papers analyzed a Western democracy (the US, EU countries, the UK, Australia, or New Zealand), 35% analyzed users or posts exclusively from the United States, and 60% exclusively used English-language data.
  • Only 12% of the papers investigated information flow across different platforms.
  • Only 13% of the papers scrutinized images or videos, while 43% examined text. The remainder of the papers analyzed either direct interactions with posts (e.g., reactions and comments), indirect interactions with posts (e.g., shares), post metadata, or content moderation.
  • 53% used simple econometric methods (e.g., multiple regression), and only 23% used machine learning (ML); of the papers using ML, half used supervised algorithms. The remaining papers relied on descriptive or qualitative analysis.

Interviews and meetings with researchers revealed three main reasons for these shortfalls:

  1. Data access is either expensive or difficult to obtain; 
  2. Unlike natural language processing, the analysis of images, videos, and cross-platform data requires multidisciplinary skills and techniques that are not widespread; and 
  3. Recruiting and retaining skilled personnel is difficult given academic hiring structures and intense competition for these professionals, which makes data engineers and data scientists especially expensive. Additionally, researchers often rely on Ph.D. students or post-doctoral fellows to build pipelines and conduct analyses, but these positions are temporary and qualified candidates are also in high demand.