A new project involving the University of Chicago Data Science Institute (DSI), Princeton University and SRI International plans to build new AI and data science tools that can monitor and detect internet censorship with greater accuracy.
This project was granted $1 million by the U.S. Defense Advanced Research Projects Agency (DARPA). The researchers involved will build tools to monitor and detect internet censorship and develop new statistical techniques to identify censorship with greater confidence.
The end goal is to increase confidence about what content is being censored, as well as when and where it's being censored. Past work on measuring Internet censorship has developed tools that can identify possible instances of censorship, but those measurements carry some uncertainty, so better statistical techniques are needed. At the same time, the team hopes to derive real-time insights from the data, which is a major data science challenge given the massive scale of the internet.
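As one illustration of what "greater confidence" can mean, the sketch below treats each test of a site from a vantage point as a success/failure trial and computes a Wilson score interval for the underlying failure rate. The function names, the 95% level (z = 1.96), and the conservative rule of flagging interference only when a target's interval lies entirely above a baseline's interval are assumptions made for illustration; this is not the project's method.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure rate (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = failures / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return max(0.0, center - half), min(1.0, center + half)


def likely_interference(target: tuple[int, int], baseline: tuple[int, int]) -> bool:
    """Hypothetical rule: flag only if the target's failure-rate interval
    lies entirely above the baseline's (a deliberately conservative check)."""
    target_low, _ = wilson_interval(*target)
    _, baseline_high = wilson_interval(*baseline)
    return target_low > baseline_high


# Example: 45 of 60 tests to a site fail from one network, versus 3 of 60
# tests to a comparable control site.
print(likely_interference((45, 60), (3, 60)))
```

An interval rather than a raw failure percentage makes the uncertainty explicit: with only a handful of tests, the interval is wide and the rule declines to flag anything.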
How is censorship done?
By inserting firewalls, middleboxes, and other intermediary devices on the network paths into and out of a location, censors can interfere with traffic in a number of ways. They can block websites or pages such as news sites, online forums, or social media platforms.
Because internet slowdowns and failures happen even during normal operations, distinguishing censorship from "normal" outages is difficult. However, the huge amount of data that moves through the global network of the internet contains subtle clues that interference is happening. The hope is that AI can pick up on these clues, which may make intentional slowdowns look different from "normal" ones.
The team intends to train new AI models and apply novel data science techniques to detect the "fingerprints" of these devices, giving internet watchdogs, policymakers, and citizen groups the ability to observe when and where censorship is happening.
The research will build on tools previously developed by Nick Feamster, Faculty Director of Research at the Data Science Institute, that use network traffic to better measure the speed and performance of home internet connections. His Net Microscope tool was used in a Wall Street Journal investigation of whether more expensive internet plans truly provide better streaming speeds for apps like Netflix and Zoom. Through the Internet Access & Equity Initiative, a project of the Data Science Institute funded by data.org, Feamster leads a team using similar tools to measure access to high-speed broadband in different areas of Chicago.
Though the motivation is different, some of the same methods apply in sifting through network traffic for signs of censorship.
Feamster said detecting Internet censorship at scale is a challenging Internet measurement problem, similar to the problems of network performance measurement and diagnosis. The raw data is similar in both cases, since both involve watching for unusual performance or behavior of a network path or endpoint. Performance measurement entails identifying problems with benign causes, such as misconfiguration or underprovisioning. Censorship detection, by contrast, involves looking for intentional disruptions and degradations.
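A basic building block for this kind of anomaly hunting is flagging measurements that deviate sharply from a network's usual behavior. The sketch below uses a median/MAD "modified z-score" with the commonly used 3.5 cutoff on a series of hourly failure rates; the data and function name are hypothetical, and this generic outlier test is a stand-in, not the project's model. Notably, it can only say that something unusual happened, not whether the cause was censorship or a benign outage, which is exactly the harder distinction the researchers aim to make.

```python
import statistics


def robust_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices whose modified z-score (median/MAD based) exceeds threshold.

    The median and MAD are less distorted by the anomalies themselves than
    the mean and standard deviation would be.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all: treat any value differing from the median as unusual.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical hourly failure rates for connections to one site from one network.
hourly_failure_rate = [0.02, 0.03, 0.02, 0.04, 0.03, 0.02, 0.61, 0.58, 0.03]
print(robust_anomalies(hourly_failure_rate))
```

The two flagged hours would then need further evidence, such as the device fingerprints described elsewhere in this article, before being attributed to deliberate interference.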
How will the project be done?
The project will start by training models using data gathered by SRI, as well as other existing projects that have gathered large amounts of data. The initial phases of the project will involve training models to detect anomalous activity by scanning through terabytes of data containing both normal and censored Internet traffic. The researchers will also conduct laboratory studies with firewall and middlebox devices to fingerprint these devices and also to generate training data to help discover the presence of these devices across the Internet.
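One simple kind of device fingerprint, used here only as a stand-in for whatever signatures the lab studies produce, is a pattern that matches the block page a filtering device injects in place of the requested content. In the sketch below, the device labels and patterns are entirely hypothetical; in practice such signatures would come from observing real devices, and matches like these could also serve as labels for training data.

```python
import re

# Hypothetical fingerprints: device label -> pattern found in that device's
# injected block page. Real signatures would come from laboratory testing.
FINGERPRINTS: dict[str, re.Pattern[str]] = {
    "vendor_a_filter": re.compile(r"access to this site has been blocked", re.IGNORECASE),
    "vendor_b_proxy": re.compile(r"<title>\s*web page blocked\s*</title>", re.IGNORECASE),
}


def match_fingerprints(body: str) -> list[str]:
    """Return the labels of every fingerprint whose pattern appears in an HTTP body."""
    return [label for label, pattern in FINGERPRINTS.items() if pattern.search(body)]


print(match_fingerprints("<html><title>Web Page Blocked</title></html>"))
print(match_fingerprints("<html><body>Welcome to the news site</body></html>"))
```

Signature matching like this is precise but brittle, since any device that changes its block page or drops connections silently slips past it, which is one reason to pair it with statistical and learned approaches.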
At the end of the project, the researchers plan to deliver new monitoring tools and dashboards to consumers of the data, who might be diplomats, policymakers, or even regular citizens. Feamster envisions a real-time "weather map" for censorship, where observers can almost immediately see Internet interference as it is happening, in which countries, and even what sites or content governments are manipulating. The information could have a range of applications, from informing citizens, diplomats, and policymakers about the existence of censorship, to inspiring the designs of the next generation of tools to circumvent these forms of information controls.
According to Feamster, the research aims to discover Internet censorship at scale, in real time, in order to empower citizens to live in free and open societies, promote constructive discourse, and ultimately allow society to flourish. There is a quintessential data science problem here: computer scientists have developed specific Internet measurement techniques as building blocks, but those techniques still produce messy data with a lot of uncertainty. Data science will allow the team to build on these fundamental approaches and derive insights with more confidence than is possible today.