29 M10U: Consequences of Failed Predictions

This chapter draws on material from:

Changes to the source material include removal of original material, the addition of new material, combining of sources, and editing of original material for a different audience.

The resulting content is licensed under CC BY-NC-SA 4.0.

29.1 Introduction

In the early months of the COVID-19 pandemic, many people found social media to be more important than ever before. It kept us informed and connected during an unprecedented moment in time. People used major social media platforms for all kinds of things: following and posting news, organizing aid (such as coordinating donations of masks across international boundaries), sharing tips on working from home, and, of course, pure entertainment.

At the same time, the content moderation challenges faced by social media platforms didn’t disappear—and in some cases were exacerbated by the pandemic. Though the benefits of social media platforms may be obvious, and even seem utopian at times, the perils are also painfully apparent, more so every day: the pornographic, the obscene, the violent, the illegal, the abusive, and the hateful. So, social media platforms must, in some form or another, moderate—that is, take down, otherwise penalize, or at least flag certain kinds of content. Content moderation is important: to protect one user from another, to shield one group from its antagonists, and to remove the offensive, vile, or illegal. From a more self-serving perspective, platforms also want to present their best face to new users, to their advertisers and partners, and to the public at large.

In the early weeks of the pandemic, YouTube, Twitter, and Facebook all made public statements about their moderation strategies at that time. While they differed in details, they all had one key element in common: an increased reliance on automated tools. These companies described that increased reliance in terms of ensuring the wellbeing of their content moderation teams and the privacy of their users. This is hard to argue with: most social media companies rely on workers from the Global South to review flagged content, usually under precarious working conditions and without adequate protections from the traumatic effects of their work. While the goal of protecting workers from that kind of exposure while they work from home is certainly legitimate, automated content moderation still poses a number of problems.

29.2 Regression and Automated Content Moderation

Why read about content moderation in the context of regression? Well, we’re just skimming the surface of regression in this class. Regression is a whole family of statistical techniques, ranging from the relatively simple to the overwhelmingly complex. I’ve heard data scientists joke that terms like machine learning and artificial intelligence are just fancy labels for what is actually a regression analysis, and there’s some truth to that! While it’s hard to say exactly what a particular platform’s automated content moderation process involves (more on this later!), regression is going to be involved somewhere.

As we read about last week, one of the applications of regression (especially in machine learning) is in prediction. While we aren’t using regression in this way in this course, it’s important for us to recognize that many of the technologies we use every day are applying regression to make predictions. In particular, automated content moderation is about prediction—can we get the computer to predict whether or not an image or some text is the kind of thing that we ought to remove from a platform? There’s no denying that that can be helpful! What happens, though, if we get the prediction wrong?
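To make this concrete, here is a minimal sketch of "content moderation as prediction." It uses Python and scikit-learn with a tiny, made-up set of labeled posts; the choice of logistic regression, the features, and the data are all illustrative assumptions, not how any real platform actually does it. The core idea is simply that a regression model is fit to labeled examples and then asked to predict a label for new content.

```python
# A minimal sketch of content-moderation-as-prediction (illustrative only).
# Real systems use far larger datasets and more sophisticated models, but the
# core step -- fit a regression model, then predict a label for new content --
# is the same idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = should be flagged, 0 = fine to leave up
posts = [
    "buy followers now cheap guaranteed",            # spam
    "I will hurt you if you post again",             # threat
    "check out my gardening tips for spring",
    "here is how our neighborhood food drive went",
]
labels = [1, 1, 0, 0]

# TF-IDF turns the text into numeric features; logistic regression (a member
# of the regression family) predicts the label from those features.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

new_post = ["guaranteed cheap followers, buy now"]
print(model.predict(new_post))        # the predicted label: flag or not
print(model.predict_proba(new_post))  # the model's confidence in that prediction
```

Notice that the model only ever outputs a prediction and a confidence; whether anyone checks that prediction before acting on it is a human and policy decision, which is exactly what the rest of this chapter is about.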

29.3 The Problem with Automated Content Moderation

Automated content moderation doesn’t work at scale; it can’t read nuance in speech the way humans can, and for some languages it barely works at all. Over the years, we’ve seen the use of automation result in numerous wrongful takedowns. In short: automation is not a sufficient replacement for having a human in the loop.

And that’s a problem, especially since the COVID-19 pandemic has increased our reliance on online platforms to speak, educate and learn. Conferences are moving online, schools are relying on online platforms, and individuals are tuning in to videos to learn everything from yoga to gardening. Likewise, platforms continue to provide space for vital information, be it messages from governments to people, or documentation of human rights violations.

It’s important to give credit where credit is due. In their announcements, YouTube and Twitter both acknowledged the shortcomings of artificial intelligence and took that into account as they moderated content. YouTube decided not to issue strikes on video content except in cases where it had “high confidence” that the content violated its rules, and Twitter announced it would only be issuing temporary suspensions, not permanent bans. For its part, Facebook acknowledged that it would be relying on full-time employees to moderate certain types of content, such as terrorism.

These temporary measures helped mitigate the inevitable over-moderation that follows from the use of automated tools. However, history suggests that protocols adopted in times of crisis often persist when the crisis is over, and we’ve seen that happen as the pandemic has become less severe. Just because many COVID-19 restrictions have been eased doesn’t mean that automated content moderation has gone away. In fact, it didn’t begin with COVID-19! Automated moderation is cheaper and faster than human moderation—which is a soul-sucking, punishing task—so it’s always going to be used to a certain extent in efforts to reduce costs and trauma.

Again, those are good motivations, so we shouldn’t rule out the value of data science-driven automated content moderation. However, the Santa Clara Principles for content moderation, authored by the Electronic Frontier Foundation (EFF) and other organizations, begin with a statement on how and when automation should be used:

Companies should ensure that human rights and due process considerations are integrated at all stages of the content moderation process, and should publish information outlining how this integration is made. Companies should only use automated processes to identify or remove content or suspend accounts, whether supplemented by human review or not, when there is sufficiently high confidence in the quality and accuracy of those processes. Companies should also provide users with clear and accessible methods of obtaining support in the event of content and account actions.

Companies like Apple, Meta, Google, Reddit, and Twitter have all endorsed these principles, but that’s no guarantee that they’ll actually follow through. In fact, Twitter’s approach to content moderation (not to mention its name) has changed drastically in recent months—and in a way that demonstrates the shortcomings of automated approaches. All social media users ought to be interested in how companies rely on automated content moderation.

29.4 An Example from Instagram

In October 2020, the EFF argued that a then-recent Facebook transparency report showed that the company’s increased use of automated content moderation during the COVID-19 pandemic was lacking in both human oversight and options for appeal. Those gaps increase the power of automated moderation techniques and make the consequences of failed predictions more severe. Let’s walk through this argument in some more detail:

When content is removed on Instagram, people typically have the option to contest takedown decisions. In short, there’s a recognition that an automated moderation system might make a mistake, so users can make a case for that. Typically, when the appeals process is initiated, the deleted material is reviewed by a human moderator, and the takedown decision can be reversed and the content reinstated. During the period of increased automated moderation, however, that option was seriously limited, with users receiving notification that their appeals might not be considered. According to the transparency report, there were zero appeals on Instagram during the second quarter of 2020. This doesn’t mean that Instagram’s automated content moderation didn’t make mistakes, just that it didn’t fix any of them.

While Instagram also occasionally restores content of its own accord, user appeals usually account for the vast majority of content that gets reinstated. So, heavier reliance on automated moderation means fewer human-reviewed appeals, which means more mistakes that can’t get corrected. An example: in Q2 of 2020, more than 380,000 posts that allegedly contained terrorist content were removed from Instagram, fewer than the 440,000 in Q1 (when Instagram wasn’t so reliant on automation because COVID-19 mostly hadn’t picked up yet). While around 8,100 takedowns were appealed by users in Q1, that number plummeted to zero in Q2. Looking at the number of posts restored, the impact of the lack of user appeals becomes apparent: during the first quarter, 500 pieces of content were restored after an appeal from a user, compared to the 190 posts that were reinstated without an appeal. In Q2, with no appeal system available to users, merely 70 of the several hundred thousand posts that allegedly contained terrorist content were restored, and more people had to deal with the consequences of a failed prediction.
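To see how stark that drop is, here is a quick back-of-the-envelope calculation using the rounded figures quoted above. These are approximations pulled from the text of the transparency report discussion, not official statistics, so treat the exact percentages loosely.

```python
# Rough restoration rates implied by the figures quoted above
# (Instagram, "terrorist content" category). Numbers are rounded.
q1_removed = 440_000
q1_restored = 500 + 190   # restored after a user appeal + restored without one

q2_removed = 380_000
q2_restored = 70          # no user appeals were available in Q2

print(f"Q1: {q1_restored / q1_removed:.3%} of removals restored")  # roughly 0.157%
print(f"Q2: {q2_restored / q2_removed:.3%} of removals restored")  # roughly 0.018%
```

Even though both rates are tiny, the chance of a wrongful takedown being corrected fell by nearly an order of magnitude once the appeal option disappeared.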

To be clear: Instagram should be taking down terrorist content from its platform! That’s a good thing, and an example of why everyone should endorse content moderation, even if people disagree over the details. However, evidence of human rights violations and war crimes often gets caught in the net of automated content moderation, as algorithms have a hard time differentiating between actual “terrorist” content and efforts to record and archive violent events. In a similar example, YouTube took down a video in June 2022 that had been posted by the January 6th House Select Committee. The rationale behind the takedown was that the video contained election misinformation. It did contain election misinformation, but for the purposes of documenting and criticizing that misinformation, not endorsing it! It’s not clear whether this was the result of automated or human moderation, but if human content moderators make mistakes, automated moderators are even less trustworthy.

Why does this matter? First of all, because this negative impact of automated content detection is disproportionately borne by Muslim and Arab communities. Second, because even though Instagram and Facebook are operated by the same company, there were inconsistencies in how the two platforms moderated content. The report in question, which listed data for the last two quarters of 2019 and the first two of 2020, did not consistently report data on the same categories across the two platforms. Similarly, the granularity of the data reported for various categories of content differed depending on the platform. These may seem like small details, but they’re pretty big inconsistencies given that both platforms are owned by the same company. These differences in how one company enforces its rules across two platforms further highlight how difficult it is to deal with the problem of violent content through automated content moderation alone.

29.5 Conclusion

As machine learning and artificial intelligence continue to develop, people and organizations are increasingly using them (and their underlying regression models) to make predictions about the world. It’s impressive how many predictions these tools get right, but the more dependent we become on these tools, the greater the consequences of a failed prediction. It’s important for us as data scientists, and perhaps even more as citizens, to be aware of the shortcomings of these techniques!