Until recently, platforms like X, Facebook, and YouTube have taken a hands-on approach to moderating the content they host. But a shift is in the air: some companies have begun outsourcing content moderation to the users themselves.
But does this kind of approach work?
“It sounds like a good idea,” said Sameer Borwankar, assistant professor of Information Systems at McGill University. “But there are many open questions.”
Borwankar recently co-authored two studies that examined the effectiveness of community watch programs on Twitter. He found that, while crowd-sourced moderation can tone down the discourse on the platform, it does little to address other important issues.
Watching the watchers
Borwankar and his co-authors specifically studied the tone and quality of posts by a group of community watchers on Twitter (now known as X), before and after they joined the platform’s Community Notes program. Community Notes is Twitter’s version of crowd-sourced content moderation.
Watchers in this program are individual users who help flag content that appears to violate the platform’s guidelines. They can also add a note explaining why they think the post is problematic. Other users can then vote on whether they think the flag is legitimate. If enough people agree, the watcher’s note will remain attached to the post.
This voting mechanism is supposed to keep watchers accountable, explained Borwankar. The idea is to encourage people from diverse backgrounds and political affiliations to weigh in on a ruling. This should, in theory, prevent bias in which posts are flagged as harmful.
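To make that consensus step concrete, here is a minimal, hypothetical sketch of the kind of rule described above: a watcher's note stays attached only when raters from different viewpoints independently agree the flag is legitimate. The two-group split, the field names, and the thresholds are illustrative assumptions, not X's actual algorithm.

```python
# Illustrative sketch (not X's actual algorithm): a note remains attached only
# if raters from *different* viewpoint groups each agree it is helpful.
# The two-group split, thresholds, and field names are hypothetical.

from dataclasses import dataclass

@dataclass
class Rating:
    rater_leaning: str   # e.g. "left" or "right" (hypothetical grouping)
    found_helpful: bool  # did this rater agree the flag is legitimate?

def note_should_remain(ratings: list[Rating],
                       min_ratings_per_group: int = 5,
                       min_agreement: float = 0.6) -> bool:
    """Keep the note only if each viewpoint group independently agrees."""
    groups = {"left": [], "right": []}
    for r in ratings:
        if r.rater_leaning in groups:
            groups[r.rater_leaning].append(r.found_helpful)

    for votes in groups.values():
        if len(votes) < min_ratings_per_group:
            return False  # not enough input from this group yet
        if sum(votes) / len(votes) < min_agreement:
            return False  # this group does not agree the flag is legitimate
    return True

# Example: agreement from only one side of the spectrum is not enough.
ratings = [Rating("left", True)] * 6 + [Rating("right", False)] * 6
print(note_should_remain(ratings))  # False
```

The point of such a rule, in theory, is that a note survives only when agreement bridges political lines rather than coming from one side alone.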
In their first study, Borwankar and his colleagues found that the system showed promise, despite its limitations. Community watchers posted higher-quality, less inflammatory content compared to before they joined the program. These posts also had fewer qualities often associated with misinformation, such as extreme language.
“When people are part of these programs, they are more careful about what they’re writing, which is a good thing overall,” he said.
However, the system had some unintended side effects. The watchers, whose usernames were publicly available for anyone to see, were reluctant to engage with controversial topics, partially out of fear of retaliation from other users.
And this is a problem, said Borwankar. Controversial topics are where inflammatory language and misinformation are more likely to exist. If watchers fear engaging in those conversations, they risk giving some problematic posts a free pass.
To mitigate this, Twitter started allowing its watchers to have two identities on the platform: one for personal use, and another for monitoring and flagging problematic posts. Watchers were now anonymous when performing monitoring duties. The voting system for flagged posts remained the same; once a post was flagged by a community watcher, other users could vote on whether they agreed.
In their second paper, Borwankar and his co-authors found this tactic had a positive effect. Under the protection of anonymity, the watchers chose to participate in more controversial topics while maintaining a neutral tone in their discussions, explained Borwankar.
Where does the buck stop?
Borwankar’s results are promising. They show that community watch programs can tone down discourse around a controversial topic and potentially curtail the spread of misinformation. But Borwankar isn’t ready to call this a slam dunk.
“I don’t see this as a solution in itself,” he said.
For one, not every flagged post attracts enough votes. If a community watcher identifies a viral post as problematic, many users from various political affiliations will vote on whether it violates community guidelines, which should, in theory, add balance to the process. But posts with less engagement won't receive the same treatment.
Community Notes’ voting system is also Twitter’s way of getting input from users of diverse political affiliations. But this is easier to do in an English-speaking country like the United States, which is dominated by two political parties, said Borwankar. In a country like India, which has six national parties, political diversity looks very different. It’s unclear how Twitter takes this into account when searching for political balance, said Borwankar.
Since implementing its Community Notes program, Twitter also doesn’t take down content, even if users agree a post is problematic. The platform can always overrule the will of its users.
For Borwankar, this is a significant power imbalance.
“The companies have all the power over what to show,” he said. “If your diverse users are saying ‘this is misinformation,’ it should not be on the platform.”
X (formerly known as Twitter) widely implemented its Community Notes program in 2023. Meta, Facebook’s parent company, is now testing a similar mechanism across its social media platforms, including Instagram and Threads.

This article was written by Eric Dicaire.
This article was inspired by two research papers by Sameer Borwankar, Jinyang Zheng, and Karthik Kannan titled “Democratization of Misinformation Monitoring: The Impact of Twitter’s Birdwatch Program” and “Unveiling the Impact of Privacy-Preserving Policies in Crowd-based Misinformation Monitoring Program.”