Content Moderation at Scale, 2/2

You Make the Call: Audience Interactive (with a trigger
warning for content requiring moderation)
Emma Llanso, Center for Democracy & Technology & Mike
Masnick, Techdirt
Hypo: “Grand Wizard Smith,” w/user photo of a person in a
KKK hood, posts a notice for the annual adopt-a-highway cleanup project.  TOS bans organized hate groups that advocate
violence.  This post is flagged for
review.  What to do?  Majority wanted takedown, but 12 said leave
it up, 12 flag (leave up w/a content warning), 18 said escalate, and over 40
said take down.  Take down: he’s a member
of the KKK.  Keep up: he’s not a verified
identity; it doesn’t say KKK and requires a cultural reference point to know what
the hood means/what a grand wizard is. 
Escalate: if the moderator can only ban the post, the real problem is
the user/the account, so you may need to escalate to get rid of the account.
Hypo: “glassesguru123” says same sex marriage is great, love
is love, but what do I know, I’m just a f—-t. 
Flagged for hate speech. What to do? 
83 said leave it up.  5 for flag,
2 escalate, 1 take it down.  Comment: In
Germany, you take down some words regardless of content, so it may depend on
what law you’re applying.  Most people
who leave it up are adding context: not being used in a hateful manner. But
strictly by the policy, it raises issues, which is why some flag it.
Hypo: “Janie, gonna get you, bitch, gun emoji, gun emoji, is
that PFA thick enough to stop a bullet if you fold it up & put it in your
pocket?”  What to do? 57 take it down, 27
escalate, and 1 said leave it up/flag the content.  For escalate: need subject matter expert to
figure out what a PFA is.  [Protection
from Abuse order.] Language taken from Supreme Court case about what
constituted a threat.  I wondered whether
there were any rap lyrics, but decided that it was worrisome enough even if
those were lyrics.  Another argument for
escalation: check if these are lyrics/if there’s an identifiable person
“Janie.” [How you’d figure that out remains unclear to me—maybe you’ll be able
to confirm that there is a Janie by
looking at other posts, but if you don’t see mention of her you still don’t
know she doesn’t exist.]  Q: threat of
violence—should it matter whether the person is famous or just an ex?
Hypo: photo of infant nursing at human breast with
invitation to join breast milk network. 
Flagged for depictions of nudity. What to do? 65 said leave it up, 13
said flag the content, 5 said escalate, and 1 said take it down.  Nipple wasn’t showing (which suggests
uncertainty about what should happen if the baby’s latch were different/the
woman’s nipple were larger).  Free speech
concerns: one speaker pointed that out and said that this was about free speech
being embodied—political or artistic expression against body shame.  You have this keep-it-up sentiment now but
that wasn’t true on FB in the past. 
Policy v. person applying the policy.
Hypo: jenniferjames posts a site that links to Harvey
Weinstein’s information: home phone, emails, everything— “you know what to do:
get justice” Policy: you may not post personal information about others without
their consent.  This one was the first
that I found genuinely hard.  It seemed
to be inciting, but not posting directly and thus not within the literal terms
of the policy. I voted to escalate. 
Noteworthy: fewer people voted. Plurality voted to escalate; substantial
number said to take it down, and some said to leave it/flag it.  One possibility: the other site might have
that info by consent!  Another response
would block everything from that website (which is supposed to host personal
info for lots of people).
Hypo: Verified world leader tweets: “only one way to get
through to Rocket Man—with our powerful nukes. Boom boom boom. Get ready for
it!”  Policy: no specific credible
threats.  I think it’s a cop out to say
it’s not a credible threat, though that doesn’t mean there’s a high probability
he’ll follow up on it. I don’t think high probability is ordinarily part of the
definition of a credible threat. But this is not an ordinary situation, so.
Whatever it is, I’m sure it’s above my pay grade if I’m the initial screener:
escalate. Plurality: leave it up. Significant number: escalate.  Smaller number of flag/deletes.  Another person said that this threat couldn’t
be credible b/c of its source; still, he said, there shouldn’t be a
presidential exception—there must be something he could say that could cross
the line. Same guy: Theresa May’s threat should be treated differently.  Paul Alan Levy: read the policy narrowly: a
threat directed to a country, not an individual or group.
Hypo: Global Center for Nonviolence: posts a video, with a
thumbnail showing a mass grave. Caption: source “slaughter in Duma.”   “A victorious scene today,” is another
caption apparently from another source. I wasn’t sure whether victorious could
be read as biting sarcasm. Escalate for help from an area expert. Most divided—most
popular responses were flag or escalate, but substantial #s of leave it up and
take it down too. The original video maybe could be interpreted as glorifying
violence, but sharing it to inform people doesn’t violate the policy and
awareness is important. The original post also needs separate review. If you
take down the original video, though, then the Center’s post gets stripped of
content. Another argument: don’t censor characterizations of victory v. defeat;
compare to Bush’s “Mission Accomplished” when there were hundreds of thousands
of Iraqis dead.
Hypo: Johnnyblingbling: ready to party—rocket ship, rocket
ship, hit me up mobile phone; email from City police department: says it’s a
fake profile in the name of a local drug kingpin. Only way we can get him, his
drugs, and his guns off the street. Policy: no impersonation; parody is ok.
Escalate because this is a policy decision: if I am supposed to apply the
policy as written then it’s easy and I delete the profile (assuming this too
doesn’t require escalation; if it does I escalate for that purpose). But is the
policy supposed to cover official impersonation?  [My inclination would be yes, but I would
think that you’d want to make that decision at the policy level.] 41 said
escalate, 22 take down, 7 leave it up, 1 flag. Violate user trust by creating
special exceptions.  Goldman points out
that you should verify that the sender of the email was authentic: people do
fake these.  Levy said there might be an
implicit law enforcement exception. But that’s true of many of these
rules—context might lead to implicit exceptions v. reading the rules strictly.
1:50 – 2:35 pm: Content Moderation and Law Enforcement
Clara Tsao, Chief Technology Officer, Interagency Countering
Violent Extremism Task Force, Department of Homeland Security
Jacob Rogers, Wikimedia Foundation: works w/LE requests
received by Foundation. We may not be representative of different companies b/c
we are small & receive a small number of requests that vary in what they
ask for—readership over a period of time v. individual info. Sometimes we only
have IP address; sometimes we negotiate to narrow requests to avoid revealing
unnecessary info.
Pablo Peláez, Europol Representative to the United States: Cybercrime
unit is interested in hate speech & propaganda. 
Dan Sutherland, Associate General Counsel, National
Protection & Programs Directorate, U.S. Department of Homeland Security:
Leader of a “countering foreign influence” task force. Work closely w/FBI but
not in a LE space.  Constitution/1A:
protects things including simply visiting foreign websites supporting terror.  Gov’t influencing/coercing speech is
something we’re not comfortable with. Privacy Act & w/in our dep’t Congress
has built into the structure a Chief Privacy Officer/Privacy Office. Sutherland
was formerly Chief Officer for Civil Rights/Civil Liberties.  These are resourced offices w/in dep’t and
influence issues.  DHS is all about info
sharing, including sensitive security information shared by companies.
Peláez: Europol isn’t working on foreign influence. Relies
on member states; referrals go through national authorities.  EU Internet Forum brings together decisionmakers
from states and private industry. About 150-160 platforms that they’ve looked
at; in contact w/about 80. Set up internet referral management tool to access
the different companies.  Able to analyze
more than 54,000 leads.  82% success
Rogers: subset of easy LE requests for Wikipedia & other
moderated platforms—fraudulent/deceptive, clearly threats/calls to violence.
Both of those, there is general agreement that we don’t want them around. Some
of this can feed back into machine learning. 
Those tools are imperfect, but can help find/respond to issues. More
difficult: where info is accurate, newsworthy, not a clear call to violence:
e.g., writings of various clerics that are used by some to justify violence.
Our model is community based and allows the community to choose to maintain
lawful content.
LE identification requests fall into 2 categories: (1)
people clearly engaged in wrongdoing; we help as we can given technical
limits.  (2) Fishing expeditions, made
b/c gov’t isn’t sure what info is there. Company’s responsibility is to
educate/work w/gov’t to determine what’s desired and protect rights of users
where that’s at issue.
YT started linking to Wikipedia for controversial videos; FB
has also started doing that.  That is
useful; we’ll see what happens.
Sutherland: We aren’t approaching foreign influence as a LE
agency like FBI does, seeking info about accounts under investigation or
seeking to have sites/info taken down. Instead, we support stakeholders in
understanding scope & scale & identifying actions they can take against
it. Targeted Action Days: one big platform or several smaller—we focus on them
and they get info on content they must remove. 
Peláez: we are producing guidelines so we understand what
companies need to make requests effective. 
Toolkit w/18 different open source tools that will allow OSPs and LE to
identify and detect content.
What Machines Are, and Aren’t, Good At
Jesse Blumenthal, Charles Koch Institute: begins with a
discussion that reminds me of this xkcd
Frank Carey, Machine Learning Engineer, Vimeo: important to
set threshold for success up front. 80% might be ok if you know that going
in.  Spam fighting: video spam, looks
like a movie but then black screen + link + go to this site for full download
for the rest of the 2 hours.  Very visual
example; could do text recognition. 
These are adversarial examples. Content moderation isn’t usually about
making money (on our site)—but that was, and we are vastly outnumbered by them.
Machine learning is being used to generate the content.  It’s an arms race. Success threshold is thus
important.  We had a great model with a
low false positive rate, and we needed that b/c if it was even .1% that would
be thousands of accounts/day. But as we’d implement these models, they’d go
through QA, and within days people would change tactics and try something else.
We needed to automate our automation so it could learn on the fly.
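A rough sketch of the arithmetic behind the threshold point, in Python. The daily volume is a made-up figure for illustration (the talk gave no number beyond the .1% rate), and `expected_false_positives` and `max_false_positive_rate` are hypothetical helper names.

```python
def expected_false_positives(daily_accounts: int, false_positive_rate: float) -> float:
    """Expected number of legitimate accounts wrongly flagged per day."""
    if daily_accounts < 0:
        raise ValueError("daily_accounts must be non-negative")
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return daily_accounts * false_positive_rate


def max_false_positive_rate(daily_accounts: int, tolerable_errors_per_day: float) -> float:
    """Highest false positive rate that stays within a daily error budget."""
    if daily_accounts <= 0:
        raise ValueError("daily_accounts must be positive")
    return tolerable_errors_per_day / daily_accounts


if __name__ == "__main__":
    # Hypothetical volume, chosen only to illustrate the scale effect.
    ASSUMED_DAILY_ACCOUNTS = 2_000_000
    print(expected_false_positives(ASSUMED_DAILY_ACCOUNTS, 0.001))
    print(max_false_positive_rate(ASSUMED_DAILY_ACCOUNTS, 50))
```

Setting the success threshold up front amounts to picking the error budget first and working back to the rate the model has to hit.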
Casey Burton, Match: machines can pick up some signs like
100 posts/minute really easily but not others. Machines are good at ordering
things for review—high and low priority. 
Tool to assist human reviewers rather than the end of the process. [I
just finished a book, Our Robots, Ourselves, drawing this same conclusion about
computer-assisted piloting and driving.]
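A minimal sketch of "ordering things for review," assuming each item already carries a risk score from some upstream model. `ReviewQueue` and the scores are hypothetical; nothing here reflects Match's actual tooling. The queue only decides what a human sees first; it makes no removal decision itself.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedItem:
    # heapq is a min-heap, so the negated score is stored to pop the highest risk first.
    sort_key: float
    item_id: str = field(compare=False)


class ReviewQueue:
    """Hands human reviewers the highest-risk item first."""

    def __init__(self) -> None:
        self._heap: list[QueuedItem] = []

    def push(self, item_id: str, risk_score: float) -> None:
        heapq.heappush(self._heap, QueuedItem(-risk_score, item_id))

    def pop(self) -> str:
        """Return the id of the highest-risk item still waiting."""
        return heapq.heappop(self._heap).item_id

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = ReviewQueue()
    queue.push("profile-a", 0.2)
    queue.push("profile-b", 0.9)
    queue.push("profile-c", 0.5)
    while len(queue):
        print(queue.pop())
```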
Peter Stern, Facebook: Agrees. We’re now good at spam, fake
accounts, nudity and remove it quickly. 
Important areas that are more complicated: terrorism.  Blog posts about how we’ve used automation in
service of our efforts—a combo of automation and human review.  A lot of video/propaganda coming from
official terrorist channels—removed almost 2 million instances of ISIS/Al Qaeda
propaganda; 99% removed before it was seen. We want to allow counterspeech—we
know terror images get shared to condemn. Where we find terror accounts we fan
out for other accounts—look for shared addresses, shared devices, shared
friends. Recidivism: we’ve gotten better at identifying the same bad guy with a
new account. Suicide prevention has been a big focus. Now using pattern
recognition to identify suicidal ideation and have humans take a look to see
whether we can send resources or even contact LE.  Graphic violence: can now put up warning
screens, allow people to control their experience on the platform.  More difficult: for the foreseeable future,
hate speech will require human judgment. We have started to bubble up slurs for
reviewers to look at w/o removing it—that has been helpful.  Getting more eyes on the right stuff. Text is
typically more difficult to interpret than images.
Burton: text overlays over images challenged us. You can OCR
that relatively easily, but it is an arms race. So now you get a lot of
different types of text designed to fool the machine.  Machines aren’t good at nuance.  We don’t get too much political, but we see a
lot of very specific requests about who they want to date—“only whites” or
“only blacks.”  Where do you draw the
line on deviant sexual behavior? Always a place for human review, no matter how
good your algorithms.
Carey: Rule of thumb: if it’s something you can do in under
a second, like nudity detection, machine learning will be good at it.  If you have to think through the context, and
know a bunch about the world like what the KKK is and how to recognize the
hood, that will be hard—but maybe you can get 80% of the way.  Challenge is adversarial actors.  Laser beam: if they move a little to the
left, the laser doesn’t hit them any more. So we create two nets, narrow and
wide. Narrow: v. low false positive rate. With wider net that goes to review
queue.  You can look at confidence
scores, how the model is trained, etc.
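The two-net idea can be sketched as a pair of confidence thresholds. This is an illustration under assumptions, not Vimeo's implementation: the threshold values and the `route` function are invented, and I am assuming the narrow net acts automatically while the wide net only queues for review.

```python
from enum import Enum


class Action(Enum):
    AUTO_REMOVE = "auto_remove"    # narrow net: very low false positive rate
    HUMAN_REVIEW = "human_review"  # wide net: goes to the review queue
    ALLOW = "allow"


# Hypothetical thresholds; real values would be tuned against labeled data.
NARROW_THRESHOLD = 0.99
WIDE_THRESHOLD = 0.60


def route(confidence: float,
          narrow: float = NARROW_THRESHOLD,
          wide: float = WIDE_THRESHOLD) -> Action:
    """Map a model confidence score in [0, 1] to a moderation action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if wide > narrow:
        raise ValueError("wide threshold must not exceed narrow threshold")
    if confidence >= narrow:
        return Action.AUTO_REMOVE
    if confidence >= wide:
        return Action.HUMAN_REVIEW
    return Action.ALLOW


if __name__ == "__main__":
    for score in (0.995, 0.75, 0.10):
        print(score, route(score).value)
```

Lowering the wide threshold catches adversaries who "move a little to the left," at the cost of a longer human review queue.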
Ryan Kennedy, Twitch?: You always need the human
element.  Where are your adversaries
headed?  Your reviewers are R&D.
Burton: Humans make mistakes too. There will be disagreement
or just errors, clicking the wrong button, and even a very low error rate will
mean a bunch of bad stuff up and good stuff down. 
Blumenthal: we tend not to forgive machines when they err,
but we do forgive humans. What is an acceptable error rate?
Carey: if 1-2% of the time, you miss emails that end up in
your spam folder, that can be very bad for the user, even if it’s a low error
rate.  For cancer screening, you’re
willing to accept a high false positive rate. 
[But see mammogram recommendations.] 
Stern, in response to a Q about diversity: We are seeking to
build diverse reviewers, whose work is used for the machine learning that
builds classifications.  Also seeking
diversity on the policy team, b/c that’s also an issue in linedrawing. When we
are doing work to create labels, we try to be very careful about whether we’re
seeing outlying results from any individual—that may be a signal that somebody
needs more education. We also try to be very detailed and objective in the tasks
that we set for the labelers, to avoid subjective judgments of any kind.  Not “sexually suggestive” but do you see a
swimsuit + whatever else might go into the thing we’re trying to build. We are
also building a classifier from user flagging. 
User reports matter and one reason is that they help us get signals we
can use to build out the process.
Kennedy, in response to Q about role of tech in dealing w/
live stream & live chat: snap decisions are required; need machines to help
manage it.
Carey: bias in workforce is an issue but so is implicit bias
in the data; everyone in this space should be aware of that. Training sets:
there’s a lot of white American bias toward the people in photos.  Nude photos are mostly of women, not men. You
have to make sure you’re thinking about those things as you put these systems
in place.  Similar thing w/WordNet, a
list of synonyms infected w/gender bias. English bias is also a thing.
Q: outsourced/out of the box solutions to close the resource
gap b/t smaller services and FB: costs and benefits?
Burton: vendors are helpful. 
Google Vision has good tools to find & take down nudity.  That said, you need to take a look and say
what’s really affecting our platform.  No
one else is going to care about your issues as much as you do.
Carey: team issues; need for lots of data to train on, like
fraud data; for Vimeo, nudity detection was a special issue b/c we don’t have a
zero nudity policy.  We needed to ID
levels of nudity—pornographic v. HBO. We trained our own model that did pretty
well. Then you can add human review. But off the shelf models didn’t allow
that.  Twitch may have unique memes—site
tastes are different.  Vendors can be
great for getting off the ground, but they might not catch new things or might
catch too many given the context of your site.
Kennedy: vendors can get you off the ground, but we have
Twitch-specific language.  Industry
standards can be helpful, raising all ships around content moderation.  [I’d love to hear from someone from reddit or
the like here.]
Q re automation in communication/appeals: Stern says we’re
trying to improve. It’s important for people to understand why something
did/didn’t get taken down. In most instances, you get a communication from us
about why there was a takedown. Appeals are really important—allow more
confidence in the process b/c you know mistakes can be corrected.  Always a conundrum about enabling evasion,
but we believe in transparency and want to show people how we’re interacting
w/their content. If we show them where the line is, we hope they know not to
Burton: There are ways to treat bots differently than
humans: don’t need to give them notice & can put them in purgatory. We keep
info at a high level to avoid people tracking back the person who reported them
and going after them.
David Post, Cato Institute
Kaitlin Sullivan, Facebook: we care about safety, voice, and
fairness: trust in our decisionmaking process even if you don’t always agree
w/it. Transparency is a way to gain your trust. 
New iteration of our Community Standards is now public w/full definition
of “nudity” that our reviewers use. We also want to explain why we’re using
these standards. You may not agree that female nipples shouldn’t be allowed
(subject to exceptions such as health contexts) but at least you should be able
to understand the rule.  Called us
“constituents,” which I found super interesting.  Users should be able to tell whether there is
an enforcement error or a policy decision. 
We also are investing more in appeals; used to have appeals just for
accounts, groups, pages. We’ve been experimenting w/individual content reviews,
and now we have an increased commitment to that.  We hope to have more numbers than IP, gov’t
requests, terror content soon.
Kevin Koehler, Automattic: 30% of internet sites use
WordPress, though we don’t host them all. Transparency report lists what sites
we geoblock due to local law & how we respond to gov’t requests. We try to
write/blog as much as we can about these issues to give context to the raw
numbers. Copyright reports have doubled since 2015; gov’t info requests 3x;
gov’t takedowns gone up 145x from what they once were. Largely driven by
Russia, former Soviet republics, and Turkey; but countries that we never heard
from before are also sending notices, sometimes in polite and sometimes in
threatening terms.
Alex Walden, Google: values freedom of expression,
opportunity, and ability to belong.  400
hours of content uploaded every minute. Doubling down on machine learning,
particularly for terrorist content. Including experts as part of how we ID
content is key.  Users across the board
are flagging lots of content; the accuracy rates of ordinary users are relatively
low, while trusted flaggers are relatively high in accuracy. 8 million videos
removed for violating community guidelines, 80% flagged by machine learning.
human review. Committed to 10,000 reviewers in 2018.  Spam detection has informed how we deal
w/other content.  Also dealing w/scale by
focusing on content we’ve already taken down, preventing its reupload.  Also important that there’s an appeals
process. New user dashboard also shows users where flagged content is in the
review process—was available to trusted flaggers, but is now available to
others as well.
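A minimal sketch of blocking re-uploads of content that has already been taken down, using exact SHA-256 matching. The panel did not say how Google does this; `RemovedContentIndex` is a hypothetical name. Exact hashing is a swapped-in simplification: it only catches byte-identical copies, so any re-encoding or trimming would slip past it and a real system would presumably need fuzzier matching.

```python
import hashlib


class RemovedContentIndex:
    """Remembers digests of removed uploads and checks new uploads against them."""

    def __init__(self) -> None:
        self._digests: set[str] = set()

    @staticmethod
    def _digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def record_removal(self, data: bytes) -> None:
        """Call when content is taken down for a policy violation."""
        self._digests.add(self._digest(data))

    def is_known_removed(self, data: bytes) -> bool:
        """True only for byte-identical copies of previously removed content."""
        return self._digest(data) in self._digests


if __name__ == "__main__":
    index = RemovedContentIndex()
    index.record_removal(b"bytes of a removed video")
    print(index.is_known_removed(b"bytes of a removed video"))
    print(index.is_known_removed(b"bytes of a removed video, re-encoded"))
```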
Rebecca MacKinnon, New America’s Open Technology Institute:
Deletions can be confusing and disorienting. Gov’ts claim to have special
channels to Twitter, FB to get things taken down; people on the ground don’t
know if that’s true. Transparency reports are for official gov’t demands but it’s
not clear whether gov’ts get to be trusted flaggers or why some content is
going down. Civil society and human rights are under attack in many countries—lack
of transparency on platforms destroys trust and adds to sense of lack of
Human rights aren’t measured by lack of rules; that’s the
state of nature, nasty brutish and short. We look to see whether companies
respect freedom of expression. We expect that the rules are clear and that the
governed know what the rules are and have an ability to provide input into the
rules, also there is transparency and accountability about how the rules are
enforced.  Also looking for impact
assessment: looking for companies to produce data about volume and nature of
information that’s been deleted or restricted to enforce TOS and in response to
external requests.  Also looking in governance
for whether there’s human rights impact assessment.  More info on superusers/trusted flaggers is
necessary to understand who’s doing what to whom. We’re seeing increasing
disclosure about process over time.
If the quality of content moderation remains the same, then
more journalists and activists will be caught in the crossfire.  More transparency for gov’ts and people could
allow conversations w/stakeholders who can help w/better solutions.
Koehler: reminder that civil society groups may not be
active in some countries; fan groups may value their community very strongly
and so appeals are an important way of getting feedback that might not
otherwise be available.  Scale is the challenge. 
Post asked about transparency v. gaming the system/machine
learning [The stated concern for disclosing detection mechanisms as part of
transparency doesn’t seem very plausible for most of the stuff we’re talking
about.  Not only is last session’s point
about informing bots v. informing people a very good point, “flagged as ©
infringement” is often pretty clear without disclosing how it was flagged.]
Sullivan: gaming the system is often known as “following the
rules” and we want people to follow the rules. They are allowed to get as close
to the line as they can as long as they don’t go over the line.  Can we give people detailed reasons with
automated removal?  We have improved the information
we have reviewers identify—ask reviewers why something should be removed for
internal tracking as well as so that the user can be informed.  A machine can say it has 99% confidence that a
post matches bad content, but that’s different—being transparent about that
would be different.
Koehler: the content/context that a user needs to tell you the
machine is doing it wrong is not the same content that the machine needs to
identify content for removal: nudity as a protest, for example.
