Monday 30 March 2015

In Depth: Content in moderation: is digital censorship messing with the arts?

Is automated moderation messing with art?


For anyone who spent their youth staring wistfully through the windows of record shops, longing for Saturday's trip to the library or taping films from the TV, the smorgasbord of culture available at our fingertips is a dream come true. But for parents it poses one of the internet's thorniest problems: if money is no obstacle and there's no human gatekeeper to ask for ID or refuse service, how can they be sure their children aren't watching, reading or listening to explicit films, music and books?


It's a question that is being answered in various ways, all of which come with their own problems. In the UK, ISP-level website blockers are now in place to stop children stumbling across adult sites - you can turn them off, but you'll have to ask. Netflix has a gated "just kids" section featuring only child-friendly films and TV shows. Most TV services come with a PIN system to stop curious young minds from accessing potentially disturbing shows on demand. Social networks rely on a combination of automated processes, human moderation and users flagging inappropriate content.


Last week, a new ebook store caused a stir by taking a different approach, specifically to profanity. Clean Reader raised the hackles of authors and readers by giving its users the option to "clean" any book bought through the app.


Jared and Kirsten Maughan devised the app after their daughter was upset by the swearing in a library book she was reading. The couple worked with app developers The Page Foundry to offer various levels of "cleanliness" - clean, cleaner and squeaky clean - with each stricter setting blanking out progressively milder terms.


To "clean" a book obscures fucks and cunts, "cleaner" sees off shits and bastards while "squeaky clean" bans anything that could be thought of as profanity. The app also obscured some anatomical terms including breast, vagina and penis. When asked how they came up with the words to remove, Jared Maughan told us, "We referenced several websites that list profane words. They may not like being mentioned in your article but there's several of them. They even offer less offensive alternative words/phrases which we used in our database."


Since it rose to public attention earlier this month, plenty of famously profane books have been put through the filter - including American Psycho and some erotic literature - and, as expected, they come out sounding like something from a sitcom set in Pleasantville in 1952.


Many authors were not happy that their books were available in the million-strong library. Joanne Harris, author of Chocolat, led the charge: she told us she objected to the app changing what she had written, and engaged the app makers in a series of emails. After a week of clashes with authors, Clean Reader stopped selling ebooks on Thursday night.


In a statement on Facebook, it said, "Over the last several days we have been asking for and receiving significant feedback from authors. A common theme in all of it is that many authors do not want their books being sold in connection with Clean Reader." The app is still available, however, and the team behind it are working on "the next release" which will be made "in response to the feedback we have received from many authors and users".


Before the U-turn, the makers of the app were quick (and increasingly desperate) to point out that Clean Reader does not actually physically change the text of the book, which would put them on shaky ground both ethically and legally. It's not censorship or copyright infringement, they insisted.


Instead, the offending word is blanked out and the reader can tap it to see a replacement word (darn, bother, bottom etc) if they want to. The reader is free to turn the filter off any time they like. That, according to the app makers, is what makes it OK. A similar service for film, named Clean Films, was shut down by a court ruling in 2006.


Maughan told us, "WE DO NOT SELL EDITED COPIES OF BOOKS! Please make this very clear in your article." He went on to ask us to be explicit about the fact that people can turn the Clean Reader feature off and read the book as it was written - thus placating authors (he thought) but also negating the whole point of the app. "It's really no different than if someone decided to take a Sharpie to their book and cover up all the swear words," he insisted. "Believe it or not this is happening."


The complaints levelled at Clean Reader fall into three categories. First, the app betrays a broad lack of understanding of literature and of why authors might be upset by it. Second, because books arrive in the store through broad-stroke publishing house deals, authors have not given permission for the app to tamper, temporarily or otherwise, with their work. And third, because "cleaning" a book is an automated process, no care is taken to make sure the suggested replacements even make sense - the app makers gave no thought to the replacement words, having taken them wholesale from existing lists.


The art of careful moderating


Whatever your feelings about the mechanics or the outcome, the broader reasoning behind the app makes a certain amount of sense. Swearing is not allowed on radio or live TV before a certain time of night, clean versions of songs and albums are produced for radio and streaming play, and films are restricted by age ratings based on profanity as well as content.


The publication of 50 Shades of Grey, which started life as Twilight fan fiction, prompted some to call for age ratings on books. But while some pop-lit titles, like the gross-out Wetlands, come with stickers that say "Not suitable for younger readers", there's no law against a teenager going into a bookshop and buying them. You can see why a parent might want an alternative.


But neither film nor music uses machine algorithms to decide which words should and shouldn't be removed or dialled down. Removing swear words from songs is still done manually: the mix engineer works to a list of words that radio stations will not play, then goes through each track and edits them out by hand. Sometimes swear words are simply cut, leaving just the backing music; sometimes they are replaced with funny sounds; and very occasionally they are replaced with an alternative word provided by the artist.


More often than not, swear words are replaced with the reversed audio. That's why you get a lot of "ish" sounds in hip-hop; what you're hearing is 'shit' played backwards. But there is always a human removing the words and at least some contextual thought goes into how to replace them. What's more, the artist is always at least aware that the process is taking place.


In the age of music streaming, record labels provide streaming services with clean versions of many albums, just as they provide radio stations with radio edits of songs. On streaming services like Spotify, these clean edits are sometimes set as the default version of an album, which can be jarring when particularly profane songs become a hotchpotch of articles, determiners and quantifiers. Spotify is keen to point out that the explicit version is never removed - but you might have to go looking for it.


There's no automated process in film classification either, at least in the UK. The BBFC has stopped relying on a list of swear words to decide what is and isn't suitable for each age rating. It now takes the context of the language into account - a lighthearted "fuck" is rated lower than an aggressive or insulting "fuck", for example. You can even have a "cunt" in a 15-rated film these days, as long as it is "justified by the context".


Every film is watched by at least one human being, with the most controversial movies watched by several. The BBFC's director and president take ultimate responsibility for the rating of every film. It's not for nothing that there are real people's signatures on the classification certificate you see before every movie screened in the UK.


The BBFC is by no means a perfect organisation with a perfect system, and has had many complaints levelled at it in its time. But imagine if its process were taken over by machines working to algorithms: to cut and censor a film without a full and deep appreciation of its contextual meaning and nuances seems anti-democratic, almost dictatorial.


Of course, the BBFC and Clean Reader are very different things. Clean Reader only wants to offer you the chance to read a book without profanity - it's not taking out rape scenes, cutting violence or erasing suicide, as Thomas Bowdler and his sister did in The Family Shakespeare, published in 1807, very certainly without the Bard's posthumous permission. Nor is it providing a standard against which all books are rated, or saying "this book is now suitable for a 12 year old to read". But in the digital age of instant access, there are parallels.


Which raises the question: can you really apply rigorous machine thinking to something with as many shades of grey as film or literature? Programming a machine to replace swear words means declaring "if x, then replace with y" - if 'fuck', then 'darn' - as though there were a right and a wrong, a catch-all solution. But that doesn't allow for the hundreds of tiny decisions that go into every word of a novel.


"Each word in my books is carefully chosen," Melissa Harrison, author of Clay and At Hawthorn Time, told us. "Swear words perhaps more than most, because they are powerful. When Denny calls Jozef a 'cunt' in my first novel, Clay, it's supposed to be shocking. Moreover, the insult Denny chooses is very precisely determined by his class, age and background; there is no other word that a man like him would use in that circumstance. The word is exactly right."


Similarly, Joe Dunthorne, author of Submarine and Wild Abandon, feels protective over his words. "I take so much time and pride over my choice of profanity that it would feel like a betrayal [to have them changed]. It serves a narrative purpose as well. Specifically in Submarine, the choice of sex words - there's a very long sex scene and the whole idea was that it runs in real time so as you read the sex scene happens at the speed at which the actual sex is occurring. The idea is that as he progresses towards orgasm his choice of sex words become ever more florid and out there, so you can track the progress of his emotional state through the choice of profanity. And that is obviously a deliberate effect that would be lost if any of those words were tagged out in favour of something else."


That's not to say that a human touch is necessarily the answer. The authors we spoke to weren't really any more open to having their books 'cleaned' by a human or a machine. "By definition changing the word is making the wrong choice, so what kind of wrong choice doesn't really matter," Dunthorne said. "I don't think a human's going to have a much more nuanced take on it."


Somewhat misguidedly, Clean Reader's people responded to upset authors by likening literature to a salad. In this analogy, the salad is the book and the blue cheese is the profanity. "Is the chef offended when I don't eat the blue cheese? Perhaps. Do I care? Nope. I payed [sic] good money for the food and if I want to consume only part of it then I have that right. Everyone else at the table can consume their food however they want. Me removing the blue cheese from my salad doesn't impact anyone else at the table."


This blog post prompted the following reaction from Joanne Harris: "NOTE: A BOOK IS NOT A FUCKING SALAD"


If the question of profanity in literature is such a big deal to readers, Harrison is happy for them to forgo her books. Dunthorne has an alternative solution: let authors who are willing provide an alternative themselves.


The algorithmic approach comes with too little care, and a human moderator Ctrl+F-ing their way through a book isn't going to do much better. "It's interesting to think about whether as a writer you might be given the option to provide alternative words," he suggested. "I.e. if you could basically present an alternative, age-rated version of the text where you as the writer get to choose how you have it softened."


Smart***ches and provocateurs


The question of moderation does not start and end with the arts, of course. There has been mass outrage at Twitter removing tweets, for example, and the number of government takedown requests made to Google has generally increased since the company first reported the figures in 2009.


Facebook, Instagram and every other site that publishes user content have an obligation to remove harmful images and text, although where the line is drawn has always been something of a grey area. Swearing can seem like the least of the internet's problems; trying to stem the profanity online would surely give even the most determined algorithm a breakdown.


Before it moved to Disqus, TechRadar's own comment system took an approach to swearing similar to Clean Reader's. That's why, for a time, if you left a comment about a smartwatch, it was published as smart***ch.


The mechanics at work didn't understand that you were talking about the Samsung Galaxy Gear; the filter just thought you were calling us twats. Sure, that could have been solved by a whole-word-only rule, but the point is that context is never taken into account. There is a world of difference between a comment tossed online and a published work the writer agonised over for years, yet a filter like this can obscure your meaning even when your meaning was inoffensive.


Facebook, YouTube and other high-volume websites use a mixture of automated processes and people to keep "dick pics and beheadings" out of the average person's line of sight. Wired's exposé on the moderation farms, published in late 2014, shone a light on the problems this can cause.


Commercial content moderation (CCM), where companies are set up to offer human moderation as a service, is not exactly hidden by the big internet companies, but it isn't exactly raved about either. Possibly that's because of the potential health problems of repeatedly and continually exposing people to video and images of bestiality, violence and gore.


Sarah Roberts, Assistant Professor at the Faculty of Information and Media Studies (FIMS) at Western University, is one of the few people to have studied online content moderation from an academic standpoint. "The long-term effects of such exposure have yet to be studied (these moderation practices, organised into commercial content moderation among several different industrial sectors, are a relatively new phenomena)," she told us. "Certainly the short-term effects aren't great, either. People suffer burnout, or go home thinking about the terrible content they've viewed. Perhaps even worse yet, in some cases, they become inured."


The difference between a machine moderating text and images and a person making a judgement call is vast. "Commercial content moderation is a highly complex and sophisticated task that calls upon a number of different faculties, including cultural, linguistic and other bodies of knowledge, as well as the ability to adjudicate based on a variety of policy issues as well as the much less tangible matter of taste. For those moderation tasks that are driven algorithmically, the content is often fairly simplistic (e.g., text comments).


"Humans are needed for intervention that computers simply cannot reproduce, and that comes in pretty much at any level beyond the most basic, and certainly where images and video are concerned," she added. To suggest that literature is basic enough to be moderated by a machine is something approaching an insult to anyone who ever studied a book at school or university, or made a career from studying or writing novels.


While no one has yet actively studied the effect of moderation on what people create and publish online and digitally, Roberts suggests that it is an area that is "ripe for investigation". Although Facebook, for instance, now publishes quite comprehensive community standards and has an expert team overseeing the process, the act of moderating content is not discussed or done in plain sight. "How would people respond if they knew the mechanisms by and the conditions under which their content was being adjudicated and evaluated?" asks Roberts. "How does that influence our online environment, collectively? And what does it mean when the sign of doing a good job of [content moderation] is to leave no trace at all?"


The questions outweigh the answers. How do we know if the humans working in content moderation farms are deleting photos of Michelangelo's David or other classic nudes from Instagram or Facebook? Why does Apple get to decide what nude photography is art and what is porn? How do we know if Netflix has removed a scene from a film? How do we know if an ebook service has changed the way we interpret a book simply by removing a swearword or underlining a popular phrase? What's to say all of these tiny moderations aren't working together to change the way art is created, appreciated and enjoyed?


We don't yet know what damage moderation, automated or otherwise, could be having on the arts. We have become accustomed to film, music and literature being thoughtfully processed by bodies like the BBFC, publishers and radio stations, and are generally aware of what has been cut, what has been deemed unacceptable and why. This provides not only a framework to work within but also something for provocateurs to rail against. The internet provides a vast opportunity for distribution, but an even bigger one for anonymous and unjustified editing.


Authors like Melissa Harrison and Joe Dunthorne are certain that apps like Clean Reader won't change the way they write. Harrison told us, "What I won't do – and will never do – is write dialogue that doesn't ring true, or use anything less than exactly the right word in the right place." Dunthorne agrees, "I don't think it would change the way I write. Things far more important than swearing don't."


But how can we be sure that authors, filmmakers, musicians and artists of the future, creatives who have grown up with moderators assessing their digital output at every turn, will say the same?

