The internet is not ready for the flood of AI-generated text

The way that many of our systems currently focus on engagement makes them particularly vulnerable to the incoming wave of content from tools like GPT-3


by Christopher Brennan*


*Christopher Brennan is the editor-in-chief of Deepnews.ai

Most of the words you’ve ever read were written by a human, but that could soon change.

And if nothing else changes along with it, the internet will be a much different place.

Over at the Deepnews.ai blog, I have written a couple of posts about algorithmically generated text, particularly GPT-3, the tool from San Francisco-based OpenAI that has surprised many with its ability to write with relative sophistication when given a simple prompt. You’ve probably seen its tricks in write-ups such as the widely shared Guardian article.

One of my posts was based on an interview with Phillip Winston, a software engineer in Virginia who helped figure out that a bot on Reddit was answering thousands of questions on a forum using an application of GPT-3 called Philosopher AI. Another was based on a conversation with Liam Porr, a student at UC Berkeley who used the technology to write a self-help blog, feeding it prompt titles such as “Boldness and creativity trump intelligence.”

All of the attention around sophisticated text generation has prompted sci-fi speculation and talk of using it as a building block of artificial general intelligence: machines with human-like reasoning capabilities, which is OpenAI’s stated goal and what you picture when you read Asimov. Given what I have seen of it, I agree more with machine learning pioneer Yann LeCun, who cautions that GPT-3 is really just a language model.

But we don’t need hyper-intelligent machines to dramatically change the way the internet works. My recent conversations kept returning to what happens when AI text-generation capabilities become more widespread and can produce what Winston calls “10,000 Wikipedias” worth of text in a very short time. You may already be weary of our current state of “too much content,” but it is about to get far, far worse.

Some of the closest possibilities are commercial. OthersideAI has just raised millions in seed funding for an application of GPT-3 that writes automatic emails for salespeople in the style of their choice. Porr, after revealing his blog as automated, wrote about automated copywriting: a system could generate several options and then automatically A/B test them to see which gets the most engagement, as in the sketch below.
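To make that concrete, here is a minimal sketch, in Python, of what that optimization loop could look like. Everything in it is hypothetical, the copy variants, the click rates, the epsilon-greedy strategy; it illustrates the general technique, not anyone’s actual product.

```python
import random

# Hypothetical copy variants, standing in for text a model like GPT-3
# might generate from a single prompt.
variants = [
    "Boost your sales with one click.",
    "See why 10,000 teams switched this year.",
    "Stop overthinking your outreach emails.",
]

# Track impressions and clicks per variant.
stats = {v: {"shown": 0, "clicks": 0} for v in variants}

def pick_variant(epsilon=0.1):
    """Epsilon-greedy: usually show the best-performing variant,
    occasionally explore the others."""
    if random.random() < epsilon or all(s["shown"] == 0 for s in stats.values()):
        return random.choice(variants)
    return max(variants, key=lambda v: stats[v]["clicks"] / max(stats[v]["shown"], 1))

def record(variant, clicked):
    stats[variant]["shown"] += 1
    if clicked:
        stats[variant]["clicks"] += 1

# Simulate traffic: each variant has an unknown "true" click rate.
true_rates = dict(zip(variants, (0.02, 0.05, 0.08)))
for _ in range(10_000):
    v = pick_variant()
    record(v, random.random() < true_rates[v])

for v in variants:
    s = stats[v]
    print(f"{s['clicks'] / max(s['shown'], 1):.3f} click rate | {v!r}")
```

Run enough traffic through a loop like this and the winning variant surfaces on its own; the writer never has to be in the room.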

Engagement, however, is where I have worries about text generation.

Engagement, whether measured in clicks, likes, or time on page, is the relentless pursuit of many leading internet companies. Facebook, for one, is hesitant to crack down on divisive content precisely because it brings so much engagement, and with engagement comes revenue. Algorithms that recommend posts to users often tilt the playing field towards the kinds of posts that make people engage, helped along by behavioral data that makes targeting as efficient as possible.

One of the most interesting things we have seen from recent applications of GPT-3 is that they have proved human users will engage with machine-written text unknowingly. Porr’s first GPT-3 post, “Feeling unproductive? Maybe you should stop overthinking,” which, to be fair, he did edit himself, made it to the top of the social news site Hacker News because of how much engagement it received.

Some commenters thought the post was robot-made, though those comments were down-voted by other users who liked it, making those doubts less visible. If you can fool some people sometimes, you don’t need to fool all the people all the time.


When I think about what the internet may become with AI-generated text from commercial and other applications, I think about tourist traps like New York City’s Times Square (the current, glitzy version, though there is also the possibility that AI-generated text will move us closer to the seedier 1970s version).

You go to Times Square and are surrounded by different things competing for your attention and your money, from guides hawking bus tours, to the army of aggressive Elmos who want you to pay to take a picture, to the white glow of 1,000 advertisements merging into one unified spotlight. All new arrivals make a pass by Times Square, though after a few minutes they will want to head off to find the “real,” authentic New York City.

The problem with a flood of AI-generated content is that it may turn all streets into Times Square, whether you realize it or not. All the advertisements in Times Square are labeled as such and are not trying to trick you. In a world that is flooded by text from GPT-3, you could walk away from a touristy area to a neighborhood that looks “real,” though it would be a tourist trap as well. With the amount of data that exists about your behavior on the internet, AI-generated text can create articles and entire websites that are targeted towards you and designed to make you engage.

Of course, there are places that you can navigate to where you know the intentions are good. In our New York analogy, if you know where you are going, you can head over a block to 8th Avenue where the New York Times building is. However, as Kevin Roose pointed out in the Times last year, you may have to pay a subscription fee for quality (or to access his article).

There are alternatives for quality content online that aren’t paid, the equivalent of the free New York Public Library over on 5th Avenue, though NYPL of course has the benefit of being well known. When a huge percentage of the content online is algorithmically created to extract something from you, you will always have to be on guard in a neighborhood you don’t know. It will make it increasingly difficult to find and trust smaller outlets such as non-profit newsrooms trying to cover their communities.

On some level, algorithmic text that just wants to push you to buy something or promote a brand, a sort of unlabeled “sponsored content,” is not the biggest worry. The Stanford Internet Observatory’s Renée DiResta makes the case in The Atlantic that technologies such as GPT-3 may dramatically impact the world of misinformation and disinformation, creating an infinite supply of fake news.

The most sophisticated efforts will also be micro-targeted at specific groups and engineered to generate the maximum amount of engagement from real, human users. False news created by humans is already many times more likely to be shared online than true news, but the advent of AI-generated text means that fake news can be optimized to be hyper-efficient at whatever its purpose is. The dataset that was used to train GPT-3 likely contains everything from news to books on the songbirds of Bolivia to the manifestos of mass shooters, and it can generate all sorts of content to make people happy, sad, infuriated, or radicalized. As DiResta brings up in her piece, machine-generated messages can be used to create entire AI-generated personas of people who don’t actually exist but become powerful influencers.

One of the most obvious first steps forward, which should be put in place for every output of tools such as GPT-3 no matter how much or how little human editing was involved, is labeling AI-generated content so that people know what they are reading. Though it may seem counterintuitive in a world of “too much content,” it is the Louis Brandeis approach: the remedy for speech we don’t like is “more speech.”

Beyond that, the way to fight AI-generated text at scale is by creating technical tools that can identify it at scale. This will mean making tools that look more closely at the qualities of the text itself, as Deepnews does with its quality score. It will also mean an arms race between increasingly sophisticated machine text and the means to spot it.
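What might such a tool look like? Here is a minimal sketch of one common approach: scoring text by how statistically predictable it is to a language model, since machine-written prose tends to be more predictable than human writing. It assumes the Hugging Face transformers library and the small GPT-2 model, and the threshold is an illustrative placeholder rather than a calibrated value; this is not how Deepnews computes its quality score.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# One common detection signal: machine-written text tends to be more
# "predictable" to a language model than human prose, i.e. it has
# lower perplexity. A threshold on perplexity makes a crude classifier.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2 (lower = more predictable)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

THRESHOLD = 40.0  # illustrative cutoff, not a calibrated value

def looks_machine_generated(text: str) -> bool:
    return perplexity(text) < THRESHOLD

print(looks_machine_generated("The quick brown fox jumps over the lazy dog."))
```

This sketch also shows why it is an arms race: sampling tricks can push machine text toward more human-like unpredictability, eroding exactly the signal a detector like this relies on.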

While these tools will help keep people from being flooded with AI content, they will also create two separate internets. One will be open, filled with potential dangers as well as the undiscovered gems of human creativity (or maybe even beautiful insights generated by machines) floating out there in a sea of bad intentions.

The other will be walled off, an extension of the current trend of paying for quality news, but even more locked down. There will be CAPTCHAs and verified identities and security checks on what you are doing online. You may find yourself having to work harder and harder to prove you are human.

christopher@deepnews.ai
