PT-3 Creative Fiction
Creative writing by OpenAI’s GPT-3 model, demonstrating poetry, dialogue, puns, literary parodies, and storytelling. Plus advice on effective GPT-3 prompt programming & avoiding common errors.
The latest and greatest neural network for unrestricted natural language generation is OpenAI’s GPT-3. GPT-3 is like GPT-1 and the GPT-2 I’ve used extensively before1—only much more so, and then going beyond them in a fascinating new way.
GPT-3’s samples are not just close to human level: they are creative, witty, deep, meta, and often beautiful. They demonstrate an ability to handle abstractions, like style parodies.
Scaling works: quantity is a quality all its own.The scaling of GPT-2-1.5b by 116× to GPT-3-175b has worked surprisingly well and unlocked remarkable flexibility in the form of meta-learning, where GPT-3 can infer new patterns or tasks and follow instructions purely from text fed into it. What can we do with GPT-3? Here, we’re all about having fun while probing GPT-3’s abilities for creative writing tasks, primarily (but far from limited to) poetry. Fortunately, OpenAI granted me access to their Beta API service which provides a hosted GPT-3 model, letting me spend a great deal of time interacting with GPT-3 and writing things. Naturally, I’d like to write poetry with it: but GPT-3 is too big to finetune like I did GPT-2, and OA doesn’t (yet) support any kind of training through their API. Must we content ourselves with mediocre generic poetry, at best, deprived of finetuning directly on chosen poetry corpuses or authors we might like to parody? How much does GPT-3 improve and what can it do?
Turns out: a lot! Below, I walk through first impressions of using GPT-3, and countless samples. In the latest twist on Moravec’s paradox, GPT-3 still struggles with commonsense reasoning & factual knowledge of the sort a human finds effortless after childhood, but handles well things like satire & fiction writing & poetry, which we humans find so difficult & impressive even as adults. In addition to the Cyberiad, I’d personally highlight the Navy Seal & Harry Potter parodies, the Devil’s Dictionary of Science/Academia, “Uber Poem”, “The Universe Is a Glitch” poem (with AI-generated rock music version), & “Where the Sidewalk Ends”.
What BENCHMARK miss
The GPT-3 paper includes evaluation of zero-shot/few-shot performance across a wide range of tasks, but I fear that unless one is familiar with the (deadly dull) benchmarks in question, it won’t be impressive. You can skip to the appendix for more example like its poems, or browse the random samples.
The original OpenAI Beta API homepage includes many striking examples of GPT-3 capabilities ranging from chatbots to question-based Wikipedia search to legal discovery to homework grading to translation; I’d highlight AI Dungeon‘s Dragon model (example), and “Spreadsheets”/“Natural Language Shell”/“Code Completion”2. Andrew Mayne describes using GPT-3 to generate book recommendation lists & read interactive stories & engage in conversations with historical figures like Ada Lovelace3, summarize texts (such as for elementary school children, also available as a service now, Simplify.so) or summarize movies in emoji (Matrix: “🤖🤐”; Hunger Games: “🏹🥊🌽🏆”), convert screenplay ↔︎ story, summarize/write emails, and rewrite HTML. Paras Chopra finds that GPT-3 knows enough Wikipedia & other URLs that the basic Q&A behavior can be augmented to include a ’source’ URL, and so one can make a knowledge base ‘search engine’ with clickable links for any assertion (ie. the user can type in “What year was Richard Dawkin’s The Selfish Gene published?” and GPT-3 will return a tuple like (“The Selfish Gene was published in 1976″,”https://en.wikipedia.org/wiki/The_Selfish_Gene”) which can be parsed & presented as a search engine). Hendrycks et al 2020 tests few-shot GPT-3 on common moral reasoning problems, and while it doesn’t do nearly as well as a finetuned ALBERT overall, interestingly, its performance degrades the least on the problems constructed to be hardest.
Ryan North experimented with Crunchyroll anime, Star Trek: The Next Generation, & Seinfeld plot summaries. Max Woolf has a repo of GPT-3 example prompts & various completions such as the original GPT-2 “unicorn” article, Revenge of the Sith, Stack Overflow Python questions, and his own tweets (note that many samples are bad because the prompts & hyperparameters are often deliberately bad, eg the temperature=0 samples, to demonstrate the large effect of poorly-chosen settings as a warning). Janelle Shan experimented with weird dog descriptions to accompany deformed GAN-dog samples, and 10,000-year nuclear waste warnings based on the famous 1993 Sandia report on long-time nuclear waste warning messages for the Waste Isolation Pilot Plant. Summers-Stay tried imitating Neil Gaiman & Terry Pratchett short stories with excellent results. Arram Sabetti has done “songs, stories, press releases, guitar tabs, interviews, essays, and technical manuals”, with his Elon Musk Dr. Seuss poems a particular highlight. Paul Bellow (LitRPG) experiments with RPG backstory generation. Merzmensch Kosmopol enjoyed generating love letters written by a toaster. James Yu co-wrote a SF Singularity short story with GPT-3, featuring regular meta sidenotes where he & GPT-3 debate the story in-character. Daniel Bigham plays what he dubs “19 degrees of Kevin Bacon” which links Mongolia to (eventually) Kevin Bacon. Alexander Reben prompted for contemporary art/sculpture descriptions, and physically created some of the ones he liked best using a variety of mediums like matchsticks, toilet plungers, keys, collage, etc.
Harley Turan found that, somehow, GPT-3 can associate plausible color hex codes with specific emoji. Even more perplexingly, Sharif Shameem discovered that GPT-3 could write JSX (a Javascript+CSS hybrid) according to a specification like “5 buttons, each with a random color and number between 1–10” or increase/decrease a balance in React or a very simple to-do list and it would often work, or require relatively minor fixes. GPT-3 can also write some simple SVG shapes or SVG/Chart.js bar graphs, do text→LaTeX and SQL queries. While I don’t think programmers need worry about unemployment (NNs will be a complement until they are so good they are a substitute), the code demos are impressive in illustrating just how diverse the skills created by pretraining on the Internet can be. (Source: https://www.gwern.net/)