GLTR is a tool developed by the MIT-IBM Watson AI Lab and HarvardNLP for forensic detection of automatically generated text. It estimates whether a passage was produced by a language model by analyzing how predictable each word is under such a model.
GLTR visualizes the output of OpenAI's GPT-2 117M language model, ranking each word in a text by how likely the model was to produce it. The most likely words are highlighted in green, followed by yellow and red, with the remaining low-probability words shown in purple. This gives a direct visual indication of how probable each word was under the model, making computer-generated text easier to identify.

GLTR also displays three histograms that aggregate information over the whole text: the first counts how many words fall into each color category, the second illustrates the ratio between the probability of the top predicted word and that of the word actually used, and the third shows the distribution of the entropies of the model's predictions. Together, these histograms provide additional evidence of whether a text was artificially generated.

GLTR can be used to detect fake reviews, comments, or news articles generated by large language models, which can produce text that is indistinguishable from human writing to a non-expert reader. GLTR can be accessed through a live demo, and the source code is available on GitHub. Researchers can also read the ACL 2019 demo track paper, which was nominated for best demo.
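The core statistics behind the visualization can be sketched in a few lines. The code below is a simplified illustration, not GLTR's actual implementation: it assumes we already have the model's next-word probability distribution at each position (the toy `dist` dictionaries stand in for real GPT-2 outputs), computes the rank of the word actually used, buckets it into a color category, and derives the ratio and entropy statistics the histograms summarize. The function names (`rank_of`, `color_bucket`, and so on) are hypothetical.

```python
import math

def rank_of(word, dist):
    """Rank of `word` when candidates are sorted by descending
    probability (1 = the model's most likely prediction)."""
    ordered = sorted(dist, key=dist.get, reverse=True)
    return ordered.index(word) + 1

def color_bucket(rank):
    """Bucket a rank into GLTR's color categories: top 10 green,
    top 100 yellow, top 1000 red, everything else purple."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"

def top_ratio(word, dist):
    """Ratio between the top predicted word's probability and the
    probability of the word actually used (second histogram)."""
    return max(dist.values()) / dist[word]

def entropy(dist):
    """Shannon entropy (in bits) of the predicted distribution
    (third histogram)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Toy distribution for one position in the text.
dist = {"the": 0.5, "a": 0.3, "dog": 0.15, "xylophone": 0.05}
print(rank_of("dog", dist), color_bucket(rank_of("dog", dist)))
```

Human-written text tends to land more often in the red and purple buckets and shows higher top-word ratios, whereas sampled model output clusters heavily in green, which is why the color pattern and histograms are informative.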