Online reviews serve as a guide for consumer choice. With advancements in large language models (LLMs) and generative AI, the fast and inexpensive creation of human-like text may threaten the feedback function of online reviews if neither readers nor platforms can differentiate between human-written and AI-generated content. In two experiments, we found that humans cannot recognize AI-written reviews. Even with monetary incentives for accuracy, both Type I and Type II errors were common: human reviews were often mistaken for AI-generated reviews, and even more frequently, AI-generated reviews were mistaken for human reviews. This held true across various ratings, emotional tones, review lengths, and participants’ genders, education levels, and AI expertise. Younger participants were somewhat better at distinguishing between human and AI reviews. An additional study revealed that current AI detectors were also fooled by AI-generated reviews. We discuss the implications of our findings for trust erosion, manipulation, regulation, consumer behavior, AI detection, market structure, innovation, and review platforms.
Oftentimes reviews are written by people for whom English is not a first language, or the reviews are machine translated (nowadays with AI itself). Many AIs train on real reviews as templates, so it's not surprising to me that the differences aren't easily spotted. The main tells are overly flowery language and tone, and failing to get to the point.
To me it seems similar to hearing an electronic speaker playing a bird call vs. a real bird: could you tell which one was real from a distance?
If you were an expert birdwatcher you could probably tell more easily. If the speaker repeated the exact same call on a loop, you could tell; if you were in earshot of electronic buzzing in the background, you could too. But depending on how sophisticated the speaker setup is (delays, a variety of calls), you might not.