Why AI Chatbots Struggle in Long Conversations: A Study on Performance Decline.
Chatbot Research Findings
According to Novyny.live: A joint study by Microsoft Research and Salesforce has revealed a significant drop in the reliability of AI chatbots during extended conversations, despite their strong performance on short, single-turn queries. The analysis of over 200,000 dialogues uncovered critical weaknesses in how these models handle prolonged interaction.
Models Under Scrutiny
The research evaluated several leading AI models, including:
- GPT-4.1
- Gemini 2.5 Pro
- Claude 3.7 Sonnet
- o3
- DeepSeek R1
- Llama 4
While these models can achieve success rates of around 90% on single-step tasks, their performance plummets to approximately 65% in multi-turn, long-form dialogues. This highlights a substantial degradation in their ability to provide coherent and accurate responses as conversations progress. This performance gap is a key challenge for developers aiming to use AI for complex customer service or technical support.
The study quantified this decline, noting a roughly 15% reduction in the models' core capability, while the rate of unreliable answers surged by 112%. Furthermore, responses in multi-step formats became 20% to 300% longer. These metrics underscore the difficulty AI systems have in maintaining context and quality in sustained exchanges, where they may lose track of the subject or generate inappropriate replies.
Consequently, the research raises serious concerns about the effectiveness and dependability of current AI models in complex, real-world interaction scenarios. The findings have significant implications for businesses integrating chatbots into their operations, as a drop in reliability during long dialogues can directly lead to poor user experiences and erode trust. This emphasizes the urgent need for continued development and refinement of AI technologies to ensure stable and effective communication across all interaction formats.
Read also
- Foldable Android phones last just two to three years—here’s why
- Unreliable Turbo Engines: The Most Troublesome Powerplants and How to Protect Them
- JD Power Reliability Rankings: BMW Leads While Audi and Mercedes Lag Behind
- Ukraine’s 'Barracuda' Drone Boat Transforms Into a Mini Aircraft Carrier—Here’s How Many FPV Drones It Carries
- Stuck or Dead Pixels on Your Monitor: Can You Fix the Problem Yourself?
- Brick and Tile-Like Solar Panels: A Game-Changer for Heritage Buildings

