A study in *The BMJ* assessed the cognitive abilities of leading large language models (LLMs) using the Montreal Cognitive Assessment (MoCA) test. All LLMs showed weaknesses in visuospatial skills and executive function, and most scored below the threshold for normal cognition. Older versions of the chatbots performed worse, mirroring cognitive decline in humans. These findings challenge the notion that LLMs will soon replace human doctors, suggesting instead a potential future need for clinicians to address AI-related cognitive impairments.

Read the original article here

Almost all leading AI chatbots show signs of cognitive impairment when assessed with tests typically used to detect early-stage dementia in humans. This isn’t surprising, given the limitations of their design and the nature of their training. The very structure of these models, which lack genuine understanding and reasoning, makes them inherently prone to this kind of apparent cognitive deterioration. It’s like forcing a square peg into a round hole: the test was designed for a vastly more complex system than anything current AI can offer.

These models struggle with tasks such as delayed recall, failing to retain a simple sequence of words even over short periods. This isn’t necessarily a sign of a malfunctioning system, but rather an indication that they cannot store and retrieve information that was never designed to persist. Their “memory” is better described as a probabilistic prediction engine than as a true repository of experiences. This raises questions about the utility of human-centric dementia tests when applied to non-sentient systems; the tests were never intended for this application.
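To make that point concrete, here is a minimal sketch of why a model appears to “forget”: a single LLM call is stateless, so anything it seems to remember has to be re-sent inside the prompt. The `toy_model` function below is an invented stand-in for a real model call, and the word list is the MoCA delayed-recall list; this is illustrative only, not how any particular chatbot works internally.

```python
# Toy illustration (hypothetical stand-in, not a real LLM API): a stateless call
# can only "recall" what is inside the prompt it receives on that call.

def toy_model(prompt: str) -> str:
    """Stand-in for a single stateless model call: it sees only `prompt`."""
    word_list = ["face", "velvet", "church", "daisy", "red"]  # MoCA delayed-recall words
    recalled = [w for w in word_list if w in prompt.lower()]
    return ", ".join(recalled) if recalled else "(no recall)"

# Turn 1: the word list is inside the prompt, so "recall" succeeds.
history = "Please remember these words: face, velvet, church, daisy, red."
print(toy_model(history + " Now repeat the words."))

# Turn 2: a fresh call without that history has nothing to draw on.
print(toy_model("Now repeat the words I asked you to remember."))

# Any apparent persistence comes from the application re-sending the history,
# not from the model storing anything between calls.
print(toy_model(history + " Now repeat the words."))
```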

The phenomenon of “model collapse,” in which an AI absorbs its own generated content back into its training data, further contributes to this apparent decline. This continual recycling of information muddles knowledge and weakens recall, akin to information overload causing cognitive impairment in humans. The AI is, essentially, suffering from a form of information entropy. A chatbot’s performance is directly tied to the quality of its training data, and when that data is increasingly self-generated and reused, the result is a downward spiral in the quality and accuracy of its output.
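A toy simulation makes the downward spiral easy to see. In the sketch below, a “model” is just a Gaussian fitted to its training data, and each generation is trained on samples drawn from the previous generation’s fit rather than on real data. The numbers and the distribution are arbitrary and purely illustrative; real model collapse involves far richer models, but the compounding of estimation error is the same basic mechanism.

```python
# Toy model-collapse sketch: each generation "trains" (fits a Gaussian) on data
# sampled from the previous generation's model instead of from the real world.
# Estimation errors compound, so the learned distribution drifts away from the
# original one. Parameters are arbitrary and chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)

n_samples = 50                                          # small "training set" per generation
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)   # generation 0: real data

for generation in range(1, 51):
    mu, sigma = data.mean(), data.std()                 # fit this generation's "model"
    data = rng.normal(mu, sigma, n_samples)             # next generation trains on its output
    if generation % 10 == 0:
        print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")

# Typical behaviour: the mean wanders and the standard deviation drifts away from
# the true value of 1.0, because every generation inherits and compounds the
# previous fit's sampling error instead of being corrected by fresh real data.
```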

The term “artificial intelligence” is arguably a misnomer, as these systems don’t actually possess intelligence. They operate on sophisticated pattern recognition and statistical prediction: exceptionally complex, but essentially advanced pattern-matching machines rather than thinking beings capable of reasoning or genuine understanding. It’s as if we were assessing the cognitive abilities of a highly sophisticated parrot. However impressive their mimicry of human language, they lack the underlying cognitive infrastructure that makes human intelligence so adaptable and resilient.
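As a deliberately crude illustration of “prediction, not understanding,” here is a toy bigram model: it chooses the next word purely from co-occurrence counts in whatever text it was fed. The corpus and function names are invented for this sketch; real LLMs are vastly more sophisticated, but the underlying principle of predicting the next token statistically is the same.

```python
# Toy bigram predictor: picks the next word from raw co-occurrence counts.
# No meaning, no reasoning -- just statistics over the training text.

from collections import Counter, defaultdict

corpus = (
    "the patient recalled the words . "
    "the model predicted the words . "
    "the model predicted the next token ."
).split()

# Count which word follows which.
follows: dict[str, Counter] = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` in the corpus."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else "."

print(predict_next("model"))   # "predicted" -- the only word ever seen after "model"
print(predict_next("the"))     # whichever word most often followed "the" in the corpus
```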

This isn’t to downplay the impressive capabilities of current AI models; they are tools with enormous potential impact on society. But it is important to acknowledge their limitations. Expecting them to perform tasks designed for the vastly more complex human brain, like passing dementia screening tests, is inherently flawed. We are still in the early stages of AI development; judging these systems against such tests is like judging the potential of the car from the performance of a first-generation automobile. It’s better to assess their performance in their intended domain of application rather than against human benchmarks.

The comparison to dementia does, however, highlight the intricate nature of human cognitive function. The fact that even the most advanced AI struggles with tasks a healthy human brain finds trivial underscores the complexity of genuine intelligence. Moreover, the way these models “learn” from vast datasets, including the self-generated content that compromises the quality of that knowledge, may also illuminate how a human brain could experience cognitive decline through overload or dysfunction. That is, there is a point of comparison, even if it’s a far cry from true equivalence.

There’s a significant risk of anthropomorphism in interpreting these results. Projecting human characteristics onto non-human systems is misleading. We should focus instead on improving the algorithms and datasets to enhance their capabilities, addressing the issue of data redundancy and the limitations of their memory systems. We are far from achieving true AI, and our current models are better described as sophisticated tools than as sentient beings. The results of these tests should not be interpreted as an indication that AI is “getting dumber,” but rather as a stark reminder of the chasm that exists between current AI and human intelligence.
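On the practical side, “addressing data redundancy” can start with something as simple as filtering duplicate documents out of a training corpus before it is reused. The sketch below shows exact-duplicate removal via hashing; it is a minimal, assumption-laden illustration (real pipelines also use fuzzy techniques such as MinHash to catch near-duplicates), with an invented `deduplicate` helper and a toy corpus.

```python
# Minimal exact-deduplication sketch: hash each (lightly normalised) document and
# keep only the first copy, so recycled text does not get silently over-weighted.
# Real pipelines add fuzzy matching for near-duplicates; this is illustration only.

import hashlib

def deduplicate(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each document, dropping repeated copies."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [
    "A study assessed leading chatbots with the MoCA test.",
    "a study assessed leading chatbots with the MoCA test.",   # recycled copy, differs only in case
    "Older model versions scored lower on the assessment.",
]
print(len(deduplicate(corpus)))   # 2
```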