Year of Award
2025
Document Type
Thesis
Degree Type
Master of Science (MS)
Degree Name
Computer Science
Department or School/College
Computer Science
Committee Chair
Zedong Peng
Committee Members
Anh Nguyen, Erik Guzik
Keywords
large language models, cognitive exam, cognitive impairment, mimicry, knowledge distillation, hallucination
Subject Categories
Artificial Intelligence and Robotics | Computer Sciences
Abstract
Large language models (LLMs) excel at many structured tasks but differ fundamentally from human cognition. To investigate these differences, we administer a standard geriatric cognitive exam, the Self-Administered Gerocognitive Examination (SAGE), to six state-of-the-art LLMs (ChatGPT-4o, GPT-5 variants, Gemini 2.0 Pro, Claude 3.5 Sonnet, LLaMA 3.1:405B, and DeepSeek R1/V3). We adapt the SAGE (covering orientation, memory, language, reasoning, and visuospatial skills) for text-based interaction and score each model’s responses with the official rubric. Our results show consistent patterns: all models perform perfectly on simple calculations and verbal fluency, but they underperform on temporal orientation (dates/time), recall, and visuospatial tasks. For example, ChatGPT-4o achieves 20–21/22 total points, while the newer GPT-5 variants reach only 15–17/22; both groups miss memory and clock-drawing items. We also explore prompting strategies: instructing LLMs to “think step-by-step” (chain-of-thought, CoT) and to mimic other models’ reasoning. These techniques yield moderate score improvements, suggesting that latent capabilities can be unlocked. However, we observe frequent hallucinations, with models confidently fabricating answers or even identities, which raises serious reliability concerns. Contributions: We present a systematic evaluation of LLMs using a human cognitive benchmark, revealing specific cognitive deficits in current models. We show how eliciting reasoning (via CoT or imitation) affects performance, and we document hallucination tendencies in cognitive tasks. These findings provide a perspective on LLM limitations, informing future efforts to align AI systems with human-like reasoning.
Recommended Citation
Wen, Victor S., "The AI Imitation Game: A Cognitive Comparison of Mimicry in LLMs" (2025). Graduate Student Theses, Dissertations, & Professional Papers. 12579.
https://scholarworks.umt.edu/etd/12579
© Copyright 2025 Victor S. Wen