Year of Award

2025

Document Type

Thesis

Degree Type

Master of Science (MS)

Degree Name

Computer Science

Department or School/College

Computer Science

Committee Chair

Zedong Peng

Committee Members

Anh Nguyen, Erik Guzik

Keywords

large language models, cognitive exam, cognitive impairment, mimicry, knowledge distillation, hallucination

Subject Categories

Artificial Intelligence and Robotics | Computer Sciences

Abstract

Large language models (LLMs) excel at many structured tasks but differ fundamentally from human cognition. To investigate these differences, we administer a standard geriatric cognitive exam, the Self-Administered Gerocognitive Examination (SAGE), to six state-of-the-art LLMs (ChatGPT-4o, GPT-5 variants, Gemini 2.0 Pro, Claude 3.5 Sonnet, LLaMA 3.1:405B, and DeepSeek R1/V3). We adapt the SAGE (covering orientation, memory, language, reasoning, and visuospatial skills) for text-based interaction and score each model’s responses with the official rubric. Our results show consistent patterns: all models perform perfectly on simple calculations and verbal fluency, but they underperform on temporal orientation (dates/time), recall, and visuospatial tasks. For example, ChatGPT-4o achieves 20–21/22 total points, while the newer GPT-5 variants only reach 15–17/22; both groups miss memory and clock-drawing items. We also explore prompting strategies: instructing LLMs to “think step-by-step” (chain-of-thought (CoT)) and to mimic other models’ reasoning. These techniques yield moderate score improvements, suggesting latent capabilities can be unlocked. However, we observe frequent hallucinations: models confidently fabricating answers or even identities, which highlights serious reliability concerns. Contributions: We present a systematic evaluation of LLMs using a human cognitive benchmark, revealing specific cognitive deficits in current models. We show how eliciting reasoning (via CoT or imitation) affects performance, and we document hallucination tendencies in cognitive tasks. These findings provide a perspective on LLM limitations, informing future efforts to align AI systems with human-like reasoning.
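The chain-of-thought prompting strategy mentioned in the abstract can be sketched as below; the exam item and instruction wording are illustrative placeholders, not the thesis's actual SAGE materials or prompts:

```python
# Minimal sketch of chain-of-thought (CoT) prompting for a text-adapted
# cognitive exam item. The question text and instruction phrasing are
# hypothetical examples for illustration only.

def build_cot_prompt(exam_item: str) -> str:
    """Wrap a text-adapted exam item with a step-by-step instruction,
    eliciting the model's intermediate reasoning before its answer."""
    return (
        "You are taking a cognitive exam. "
        "Think step-by-step before giving your final answer.\n\n"
        f"Question: {exam_item}\n"
        "Reasoning:"
    )

prompt = build_cot_prompt("What is today's date (month, day, year)?")
print(prompt)
```

The resulting string would be sent to each model's chat API; scoring of the returned answer against the official rubric is a separate step not shown here.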

Available for download on Saturday, December 12, 2026


© Copyright 2025 Victor S. Wen