Keywords: comparative studies, language and thought, artificial intelligence, neural networks, natural language processing
If a person answers a question correctly, how can we tell whether the answer reflects an underlying understanding of the phenomenon or rests on merely surface-level associations? Cognitive science has developed multiple tests, such as Winograd schemas, that ostensibly require a respondent to use some kind of world/situation model rather than just associations. What, then, are we to make of large language models' (LLMs') successes on some of these tasks? We present a series of probes to LLMs and people about everyday situations, finding that models sometimes respond correctly for the wrong reason and in other cases make seemingly 'catastrophic' mistakes by applying the wrong model, often in human-like ways. Our results suggest that probing the basis of LLMs' successes and failures can inform our understanding of human problem solving and, in some cases, call into question our previous tests of human understanding.
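By way of illustration only (this is not the authors' actual protocol), a common way to probe an LLM on a Winograd-style item is to score the two disambiguated readings of the sentence and check which one the model assigns higher likelihood. The sketch below does this with GPT-2 via Hugging Face transformers; the model choice and example sentences are assumptions made for demonstration.

```python
# A minimal sketch of a likelihood-based Winograd probe (illustrative only;
# not the probe used in this talk). Model (gpt2) and sentences are assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the forward pass returns the mean
        # cross-entropy over the seq_len - 1 predicted tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)  # mean NLL -> total log-prob

# Two disambiguated readings of one Winograd-style item; a model with a
# situation model should prefer the first (trophies must fit inside suitcases).
readings = [
    "The trophy doesn't fit in the suitcase because the trophy is too big.",
    "The trophy doesn't fit in the suitcase because the suitcase is too big.",
]
scores = {r: sentence_logprob(r) for r in readings}
for reading, score in scores.items():
    print(f"{score:8.2f}  {reading}")
print("Model prefers:", max(scores, key=scores.get))
```

A probe in this style only shows which reading the model prefers, not why; distinguishing genuine situation modeling from surface-level association requires contrast sets and perturbations of the kind the abstract alludes to.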