United States

Mechanical reasoning is a hallmark of human intelligence, defined by its ubiquitous yet irreplaceable role in human activities ranging from routine tasks to civil engineering. Equipping machines with mechanical reasoning is therefore an important step towards building human-level artificial intelligence. Here, we leveraged 155 cognitive experiments to test the understanding of system stability, gear and pulley systems, the leverage principle, inertia and motion, and fluid mechanics in 26 vision language models (VLMs). Results indicate that VLMs consistently perform worse than humans in all domains and show particular difficulty in reasoning about gear systems and fluid mechanics. Notably, their performance on these tasks does not improve as the number of parameters increases, suggesting that current attention-based architectures may fail to grasp certain underlying mechanisms required for mechanical reasoning, particularly those pertaining to mental simulation.

CogSci 2025

Probing Mechanical Reasoning in Large Vision Language Models

computer science

problem solving

psychology

machine learning

reasoning

poster

### Welcome to CogSci Conference 2025!

The 47th Annual Meeting of the Cognitive Science Society was a hybrid meeting held in San Francisco. 


#### About

The Cognitive Science Society brings together researchers from around the world who hold a common goal: understanding the nature of the human mind. The mission of the Society is to promote Cognitive Science as a discipline, and to foster scientific interchange among researchers in various areas of study, including Artificial Intelligence, Linguistics, Anthropology, Psychology, Neuroscience, Philosophy, and Education.

The Society is a non-profit professional organization and its activities include sponsoring an annual conference and publishing the journals Cognitive Science and TopiCS.

#### Our History 

* **Society Creation**<br>
The Society was incorporated as a 501(c)(3) non-profit professional organization in Massachusetts in 1979. The organizing committee included Roger Schank, Allan Collins, Donald Norman, and a number of other scholars from psychology, linguistics, computer science, and philosophy. 
<br><br>
* **Conference Creation**<br>
The first conference on cognitive science was held in La Jolla, California, in August 1979, and the conference has been held annually since then. The proceedings of each conference are published, and those from most years are available through Lawrence Erlbaum Associates, Inc. The annual proceedings of the Cognitive Science Conference represent a major source of information on new work and new ideas in the scientific study of thinking. In 1990, the Society, with help from an anonymous donor, established the David Marr Prize for the best student paper at each annual meeting.
<br><br>
* **Journal Creation**<br>
The Journal, Cognitive Science, began publication in 1976, and is now published by Wiley-Blackwell. The Executive Editor is currently Richard P. Cooper of Birkbeck, University of London, and there are 18 Associate Editors and a 30-member editorial board. It serves as the premier outlet for research reports that intersect two or more disciplines. Copyrights for articles published in the journal are held by the Society. The Governing Board of the Cognitive Science Society voted in late 2006 to found a new journal, Topics in Cognitive Science (topiCS). The Editor in Chief is Wayne Gray, Cognitive Science Department, Rensselaer Polytechnic Institute. The journal seeks to fill a niche not occupied by Cognitive Science Journal or other cognitive science journals. Membership in the Society includes a subscription to Cognitive Science and TopiCS. Copyrights for articles published in the journal are held by the Society.
<br><br>

#### Code of Conduct

By attending the CogSci 2025 Conference, you are required to adhere to the society’s **[Code of Conduct](https://drive.google.com/file/d/1ChPuihLy6jE_BWqfO7J2KKgX35JW2zsM/view?usp=sharing)**.
<br><br>



The 47th Annual Meeting of the Cognitive Science Society presents the latest research across cognitive science and highlights the theme of Cognition in Context.

Caregiving helps learners survive in the present and ultimately thrive independently without their caregiver in the future. While some caregiving provides immediate benefits, other actions focus on long-term development, even if they cause short-term discomfort or setbacks. For example, a parent might allow their child to fail in a game to learn a useful lesson about the value of perseverance. Here, we develop a probabilistic model of caregiving with a recursive theory of mind using the Memo programming language that captures these intuitions. The model considers learners as POMDP planners, and plans over such learners to intervene on their beliefs in a way that will be valuable in the future. As predicted by the model, participants favor improving learners' knowledge over immediate efficiency, but only when that knowledge has future value. Effective caregivers thus think several moves ahead, accepting short-term costs to prepare learners for long-term success.

Preparing a learner for an independent future

Linguistic features like stress and tone are often reflected in how lyrics are set to music. Intuitively, the motivation behind this phenomenon is to ensure listeners can accurately understand the lyrics in a musical environment, which raises the question: if a phonological component is more useful for accurately understanding speech in a language, is it more likely to be reflected in text-setting? This study explores this question, focusing on tone and tone-melody correspondence. Functional load of tones and degrees of tone-melody correspondence were obtained for three languages that use pitch contrastively: Cantonese, Mandarin, and Japanese. It was found that the functional load of tones and degree of tone-melody correspondence did not correlate across these three languages. Since Cantonese and Japanese alone exhibit the correlation, reasons why Mandarin breaks the possible pattern are discussed. This study examines how linguistic grammar and experience interact with musical grammar in a behavior that simultaneously involves language and music.

A Crosslinguistic Investigation on the Correlation between Functional Load of Tone and Tone-Melody Correspondence

When people interact with objects, they show incredible flexibility in learning novel motor control mappings or adapting their known control mappings to variables like object mass. Such motor learning can benefit from intuitive physical reasoning, as a novel context of object interaction could be a new combination of a previously experienced control mapping with a different object of known mass. In this work, we present a novel object interaction paradigm in which subjects learned to slide pucks at targets by releasing kinetic energy from a compressed spring in a computer game. Participants needed to learn how their motor actions related to the final positions of the puck, while also adapting to the mass of different pucks. With a Bayesian regression model, we inferred participants' beliefs about object mass and control mappings, and showed that they could transfer information about previously experienced puck mass but not the motor mappings of the springs.

Physical reasoning during motor learning aids people in transferring mass, but not motor control mappings

A central question in cognitive science is how to reconcile connectionist and symbolic models of the mind (e.g., Fodor & Pylyshyn 1988, Smolensky & Legendre 2006). Attempts have been made to bridge these competing schools of thought by showing how compositional structure can emerge in continuous vector representations (e.g., Manning et al. 2020). A key example is Mikolov et al. (2013), who demonstrated that word embeddings learned by a neural network encode semantic structure: subtracting the vector “man” from “king” and adding “woman” approximates “queen” (i.e., king - man + woman ≈ queen). Our work moves up one level of abstraction, from representations to functions. We analyze whether entire networks display emergent compositional structure by treating a trained network as a single vector (obtained by concatenating the network’s parameters) encoding its function. We show that these parameter vectors can be recomposed through simple additive analogies to create networks with new functions.

Additive Analogies Reveal Compositional Structure in Neural Network Weights

How do people generate and decide between the wide array of potential goals available to them at any given moment? We study this question in Minecraft, a game environment that is both open-ended enough to support a diverse array of goals and structured enough to facilitate quantitative evaluation of different goal features that may impact how people respond to different goals. Specifically, we explore the role of goal familiarity, concreteness, and complexity, which we operationalize using both linguistic analyses and by converting human-generated goals into a programmatic domain-specific language. Our results highlight the unique ways in which game environments like Minecraft can facilitate research into how humans engage in open-ended and creative behaviors.

Novel Goal Creation and Evaluation in Open-Ended Games

In a preregistered experiment, adults living in the United States (N = 700) expected family (here, siblings) to be more likely to reconcile than friends after a conflict. To a greater extent, participants reported that siblings (vs. friends) have to reconcile and failing to do so would be less morally permissible. Further, participants expected love between siblings to be negatively affected to a lesser extent than love between friends who experienced the same conflict. We also explored potential generational differences, and found that Baby Boomers (people born in the years 1946–1964) reported that family members were significantly more obligated to reconcile than did Millennials (people born in the years 1981–1996). Our findings indicate that ties to family members are especially anticipated and obliged to persist through thick and thin.

Through Thick and Thin: People Think Family Will and Ought to Reconcile

The use of virtual reality (VR) has become a standard procedure for studying spatial navigation, as it allows researchers to create controlled environments and paradigms that can be used across multiple research sites. These simulated environments are primarily presented in either desktop VR (DVR) or ambulatory immersive VR (IVR), yet little work has directly investigated whether navigation in these modalities reflects the same abilities when using identical environmental layouts. In two studies, we examined participants’ abilities to learn the layout of a maze-type environment in DVR and IVR. Our findings generally show that while people exhibit better navigation performance in IVR, performance in the two modalities is highly correlated. We discuss the implications of these findings, including possible reasons for different performance in IVR compared to DVR, such as body-based cues and cyber sickness, and make recommendations for future research examining navigation in VR.

Comparing Navigation in Immersive and Desktop VR Environments

This paper presents a methodology combining multimodal semantic analysis with an eye-tracking experimental protocol to investigate the cognitive effort involved in understanding the communication of future scenarios. We conduct a pilot study examining how visual fixation patterns vary during evaluation of valence and counterfactuality in fictional ad pieces describing futuristic scenarios, using a portable eye tracker. Participants' eye movements are recorded while evaluating the stimuli and describing them to a conversation partner. Gaze patterns are analyzed alongside semantic representations of the stimuli and participants' descriptions, constructed from a frame semantic annotation of both linguistic and visual modalities. Preliminary results show that far-future and pessimistic scenarios are associated with longer fixations and more erratic saccades, supporting the hypothesis that fractures in the base spaces underlying interpretation of future scenarios increase cognitive load for comprehenders.

FutureVision: A methodology for the investigation of future cognition

From comprehending language to learning new dance moves, extracting complex relationships between sequences of input is a key feature of human cognition. Prior studies have predominantly explored the cognitive mechanisms of structure learning using Markov sequences, where each element depends only on the previous one. Real-world experience, however, is rife with complex dependencies beyond Markov processes. Here, we study the effects of non-Markov dependencies on sequence learning by leveraging graph learning approaches. We introduce a motor sequence task in which transitional probabilities between pairs of stimuli are identical from a Markov perspective, but differ on higher-order non-Markov dependencies. We find that participants are better able to anticipate stimuli with higher non-Markov probabilities, providing corroboratory evidence that humans are sensitive to statistical structure beyond Markov dependencies. Further, behavior differed from other participants trained only on Markov sequences. Overall, this work demonstrates that humans can rapidly learn and represent statistical dependencies beyond the Markov regime.

Human Learning of Non-Markov Structures

To understand language, we use knowledge about everyday events to create rich internal (situation) models. Although knowledge increases with age, fluid cognitive abilities tend to decline, potentially making it more difficult to access that knowledge. Here, we asked how aging affects the ability to use event knowledge during real-time language comprehension. We recorded event-related brain potentials as younger and older adults read vignettes about everyday events. Both groups showed facilitation on the N400 (a neuroelectric marker of semantic processing) for words that fit the context. However, only younger adults showed facilitated N400s to anomalous but event-related words compared to unrelated anomalies. Among older adults (aged 53-80), there was a negative correlation between age and N400 effects of event-relatedness. We conclude that real-time access to event knowledge during language comprehension may shift across the course of the adult lifespan such that older adults restrict activation to the most immediately relevant content.
