keywords:
cognitive architectures
language production
computational modeling
psychology
linguistics
To advance our understanding of referential communication and common ground formation, this study presents a novel generative cognitive model that integrates deep neural networks for visual perception, image generation, and language captioning. Using the Tangram Naming Task (TNT), we simulate the sender–receiver interaction with modular processes that replicate holistic cognitive strategies. Through controlled simulation experiments, we show that language generation plays a more critical role than visual perception in establishing common ground, while intermediate image generation enhances linguistic diversity, a key aspect of natural communication. Our results bridge cognitive modeling and large generative models, demonstrating how internal cognitive dynamics can be visualized and quantitatively evaluated. This study contributes to the growing field of cognitively inspired human–AI communication and provides a blueprint for grounding-rich simulations in collaborative tasks.
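To make the modular sender–receiver structure concrete, the sketch below shows one possible way such a simulation loop could be wired together. It is a minimal illustration, not the authors' implementation: the class names, the stubbed feature vectors, and the distance-based choice rule are all assumptions, and in the actual model each module would be backed by a deep neural network (a visual encoder, an image generator, and an image captioner).

```python
"""Minimal sketch of a modular sender-receiver loop for a tangram naming task.
All module internals are random/string stubs standing in for neural networks."""
import random


class VisualPerception:
    """Stub for the visual-perception module (e.g., an image encoder)."""

    def encode(self, image: str) -> list[float]:
        # Deterministic pseudo-features so repeated encodings of the same
        # image agree; a real module would return learned embeddings.
        rng = random.Random(image)
        return [rng.random() for _ in range(8)]


class ImageGeneration:
    """Stub for the intermediate image-generation module."""

    def imagine(self, description: str) -> str:
        return f"<imagined image of '{description}'>"


class LanguageCaptioning:
    """Stub for the language-production / captioning module."""

    def describe(self, features: list[float]) -> str:
        return f"an abstract shape with salience {max(features):.2f}"


def sender_turn(image: str, perceive: VisualPerception,
                caption: LanguageCaptioning) -> str:
    """Sender perceives the target tangram and produces a referring expression."""
    return caption.describe(perceive.encode(image))


def receiver_turn(utterance: str, candidates: list[str],
                  imagine: ImageGeneration, perceive: VisualPerception) -> str:
    """Receiver imagines the described image and picks the closest candidate."""
    target_features = perceive.encode(imagine.imagine(utterance))

    def similarity(candidate: str) -> float:
        features = perceive.encode(candidate)
        return -sum((a - b) ** 2 for a, b in zip(features, target_features))

    return max(candidates, key=similarity)


if __name__ == "__main__":
    tangrams = ["tangram_A", "tangram_B", "tangram_C"]
    utterance = sender_turn("tangram_B", VisualPerception(), LanguageCaptioning())
    choice = receiver_turn(utterance, tangrams, ImageGeneration(), VisualPerception())
    print(f"Sender said: {utterance!r}; receiver chose: {choice}")
```

Under this framing, the controlled experiments described above would correspond to swapping out or degrading individual modules (perception, image generation, captioning) and measuring how often the receiver identifies the intended tangram and how varied the sender's descriptions are.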