Our PhD student Aydin has co-authored the paper “Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning,” which has been accepted at #ICLR2026.
The paper, led by first authors Yucheng Wang and Yifan Hou and co-authored with Mubashara Akhtar and Mrinmaya Sachan, takes a systematic look at how Multimodal Large Language Models integrate signals from different modalities during logical reasoning. The bottlenecks it identifies are stepping stones for future research toward composition-aware training and architectural design, so that additional modalities support reasoning instead of interfering with it.
Read the full paper here: https://arxiv.org/pdf/2509.23744

