Underline DOI: https://doi.org/10.48448/g3te-as84
technical paper
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering

