Dominican Republic

In Visual Question Answering (VQA), existing bilinear methods focus on the interaction between images and questions. As a result, the answers are either spliced into the questions or utilized as labels only for classification. On the other hand, trilinear models such as the CTI model efficiently utilize the inter-modality information between answers, questions, and images, while ignoring intra-modality information. Inspired by this observation, we propose a new trilinear interaction framework called MIRTT (Learning Multimodal Interaction Representations from Trilinear Transformers), incorporating the attention mechanisms for capturing inter-modality and intra-modality relationships. Moreover, we design a two-stage workflow where a bilinear model reduces the free-form, open-ended VQA problem into a multiple-choice VQA problem. Furthermore, to obtain accurate and generic multimodal representations, we pre-train MIRTT with masked language prediction. Our method achieves state-of-the-art performance on the Visual7W Telling task and VQA-1.0 Multiple Choice task and outperforms bilinear baselines on the VQA-2.0, TDIUC and GQA datasets.

EMNLP 2021

MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering

## Welcome to NAACL 2022!
The Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL) is taking place July 10-15, 2022 as a hybrid event, in Seattle, WA and online. We are happy to welcome all of you to the conference!
 
The main conference program features oral presentations, in-person and virtual posters and demo sessions, a plenary session for our best paper presentations and awards, keynote presentations, a plenary panel on the place of linguistics and symbolic structures in NLP, and an Industrial Track panel on Careers in NLP.
 
Posters (including Findings of NAACL 2022) and demos are grouped by area for both the in-person and the virtual sessions. For the virtual component, the talks will be on Zoom, and the posters and demos will be in GatherTown. The Student Research Workshop will have an oral session and a poster session.
 
The program also features 6 Tutorials and 26 Workshops.
 
 We wish you a wonderful conference!
 
[The NAACL 2022 Organizing Committee](https://2022.naacl.org/committees/organization/)

NAACL 2022

2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics

technical paper

EMNLP 2021 is planned to be a hybrid event in Punta Cana, Dominican Republic, with both on-site and fully virtual participation possible. The experience for on-site participants would closely approximate a normal pre-COVID *ACL conference, with 5-6 thematically organized parallel sessions and live Q/A and interactive discussion immediately after the talks. Presentations by virtual participants will be equitably interleaved with those of on-site participants, projected on the auditorium screens as if on-site, and also followed immediately by live Q/A and interactive discussion at a time during reasonable waking hours for the virtual presenter. For all participants, on-site and virtual, who are unable to attend a session due to either time-zone issues or because they are participating in another session live, talk recordings and slides will be available online at a minimum after the live presentation (and in many cases before as well), and questions may be submitted in advance on session-specific discussion boards and answered live in session with the usual visual aids if desired.


Please note: The EMNLP registration system is not currently connected to the Underline site, as we are still in the process of building out EMNLP 2021. Registered attendees will receive access instructions from Underline the week of November 1st.

Access is given only to registered EMNLP attendees. If you have not registered, please do so [here](https://2021.emnlp.org/registration).


Machine reading comprehension (MRC) is a challenging NLP task because it requires careful handling of all linguistic granularities, from word and sentence to passage. For extractive MRC, the answer span has been shown to be mostly determined by key evidence linguistic units, which in most cases are sentences. However, we recently discovered that sentences may not be clearly defined in many languages, to varying extents. This causes the so-called location unit ambiguity problem and makes it difficult for the model to determine exactly which sentence contains the answer span when the sentence itself is not clearly defined. Taking Chinese as a case study, we explain and analyze this linguistic phenomenon and propose a reader with Explicit Span-Sentence Predication to alleviate the problem. Our proposed reader helps achieve a new state of the art on a Chinese MRC benchmark and shows great potential for dealing with other languages.

What If Sentence-hood is Hard to Define: A Case Study in Chinese Reading Comprehension

We propose CodeQA, a free-form question answering dataset for source code comprehension: given a code snippet and a question, a textual answer must be generated. CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs. To obtain natural and faithful questions and answers, we implement syntactic rules and semantic analysis to transform code comments into question-answer pairs. We present the construction process and conduct a systematic analysis of our dataset. Experimental results achieved by several neural baselines on our dataset are shown and discussed. While research on question answering and machine reading comprehension is developing rapidly, little prior work has addressed code question answering. This new dataset can serve as a useful research benchmark for source code comprehension.

CodeQA: A Question Answering Dataset for Source Code Comprehension

Numerical reasoning skills are essential for complex question answering (CQA) over text. Such reasoning requires operations including counting, comparison, addition and subtraction. A successful approach to CQA on text, Neural Module Networks (NMNs), follows the programmer-interpreter paradigm and leverages specialised modules to perform compositional reasoning. However, the NMNs framework does not consider the relationship between numbers and entities in both questions and paragraphs. We propose effective techniques to improve NMNs' numerical reasoning capabilities by making the interpreter question-aware and capturing the relationship between entities and numbers. On the same subset of the DROP dataset for CQA on text, experimental results show that our additions outperform the original NMNs by 3.0 points in overall F1 score.

Improving Numerical Reasoning Skills in the Modular Approach for Complex Question Answering on Text

In this paper, we present a new verification style reading comprehension dataset named VGaokao from Chinese Language tests of Gaokao. Different from existing efforts, the new dataset is originally designed for native speakers’ evaluation, thus requiring more advanced language understanding skills. To address the challenges in VGaokao, we propose a novel Extract-Integrate-Compete approach, which iteratively selects complementary evidence with a novel query updating mechanism and adaptively distills supportive evidence, followed by a pairwise competition to push models to learn the subtle difference among similar text pieces. Experiments show that our methods outperform various baselines on VGaokao with retrieved complementary evidence, while having the merits of efficiency and explainability. Our dataset and code are released for further research.

Extract, Integrate, Compete: Towards Verification Style Reading Comprehension

Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving millions of users that interact with them via speech interfaces. However, current benchmarks in QA research do not account for the errors that speech recognition models might introduce, nor do they consider the language variations (dialects) of the users. To address this gap, we augment an existing QA dataset to construct a multi-dialect, spoken QA benchmark on five languages (Arabic, Bengali, English, Kiswahili, Korean) with more than 68k audio prompts in 24 dialects from 255 speakers. We provide baseline results showcasing the real-world performance of QA systems and analyze the effect of language variety and other sensitive speaker attributes on downstream performance. Last, we study the fairness of the ASR and QA models with respect to the underlying user populations.

Spoken Dialectal Question Answering for the Real World

Answering complex open-domain questions requires understanding the latent relations between the entities involved. However, we found that existing QA datasets are extremely imbalanced in some types of relations, which hurts generalization performance on questions with long-tail relations. To remedy this problem, we propose a Relation-Guided Pre-Training (RGPT-QA) framework. We first generate a relational QA dataset covering a wide range of relations from both Wikidata triplets and Wikipedia hyperlinks. We then pre-train a QA model to infer the latent relations from the question, and then conduct extractive QA to get the target answer entity. We demonstrate that by pre-training with the proposed RGPT-QA technique, the popular open-domain QA model Dense Passage Retriever (DPR) achieves 2.2%, 2.4%, and 6.3% absolute improvement in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions, respectively. In particular, we show that RGPT-QA improves significantly on questions with long-tail relations.

Relation-Guided Pre-Training for Open-Domain Question Answering
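The Exact Match accuracy reported in the RGPT-QA abstract above can be illustrated with a minimal sketch. The normalization shown here (lowercasing, stripping punctuation and English articles) follows a common SQuAD-style convention and is an assumption; the paper's actual evaluation script is not shown in this page. The function names `normalize`, `exact_match`, and `em_accuracy` are hypothetical.

```python
import re
import string
from typing import List


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper may use a different one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> bool:
    """True if the prediction equals any gold answer after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


def em_accuracy(predictions: List[str], references: List[List[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a gold answer."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, golds) for p, golds in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    preds = ["The Eiffel Tower", "Paris"]
    golds = [["Eiffel Tower"], ["London"]]
    print(em_accuracy(preds, golds))
```

Under this definition, an "absolute improvement" is the plain difference between two such percentages, not a relative change.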

To build robust question answering systems, we need the ability to verify whether answers to questions are truly correct, not just "good enough" in the context of imperfect QA datasets. We explore the use of natural language inference (NLI) as a way to achieve this goal, as NLI inherently requires the premise (document context) to contain all necessary information to support the hypothesis (proposed answer to the question). We leverage large pre-trained models and recent prior datasets to construct powerful question conversion and decontextualization modules, which can reformulate QA instances as premise-hypothesis pairs with very high reliability. Then, by combining standard NLI datasets with NLI examples automatically derived from QA training data, we can train NLI models to evaluate QA models' proposed answers. We show that our approach improves the confidence estimation of a QA model across different domains, evaluated in a selective QA setting. Careful manual analysis over the predictions of our NLI model shows that it can further identify cases where the QA model produces the right answer for the wrong reason, i.e., when the answer sentence cannot address all aspects of the question.

Can NLI Models Verify QA Systems' Predictions?

Automated Theorem Proving (ATP) deals with the development of computer programs that can show that some conjectures (queries) are a logical consequence of a set of axioms (facts and rules). Several successful ATPs exist in which conjectures and axioms are formally provided (e.g. formalised as First Order Logic formulas). Recent approaches, such as Clark et al., have proposed transformer-based architectures for deriving conjectures given axioms expressed in natural language (English). The conjecture is verified through a binary text classifier, where the transformer model is trained to predict the truth value of a conjecture given the axioms. The RuleTaker approach of Clark et al. achieves appealing results both in terms of accuracy and in the ability to generalize, showing that when the model is trained with deep enough queries (at least 3 inference steps), the transformers are able to correctly answer the majority of queries (97.6%) that require up to 5 inference steps. In this work we propose a new architecture, namely the Neural Unifier, and an associated training procedure, which achieves state-of-the-art results in terms of generalisation, showing that by mimicking a well-known inference procedure, backward chaining, it is possible to answer deep queries even when the model is trained only on shallow ones. The approach is demonstrated in experiments using a diverse set of benchmark data, and the source code is released to the research community for reproducibility.

Neural Unification for Logic Reasoning over Natural Language
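For readers unfamiliar with backward chaining, the inference procedure that the Neural Unification abstract above says the model mimics, here is a minimal symbolic sketch. It is not the Neural Unifier itself: facts and rules are plain strings in a hypothetical toy knowledge base, and the names `backward_chain` and `Rules` are illustrative assumptions. The `max_depth` parameter loosely corresponds to the number of inference steps a query requires.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical rule format: conclusion -> list of alternative premise sets.
Rules = Dict[str, List[FrozenSet[str]]]


def backward_chain(
    goal: str,
    facts: Set[str],
    rules: Rules,
    max_depth: int = 10,
    _depth: int = 0,
    _visiting: FrozenSet[str] = frozenset(),
) -> bool:
    """Return True if `goal` can be derived from `facts` using `rules`.

    Works backwards from the goal: a goal holds if it is a known fact, or if
    some rule concludes it and all of that rule's premises can be proved.
    """
    if goal in facts:
        return True
    if _depth >= max_depth or goal in _visiting:
        return False  # depth budget exhausted, or a cyclic dependency
    visiting = _visiting | {goal}
    for premises in rules.get(goal, []):
        if all(
            backward_chain(p, facts, rules, max_depth, _depth + 1, visiting)
            for p in premises
        ):
            return True
    return False


if __name__ == "__main__":
    # Made-up toy theory, for illustration only.
    facts = {"bob is big"}
    rules: Rules = {
        "bob is strong": [frozenset({"bob is big"})],
        "bob is heavy": [frozenset({"bob is strong"})],
    }
    print(backward_chain("bob is heavy", facts, rules))               # two inference steps
    print(backward_chain("bob is heavy", facts, rules, max_depth=1))  # budget too small
```

In this toy theory, proving "bob is heavy" takes two chained steps, so it succeeds with the default budget and fails when `max_depth` is 1.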

While diverse question answering (QA) datasets have been proposed and have contributed significantly to the development of deep learning models for QA tasks, the existing datasets fall short in two aspects. First, we lack QA datasets covering complex questions that involve answers as well as the reasoning processes to get them. As a result, state-of-the-art QA research on numerical reasoning still focuses on simple calculations and does not provide the mathematical expressions or evidence justifying the answers. Second, the QA community has put a lot of effort into improving the interpretability of QA models. However, these models fail to explicitly show the reasoning process, such as the evidence order for reasoning and the interactions between different pieces of evidence. To address the above shortcomings, we introduce NOAHQA, a conversational and bilingual QA dataset with questions requiring numerical reasoning with compound mathematical expressions. With NOAHQA, we develop an interpretable reasoning graph as well as an appropriate evaluation metric to measure the answer quality. We evaluate state-of-the-art QA models trained using existing QA datasets on NOAHQA and show that the best among them achieves an exact match score of only 55.5, while human performance is 89.7. We also present a new QA model for generating a reasoning graph, where the reasoning graph metric still has a large gap compared with that of humans, e.g., 28 points.

NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset

Understanding when a text snippet does not provide the sought-after information is an essential part of natural language understanding. Recent work (SQuAD 2.0; Rajpurkar et al., 2018) has attempted to make some progress in this direction by enriching the SQuAD dataset for the Extractive QA task with unanswerable questions. However, as we show, the performance of a top system trained on SQuAD 2.0 drops considerably in out-of-domain scenarios, limiting its use in practical situations. To study this, we build an out-of-domain corpus focusing on simple event-based questions, and distinguish between two types of IDK questions: competitive questions, where the context includes an entity of the same type as the expected answer, and simpler, non-competitive questions, where there is no entity of the same type in the context. We find that SQuAD 2.0-based models fail even in the case of the simpler questions. We then analyze the similarities and differences between the IDK phenomenon in Extractive QA and the Recognizing Textual Entailment task (RTE; Dagan et al., 2013) and investigate the extent to which the latter can be used to improve performance.


Next from EMNLP 2021

What If Sentence-hood is Hard to Define: A Case Study in Chinese Reading Comprehension

Similar lecture

Text Generation and Text-Editing Models
