Understanding the Relevance of Text Passages

Principal Investigator:
W. Bruce Croft, PI
[email protected]

Center for Intelligent Information Retrieval (CIIR)
College of Information and Computer Sciences
140 Governors Drive
University of Massachusetts
Amherst, MA 01003-9264

Project Goals

Some information retrieval (IR) queries are best answered with a web page; others can be answered with a single fact or named entity. Many other queries are best answered with a text passage, and we propose to develop new techniques for this task that significantly improve on the current state of the art. Effective passage retrieval would have a major effect on search tools by greatly extending the range of queries that can be answered directly with text passages retrieved from the web. This is particularly important for mobile search applications, where a small screen or speech output limits output bandwidth. In this setting, the ability to use passages to reduce the amount of output while maintaining high relevance will be critical.

We are studying research issues that have either been ignored, or only partially addressed, in prior research, such as showing whether passages can be better answers than documents for some queries, predicting which queries have good answers at the passage level, ranking passages to retrieve the best answers, and evaluating the effectiveness of passages as answers. To address these issues, we are developing new retrieval models that can define and rank “answers” for different text granularities such as sentences and passages, models of query properties that are associated with good passage-level answers, and models that differentiate between topicality and information content. Understanding the relevance of text passages will also involve obtaining new types of relevance assessments at passage granularity, and developing new evaluation metrics that combine relevance with the size of the result output.

Significant Results:

For the 2018 reporting period, we have continued to study neural IR models and representations for the tasks of passage retrieval, question answering and, more recently, conversational IR. Retrieving short text passages as answers to explicit queries is rapidly becoming the most important application environment for the techniques we have been developing in this project. We have published a range of papers at major conference venues and produced new datasets that will be distributed to the research community.

For the 2017 reporting period, we finished two research efforts related to this project and submitted papers on them. One effort resulting in a publication is “Using Key Concepts in a Translation Model for Retrieval” by Park and Croft, which will be presented at SIGIR this year. In this work, we studied how identifying the important terms in a query could help us to identify answer passages in documents. More specifically, many queries, especially those in the form of longer questions, contain a subset of terms representing key concepts that describe the most important part of the user's information need. Detecting the key concepts in a query can be used as the basis for more effective weighting of query terms, but in this work, we focus on a method of using the key concepts in a translation model for query expansion and retrieval. Translation models have been used previously in community-based question answering (CQA) systems in order to bridge the semantic gap between questions and the corresponding answer documents. Our method uses the key concepts of a question as the translation context and selectively applies the translation model to the secondary (non-key) parts of the question. We evaluate the proposed method using a CQA collection and show that selectively translating key and secondary concepts can significantly improve the retrieval performance compared to a baseline that applies the translation model without considering key concepts.
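The idea of applying translation selectively can be illustrated with a small sketch. This is not the model from the paper: it is a simplified translation-based query-likelihood scorer written under stated assumptions. The function name `score_passage`, the `translation` table keyed by (query term, passage term), the mixing weight `beta`, and the smoothing weight `lam` are all hypothetical. Key-concept terms are matched exactly, while secondary terms may also be matched through translation. The use of key concepts as a translation context, as described above, is not modeled here.

```python
import math
from collections import Counter


def score_passage(query_terms, key_terms, passage_terms, translation,
                  collection_prob, beta=0.5, lam=0.2):
    """Return the log query likelihood of a passage.

    Illustrative assumptions (not taken from the paper):
    - translation[(w, t)] is a hypothetical probability of translating
      passage term t into query term w.
    - Key-concept terms use only the exact-match estimate.
    - Secondary terms mix exact match and translation with weight beta.
    - lam smooths every term with a background collection probability.
    """
    counts = Counter(passage_terms)
    length = len(passage_terms)
    log_score = 0.0
    for w in query_terms:
        # Maximum-likelihood estimate of w in the passage.
        p_ml = counts[w] / length if length else 0.0
        if w in key_terms:
            # Key concept: require the term itself, no translation.
            p_doc = p_ml
        else:
            # Secondary term: also credit passage terms that translate to w.
            p_trans = sum(
                translation.get((w, t), 0.0) * c / length
                for t, c in counts.items()
            ) if length else 0.0
            p_doc = (1 - beta) * p_ml + beta * p_trans
        # Smooth with the collection model so unseen terms do not zero out.
        p = (1 - lam) * p_doc + lam * collection_prob.get(w, 1e-6)
        log_score += math.log(p)
    return log_score


# Toy example: "laptop" is the key concept, "fix" is a secondary term.
translation = {("fix", "repair"): 0.8}
collection_prob = {"fix": 0.01, "laptop": 0.01}
query = ["fix", "laptop"]
passage_a = ["repair", "laptop", "screen"]
passage_b = ["buy", "laptop", "screen"]

score_a = score_passage(query, {"laptop"}, passage_a, translation, collection_prob)
score_b = score_passage(query, {"laptop"}, passage_b, translation, collection_prob)
```

In this toy setting, the passage containing "repair" scores higher than the one containing "buy" because the secondary term "fix" is matched through the translation table, while the key concept "laptop" must appear verbatim in both.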

In 2016, the main research result was that we made significant progress in the development of neural network models for the non-factoid question answering task. We produced a new paper on this work in 2017.

In 2016, a new dataset was made freely available to researchers working on question answering tasks. The Web Answer Passages Dataset (WebAP) is based on the 2004 TREC Terabyte Track Gov2 collection and contains 8,027 answer passages to 82 TREC queries. Answer passages are annotated with four quality measures.

Students Involved in the Project:

Liu Yang

Publications:

Park, J., and Croft, W.B. “Using Key Concepts in a Translation Model for Retrieval”, in the Proceedings of the 38th Annual ACM SIGIR Conference (SIGIR 2015), Santiago, Chile, 2015, pp. 927-930.

Chen, R-C., Spina, D., Croft, W.B., and Sanderson, M., "Harnessing Semantics for Answer Sentence Retrieval," Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR '15), pp. 21-27, 2015.

Yang, L., Guo, Q., Song, Y., Meng, S., Shokouhi, M., McDonald, K. and Croft, W. B., "Modeling User Interests for Zero-query Ranking," in the Proceedings of the 38th European Conference on Information Retrieval (ECIR 2016), Padova, Italy, March 20-23, 2016, pp. 171-184.

Yang, L., Ai, Q., Spina, D., Chen, R., Pang, L., Croft, W. B., Guo, J. and Scholer, F., "Beyond Factoid QA: Effective Methods for Non-factoid Answer Sentence Retrieval," in the Proceedings of the 38th European Conference on Information Retrieval (ECIR 2016), Padova, Italy, March 20-23, 2016, pp. 115-128.

Ai, Q., Yang, L., Guo, J. and Croft, W. B., "Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval," in the Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 16), Pisa, Italy, pp. 869-872.

Cohen, D. and Croft, W. B., "End to End Long Short Term Memory Networks for Non-Factoid Question Answering," in the Proceedings of the 2nd ACM International Conference on the Theory of Information Retrieval (ICTIR 16), University of Delaware, Newark, DE, USA, September 12-16, 2016, pp. 143-146.

Yang, L., Ai, Q., Guo, J. and Croft, W. B., "aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model," in the Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM'16), Indianapolis, USA, October 24-28, 2016, pp. 287-292.

Ai, Q., Yang, L., Guo, J. and Croft, W. B., "Analysis of the Paragraph Vector Model for Information Retrieval," Proceedings of the 2nd ACM International Conference on the Theory of Information Retrieval (ICTIR 16), Newark, DE, USA, September 12-16, 2016, pp. 133-142.

Yang, L., Dumais, S., Bennett, P. and Awadallah, A., "Characterizing and Predicting Enterprise Email Reply Behavior," in the Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17), Tokyo, Japan, August 7-11, 2017, pp. 235-244.

Yang, L., Zamani, H., Zhang, Y., Guo, J. and Croft, W. B., "Neural Matching Models for Question Retrieval and Next Question Prediction in Conversation," in Neu-IR 2017: The SIGIR 2017 Workshop on Neural Information Retrieval (SIGIR Neu-IR 2017), Tokyo, Japan, August 7-11, 2017.

Cohen, D. and Croft, W. B., "A Hybrid Embedding Approach for Noisy Answer Passage Retrieval," in Proceedings of the European Conference on Information Retrieval (ECIR 18), Grenoble, France, March 26-29, 2018, pp. 127-140.

Cohen, D., Croft, W. B. and Yang, L., "WikiPassageQA: A Benchmark Collection for Research on Non-factoid Answer Passage Retrieval," in the Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '18), Ann Arbor, Michigan, USA, July 8-12, 2018, pp. 165-168.

Zamani, H., Croft, W. B. and Culpepper, J., "Neural Query Performance Prediction using Weak Supervision from Multiple Signals," in the Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '18), Ann Arbor, Michigan, USA, July 8-12, 2018, pp. 105-114.

Yang, L., Qiu, M., Qu, C., Guo, J., Zhang, Y., Croft, W. B., Huang, J. and Chen, H., "Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems," in the Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '18), Ann Arbor, Michigan, USA, July 8-12, 2018, pp. 245-254.

Qu, C., Yang, L., Croft, W. B., Trippas, J., Zhang, Y. and Qiu, M., "Analyzing and Characterizing User Intent in Information-seeking Conversations," in the Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '18), Ann Arbor, Michigan, USA, July 8-12, 2018, pp. 989-992.

Qiu, M., Yang, L., Feng, J., Zhou, W., Huang, J., Chen, H., Croft, W. B. and Lin, W., "Transfer Learning for Context-Aware Question Matching in Information-seeking Conversation Systems in E-commerce," in the Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, July 15-20, 2018, Melbourne, Australia, p. 2034.

This work is supported in part by the Center for Intelligent Information Retrieval (CIIR) and in part by the National Science Foundation (NSF IIS-1419693).