ABSTRACT:
Information retrieval addresses the problem of finding, within a large collection, those documents whose content matches a user's request. Currently, the most successful general-purpose retrieval methods are statistical methods that treat text as little more than a bag of words. Attempts to improve retrieval performance through more sophisticated linguistic processing, however, have been largely unsuccessful. Indeed, unless done carefully, such processing can degrade retrieval effectiveness.
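To make "a bag of words" concrete, here is a minimal sketch of a statistical ranker in the vector-space style. It assumes choices of its own: a regex tokenizer with no stemming, tf-idf weights with idf = log(N/df), and cosine similarity. It is an illustration, not a description of any particular system. Word order is discarded as soon as a text becomes a `Counter` of term frequencies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; punctuation is dropped, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    # The "bag": term -> frequency. Word order is thrown away.
    return Counter(tokenize(text))


def rank(query, docs):
    """Rank docs by cosine similarity of tf-idf vectors; returns (score, doc_index) pairs."""
    bags = [bag_of_words(d) for d in docs]
    n = len(docs)
    df = Counter(term for bag in bags for term in bag)
    idf = {term: math.log(n / count) for term, count in df.items()}

    def weigh(bag):
        # tf * idf; query terms absent from the collection get weight 0.
        return {t: tf * idf.get(t, 0.0) for t, tf in bag.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    q = weigh(bag_of_words(query))
    return sorted(((cosine(q, weigh(b)), i) for i, b in enumerate(bags)), reverse=True)


# Hypothetical toy collection for illustration.
DOCS = [
    "The cat sat on the mat.",
    "Dogs chase cats.",
    "Information retrieval finds relevant documents.",
    "Retrieval of documents by statistical methods.",
]
```

For the query "retrieval documents", documents 2 and 3 are ranked first and second, and reordering the query words changes nothing. Because the tokenizer does no stemming, the query "cat" scores zero against "Dogs chase cats."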
Several factors contribute to the difficulty of improving on a good statistical baseline, including the forgiving nature, but broad coverage, of the typical retrieval task; the lack of good weighting schemes for compound index terms; and the implicit linguistic processing inherent in the statistical methods. Natural language processing techniques may be more important for related tasks such as question answering or document summarization.
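The weighting problem for compound index terms can be seen in a small sketch. Its assumptions are all hypothetical and not taken from the abstract. Compound terms are simply adjacent word pairs. Scores are unnormalized tf-idf dot products with idf = log(N/df). An illustrative `phrase_weight` parameter scales the contribution of phrase matches. A phrase is rarer than its constituent words, so naive idf gives it a large weight, and that weight is added on top of the credit the document has already earned for the individual words.

```python
import math
import re
from collections import Counter


def index_terms(text):
    """Return (single-word counts, adjacent-word-pair counts) for a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Hypothetical compound index terms: every adjacent word pair, joined by a space.
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return Counter(words), Counter(pairs)


def scores(query, docs, phrase_weight=1.0):
    """Unnormalized tf-idf dot product per doc; phrase matches are scaled by phrase_weight."""
    n = len(docs)
    indexed = [index_terms(d) for d in docs]
    df = Counter()
    for words, pairs in indexed:
        df.update(words.keys())
        df.update(pairs.keys())
    q_words, q_pairs = index_terms(query)

    def match(q, d):
        # Each shared term contributes q_tf * d_tf * idf^2, with idf = log(N / df).
        return sum(q[t] * d[t] * math.log(n / df[t]) ** 2 for t in q if t in d)

    return [match(q_words, w) + phrase_weight * match(q_pairs, p) for w, p in indexed]


# Hypothetical toy collection for illustration.
DOCS = [
    "Information retrieval systems rank documents.",
    "Retrieval of information from documents.",
    "Statistical methods for text.",
    "Weather report.",
]
```

For the query "information retrieval" with `phrase_weight=0.0`, the first two documents tie because both contain "information" and "retrieval". With `phrase_weight=1.0` the first document pulls ahead. In this toy collection, its single phrase match contributes twice as much as its two word matches combined, because the rarer phrase receives the larger idf. The sketch offers no principled way to choose `phrase_weight`, or to discount the overlap between a phrase and its own words.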
https://dl.acm.org/doi/10.5555/645857.669935