On the query reformulation technique for effective MEDLINE document retrieval
Improving the retrieval accuracy of MEDLINE documents remains challenging because of low retrieval precision. Focusing on a query expansion technique based on pseudo-relevance feedback (PRF), this paper addresses the problem by systematically examining the effects of expansion term selection and of adjusting the term weights of the expanded query, using a set of MEDLINE test documents called OHSUMED. After implementing a baseline information retrieval system based on the Okapi BM25 retrieval model, we compared six well-known term ranking algorithms for selecting useful expansion terms, and then compared traditional term reweighting algorithms with our new variant of the standard Rocchio's feedback formula, which adopts a group-based weighting scheme. Our experimental results on the OHSUMED test collection showed maximum improvements of 20.2% and 20.4% in mean average precision and recall, respectively, over unexpanded queries when terms were expanded using a co-occurrence analysis-based term ranking algorithm in conjunction with our term reweighting algorithm (p-value < 0.05). Our study shows how different query reformulation techniques behave and how they can be used for more effective MEDLINE document retrieval. (c) 2010 Elsevier Inc. All rights reserved. ; This work was supported in part by a grant from the Advanced Biometric Research Center (ABRC) and the Korea Science and Engineering Foundation (KOSEF), and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MEST) (No. 2009-0075089).
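As an illustration of the pseudo-relevance feedback idea summarized in the abstract, the following is a minimal sketch of query expansion with the standard Rocchio formula, q' = alpha * q + beta * centroid(top-ranked documents). It is not the group-based variant proposed in the paper, whose details are not given here. The function name `rocchio_expand`, the parameters `n_expand`, `alpha`, and `beta`, the use of raw term frequencies, and the toy documents are all assumptions made for the example.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, n_expand=2, alpha=1.0, beta=0.5):
    """Return {term: weight} for a query expanded from pseudo-relevant docs.

    query_terms   -- list of tokens in the original query
    feedback_docs -- list of token lists, assumed to be the top-ranked
                     documents from an initial retrieval run (e.g. BM25)
    """
    # Original query vector: raw term frequency in the query.
    query_vec = Counter(query_terms)

    # Centroid of the pseudo-relevant documents: mean term frequency.
    centroid = Counter()
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(feedback_docs)

    # Term selection: rank terms not already in the query by centroid
    # weight (ties broken alphabetically) and keep the top n_expand.
    candidates = sorted(
        (t for t in centroid if t not in query_vec),
        key=lambda t: (-centroid[t], t),
    )[:n_expand]

    # Term reweighting: standard Rocchio without a negative-feedback term.
    return {
        t: alpha * query_vec.get(t, 0) + beta * centroid.get(t, 0.0)
        for t in list(query_vec) + candidates
    }


# Toy usage: three hypothetical top-ranked documents for one query.
docs = [
    ["heart", "myocardial", "infarction"],
    ["myocardial", "infarction", "heart", "attack"],
    ["myocardial", "treatment"],
]
expanded = rocchio_expand(["heart", "attack"], docs)
```

In this sketch the original query terms keep a weight of at least `alpha`, while the added terms receive only the `beta`-scaled centroid weight, so expansion terms influence ranking less than the terms the user actually typed.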