Time and Semantic Similarity – What is the Best Alternative to Capture Implicit Links in CSCL Conversations?
Funding: European Commission, Horizon 2020, grant agreement 644187 (RAGE: Realising an Applied Gaming Eco-system)
The goal of our research is to compare novel semantic techniques for identifying implicit links between utterances in multi-participant CSCL chat conversations. Cohesion, reflected by the strength of the semantic relations behind the automatically identified links, is assessed using WordNet-based semantic distances, as well as unsupervised semantic models, i.e., Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). The analysis is built on top of the ReaderBench framework, and multiple identification heuristics were compared: semantic cohesion metrics, normalized cohesion measures, and Mihalcea's formula. Validation used a corpus of 55 conversations in which participants added explicit links between utterances wherever they considered them necessary for clarity. Our study provides an in-depth analysis of multiple methods for identifying implicit links and reports the accuracy of each technique in capturing the explicit references made by users. Statistical similarity measures achieved the best overall identification accuracy when combined with Mihalcea's formula, whereas WordNet-based techniques performed best with un-normalized similarity scores applied to a window of 5 utterances and a time frame of 1 minute.

This study is part of the RAGE project. The RAGE project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 644187. This publication reflects only the author's view. The European Commission is not responsible for any use that may be made of the information it contains.
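The identification setup summarized in the abstract can be sketched as follows. Each utterance is linked to the most similar preceding utterance that falls inside a sliding window (5 utterances) and a time frame (1 minute), and a heuristic is scored by the share of user-made explicit references it recovers. This is a minimal sketch under stated assumptions, not the ReaderBench implementation. The names `Utterance`, `exact_match`, `mihalcea_similarity`, `find_implicit_link` and `link_accuracy` are hypothetical. `exact_match` is only a placeholder for the WordNet, LSA or LDA word similarities used in the study. `mihalcea_similarity` follows the commonly cited idf-weighted form of Mihalcea's text-to-text measure, which may differ from the exact variant evaluated. Scoring accuracy as an exact match with the explicit reference is likewise an assumption.

```python
# Minimal sketch; all names are hypothetical and not ReaderBench's API.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

WordSim = Callable[[str, str], float]
TextSim = Callable[[List[str], List[str]], float]


@dataclass
class Utterance:
    uid: int                   # identifier (here: position) of the utterance
    time: float                # seconds since the start of the conversation
    tokens: List[str]          # preprocessed content words
    ref: Optional[int] = None  # explicit link chosen by the participant, if any


def exact_match(a: str, b: str) -> float:
    """Placeholder word similarity (assumption); stands in for WordNet, LSA or LDA."""
    return 1.0 if a == b else 0.0


def mihalcea_similarity(t1: List[str], t2: List[str],
                        word_sim: WordSim, idf: Dict[str, float]) -> float:
    """Idf-weighted best-match similarity, averaged over both directions."""
    if not t1 or not t2:
        return 0.0

    def directed(a: List[str], b: List[str]) -> float:
        # Each word in `a` contributes its best match in `b`, weighted by idf
        # (words missing from `idf` get weight 1.0).
        num = sum(max(word_sim(w, v) for v in b) * idf.get(w, 1.0) for w in a)
        den = sum(idf.get(w, 1.0) for w in a)
        return num / den if den else 0.0

    return 0.5 * (directed(t1, t2) + directed(t2, t1))


def find_implicit_link(chat: List[Utterance], i: int, sim: TextSim,
                       window: int = 5, time_frame: float = 60.0) -> Optional[int]:
    """Link utterance i to its most similar predecessor within window and time frame."""
    current = chat[i]
    best_uid, best_score = None, 0.0
    # Scan newest to oldest so that ties favour the closest utterance.
    for prev in reversed(chat[max(0, i - window):i]):
        if current.time - prev.time > time_frame:
            continue
        score = sim(current.tokens, prev.tokens)
        if score > best_score:
            best_uid, best_score = prev.uid, score
    return best_uid


def link_accuracy(chat: List[Utterance], sim: TextSim,
                  window: int = 5, time_frame: float = 60.0) -> float:
    """Share of explicit references that the heuristic recovers exactly."""
    explicit = [i for i, u in enumerate(chat) if u.ref is not None]
    if not explicit:
        return 0.0
    hits = sum(find_implicit_link(chat, i, sim, window, time_frame) == chat[i].ref
               for i in explicit)
    return hits / len(explicit)


# Toy conversation invented for illustration, not taken from the study's corpus.
chat = [
    Utterance(0, 0.0, ["wiki", "collaboration", "tool"]),
    Utterance(1, 10.0, ["forum", "threads", "help"]),
    Utterance(2, 20.0, ["wiki", "pages", "editing"], ref=0),
    Utterance(3, 100.0, ["forum", "moderation"], ref=1),
]


def sim(a: List[str], b: List[str]) -> float:
    return mihalcea_similarity(a, b, exact_match, idf={})


print(link_accuracy(chat, sim))                    # 0.5: utterance 3 is 90 s after its target
print(link_accuracy(chat, sim, time_frame=120.0))  # 1.0
```

Widening `time_frame` in the toy example shows how the temporal constraint interacts with the utterance window: the second explicit reference is missed with a 60-second frame and recovered with a 120-second one.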