Establishing whether two text items have similar meanings is a central task in natural language processing. In linguistics, by contrast, semantic similarity is less often treated as a single concept, although it has been studied extensively in relation to specific phenomena such as synonymy or diathesis alternations. In this talk, I will discuss the potential usefulness of approaching semantic similarity more broadly, both for linguistic analysis and with a view to contributing to natural language processing. I will present a taxonomy of semantic similarity types and indicators based on previously proposed classifications of paraphrase (Vila Rigat 2013, Vila et al. 2014, Milićević 2007, Mel’čuk 2012) and illustrate it with examples from two sets of Serbian data, newswire texts and software code comments (Miličević Petrović et al. 2022). A doubly cross-level perspective will be outlined: the datasets contain pairs of texts of different lengths (phrase-sentence and sentence-paragraph), and similarity indicators from different levels of linguistic structure will be considered.
Mel’čuk, I. A. (2012). Semantics. From Meaning to Text. Amsterdam: John Benjamins.
Miličević Petrović, M., V. Batanović, B. Kovačević and R. Trnavac (2022). Cross-Level Semantic Similarity in newswire texts and software code comments: Insights from Serbian data in the AVANTES project. In D. Fišer and T. Erjavec (Eds), Proceedings of the Conference on Language Technologies and Digital Humanities. Ljubljana: Institute of Contemporary History. 124-131.
Milićević, J. (2007). La paraphrase. Bern: Peter Lang.
Vila Rigat, M. (2013). Paraphrase Scope and Typology. A Data-Driven Approach from Computational Linguistics. PhD dissertation, University of Barcelona.
Vila, M., A. Martí and H. Rodríguez (2014). Is this a paraphrase? What kind? Paraphrase boundaries and typology. Open Journal of Modern Linguistics 4. 205-218.
Time: 20 October 2022, 5:15 p.m. Venue: Merangasse 70, 1st floor, Room 33.1.224