July 3, 2013 - Two Brazilian physicists have devised a method to automatically elucidate the meaning of words with several senses, based solely on their patterns of connectivity with nearby words in a given sentence, and not on semantics. Thiago Silva and Diego Amancio from the University of São Paulo, Brazil, reveal, in a paper about to be published in The European Physical Journal B, how they modelled classic texts as complex networks in order to derive their meaning. This type of model plays a key role in several natural language processing tasks such as machine translation, information retrieval, content analysis and text processing.
In this study, the authors chose a set of ten so-called polysemous words (words with multiple meanings) such as bear, jam, just, rock or present. They then examined the patterns of connectivity of these words with nearby words in the text of literary classics such as Jane Austen’s Pride and Prejudice. Specifically, they established a model consisting of a set of nodes representing words, in which two nodes are connected by an edge if the corresponding words are adjacent in a text.
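To make the network model concrete, here is a minimal sketch in Python of how a word adjacency network could be built from raw text. It is an illustration, not the authors' code: the function name `build_adjacency_network`, the simple lowercase tokenisation and the choice of undirected edges are all assumptions made for this example.

```python
import re
from collections import defaultdict


def build_adjacency_network(text):
    """Return a dict mapping each word to the set of words adjacent to it.

    Assumptions for this sketch: words are lowercased runs of letters
    and apostrophes, and edges are undirected.
    """
    words = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(set)
    # Link every pair of consecutive words in the text.
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops from repeated words
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)


network = build_adjacency_network("The bear ate the jam")
```

In this toy sentence, the node for "the" ends up linked to "bear", "ate" and "jam", so its connectivity pattern reflects every context in which it appears.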
The authors then compared the results of their disambiguation exercise with the traditional semantic-based approach. They observed significant accuracy rates in identifying the suitable meanings when using both techniques. The approach described in this study, based on a so-called deterministic tourist walk characterisation, can therefore be considered a complementary methodology for distinguishing between word senses.
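The article does not spell out how the tourist walk operates. As a rough illustration, the sketch below follows the rule as it is commonly described: at each step, a walker moves to the nearest neighbouring node that is not among the last few nodes it visited, until its trajectory starts repeating. The function name `tourist_walk`, the `memory` parameter and the numeric distances on the edges are hypothetical; how the authors define distance between words is not stated here.

```python
def tourist_walk(neighbors, start, memory=2, max_steps=1000):
    """Run a deterministic walk and return (transient_steps, cycle_length).

    neighbors: dict mapping node -> {neighbor: distance} (assumed input).
    memory: number of most recently visited nodes, including the current
            one, that the walker refuses to revisit.
    Returns None if no repetition is found within max_steps.
    """
    path = [start]
    first_seen = {}  # walker state -> step at which it first occurred
    for step in range(max_steps):
        state = tuple(path[-memory:])
        if state in first_seen:
            # The state recurred, so the walk has entered a cycle.
            return first_seen[state], step - first_seen[state]
        first_seen[state] = step
        candidates = [
            (dist, node)
            for node, dist in neighbors[path[-1]].items()
            if node not in state
        ]
        if not candidates:
            return len(path) - 1, 0  # dead end: no allowed neighbour
        path.append(min(candidates)[1])  # move to the nearest allowed node
    return None


# Toy graph with made-up distances (illustrative only).
toy = {
    "a": {"b": 1, "c": 3, "d": 4},
    "b": {"a": 1, "c": 2},
    "c": {"a": 3, "b": 2},
    "d": {"a": 4},
}
short_memory = tourist_walk(toy, "d", memory=1)
long_memory = tourist_walk(toy, "d", memory=2)
```

With `memory=1` the walker from "d" settles into bouncing between "a" and "b", while `memory=2` forces it around the triangle "a", "b", "c". The lengths of the initial transient and of the final cycle are the kind of quantities such a walk yields for characterising a node.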
In future work, the authors plan to devise new measures that connect not only adjacent words, but also words within a given interval, in order to enhance the ability of the model to grasp semantic factors. This approach is supported by another recent study by the same authors, showing that traditional complex network measures depend mainly on syntax.
- Thiago C. Silva, Diego R. Amancio. Discriminating word senses with tourist walks in complex networks. The European Physical Journal B, 2013; 86 (7) DOI: 10.1140/epjb/e2013-40025-4