Three Word-Sense Disambiguation Algorithms
Word sense disambiguation (WSD) is the part of natural language processing that deals with assigning to a word (token) its meaning in the context in which it appears.
This article presents three probabilistic algorithms for solving this problem, together with their implementation in Python.
The algorithms are:
- Naive Bayes;
- Lesk algorithm;
- EM (Expectation-Maximization) algorithm;
and they are tested on this corpus, appropriately pre-processed with the code available in corpus_preprocessing.py.
Naive Bayes Algorithm
This algorithm belongs to the class of supervised learning algorithms and is based on three elements:
w_i: \text{word} \\ s_i: \text{sense} \\ f_i: \text{feature}
To simplify, we take as features of a word w_i all the other words w_j (with i ≠ j) that appear in the same sentence.
count(x): \text{number of occurrences of } x \\ count(s_i, w_j): \text{number of times in which } s_i \text{ is attributed to } w_j \\ count(f_j, s_i): \text{number of times in which } f_j \text{ appears in a } \\ \text{sentence in which there's a word that } s_i \text{ is attributed to}
So we have:
P(s_i)=\frac{count(s_i,w)}{count(w)} \\ P(f_j|s_i)=\frac{count(f_j,s_i)}{count(s_i)} \\ \hat{s} = \argmax_{s_i \in S} P(s_i) \prod_j P(f_j|s_i)
The result of the last expression represents the most suitable meaning for the word w.
Python implementation: naive_bayes.py
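For illustration, here is a minimal sketch of how training and prediction could look, assuming the training data has already been reduced to (features, sense) pairs; the function names and data format are hypothetical and do not necessarily match naive_bayes.py. Add-alpha smoothing is included because an unseen feature would otherwise zero out the whole product.

```python
import math
from collections import Counter, defaultdict

def train(labeled_contexts):
    """labeled_contexts: iterable of (features, sense) pairs (hypothetical format)."""
    sense_count = Counter()                  # count(s_i)
    feature_count = defaultdict(Counter)     # count(f_j, s_i)
    for features, sense in labeled_contexts:
        sense_count[sense] += 1
        for f in features:
            feature_count[sense][f] += 1
    return sense_count, feature_count

def predict(features, sense_count, feature_count, alpha=1.0):
    """Return argmax_s P(s) * prod_j P(f_j | s), computed in log space."""
    total = sum(sense_count.values())
    vocab = {f for counts in feature_count.values() for f in counts}
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_count.items():
        # log P(s_i) + sum_j log P(f_j | s_i), with add-alpha smoothing
        score = math.log(n / total)
        denom = sum(feature_count[sense].values()) + alpha * len(vocab)
        for f in features:
            score += math.log((feature_count[sense][f] + alpha) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```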
Lesk Algorithm
The Lesk algorithm uses a dictionary to disambiguate words: WordNet is the database we need.
The idea behind Lesk's algorithm is the overlap between the dictionary definition of a sense and the context in which the word to be disambiguated appears. More precisely, the sense chosen by the algorithm is the one whose definition (which can be extended with some example sentences) contains the largest number of words (features) of the context.
\overrightarrow{d_s}: \text{words that make up the definition of } s \\ \overrightarrow{e_s}: \text{words that make up the examples for } s \\ \overrightarrow{f}: \text{words that appear in the context} \\ \hat{s} = \argmax_{s \in S} |(\overrightarrow{d_s} \cup \overrightarrow{e_s}) \cap \overrightarrow{f}|
Python implementation: lesk.py
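As a sketch of the idea (not the code in lesk.py), the simplified Lesk procedure can be written with NLTK's WordNet interface; it assumes nltk and its wordnet data are installed, and it omits the stop-word filtering and lemmatization that a real implementation would add.

```python
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_words):
    """Pick the synset of `word` whose definition + examples overlap the context most."""
    context = {w.lower() for w in context_words}
    best_synset, best_overlap = None, -1
    for synset in wn.synsets(word):
        # d_s ∪ e_s: words of the definition plus the example sentences
        signature = set(synset.definition().lower().split())
        for example in synset.examples():
            signature |= set(example.lower().split())
        overlap = len(signature & context)   # |(d_s ∪ e_s) ∩ f|
        if overlap > best_overlap:
            best_synset, best_overlap = synset, overlap
    return best_synset

# Example: disambiguate "bank" in a sentence about rivers
print(simplified_lesk("bank", "the fish swam under the steep bank of the river".split()))
```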
EM Algorithm
The Expectation-Maximization algorithm belongs to the class of unsupervised learning algorithms.
The goal is to maximize the likelihood of the corpus:
l(C) = \log \prod_i P(c_i)
where C is the corpus and ci are the individual contexts that make up the corpus, i.e. the sentences.
P(c_i) = \sum_k P(c_i|s_k)P(s_k) \\ P(c_i|s_k) = \prod_{f_j \in c_i} P(f_j|s_k)
At the beginning the parameters P(sk) and P(fj|sk) are initialized randomly. As the name suggests, the iterative algorithm is composed of two phases:
- E-step: estimate the posterior probability that the sense sk generates the context ci.
h_{ik} = \frac{P(c_i|s_k)}{\sum_{k=1}^{K}P(c_i|s_k)}
- M-step: recalculation of the parameters P(sk) and P(fj|sk).
P(s_k) = \frac{\sum_{i=1}^{I}h_{ik}}{\sum_{k=1}^{K}\sum_{i=1}^{I}h_{ik}} \\ P(f_j|s_k) = \frac{\sum_{\{c_i|f_j \in c_i\}}h_{ik}}{Z_k} \\ Z_k = \sum_{j} \sum_{\{c_i|f_j \in c_i\}}h_{ik}
The iterative cycle continues as long as l(C) improves.
Python implementation: em.py
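The following is a compact sketch of the procedure (hypothetical names, not the code in em.py), assuming the corpus is already a list of feature lists and the number of senses K is fixed. Unlike the description above, it runs for a fixed number of iterations instead of monitoring l(C), and it works with raw probabilities rather than in log space, so very long contexts may underflow.

```python
import math
import random
from collections import defaultdict

def em_wsd(contexts, K, n_iter=20, seed=0):
    """contexts: list of feature lists; K: number of senses to induce."""
    rng = random.Random(seed)
    vocab = sorted({f for c in contexts for f in c})
    # Random initialization of P(s_k) and P(f_j | s_k)
    p_s = [1.0 / K] * K
    p_f = []
    for _ in range(K):
        weights = {f: rng.random() + 1e-3 for f in vocab}
        z = sum(weights.values())
        p_f.append({f: w / z for f, w in weights.items()})

    h = []
    for _ in range(n_iter):
        # E-step: h_ik = P(c_i|s_k) / sum_k P(c_i|s_k),
        # with P(c_i|s_k) = prod_{f_j in c_i} P(f_j|s_k)
        h = []
        for c in contexts:
            scores = [math.exp(sum(math.log(p_f[k][f]) for f in c)) for k in range(K)]
            z = sum(scores) or 1.0
            h.append([s / z for s in scores])
        # M-step: re-estimate P(s_k) and P(f_j|s_k)
        totals = [sum(h[i][k] for i in range(len(contexts))) for k in range(K)]
        denom = sum(totals) or 1.0
        p_s = [t / denom for t in totals]   # kept for completeness, per the M-step formula
        for k in range(K):
            counts = defaultdict(float)
            for i, c in enumerate(contexts):
                for f in set(c):            # posterior-weighted count of contexts containing f
                    counts[f] += h[i][k]
            z = sum(counts.values()) or 1.0
            p_f[k] = {f: (counts[f] + 1e-12) / z for f in vocab}
    return p_s, p_f, h
```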
Notes
Although I have not found any errors in the source code beyond those reported in the comments within the files themselves, I do not guarantee its complete correctness.
References
- Alessandro Raganato, Jose Camacho-Collados and Roberto Navigli. Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. Proceedings of EACL 2017, Valencia, Spain.
- Word Sense Disambiguation (WSD) – UAIC