More on Mutual Information
Below is the code for WAPMI, or Weighted Average Pointwise Mutual Information. The code compares two probability distributions: how often word pairs actually occur together versus how often they would if the words were independent.
It is a method used in computational linguistics to measure the strength of association between words in a given context, typically in the analysis of text data.
Imagine you have a box of different colored marbles, and you want to know which colors tend to appear together. WAPMI helps you figure out how often certain words (or colors) appear together more often than by random chance. It’s like a smart way to understand word relationships in sentences!
A walkthrough of the code part by part.
Imports
```python
import math
import collections
```
- `import math`: Imports the `math` module, which provides mathematical functions such as logarithms.
- `import collections`: Imports the `collections` module, which provides specialized container datatypes, like `defaultdict`.
Function Definitions
1. `calculate_pmi` Function
```python
def calculate_pmi(word1, word2, joint_prob, marginal_prob1, marginal_prob2):
```
- Purpose: Calculates the Pointwise Mutual Information (PMI) score between two words.
- Parameters:
  - `word1` and `word2`: Words for which PMI is calculated.
  - `joint_prob`: Probability of both words appearing together.
  - `marginal_prob1`: Probability of `word1` appearing.
  - `marginal_prob2`: Probability of `word2` appearing.
- `pmi` Calculation: `math.log(joint_prob / (marginal_prob1 * marginal_prob2), 2)` computes the logarithm (base 2) of the ratio of the joint probability to the product of the marginal probabilities.
- Returns: The PMI score.
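The function body is not reproduced above, so here is a minimal sketch that matches the walkthrough, using only the `math` import:

```python
import math

def calculate_pmi(word1, word2, joint_prob, marginal_prob1, marginal_prob2):
    # PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) )
    pmi = math.log(joint_prob / (marginal_prob1 * marginal_prob2), 2)
    return pmi
```

A positive score means the pair co-occurs more often than independence would predict; a negative score means less often; zero means exactly as expected.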
2. `calculate_joint_prob` Function
```python
def calculate_joint_prob(word1, word2, corpus):
```
- Purpose: Calculates the joint probability of two words appearing together in the same sentence.
- Parameters:
  - `word1` and `word2`: Words to check.
  - `corpus`: List of sentences (each sentence is a list of words).
- `joint_count`: Counts how many sentences contain both `word1` and `word2`.
- `joint_prob` Calculation: Divides `joint_count` by the total number of sentences to get the joint probability.
- Returns: The joint probability of the two words.
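A sketch consistent with this description (the original body isn't shown, so anything beyond the names `joint_count` and `joint_prob` is an assumption):

```python
def calculate_joint_prob(word1, word2, corpus):
    # Count sentences that contain both words, then divide by the number of sentences
    joint_count = sum(1 for sentence in corpus if word1 in sentence and word2 in sentence)
    joint_prob = joint_count / len(corpus)
    return joint_prob
```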
3. `calculate_marginal_prob` Function
```python
def calculate_marginal_prob(word, corpus):
```
- Purpose: Calculates the marginal probability of a single word.
- Parameters:
  - `word`: The word for which probability is calculated.
  - `corpus`: List of sentences.
- `word_count`: Counts how many sentences contain `word`.
- `marginal_prob` Calculation: Divides `word_count` by the total number of sentences to get the marginal probability.
- Returns: The marginal probability of the word.
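Again, a sketch reconstructed from the description rather than the original body:

```python
def calculate_marginal_prob(word, corpus):
    # Count sentences that contain the word, then divide by the number of sentences
    word_count = sum(1 for sentence in corpus if word in sentence)
    marginal_prob = word_count / len(corpus)
    return marginal_prob
```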
4. `calculate_pmi_corpus` Function
```python
def calculate_pmi_corpus_optimized(corpus):
```
- Purpose: Calculates PMI scores for all pairs of words in the corpus.
- Parameters:
  - `corpus`: List of sentences.
- `word_counts`: A `defaultdict` used to count occurrences of each word.
- Count Words: Iterates over each sentence to count occurrences of each word.
- Compute PMI: Iterates over all pairs of words (excluding pairs where `word1` is the same as `word2`), calculates joint and marginal probabilities, then computes PMI for each pair.
- Returns: A dictionary of PMI scores for all word pairs.
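A sketch of the corpus-level function, reusing the helpers sketched above. The guard against pairs that never co-occur (where a logarithm of zero would fail) is an addition here, and whatever shortcuts the original "optimized" version takes are not shown:

```python
import collections

def calculate_pmi_corpus_optimized(corpus):
    # Count occurrences of each word across the corpus
    word_counts = collections.defaultdict(int)
    for sentence in corpus:
        for word in sentence:
            word_counts[word] += 1

    pmi_scores = {}
    vocabulary = list(word_counts)
    for word1 in vocabulary:
        for word2 in vocabulary:
            if word1 == word2:
                continue  # skip pairs where both words are the same
            joint_prob = calculate_joint_prob(word1, word2, corpus)
            if joint_prob == 0:
                continue  # the pair never co-occurs; log(0) is undefined
            pmi_scores[(word1, word2)] = calculate_pmi(
                word1,
                word2,
                joint_prob,
                calculate_marginal_prob(word1, corpus),
                calculate_marginal_prob(word2, corpus),
            )
    return pmi_scores
```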
Example Usage
```python
corpus = [
```
- Purpose: Runs the PMI calculations on a sample corpus and prints the PMI scores for all word pairs.
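The sample corpus is only partially shown above; a small stand-in corpus (the sentences below are made up for illustration) exercises the whole pipeline:

```python
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["the", "cat", "chased", "the", "dog"],
]

pmi_scores = calculate_pmi_corpus_optimized(corpus)
# Print the five most strongly associated pairs
for (word1, word2), score in sorted(pmi_scores.items(), key=lambda item: -item[1])[:5]:
    print(f"PMI({word1}, {word2}) = {score:.3f}")
```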
Summary
- `calculate_pmi` computes PMI given probabilities.
- `calculate_joint_prob` finds how often two words appear together.
- `calculate_marginal_prob` finds how often one word appears.
- `calculate_pmi_corpus` calculates PMI for all word pairs in a corpus.
This code helps measure how strongly two words are associated compared to what you would expect by chance.
Comparing Mutual Information, WAPMI, VAE, and KL Divergence
Let’s break down WAPMI, mutual information, and how they relate to Variational Autoencoders (VAEs) and KL divergence in simple terms, like you’re five.
Mutual Information
Imagine you have two sets of toys, and you want to know how much one set tells you about the other. Mutual information is a way to measure how much knowing about one set of toys helps you predict what’s in the other set. If you always find a red truck when you find a blue car, that’s high mutual information.
WAPMI (Weighted Average Pointwise Mutual Information)
Now, WAPMI takes this idea of mutual information and looks at words in sentences. It asks, “How often do these words appear together, and is it more than just by chance?” It’s like seeing if certain toys always end up next to each other more than they would randomly.
Variational Autoencoders (VAEs) and KL Divergence
Imagine you have a machine that tries to guess which toys you might pull out of a box based on previous toys. VAEs are like that machine—they learn patterns to predict what comes next.
The KL Divergence (Kullback-Leibler Divergence) is a way for the machine to measure how different its guesses (predictions) are from what actually happens. It helps the machine get better by adjusting its guesses to be more like what it observes.
How They Relate
- Mutual Information: Helps understand how related different pieces of data are, like words in a sentence.
- WAPMI: Uses the idea of mutual information to find strong word pairings in text.
- VAE: Tries to model data (like sentences) to understand it better.
- KL Divergence: Helps the VAE improve its understanding by comparing its guesses to reality and adjusting accordingly.
Example Code Walkthrough
Let’s imagine you have a simple code where you’re trying to guess if two words often appear together:
```python
import collections
```
This code checks how often pairs of words (or symbols) appear together. If “new” and “news” show up together a lot, the program will tell you. This is similar to how WAPMI works.
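The script itself is only hinted at above; a minimal sketch of a pair counter along these lines (the sentences are stand-ins) could look like this:

```python
import collections

# Example sentences; "new" and "news" are placed together on purpose
sentences = [
    ["new", "news", "today"],
    ["breaking", "news", "new", "story"],
    ["old", "story", "archive"],
]

pair_counts = collections.Counter()
for sentence in sentences:
    unique_words = sorted(set(sentence))
    # Count every unordered pair of distinct words in the sentence
    for i in range(len(unique_words)):
        for j in range(i + 1, len(unique_words)):
            pair_counts[(unique_words[i], unique_words[j])] += 1

for pair, count in pair_counts.most_common(3):
    print(pair, "appear together in", count, "sentences")
```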
In a VAE, you might use something like KL divergence to measure how far off your model’s guesses are from reality, then adjust it to improve. The model might then get better at understanding the relationships between words (like mutual information).
This approach helps in building smarter models that can understand and predict data more accurately.
A simple code demonstration of each of them
1. Mutual Information
Concept:
Mutual Information (MI) measures how much knowing one random variable reduces uncertainty about another. It quantifies the “shared information” between two variables.
Code Example:
```python
from sklearn.metrics import mutual_info_score
```
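The lists `X` and `Y` from the example are not shown, so the values below are made up; `mutual_info_score` treats them as discrete labels:

```python
from sklearn.metrics import mutual_info_score

# Two discrete variables; Y mostly mirrors X, so their MI should be high
X = [0, 0, 1, 1, 2, 2, 0, 1]
Y = [0, 0, 1, 1, 2, 2, 0, 2]

mi = mutual_info_score(X, Y)  # returned in nats
print(f"Mutual information between X and Y: {mi:.3f}")
```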
Explanation:
In this example, `mutual_info_score` from `sklearn` calculates the MI between two lists `X` and `Y`. MI quantifies how much knowing `X` helps predict `Y`.
2. WAPMI (Weighted Average Pointwise Mutual Information)
Concept:
WAPMI is used in NLP to find out how often words co-occur in a text corpus beyond what would be expected by chance. It gives more weight to frequently co-occurring pairs.
Code Example:
```python
import collections
```
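The snippet is not reproduced beyond the import, so here is one possible sketch: PMI per word pair based on sentence co-occurrence, with the weighted average taken using each pair's joint probability as the weight. That weighting choice is an assumption, not necessarily the exact scheme behind the original example:

```python
import collections
import math

def pmi(joint_prob, prob1, prob2):
    # Pointwise mutual information in bits
    return math.log2(joint_prob / (prob1 * prob2))

def weighted_average_pmi(corpus):
    # Weight each pair's PMI by its joint probability so frequent pairs count more
    n = len(corpus)
    word_probs = collections.defaultdict(float)
    for sentence in corpus:
        for word in set(sentence):
            word_probs[word] += 1 / n

    pair_pmi = {}
    wapmi = 0.0
    words = list(word_probs)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            joint = sum(1 for s in corpus if w1 in s and w2 in s) / n
            if joint == 0:
                continue  # skip pairs that never co-occur
            value = pmi(joint, word_probs[w1], word_probs[w2])
            pair_pmi[(w1, w2)] = value
            wapmi += joint * value
    return wapmi, pair_pmi

corpus = [
    ["machine", "learning", "is", "fun"],
    ["deep", "learning", "beats", "shallow", "learning"],
    ["fun", "with", "data"],
]
wapmi, pair_pmi = weighted_average_pmi(corpus)
print("Per-pair PMI:", pair_pmi)
print("Weighted average PMI:", wapmi)
```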
Explanation:
The PMI function calculates the pointwise mutual information between word pairs in the `corpus`. WAPMI extends this by weighting these values.
3. Variational Autoencoder (VAE)
Concept:
VAEs are used to generate data that’s similar to a given dataset. They work by learning a probabilistic model of the data and then sampling from it.
Code Example:
```python
import torch
```
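The model definition is shown only in part above; a compact PyTorch sketch consistent with the explanation below, using illustrative layer sizes (784, 400, and 20, as in the common MNIST setup) and a binary cross-entropy reconstruction term, both of which are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        # Encoder: maps the input to the mean and log-variance of the latent distribution
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: maps a latent sample back to a reconstruction of the input
        self.fc2 = nn.Linear(latent_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, input_dim)

    def encode(self, x):
        h = F.relu(self.fc1(x))
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps so gradients can flow through mu and logvar
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h = F.relu(self.fc2(z))
        return torch.sigmoid(self.fc3(h))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term plus the KL divergence that keeps q(z|x) close to N(0, I)
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld

# Quick smoke test on random data in place of real images
x = torch.rand(8, 784)
model = VAE()
recon, mu, logvar = model(x)
print(vae_loss(recon, x, mu, logvar).item())
```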
Explanation:
The `VAE` class in this example defines an encoder and decoder. The encoder outputs a mean (`mu`) and log-variance (`logvar`) for a latent variable `z`. The decoder reconstructs the input data from `z`. The KL divergence between the latent distribution and a normal distribution helps regularize the model.
4. KL Divergence
Concept:
KL Divergence measures how one probability distribution diverges from a second, expected probability distribution. It’s used in VAEs to ensure that the latent variables follow a desired distribution (usually a Gaussian).
Code Example:
```python
import torch
```
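A minimal sketch of the closed-form KL term used in VAEs, KL(N(mu, sigma^2) || N(0, 1)) summed over latent dimensions; the batch and latent sizes here are made up:

```python
import torch

def kl_divergence(mu, logvar):
    # -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

mu = torch.zeros(4, 20)      # means of the approximate posterior
logvar = torch.zeros(4, 20)  # log-variances (sigma^2 = 1 here)
print(kl_divergence(mu, logvar).item())  # 0.0 when the posterior equals the prior
```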
Explanation:
This code calculates the KL divergence for a batch of latent variables with `mu` and `logvar` as parameters. It shows how much the approximate posterior (the learned distribution) diverges from the prior (a standard normal distribution).
Comparison:
- Mutual Information tells us how much knowing one variable helps us know another.
- WAPMI applies this idea to words in text, showing how often they appear together.
- VAEs are generative models that learn to create data similar to a training set. They use KL Divergence to keep their latent variables close to a chosen distribution.
Overall, these concepts are connected through their use in understanding and modeling the relationships within data, whether through measuring information (MI, WAPMI) or generating and adjusting data distributions (VAEs, KL Divergence).