Positional Embeddings

Cosine Similarity

Cosine similarity measures how closely two vectors align in direction, regardless of magnitude. It is the cosine of the angle between the two vectors. The formula for cosine similarity is:

\[ \text{cosine similarity} = \frac{A \cdot B}{\|A\| \|B\|} \]

where \(A\) and \(B\) are the two vectors, and \(\|A\|\) and \(\|B\|\) are the magnitudes of the vectors, which can be calculated as:

\[ \|A\| = \sqrt{A \cdot A} \]

\[\|B\| = \sqrt{B \cdot B} \]
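As a quick check of the magnitude formula, the sketch below computes the length of a vector as the square root of its dot product with itself and compares it with NumPy's built-in Euclidean norm. The vector `[3, 4]` is an illustrative choice, not one taken from the text.

```python
import numpy as np

# Illustrative vector (an assumption for this sketch, not from the text)
A = np.array([3.0, 4.0])

# Magnitude as the square root of the vector's dot product with itself
norm_from_dot = np.sqrt(np.dot(A, A))

# NumPy's built-in Euclidean (L2) norm, for comparison
norm_builtin = np.linalg.norm(A)

print(norm_from_dot, norm_builtin)
```

Both values should agree, since `np.linalg.norm` defaults to the Euclidean norm for a 1-D array.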

Numerator

The numerator of the cosine similarity formula is the dot product of the two vectors. The dot product measures the extent to which two vectors point in the same direction, but it also grows with their magnitudes: \(A \cdot B = \|A\| \|B\| \cos\theta\), where \(\theta\) is the angle between them. Because it is unbounded, the dot product by itself is not a good measure of similarity.
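To see why the raw dot product is a poor similarity measure, the following minimal sketch keeps the direction of both vectors fixed and only rescales one of them. The vectors and the scale factors are illustrative assumptions.

```python
import numpy as np

# Two vectors pointing in exactly the same direction (illustrative values)
A = np.array([1.0, 1.0])
B = np.array([1.0, 1.0])

# Rescale B without changing its direction and record the dot product each time
scales = [1, 10, 100]
dots = [float(np.dot(A, s * B)) for s in scales]

for s, d in zip(scales, dots):
    print(f"scale={s:>3}  dot product={d}")
```

The direction never changes, yet the dot product grows in proportion to the scale factor.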

Denominator

The denominator of the cosine similarity formula is the product of the magnitudes of the two vectors. The magnitude of a vector is its length: the square root of the sum of the squares of its components, which is the Pythagorean theorem extended to any number of dimensions. Dividing the dot product by this product normalizes it, so that the cosine similarity is bounded between -1 and 1 and is independent of the vectors’ lengths. The similarity measure therefore depends only on the direction of the vectors, not on their scale.
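The effect of the normalization can be sketched with three simple cases: a vector in the same direction, a perpendicular one, and one pointing the opposite way. The helper name `cosine_similarity` and the test vectors are assumptions made for this illustration; the helper does not handle zero-length vectors, for which the formula is undefined.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms.

    Hypothetical helper for illustration; undefined for zero-length vectors.
    """
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

A = np.array([1.0, 0.0])

same = cosine_similarity(A, np.array([5.0, 0.0]))        # same direction, longer vector
orthogonal = cosine_similarity(A, np.array([0.0, 3.0]))  # perpendicular vector
opposite = cosine_similarity(A, np.array([-2.0, 0.0]))   # opposite direction

print(same, orthogonal, opposite)
```

The three results sit at the top, middle, and bottom of the \([-1, 1]\) range, whatever the lengths of the vectors involved.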

Example

Consider two vectors, \(A = [1,1]\) and \(B = [10, 10]\). The dot product of the two vectors is:

\[ A \cdot B = 1 \cdot 10 + 1 \cdot 10 = 20 \]

The magnitudes of the two vectors are:

\[ \|A\| = \sqrt{1^2 + 1^2} = \sqrt{2} \]

\[ \|B\| = \sqrt{10^2 + 10^2} = \sqrt{200} \]

The cosine similarity of the two vectors is:

\[ \text{cosine similarity} = \frac{20}{\sqrt{2} \sqrt{200}} = \frac{20}{\sqrt{400}} = \frac{20}{20} = 1 \]

A cosine similarity of 1 indicates that the two vectors point in exactly the same direction. Because cosine similarity is independent of scale, the result is 1 even though \(B\) is ten times longer than \(A\).

import numpy as np

A = np.array([1,1])
B = np.array([10,10])

AdotB = np.dot(A,B)

AdotB

np.int64(20)

# Euclidean (L2) norms of A and B
Anorm = np.linalg.norm(A)
Bnorm = np.linalg.norm(B)

Anorm * Bnorm

np.float64(20.000000000000004)

AdotB / (Anorm * Bnorm)

np.float64(0.9999999999999998)
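The computed value is `0.9999999999999998` rather than exactly 1 because the square roots in the norms are rounded in floating-point arithmetic. A minimal sketch of one way to handle this, assuming a tolerance-based comparison is acceptable, is to use `np.isclose` instead of `==` and to clip the value to \([-1, 1]\) before converting it to an angle:

```python
import numpy as np

A = np.array([1, 1])
B = np.array([10, 10])

similarity = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))

# Compare with a tolerance rather than exact equality,
# since rounding can leave the result a hair away from 1.0
is_one = np.isclose(similarity, 1.0)

# Clip to the valid range before arccos so rounding error
# cannot push the input outside [-1, 1]
angle_degrees = np.degrees(np.arccos(np.clip(similarity, -1.0, 1.0)))

print(is_one, angle_degrees)
```

The clip matters most when rounding pushes the similarity slightly above 1 or below -1, where `np.arccos` would otherwise return `nan`.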