Thursday, January 23, 2025
HomeComputer ScienceCosine Similarity - GeeksforGeeks

Cosine Similarity – GeeksforGeeks

Cosine Similarity is a metric used to measure how similar two vectors (or text documents) are, by computing the cosine of the angle between them. It is widely used in text mining and natural language processing (NLP) to compare documents or sentences, based on the assumption that similar documents will have similar word distributions.

Mathematically, Cosine Similarity is defined as:

Cosine Similarity=A⋅B

See also  Difference Between OSI Model and TCP/IP Model

Steps for Calculating Cosine Similarity:

  1. Tokenization: Break down the text into words.
  2. Vectorization: Convert the text into a vector representation (e.g., TF-IDF, Bag of Words).
  3. Compute the Dot Product: Find the dot product of the two vectors.
  4. Find the Magnitudes: Calculate the magnitude (norm) of each vector.
  5. Compute Cosine Similarity: Apply the formula above.

Example:

Let’s say you have two vectors A=[1,2,3]

  1. Dot product: 1×4+2×5+3×6=32
  2. Magnitude of A 12+22+32=14
  3. Magnitude of B 42+52+62=77
  4. Cosine Similarity: 3214×77≈0.974
See also  Taking Input in Python

A cosine similarity value closer to 1 indicates that the vectors are very similar, while a value closer to 0 indicates they are dissimilar.

Use Cases:

  • Document similarity: Comparing two text documents.
  • Recommendation systems: Finding similar items.
  • Clustering: Grouping similar text documents.

GeeksforGeeks typically provides code examples in languages like Python to help understand the concept better. If you want, I can show you how to implement Cosine Similarity in Python!

RELATED ARTICLES

B-Trees

What is AWS Elasticsearch?

0 0 votes
Article Rating

Leave a Reply

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
- Advertisment -

Most Popular

Recent Comments

0
Would love your thoughts, please comment.x
()
x