Term Frequency Tutorial

What is Term Frequency?

Term Frequency (TF) measures how often a term occurs in a document. It is a fundamental concept in text mining and information retrieval. The basic idea is that the more often a term appears in a document, the more relevant it is likely to be to that document's content.

Understanding Term Frequency

Term Frequency can be calculated using the following formula:

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)

This formula normalizes the frequency of the term by the total number of terms in the document, which helps in comparing the relevance of terms across documents of different lengths.
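
The formula can be sketched as a small function in base R. This is only an illustration: the name term_frequency is hypothetical, and the sketch assumes the document has already been split into a character vector with one element per term.

```r
# Hypothetical helper: relative frequency of `term` among `tokens`.
# Assumes `tokens` is a character vector with one element per term.
term_frequency <- function(term, tokens) {
  if (length(tokens) == 0) {
    return(0)  # avoid dividing by zero for an empty document
  }
  sum(tokens == term) / length(tokens)
}

tokens <- c("a", "b", "a", "c")
term_frequency("a", tokens)  # 2 of 4 tokens, so 0.5
```

Returning 0 for an empty document is a design choice made here to avoid a division by zero; other conventions are possible.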

Example of Term Frequency Calculation

Let’s consider a simple example. Suppose we have the following document:

"The cat sat on the mat. The mat was warm."

In this document, let's calculate the term frequency for the term "mat".

  • The term "mat" appears 2 times.
  • The total number of terms in the document is 10, counting every occurrence of a repeated word and ignoring punctuation.

Using the TF formula, we can calculate:

TF(mat) = 2 / 10 = 0.2

This means that the term "mat" constitutes 20% of the total terms in this document.
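
The hand calculation above can be reproduced in base R. The following is a minimal sketch that assumes a deliberately simple tokenization (lowercase the text, strip punctuation, split on whitespace); real text usually needs more careful handling.

```r
doc <- "The cat sat on the mat. The mat was warm."

# Lowercase, strip punctuation, then split on whitespace.
# This simple tokenization is an assumption made for illustration.
clean <- gsub("[[:punct:]]", "", tolower(doc))
tokens <- strsplit(clean, "\\s+")[[1]]

counts <- table(tokens)        # raw count per term
tf <- counts / length(tokens)  # normalize by document length

length(tokens)  # 10
tf[["mat"]]     # 0.2
tf[["the"]]     # 0.3
```

Because the text is lowercased first, "The" and "the" are counted as the same term, which is why "the" has three occurrences.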

Term Frequency in R Programming

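In R, we can count terms using the tm package, which is designed for text mining. The example below builds a term-document matrix for the document above. By default, tm lowercases the text, does not remove punctuation, and drops terms shorter than three characters, so the control options are set to strip punctuation and keep short words such as "on":

library(tm)
text <- "The cat sat on the mat. The mat was warm."
# Wrap the text in a one-document corpus
corpus <- Corpus(VectorSource(text))
# Strip punctuation and keep words of any length
tdm <- TermDocumentMatrix(corpus, control = list(removePunctuation = TRUE, wordLengths = c(1, Inf)))
# Print the matrix of raw counts
inspect(tdm)

The output shows the raw count of each term in the document. With the settings above, the counts should be:

Term Document Matrix (terms are rows, documents are columns):

cat 1
mat 2
on 1
sat 1
the 3
warm 1
was 1

Note that these are raw counts rather than normalized frequencies. To obtain TF as defined by the formula, divide each count by the total number of terms in the document (10 here), which gives 2 / 10 = 0.2 for "mat".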
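
The counts in a term-document matrix can be turned into TF values by dividing each column by its total. The sketch below is one way to do this, assuming the tm package is installed; the control settings (strip punctuation, keep words of any length) are choices made for this illustration so that the result matches the hand calculation.

```r
library(tm)

text <- "The cat sat on the mat. The mat was warm."
corpus <- Corpus(VectorSource(text))

# Strip punctuation so "mat." counts as "mat", and keep short words.
tdm <- TermDocumentMatrix(
  corpus,
  control = list(removePunctuation = TRUE, wordLengths = c(1, Inf))
)

counts <- as.matrix(tdm)                      # terms x documents, raw counts
tf <- sweep(counts, 2, colSums(counts), "/")  # divide each column by its total
tf["mat", 1]                                  # TF of "mat" in the first document
```

Using sweep with colSums normalizes every document in the matrix separately, so the same code applies unchanged to a corpus with more than one document.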

Conclusion

Term Frequency is a crucial concept in text mining that helps in understanding the significance of terms in documents. By calculating TF, one can derive insights into the content and focus of the text. In R, the use of packages like tm makes it straightforward to compute term frequency and analyze text data efficiently.