Leveraging TF-IDF Matrix for Document Clustering with K-Means Algorithm

Shilpi Kulshrestha; Dharmesh Santani

doi:10.38124/ijsrmt.v3i10.61

Authors

Shilpi Kulshrestha, Department of CSE, Jaipur National University, Jaipur, India
Dharmesh Santani, Department of CSE, Jaipur National University, Jaipur, India

DOI:

https://doi.org/10.38124/ijsrmt.v3i10.61

Keywords:

Document Clustering, TF-IDF Matrix, K-Means Algorithm, Evaluation Metrics, Text Pre-processing

Abstract

Document clustering is an important task in information retrieval: it groups similar documents together for efficient organization and retrieval. This paper presents a new approach to document clustering that combines the Term Frequency-Inverse Document Frequency (TF-IDF) matrix with the K-Means algorithm. The proposed system addresses the limitations of traditional methods by using TF-IDF matrices to capture document semantics and K-Means clustering to obtain homogeneous document clusters. Key components of the system include text pre-processing techniques such as stop-word removal, stemming, and tokenization, which improve the quality of the TF-IDF representations. Evaluation metrics such as purity, F-measure, and silhouette score are applied to assess the system's clustering performance. The proposed approach can process large volumes of documents while remaining robust, because outliers and noisy data are discarded. Results on a benchmark dataset show that the proposed approach outperforms the baseline techniques, underlining its effectiveness for efficient document clustering and for streamlined document organization and retrieval across domains.
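
The pipeline outlined in the abstract (pre-processing, TF-IDF weighting, K-Means clustering, silhouette evaluation) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The stop-word list, the IDF variant `ln(N / df) + 1`, the L2 normalisation, the farthest-first seeding, and the four toy documents are all assumptions chosen for illustration, and stemming is omitted for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "and", "as", "in", "on", "the", "when"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (stemming omitted)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return L2-normalised TF-IDF rows and the sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    # Assumed weighting: idf = ln(N / df) + 1, one of several common variants.
    idf = {w: math.log(len(docs) / doc_freq[w]) + 1.0 for w in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[w] / len(toks) * idf[w] if toks else 0.0 for w in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows, vocab


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(rows, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    scores = []
    for i, r in enumerate(rows):
        by_cluster = {}
        for j, other in enumerate(rows):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.sqrt(sq_dist(r, other)))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or single cluster: score 0
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


# Hypothetical toy corpus: two sentences about cats, two about finance.
docs = [
    "The cat chased the mouse in the garden",
    "A cat and a kitten sleep on the sofa",
    "Stock markets fell as investors sold shares",
    "Investors bought shares when stock prices rose",
]
rows, vocab = tfidf_matrix(docs)
labels = kmeans(rows, k=2)
score = silhouette(rows, labels)
print(labels, round(score, 3))
```

On this toy corpus the two cat sentences should receive one label and the two finance sentences the other, because the documents share vocabulary only within their topic. Purity and F-measure, also named in the abstract, need ground-truth labels and are left out of the sketch.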
