What is it about?
Clustering is an important technique in data science that groups similar data points together. It has many applications like separating different types of tissues in medical images or finding relevant documents for a search query. Existing clustering methods often struggle with complex data that has arbitrary cluster shapes, varying densities, or unbalanced classes. This paper presents a new clustering algorithm called DenMune that handles these challenges well. It works by first identifying dense data regions based on mutual nearest neighborhoods. These dense points act as seeds that grow into full clusters. Weak points either join existing clusters or are removed as noise. Compared to other popular clustering algorithms, DenMune performs better on synthetic and real-world datasets with complex cluster structures. It automatically detects the number of clusters, handles noise, and is stable across parameter changes. The algorithm is simple to implement and fast to run. By improving clustering of complex datasets, this work can enable more accurate data mining in fields like biomedicine, fraud detection, and search engines.
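The steps described above (find dense points through mutual nearest neighbors, grow clusters from them, then attach or discard the remaining weak points) can be illustrated with a minimal sketch. This is not the published DenMune implementation or the pyMune API. The function names `k_nearest`, `mutual_neighbors`, and `cluster_sketch`, the `min_mutual` threshold, and the majority-vote rule for weak points are all assumptions made for illustration.

```python
from math import dist


def k_nearest(points, k):
    """For each point, return the index set of its k nearest neighbors (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        result.append(set(others[:k]))
    return result


def mutual_neighbors(knn):
    """Keep only reciprocal links: j stays in i's set if i is also in j's k-NN set."""
    return [{j for j in knn[i] if i in knn[j]} for i in range(len(knn))]


def cluster_sketch(points, k, min_mutual):
    """Illustrative only. `min_mutual` is an assumed density threshold."""
    mnn = mutual_neighbors(k_nearest(points, k))
    # Assumption: a point counts as dense if it has at least `min_mutual` mutual neighbors.
    dense = {i for i, m in enumerate(mnn) if len(m) >= min_mutual}
    labels = [-1] * len(points)  # -1 marks noise / unassigned

    # Grow one cluster per connected group of dense points linked by mutual neighbors.
    next_label = 0
    for seed in sorted(dense):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in mnn[i]:
                if j in dense and labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1

    # Weak points join the cluster most common among their dense mutual neighbors;
    # weak points with no such neighbor remain labeled -1 (noise).
    for i in range(len(points)):
        if i in dense:
            continue
        votes = [labels[j] for j in mnn[i] if j in dense]
        if votes:
            labels[i] = max(set(votes), key=votes.count)
    return labels


# Two tight groups, one weak point near the first group, and one far outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 0),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 30)]
print(cluster_sketch(points, k=3, min_mutual=2))
# prints [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the number of clusters is not given in advance: it falls out of how many connected groups of dense points exist. The point at (2.2, 0) has only one mutual neighbor, so it is treated as weak and attached to the first cluster, while the outlier at (5, 30) has no mutual neighbors and is left as noise.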
Why is it important?
- This clustering algorithm is novel in its use of mutual nearest neighbors to identify dense cluster seeds robustly, with only a single parameter to set. This approach sets it apart from other density-based methods.
- The ability to accurately cluster complex datasets with arbitrary shapes, varying densities, and unbalanced classes addresses an ongoing challenge in the field. Many existing methods still struggle with these issues.
- Clustering continues to enable progress in critical applications like medical imaging, cybersecurity, and search engines. Improvements to clustering, especially for complex data, can directly impact these domains.
- This research comes at a time when larger, messier datasets are becoming more common. DenMune's robustness to noise and varying densities makes it well suited to modern big data.
- By requiring only one parameter and automatically detecting the number of clusters, DenMune simplifies the clustering process compared to algorithms that need extensive parameter tuning. This makes clustering more accessible.
- The algorithm consistently performed well on synthetic test cases and real-world datasets. This demonstrates its potential for broad applicability across diverse data mining tasks.
- DenMune combines conceptual simplicity, logical soundness, and computational efficiency, which makes it an intuitive yet powerful approach to clustering.
Read the Original
This page is a summary of: DenMune: Density peak based clustering using mutual nearest neighbors, Pattern Recognition, January 2021, Elsevier, DOI: 10.1016/j.patcog.2020.107589.
Resources
pyMune: A Python package for complex clusters detection
An open-source Python implementation of the DenMune clustering algorithm is presented. The software is distributed as a PyPI package to simplify installation and integration. A number of notebooks showing how to use and interact with the software are provided. The datasets used in the published article are also provided so that the reported results can be reproduced. To further support reproducibility, test drives are available on both Codeocean.com and mybinder.org.
DenMune Clustering Algorithm Video
DenMune Clustering Algorithm Video Illustration on YouTube
Open-source project
Open-source project on GitHub
Open-access version on SSRN
This is the open-access version on SSRN
Data sets
Data on Mendeley