The Pairwise Similarity Partitioning algorithm: a method for unsupervised partitioning of geoscientific and other datasets using arbitrary similarity metrics

Grant W. Petty

doi:10.1175/aies-d-22-0005.1

What is it about?

This method groups data objects according to their similarity. How similarity is defined is up to the user, as long as it can be expressed as a number between 0 (completely dissimilar) and 1 (identical). Among other things, the method makes it easy to identify objects that are dissimilar from all other members of the dataset, which is useful for finding rare or anomalous cases.
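
To make the idea of a user-defined similarity concrete, here is a minimal Python sketch. It is not the published algorithm: the function names, the toy `closeness` metric, and the 0.5 threshold are all assumptions chosen for illustration. It only shows how a similarity score in the range 0 to 1 can be tabulated for every pair of objects, and how an object that is dissimilar from every other member might then be flagged.

```python
from itertools import combinations


def similarity_matrix(objects, similarity):
    """Tabulate a user-supplied similarity (0..1) for every pair of objects."""
    n = len(objects)
    # Every object is identical to itself, so the diagonal is 1.
    S = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = similarity(objects[i], objects[j])
        if not 0.0 <= s <= 1.0:
            raise ValueError("similarity must lie between 0 and 1")
        S[i][j] = S[j][i] = s
    return S


def isolated_objects(S, threshold):
    """Return indices of objects whose similarity to every other object is below threshold."""
    n = len(S)
    return [
        i for i in range(n)
        if all(S[i][j] < threshold for j in range(n) if j != i)
    ]


def closeness(a, b):
    """Hypothetical metric: 1 for equal numbers, approaching 0 as they move apart."""
    return 1.0 / (1.0 + abs(a - b))


values = [1.0, 1.1, 0.9, 5.0, 5.2, 20.0]
S = similarity_matrix(values, closeness)
outliers = isolated_objects(S, threshold=0.5)
print(outliers)  # index 5 (the value 20.0) has no close neighbour
```

Any other function could stand in for `closeness`, provided it returns a number between 0 and 1 for each pair.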

Why is it important?

Many standard clustering and partitioning algorithms are in wide use. This one combines features not found together in other algorithms, including the ability to set a hard minimum on the mutual similarity of members assigned to a group and the ability to define similarity using completely arbitrary criteria.
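
As a rough illustration of what a hard minimum on mutual similarity means, the Python sketch below assigns each object to the first group in which it is at least `min_similarity` similar to every existing member. This greedy, order-dependent procedure is an assumption made for illustration and should not be read as the published algorithm; `greedy_partition`, the Jaccard metric, and the sample data are likewise hypothetical.

```python
def jaccard(a, b):
    """Similarity of two sets: shared items divided by all items (0..1)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_partition(S, min_similarity):
    """Group indices so that every pair within a group meets min_similarity.

    Illustrative greedy heuristic; the result depends on input order.
    """
    groups = []
    for i in range(len(S)):
        for group in groups:
            # Join only if similar enough to *every* current member.
            if all(S[i][j] >= min_similarity for j in group):
                group.append(i)
                break
        else:
            # No existing group satisfies the hard minimum: start a new one.
            groups.append([i])
    return groups


# Objects need not be numeric; here each one is a set of descriptive tags.
samples = [
    {"rain", "cold", "wind"},
    {"rain", "cold"},
    {"sun", "hot"},
    {"sun", "hot", "dry"},
    {"fog"},
]
S = [[jaccard(a, b) for b in samples] for a in samples]
groups = greedy_partition(S, min_similarity=0.5)
print(groups)  # [[0, 1], [2, 3], [4]]
```

In this toy example the last sample shares no tags with any other, so it ends up alone in its own group, which is how a rare or anomalous case would surface.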

Perspectives

I devised this method because I had a specific task that seemed impossible to accomplish with the existing methods I knew about. Because the method is so simple, I assumed that I was re-inventing something that already existed. After a lengthier search and consultation with clustering experts, however, I concluded that it was probably new, which led to my decision to publish it.
Grant Petty
University of Wisconsin-Madison

This page is a summary of: The Pairwise Similarity Partitioning algorithm: a method for unsupervised partitioning of geoscientific and other datasets using arbitrary similarity metrics, Artificial Intelligence for the Earth Systems, August 2022, American Meteorological Society, DOI: 10.1175/aies-d-22-0005.1.

Contributors

The following have contributed to this page

Grant Petty
University of Wisconsin-Madison

A flexible method for unsupervised clustering of data sets based on user-defined similarity.
