What is it about?

Classifying the citation trajectories of scientific publications is crucial. However, these trajectories diffuse anomalously because of non-linear, non-stationary, and long-range correlations. Previous studies rely on hard thresholds, arbitrary parameters, and subjective rules to classify trajectories by their rise and fall patterns, which leads to substantial variance and, in turn, ambiguous classification. This paper proposes CiteDEK, a hybrid EMD-kNN-DTW classification framework. It predicts the nature of 5,039 trajectories, each 30 years long, using only the raw time series. The model achieves a classification accuracy of 76% and a Cohen's kappa statistic of 0.63, which is significant.
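To make the kNN-DTW part of the model more concrete, here is a minimal sketch of one-nearest-neighbour classification under dynamic time warping (DTW). It is an illustration under stated assumptions, not the paper's implementation: the EMD step is left out, the function names `dtw_distance` and `classify_1nn` are hypothetical, the class labels are made up for the example, and the toy trajectories are invented rather than taken from the 5,039 trajectories studied.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance with an absolute-difference local cost.

    Runs in O(len(a) * len(b)) time, which is why 1-NN with DTW becomes
    expensive as the number and length of series grow.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] matched to an already-used b element
                cost[i][j - 1],      # b[j-1] matched to an already-used a element
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def classify_1nn(query, train):
    """Return the label of the training series closest to `query` under DTW."""
    best_label, best_dist = None, math.inf
    for series, label in train:
        dist = dtw_distance(query, series)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical yearly citation counts and labels, invented for illustration.
train = [
    ([9, 7, 4, 2, 1, 0, 0, 0], "early-peak"),
    ([0, 0, 0, 1, 2, 4, 7, 9], "late-peak"),
]
query = [8, 8, 5, 2, 0, 0, 0, 0]
predicted = classify_1nn(query, train)
```

Because the classifier compares whole raw series, it needs no hand-set thresholds on peak year or decay rate. The trade-off is that every query is compared against every labelled trajectory.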

Why is it important?

Classifying citation time series has several diverse applications. It can help evaluate articles and researchers within or across disciplines, track the evolution of a field, retract articles based on their obsolescence, and recommend breakthrough discoveries early.

Perspectives

There is a broad debate in the literature about the number of distinct trajectories. Reported counts range from 2 to 6 classes, and different classification methods capture different groups. Some identified classes are sub-classes of others, so studying them separately adds little. A few works focus only on specialised trajectories: Hot Papers (HP), which receive citations immediately after publication without lasting impact, and Sleeping Beauties (SB), which receive citations only long after publication. Besides, trajectories with similar trends are named and studied under different classes. For example, HPs are studied separately as early rise-rapid decline, flashes-in-the-pan, sprinters, transient-knowledge-claim, MonDec class, etc. Similarly, SBs are studied independently as delayed documents, peaklate class, etc.

Initial findings suggest that the CiteDEK model classifies raw citation trajectories of scientific publications with 76% accuracy, which is significant. To the best of our knowledge, it is the first attempt to classify trajectories without explicitly defining or extracting subjective thresholds or parameters. However, one drawback of the 1-kNN-DTW algorithm is that it is computationally expensive and therefore scales poorly to large data sets. In the future, we aim to test the model on larger data sets and improve accuracy using deep learning-based techniques.

Joyita Chakraborty

Read the Original

This page is a summary of: CiteDEK: A hybrid EMD-KNN-DTW model for classification of paper citation trajectories, January 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3632410.3632481.
