What is it about?

To compare the performance of a clustering algorithm on two big data processing architectures, this paper first presents implementations of the k-means clustering algorithm on Hadoop MapReduce and on Spark. It then analyzes, from a mathematical point of view, how the theoretical performance of k-means differs between the two architectures. The theoretical analysis shows that Spark is superior to Hadoop in terms of average execution time and I/O time. Finally, a text data set of user behavior from a social networking site is used to run experiments. The results show that, for the k-means algorithm, Spark requires significantly less execution time and I/O time than MapReduce. The theoretical analysis and the implementation techniques presented in this paper are a useful reference for applying big data technology.
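As a rough illustration only, and not the paper's implementation, one k-means iteration can be written as a map step (assign each point to its nearest centroid) followed by a reduce step (average the points in each cluster). On Hadoop MapReduce each iteration typically runs as a separate job that writes its output to disk and reads it back, whereas Spark can keep the data set cached in memory across iterations, which is the usual explanation for the I/O difference. The function names `assign`, `recompute` and `kmeans` below are hypothetical, and the sketch runs on a single machine in plain Python.

```python
from collections import defaultdict
import math


def assign(point, centroids):
    """Map step: return (index of the nearest centroid, point)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances)), point


def recompute(pairs, centroids):
    """Reduce step: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    # A centroid that received no points is left unchanged.
    new_centroids = list(centroids)
    for idx, pts in groups.items():
        new_centroids[idx] = tuple(sum(dim) / len(pts) for dim in zip(*pts))
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds (no convergence check)."""
    for _ in range(iterations):
        pairs = [assign(p, centroids) for p in points]
        centroids = recompute(pairs, centroids)
    return centroids
```

For example, `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` groups the first two points and the last two points into separate clusters. The point list is re-scanned on every iteration, which is why keeping it in memory matters for iterative algorithms of this kind.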

Why is it important?

Big data is a current research hotspot.

Perspectives

The theoretical analysis and the implementation technology of the big data algorithm proposed in this paper are a good reference for the application of big data technology.

Prof. Weiwei Lin
South China University of Technology

The theoretical analysis and the implementation technology of the big data algorithm proposed in this paper are a good reference for the application of big data technology.

Beibei Li

Read the Original

This page is a summary of: Performance analysis of clustering algorithm under two kinds of big data architecture, Journal of High Speed Networks, January 2017, IOS Press,
DOI: 10.3233/jhs-170556.
