What is it about?
This paper compares the performance of the k-means clustering algorithm on two big data processing architectures, Hadoop MapReduce and Spark. It first presents implementations of k-means on both architectures, then analyzes from a mathematical point of view how the theoretical performance of the algorithm differs between them. The theoretical analysis shows that Spark is superior to Hadoop in average execution time and I/O time. Finally, experiments are run on a text data set of users’ behaviors from a social networking site. The results show that, for k-means, Spark's execution time and I/O time are significantly lower than those of MapReduce. The theoretical analysis and the implementation techniques presented in the paper are a useful reference for applying big data technology.
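As a rough illustration of why k-means fits both architectures, one iteration can be written as a map step (assign each point to its nearest centroid) followed by a reduce step (average the points assigned to each centroid). The sketch below is plain single-machine Python, not the paper's Hadoop or Spark code; the function names `assign`, `recompute`, and `kmeans`, and the parameters `max_iter` and `tol`, are hypothetical and chosen only for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def recompute(pairs, centroids):
    """Reduce step: sum points and counts per centroid, then average."""
    sums = {}
    for idx, (point, count) in pairs:
        total, n = sums.get(idx, ((0.0,) * len(point), 0))
        sums[idx] = (tuple(a + b for a, b in zip(total, point)), n + count)
    # A centroid with no assigned points keeps its previous position.
    return [
        tuple(x / sums[i][1] for x in sums[i][0]) if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map + reduce until the centroids stop moving (assumed stopping rule)."""
    for _ in range(max_iter):
        pairs = [assign(p, centroids) for p in points]   # map over all points
        updated = recompute(pairs, centroids)            # reduce by centroid index
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
result = kmeans(points, [(0, 0), (10, 10)])
```

Because the same data set is scanned on every iteration, an engine that can keep it in memory between iterations, as Spark can, avoids rereading it from and writing intermediate results to the distributed file system on each pass, which is consistent with the I/O time difference the paper reports.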
Why is it important?
Big data is a current research hotspot.
Read the Original
This page is a summary of: Performance analysis of clustering algorithm under two kinds of big data architecture, Journal of High Speed Networks, January 2017, IOS Press,
DOI: 10.3233/jhs-170556.