A Hybrid of t-Distributed Stochastic Neighbor Embedding and Markov Cluster in Cluster Analysis

Authors

  • Bibit Waluyo Aji, Department of Mathematics, Faculty of Science and Mathematics, Diponegoro University
  • Bambang Irawanto, Department of Mathematics, Faculty of Science and Mathematics, Diponegoro University

Keywords:

Cluster analysis, t-Distributed Stochastic Neighbor Embedding, Markov cluster

Abstract

This research investigates the performance of a hybrid of t-Distributed Stochastic Neighbor Embedding (t-SNE) and Markov Clustering (MCL) for dimensionality reduction followed by cluster analysis. The method was evaluated on the iris dataset: t-SNE first reduced the data to two dimensions, and MCL then clustered the embedded points. The hybrid method produced well-defined clusters with good separation between them, as indicated by a Silhouette score of 0.682. The Calinski-Harabasz (CH) index and the Davies-Bouldin (DB) index were 330.226 and 0.46056, respectively, indicating compact clusters with low similarity between clusters. The two-dimensional t-SNE embedding of the iris dataset was also effective in capturing the relationships between data points. Overall, the results suggest that the hybrid t-SNE and MCL method is a promising approach for cluster analysis.
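The abstract does not give implementation details, so the following is only a minimal sketch of the MCL step under stated assumptions. It takes a set of two-dimensional points (in the paper's setting these would be the t-SNE embedding, which could be obtained for example with scikit-learn's `TSNE`), builds a similarity graph, and alternates the expansion and inflation steps of MCL until the matrix stops changing. The Gaussian affinity, the parameter values, the toy points, and the function names `gaussian_affinity`, `markov_cluster`, and `labels_from_limit` are all illustrative assumptions, not the authors' code.

```python
import numpy as np


def gaussian_affinity(points, sigma=1.0):
    """Dense similarity graph w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The Gaussian kernel is an assumption; the abstract does not say how
    the graph fed to MCL was built.
    """
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # self-loops are added inside MCL
    return weights


def labels_from_limit(matrix, eps=1e-6):
    """Read clusters off the MCL limit matrix as connected components
    of its non-negligible entries (overlapping clusters get merged)."""
    n = matrix.shape[0]
    support = (matrix > eps) | (matrix.T > eps)
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in np.flatnonzero(support[node]):
                if labels[neighbour] == -1:
                    labels[neighbour] = current
                    stack.append(neighbour)
        current += 1
    return labels


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    """Basic MCL: add self-loops, column-normalise, then iterate
    expansion (matrix power) and inflation (entrywise power)."""
    n = adjacency.shape[0]
    m = adjacency.astype(float) + np.eye(n)
    m /= m.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: flow spreads
        m = m ** inflation                        # inflation: strong flow wins
        m /= m.sum(axis=0, keepdims=True)         # keep columns stochastic
        if np.abs(m - previous).max() < tol:
            break
    return labels_from_limit(m)


# Toy stand-in for a 2-D embedding: two well-separated groups of points.
points = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
)
labels = markov_cluster(gaussian_affinity(points))
print(labels)
```

Unlike k-means, MCL does not take the number of clusters as an input; the granularity is controlled indirectly, mainly through the inflation parameter, with larger values tending to give more and smaller clusters. The resulting labels could then be scored with internal indices such as the Silhouette, CH, and DB measures reported in the abstract.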

Published

2023-04-26

How to Cite

Bibit Waluyo Aji, & Bambang Irawanto. (2023). A Hybrid of t-Distributed Stochastic Neighbor Embedding and Markov Cluster in Cluster Analysis. Proceeding International Conference on Religion, Science and Education, 2, 749–753. Retrieved from http://sunankalijaga.org/prosiding/index.php/icrse/article/view/991