A Hybrid of t-Distributed Stochastic Neighbor Embedding and Markov Cluster in Cluster Analysis
Keywords:
Cluster analysis, t-Distributed Stochastic Neighbor Embedding, Markov cluster
Abstract
This research investigates the performance of a hybrid method that combines t-Distributed Stochastic Neighbor Embedding (t-SNE), used to reduce the dimensionality of the data, with Markov Clustering (MCL), used to perform cluster analysis. The method was evaluated on the Iris dataset. The hybrid t-SNE and MCL method produced well-defined clusters with good separation between them, as indicated by a Silhouette score of 0.682. The Calinski-Harabasz (CH) index and the Davies-Bouldin (D-B) index were 330.226 and 0.46056, respectively; a high CH value together with a low D-B value indicates compact clusters with low similarity between clusters. Reducing the Iris dataset to two dimensions with t-SNE also captured the relationships between data points effectively. Overall, these results suggest that the hybrid t-SNE and MCL method is a promising approach for cluster analysis.
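The abstract does not give implementation details, so the following is only a minimal sketch of the MCL stage, assuming that MCL is run on a k-nearest-neighbour graph with Gaussian edge weights built from the 2-D embedding. That graph construction, the helper names `knn_affinity`, `markov_cluster`, `clusters_from_flow` and `demo`, and all parameter values (`k`, `sigma`, expansion 2, inflation 2.0) are hypothetical illustrations, not the authors' code. Three synthetic 2-D blobs stand in for the t-SNE output so that the example needs only NumPy.

```python
import numpy as np


def knn_affinity(points, k=10, sigma=1.0):
    """Gaussian edge weights on a symmetric k-nearest-neighbour graph.

    Assumption: the abstract does not say how the graph given to MCL was built.
    """
    n = len(points)
    # Pairwise squared Euclidean distances between embedded points.
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    # Each row keeps itself plus its k nearest neighbours.
    nearest = np.argsort(d2, axis=1)[:, : k + 1]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k + 1), nearest.ravel()] = True
    mask = mask | mask.T          # make the graph undirected
    np.fill_diagonal(mask, True)  # self-loops, as is customary for MCL
    return np.where(mask, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    """Basic MCL: alternate expansion and inflation on a column-stochastic matrix."""
    m = adjacency / adjacency.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        previous = m
        # Expansion: flow spreads along random walks of length `expansion`.
        m = np.linalg.matrix_power(m, expansion)
        # Inflation: raise entries to a power, strengthening strong flows.
        m = m ** inflation
        # Re-normalise so every column is again a probability distribution.
        m = m / m.sum(axis=0, keepdims=True)
        if np.max(np.abs(m - previous)) < tol:
            break
    return m


def clusters_from_flow(m, eps=1e-6):
    """Label nodes by connected components of the non-negligible flow graph."""
    n = m.shape[0]
    linked = (m > eps) | (m > eps).T
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in np.flatnonzero(linked[node]):
                if labels[neighbour] < 0:
                    labels[neighbour] = current
                    stack.append(neighbour)
        current += 1
    return labels


def demo():
    """Three separated 2-D blobs stand in for a 2-D t-SNE embedding."""
    rng = np.random.default_rng(0)
    centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    points = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in centres])
    flow = markov_cluster(knn_affinity(points, k=19, sigma=2.0))
    return points, flow, clusters_from_flow(flow)


if __name__ == "__main__":
    _, _, labels = demo()
    print("number of clusters:", len(np.unique(labels)))
```

In the setting the abstract describes, `points` would instead be the two-dimensional embedding of the Iris features, for example from scikit-learn's `sklearn.manifold.TSNE(n_components=2)`, and the resulting labels could be scored with `silhouette_score`, `calinski_harabasz_score` and `davies_bouldin_score` from `sklearn.metrics`. The inflation parameter controls granularity: larger values generally yield more, smaller clusters.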
License
Copyright (c) 2023 Bibit Waluyo Aji, Bambang Irawanto

This work is licensed under a Creative Commons Attribution 4.0 International License.