International Journal of Computers and Communications
ISSN: 2074-1294
Journal DOI: 10.46300/91013
Journal URL: http://www.naun.org/cms.action?id=3050
Volume: 15 (2021), issue date 14 April 2021
Volume DOI: 10.46300/91013.2021.15
Volume URL: https://www.naun.org/cms.action?id=23319

Computational analysis of incremental clustering approaches for Large Data

Arun Pratap Singh Kushwah, Research Scholar, Dept. of Computer Science, BU, Bhopal, Madhya Pradesh, India
Shailesh Jaloree, Dept. of Appl. Maths & Computer Applications, SATI, Vidisha, Madhya Pradesh, India
Ramjeevan Singh Thakur, Dept. of Computer Applications, MANIT, Bhopal, Madhya Pradesh, India

Abstract

Clustering is a data mining approach that helps find the underlying hidden structure in a dataset. K-means is a clustering method that uses distance functions to measure the similarity or dissimilarity between instances. DBSCAN is a clustering algorithm that discovers clusters of arbitrary shape and size in large volumes of data using a spatial density method. Both are classical methods for efficient clustering, but they underperform when the data in a database is updated frequently, so incremental (gradual) clustering approaches are preferred in such environments. This paper introduces an incremental clustering approach based on K-means and DBSCAN to handle new data that is added to the database dynamically at intervals.

Published: 28 May 2021
Pages: 14-18
DOI: 10.46300/91013.2021.15.3
Full text: https://www.naun.org/main/UPress/cc/2021/a062012-003(2021).pdf