Improving Performance of Spatial Database Based on Hybrid Machine Learning and Hilbert Curve Data Structure

Authors

  • Galawizh Muhammad Najeeb 1 Information Technology Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Kurdistan Region, Iraq. 2 Mechanical Engineering Department, Technical College of Engineering, Sulaimani Polytechnic University, Sulaimani, Kurdistan Region, Iraq
  • Nzar A. Ali Department of Statistics and Informatics, College of Administration and Economic, University of Sulaimani, Sulaimani, Kurdistan Region, Iraq

DOI:

https://doi.org/10.32410/huj-10505

Keywords:

Spatial index, Hilbert curve, Machine learning, Traditional, Performance

Abstract

This work introduces and analyzes a novel approach to multi-dimensional indexing. It is based on the concepts of the hybrid learnt spatial indexing by using Hilbert space filling curve algorithm with machine learning. Using Hilbert algorithm to obtain indexing for each spatial objects (point, line, polygon), then executing nearest neighbor queries in traditional technique. Taking benefits of machine learning method to learn indices of spatial objects, in learnt method  we also used Hilbert curve to indexing spatial objects as in traditional method, and learning those indices, then implement nearest neighbor query as in traditional, finally calculate execution time. An important result that goes beyond proposed hybrid learning indexing algorithm (HLI)  that is the performance improvement over the Hilbert curve is great in learnt method by making comparison between traditional and learned methods which is done through calculating execution time of each techniques of query processing for all three spatial objects types. We tested both indexing methods to compare and evaluate both techniques, our proposed HLI, has significant results in term of less query execution time which is due to enhance performance of spatial database. Proposed indexing evaluated through receiver operating characteristic curve (ROC- curve) for system optimality model, also MSE and R2 statistical measures.

References

Alpaydin, E. (1997). Voting over multiple condensed nearest neighbors. Lazy learning, 115-132. DOI: https://doi.org/10.1007/978-94-017-2053-3_4

Bailey, T. (1978). A note on distance-weighted k-nearest neighbor rules. Trans. on Systems, Man, Cybernetics, 8, 311-313. DOI: https://doi.org/10.1109/TSMC.1978.4309958

Beckmann, N., Kriegel, H.-P., Schneider, R., & Seeger, B. (1990). The R*-tree: An efficient and robust access method for points and rectangles. Proceedings of the 1990 ACM SIGMOD international conference on Management of data, DOI: https://doi.org/10.1145/93597.98741

Chen, H.-L., & Chang, Y.-I. (2011). All-nearest-neighbors finding based on the Hilbert curve. Expert Systems with Applications, 38(6), 7462-7475. DOI: https://doi.org/10.1016/j.eswa.2010.12.077

Chen, Y., & Patel, J. M. (2006). Efficient evaluation of all-nearest-neighbor queries. 2007 IEEE 23rd International Conference on Data Engineering, DOI: https://doi.org/10.1109/ICDE.2007.368964

Chiu, C.-Y., Prayoonwong, A., & Liao, Y.-C. (2019). Learning to index for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 42(8), 1942-1956. DOI: https://doi.org/10.1109/TPAMI.2019.2907086

Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1), 21-27. DOI: https://doi.org/10.1109/TIT.1967.1053964

Davitkova, A., Milchevski, E., & Michel, S. (2020). The ML-Index: A Multidimensional, Learned Index for Point, Range, and Nearest-Neighbor Queries. EDBT,

Dhanabal, S., & Chandramathi, S. (2011). A review of various k-nearest neighbor query processing techniques. International Journal of Computer Applications, 31(7), 14-22.

Emrich, T., Kriegel, H.-P., Kröger, P., Renz, M., & Züfle, A. (2009). Constrained reverse nearest neighbor search on mobile objects. Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, DOI: https://doi.org/10.1145/1653771.1653801

Freeston, M. (1995). A general solution of the n-dimensional B-tree problem. Proceedings of the 1995 ACM SIGMOD international conference on Management of data, DOI: https://doi.org/10.1145/223784.223796

Gates, G. (1972). The reduced nearest neighbor rule (corresp.). IEEE transactions on information theory, 18(3), 431-433. DOI: https://doi.org/10.1109/TIT.1972.1054809

Güting, R. H. (1994). An introduction to spatial database systems. the VLDB Journal, 3(4), 357-399. DOI: https://doi.org/10.1007/BF01231602

Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. Proceedings of the 1984 ACM SIGMOD international conference on Management of data, DOI: https://doi.org/10.1145/602259.602266

Jagadish, H. V., Ooi, B. C., Tan, K.-L., Yu, C., & Zhang, R. (2005). iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS), 30(2), 364-397. DOI: https://doi.org/10.1145/1071610.1071612

Jia, L., Liang, B., Li, M., Liu, Y., Chen, Y., & Ding, J. (2022). Efficient 3D Hilbert Curve Encoding and Decoding Algorithms. Chinese Journal of Electronics, 31(2), 277-284. DOI: https://doi.org/10.1049/cje.2020.00.171

Kang, J. M., Mokbel, M. F., Shekhar, S., Xia, T., & Zhang, D. (2006). Continuous evaluation of monochromatic and bichromatic reverse nearest neighbors. 2007 IEEE 23rd International Conference on Data Engineering, DOI: https://doi.org/10.1109/ICDE.2007.367926

Korn, F., & Muthukrishnan, S. (2000). Influence sets based on reverse nearest neighbor queries. ACM Sigmod Record, 29(2), 201-212. DOI: https://doi.org/10.1145/335191.335415

Kraska, T., Alizadeh, M., Beutel, A., Chi, E. H., Ding, J., Kristo, A., Leclerc, G., Madden, S., Mao, H., & Nathan, V. (2021). Sagedb: A learned database system.

Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2018). The case for learned index structures. Proceedings of the 2018 international conference on management of data, DOI: https://doi.org/10.1145/3183713.3196909

Krishnan, C. G., Rengarajan, A., & Manikandan, R. (2015). Delay reduction by providing location based services using hybrid cache in peer to peer networks. KSII Transactions on Internet and Information Systems (TIIS), 9(6), 2078-2094. DOI: https://doi.org/10.3837/tiis.2015.06.006

Lawder, J. K., & King, P. J. H. (2001). Querying multi-dimensional data indexed using the Hilbert space-filling curve. ACM Sigmod Record, 30(1), 19-24. DOI: https://doi.org/10.1145/373626.373678

Li, C., & Feng, Y. (2005). Algorithm for analyzing n-dimensional Hilbert curve. International Conference on Web-Age Information Management, DOI: https://doi.org/10.1007/11563952_60

Li, S. Z., Chan, K. L., & Wang, C. (2000). Performance evaluation of the nearest feature line method in image classification and retrieval. IEEE transactions on pattern analysis and machine intelligence, 22(11), 1335-1339. DOI: https://doi.org/10.1109/34.888719

Liao, S., López, M. A., & Leutenegger, S. T. (2001). High dimensional similarity search with space filling curves. Proceedings 17th international conference on data engineering,

Liu, X., & Schrack, G. (1996). Encoding and decoding the Hilbert order. Software: Practice and Experience, 26(12), 1335-1346. DOI: https://doi.org/10.1002/(SICI)1097-024X(199612)26:12<1335::AID-SPE60>3.0.CO;2-A

Lomet, D. B., & Salzberg, B. (1990). The hB-tree: A multiattribute indexing method with good guaranteed performance. ACM Transactions on Database Systems (TODS), 15(4), 625-658. DOI: https://doi.org/10.1145/99935.99949

Mamoulis, N. (2022). Spatial data management. Springer Nature.

Markl, V. (2000). Mistral: Processing relational queries using a multidimensional access technique. In Ausgezeichnete Informatikdissertationen 1999 (pp. 158-168). Springer. DOI: https://doi.org/10.1007/978-3-322-84823-9_15

McNames, J. (2001). A fast nearest-neighbor algorithm based on a principal axis search tree. IEEE transactions on pattern analysis and machine intelligence, 23(9), 964-976. DOI: https://doi.org/10.1109/34.955110

Moore, A., & Gray, A. (2003). New algorithms for efficient high dimensional non-parametric classification. Advances in Neural Information Processing Systems, 16.

Nievergelt, J., Hinterberger, H., & Sevcik, K. C. (1984). The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems (TODS), 9(1), 38-71. DOI: https://doi.org/10.1145/348.318586

Omohundro, S. M. (1989). Five balltree construction algorithms. International Computer Science Institute Berkeley.

Ooi, B. C., Sacks-Davis, R., & Han, J. (1993). Indexing in spatial databases. Unpublished/Technical Papers.

Qin, D., Wang, H., Liu, Z., Xu, H., Zhou, S., & Bu, J. (2022). Hilbert Distillation for Cross-Dimensionality Networks. arXiv preprint arXiv:2211.04031.

Samet, H. (1990). The design and analysis of spatial data structures (Vol. 85). Addison-wesley Reading, MA.

Schmidt, C., & Parashar, M. (2004). Analyzing the search characteristics of space filling curve-based indexing within the squid P2P data discovery system. Rutgers University, Technique Report.

Sellis, T., Roussopoulos, N., & Faloutsos, C. (1987). The R+-Tree: A Dynamic Index for Multi-Dimensional Objects.

Shah, M. I., Javed, M. F., & Abunama, T. (2021). Proposed formulation of surface water quality and modelling using gene expression, machine learning, and regression techniques. Environmental Science and Pollution Research, 28(11), 13202-13220. DOI: https://doi.org/10.1007/s11356-020-11490-9

Singh, A., Ferhatosmanoglu, H., & Tosun, A. Ş. (2003). High dimensional reverse nearest neighbor queries. Proceedings of the twelfth international conference on Information and knowledge management, DOI: https://doi.org/10.1145/956863.956882

Son, S.-o., Park, J., Oh, C., & Yeom, C. (2021). An algorithm for detecting collision risk between trucks and pedestrians in the connected environment. Journal of advanced transportation, 2021, 1-9. DOI: https://doi.org/10.1155/2021/9907698

Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica, 6, 579-589. DOI: https://doi.org/10.1007/BF01759061

Stanoi, I., Agrawal, D., & El Abbadi, A. (2000). Reverse nearest neighbor queries for dynamic databases. ACM SIGMOD workshop on research issues in data mining and knowledge discovery,

Tao, Y., Papadias, D., & Lian, X. (2004). Reverse knn search in arbitrary dimensionality. Proceedings of the Very Large Data Bases Conference (VLDB), Toronto, DOI: https://doi.org/10.1016/B978-012088469-8.50066-8

Wang, H., Fu, X., Xu, J., & Lu, H. (2019). Learned index for spatial queries. 2019 20th IEEE International Conference on Mobile Data Management (MDM), DOI: https://doi.org/10.1109/MDM.2019.00121

Wang, X., Sun, Y., Sun, Q., Lin, W., Wang, J. Z., & Li, W. (2022). HCIndex: a Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems. Cluster Computing, 1-15. DOI: https://doi.org/10.1007/s10586-022-03723-y

Wu, Y., Cao, X., & An, Z. (2020). A spatiotemporal trajectory data index based on the Hilbert curve code. IOP Conference Series: Earth and Environmental Science, DOI: https://doi.org/10.1088/1755-1315/502/1/012005

Yang, C., & Lin, K.-I. (2001). An index structure for efficient reverse nearest neighbor queries. Proceedings 17th International Conference on Data Engineering,

Yong, Z., Youwen, L., & Shixiong, X. (2009). An improved KNN text classification algorithm based on clustering. Journal of computers, 4(3), 230-237. DOI: https://doi.org/10.4304/jcp.4.3.230-237

Zeng, Y., Yang, Y., & Zhao, L. (2009). Pseudo nearest neighbor rule for pattern classification. Expert Systems with Applications, 36(2), 3587-3595. DOI: https://doi.org/10.1016/j.eswa.2008.02.003

Zheng, W., Zhao, L., & Zou, C. (2004). Locally nearest neighbor classifiers for pattern classification. Pattern recognition, 37(6), 1307-1309. DOI: https://doi.org/10.1016/j.patcog.2003.11.004

Zhou, Y., Zhang, C., & Wang, J. (2004). Tunable nearest neighbor classifier. Pattern Recognition: 26th DAGM Symposium, Tübingen, Germany, August 30-September 1, 2004. Proceedings 26,

Published

2023-12-30

How to Cite

Najeeb , G. M., & Ali, N. A. (2023). Improving Performance of Spatial Database Based on Hybrid Machine Learning and Hilbert Curve Data Structure: . Halabja University Journal, 8(4), 250-272. https://doi.org/10.32410/huj-10505

Similar Articles

1-10 of 40

You may also start an advanced similarity search for this article.