Improving Performance of Spatial Database Based on Hybrid Machine Learning and Hilbert Curve Data Structure


  • Galawizh Muhammad Najeeb 1 Information Technology Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Kurdistan Region, Iraq. 2 Mechanical Engineering Department, Technical College of Engineering, Sulaimani Polytechnic University, Sulaimani, Kurdistan Region, Iraq
  • Nzar A. Ali Department of Statistics and Informatics, College of Administration and Economic, University of Sulaimani, Sulaimani, Kurdistan Region, Iraq



Spatial index, Hilbert curve, Machine learning, Traditional, Performance


This work introduces and analyzes a novel approach to multi-dimensional indexing. It is based on the concepts of the hybrid learnt spatial indexing by using Hilbert space filling curve algorithm with machine learning. Using Hilbert algorithm to obtain indexing for each spatial objects (point, line, polygon), then executing nearest neighbor queries in traditional technique. Taking benefits of machine learning method to learn indices of spatial objects, in learnt method  we also used Hilbert curve to indexing spatial objects as in traditional method, and learning those indices, then implement nearest neighbor query as in traditional, finally calculate execution time. An important result that goes beyond proposed hybrid learning indexing algorithm (HLI)  that is the performance improvement over the Hilbert curve is great in learnt method by making comparison between traditional and learned methods which is done through calculating execution time of each techniques of query processing for all three spatial objects types. We tested both indexing methods to compare and evaluate both techniques, our proposed HLI, has significant results in term of less query execution time which is due to enhance performance of spatial database. Proposed indexing evaluated through receiver operating characteristic curve (ROC- curve) for system optimality model, also MSE and R2 statistical measures.


Alpaydin, E. (1997). Voting over multiple condensed nearest neighbors. Lazy learning, 115-132. DOI:

Bailey, T. (1978). A note on distance-weighted k-nearest neighbor rules. Trans. on Systems, Man, Cybernetics, 8, 311-313. DOI:

Beckmann, N., Kriegel, H.-P., Schneider, R., & Seeger, B. (1990). The R*-tree: An efficient and robust access method for points and rectangles. Proceedings of the 1990 ACM SIGMOD international conference on Management of data, DOI:

Chen, H.-L., & Chang, Y.-I. (2011). All-nearest-neighbors finding based on the Hilbert curve. Expert Systems with Applications, 38(6), 7462-7475. DOI:

Chen, Y., & Patel, J. M. (2006). Efficient evaluation of all-nearest-neighbor queries. 2007 IEEE 23rd International Conference on Data Engineering, DOI:

Chiu, C.-Y., Prayoonwong, A., & Liao, Y.-C. (2019). Learning to index for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 42(8), 1942-1956. DOI:

Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1), 21-27. DOI:

Davitkova, A., Milchevski, E., & Michel, S. (2020). The ML-Index: A Multidimensional, Learned Index for Point, Range, and Nearest-Neighbor Queries. EDBT,

Dhanabal, S., & Chandramathi, S. (2011). A review of various k-nearest neighbor query processing techniques. International Journal of Computer Applications, 31(7), 14-22.

Emrich, T., Kriegel, H.-P., Kröger, P., Renz, M., & Züfle, A. (2009). Constrained reverse nearest neighbor search on mobile objects. Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, DOI:

Freeston, M. (1995). A general solution of the n-dimensional B-tree problem. Proceedings of the 1995 ACM SIGMOD international conference on Management of data, DOI:

Gates, G. (1972). The reduced nearest neighbor rule (corresp.). IEEE transactions on information theory, 18(3), 431-433. DOI:

Güting, R. H. (1994). An introduction to spatial database systems. the VLDB Journal, 3(4), 357-399. DOI:

Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. Proceedings of the 1984 ACM SIGMOD international conference on Management of data, DOI:

Jagadish, H. V., Ooi, B. C., Tan, K.-L., Yu, C., & Zhang, R. (2005). iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS), 30(2), 364-397. DOI:

Jia, L., Liang, B., Li, M., Liu, Y., Chen, Y., & Ding, J. (2022). Efficient 3D Hilbert Curve Encoding and Decoding Algorithms. Chinese Journal of Electronics, 31(2), 277-284. DOI:

Kang, J. M., Mokbel, M. F., Shekhar, S., Xia, T., & Zhang, D. (2006). Continuous evaluation of monochromatic and bichromatic reverse nearest neighbors. 2007 IEEE 23rd International Conference on Data Engineering, DOI:

Korn, F., & Muthukrishnan, S. (2000). Influence sets based on reverse nearest neighbor queries. ACM Sigmod Record, 29(2), 201-212. DOI:

Kraska, T., Alizadeh, M., Beutel, A., Chi, E. H., Ding, J., Kristo, A., Leclerc, G., Madden, S., Mao, H., & Nathan, V. (2021). Sagedb: A learned database system.

Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2018). The case for learned index structures. Proceedings of the 2018 international conference on management of data, DOI:

Krishnan, C. G., Rengarajan, A., & Manikandan, R. (2015). Delay reduction by providing location based services using hybrid cache in peer to peer networks. KSII Transactions on Internet and Information Systems (TIIS), 9(6), 2078-2094. DOI:

Lawder, J. K., & King, P. J. H. (2001). Querying multi-dimensional data indexed using the Hilbert space-filling curve. ACM Sigmod Record, 30(1), 19-24. DOI:

Li, C., & Feng, Y. (2005). Algorithm for analyzing n-dimensional Hilbert curve. International Conference on Web-Age Information Management, DOI:

Li, S. Z., Chan, K. L., & Wang, C. (2000). Performance evaluation of the nearest feature line method in image classification and retrieval. IEEE transactions on pattern analysis and machine intelligence, 22(11), 1335-1339. DOI:

Liao, S., López, M. A., & Leutenegger, S. T. (2001). High dimensional similarity search with space filling curves. Proceedings 17th international conference on data engineering,

Liu, X., & Schrack, G. (1996). Encoding and decoding the Hilbert order. Software: Practice and Experience, 26(12), 1335-1346. DOI:<1335::AID-SPE60>3.0.CO;2-A

Lomet, D. B., & Salzberg, B. (1990). The hB-tree: A multiattribute indexing method with good guaranteed performance. ACM Transactions on Database Systems (TODS), 15(4), 625-658. DOI:

Mamoulis, N. (2022). Spatial data management. Springer Nature.

Markl, V. (2000). Mistral: Processing relational queries using a multidimensional access technique. In Ausgezeichnete Informatikdissertationen 1999 (pp. 158-168). Springer. DOI:

McNames, J. (2001). A fast nearest-neighbor algorithm based on a principal axis search tree. IEEE transactions on pattern analysis and machine intelligence, 23(9), 964-976. DOI:

Moore, A., & Gray, A. (2003). New algorithms for efficient high dimensional non-parametric classification. Advances in Neural Information Processing Systems, 16.

Nievergelt, J., Hinterberger, H., & Sevcik, K. C. (1984). The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems (TODS), 9(1), 38-71. DOI:

Omohundro, S. M. (1989). Five balltree construction algorithms. International Computer Science Institute Berkeley.

Ooi, B. C., Sacks-Davis, R., & Han, J. (1993). Indexing in spatial databases. Unpublished/Technical Papers.

Qin, D., Wang, H., Liu, Z., Xu, H., Zhou, S., & Bu, J. (2022). Hilbert Distillation for Cross-Dimensionality Networks. arXiv preprint arXiv:2211.04031.

Samet, H. (1990). The design and analysis of spatial data structures (Vol. 85). Addison-wesley Reading, MA.

Schmidt, C., & Parashar, M. (2004). Analyzing the search characteristics of space filling curve-based indexing within the squid P2P data discovery system. Rutgers University, Technique Report.

Sellis, T., Roussopoulos, N., & Faloutsos, C. (1987). The R+-Tree: A Dynamic Index for Multi-Dimensional Objects.

Shah, M. I., Javed, M. F., & Abunama, T. (2021). Proposed formulation of surface water quality and modelling using gene expression, machine learning, and regression techniques. Environmental Science and Pollution Research, 28(11), 13202-13220. DOI:

Singh, A., Ferhatosmanoglu, H., & Tosun, A. Ş. (2003). High dimensional reverse nearest neighbor queries. Proceedings of the twelfth international conference on Information and knowledge management, DOI:

Son, S.-o., Park, J., Oh, C., & Yeom, C. (2021). An algorithm for detecting collision risk between trucks and pedestrians in the connected environment. Journal of advanced transportation, 2021, 1-9. DOI:

Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica, 6, 579-589. DOI:

Stanoi, I., Agrawal, D., & El Abbadi, A. (2000). Reverse nearest neighbor queries for dynamic databases. ACM SIGMOD workshop on research issues in data mining and knowledge discovery,

Tao, Y., Papadias, D., & Lian, X. (2004). Reverse knn search in arbitrary dimensionality. Proceedings of the Very Large Data Bases Conference (VLDB), Toronto, DOI:

Wang, H., Fu, X., Xu, J., & Lu, H. (2019). Learned index for spatial queries. 2019 20th IEEE International Conference on Mobile Data Management (MDM), DOI:

Wang, X., Sun, Y., Sun, Q., Lin, W., Wang, J. Z., & Li, W. (2022). HCIndex: a Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems. Cluster Computing, 1-15. DOI:

Wu, Y., Cao, X., & An, Z. (2020). A spatiotemporal trajectory data index based on the Hilbert curve code. IOP Conference Series: Earth and Environmental Science, DOI:

Yang, C., & Lin, K.-I. (2001). An index structure for efficient reverse nearest neighbor queries. Proceedings 17th International Conference on Data Engineering,

Yong, Z., Youwen, L., & Shixiong, X. (2009). An improved KNN text classification algorithm based on clustering. Journal of computers, 4(3), 230-237. DOI:

Zeng, Y., Yang, Y., & Zhao, L. (2009). Pseudo nearest neighbor rule for pattern classification. Expert Systems with Applications, 36(2), 3587-3595. DOI:

Zheng, W., Zhao, L., & Zou, C. (2004). Locally nearest neighbor classifiers for pattern classification. Pattern recognition, 37(6), 1307-1309. DOI:

Zhou, Y., Zhang, C., & Wang, J. (2004). Tunable nearest neighbor classifier. Pattern Recognition: 26th DAGM Symposium, Tübingen, Germany, August 30-September 1, 2004. Proceedings 26,



How to Cite

Najeeb , G. M., & Ali, N. A. (2023). Improving Performance of Spatial Database Based on Hybrid Machine Learning and Hilbert Curve Data Structure: . Halabja University Journal, 8(4), 250-272.

Similar Articles

1-10 of 40

You may also start an advanced similarity search for this article.