2025 №2(67)

Applied information technologies

Bredikhin S. V., Scherbakova N. G. Structural Properties of the Multiplex Network of Scientific Journal Authors

Starchenko A. V., Del I.V. Short-term Wind Speed Forecast Based on Artificial Neural Networks and the Method of Variational Mode Decomposition

Parallel system programming and computational technologies

Ivlev A.D., Liniov A. V. The Implementation of Quantum State Preparation Algorithms, Considering the Limitations of Modern Quantum Computers

Leonova Yu. F. Code Optimisation on the Example of an Algorithm for Solving the Traveling Salesman Problem

Tursunov Sh. A., Rashidov A. E. Analysis of Categorical Data Encoding Algorithms

Rules of Presentation and Preparation of Manuscripts Offered for Publication

S.V. Bredikhin, N. G. Scherbakova

Institute of Computational Mathematics and Mathematical Geophysics SB RAS,

630090, Novosibirsk, Russia

STRUCTURAL PROPERTIES OF THE MULTIPLEX NETWORK OF SCIENTIFIC JOURNAL AUTHORS

DOI: 10.24412/2073-0667-2025-2-5-18

EDN: YYQLGV

A multilayer network (a multiplex network being a particular case) is a network made up of multiple layers, each of which represents a certain binary relation between network actors. An example of such a system is a social network in which actors are interconnected by several types of social relations. Bibliographic information on a set of publications of scientific journals can serve as a source for constructing and studying multilayer networks of various types. Examples include multiplex networks in which different layers reflect the collaboration of authors in different sections of the same scientific field [1]; two-layer networks of authors who are co-authors in the first layer and cite each other in the second [2]; and three-layer co-authorship/citation/keyword networks, in which the third layer reflects the use of the same keywords by authors in their works [3].

This paper presents the results of analyzing the parameters of a weighted multiplex network built from real data extracted from the long-term archive of articles of the scientific journal “Sakharnyi Diabet”. The network consists of two layers: scientific co-authorship and citation. The nodes of the network are the authors of the journal's articles. The first layer is the co-authorship graph: two vertices are connected if the corresponding authors have joint publications, and the edge weight equals the number of such publications. The second layer is the directed citation graph: an edge from the citing author to the cited author is established if the reference list of an article co-authored by the citing author contains a reference to a journal article co-authored by the cited author. The details of the construction and the parameters of the layers are given in [8].
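The layer construction described above can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the procedure of [8]: the toy `articles` and `citations` records and the name `build_layers` are hypothetical, citation-layer weights are assumed to count citing/cited article pairs, and self-citation loops are assumed to be dropped.

```python
from collections import defaultdict
from itertools import combinations

def build_layers(articles, citations):
    """Return (coauthorship, citation) edge-weight dicts.

    articles:  dict article_id -> list of author names
    citations: list of (citing_article_id, cited_article_id) pairs
    """
    coauthorship = defaultdict(int)  # undirected: key is a sorted author pair
    for authors in articles.values():
        for a, b in combinations(sorted(set(authors)), 2):
            coauthorship[(a, b)] += 1  # weight = number of joint papers

    citation = defaultdict(int)  # directed: key is (citing author, cited author)
    for citing, cited in citations:
        if citing not in articles or cited not in articles:
            continue  # only references to articles of the same journal count
        for a in set(articles[citing]):
            for b in set(articles[cited]):
                if a != b:  # assumption: self-loops are dropped
                    citation[(a, b)] += 1
    return dict(coauthorship), dict(citation)

# Hypothetical toy archive; "x9" is an article outside the journal
articles = {"p1": ["A", "B"], "p2": ["A", "B", "C"], "p3": ["C", "D"]}
citations = [("p3", "p1"), ("p3", "x9")]
coauthorship, citation = build_layers(articles, citations)
```

Keeping the two layers as separate edge dictionaries over a shared node set is what makes the result a multiplex rather than a single merged graph.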

For multilayer networks, the structural properties of single-layer networks must be modified to take into account the multilayer nature of the network and to distinguish between types of links. In this paper we investigate a set of basic metrics that characterize the structural properties of a multiplex network and extend classical network metrics to the case of multiplexes [6, 7]. These include centrality measures, which allow nodes to be ranked, and node clustering coefficients, which reflect the tendency of nodes to form triangles. Several ways of extending the parameters are considered. One way is to transform the multilayer network into a single-layer network for which the corresponding parameters (for example, the degree K_i) are calculated. Aggregating the parameter values calculated separately for each layer (C_i^[α]) gives an idea of the features of the node's connections in the layers. Considering several layers simultaneously, as, for example, when determining the clustering coefficient C_i, makes it possible to identify the interconnectedness of the structures in which the node is involved.

The results of computing and analyzing these parameters are presented. This article continues the work [8].
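To make the distinction between these ways of extending a parameter concrete, the following sketch computes, for one node, the per-layer degrees, their sum (the overlapping degree of [6]), the degree K_i in the flattened single-layer network, and a simple mean of the per-layer clustering coefficients. Edge directions and weights are ignored, and averaging is only one possible aggregation; the genuinely multiplex clustering coefficients of [6], which count triangles spanning several layers, are not reproduced. Function names and the toy layers are hypothetical.

```python
from itertools import combinations

def neighbours(edges):
    """Undirected adjacency sets; edge direction and weight are ignored."""
    nb = {}
    for u, v in edges:
        nb.setdefault(u, set()).add(v)
        nb.setdefault(v, set()).add(u)
    return nb

def local_clustering(nb, i):
    """Classical (unweighted) clustering coefficient of node i in one layer."""
    ks = nb.get(i, set())
    k = len(ks)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(ks, 2) if v in nb[u])
    return 2.0 * links / (k * (k - 1))

def node_metrics(layers, i):
    nbs = [neighbours(edges) for edges in layers]
    per_layer_degree = [len(nb.get(i, set())) for nb in nbs]
    per_layer_clustering = [local_clustering(nb, i) for nb in nbs]
    # neighbours of i in the flattened (single-layer) network
    aggregated = set().union(*(nb.get(i, set()) for nb in nbs))
    return {
        "per_layer_degree": per_layer_degree,
        "overlapping_degree": sum(per_layer_degree),  # sum over layers
        "aggregated_degree": len(aggregated),         # K_i after flattening
        "per_layer_clustering": per_layer_clustering,
        "mean_clustering": sum(per_layer_clustering) / len(layers),
    }

coauthorship = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
citation = [("C", "A"), ("C", "B"), ("D", "A"), ("D", "B")]
m = node_metrics([coauthorship, citation], "C")
```

For node C the overlapping degree (5) exceeds the flattened degree (3) because A and B are neighbours in both layers, which is exactly the information lost by flattening.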

This work was carried out under state contract with ICMMG SB RAS (FWNM-2025-0005).

Key words: complex systems, data analysis, bibliometrics, scientific co-authorship, citation, multiplex networks, actor centrality, clustering.

References

1. NICOSIA V., LATORA V. Measuring and modeling correlations in multiplex networks // Physical Review E. 2015. Vol. 92, 032805. DOI: 10.1103/PhysRevE.92.032805.

2. MENICHETTI G., REMONDINI D., PANZARASA P., MONDRAGON R. J., BIANCONI G. Weighted multiplex networks // PLoS ONE. 2014. Vol. 9, iss. 6. e97857. DOI: 10.1371/journal.pone.0097857.

3. TUNINETTI M., ALETA A., PAOLOTTI D., MORENO Y., STARNINI M. Prediction of scientific collaborations through multiplex interaction networks // Phys. Rev. Research. 2020. Vol. 2, 042029. DOI: 10.1103/PhysRevResearch.2.042029.

4. DICKISON M. E., MAGNANI M., ROSSI L. Multilayer social networks. Cambridge: Cambridge Univ. Press, 2016. ISBN 9781139941907. DOI: 10.1017/CBO9781139941907.

5. LUSSEAU D., BARRETT L., HENZI S. P. Formalizing the multidimensional nature of social networks // Philos. Trans. of the Royal Soc. of London B. 2012. Vol. 367, iss. 1599. P. 2108-2118. DOI: 10.1098/rstb.2012.0113.

6. BATTISTON F., NICOSIA V., LATORA V. Structural measures for multiplex networks // Phys. Rev. E. 2014. Vol. 89, 032804. DOI: 10.1103/PhysRevE.89.032804.

7. BIANCONI G. Multilayer networks: Structure and function. Oxford: Oxford Univ. Press, 2018. ISBN: 9780191815676.

8. BREDIKHIN S. V., SCHERBAKOVA N. G. Vzveshennaya multipleksnaya set’ avtorov nauchnogo zhurnala // Problemy informatiki. 2025. № 1. P. 45-59. DOI: 10.24412/2073-0667-2025-1-45-59.

9. BOCCALETTI S., BIANCONI G., CRIADO R., DEL GENIO C. I., GOMEZ-GARDENES J., ROMANCE M., SENDINA-NADAL I., WANG Z., ZANIN M. The structure and dynamics of multilayer networks // Phys. Rep. 2014. Vol. 544, iss. 1. P. 1-122. DOI: 10.1016/j.physrep.2014.07.001.

10. GOMEZ S., DIAZ-GUILERA A., GOMEZ-GARDENES J., PEREZ-VICENTE C. J., MORENO Y., ARENAS A. Diffusion dynamics on multiplex networks // Phys. Rev. Lett. 2013. Vol. 110, 028701. DOI: 10.1103/PhysRevLett.110.028701.

11. BREDIKHIN S. V., LYAPUNOV V. M., SCHERBAKOVA N. G. Bibliometricheskie seti nauchnykh statey i zhurnalov. Novosibirsk: IVMiMG SO RAN, 2021. ISBN 978-5-901548-44-8.

12. BERLINGERIO M., COSCIA M., GIANNOTTI F., MONREALE A., PEDRESCHI D. Foundations of multidimensional network analysis // Intern. Conf. on Adv. in Soc. Networks Analysis and Mining. 2011. P. 485-489. DOI: 10.1109/ASONAM.2011.103.

13. BRODKA P., SKIBICKI K., KAZIENKO P., MUSIAL K. A degree centrality in multi-layered social network // 2011 Intern. Conf. on Comput. Aspects of Soc. Networks (CASoN). 2011. P. 237-242. DOI: 10.1109/CASON.2011.6085951.

14. BONACICH P. Power and centrality: A family of measures // Amer. J. Sociol. 1987. Vol. 92, iss. 5. P. 1170-1182. DOI: 10.1086/228631.

15. SOLA L., ROMANCE M., CRIADO R., FLORES J., DEL AMO A. G., BOCCALETTI S. Eigenvector centrality of nodes in multiplex networks // Chaos. 2013. Vol. 23, iss. 3, 033131. DOI: 10.1063/1.4818544.

16. DE DOMENICO M., SOLE-RIBALTA A., OMODEI E., GOMEZ S., ARENAS A. Centrality in interconnected multilayer networks // Nature Commun. 2015. Vol. 6. 6868. DOI: 10.1038/ncomms7868.

17. WATTS D. J., STROGATZ S. H. Collective dynamics of ‘small-world’ networks // Nature. 1998. Vol. 393. P. 440-442. DOI: 10.1038/30918.

18. LATORA V., MARCHIORI M. Economic small-world behavior in weighted networks // Eur. Phys. J. B. 2003. Vol. 32. P. 249-263. DOI: 10.1140/epjb/e2003-00095-5.

19. CRIADO R., FLORES J., GARCIA DEL AMO A., GOMEZ-GARDENES J., ROMANCE M. A mathematical model for networks with structures in the mesoscale // Intern. J. of Comput. Math. 2012. Vol. 89, iss. 3. P. 291-309. DOI: 10.1080/00207160.2011.577212.

20. BOLLOBAS B., RIORDAN O. M. Mathematical results on scale-free random graphs. Handbook of graphs and networks: From genome to Internet. Weinheim (FRG): Wiley-VCH Verlag GmbH & Co. KGaA, 2002. P. 1-34. ISBN: 9783527403363.

21. COZZO E., KIVELA M., DE DOMENICO M., SOLE A., ARENAS A., GOMEZ S., PORTER M. A., MORENO Y. Structure of triadic relations in multiplex networks // New J. of Phys. 2015. Vol. 17, No 7, 073029. DOI: 10.1088/1367-2630/17/7/073029.

22. BRODKA P., MUSIAL K., KAZIENKO P. A method for group extraction in complex social networks // Commun. in Comput. and Inform. Sci. Springer Berlin Heidelberg. 2010. Vol. 111. P. 238-247. DOI: 10.1007/978-3-642-16318-0_27.

23. BRODKA P., KAZIENKO P., MUSIAL K., SKIBICKI K. Analysis of neighbourhoods in multi-layered dynamic social networks // International J. of Computational Intelligence Systems. 2012. Vol. 5, iss. 3. P. 582-596. DOI: 10.1080/18756891.2012.696922.

Bibliographic reference: S.V. Bredikhin, N. G. Scherbakova. STRUCTURAL PROPERTIES OF THE MULTIPLEX NETWORK OF SCIENTIFIC JOURNAL AUTHORS.// journal “Problems of informatics”. 2025, № 2. P.5-18. DOI: 10.24412/2073-0667-2025-2-5-18 - EDN: YYQLGV

I.V. Del*, A.V. Starchenko**

*Tomsk State University,

634050, Tomsk, Russia

**V. E. Zuev Institute of Atmospheric Optics SB RAS,

634055, Tomsk, Russia

SHORT-TERM WIND SPEED FORECAST BASED ON ARTIFICIAL NEURAL NETWORKS AND THE METHOD OF VARIATIONAL MODE DECOMPOSITION

DOI: 10.24412/2073-0667-2025-2-19-32

EDN: VKPBAD

Climate change and the increasing frequency of extreme weather events make wind speed forecasting particularly relevant today. In addition, short-term forecasting of local wind speed is extremely important for the safe and efficient operation of wind power plants and airports.

Classical forecasting methods based on physical models of atmospheric processes are often inferior in accuracy to machine learning methods, which can efficiently process large amounts of data and detect complex nonlinear dependencies. However, one of the main problems remains the presence of “noise” in the input data. This noise, caused by external factors such as measurement error, turbulence, and changes in temperature, humidity and other meteorological parameters, reduces the accuracy of the constructed models and, as a consequence, degrades the forecasts. To address this problem, approaches that combine machine learning with data preprocessing are used. One promising direction is the use of artificial neural networks (ANN) combined with filtering of the input signal.

In this paper, a hybrid method combining neural networks with Variational Mode Decomposition (VMD) has been developed to improve the accuracy of short-term local wind speed prediction. VMD decomposes the input signal into several components (variational modes), each representing a certain frequency range, which reduces the influence of noise and improves the extraction of useful information. The decomposition into variational modes is applied to the input dataset (hourly measurements of surface wind speed) before the ANN model is used for wind speed prediction.

The aim of the work is to develop and apply a hybrid method for short-term prediction of local wind speed with a lead time of up to 24 hours, which uses an ANN in combination with pre-filtering of the input signal by VMD. Using the wind speeds measured by a stationary weather station over the previous 24 hours and decomposed into modes by VMD, the ANN must predict the wind speed 1, 3, 6, 12 and 24 hours ahead.
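The input/target layout implied by this setting can be sketched as follows. The sketch assumes the VMD modes have already been computed (a random array stands in for them, and no VMD implementation is shown) and flattens each 24-hour window time step by time step; the name `make_samples` and this ordering are assumptions.

```python
import numpy as np

def make_samples(modes, speed, window=24, horizon=1):
    """Build (X, y) pairs from precomputed VMD modes.

    modes:   array (K, T), K modes of an hourly series of length T
    speed:   array (T,), observed wind speed
    window:  number of past hours fed to the network
    horizon: lead time in hours (1, 3, 6, 12 or 24 in the abstract)
    """
    K, T = modes.shape
    X, y = [], []
    for t in range(window, T - horizon + 1):
        # hours t-window .. t-1 of all modes, flattened to window*K values
        X.append(modes[:, t - window:t].T.reshape(-1))
        # target: speed `horizon` hours after the last observed hour t-1
        y.append(speed[t - 1 + horizon])
    return np.asarray(X), np.asarray(y)

T = 100
modes = np.random.default_rng(0).normal(size=(24, T))  # stand-in for VMD output
speed = np.arange(T, dtype=float)
X, y = make_samples(modes, speed, window=24, horizon=3)
# X.shape == (74, 576): 24 hours x 24 modes per sample
```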

The ANN is a classical fully connected neural network consisting of three layers: input, hidden and output. The input layer has 576 neurons (24 time steps × 24 modes); each neuron receives a numerical value characterizing one of the modes obtained by VMD at one time step. The hidden layer contains 64 neurons with the ReLU (Rectified Linear Unit) activation function. The output layer produces a single numerical value: the predicted wind speed 1, 3, 6, 12 or 24 hours ahead.
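A forward pass through the 576-64-1 architecture described above can be written in a few lines of NumPy. This is an untrained sketch: the linear output neuron, the He-style weight initialisation and the use of a separate network per lead time are assumptions, and training is not shown.

```python
import numpy as np

N_INPUT, N_HIDDEN = 24 * 24, 64  # 576 inputs, 64 hidden neurons

def init_params(seed=0):
    rng = np.random.default_rng(seed)
    # He-style initialisation, a common choice for ReLU layers (assumption)
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / N_INPUT), (N_INPUT, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, np.sqrt(2.0 / N_HIDDEN), (N_HIDDEN, 1)),
        "b2": np.zeros(1),
    }

def forward(params, x):
    """x: array (batch, 576) -> predicted wind speed, shape (batch,)."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # hidden layer, ReLU
    return (h @ params["W2"] + params["b2"])[:, 0]         # linear output neuron

params = init_params()
pred = forward(params, np.zeros((5, N_INPUT)))
n_weights = sum(p.size for p in params.values())  # 36993 trainable parameters
```

With about 37 thousand parameters the network is small, so the quality of the VMD features rather than model capacity is likely to dominate the result.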

Applying the hybrid method significantly increased the forecasting accuracy. In particular, the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE) decreased by at least 90 % (to 0.013-0.101 m/s and 0.9 %-6.1 %, respectively) for all lead times considered. The obtained MAE and MAPE values confirm the high accuracy of the developed method, since a MAPE below 10 % is commonly classified as highly accurate forecasting. In addition, the hybrid method is highly robust to changes in data structure, which makes it a versatile tool for different types of meteorological conditions.
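The two accuracy metrics, and the relative error reduction behind statements such as “decreased by at least 90 %”, can be computed as below. Skipping hours with zero observed speed in MAPE is an assumption made here because the metric is undefined for them; the abstract does not say how such hours were handled.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error, in the units of the data (m/s here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Undefined for zero observations (calm), so such hours are skipped here.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = y_true != 0
    rel = np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])
    return float(100.0 * np.mean(rel))

def relative_reduction(baseline, improved):
    """Fractional error reduction, e.g. 0.9 means a 90 % decrease."""
    return 1.0 - improved / baseline
```

For example, `mae([2, 4], [2.5, 3])` gives 0.75 m/s and `mape([2, 4], [2.5, 3])` gives 25 %.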

The evaluation of the results showed that combining VMD with an ANN not only improves the quality of wind speed prediction but also opens new opportunities for predicting other meteorological parameters. For example, temperature and humidity time series can also be processed using this approach, which can provide a comprehensive solution to problems of meteorological analysis. The developed hybrid method for short-term wind speed forecasting is a promising tool that can significantly improve forecast accuracy. Its application is especially relevant given the growing demand for reliable forecasts needed to ensure the safety and efficiency of various weather-dependent systems.

Further work in this direction may be aimed at improving the architecture of the neural networks used within the method and at optimizing the VMD parameters. This should further improve the accuracy and adaptability of the models and extend their use to a wide range of tasks related to the analysis and forecasting of meteorological data.

Key words: artificial neural networks, time series, variational mode decomposition, local short-term wind speed forecast.


Bibliographic reference: I.V. Del, A.V. Starchenko. SHORT-TERM WIND SPEED FORECAST BASED ON ARTIFICIAL NEURAL NETWORKS AND THE METHOD OF VARIATIONAL MODE DECOMPOSITION.// journal “Problems of informatics”. 2025, № 2. P.19-32. DOI: 10.24412/2073-0667-2025-2-19-32 - EDN: VKPBAD

A.D. Ivlev, A.V. Liniov

Lobachevsky State University of Nizhny Novgorod,

603950, Nizhny Novgorod, Russia

THE IMPLEMENTATION OF QUANTUM STATE PREPARATION ALGORITHMS, CONSIDERING THE LIMITATIONS OF MODERN QUANTUM COMPUTERS

DOI: 10.24412/2073-0667-2025-2-33-47

EDN: FCJOXG

Preparation of an arbitrary initial state of a system of qubits is an important and topical problem of quantum computing. It is important because many quantum algorithms, such as quantum neural networks [1] or algorithms for solving systems of linear equations [3], require classical data to be loaded into a quantum device. The computational cost of data loading can limit the potential quantum speedup, while the accuracy of the prepared state directly affects the correctness of the algorithm's results. This paper analyzes three contemporary approaches to initial quantum state preparation and their translation into QASM code under the constraints of modern quantum hardware, taking into account the qubit topology and the basis gate sets.

The first approach is a modified algorithm for preparing a quantum state using controlled gates [5]. This algorithm uses no additional qubits, and the size and depth of its quantum circuit are estimated as O(2^n). The main idea of the algorithm is that, iteratively, at step i the first i + 1 qubits are considered and the unit amplitude (corresponding to the initial state in which all qubits are in the state |0⟩) is distributed over the entire vector in the required proportions; the phases are added at the last iteration.

The second algorithm, called “divide and conquer” and proposed in [5], is an optimization of the previous one: by using 2^n - n - 1 additional qubits, it reduces the asymptotic depth of the circuit to O(n^2) while keeping its size at O(2^n). The idea is to expand the algorithm in width instead of length, owing to the tree structure of the algorithm and the use of CSWAP gates. This arrangement makes it possible to execute quantum gates in parallel.

The third approach prepares a quantum state by solving a more general problem: approximating a unitary operator using template circuits. Unlike the previous ones, this approach is not exact in theory and requires resource-intensive classical computations.
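The amplitude-splitting idea of the first algorithm can be illustrated classically. The Python sketch below assumes a binary tree of RY rotations, with qubit 0 as the most significant bit and a target length that is a power of two; it computes the rotation angles level by level, applies them to the unit amplitude of |0...0⟩, adds the phases at the end, and checks that the target state is reproduced. It is not necessarily the exact variant of [5] and does not build a circuit, decompose the controlled rotations into native gates, or respect device topology; the function names are hypothetical. The total number of rotations, 2^n - 1, is consistent with the O(2^n) size estimate.

```python
import numpy as np

def split_angles(amplitudes):
    """RY angles for amplitude splitting, one array per qubit level.

    Level i holds 2**i angles: for every basis prefix of the first i qubits,
    cos^2(theta/2) is the share of probability that goes to qubit i = 0.
    """
    probs = np.abs(np.asarray(amplitudes, dtype=complex)) ** 2
    probs /= probs.sum()
    n = int(np.log2(len(probs)))
    levels = []
    for i in range(n):
        mass = probs.reshape(2 ** (i + 1), -1).sum(axis=1)  # prefixes of length i+1
        pairs = mass.reshape(-1, 2)                          # (child 0, child 1)
        parent = pairs.sum(axis=1)
        share0 = np.divide(pairs[:, 0], parent,
                           out=np.ones_like(parent), where=parent > 0)
        levels.append(2.0 * np.arccos(np.sqrt(np.clip(share0, 0.0, 1.0))))
    return levels

def simulate(levels, phases):
    """Apply the splits classically, starting from |0...0> (amplitude 1)."""
    state = np.array([1.0])
    for theta in levels:
        nxt = np.empty(2 * len(state))
        nxt[0::2] = state * np.cos(theta / 2)  # qubit i -> |0>
        nxt[1::2] = state * np.sin(theta / 2)  # qubit i -> |1>
        state = nxt
    return state * np.exp(1j * phases)         # phases added at the last step

rng = np.random.default_rng(1)
psi = rng.normal(size=8) + 1j * rng.normal(size=8)
psi /= np.linalg.norm(psi)
levels = split_angles(psi)
prepared = simulate(levels, np.angle(psi))
print(np.allclose(prepared, psi))        # True
print(sum(len(l) for l in levels))       # 7 = 2**3 - 1 rotations
```

On hardware, each level-i rotation becomes a rotation controlled by the first i qubits, and it is the decomposition of these multi-controlled gates into the native gate set that inflates size and depth.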
In this paper, all algorithms were implemented taking into account the limitations of the modern ibm_sherbrooke quantum computer [8]. To do this, the gates used in the theoretical constructions were approximated by the available gates, the CNOT gates were placed with respect to the qubit topology, and the global phase problem was treated separately. In all cases, the hardware-aware implementation of the state preparation algorithms worsened the asymptotic estimates of circuit size and depth. The best result was shown by the unitary-operator approximation approach, with size O(n·2^n) and depth O(2^n), respectively, but it requires the preliminary classical computation of 3n·2^n circuit parameters. All algorithms showed low accuracy when executed on the ibm_sherbrooke architecture. At present, owing to the high asymptotic cost of their practical implementation, they are suitable only for preparing states of systems of no more than three qubits. Applying various optimization methods and error compensation algorithms may increase this number, but without new approaches that reduce the circuit size or depth in practice, no significant improvement should be expected. At the same time, it is worth remembering that all the described algorithms can prepare an arbitrary state, that is, they are universal.

The work was carried out with the support of the Research and Education Mathematical Center “Mathematics of Future Technologies”, contract No. 075-02-2024-1439.

Key words: quantum computing, quantum state preparation, translation, QASM.

References

1. Hur T., Kim L., Park D. K. Quantum convolutional neural network for classical data classification // Quantum Machine Intelligence. 2022, 4(1), 3.

2. Barenco A. et al. Stabilization of quantum computations by symmetrization // SIAM Journal on Computing. 1997, 26(5), 1541-1557.

3. Harrow A. W., Hassidim A., Lloyd S. Quantum algorithm for linear systems of equations // Physical review letters. 2009, 103(15), 150502.

4. Sun X. et al. Asymptotically optimal circuit depth for quantum state preparation and general unitary synthesis // IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2023, 42(10), 3301-3314.

5. Araujo I. F. et al. A divide-and-conquer algorithm for quantum state preparation // Scientific reports. 2021, 11(1), 6329.

6. Sim S., Johnson P. D., Aspuru-Guzik A. Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms // Advanced Quantum Technologies. 2019, 2(12), 1900070.

7. Cross A. W. et al. Open quantum assembly language // arXiv preprint arXiv:1707.03429. 2017.

8. IBM Quantum. [Electron. Res.]: https://quantum.ibm.com.

9. Barenco A. et al. Elementary gates for quantum computation // Physical review A. 1995, 52(5), 3457.

10. Shende V. V., Bullock S. S., Markov I. L. Synthesis of quantum logic circuits // Proceedings of the 2005 Asia and South Pacific Design Automation Conference. 2005, 272-275.

11. Qiskit. [Electron. Res.]: https://github.com/Qiskit.

12. Luo J., Li L. Circuit complexity of sparse quantum state preparation // arXiv preprint arXiv:2406.16142. 2024.

Bibliographic reference: A.D. Ivlev, A.V. Liniov. THE IMPLEMENTATION OF QUANTUM STATE PREPARATION ALGORITHMS, CONSIDERING THE LIMITATIONS OF MODERN QUANTUM COMPUTERS.// journal “Problems of informatics”. 2025, № 2. P.33-47. DOI: 10.24412/2073-0667-2025-2-33-47 - EDN: FCJOXG

Yu. F. Leonova

National Research South Ural State University,

454080, Chelyabinsk, Russia

CODE OPTIMISATION ON THE EXAMPLE OF AN ALGORITHM FOR SOLVING THE TRAVELING SALESMAN PROBLEM

DOI: 10.24412/2073-0667-2025-2-48-64

EDN: KQAAFR

This paper presents a comprehensive approach to optimising the cycle merging algorithm applied to the Traveling Salesman Problem (TSP), a classic NP-hard problem that has challenged researchers and practitioners alike in logistics, manufacturing, and data-intensive applications. The TSP requires finding the shortest possible route that visits each city in a given list exactly once and returns to the starting point. As the number of cities grows, finding an exact solution becomes computationally prohibitive, which makes approximation techniques both necessary and valuable in practice.

The cycle merging algorithm is a well-established heuristic for solving the TSP. It constructs an initial 2-factor, that is, a set of vertex-disjoint cycles covering all vertices, and iteratively merges the cycles by optimal edge replacement until only one cycle remains. Besides the choice of the solution algorithm, the quality of the application code plays an important role.
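The merging step can be sketched in Python as follows. The sketch assumes the initial 2-factor is already given (constructing it is omitted), that a merge removes one edge from each of two cycles and reconnects them with the two cheapest edges, and that at each iteration the pair of cycles with the smallest length increase is merged. Names are hypothetical and the code is deliberately unoptimised; it is not the author's implementation.

```python
import math
from itertools import combinations

def best_merge(c1, c2, d):
    """Cheapest way to join cycles c1 and c2: drop edge (a, b) from c1 and
    (u, v) from c2, add two connecting edges. Returns (length increase, cycle)."""
    best = (math.inf, None)
    for i in range(len(c1)):
        a, b = c1[i], c1[(i + 1) % len(c1)]
        path1 = c1[i + 1:] + c1[:i + 1]          # b ... a
        for j in range(len(c2)):
            u, v = c2[j], c2[(j + 1) % len(c2)]
            path2 = c2[j + 1:] + c2[:j + 1]      # v ... u
            removed = d(a, b) + d(u, v)
            candidates = [
                (d(a, u) + d(v, b) - removed, path1 + path2[::-1]),  # add a-u, v-b
                (d(a, v) + d(u, b) - removed, path1 + path2),        # add a-v, u-b
            ]
            for delta, cycle in candidates:
                if delta < best[0]:
                    best = (delta, cycle)
    return best

def merge_cycles(cycles, d):
    """Greedily merge the pair of cycles with the smallest increase until one remains."""
    cycles = [list(c) for c in cycles]
    while len(cycles) > 1:
        best = None
        for p, q in combinations(range(len(cycles)), 2):
            delta, cycle = best_merge(cycles[p], cycles[q], d)
            if best is None or delta < best[0]:
                best = (delta, p, q, cycle)
        _, p, q, cycle = best
        cycles = [c for k, c in enumerate(cycles) if k not in (p, q)] + [cycle]
    return cycles[0]

def tour_length(tour, d):
    return sum(d(tour[k], tour[(k + 1) % len(tour)]) for k in range(len(tour)))

# Hypothetical instance: two triangles forming an initial 2-factor
pts = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (5, 0), 4: (6, 0), 5: (5, 1)}

def dist(i, j):
    return math.dist(pts[i], pts[j])

tour = merge_cycles([[0, 1, 2], [3, 4, 5]], dist)
```

Each call to `best_merge` scans all edge pairs of two cycles, so the cost per iteration grows quickly with the number of cycles; this is the kind of hot loop that algorithmic, storage and parallel optimisations target.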

In this work, a number of measures aimed at optimising the program code implementing the cycle merging algorithm have been carried out. The approach includes optimising the algorithm, optimising the data storage structure and using parallel programming techniques.
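One common example of a data storage optimisation for symmetric TSP instances, shown here only as an illustration and not necessarily one of the measures taken in the paper, is to keep just the upper triangle of the distance matrix in a single contiguous array, roughly halving memory use. The class name and index formula below are a hypothetical sketch.

```python
import math

class CondensedDistances:
    """Symmetric n x n distance matrix stored as its strict upper triangle
    in one flat list: n*(n-1)/2 entries instead of n*n."""

    def __init__(self, points):
        self.n = len(points)
        self.data = [math.dist(points[i], points[j])
                     for i in range(self.n) for j in range(i + 1, self.n)]

    def index(self, i, j):
        if i > j:
            i, j = j, i
        # rows 0..i-1 occupy i*(2n - i - 1)/2 entries; row i starts at column i+1
        return i * (2 * self.n - i - 1) // 2 + (j - i - 1)

    def __call__(self, i, j):
        return 0.0 if i == j else self.data[self.index(i, j)]

D = CondensedDistances([(0, 0), (3, 4), (6, 8)])
```

Because the object is callable as `D(i, j)`, it can replace a full-matrix lookup function without changing the calling code.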

Experimental results show that the optimised algorithm significantly outperforms the baseline implementation, achieving a speedup proportional to the number of computational cores and nodes. Tests on instances with up to 1000 nodes showed that the approach makes it possible to solve larger problems without a commensurate increase in computational resources. The study also observed consistently better cache utilisation and reduced latency at key stages of the algorithm, which confirms the effectiveness of the chosen optimisations.

This work provides a sound basis for solving large TSP instances by combining heuristic methods with advanced computational optimisations. The results highlight the importance of both algorithm efficiency and implementation techniques when solving computationally intensive problems. The approach and results presented here are applicable not only to the TSP but also to a broader class of combinatorial optimisation problems where parallelism and memory efficiency are important. Future work may investigate additional optimisations through GPU acceleration or hybrid parallelism techniques, potentially providing even better performance.

Key words: traveling salesman problem, combinatorial optimisation, software code optimisation, performance optimisation, parallel computing, instrumentation and profiling.


Bibliographic reference: Yu. F. Leonova. CODE OPTIMISATION ON THE EXAMPLE OF AN ALGORITHM FOR SOLVING THE TRAVELING SALESMAN PROBLEM.// journal “Problems of informatics”. 2025, № 2. P.48-64. DOI: 10.24412/2073-0667-2025-2-48-64 - EDN: KQAAFR

Sh.A. Tursunov, A. E. Rashidov

Samarkand State University named after Sharof Rashidov

703004, Samarkand, Uzbekistan

ANALYSIS OF CATEGORICAL DATA ENCODING ALGORITHMS

UDC 519

DOI: 10.24412/2073-0667-2025-2-65-80

EDN: ALXCCT

It is known that the efficiency of artificial intelligence, which is widely recognized as a useful tool in many fields, depends on several factors. One important factor is that the data supplied to artificial intelligence algorithms must be in a form these algorithms can process. Since artificial intelligence algorithms are based on mathematical operations and expressions, it must be possible to apply mathematical operations to the incoming data. However, in some projects the data encountered while applying artificial intelligence algorithms do not allow arithmetic operations to be performed. Since discarding such data can negatively affect the result, the data must be converted to another form, namely to a numeric type. Several methods for encoding categorical data are known, and selecting the best of them is a complex research task that requires the user to know not only the data set but also the properties of all the methods. This research is aimed at analyzing methods of categorical data encoding. Twelve methods for converting text data into numerical form are studied and analyzed, and the advantages and disadvantages of each encoding method are identified. In addition, a comparative analysis of the studied methods is carried out and general conclusions are drawn.
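For orientation, four widely used encodings (label, one-hot, frequency and target/mean encoding) are sketched below in plain Python. The abstract does not list the 12 methods studied, so this selection and the function names are assumptions; target encoding is shown without the smoothing or cross-validation usually needed to limit target leakage.

```python
from collections import Counter, defaultdict

def label_encode(values):
    """Map each category to an integer (alphabetical order here)."""
    mapping = {cat: k for k, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot(values):
    """One binary column per category."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values], cats

def frequency_encode(values):
    """Replace a category by its relative frequency in the column."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]

def target_encode(values, targets):
    """Replace a category by the mean target value observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]

colors = ["red", "green", "red", "blue"]
```

Label encoding imposes an artificial order on the categories, one-hot encoding avoids that at the cost of one column per category, and frequency and target encoding keep a single column but can map different categories to the same value.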

Key words: artificial intelligence, categorical data encoding, data encoding methods.

References

1. Nazarov, F. M., O’G’Li, Y. S. S., & O’G’Li, E. B. S. Algorithms To Increase Data Reliability In Video Transcription // 2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT), 2022. P. 1-6. DOI: https://doi.org/10.1109/AICT55583.2022.10013558.

2. Rashidov, A., Akhatov, A., Aminov, I., Mardonov, D., & Dagur, A. Distribution of data flows in distributed systems using hierarchical clustering. In A. Dagur, D. K. Shukla, N. F. Makhmadiyarovich, A. A. Rustamovich, & J. J. Sindorovich, Artificial Intelligence and Information Technologies. 2024. 1st ed., P. 207-212. CRC Press. DOI: https://doi.org/10.1201/9781032700502-34.

3. Zaynidinov, H., Xuramov, L., & Khodjaeva, D. Intelligent algorithms of digital processing of biomedical images in wavelet methods. In A. Dagur, K. Singh, P. S. Mehra, & D. K. Shukla, Artificial Intelligence, Blockchain, Computing and Security. 2023. Vol. 2. 1st ed., P. 648-653. CRC Press. DOI: https://doi.org/10.1201/9781032684994-105.

4. Nazarov, F. M., Yarmatov, S., & Xamidov, M. Machine Learning Price Prediction on Green Building Prices // 2024 International Russian Smart Industry Conference (SmartlndustryCon), 2024. P. 906-911. DOI: https://doi.org/10.1109/SmartIndustryCon61328.2024.10515790.

5. Rashidov, A., Akhatov, A., & Nazarov, F. The Same Size Distribution of Data Based on Unsupervised Clustering Algorithms. In Z. Hu, Q. Zhang, & M. He (Eds.), Advances in Artificial Systems for Logistics Engineering III. 2023. Vol. 180, P. 437-447. Springer Nature Switzerland. DOL https://doi.org/10.1007/978-3-031-36115-9_40.

6. Rustamovich, A. A., Mekhriddin, N., Fayzullo, N., & Sabharwal, M. Intelligent system of labor market regulation based on the evolutionary modeling of employment // 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), 2022. P. 2534-2539. DOL https://doi.org/10.1109/ICAC3N56670.2022.10074149.

7. Rashidov, A., Akhatov, A., & Mardonov, D. The Distribution Algorithm of Data Flows Based on the BIRCH Clustering in the Internal Distribution Mechanism // 2024 International Russian Smart Industry Conference (SmartlndustryCon), 2024. P. 923-927. DOL https://doi.org/10.1109/ SmartIndustryCon61328.2024.10516193.

8. Rashidov Akbar Ergash o’g’li, S. J. S. o’g’li. Selecting Methods Of Significant Data From Gathered Datasets For Research. 2024. DOL https://doi.org/10.5281/ZEN0D0.10781255.

9. Rashidov, A., & Madaminjonov, A. Sun’iy intellekt modelini qurishda ma’lumotlarni tozalash bosqichi tahlili: Sun’iy intellekt modelini qurishda ma’lumotlarni tozalash bosqichi tahlili // Modern problems and prospects of applied mathematics, 2024. N 1 (01). [El. Res.]: https://oj s. qarshidu. uz/index.php/mp/art icle/view/473.

10. Rashidov A., Axatov A., & Nazarov F. ICHKI TAQSIMLASH MEXANIZMIDA MA’LUMOTLAR OQIMLARINI BOSHQARISH ALGORITMI. 2024. DOL https://doi.org/10. 5281/ZEN0D0.11475559.

11. Ахатов A. P., Ашвини P., Рашидов А. Э. Угли, & Назаров Ф. M. Оптимизация количества баз данных при обработке больших данных. 2023. DOL https://doi.org/10.24412/2073-0667- 2023-1-33-47.

12. Akhatov А. & Rashidov A. Big Data va unig turli sohalardagi tadbiqi // Descendants of Muhammad ALKhwarizmi, 2021. № 4 (18), P. 135-44.

13. Bolikulov, F., Nasimov, R., Rashidov, A., Akhmedov, F., & Cho, Y.-I. Effective Methods of Categorical Data Encoding for Artificial Intelligence Algorithms // Mathematics, 2024. N 12 (16), P. 2553. DOL https://doi.org/10.3390/mathl2162553.

14. Rashidov, A., Akhatov, A., & Nazarov, F. The Same Size Distribution of Data Based on Unsupervised Clustering Algorithms. In Z. Hu, Q. Zhang, & M. He (Eds.) // Advances in Artificial Systems for Logistics Engineering III. 2023. Vol. 180, P. 437-447. Springer Nature Switzerland. DOL https://doi.org/10.1007/978-3-031-36115-9_40.

15. Hancock, J. T., & Khoshgoftaar, T. M. Survey on categorical data for neural networks // Journal of Big Data, 2020. N 7 (1), P. 28. DOL https://doi.org/10.1186/s40537-020-00305-w.

16. Larionov, М. Sampling Techniques in Bayesian Target Encoding (Version 2). arXiv. 2020. DOI: https://doi.org/10.48550/ARXIV.2006.01317.

17. Potdar, K., S., T., & D., C. A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers // International Journal of Computer Applications, 2017. N 175 (4), P. 7-9. DOI: https://doi.org/10.5120/ijca2017915495.

18. Sun, W., Cai, Y., & Liu, Y. MSR14 Comparisons of Encoding Techniques for Categorical Features in Linear Regression Models // Value in Health, 2022. N 25 (7), S520. DOL https://doi. org/10.1016/j.jval.2022.04.1221.

19. Uyar, A., Bener, A., Ciray, H. N., & Bahceci, M. A frequency based encoding technique for transformation of categorical variables in mixed IVF dataset // 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009. P. 6214-6217. DOL https://doi. org/10.1109/IEMBS.2009.5334548.

20. Seger, C. An investigation of categorical variable encoding techniques in machine learning: Binary versus one-hot and feature hashing. 2018. [El. Res.]: https://api.semanticscholar.org/ CorpusID:250534659.

21. Akhatov, A., Rashidov, A., & Renavikar, A. Optimization of the database structure based on Machine Learning algorithms in case of increased data flow. In A. Dagur, K. Singh, P. S. Mehra, & D. K. Shukla, Artificial Intelligence, Blockchain, Computing and Security, 2023. Vol. 2, 1st ed., P. 675-680. CRC Press. DOL https://doi.org/10.1201/9781032684994-109.

22. Jawthari, M., & Stoffova, V. Effect of Encoding Categorical Data on Student’s Academic Performance Using Data Mining Methods. 2020. P. 521-526. DOL https: //doi . org/10.12753/2066- 026X-20-068.

Bibliographic reference: Sh. A. Tursunov, A. E. Rashidov. ANALYSIS OF CATEGORICAL DATA ENCODING ALGORITHMS // journal “Problems of informatics”. 2025, № 2. P. 65-80. DOI: 10.24412/2073-0667-2025-2-65-80. EDN: ALXCCT
