Volume 1(58)

CONTENTS

Lyakhov O. A. Renewable resources accounting in integer models of project scheduling
Skopin I. N. Time Model for Studying Evolving Systems
Akhatov A. R., Renavikar A., Rashidov A. E., Nazarov F. M. Optimization of the number of databases in the big data processing
Kossov G. A., Seleznev I. A. Influence of neural network parameters for the quality of prediction for the tasks of automatic lithotype description
Snytnikova T. V. Associative computing implementation library cuSTAR: data representation for bioinformatics problems
Kharyutkina S. A., Gavrilov A. V., Yakimenko A. A. Choosing operator emotions as feedback for training neural networks

O. A. Lyakhov

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia

RENEWABLE RESOURCES ACCOUNTING IN INTEGER MODELS OF PROJECT SCHEDULING

DOI: 10.24412/2073-0667-2023-1-5-11

EDN: PWEWCU

When scheduling complex sets of operations, renewable resources are usually assumed to be constant, which does not always agree with management practice. Defining renewable resources as non-storable (“power”-type) resources, whose non-use leads to their loss, does not fully reflect their specifics. Formalizing the redistribution of renewable resources depends on how the conditions of their use are represented in the models. Redistribution is considered using the example of a network model that minimizes resource imbalance under prescribed (directive) completion times for the schedule.
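
As a rough illustration of the balancing idea, the following Python sketch computes the per-period demand for one renewable resource under a fixed schedule and compares a constant availability with a redistributed one. The activity data, the function names, and the imbalance measure (total absolute gap between demand and availability) are illustrative assumptions, not the model from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical activities: name -> (start period, duration, units needed per period).
Schedule = Dict[str, Tuple[int, int, int]]


def usage_profile(schedule: Schedule, horizon: int) -> List[int]:
    """Per-period demand for a single renewable resource."""
    profile = [0] * horizon
    for start, duration, demand in schedule.values():
        for t in range(start, min(start + duration, horizon)):
            profile[t] += demand
    return profile


def imbalance(profile: List[int], available: List[int]) -> int:
    """Assumed measure: total absolute gap between demand and availability."""
    return sum(abs(used - avail) for used, avail in zip(profile, available))


schedule = {"A": (0, 2, 3), "B": (1, 3, 2), "C": (4, 1, 4)}
constant = [4, 4, 4, 4, 4]        # resource treated as a constant
redistributed = [3, 5, 4, 4, 4]   # same total, one unit moved from period 0 to period 1

profile = usage_profile(schedule, horizon=5)
print(profile, imbalance(profile, constant), imbalance(profile, redistributed))
```

In this toy case, moving one unit between periods lowers the assumed imbalance measure, which is the kind of effect a model with redistributable renewable resources can capture and a constant-resource model cannot.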

Key words: project, network models, scheduling, renewable resources.

Bibliographic reference: Lyakhov O. A. Renewable resources accounting in integer models of project scheduling // journal “Problems of informatics”. 2023, № 1. P. 5-11. DOI: 10.24412/2073-0667-2023-1-5-11

I. N. Skopin

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia

Novosibirsk State University, 630090, Novosibirsk, Russia

TIME MODEL FOR STUDYING EVOLVING SYSTEMS

DOI: 10.24412/2073-0667-2023-1-12-32

EDN: PXACQI

Approaches to defining model time in studies of developing systems are discussed. It is shown that the global time of a system can be set using the local times of its elements, understood as protocols of the events in which they participate. Combining all such protocols yields a partial order of events, and it is proposed to use this order as the global time of the system. The correctness of this definition of time is shown, as well as the fact that it combines well with the event control mechanism used in simulation models.
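
The construction described above can be sketched in Python as follows. Each element's local time is assumed to be an ordered list of event names, and events shared between elements link the protocols together. The element and event names are hypothetical, and this is only one reading of the abstract, not the author's formal definition.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def global_order(protocols: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return the strict partial order as pairs (a, b) meaning 'a precedes b'.

    Each protocol lists the events one element took part in, in the order
    that element observed them.
    """
    events = {e for log in protocols.values() for e in log}
    before = {
        (log[i], log[i + 1])
        for log in protocols.values()
        for i in range(len(log) - 1)
    }
    # Transitive closure, Warshall style: the intermediate event k is the outer loop.
    for k, a, b in product(events, repeat=3):
        if (a, k) in before and (k, b) in before:
            before.add((a, b))
    if any((e, e) in before for e in events):
        raise ValueError("protocols are inconsistent: cyclic ordering")
    return before


protocols = {
    "sensor": ["e1", "e2"],
    "controller": ["e2", "e3"],
    "logger": ["e4", "e3"],
}
order = global_order(protocols)
print(("e1", "e3") in order)                          # ordered through the shared event e2
print(("e1", "e4") in order, ("e4", "e1") in order)   # neither holds: the events are unordered
```

Events that are not related in either direction, such as e1 and e4 here, are exactly what makes the resulting global time a partial rather than a total order.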

Key words: local and global time; partial order relation on the set of events; events, reaction of elements to events; event protocols.

Bibliographic reference: Skopin I. N. Time Model for Studying Evolving Systems // journal “Problems of informatics”. 2023, № 1. P. 12-32. DOI: 10.24412/2073-0667-2023-1-12-32

A. R. Akhatov, A. Renavikar*, A. E. Rashidov, F.M. Nazarov

Samarkand State University, 140101, Samarkand, Uzbekistan

*NeARTech Solution, 411033, Pune, India

OPTIMIZATION OF THE NUMBER OF DATABASES IN THE BIG DATA PROCESSING

DOI: 10.24412/2073-0667-2023-1-33-47

EDN: QBRKTM

Today, many organizations and companies increasingly need Big Data to increase their income, strengthen their competitiveness, and study the interests of customers. However, most approaches to real-time processing and analysis of Big Data rely on the cooperation of several servers. In turn, the use of multiple servers limits the possibilities of many organizations and companies because of cost, management, and other factors. This paper presents an approach to real-time processing and analysis of Big Data on a single server based on a distributed computing engine, and the research indicates that the approach is efficient in terms of cost, reliability, integrity, network independence, and manageability. To improve the efficiency of the approach further, a methodology for optimizing the number of databases on a single server was developed. This methodology uses the MinMaxScaler, StandardScaler, RobustScaler, MaxAbsScaler, QuantileTransformer, and PowerTransformer scaling functions together with the machine learning algorithms Linear Regression, Random Forest Regression, Multiple Linear Regression, Polynomial Regression, and Lasso Regression. The results were analyzed, and the effectiveness of each regression algorithm and scaling function was determined for the experimental data.
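
To make the scaling-plus-regression pipeline concrete, here is a minimal pure-Python sketch of two of the named scaling functions followed by a simple least-squares fit. The helper names and the toy numbers are assumptions made for illustration; the scaler names in the abstract match those of scikit-learn, and the formulas below mirror the default behaviour of its `MinMaxScaler` and `StandardScaler`.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def min_max_scale(xs: List[float]) -> List[float]:
    """Map values linearly onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standard_scale(xs: List[float]) -> List[float]:
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up toy data: number of databases vs. processing time in seconds.
n_databases = [1.0, 2.0, 4.0, 8.0, 16.0]
seconds = [40.0, 31.0, 22.0, 18.0, 17.0]

scaled = min_max_scale(n_databases)
slope, intercept = fit_line(scaled, seconds)
print(scaled, slope, intercept)
```

A comparison like the one in the paper would repeat the fit for every scaler and regression algorithm and score each combination on held-out data.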

Key words: Big Data, Real Time Processing, Single Server Distributed Computing Engine, Architecture, Machine Learning, Regression Algorithms, Scaling.

Bibliographic reference: Akhatov A. R., Renavikar A., Rashidov A. E., Nazarov F. M. Optimization of the number of databases in the big data processing // journal “Problems of informatics”. 2023, № 1. P. 33-47. DOI: 10.24412/2073-0667-2023-1-33-47

G.A. Kossov, I. A. Seleznev

LLC “TCS”, 125171, Moscow, Russia

INFLUENCE OF NEURAL NETWORK PARAMETERS FOR THE QUALITY OF PREDICTION FOR THE TASKS OF AUTOMATIC LITHOTYPE DESCRIPTION

DOI: 10.24412/2073-0667-2023-1-48-59

EDN: QQFRGC

Machine learning methods are widely used for interpreting and describing geological and geophysical data. One such problem is automatic lithology extraction from whole-core photographs. In this paper we propose to analyze the parameters that represent the textural and color features of the images. The advantage of this approach is that it allows online training and retraining of the classification model. Among the existing classification methods, such as boosting, random forests, and support vector machines, neural networks are preferred for their universality and their availability in various programming toolkits. Applying neural networks requires the user to have a clear understanding of the modelling goals, because the choice of model architecture is an important factor.

Many parameters are set by the user, and all of them affect the quality of the prediction. The purpose of this research is therefore to study the behavior of networks with various configurations and to find common regularities. The paper considers the problem of classifying lithotypes using fully connected neural networks. The input data are color and textural features obtained by processing whole-core images. Thus, we consider the task of classifying examples with 48 features into 20 classes corresponding to certain lithotypes. The test sample consisted of 2998 elements. We trained the model on samples of 10,000 and 1,000 elements. The hyperparameters of the model include the loss function, optimization method, activation function, batch size, number of epochs, number of hidden layers, and number of neurons in a layer. For a given problem, the choice of some parameters or functions can be justified in advance. For the classification problem, the ReLU and LogSoftMax activation functions are the preferred choice. CrossEntropyLoss was used as the loss function. This loss function combines LogSoftMax and NLLLoss, so the use of LogSoftMax is also justified by simplifying the calculation of CrossEntropyLoss. We use the Adam algorithm as the optimization method. The quality of the model was evaluated using the F1-score metric. Training a model with a fixed number of layers and nodes but different batch sizes showed that the optimal batch size is 256 elements. With this batch size, we determined that 30 epochs are enough to train the model. Among the large set of network hyperparameters, it is difficult to determine the exact number of network elements, i.e. the number of layers and neurons.
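
The statement that the cross-entropy loss combines LogSoftMax and NLLLoss can be checked numerically with a small pure-Python sketch. The function names and the toy scores below are illustrative assumptions and are not taken from the authors' code.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log(softmax(logits))."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def nll_loss(log_probs: List[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


def cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy computed directly from the raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[target] / sum(exps))


logits = [2.0, 0.5, -1.0]   # raw network outputs for three classes (toy values)
target = 0
composed = nll_loss(log_softmax(logits), target)
direct = cross_entropy(logits, target)
print(composed, direct)     # the two values agree up to rounding
```

In PyTorch, `torch.nn.CrossEntropyLoss` is documented as combining `LogSoftmax` and `NLLLoss` in this way, so it is normally given raw scores, while `NLLLoss` expects log-probabilities.
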
Therefore, in the current research we study the dependence of the F1-score and the value of the loss function on the number of nodes in a layer. The paper shows that increasing the number of neurons leads to a gain in quality: the F1-score equals 1 in all cases beyond 10 neurons per layer. Moreover, a model with an unsuitable number of layers can be improved by increasing the number of neurons in each layer. Increasing the number of layers allows the model to construct a more complex approximation, which can improve the quality of the prediction. However, as the number of layers increases, there is a risk of overfitting and of local minima of the error function appearing, which leads to training problems. Thus, the number of nodes in a layer is the defining parameter and should be set first. Training time is another important factor. In this research, we propose an estimate of the algorithm's complexity in terms of the number of layers (m) and nodes (n), given in O-notation. It is shown that the number of operations grows linearly, O(m), in the number of layers and cubically, O(n^3), in the number of neurons. Consequently, in terms of the number of operations, it is preferable to increase the number of network layers. However, a large number of elements does not guarantee a higher F1-score. The predictions of some classification algorithms (for example, boosting or random forest) are highly dependent on the initial values of the parameters. In our case, the dependence of the loss value on the random initialization of the neural network weights was investigated. We use the Epps-Pulley test to check the normality of the distribution of the loss value. The tests showed that the distribution of the loss value is not Gaussian. This fact should be taken into account when setting requirements for the reproducibility of experimental results: the starting model weights should be initialized accordingly.
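
For reference, the F1-score used as the quality metric can be computed per class and then averaged. The sketch below assumes macro averaging, since the abstract does not say which averaging was used, and the labels are toy values.

```python
from typing import List, Sequence


def f1_per_class(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[float]:
    """F1 = 2*TP / (2*TP + FP + FN) for each class; 0.0 if the class never occurs."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    return sum(f1_per_class(y_true, y_pred, n_classes)) / n_classes


y_true = [0, 0, 1, 1, 2, 2]   # toy labels for three classes
y_pred = [0, 1, 1, 1, 2, 0]
print(f1_per_class(y_true, y_pred, 3), macro_f1(y_true, y_pred, 3))
```

A score of exactly 1, as reported in the abstract, means that every test example was assigned to its correct class.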

Key words: neural network, lithotype description, core analysis, hyperparameters, supervised learning.

Bibliographic reference: Kossov G. A., Seleznev I. A. Influence of neural network parameters for the quality of prediction for the tasks of automatic lithotype description // journal “Problems of informatics”. 2023, № 1. P. 48-59. DOI: 10.24412/2073-0667-2023-1-48-59

T. V. Snytnikova

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia

ASSOCIATIVE COMPUTING IMPLEMENTATION LIBRARY CUSTAR: DATA REPRESENTATION FOR BIOINFORMATICS PROBLEMS

DOI: 10.24412/2073-0667-2023-1-60-68

EDN: QWFFMA

Over the past few years, genome processing has become a widely sought-after task. Both medical laboratories (from PCR tests to genetic passports) and research teams are engaged in various kinds of processing. Both groups process large amounts of data, either because of the number of samples or because of the length of these samples: from tens of thousands to several billion nucleotides. A large part of the calculations involves searching for individual nucleotides or their sequences in a larger sequence or in a large number of sequences, so it is advisable to use associative parallel computing. However, associative architectures are not available on the computer hardware market, unlike widely available graphics accelerators. The cuSTAR library was designed to implement the STAR-machine model of associative computing on graphics accelerators. In this paper, a method of organizing data for processing genomes by associative algorithms is proposed.

In this paper, we propose several methods of data organization. Such an organization allows associative algorithms to be used for various tasks related to genome processing. Let us briefly recall the associative model of the STAR machine and its cuSTAR implementation. Both the cuSTAR library and the STAR-machine model use three types of data for associative processing. The Table type stores data as a binary table. The Slice type is used to access a bit column, and the Word type is used to access a bit string. Note that data processing is performed mainly on bit columns. Therefore, the representation of data in the cuSTAR system is fundamentally different from the usual one. Usually, a sequence of nucleotides is represented by an array of characters. It can be considered a binary table in which each row specifies one character; that is, the data is stored row by row. In cuSTAR, a variable of type Table is stored by columns.

The alphabet of nucleotides consists of the symbols A (adenine), C (cytosine), G (guanine) and T (thymine). The “-” symbol is also often used in the data to indicate possible gaps in reading, insertions, or deletions in the nucleotide sequence. Thus, four or five characters are used, depending on the task. We propose two ways to encode a sequence of nucleotides. The first method is optimized for memory usage. The second is optimized for the time needed to search for a nucleotide in the sequence. The memory-optimized method uses the following encoding: “000” for the “-” symbol, “001” for adenine, “011” for cytosine, “101” for guanine, and “111” for thymine. The time-optimized method uses the following encoding: “1000” for adenine, “0100” for cytosine, “0010” for guanine, and “0001” for thymine. It uses 4 bits instead of 3, but replaces the task of searching for a word in the table with a less time-consuming one: to find all occurrences of a nucleotide in the sequence, one only needs to determine the position of the “1” in the code of this nucleotide. Both proposed encodings are more compact than the standard representation as an array of characters. The time-optimized method makes it possible to search for nucleotides in a sequence an order of magnitude faster than the corresponding procedure for the memory-optimized method. However, the memory-optimized method is preferable if the nucleotide sequence is represented as a graph. In this case, the de Bruijn graph is constructed from the original sequence of nucleotides in a trivial way, whereas with symbolic encoding of nucleotides this is a time-consuming and memory-consuming task.
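
The difference between the two searches can be sketched in plain Python. The codes are the ones quoted above; the column-wise layout, the helper names, and the sample sequence are illustrative assumptions and do not reproduce the cuSTAR API. The loops here are sequential, whereas on a STAR machine or in cuSTAR each column operation would act on all rows in parallel.

```python
from typing import Dict, List

# Codes quoted in the abstract.
MEMORY_CODE: Dict[str, str] = {"-": "000", "A": "001", "C": "011", "G": "101", "T": "111"}
TIME_CODE: Dict[str, str] = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}


def to_columns(seq: str, code: Dict[str, str]) -> List[List[int]]:
    """Encode seq and store it by bit columns: columns[j][i] is bit j of symbol i."""
    width = len(next(iter(code.values())))
    return [[int(code[ch][j]) for ch in seq] for j in range(width)]


def find_one_hot(columns: List[List[int]], nucleotide: str) -> List[int]:
    """Time-optimized search: read the single column where the code has its 1."""
    j = TIME_CODE[nucleotide].index("1")
    return [i for i, bit in enumerate(columns[j]) if bit]


def find_by_word(columns: List[List[int]], nucleotide: str) -> List[int]:
    """Memory-optimized search: every bit column must match the 3-bit code."""
    word = [int(b) for b in MEMORY_CODE[nucleotide]]
    rows = range(len(columns[0]))
    return [i for i in rows if all(columns[j][i] == word[j] for j in range(len(word)))]


seq = "GATTACA"
print(find_one_hot(to_columns(seq, TIME_CODE), "A"))
print(find_by_word(to_columns(seq, MEMORY_CODE), "A"))
```

Both calls return the same positions; the one-hot variant touches one bit column, while the 3-bit variant has to combine three.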

When using cuSTAR, it is easy to construct a de Bruijn graph for any parameter k from a sequence of nucleotides. The graph is given by a list of edges, which is one of the standard representations for associative processing. Note that defining the graph as a list of edges avoids the problems associated with repeated arcs.

When the sequence is read, a table GEN of size 3l is formed, where l is the length of the input sequence. For a graph given by a list of arcs, we form tables LEFT and RIGHT of size 3k(l - k). The table LEFT is obtained by copying the columns of GEN k times into the corresponding columns with an upward shift. In turn, the table RIGHT is obtained by copying the table LEFT with a shift up by one row. All tables are copied in parallel.
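
At the level of symbols, the edge list described above can be sketched as follows. This is one reading of the abstract, in which row i of LEFT holds the k-mer starting at position i and row i of RIGHT holds the k-mer starting at position i + 1; the function name and the sample sequence are assumptions, and the bit-level tables and parallel copying of cuSTAR are not modelled.

```python
from typing import List, Tuple


def de_bruijn_edges(seq: str, k: int) -> List[Tuple[str, str]]:
    """Edge i joins the k-mer starting at i to the k-mer starting at i + 1."""
    if not 0 < k < len(seq):
        raise ValueError("k must satisfy 0 < k < len(seq)")
    n_edges = len(seq) - k                                  # l - k rows in each table
    left = [seq[i:i + k] for i in range(n_edges)]           # symbolic rows of LEFT
    right = [seq[i + 1:i + 1 + k] for i in range(n_edges)]  # LEFT shifted up by one row
    return list(zip(left, right))


edges = de_bruijn_edges("GATTACA", k=3)
for src, dst in edges:
    print(src, "->", dst)
```

With 3 bits per nucleotide, each of the l - k rows takes 3k bits, which matches the stated table size 3k(l - k). Keeping repeated arcs as separate rows is what lets the edge-list form sidestep the repeated-arc issue mentioned above.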

Since genome processing involves multiple searches over a large amount of data, the development of associative algorithms for this area is relevant. The applied value of the work lies in the possibility of executing these algorithms on graphics accelerators, which are widespread equipment found in everything from personal computers to cluster systems.

Key words: associative parallel algorithms, bioinformatics, GPU, CUDA.

Bibliographic reference: Snytnikova T. V. Associative computing implementation library cuSTAR: data representation for bioinformatics problems // journal “Problems of informatics”. 2023, № 1. P. 60-68. DOI: 10.24412/2073-0667-2023-1-60-68

S.A. Kharyutkina, A.V. Gavrilov, A.A. Yakimenko

Novosibirsk State Technical University, 630073, Novosibirsk, Russia

CHOOSING OPERATOR EMOTIONS AS FEEDBACK FOR TRAINING NEURAL NETWORKS

DOI: 10.24412/2073-0667-2023-1-69-76

EDN: QWXYBT

The work is devoted to studying and selecting the human emotions with the highest probability of recognition, for training neural networks using operator emotions as feedback. On the basis of the presented program, experiments were set up and conducted to study emotions. The following emotions were studied: “anger”, “disgust”, “fright”, “happiness”, “sadness”, “surprise” and “neutral emotion”. The experiments determined which human emotions the program recognizes with the greatest probability. The average probabilities of successful and unsuccessful recognition were calculated, and the similarity of emotions was analyzed. Assumptions are made about the use of operator emotions as feedback for training neural networks. The work addresses the problem of reducing the time needed to train a neural network aimed at solving socially significant economic problems. It is assumed that the approach will expand the scope of neural networks in non-core industries by reducing the requirements for the operator/programmer and for computing resources.
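
The selection step described above, averaging recognition probabilities per emotion and keeping the most reliably recognized ones, might look like the following Python sketch. The trial format, the function names, and the numbers are made-up placeholders, not results from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical trial: (emotion shown by the operator, probability the recognizer gave it).
Trial = Tuple[str, float]


def mean_recognition(trials: List[Trial]) -> Dict[str, float]:
    """Average probability of successful recognition for each emotion."""
    by_emotion: Dict[str, List[float]] = defaultdict(list)
    for emotion, prob in trials:
        by_emotion[emotion].append(prob)
    return {emotion: mean(probs) for emotion, probs in by_emotion.items()}


def most_reliable(trials: List[Trial], top: int = 2) -> List[str]:
    """Emotions with the highest average recognition probability."""
    averages = mean_recognition(trials)
    return sorted(averages, key=averages.get, reverse=True)[:top]


# Placeholder values for illustration only.
trials = [
    ("happiness", 0.9), ("happiness", 0.8),
    ("anger", 0.6), ("anger", 0.4),
    ("surprise", 0.7), ("surprise", 0.9),
]
print(mean_recognition(trials))
print(most_reliable(trials))
```

Emotions that rank highest under such an average would be the natural candidates for use as a feedback signal.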

Key words: artificial intelligence, neural network, emotions.

Bibliographic reference: Kharyutkina S. A., Gavrilov A. V., Yakimenko A. A. Choosing operator emotions as feedback for training neural networks // journal “Problems of informatics”. 2023, № 1. P. 69-76. DOI: 10.24412/2073-0667-2023-1-69-76
