2022 №1(54)

CONTENTS

  1. Vasiliev E., Bolotov D., Bolotov M., Smirnov L. Neural network approach to solving the problem of self-action of wave fields in nonlinear media
  2. Isupov K., Knyazkov V., Korzhavina A. Implementation and Convergence Evaluation of Multiple-Precision Iterative CG and PCG Solvers for GPUs
  3. Pirova A. Yu. Hybrid MPI + OpenMP algorithm for symmetric sparse matrix reordering and its application to solving systems of linear equations
  4. Kholkin S., Filimonov A. Finding graph chromatic number through deep learning             
  5. Starostin N., Shtanyuk A., Godovitsyn M., Zhivchikova J. Parallel algorithm for implementing logical operations on sets of orthogonal polygons
  6. Kaledina E.A., Kaledin O.E., Kulyagina T. I. Applying machine learning for prediction of cardiovascular diseases on small data sets            
  7. Rodionov D.M., Karchkov D.A., Moskalenko V.A., Nikolsky A.V., Osipov G.V., Zolotykh N.Yu. Diagnosis of sinus rhythm and atrial fibrillation using artificial intelligence

E. Vasiliev, D. Bolotov, M. Bolotov, L. Smirnov

Lobachevsky State University, 603022, Nizhny Novgorod, Russia

NEURAL NETWORK APPROACH TO SOLVING THE PROBLEM OF SELF-ACTION OF WAVE FIELDS IN NONLINEAR MEDIA

UDC: 517.957

DOI:10.24412/2073-0667-2022-1-5-16

We consider the problem of propagation of optical pulses in media with Kerr nonlinearity. As a mathematical model describing the pulse propagation process, we chose a generalized parabolic equation, which in dimensionless variables has the form of a one-dimensional modified Nonlinear Schrödinger Equation. We trained a fully connected neural network with various optimizers and performed experiments on network configuration and hyperparameter optimization. The experiments demonstrate the promise of the quasi-Newton L-BFGS optimizer over first-order optimizers for this problem.

The article presents the results of experiments on training the model with various optimizers: SGD, RMSProp, Adam, and L-BFGS; the latter allows the network to be trained roughly an order of magnitude faster. We also consider the size of the training sample required to train the model. The results show that, owing to the uniformly random selection of points from the domain by Latin hypercube sampling, a training sample of 10-15 % of the dataset is sufficient; this corresponds to a step of about 0.12 in z and 0.022 in τ, compared to 0.039 in z and 0.008 in τ on the regular grid used by the numerical method.
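To make the training setup concrete, the following sketch shows how a physics-informed loss for the one-dimensional cubic NLSE, i u_z + 0.5 u_tt + |u|^2 u = 0, can be minimized with the L-BFGS optimizer in PyTorch. It is only an illustrative sketch under assumed settings (network size, number of collocation points, the plain cubic equation instead of the paper's modified one), not the authors' implementation.

    # Hypothetical sketch: physics-informed network for i u_z + 0.5 u_tt + |u|^2 u = 0,
    # trained with L-BFGS.  Initial/boundary losses are omitted for brevity.
    import torch

    net = torch.nn.Sequential(                      # fully connected net: (z, tau) -> (Re u, Im u)
        torch.nn.Linear(2, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 2),
    )
    zt = torch.rand(2000, 2, requires_grad=True)    # uniformly random collocation points

    def pde_residual(zt):
        out = net(zt)
        u_re, u_im = out[:, 0], out[:, 1]
        grads = lambda f: torch.autograd.grad(f.sum(), zt, create_graph=True)[0]
        g_re, g_im = grads(u_re), grads(u_im)
        u_re_tt = grads(g_re[:, 1])[:, 1]
        u_im_tt = grads(g_im[:, 1])[:, 1]
        amp2 = u_re**2 + u_im**2
        res_re = -g_im[:, 0] + 0.5 * u_re_tt + amp2 * u_re   # real part of the equation
        res_im = g_re[:, 0] + 0.5 * u_im_tt + amp2 * u_im    # imaginary part of the equation
        return (res_re**2 + res_im**2).mean()

    opt = torch.optim.LBFGS(net.parameters(), max_iter=500, line_search_fn="strong_wolfe")

    def closure():
        opt.zero_grad()
        loss = pde_residual(zt)                     # initial/boundary terms would be added here
        loss.backward()
        return loss

    opt.step(closure)                               # one L-BFGS call runs many inner iterations

Replacing the L-BFGS optimizer with torch.optim.SGD, RMSprop or Adam and an explicit epoch loop gives the first-order baselines compared in the paper.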

In low-dimensional problems, the use of machine learning is not always appropriate, since training takes much more time than solving the problem by direct numerical simulation. However, as the complexity of the system grows with the number of unknown variables, machine learning methods are expected to gain a substantial advantage, since evaluation of an already trained network is fast. Another open and interesting question for future research is the fast retraining of an already trained model for a problem with new parameters.

Key words: nonlinear Schrödinger equation, neural networks, deep learning, optimization functions

Bibliographic reference: Vasiliev E., Bolotov D., Bolotov M., Smirnov L. Neural network approach to solving the problem of self-action of wave fields in nonlinear media //journal “Problems of informatics”. 2022, № 1. P. 5-16. DOI: 10.24412/2073-0667-2022-1-5-16


K. Isupov, V. Knyazkov*, A. Korzhavina

Vyatka State University, Kirov 610000, Russia
* Penza State University, Penza 440026, Russia

IMPLEMENTATION AND CONVERGENCE EVALUATION OF MULTIPLE-PRECISION ITERATIVE CG AND PCG SOLVERS FOR GPUS

UDC: 004.222+004.272.25

DOI: 10.24412/2073-0667-2022-1-17-27

To solve systems of linear equations with sparse coefficient matrices, iterative Krylov subspace methods such as conjugate gradients (CG) and preconditioned conjugate gradients (PCG) are widely used. The convergence rate and numerical reliability of these methods may suffer from rounding errors. One way to reduce the impact of rounding errors is to use arithmetic over numbers with increased word length, which is called multiple-precision arithmetic. In the paper, we consider multiple-precision iterative CG and PCG solvers for graphics processing units (GPUs) and evaluate their convergence rate. The preconditioned implementation uses a diagonal preconditioner, which is well suited for parallel computing. The residue number system is employed to support multiple-precision floating-point arithmetic. The matrix-vector product is implemented as a hybrid kernel, in which a double-precision matrix, represented in the CSR storage format, is multiplied by a dense multiple-precision vector. A two-stage algorithm is used to compute the parallel multiple-precision dot product. Experimental results with sparse matrices of different sizes show that higher arithmetic precision improves the convergence rate of iterative methods.
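The core of the preconditioned iteration is independent of the precision backend. The sketch below shows a Jacobi-preconditioned CG in plain Python, with mpmath standing in for multiple-precision arithmetic and a SciPy CSR matrix as the sparse operator; the paper's actual solvers run this loop on a GPU with residue-number-system arithmetic, which is not reproduced here.

    # Illustrative Jacobi (diagonal) preconditioned CG with multiple-precision vectors.
    import numpy as np
    from scipy.sparse import csr_matrix
    from mpmath import mp, mpf

    mp.dps = 50                                      # ~50 decimal digits of precision

    def spmv(A, p):                                  # double-precision CSR matrix times
        y = [mpf(0)] * A.shape[0]                    # a multiple-precision vector
        for i in range(A.shape[0]):
            for k in range(A.indptr[i], A.indptr[i + 1]):
                y[i] += mpf(A.data[k]) * p[A.indices[k]]
        return y

    def pcg(A, b, tol=mpf("1e-30"), max_iter=1000):
        n = A.shape[0]
        d_inv = [mpf(1) / mpf(d) for d in A.diagonal()]          # diagonal preconditioner
        x, r = [mpf(0)] * n, [mpf(v) for v in b]
        z = [d_inv[i] * r[i] for i in range(n)]
        p, rz = list(z), sum(ri * zi for ri, zi in zip(r, z))
        for _ in range(max_iter):
            Ap = spmv(A, p)
            alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = [ri - alpha * api for ri, api in zip(r, Ap)]
            if sum(ri * ri for ri in r) < tol * tol:             # dot products: reductions on GPU
                break
            z = [di * ri for di, ri in zip(d_inv, r)]
            rz_new = sum(ri * zi for ri, zi in zip(r, z))
            beta, rz = rz_new / rz, rz_new
            p = [zi + beta * pi for zi, pi in zip(z, p)]
        return x

    A = csr_matrix(np.array([[4.0, 1.0], [1.0, 3.0]]))
    print(pcg(A, [1.0, 2.0]))                        # high-precision solution of a toy system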

Key words: sparse linear algebra, conjugate gradient method, multiple-precision arithmetic, residue number system, CUDA.

Bibliographic reference: Isupov K., Knyazkov V., Korzhavina A. Implementation and Convergence Evaluation of Multiple-Precision Iterative CG and PCG Solvers for GPUs //journal “Problems of informatics”. 2022, № 1. P. 17-27. DOI: 10.24412/2073-0667-2022-1-17-27


A.Yu. Pirova

Lobachevsky State University, 603950, Nizhny Novgorod, Russia

HYBRID MPI + OPENMP ALGORITHM FOR SYMMETRIC SPARSE MATRIX REORDERING AND ITS APPLICATION TO SOLVING SYSTEMS OF LINEAR EQUATIONS

UDC 519.688, 004.428

DOI: 10.24412/2073-0667-2022-1-28-41

Systems of linear algebraic equations (SLAEs) with large sparse matrices arise from numerical simulations in various scientific and engineering applications. Solving such systems is one of the most time- and memory-consuming stages of modeling and requires efficient implementation on modern supercomputers. Depending on the properties of the matrix, direct or iterative methods are applied. Direct methods are based on factorizing the matrix into two triangular matrices and then solving the resulting triangular systems. A feature of direct methods is that the number of non-zero elements can grow significantly during factorization compared to the initial matrix. Symmetric row and column reordering is therefore applied before numerical factorization to minimize the fill-in. A suitable permutation reduces both the memory needed to store the factor and the time of the most expensive stage, numerical factorization. It is thus important to develop new parallel reordering algorithms that reduce the total time for solving sparse SLAEs and, hence, the time of numerical simulation in applications.

The problem of finding an ordering that minimizes the factor fill-in is NP-hard. In practice, two heuristic approaches are commonly used: the minimum degree and nested dissection algorithms. The nested dissection algorithm is built on the divide-and-conquer principle. At each step, a graph separator is found; its vertices divide the graph into two disconnected subgraphs of similar size. The separator vertices are then numbered and removed from the graph, and the algorithm proceeds recursively on the new subgraphs. The process stops when all vertices are numbered. Many modifications of the nested dissection method exist, differing in the algorithm used to find the separator. Since 1993, modifications of the nested dissection algorithm employing the multilevel approach have been used in parallel computations.
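The recursion just described can be written down compactly. The sketch below (using networkx) numbers separator vertices last and recurses on the disconnected parts; the crude BFS-level separator is only a placeholder for the multilevel separator search used by real codes such as METIS, Scotch, PMORSy or DMORSy.

    # Schematic nested dissection ordering with a placeholder separator heuristic.
    import networkx as nx

    def bfs_level_separator(g):
        # placeholder: vertices of a middle BFS level separate nearer from farther vertices
        level = nx.single_source_shortest_path_length(g, next(iter(g.nodes)))
        mid = max(level.values()) // 2
        return {v for v, l in level.items() if l == mid}

    def nested_dissection(g, order=None):
        order = [] if order is None else order
        if g.number_of_nodes() <= 3:                 # small subgraph: number it directly
            order.extend(g.nodes)                    # (minimum degree would be used here)
            return order
        sep = bfs_level_separator(g)
        rest = g.subgraph(set(g.nodes) - sep)
        for comp in nx.connected_components(rest):   # recurse on the disconnected parts
            nested_dissection(g.subgraph(comp), order)
        order.extend(sep)                            # separator vertices are numbered last
        return order

    g = nx.grid_2d_graph(8, 8)                       # example: 8 x 8 mesh
    perm = nested_dissection(g)                      # elimination order for factorization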

Most sparse SLAE solvers have built-in implementations of reordering methods, as well as an interface for using third-party libraries and user-supplied permutations. The open-source libraries ParMETIS and PT-Scotch are widely used in the academic community; they implement a multilevel nested dissection algorithm for distributed-memory systems using MPI. In particular, the open academic direct solver MUMPS has an interface for these libraries. The ParMETIS library also has a shared-memory version called mt-metis. The commercial solver Intel MKL PARDISO and its cluster version, Intel Cluster PARDISO, are examples of tight integration of the solver and the ordering routine: they include optimized versions of the METIS library for shared-memory and distributed-memory systems, but the approaches used to optimize and combine shared- and distributed-memory computations during reordering have not been published.

Earlier, we presented the PMORSy and DMORSy reordering libraries, which implement the multilevel nested dissection method for shared- and distributed-memory systems, respectively. The ordering algorithm in PMORSy is based on parallel processing of a task queue, where each task is to find the separator of the current subgraph. After a separator is computed, the new subgraphs are placed in the shared task queue, from which they are assigned to free threads. The algorithm is implemented with OpenMP. The parallel algorithm in DMORSy is based on parallel processing of the binary tree constructed by the nested dissection method. Let the original graph be distributed among P processes. Its separator is computed in parallel by all P processes using a multilevel algorithm similar to that of PT-Scotch. After the separator is found, the new subgraphs are redistributed to P / 2 processes each, and the ordering continues independently on the new subgraphs. The algorithm is implemented with MPI.

In this paper, we propose a hybrid MPI + OpenMP parallel algorithm for distributed-memory systems that combines the use of processes and threads within a single computing node. The algorithm is constructed as follows. While the input graph is distributed among P > 1 processes, its separator is computed in parallel by all P processes using the multilevel algorithm from DMORSy. Once a subgraph resides on a single process, the ordering is performed with the parallel task queue of PMORSy. The combination of parallelization schemes retains the advantages of both: the scalability of the distributed-memory algorithm and the lower factor fill-in obtained by the shared-memory algorithm.

We show the competitiveness of the implementation against analogs in terms of ordering time and factor fill-in. We test our implementation on 37 symmetric matrices from the SuiteSparse Matrix Collection and the LOGOS-FEM Collection. Computational experiments show that, compared with the pure MPI parallelization in DMORSy, the hybrid scheme reduces the reordering time by 5 % on average when running on two computational nodes. Compared with ParMETIS, hybrid DMORSy is faster on most matrices: the average advantage is 2 times for matrices with over 1 million rows and 4.9 times for matrices with fewer than 1 million rows. Moreover, hybrid DMORSy produces orderings with, on average, 8 % smaller fill-in than ParMETIS for two thirds of the test matrices. We also tested the obtained permutations for solving SLAEs in a series of experiments with the open-source solver MUMPS. Using the hybrid DMORSy permutations reduced the solver run time on 26 of the 37 matrices compared with the ParMETIS permutations (by 26 % on average). For most of these matrices, the gain comes both from the reduced ordering time and from the reduced numerical factorization time.

Key words: nested dissection ordering, sparse matrix reordering, distributed-memory parallel algorithms, sparse matrix solve.

Bibliographic reference: Pirova A. Yu. Hybrid MPI + OpenMP algorithm for symmetric sparse matrix reordering and its application to solving systems of linear equations //journal “Problems of informatics”. 2022, № 1. P. 28-41. DOI: 10.24412/2073-0667-2022-1-28-41


S. Kholkin, A. Filimonov

National Research Lobachevsky State University of Nizhny Novgorod, 603950, Nizhny Novgorod, Russia

FINDING GRAPH CHROMATIC NUMBER THROUGH DEEP LEARNING

UDC 004.855.5

DOI: 10.24412/2073-0667-2022-1-42-54

In recent years deep learning has become the state-of-the-art family of algorithms in areas such as computer vision and natural language processing. The number of deep learning architectures keeps growing, and so does the field of processing graphs with them. A Graph Neural Network (GNN) is a deep learning architecture able to reflect graph structure and learn patterns that are effectively exploited in graph-related domains such as drug discovery or traffic forecasting. A GNN embeds vertices as vectors of vertex-related features that are updated by a function of the vertex's neighbors, which can be interpreted as message passing between vertices. The update function is permutation invariant, which gives a GNN the ability not to depend on vertex order or vertex count: a graph of any size can be fed into a given GNN. Many studies on applying deep learning techniques to combinatorial optimization are under way, and since many combinatorial problems are either originally stated in graph-theoretic terms or can be converted to such a form, GNNs are a natural choice for finding approximate solutions. Recent studies confirm this with strong results on the Travelling Salesman Problem and the Boolean Satisfiability Problem.

In this paper we train a GNN to find the chromatic number of a graph, i.e., to solve the decision version of the Graph Coloring Problem. The network is trained in a supervised manner as a binary classifier: given a graph and a number of colors, the model returns the probability that the graph can be colored with that many colors. Note that the GNN estimates the chromatic number but does not produce a vertex coloring itself, so it may predict a chromatic number lower than the true one. The architecture reflects the color-vertex relationship by assigning a vector embedding to each color and passing these embeddings into the embedding-update function, so colors are treated like vertices. The GNN is built in a recurrent manner with an adjustable number of timesteps that defines the depth of message passing between vertices. At every timestep the model updates vertex and color embeddings by aggregating information from neighbors and/or colors through a recurrent unit acting as the update function (a Long Short-Term Memory network with LayerNorm in our case). For additional expressivity, MLP layers are added before the recurrent unit at each timestep. After the embedding updates, the vertex embeddings are fed into a classifier MLP to obtain the probability that the graph can be colored with the given number of colors.

Training a GNN in a supervised manner requires labeled data. Natural graph coloring datasets are unsuitable for training and validation because of their heterogeneity and small size. The data was therefore generated following the phase-transition paradigm: for each graph instance in the dataset there is a paired instance that differs by only one edge but has a higher chromatic number, giving a hard-to-color pair of graphs. This generation scheme makes the optimization objective harder to reach and, hopefully, the GNN more knowledgeable. Two datasets were generated, CHROM_40_60 and CHROM_35_45, with 40 to 60 and 35 to 45 vertices per graph instance, respectively, and chromatic numbers from 3 to 8 in both. Instances of CHROM_35_45 are expected to be easier to color than those of CHROM_40_60. The GNN was trained on this data and compared with two heuristics: tabucol and greedy.

The percentage of correctly predicted chromatic numbers was taken as the comparison metric. Testing on CHROM_40_60 and CHROM_35_45 showed that the GNN has slightly better accuracy than tabucol: the GNN correctly predicts 69 % and 77 % of chromatic numbers, while tabucol correctly predicts 66 % and 76 %; greedy is far behind. To test the generalization ability, cross-dataset experiments were performed: the GNN trained on CHROM_40_60 was validated on CHROM_35_45 and vice versa. The model trained on the harder CHROM_40_60 instances shows decent accuracy on the easier CHROM_35_45 instances, while the model trained on the easier CHROM_35_45 instances performs weaker on the hard CHROM_40_60 instances but still retains some accuracy; hence the model is able to generalize to graphs with a different number of vertices. Testing was also performed on natural instances from the COLOR02/03/04 dataset, which differ greatly from the generated train/test data. The GNN trained on CHROM_40_60 shows accuracy similar to tabucol and much better than greedy. The GNN behaves unstably on instances whose chromatic number is higher than any seen in the training dataset. We conclude that the GNN generalizes to graphs from other distributions with similar chromatic numbers, but not to graphs with unseen chromatic numbers. Regarding computational cost, execution time measurements show that the GNN is approximately 100 times faster than tabucol and 30 times slower than greedy. Data generation is a bottleneck: the generation algorithm has exponential asymptotics, so producing data with more than 60 vertices or chromatic numbers above 8 takes very long, which limits training possibilities and performance for chromatic numbers above 8. As further work, the GNN model could be made more expressive, or training could be switched to semi-supervised or hybrid supervised/semi-supervised frameworks, which could lead not only to finding the chromatic number but also to finding the vertex coloring itself. In total, a GNN was trained in a supervised manner to find the chromatic number of a graph; it generalizes to graphs of various sizes and shows accuracy on the presented data similar to tabucol while being much faster.
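As an illustration of the recurrent message-passing scheme described above, the sketch below implements a single timestep in PyTorch: vertex embeddings are updated from aggregated neighbour and colour messages through an LSTM-style unit, and a final MLP votes on k-colourability. It is a simplified stand-in (only vertex embeddings are updated; dimensions and wiring are assumptions), not the paper's model.

    # Hypothetical one-timestep message-passing sketch for the k-colourability classifier.
    import torch

    d = 64
    msg_mlp    = torch.nn.Sequential(torch.nn.Linear(2 * d, d), torch.nn.ReLU())
    vertex_rnn = torch.nn.LSTMCell(d, d)
    classifier = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.ReLU(),
                                     torch.nn.Linear(d, 1))

    def timestep(adj, v_emb, c_emb, state):
        neigh_msg  = adj @ v_emb                          # sum over graph neighbours
        colour_msg = c_emb.mean(dim=0).expand_as(v_emb)   # every vertex sees every colour
        inp = msg_mlp(torch.cat([neigh_msg, colour_msg], dim=-1))
        h, c = vertex_rnn(inp, state)                     # recurrent update of vertex embeddings
        return h, (h, c)

    def colourable_logit(adj, n, k, steps=16):
        v_emb, c_emb = torch.randn(n, d), torch.randn(k, d)
        state = (v_emb, torch.zeros(n, d))
        for _ in range(steps):                            # depth of message passing
            v_emb, state = timestep(adj, v_emb, c_emb, state)
        return classifier(v_emb.mean(dim=0))              # logit for "graph is k-colourable"

    adj = torch.tensor([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])   # triangle K3
    print(torch.sigmoid(colourable_logit(adj, n=3, k=3)))            # untrained, so ~0.5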

Key words: neural network, deep learning, chromatic number, combinatorial optimization

Bibliographic reference: Kholkin S., Filimonov A. Finding graph chromatic number through deep learning //journal “Problems of informatics”. 2022, № 1. P. 42-54. DOI: 10.24412/2073-0667-2022-1-42-54


N. Starostin, A. Shtanyuk, M. Godovitsyn, J. Zhivchikova

Lobachevsky University, 603950 Nizhni Novgorod, Russia

PARALLEL ALGORITHM FOR IMPLEMENTING LOGICAL OPERATIONS ON SETS OF ORTHOGONAL POLYGONS

UDC 519.175.1, 621.3.049.771.14

DOI: 10.24412/2073-0667-2022-1-55-65

The main task of IC design is to obtain a workable chip layout that will be used to create templates for fabrication. Before the designed solutions are transferred to production, it is important to run a verification cycle on the obtained layout. The first stage of verification, design rule checking (DRC), consists of checking the layout against the design rules. These rules are described by a system of norms, restrictions, rules and procedures regulating the permissible mutual arrangement of topological elements and structures, taking into account design features and the capabilities of the technological process. Almost any verification procedure has two phases: the first is the search for topological elements and specific areas, and the second is the actual check for design rule compliance.

This requires logical operations on the orthogonal rectangles that make up the topology of an integrated circuit. Operations such as union, intersection and subtraction are performed on layers containing orthogonal rectangles (polygons), and they are subject to stringent execution time requirements. The traditional bitmap representation of rectangles does not allow a quasi-linear time dependence on the amount of processed data and calls for new algorithms and approaches to polygon representation.

We propose to describe a polygon by the coordinates of the polyline nodes (boundary nodes). This representation, called the contour representation, is the de facto standard for the layout at the verification stage. In the contour representation, any topological layer is described by enumerating all contours that form polygon boundaries. Each contour begins with an arbitrary starting node and ends with a final node that always coincides with the starting node. The representation is compact: the memory cost depends linearly on the number of nodes or edges of the layer contours. Its drawback is that logical operations require information not about the boundaries but about the interior regions of the polygons.

To resolve that issue, the "sweeping line" scheme is used. The idea is to exploit three properties of a polygon. First, any edge of a polygon boundary always separates the interior region of the polygon from the exterior. Second, any edge of an orthogonal polygon is either vertical or horizontal. Third, if the plane is dissected by straight lines according to the vertical and horizontal edges into rectangular fragments, then any such fragment will belong entirely to either the interior or the exterior region of the polygon. 

The paper describes procedures for reading polygon contour representations, which are split into sets of vertical and horizontal edges. As a result of the logical operations, the polygon edges of the resulting layer are formed and, in turn, converted back into contour representations.

Taking into account that implementations based on the classical interval tree can be used as the functions operating on half-intervals, the computational complexity of this procedure in both memory and time is estimated as O(N log N), where N is the number of horizontal edges of the layer polygons.
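The following simplified sketch illustrates the sweeping-line idea on the simplest derived quantity, the area of a union of orthogonal rectangles: a vertical line sweeps the x axis, and between consecutive events the covered length along y is constant. A real DRC kernel would instead maintain an interval tree over half-intervals and emit the edges of the resulting polygons.

    # Sweeping-line illustration: area of the union of axis-aligned rectangles.
    def union_area(rects):
        events = []                                   # (x, +1 open / -1 close, y1, y2)
        for x1, y1, x2, y2 in rects:
            events.append((x1, 1, y1, y2))
            events.append((x2, -1, y1, y2))
        events.sort()

        active, area, prev_x = [], 0.0, None
        for x, kind, y1, y2 in events:
            if prev_x is not None:
                covered, cur_end = 0.0, float("-inf") # merged length of active y-intervals
                for a, b in sorted(active):
                    if a > cur_end:
                        covered += b - a
                        cur_end = b
                    elif b > cur_end:
                        covered += b - cur_end
                        cur_end = b
                area += covered * (x - prev_x)        # coverage is constant between events
            if kind == 1:
                active.append((y1, y2))
            else:
                active.remove((y1, y2))
            prev_x = x
        return area

    print(union_area([(0, 0, 2, 2), (1, 1, 3, 3)]))   # 7.0: the overlap is counted once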

A parallel implementation based on geometric decomposition of the input data has been developed to work effectively in a high-performance computing environment. A theoretical study showed linear scalability of this implementation on distributed-memory systems, but the multithreaded implementation revealed contention for shared resources. The paper presents the results of a computational experiment and outlines ways to eliminate the contention effect.

The algorithms and procedures presented in this work were included in a plug-in library of functions for working with topological layers, which in turn is part of the design rule checking system being developed. The library is implemented in C++ using the C++17 standard; the parallel schemes for the logical operations are implemented with OpenMP.

The results of a computational experiment confirming the theoretically determined time dependences are presented.

We propose the structure of a software system for DRC built with the C++ and Lua programming languages.

Key words: CAD, DRC, logical operations, orthogonal rectangles, polygons, sweeping line modified algorithm, IC, parallel algorithm, OpenMP

Bibliographic reference: Starostin N., Shtanyuk A., Godovitsyn M., Zhivchikova J. Parallel algorithm for implementing logical operations on sets of orthogonal polygons //journal “Problems of informatics”. 2022, № 1. P. 55-65. DOI: 10.24412/2073-0667-2022-1-55-65


E. A. Kaledina, O. E. Kaledin, T. I. Kulyagina*

 Ogarev Mordovia State University, 430005, Saransk, Russia
*IE Wilhelm, 430005, Saransk, Russia

APPLYING MACHINE LEARNING FOR PREDICTION OF CARDIOVASCULAR DISEASES ON SMALL DATA SETS

UDC 004.89

DOI: 10.24412/2073-0667-2022-1-66-76

Owing to growing computing power and the accumulation of large amounts of data, artificial intelligence algorithms are now actively used for a wide range of medical tasks. One of the most important areas where artificial intelligence can be useful is the diagnosis of various diseases and the prediction of their possible outcomes.

Cardiovascular diseases are among the main causes of mortality and disability in most countries of the world, including the Russian Federation. The most important risk factor for the two major cardiovascular diseases (myocardial infarction and cerebral stroke) is arterial hypertension. Therefore, the main task of primary prevention of cardiovascular diseases (CVD) is the timely detection of a high risk of fatal CVD in patients with diagnosed uncomplicated arterial hypertension. The use of machine learning algorithms can address this task and significantly improve the accuracy of predicting cardiovascular diseases and their complications. Machine learning methods are the main tool of artificial intelligence; they make it possible to automate the processing and analysis of large amounts of data, identify hidden or non-obvious patterns, and extract new knowledge.

This article describes the use of machine learning algorithms to predict the risk of adverse cardiovascular events within the next 12, 24 and 36 months in patients with diagnosed arterial hypertension. The analysis included 16 predictors, combining standard cardiovascular risk indicators (age, male sex, smoking, elevated cholesterol, impaired uric acid metabolism) with some specific indicators. A distinctive feature of this work is the use of local data collected in a single region of the Russian Federation as the training set. This can improve the adaptability of the predictive model to possible local features of cardiovascular disease development, but it also has a significant drawback: the small amount of training data promotes overfitting and thus reduces the model's ability to generalize.

The target feature of the study is a binary vector of major adverse cardiovascular events at three reference time points. Because censoring, as well as some of the considered cardiovascular events, can occur simultaneously or recur during all or part of the observation period, the study is formally posed as a multilabel classification problem. The paper describes the stages of forming the data set and explores predictive machine learning algorithms on small samples to create a model for calculating cardiovascular risk. The advantages and disadvantages of individual multilabel machine learning methods (binary relevance, multi-output classifier, label powerset, MLkNN, classifier chain) for building predictive algorithms under the conditions of the problem are shown.
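As an illustration of two of the strategies named above, the sketch below compares a multi-output classifier and a classifier chain with scikit-learn on synthetic data with three labels, standing in for the adverse events at the 12-, 24- and 36-month reference points. The actual study uses local clinical data and also evaluates binary relevance, label powerset and MLkNN.

    # Multilabel strategies on synthetic data (illustrative only).
    from sklearn.datasets import make_multilabel_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

    X, y = make_multilabel_classification(n_samples=300, n_features=16,
                                          n_classes=3, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    for name, model in [
        ("multi-output classifier", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
        ("classifier chain", ClassifierChain(RandomForestClassifier(random_state=0))),
    ]:
        model.fit(X_tr, y_tr)                        # one label per reference time point
        print(name, f1_score(y_te, model.predict(X_te), average="micro"))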

The results of the study show that, among all the analyzed methods, the multi-output classifier and label powerset algorithms gave the best results on the small dataset for assessing the risk of developing cardiovascular diseases. This makes it relevant to study the application of these methods on larger samples with a wider set of risk factors.

Key words: machine learning algorithms, data analysis, cardiovascular disease prediction.

Bibliographic reference: Kaledina E.A., Kaledin O.E., Kulyagina T.I. Applying machine learning for prediction of cardiovascular diseases on small data sets //journal “Problems of informatics”. 2022, № 1. P. 66-76. DOI: 10.24412/2073-0667-2022-1-66-76


D. M. Rodionov*, D. A. Karchkov*, V. A. Moskalenko*, A. V. Nikolsky*,**, G. V. Osipov*, N. Yu. Zolotykh*

* Lobachevsky State University, 603950 Nizhni Novgorod, Russia
**City Clinical Hospital No. 5, Nizhny Novgorod

DIAGNOSIS OF SINUS RHYTHM AND ATRIAL FIBRILLATION USING ARTIFICIAL INTELLIGENCE

UDC: 004.891.3

DOI: 10.24412/2073-0667-2022-1-77-88

The electrocardiogram (ECG) is the most widely used biological signal recording in clinical medicine. The ECG signal is a graph of the electrical activity of the heart obtained from the surface of the body, most often non-invasively, using electrodes. In the early days of electrocardiography the doctor had to examine a graph written on paper and recognize possible pathologies by eye, which often led to diagnostic errors. Today there are many decision support systems based on complex algorithms that help the doctor search for the features that establish both the type of pathology and the localization of its markers in the signal. However, there are many diagnoses for which the developed algorithms are not effective, and even these algorithms occasionally make mistakes. Experts see a promising way to eliminate the existing shortcomings of expert systems in the application of artificial intelligence methods, which have shown their effectiveness in a variety of applied tasks.

This article considers the use of neural networks for solving diagnostic problems. UNet, adapted for processing a one-dimensional ECG signal, was chosen as the basic neural network architecture. Among the wide range of conditions of the human cardiovascular system, the main attention was focused on detecting long signal sections classified by specialists as complexes dominated by sinus rhythm or atrial fibrillation. After the necessary improvements, the neural network considered in this work will be integrated into the existing diagnostic complex "Cardio-Lighthouse" developed at Lobachevsky University.
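A minimal sketch of a 1-D U-Net of the kind described above is shown below: an encoder-decoder with skip connections over a one-dimensional multichannel ECG signal, producing a class score for every sample (for instance background, sinus rhythm, atrial fibrillation). Channel counts, depth and kernel sizes are assumptions for illustration; the network used in the paper differs in detail.

    # Minimal 1-D U-Net sketch for per-sample ECG segmentation (illustrative).
    import torch
    import torch.nn as nn

    def block(c_in, c_out):
        return nn.Sequential(nn.Conv1d(c_in, c_out, 9, padding=4), nn.ReLU(),
                             nn.Conv1d(c_out, c_out, 9, padding=4), nn.ReLU())

    class UNet1D(nn.Module):
        def __init__(self, n_leads=12, n_classes=3):
            super().__init__()
            self.enc1, self.enc2 = block(n_leads, 16), block(16, 32)
            self.pool = nn.MaxPool1d(2)
            self.bottom = block(32, 64)
            self.up2, self.dec2 = nn.ConvTranspose1d(64, 32, 2, stride=2), block(64, 32)
            self.up1, self.dec1 = nn.ConvTranspose1d(32, 16, 2, stride=2), block(32, 16)
            self.head = nn.Conv1d(16, n_classes, 1)

        def forward(self, x):                      # x: (batch, leads, samples)
            e1 = self.enc1(x)
            e2 = self.enc2(self.pool(e1))
            b = self.bottom(self.pool(e2))
            d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
            d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
            return self.head(d1)                   # per-sample class scores

    ecg = torch.randn(1, 12, 4096)                 # a 12-lead record, 4096 samples
    labels = UNet1D()(ecg).argmax(dim=1)           # predicted class for every sample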

Key words: analysis of the electrocardiogram signal, artificial intelligence in medicine, sinus rhythm, atrial fibrillation.

Bibliographic reference: Rodionov D.M., Karchkov D.A., Moskalenko V.A., Nikolsky A.V., Osipov G.V., Zolotykh N.Yu. Diagnosis of sinus rhythm and atrial fibrillation using artificial intelligence //journal “Problems of informatics”. 2022, № 1. P. 77-88. DOI: 10.24412/2073-0667-2022-1-77-88