2021 №1(50)

CONTENTS

  1. Brikkel D. M., Erofeev V. I. Influence of material damage on the parameters of a nonlinear flexible and longitudinal wave which spread in a beam
  2. Kuleshov A. S., Solomina D. V. Application of the Kovacic algorithm to the problem of rolling of a heavy homogeneous ball on a surface of revolution              
  3. Kupriyanova T. V., Kislitsyn D. I. Application of machine learning methods in building construction          
  4. Petrov S. S. Finite element model of sea ice dynamics and its parallel implementation using the INMOST library
  5. Isupov K, Knyazkov V., Babeshko L, Krutikov A. Implementation and Performance Evaluation of Multiple Precision Sparse Matrix-Vector Multiplication on CUDA Using the Residue Number System        
  6. Chaplygin A. V., Gusev A. V. Shallow water model using a hybrid MPI/OpenMP parallel programming    

D.M. Brikkel*, V.L Erofeev*’**

*National Research Lobachevsky State University of Nizhny Novgorod,603950, Nizhny Novgorod, Russia

**Mechanical Engineering Research Institute of RAS,603024, Nizhny Novgorod, Russia

INFLUENCE OF MATERIAL DAMAGE ON THE PARAMETERS OF A NONLINEAR FLEXIBLE AND LONGITUDINAL WAVE WHICH SPREAD IN A BEAM

DOI: 10.24411/2073-0667-2021-10001

As a rule, in the mechanics of solids, problems of dynamics are considered separately from problems of damage accumulation. When developing such methods, it is customary to postulate in advance that the elastic wave velocity is a given damage function, and then experimentally determine the proportionality coefficients. The phase wave velocity and wave attenuation are usually considered to be power functions of frequency and linear functions of damage. With its undoubted advantages (simplicity), this approach has a number of disadvantages, like any approach that is not based on mathematical models of processes and systems.

The authors of consider the problem to be self-consistent, including, in addition to the equation of damage development, the dynamic equation of the theory of elasticity. This approach made it possible to consider a number of applied problems of wave dynamics of damaged materials and structural elements.

Damage is usually understood as a reduction in the elastic response of the body due to a reduction in the effective area that transfers internal forces from one part of the body to another, its part, caused, in turn, by the appearance and development of a scattered field of micro defects (micro cracks — in elasticity, dislocations — in plasticity, micro pores — during creep, surface micro cracks — during fatigue).

Not directly measurable (such as speed, force or temperature), damage, i.e. degradation of the mechanical properties of the body can be detected by analyzing the body’s response to various external influences. According to experimental practice, the presence of a damage field in materials can be indirectly detected and partly quantitatively represented through a decrease in the ultrasonic signal propagation rate, a decrease in Young’s modulus („modulus defect11), and a decrease in density („loosening"), change in hardness, drop in electric potential, drop in stress amplitude during cyclic testing, acceleration of creep in the third stage.

In traditional calculations, the scalar damageability parameter  which characterizes the

relative density of micro defects uniformly dispersed per unit volume, is taken as a measure of damage during deformation development. This parameter is zero when there is no damage and is close to unity at the moment of failure.

An approach is proposed that allows one to obtain new dependences of the nonlinear flexural and longitudinal waves parameters on the material damage degree. In particular, the problem of the formation of intense bending waves of a stationary profile is considered within the framework of a geometrically nonlinear model of a damaged beam, as well as the problem of the formation of longitudinal waves, where geometric and physical nonlinearities are additionally taken into account.

During the study, when nonlinear flexural waves appear in an element, the presence of both periodic and solitary (localized in space) non-sinusoidal waves was determined. It is shown that the amplitudes of solitary and periodic waves increase with an increase in the damage parameter, and the length of the periodic and the width of a solitary wave decrease with an increase in the considered parameter. When nonlinear longitudinal waves appear in the element, it is proved that the width of the soliton decreases with decreasing damage parameter, in turn, the soliton velocity increases linearly with increasing damage parameter. New dependences of wave parameters (amplitude, width, wavelength) are determined taking into account their geometric nonlinearity on the parameters of material damage. This work is a continuation of a number of articles on a new approach that allows formulating a mathematical model describing the propagation of waves in elements, taking into account the accumulation of damage in their materials.

Key words: Bending wave, longitudinal wave, Material damage, material damage parameter, geometric nonlinearity, physical nonlinearity.

References

1. Kachanov L.M. Osnovy mekhaniki razrusheniya. M.: Nauka. 1974. 311 p.
2. Rabotnov Y. N. Polzuchest’ elementov konstrukcij. M.: Nauka. 1966. 752 p.
3. Stulov A., Erofeev V. Frequency-dependent attenuation and phase velocity dispersion of an acoustic wave propagating in the media with damages // Generalized Continua as Models for Classical and Advanced Materials. Series: Advances Structured Materials, Altenbach, Holm, Forest, Samuel (Eds.), Springer, Switzerland. 2016. V. 42. P. 413-423.
4. Moiseev N.N. Asimptoticheskie metody nelinejnoj mekhaniki. M.: Nauka. 1981. 400 p.
5. Porubov A. V. Lokalizaciya nelinejnyh voln deformacii. M.: Fizmatlit. 2009. 208 p.
 
Bibliographic reference:    Brikkel D. M., Erofeev V. I. Influence of material damage on the parameters of a nonlinear flexible and longitudinal wave which spread in a beam //journal “Problems of informatics”. 2021, № 1. P.6-14. DOI: 10.24411/2073-0667-2021-10001

A. S. Kuleshov, D.V. Solomina

Department of Mechanics and Mathematics, Lomonosov Moscow State University, 119234, Moscow, Russia

APPLICATION OF THE KOVACIC ALGORITHM TO THE PROBLEM OF ROLLING OF A HEAVY HOMOGENEOUS BALL ON A SURFACE OF REVOLUTION

DOI: 10.24411/2073-0667-2021-10002

Problems of description of the motion of bodies that roll on a fixed or moving rigid surface have a long history. They are closely related to the process of formation and development of a large branch of analytical mechanics, namely, dynamics of nonholonomic systems. In works of I. Newton, L. Euler, I. Bernoulli, J. D’Alembert and J. Lagrange, some problems on the rolling of rigid bodies without sliding were studied; these problems are typical in the dynamics of systems with nonholonomic constraints and hence they are considered as classical problems of nonholonomic mechanics.

One of such classical problems is the problem of rolling without sliding of a homogeneous ball on a fixed surface under the action of gravity. For the first time this problem has been studied by E. J. Routh in his classical treatise „The Advanced Part of a Treatise on the Dynamics of a System of Rigid Bodies11. Routh derived the equations of motion of the ball in semifixed axes and indicated the cases of their integrability. In particular, Routh firstly proved that a homogeneous ball, moving within a vertical cylinder under the action of gravity, does not tending downwards on the average. This physical fact may be observed while playing basketball, when the ball has almost hit the basket, but then rapidly jumps out of it, suddenly lifting upwards. Actually, the majority of the subsequent authors were just stating the results, obtained by Routh, without any essential additions. In particular, this problem has been considered in details by Ju. I. Neimark and N. A. Fufaev in his classical monograph „Dynamics of Nonholonomic Systems11.

Usually, when considering this problem, following the E. J. Routh approach it is convenient to define explicitly the surface, on which the ball’s centre of gravity is moving and not the supporting surface along which the ball rolls. The surface, on which the ball’s centre of gravity is moving, is equidistant to the surface, over which the ball rolls. From the classical works of E. J. Routh and F. Noether it was known that if the ball rolls on a surface such that its centre of gravity moves along a surface of revolution, then the problem is reduced to solving the second order linear differential equation with respect to the projection of velocity of the ball’s centre of gravity onto the tangent to the parallel of the corresponding surface of revolution. However it is impossible to find the general solution of the corresponding second order linear differential equation for the arbitrary surface of revolution in explicit form. Therefore it is interesting to study for which surface of revolution the general solution of the corresponding second order linear differential equation can be expressed explicitly, in particular, in terms of liouvillian functions. Recall that liouvillian functions are functions constructed from rational functions by algebraic operations, taking exponentials, and integration. To solve this problem it is possible to apply the so-called Kovacic algorithm to the corresponding second order linear differential equation. In 1986, the American mathematician J. Kovacic presented a very effective algorithm for finding a general solution of a second order linear differential equation with variable coefficients for the case where this solution can be expressed in terms of liouvillian functions. If a linear differential equation has no liouvillian solutions, the Kovacic algorithm also allows one to ascertain this fact. The necessary condition for application of the Kovacic algorithm to a second order linear differential equation is that the coefficients of this equation should be rational functions of independent variable. There are no new ideas in the algorithm. It is simply brute-force calculations. The algorithm does require that the partial fraction expansion of the coefficients of the differential equation be known, thus one needs to factor a polynomial in one independent variable over the complex numbers into linear factors. Once the partial fraction expansions are known, only linear algebra is required.

In this paper we apply the Kovacic algorithm to the problem of rolling of a heavy homogeneous ball on a fixed surface such that the centre of gravity of the ball moves along a given surface of revolution. We present our own method to derive the corresponding second order linear differential equation, the integration of which leads to the solution of the problem. In the case when the centre of gravity of the ball moves along the paraboloid of revolution we are presenting the corresponding linear differential equation in explicit form and reduce its coefficients to a form of rational functions. Using the Kovacic algorithm we prove that the general solution of the corresponding second order linear differential equation is expressed in terms of liouvillian functions for all values of parameters of the problem.

Key words: rolling without sliding; homogeneous ball; surface of revolution; Kovacic algorithm; Liouvillian solutions.

References

1. Routh E. J. The Advanced Part of a Treatise on the Dynamics of a System of Rigid Bodies: Being Part II of a Treatise on the Whole Subject. Cambridge: Cambridge University Press, 2013.
2. Noether F. Uber rollende Bewegung einer Kugel auf Rotationsflachen. Leipzig: Teubner, 1909.
3.  Kovacic J. An algorithm for solving second order linear homogeneous differential equations // J. Symb. Comput. 1986. Vol. 2. Issue 1. P. 3-43.
4.  Kaplansky I. An Introduction to Differential Algebra. Paris: Hermann, 1957.
5.  Do Carmo M. P. Differential geometry of curves and surfaces. New Jersey: Prentice-Hall, Englewood Cliffs, 1976.

Bibliographic reference: Kuleshov A. S., Solomina D. V. Application of the Kovacic algorithm to the problem of rolling of a heavy homogeneous ball on a surface of revolution //journal “Problems of informatics”. 2021, № 1. P.15-24. DOI: 10.24411/2073-0667-2021-10002

article


T.V. Kupriyanova, D.I. Kislitsyn

Nizhny Novgorod State University of Architecture and Civil Engineering, 603950, Nizhny Novgorod, Russia

APPLICATION OF MACHINE LEARNING METHODS IN BUILDING CONSTRUCTION

DOI: 10.24411/2073-0667-2021-10003

Construction is one of the largest sectors of the economy, which meets the needs of all rapidly developing countries. However, this industry is the most traumatic. Throughout the construction process, incidents of varying severity occur on construction sites every day. Along with this, the demand for innovative projects in the construction sector is increasing every year, so the question arose about the use of machine learning methods in production on the construction site. After all, it is thanks to artificial intelligence and machine learning that you can minimize project risks, secure and optimize construction processes. The purpose of the research work is to offer an option for the development of an automated module for construction production using machine learning methods. It provides functions for predicting and ranking incidents on the construction site, as well as automatically assigning a person responsible for closing the incident. To achieve this goal, certain tasks were solved. Namely: familiarization with existing methods of solving the problem, identification of possible options for collecting information about the situation at the construction site, analysis of available data for the distribution of incidents by their characteristics into appropriate categories, research of predictive machine learning algorithms and their ready-made implementations, and as a result, an automated module was presented at the initial stage of development and its effectiveness was evaluated. In the course of this work, a possible program interface with the necessary sections was demonstrated: the authorization (registration) page, the main page, the personal account, the sections of construction projects and providing data on incidents, as well as the incident archive and technical support. When the program is running, it communicates with the database, which will store all the data obtained for the entire period of construction production. In addition, the analysis of the received information from various sources of the case monitoring system in the construction industry will be carried out, then the incidents will be divided by signs into the appropriate categories and their color marking according to the degree of importance. In this case, each incident is recorded in a database that displays a set of single events that carry certain attributes. After analyzing an array of historical data, machine learning creates an algorithm for ranking incidents. The description of the incident is characterized by its type, time and place of occurrence, the device on which everything was observed, and some other information. All properties, with the exception of time, are divided into categories, and color- coded to facilitate recognition, depending on the degree of importance. For example, the 4th level is red — the most important, the Oth level is green-the problem is solved. The program also assumes the function of automatically creating requests for the elimination of incidents that have occurred and assigning responsible persons for closing emerging problem cases. Initially, machine learning receives data about employees who can eliminate the incident, and then, based on the standard machine learning algorithm — an ensemble of decision trees — the most suitable person is automatically selected. As a result of applying machine learning methods, the person responsible for solving the incident will be installed, to which all attributes obtained from monitoring systems are transmitted. And after the problem is resolved, the responsible person changes the incident status to „Closed11. After that, the data becomes historical, and in the future can be used by machine learning when a similar situation occurs to guide its solution. When implementing the proposed automated module for construction production using machine learning methods, positive results are expected, including minimizing the number of missed incidents and reducing the response time to emerging problem situations.

Key words: machine learning algorithms, data mining, ranking of incidents on the construction site.

References

1. The rating of accidents at the construction site / [Electron Res.]: // Proraboff.RF website — https://xn--80aclbcbgb9aa.xn--plai/rejting_neschastnyh_sluchaev_na_strojke/. Date of request 26.09.2020.
2.  Hebert, Jeff. Predicting rare failure events using classification trees on large scale manufacturing data with complex interactions // 2016 IEEE International Conference on Big Data, BigData 2016, Washington DC, USA, December 5-8, 2016. P. 2024-2028.
3.  Hastie, Trevor, Tibshirani, Robert, & Friedman, Jerome. The Elements of Statistical Learning. Springer Series in Statistics. New York, NY, USA: Springer New York Inc. 2001.
4.  Smart construction technology: as machine learning to predict and prevent accidents on the construction site / J. Kanner. Your window to the world CADS: [Electron Res.]: http://isicad.ru/ ru/articles.php?article_num=20712. Date of request 27.09.2020.
5.  Application of machine learning to the ranking of incidents on the Moscow Railway / P. Y. Boyko, E. M. Bykov, E. I. Sokolov, D. A. Yarotsky. // Information technologies and computer systems. 2017. N 2. S. 43-53.      '              

Bibliographic reference:  Kupriyanova T. V., Kislitsyn D. I. Application of machine learning methods in building construction //journal “Problems of informatics”. 2021, № 1. P.15-24. DOI: 10.24411/2073-0667-2021-10003 

article   


S. Petrov

Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences (INM RAS), 119333, Moscow, Russia
Moscow Center for Fundamental and Applied Mathematics at the INM RAS, Moscow, Russia

FINITE ELEMENT MODEL OF SEA ICE DYNAMICS AND ITS PARALLEL IMPLEMENTATION USING THE INMOST LIBRARY

DOI: 10.24411/2073-0667-2021-10004

This work is devoted to the numerical implementation of the dynamic core of finite-element sea ice drift model with nonlinear viscous-plastic rheology. Sea ice models of this level are an integral part of modern ocean dynamics models, which are used both for short-term and seasonal weather forecasts and for predicting changes in the Earth’s climate. A technology for generating an irregular triangulation for the Arctic Ocean and adjacent seas with a thickening of the grid in areas with a potentially high ice concentration, near the coast, and in narrow straits, taking into account data on the contours of the continents and data of ice cover, is described. A description of the algorithm for interpolating geodata into model grid nodes is given. In the second part of the work, a finite element implementation of the model with various time integration schemes for the momentum balance equation is presented. To integrate the equations of mass and ice concentration transfer, the Taylor-Galerkin scheme with flux correction was implemented. An optimization of the time integration scheme for the momentum balance equation is proposed, which allows to accelerate the standard stationary mEVP method used in modern ice drift models (CICE, LIM, FESIM). The idea of the proposed nonstationary mEVP- opt method is to approximate the iteration parameter to the locally optimal one obtained from the estimation of the integration step in the approximation of the linearized transition operator. The results of a model numerical experiment to reproduce the most computationally complex mode of slow ridging are presented. The result is compared to the Picard method one with 10 pseudo-iterations, which gives high accuracy at high computational costs. It is shown that the new mEVP-opt method provides a significant reduction in the computation time compared to mEVP with a slight increase in the number of operations.

Key words: modeling of sea ice dynamics, viscous-plastic rheology, INMOST, Ani-2D, Ani-3D.

References

1.  Hibler W. D. A Dynamic Thermodynamic Sea Ice Model // Journal of Physical Oceanography. 1979. V. 9. N 4. P. 815-846.              '               '               '
2.  Hunke E. C., Dukowicz J. K. An Elastic-Viscous-Plastic Model for Sea Ice Dynamics // Journal of Physical Ocaenography. 1997. V. 27. N 9. P. 1849-1867.
3.  Bouillon S., Fichefet T., Legat V., Madec G. The elastic-viscous-plastic method revisited // Ocean Modelling. 2013. V. 71. P. 2-12.
4.  Kimmritz M., Danilov S., Losch M. The adaptive EVP method for solving the sea ice momentum equation // Ocean Modelling. 2016., V. 101.
5.  Wessel Р., Smith W. Н. F. A Global Self-consistent, Hierarchical, High-resolution Shoreline Database // Journal of Geophysical Research. 2000. V. 101, P. 8741-8743.
6.  Wessel P., Luis J.F., Uieda L., Scharroo R., Wobbe F., Smith W. H. F., Tian D. The Generic Mapping Tools version 6 // Geochemistry, Geophysics, Geosystems. 2019. V. 20. P. 5556-5564.
7.  Danilov A. Unstructured tetrahedral mesh generation technology // Comp. Math, and Math. Phys. 2010. V. 50. N 1. P. 139-156.
8.  Meier W. N., Fetterer F., Windnagel A. K. 2017. Near-Real-Time NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration. V. 1.
9.  Danilov A., Terekhov K., Konshin I., Vassilevski Y. INMOST parallel platform: Framework for numerical modeling // Supercomputing Frontiers and Innovations. 2015. V. 2. P. 55-66.
10.  Karypis G., Kumar V. A Fast and Highly Quality Multilevel Scheme for Partitioning Irregular Graphs // SIAM Journal on Scientific Computing. 1999. V. 20. N 1. P. 359-392.
 11. [Electron. Res.]: https://www.mcs.anl.gov/petsc/.
12.  Sakov P., Counillon F., Bertino L., Lisжter К. A., Oke P. R., Korablev A. TOPAZ4: an ocean-sea ice data assimilation system for the North Atlantic and Arctic // Ocean Sci. 2012. V. 8. P. 633-656.
13.  [Electron. Res.]: https://www.copernicus.eu/en.
15.  Hutchings J. K., Jasak H., Laxon S. W. A strength implicit correction scheme for the viscous-plastic sea ice model // Ocean Modell. 2004. V. 7. P. 111-133.
16.  Kuzmin D., Turek S. Flux correction tools for finite elements // J. Comput. Phys. 2001. P. 525-558.             '
17.  Olshanskij M. A. Lekciiiuprazhneniya po mnogosetochnymmetodam // Fizmatlit. 2005. S. 24¬25.          '               '
18.  Danilov S., Wang Q., Timmermann R., Iakovlev N., Sidorenko D., Kimmritz M., Jung T., Schrpter J. Finite-Element Sea Ice Model (FESIM), version 2 // Geosci. Model Dev. 2015. V. 8. P. 1747-1761.

Bibliographic reference: Petrov S. S. Finite element model of sea ice dynamics and its parallel implementation using the INMOST library //journal “Problems of informatics”. 2021, № 1. P.65-82. DOI: 10.24411/2073-0667-2021-10004

article


K. Isupov, V. Knyazkov*, I. Babeshko, A. Krutikov

Vyatka State University, 610000, Kirov, Russia
* Penza State University, 440026, Penza, Russia

IMPLEMENTATION AND PERFORMANCE EVALUATION OF MULTIPLE PRECISION SPARSE MATRIX-VECTOR MULTIPLICATION ON CUDA USING THE RESIDUE NUMBER SYSTEM

DOI: 10.24411/2073-0667-2021-10005

The solution of large systems of linear equations with sparse matrices is usually done using iterative methods that involve repeated computation of the sparse matrix-vector product (SpMV). The SpMV plays an important role in many scientific and engineering applications, and a lot of effort has been made to implement efficient SpMV kernels for highly parallel computing architectures such as graphics processing units (GPUs). However, round-off errors that occur when using finite precision floating-point arithmetic can cause poor convergence of iterative solvers. In this article, we present SpMV implementations for GPUs using multiple-precision floating-point arithmetic that are part of MPRES- BLAS, an open-source CUDA library of multiple-precision linear algebra operations (available at https://github.com/kisupov/mpres-blas).

We first give a brief description of the multiple-precision floating-point number representation supported in MPRES-BLAS. Unlike traditional approaches to expressing high-precision numbers, our representation is based on the residue number system (RNS). The RNS is an unweighted number system that is an alternative to the positional number representation. In the RNS, a number is represented by its residues modulo several pairwise relatively prime integers called the moduli. The RNS is interesting, in particular, because it provides efficient addition, subtraction and multiplication of large integers, the residues of a number are mutually independent, and the calculations with each residue are performed in the ring of integers modulo the corresponding modulus, so that carry chains between the residues are eliminated.

The SpMV performance depends on the format used to store the sparse matrix. In this article, we use the multiple-precision variants of CSR (Compressed Sparse Row) and ELLPACK sparse storage formats. The CSR format is widely used in iterative solvers, since it requires storing only nonzero matrix elements. In turn, the ELLPACK format is efficient when implemented on a GPU, provided the number of nonzero elements in each row of the matrix does not vary significantly.

Our multiple-precision SpMV algorithm consists of two steps. At the first step, the matrix and vector are multiplied element-by-element, and the computed array of component-wise products is stored in a global memory buffer. This step is highly parallelizable thanks to the use of RNS, namely, both all elements of the intermediate array and all digits of the significand (mantissa) for each multiple-precision element are computed in parallel. Each multiple-precision multiplication is performed by n GPU threads, where n is the size of the RNS moduli set. At the second step, the summation kernel is

This research was funded by the Russian Science Foundation grant number 20-71-00046.

launched, which perform a per-row reduction of the intermediate array to produce the output vector. In this step, each multiple-precision addition is performed by one thread.

In the experimental part of the paper, we evaluated the performance of our multiple-precision SpMV routines on the marinel, largebasis, ESOC, Goodwin_127 and nd6k matrices from SuiteSparse Matrix Collection (formerly known as the University of Florida Sparse Matrix Collection).

We compare the proposed SpMV with implementations based on existing multiple-precision general-purpose arithmetic libraries for CUDA-enabled GPUs, namely, CUMP and CAMPARY. We carry out the experiments on an NVIDIA RTX 2080 graphics card (46 streaming multiprocessors, 8 GB of GDDR6 memory, compute capability version 7.5). The results indicate that, with the same precision, the proposed SpMV routines in many cases provide significantly better performance than the CUMP and CAMPARY library implementations. One exception is the 106-bit precision case, where the CAMPARY library is much faster.

Comparison of the CSR and ELLPACK formats shows that ELLPACK outperforms CSR at a moderate level of precision (106-212 bits). With higher arithmetic precision (424-868 bits), the ELLPACK advantages become less apparent. Moreover, the memory requirements of ELLPACK can significantly exceed the requirements of CSR, especially in the context of higher precision. We note that matrices from real-world applications usually consist of single or double precision entries, and there is no need to convert them to higher precision. Considering this aspect, we plan to implement multiplication of a sparse double-precision matrix by a high-precision vector and use it to improve the convergence of iterative solvers.

Key words: Sparsematrices, SpMV, multiple-precisionarithmetic, residue number system, GPU programming.

References

1. Filippone S., Cardellini V., Barbieri D., Fanfarillo A. Sparse Matrix-Vector Multiplication on GPGPUs // ACM Transactions on Mathematical Software. 2017. Vol. 43, N 4. Article No. 30. DOL 10.1145/3017994.
2. Bell N., Garland M. Efficient Sparse Matrix-Vector Multiplication on CUDA. Technical Report NVR-2008-004. NVIDIA Corporation. 2008. [Electron. Res.]: https://www.nvidia.com/docs/IO/66889/nvr-2008-004.pdf (date of access: 10.01.2021).
3. Greathouse J. L., Daga M. Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format // Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC14, November 16-21, 2014, New Orleans, LA, USA. IEEE, 2014. P. 769-780. DOI: 10.1109/SC.2014.68.
4. Liu Y., Schmidt B. LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA- enabled GPUs // Proceedings of the 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), July 27-29, 2015, Toronto, ON, Canada. IEEE, 2015. P. 82-89. DOI: 10.1109/ASAP.2015.7245713.                '
5.  Sedaghati N., Mu T., Pouchet L.-N., Parthasarathy S., Sadayappan P. Automatic selection of sparse matrix representation on GPUs // Proceedings of the 29th ACM on International Conference on Supercomputing, ICS’15, June 8-11, 2015 at Newport Beach, CA, USA. ACM, 2015. P. 99-108. DOI: 10.1145/2751205.2751244.
6.  Cools S., Yetkin E. F., Agullo E., Giraud L., Vamoose W. Analyzing the Effect of Local Rounding Error Propagation on the Maximal Attainable Accuracy of the Pipelined Conjugate Gradient Method // SIAM Journal on Matrix Analysis and Applications. 2018. Vol. 39, N 1. P. 426-450. DOI: 10.1137/17M1117872.
7.  Saito Т., Ishiwata Е., Hasegawa Н. Analysis of the GCR method with mixed precision arithmetic using QuPAT // Journal of Computational Science. 2012. Vol. 3, N 3. P. 87-91. DOI: 10.1016/j.jocs.2011.05.001.
8.  Shewchuk J. R. Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates // Discrete and Computational Geometry. 1997. Vol. 18, N 3. P. 305-363. DOI: 10.1007/PL00009321.
9.   Masui K., Ogino M. Research on the Convergence of Iterative Method Using Mixed Precision Calculation Solving Complex Symmetric Linear Equation // IEEE Transactions on Magnetics. 2020. Vol. 56, N 1. Article no. 7503604. DOI: 10.1109/TMAG.2019.2951280.
10. Mukunoki D., Takahashi D. Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs // Lecture Notes in Computer Science. 2014. Vol. 8384. P. 632-642. DOI: 10.1007/978-3-642-55224-3_59.
11. Hishinuma T., Hasegawa H., Tanaka T. SIMD Parallel Sparse Matrix-Vector and Transposed- Matrix-Vector Multiplication in DD Precision // Lecture Notes in Computer Science. 2017. Vol. 10150. P. 21-34. DOI: 10.1007/978-3-319-61982-8_4.
12. Kouya T. A highly efficient implementation of multiple precision sparse matrix-vector multiplication and its application to product-type Krylov subspace methods // International Journal of Numerical Methods and Applications. 2012. Vol. 7, N 2. P. 107-119.
13. Hirokawa Y., Itoh T., Tadano H., Ikuno S. Speedup and numerical evaluation of multiple-precision Krylov subspace method using GPU cluster for large-sparse linear system // Proceedings of the international conference for high performance computing, networking, storage and analysis, SC13, November 17-22, Denver, Colorado, USA. Colorado Convention Center, 2013.
14. Mohan Ananda P. A. Residue Number Systems. Theory and Applications. Cham: Birkhauser. 2016.
15. Omondi A., Premkumar B. Residue Number Systems: Theory and Implementation. Imperial College Press, 2007.
16.  Isupov K. Using Floating-Point Intervals for Non-Modular Computations in Residue Number System // IEEE Access. 2020. Vol. 8. P. 58603-58619. DOI: 10.1109/ACCESS.2020.2982365.
17.  Isupov K., Knyazkov V., Kuvaev A. Design and Implementation of Multiple-Precision BLAS Level 1 Functions for Graphics Processing Units // Journal of Parallel and Distributed Computing. 2020. Vol. 140. P. 25-36. DOI: 10.1016/j.jpdc.2020.02.006.
18.  Joldes M., Muller J. M., Popescu V. Implementation and Performance Evaluation of an Extended Precision Floating-Point Arithmetic Library for High-Accuracy Semidefinite Programming // Proceedings of the 24th IEEE Symposium on Computer Arithmetic, ARITH, July 24-26, 2017, London, UK. IEEE, 2017. P. 27-34. DOI: 10.1109/ARITH.2017.18.
19.  Nakayama T., Takahashi D. Implementation of Multiple-Precision Floating-Point Arithmetic Library for GPU Computing // Proceedings of the 23rd IASTED International Conference on Parallel and Distributed Computing and Systems, PDCS 2011, December 14-16, 2011, Dallas, USA. ACTA Press, 2011. P. 343-349.

Bibliographic reference: Isupov K, Knyazkov V., Babeshko L, Krutikov A. Implementation and Performance Evaluation of Multiple Precision Sparse Matrix-Vector Multiplication on CUDA Using the Residue Number System //journal “Problems of informatics”. 2021, № 1. P.49-64 DOI: 10.24411/2073-0667-2021-10005

article


A.V. Chaplygin, A.V. Gusev

Marchuk Institute of Numerical Mathematics, Russian Academy of Sciences, 119333, Moscow, Russia

SHALLOW WATER MODEL USING A HYBRID MPI/OPENMP PARALLEL PROGRAMMING

DOI: 10.24411/2073-0667-2021-10006

Hybrid models that combine MPI message passing technologies for distributed memory architectures and OpenMP multithreading for shared memory architectures are becoming increasingly popular. Modern high-performance computing systems are multiprocessor systems with shared memory (computing nodes), united in a single communication network. Creating models that effectively use the resources of such computing systems is an urgent task today. The portability of models to multiprocessor systems is largely determined by the correct choice of its software architecture, which would also provide flexibility and ease of maintaining the program code.

The paper considers the system of shallow water equations in the form presented in the ocean general circulation sigma model INMOM (Institute of Numerical Mathematics Ocean Model). The INMOM model is being developed at the INM RAS and is used as a ocean component of the INMCM (Institute of Numerical Mathematics Climate Model) climate model, created at the INM RAS and participating in the IPCC (Intergovernmental Panel on Climate Change) program for climate change forecasting. The model is completely written in the Fortran 90/95 programming language. In previous works, a shallow water model has been formulated that can be used both as a program block of the sigma ocean model INMOM and independently, for example, to calculate tsunami waves, tides, and wind surges. The model was implemented using MPI technology, and the software architecture of the model did not involve the use of hybrid parallel programming.

The aim of this work is to create a new software architecture of the shallow water model, which involves the use of hybrid parallel programming, and to implement a hybrid MPI OpenMP shallow water model based on this software architecture for effective use on massively parallel multiprocessor computing systems.

The model is based on a system of nonlinear shallow water equations that are solved using numerical methods. The second order numerical schemes on the ‘C’-grid according to Arakawa classification and the explicit first order „leapfrog  scheme are used as spatial and temporal discretization schemes in the model, respectively.

A new software architecture for the shallow water model was developed, based on the separation of concerns. This software architecture divides program code into three levels. The lowest level contains all subroutines needed to calculate the nonlinear shallow water equations, the so-called computational cores. The highest level is responsible for the order of calling the computational cores and describes the time cycle of the model at a relatively high level. The intermediate level between the first two is responsible for the parallel methods and approaches used in the model. Such software architectures allow separate parallel methods and approaches from the model in order to adapt them for different types of computing systems and flexibly configure the model for the target computing system, without changing the parts of the program responsible for calculating shallow water equations.

As the main method of parallelization, the model uses an improved two-dimensional method of domain decomposition: the initial computational domain is uniformly divided into rectangular blocks of small size and each processor is assigned a certain number of blocks, which form its computational subdomain. This approach has a number of advantages: it becomes possible to balance the load of calculations on processors and work effectively with cache memory. Due to the presence of islands and coasts in the computational area, the load balancing is a particularly urgent task. The method of load balancing using Hilbert curves is chosen as one of such methods.

In the model, a task-based hybrid MPI/OpenMP approach was implemented, in which blocks from a two-dimensional decomposition are distributed across OpenMP threads. Blocks are first distributed across MPI processes using the load balancing method, and then within the MPI process are distributed across its available threads, ensuring a uniform computational load per thread. The paper shows the advantage of this approach in comparison with the widespread vector-based MPI/OpenMP hybrid approach, in which OpenMP is used only for parallelizing two-dimensional loops over subdomains.

The paper shows the efficiency of decomposition into small blocks and the efficiency of working with cache memory when calculating the shallow water model on a single core. It was shown that the implemented task-based hybrid approach has a performance twice as high as the vector-based hybrid approach when calculating the model on many computing nodes. The advantage of the task¬based hybrid approach over the pure MPI approach was shown. The efficiency of the method of load balancing was also demonstrated in the work.

Note that the implemented software architecture in the shallow water model also makes it possible in the future to apply heterogeneous parallel programming using CUDA, OpenACC, etc. All modifications during the transition to the new model of parallel programming will also be carried out without modifications of the program parts responsible for calculating the shallow water equations, which greatly simplifies the model development and support.

Key words: parallel computing, hybrid parallel programming, shallow water equations, software architecture.

References

  1. Lawrence, B. N. and Rezny, M. and Budich, R. and Bauer, P. and Behrens, J. and Carter, M. And Deconinck, W. and Ford, R. and Maynard, C. and Mullerworth, S. and Osuna, C. and Porter, A. and Serradell, K. and Valcke, S. and Wedi, N. and Wilson, S. Crossing the chasm: how to develop weather and climate models for next generation computers? // Geosci. Model Dev., 2018. N 11, P. 1799-1821.
  2. Volodin E. M., Dianskij N. A., Gusev A. V. Model’ zemnoj sistemy INMCM4: vosproizvedenie i prognoz klimaticheskih izmenenij v 19-21 vekah. // Izvestiya RAN. Fizika atmosfery i okeana, 2013, T. 49, N 4, S. 379-400 (in Russian).
  3. Chaplygin A. V. Parallel’naya realizaciya obshchej modeli cirkulyacii okeana INMOM. // Sbornik tezisov luchshih vypusknyh kvalifikacionnyh rabot fakul’teta VMК MGU 2017. Moskva: MAKS PRESS, 2017. S. 27-28 (in Russian).             '
  4. Chaplygin A. V., Dianskij N. A., Gusev A. V. Parallel’noe modelirovanie nelinejnyh uravnenij melkoj vody. // Trudy 60-j Vserossijskoj nauchnoj konferencii MFTI, 2017 (in Russian).
  5. A.V. Chaplygin, N.A. Diansky, A.V. Gusev. Load Balancing Using Hilbert Space-Filling Curves for Parallel Shallow Water Simulations. // Numerical methods and programming. 2019. N 20. P. 75-87.
  6.  Arakawa A., Lamb V. R. Computational design of the basic dynamical processes of the UCLA general circulation model. // Methods in computational Physics. V. 17.
  7.  Mesinger F., Arakawa A. Numerical methods used in atmospheric models. // JOC, GAPR Publication Series. 1976. V. 1, N 17.
  8.  Rouch Р. Dzh. Vychislitel’naya gidrodinamika [Computational Fluid Dynamics]. Moscow: Mir, 1980 (in Russian).
  9. George L. Mellor. User guide for a three-dimensional, primitive equation, numerical ocean model. Princeton University.
  10.  Dianskij N. A. Modelirovanie cirkulyacii okeana i issledovanie ego reakcii na korotkoperiodnye i dolgoperiodnye atmosfernye vozdejstviya. M.: Fizmalit, 2012 (in Russian).
  11.  Smith R., Jones P., Briegleb B., et al. The Parallel Ocean Program (POP) Reference Manual Ocean Component of the Community Climate System Model (CCSM) and Community Earth System Model (CESM). 23 March 2010.                '               '               '               '
  12. van Werkhoven B., Maassen J., Kliphuis M., Dijkstra H. A., Brunnabend S. E., van Meersbergen M., Seinstra F. J., and Bal H. E. A distributed computing approach to improve the performance of the Parallel Ocean Program. // Geosci. Model Dev., 2014. N 7, P. 267-281.
  13.  Wilhelmsson Tomas. Parallelization of the HIROMB Ocean Model. Licentiate Thesis, Royal Institute of Technology Department of Numerical Analysis and Computer Science, 2002.
  14.  John M. Dennis. Inverse Space-Filling Curve Partitioning of a Global Ocean Model. // IEEE International Parallel and Distributed Processing Symposium, 2007.
  15.  Liu T. et al. Parallel Implementation and Optimization of Regional Ocean Modeling System (ROMS) Based on Sunway SW26010 Many-Core Processor // IEEE Access, 2019. Vol. 7, P. 146170-146182.
  16.  Afzal, A., Ansari, Z., Faizabadi, A. R. et al. Parallelization Strategies for Computational Fluid Dynamics Software: State of the Art Review. // Arch Computat Methods Eng. 2017. N 24, P. 337-363.
  17.  Akhmetova D., lakymchuk R., Ekeberg O., Laure E. Performance study of multithreaded MPI and Openmp tasking in a large scientific code // Proc. 2017 IEEE 31st Int. Parallel Distrib. Process. Symp. Work. IPDPSW 2017; P. 756-765.
  18. Mortikov E. V. Programmnaya realizaciya bloka perenosa primesej v klimaticheskih modelyah na osnove gibridnogo programmirovaniya MPLOpenMP // Superkomp’yuternye dni v Rossii: Trudy mezhdunarodnoj konferencii. 2016. S. 521-529 (in Russian).
  19. Rabenseifner R. and Wellein G. Communication and optimization aspects of parallel programming models on hybrid architectures. // Int. J. High Perform. Comp. Appl. 2003. N 17(1), P. 49-62.
  20. Elliott, S., Sobhani, N., Del Vento, D., Gill, D. O. WRF performance optimization targeting Intel multicore and manycore architectures. //In 2nd Symposium on High Performance Computing for Weather, Water, and Climate. American Meteorological Society: New Orleans, LA, US, 2016.
  21. Shchepetkin Alexander F., McWilliams James C. The regional oceanic modeling system (ROMS): a split-explicit, free-surface, topography-following-coordinate oceanic model // Ocean Modelling. 2005. Vol. 9, Iss. 4, P. 347-404.
  22. Andrew R. Porter, Jeremy Appleyard, Mike Ashworth, Rupert W. Ford, Jason Holt, Hedong Liu and Graham D. Riley. Portable multi- and many-core performance for finite-difference or finite- element codes — application to the free-surface component of NEMO (NEMOLite2D 1.0). // Geosci. Model Dev., 2018. N 11, P. 3447-3464.
  23. NEMO Consortium. NEMO development strategy Version 2: 2018-2022. 2018.
  24. Cluster INM RAS. [Electron. Res.]: http://cluster2.inm.ras.ru.
  25. Supercomputer MVS-10P (JSCC ). [Electron. Res.]: http://www.jscc.ru.

Bibliographic reference: Chaplygin A. V., Gusev A. V. Shallow water model using a hybrid MPI/OpenMP parallel programming //journal “Problems of informatics”. 2021, № 1. P.65-82. DOI: 10.24411/2073-0667-2021-10006

article