Volume 3(60)

CONTENTS

  1. Bredikhin S. V., Scherbakova N. G. Scientific journal co-authorship network model
  2. Gerb A.R., Omarova G.A. Graph partitioning algorithms: literature review           
  3. Snytnikova T. V. Processing-in-Memory: current trends in the development of technology 
  4. Malyshkin V.E., Perepelkin V. A. A Multi-Agent Approach to Improve Execution Efficiency of Fragmented Programs in LuNA System    
  5. Simonov V. S., Khairetdinov M. S. Imperative code conversion method for parallel data processing platforms     

S. V. Bredikhin, N. G. Scherbakova

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia

SCIENTIFIC JOURNAL CO-AUTHORSHIP NETWORK MODEL

DOI: 10.24412/2073-0667-2023-3-5-18
EDN: QADKBW

 

The traditional representation of a complex system (CS) in the form of a network or a graph makes it possible to identify many aspects of the system’s behavior, see [2-4]. Such a network is based on pairwise relationships between objects. In this case, the nodes of the co-authorship network correspond to the authors, and an edge between two nodes is drawn if there is at least one publication in which both authors participated, see the fundamental works [5, 6]. Such a model is able to express many complex properties of a CS; its advantages are simplicity and the applicability of graph theory to the analysis. The model has been studied quite well in terms of the node degree distribution [5], ranking of authors [6], dynamics of evolution [7, 8], identification of communities [9], and prediction of new co-authorship relations [10, 11].
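As an illustration of this pairwise model, the sketch below builds the edge set of a co-authorship graph from a list of papers. The publication list and author names are hypothetical, chosen only to make the example concrete.

    # A minimal sketch of the traditional co-authorship network: nodes are
    # authors, and an edge connects two authors who share at least one paper.
    from itertools import combinations

    papers = [["Ivanov", "Petrov"],
              ["Ivanov", "Petrov", "Sidorova"],
              ["Sidorova", "Kuznetsov"]]

    edges = set()
    for authors in papers:
        for pair in combinations(sorted(set(authors)), 2):
            edges.add(pair)  # each co-author pair yields one (unweighted) edge

    print(sorted(edges))
    # [('Ivanov', 'Petrov'), ('Ivanov', 'Sidorova'),
    #  ('Kuznetsov', 'Sidorova'), ('Petrov', 'Sidorova')]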

However, such modeling of a CS does not reveal all the information that can be extracted directly from the list of articles and their authors; for example, it does not reflect the total number of articles prepared by a pair of co-authors. To overcome this shortcoming, several methods for constructing co-authorship networks that extend the traditional approach have been considered, see [6, 12]. But fixing only binary relations between system objects excludes interactions involving groups of objects. A possible solution is to generalize the pairwise interaction to the interaction of an arbitrary set of nodes; networks of this kind are called hypernetworks. The main mathematical structures used as such CS models include bipartite graphs, hypergraphs, and simplicial complexes [13].

The concept of a hypergraph [20] was proposed in [21] as a computationally feasible tool for the analysis of CSs that can be adapted to many analytical situations. In studies of co-authorship, as a rule, the nodes corresponding to the authors of a joint publication form a hyperedge [22, 23].
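In this model each paper becomes a single hyperedge containing all of its authors, so group collaboration is preserved rather than reduced to pairs. A minimal sketch, reusing the same hypothetical publication list:

    # Each paper is one hyperedge: the set of its authors.
    papers = [["Ivanov", "Petrov"],
              ["Ivanov", "Petrov", "Sidorova"],
              ["Sidorova", "Kuznetsov"]]

    hyperedges = [frozenset(authors) for authors in papers]
    vertices = set().union(*hyperedges)

    print(len(hyperedges))  # size m(H): number of hyperedges (papers)  -> 3
    print(len(vertices))    # order n(H): number of vertices (authors)  -> 4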

This paper presents a co-authorship network model that takes into account the group relationships arising between co-authors. The network is modeled by a hypergraph whose vertices correspond to authors and whose edges correspond to scientific articles (SA). The data we analyze is extracted from the electronic archive (https://www.dia-endojournals.ru/jour/issue/archive) of the quarterly scientific and practical medical peer-reviewed journal Diabetes Mellitus (ISSN 2072-0378), published since 1998. The journal is indexed in international abstract and full-text databases. The hypergraph Hca built on the selected archive data has size m(Hca) = 991 and order n(Hca) = 1694. Since only SAs with two or more authors are considered, the constructed hypergraph has no loops. The hypergraph Hca is not connected: it consists of 97 components, of which 68 have a single edge, and the maximal component includes about 64.75 % of the vertices (authors) and 67.4 % of the edges (articles).

The parameters of the hypergraph (and of its components) are measured, and its topological properties (simple, star, strong star, conformal, Helly) are revealed. The hypergraph language gives an idea of the shape of cooperation in the system under consideration. Most authors of the co-authored journal articles are indirectly related to each other through joint works. The vertex degree distribution (in the maximal component) has a long tail, i.e. most authors have a small number of joint SAs. The same holds for the distribution of edge degrees, indicating that most SAs have a small number of co-authors. All components except the maximal one have at least one of the considered properties. Moreover, the conformal and Helly properties are common to all components regardless of size and order. Simplicity is more characteristic of components with a small number of edges, and the same can be said about the likelihood of being a star. Two components have all the considered properties.

It should be noted that, based on the incidence matrix of a hypergraph, a traditional co-authorship network can be built in which two authors are connected if they have at least one joint work. The associated multigraph G(Hca) is one model of such a network, in which the weight of an edge between two authors equals the number of their joint works. This work continues the study and approbation of methods for analyzing co-authorship networks, see [1].
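A minimal sketch of this associated weighted graph, again on the hypothetical publication list used above (edge weight = number of joint papers):

    # Collapse the hypergraph into a weighted co-authorship graph.
    from collections import Counter
    from itertools import combinations

    papers = [["Ivanov", "Petrov"],
              ["Ivanov", "Petrov", "Sidorova"],
              ["Sidorova", "Kuznetsov"]]

    weights = Counter()
    for authors in papers:
        for pair in combinations(sorted(set(authors)), 2):
            weights[pair] += 1  # one unit of weight per joint paper

    print(weights[("Ivanov", "Petrov")])  # -> 2 joint papers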

Key words: complex network, hypergraph, co-authorship, archive of scientific articles, bibliometrics.

This work was carried out under state contract with ICMMG SB RAS (0251-2021-0005).

 

Bibliographic reference: Bredikhin S. V., Scherbakova N. G. Scientific journal co-authorship network model // Problems of Informatics. 2023. № 3. P. 5-18. DOI: 10.24412/2073-0667-2023-3-5-18.



A. R. Gerb*, G. A. Omarova**

* Novosibirsk State University, 630090, Novosibirsk, Russia

** Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia

GRAPH PARTITIONING ALGORITHMS: LITERATURE REVIEW

DOI: 10.24412/2073-0667-2023-3-19-36
EDN: LXQVVD

This paper is devoted to the analysis of modern graph partitioning algorithms. We review exact solutions as well as sequential, iterative, multilevel, streaming, and parallel algorithms, and note the advantages and disadvantages of some of them.

Key words: graphs, coarsening, serial and parallel algorithms, optimization.

The work was supported by the Ministry of Science and Higher Education of the Russian Federation (project code 0251-2020-0001).

 

Bibliographic reference: Gerb A. R., Omarova G. A. Graph partitioning algorithms: literature review // Problems of Informatics. 2023. № 3. P. 19-36. DOI: 10.24412/2073-0667-2023-3-19-36.



T.V. Snytnikova

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia

PROCESSING-IN-MEMORY: CURRENT TRENDS IN THE DEVELOPMENT OF TECHNOLOGY

DOI: 10.24412/2073-0667-2023-3-37-54
EDN: FSPSXV

Moving data between the CPU and RAM is a first-order obstacle to improved performance, scalability, and energy efficiency in today’s systems. Computer systems use a number of methods to reduce the overhead associated with data movement, ranging from traditional mechanisms to new methods such as Processing-in-Memory (PIM) computation. These methods can be divided into two large areas: processing-near-memory (PNM), where computations are performed in dedicated processing elements, and processing-using-memory (PUM), where computations are performed inside a memory array by exploiting the internal analog operating properties of the memory device. This paper discusses the paradigm of PIM architectures and provides an overview of PUM architectures based on parallel DRAM operations and associative processors.
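To make the PUM idea more concrete, the toy model below imitates the two basic primitives of an associative processor: a parallel compare that tags every memory word matching a key under a mask, and a parallel write that updates all tagged words at once. This is our own illustration of the principle, not a description of any specific device; real PUM hardware performs these steps inside the memory array itself.

    # Toy word-parallel associative memory: COMPARE tags matching words,
    # WRITE updates all tagged words "simultaneously".
    memory = [0b1010, 0b0110, 0b1010, 0b0001]

    def compare(key, mask):
        # Tag every word whose bits under `mask` equal `key`.
        return [(w & mask) == (key & mask) for w in memory]

    def write(tags, value, mask):
        # In every tagged word, set the bits selected by `mask` to `value`.
        for i, tagged in enumerate(tags):
            if tagged:
                memory[i] = (memory[i] & ~mask) | (value & mask)

    tags = compare(0b1010, 0b1111)   # words 0 and 2 match, in parallel
    write(tags, 0b0101, 0b0101)      # update the matched words' masked bits
    print([bin(w) for w in memory])  # -> ['0b1111', '0b110', '0b1111', '0b1']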

Key words: processing-in-memory, associative processors, PDRAM.

Bibliographic reference: Snytnikova T. V. Processing-in-Memory: current trends in the development of technology // Problems of Informatics. 2023. № 3. P. 37-54. DOI: 10.24412/2073-0667-2023-3-37-54.



V. E. Malyshkin, V.A. Perepelkin

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia
Novosibirsk State University, 630090, Novosibirsk, Russia
Novosibirsk State Technical University, 630073, Novosibirsk, Russia

A MULTI-AGENT APPROACH TO IMPROVE EXECUTION EFFICIENCY OF FRAGMENTED PROGRAMS IN LUNA SYSTEM

DOI: 10.24412/2073-0667-2023-3-55-67
EDN: LYEAKS

Development of efficient numerical parallel programs is a complex and laborious problem, which impedes the application of supercomputers to numerical simulation. Parallel program construction languages and systems help by providing the user with a high-level language for describing the desired parallel program and by generating that program automatically. Such systems not only reduce the complexity of parallel program development, but also provide static and/or dynamic adaptation of program execution to particular hardware and execution conditions (e.g. workload balancing). This implies that a high-level parallel program specification may be transformed into a parallel program in diverse ways, many of which may be optimal for a particular simulation. This diversity can be resolved either statically or dynamically: the former causes less run-time overhead, while the latter preserves the ability to tune program execution dynamically. Finding a good trade-off between static and dynamic decision making is a challenging problem in system parallel programming, and the answer also depends on the computational model on which the system is based.

LuNA [1] is a system for automatic construction of numerical parallel programs, developed at ICMMG SB RAS. It is based on the theory of parallel program synthesis from computational models [2] and follows the active knowledge concept [3]. Its input language, LuNA, comprises means to describe pieces of data and computations, called data fragments and computational fragments (DFs and CFs) respectively. DFs are immutable coarse-grained data objects, while CFs are side-effect-free sequential procedure calls taking particular DFs as input or output arguments. This dataflow computational model allows the system to execute CFs on a distributed-memory machine automatically, either by generating a conventional distributed program that invokes the procedures according to their information dependencies, or by dynamically interpreting the program on a multicomputer. The former approach lacks dynamic flexibility, while the latter incurs significant overhead.
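The following sketch illustrates the DF/CF dataflow idea in general terms: CFs are pure procedures over immutable DFs, and any CF whose input DFs are ready may be executed. The table of fragments and the naive scheduling loop are our own illustration, not the LuNA language or API.

    # Each CF: (side-effect-free procedure, input DF names, output DF name).
    cfs = [
        (lambda: 2.0,         [],         "a"),
        (lambda: 3.0,         [],         "b"),
        (lambda a, b: a + b,  ["a", "b"], "s"),
        (lambda s: s * s,     ["s"],      "result"),
    ]

    dfs = {}                 # DF store: name -> (immutable) value
    pending = list(cfs)
    while pending:           # run any CF whose inputs are all ready
        for cf in pending:
            proc, ins, out = cf
            if all(name in dfs for name in ins):
                dfs[out] = proc(*(dfs[n] for n in ins))
                pending.remove(cf)
                break

    print(dfs["result"])     # -> 25.0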

To achieve a trade-off between these two options, a multi-agent approach is suggested. A multi-agent system is defined in which each agent implements a single CF in a distributed environment; an agent can consume and produce DFs and create new agents. Each agent is controlled by an imperative sequential program in a conventional language (C++, to be specific).


The environment system supports agents with basic operations, such as storing a DF or migrating a CF to another computing node. Formulating agent programs in C++ is beneficial in the sense that they are compiled into highly efficient machine code by a conventional C++ compiler. This approach allows decisions on how to execute LuNA programs to be made both statically (by generating the corresponding behavior into the agents’ programs) and dynamically (by adding modules that support dynamic properties to the run-time environment and making the agents’ behavior depend on those modules). For example, the distribution of agents over computing nodes may be either static (each agent has a particular computing node to migrate to for execution) or dynamic (each agent requests a dynamic balancing module to assign it a node to migrate to).
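A sketch of this static-versus-dynamic placement choice is given below. The agent and balancer interfaces are hypothetical stand-ins invented for illustration; they are not the actual LuNA run-time API.

    # An agent either carries a statically assigned node or defers the
    # decision to a run-time balancing module.
    class RoundRobinBalancer:
        # Stand-in for a dynamic load-balancing module.
        def __init__(self, n_nodes):
            self.n_nodes, self.next_node = n_nodes, 0
        def assign(self):
            node = self.next_node
            self.next_node = (self.next_node + 1) % self.n_nodes
            return node

    class Agent:
        def __init__(self, cf_name, static_node=None, balancer=None):
            self.cf_name = cf_name
            self.static_node = static_node   # static decision, if any
            self.balancer = balancer         # dynamic fallback
        def target_node(self):
            if self.static_node is not None:
                return self.static_node      # compiled-in placement
            return self.balancer.assign()    # run-time placement

    balancer = RoundRobinBalancer(n_nodes=4)
    print(Agent("cf_0", static_node=2).target_node())      # -> 2 (static)
    print(Agent("cf_1", balancer=balancer).target_node())  # -> 0 (dynamic)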

The proposed approach proved to be much more efficient than the naive distributed interpretation approach, as confirmed by [4, 5]. The fragmented structure of the program is preserved at run time, which makes it possible to provide dynamic properties of LuNA program execution. The efficiency achieved in practical cases is comparable with that of manual low-level programming, which makes the LuNA system usable for automating the construction of real numerical parallel programs, at least in some application domains. The proposed approach takes advantage of well-developed tools for conventional sequential compilation to reduce run-time overhead.

Key words: fragmented programming, LuNA system, parallel programs construction automation, high performance computing, multi-agent approach, partial evaluation.
This work was carried out under state contract with ICMMG SB RAS (0251-2022-0005).

Bibliographic reference: Malyshkin V. E., Perepelkin V. A. A Multi-Agent Approach to Improve Execution Efficiency of Fragmented Programs in LuNA System // Problems of Informatics. 2023. № 3. P. 55-67. DOI: 10.24412/2073-0667-2023-3-55-67.



V. S. Simonov*, M. S. Khairetdinov*,**

* Novosibirsk State Technical University, 630073, Novosibirsk, Russia
** Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia

IMPERATIVE CODE CONVERSION METHOD FOR PARALLEL DATA PROCESSING PLATFORMS

DOI: 10.24412/2073-0667-2023-3-68-80
EDN: HGORYY

There are many data processing platforms that give sequential programs access to parallel processing capabilities. To benefit from such platforms, existing code has to be rewritten into the domain-specific language that each platform supports. This transformation, a tedious and error-prone process, also requires developers to choose the right platform to optimize performance for a specific workload.

This article describes a formal method which, given imperative code, produces code suitable for execution in a parallel data processing system implementing the MapReduce paradigm, such as Hadoop. Given a sequential code fragment, the method derives a high-level summary expressed in our program specification language, which is then compiled for execution in Apache Spark [1]. We demonstrate that the method converts imperative code into code suitable for execution on the Apache Spark platform. The translated results execute 1.3 times faster on average than the sequential implementations, and they also scale better for large datasets.
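The pair of fragments below illustrates the kind of translation described: an imperative aggregation loop and an equivalent Spark program expressed as a parallel map and reduce. This is our own hypothetical example, not output of the paper's tool, and it assumes a local PySpark installation.

    # Imperative original: sequential sum of squares.
    data = list(range(100_000))
    total = 0
    for x in data:
        total += x * x

    # Translated counterpart: the same computation as map + reduce on Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
    rdd_total = (spark.sparkContext
                 .parallelize(data)
                 .map(lambda x: x * x)
                 .reduce(lambda a, b: a + b))

    assert rdd_total == total
    spark.stop()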

As computing becomes more ubiquitous, storage becomes cheaper, and data collection tools become more sophisticated, more data is collected today than ever before. Data-driven advances are becoming increasingly common in various scientific fields. Efficient analysis and processing of these huge data sets is therefore a major computational task.

Many parallel data processing platforms have been developed for processing very large data sets [1-5], and new ones continue to appear [5-7]. Most parallel data processing frameworks come with domain-specific optimizations, which are provided either through a library application programming interface (API) [1-4, 6, 7] or through a high-level domain-specific language (DSL) in which users express their computations [5, 8]. Computations expressed through such API or DSL calls are more efficient due to the platforms' domain-specific optimizations [8-11].

However, the problems associated with this approach often make domain-specific frameworks inaccessible to non-specialists, such as researchers in the physical or social sciences. First, domain-specific optimization for various workloads requires an expert to determine in advance the most appropriate framework for a given piece of code. Second, end users often have to learn new APIs or DSLs [1-3, 6, 7, 12] and transform existing code to take advantage of what particular platforms provide. This requires considerable time and resources and is also fraught with errors in the code. Moreover, even users who want to transform their applications must first understand the purpose of code that may have been written by others, and hand-written low-level optimizations often hide the code's high-level intent. Finally, even after new APIs have been learned and the code rewritten, newly emerging frameworks often turn freshly written code into outdated applications. Users then have to repeat the process to keep up with new advances, which takes considerable time that would be better spent on advancing scientific discovery.

One way to improve the accessibility of parallel data processing platforms is to create compilers that automatically convert applications written in common general-purpose languages (such as C, Java, or Python) into high-performance parallel processing applications for platforms such as Hadoop or Spark. Such compilers let users write applications in familiar general-purpose languages while the compiler retargets parts of their code to a high-performance DSL [13-15]. Applications can then exploit the performance of these specialized frameworks without the additional cost of learning to program individual DSLs. But such compilers do not exist for all cases, and creating them can be very difficult.

Key words: imperative code, parallel data processing.

Bibliographic reference: Simonov V. S., Khairetdinov M. S. Imperative code conversion method for parallel data processing platforms // Problems of Informatics. 2023. № 3. P. 68-80. DOI: 10.24412/2073-0667-2023-3-68-80.
