2020 № 1(46)

Contents

  1. Bredikhin S.V., Lyapunov V.M., Shcherbakova N.,G. Dynamics of the citation network of scientific articles
  2. Sokolova O. D., Materukhin A.V. Analytical review of modern information technologies for collection, processing and analysis of data applied for monitoring atmospheric pollution air
  3. Tarkhanova O. Y., Shakhov V. V. On Performance Evaluation of Wireless Sensor Networks
  4. Perepelkin V. A. LuNA System for Automatic Construction of Numerical Parallel Programs for Multicomputers

S. V. Bredikhin, V. M. Lyapunov, N. G. Shcherbakova

Institute of Computational Mathematics and Mathematical Geophysics SB RAS,630090, Novosibirsk, Russia

DYNAMICS OF THE CITATION NETWORK OF SCIENTIFIC ARTICLES

UDK 001.12+303.2
DOI: 10.24411/2073-0667-2020-10001

Many complex networks are scale-free, namely their degree distribution follows a power law for large k. The graphs corresponding to the citation networks (CN) of scientific articles are included in this set. The vertices of citation network correspond to scientific articles, and directed edges to citations. Almost each new article contains some number of references (citations) to previously published ones. The number of references to a cited vertex is its in-degree. The appearance of new connects between old vertices is impossible.

The question is how growing networks self-organize into a scale-free structure. Simon (1955) assumes that the principle of "having much gets more” has effect. The study of this mechanism as applied to CN was performed in Price (1976). D. Price called the strategy, in which success breeds success, a cumulative advantage. For CN, the citation strategy is formulated as follows: the speed with which articles receive new citations is proportional to the citations already received.

Thanks to a series of works, the beginning of which was laid by the work Barabási, Albert (1999), the mechanism of cumulative advantage was called the preferred attachment, hereafter PA-mechanism. In Barabasi-Albert model the probability Π that a new node connects to a node i depends on the degree ki of i as

Πki= kijkj.

A generalization of the PA-mechanism was presented in Krapivisky, et al (2000) as

Πki= kiαjkjα.=Ctkiα,

C(t) is a normalization constant.

Dorogovtsev and Mendes (2000) proposed that in some real networks the probability Π also depends on an age of node i, decaying as (t – ti)-v, where ti is a timestep a node i was added to the network, t – a timestep a new node is added, ν is a tunable parameter. So,

Πki, tikifti,

where f(ti) is an aging function.

In real networks Π(0) ≠ 0, which means there is a nonzero probability that a new node attaches to a node i, such that ki = 0. So Dorogovtsev , et al. (2000) presented the model of a directed network such that

Πkiin= kiin+k0j(kjin+a) ,

where kiin is the number of incoming links, k0 is the initial attractiveness of the node i.

So the likelihood that a newly added node will join node i after a time t has elapsed after the addition of node i to the directed network may be proportional to the following characteristics:

Πkiin,t∝(kiin+ k0)fit.

The aim of this work is to study the dynamical properties of the citation network of scientific articles based on data provided by the bibliographic database RePEc. We find that the distribution of incoming links follows the power law with parameters γ = 2,89, xmin = 207. So we want to prove the preferential attachment hypothesis that in this case states that the rate Π(kin) with which a node with k incoming links acquires new links is a monotonically increasing function of kin, hereinafter we use the denotation k instead of kin. To investigate the attachment mechanisms separately from aging we use the method presented in Jeong, et al. (2003). To avoid the influence of C(t) and an aging effect we study the attachment kernel Ak within a relatively short window time ΔT. The functional form of Ak can be determined by measuring how many citations an article with k citation collected during some previous period T0 receives within period ΔT. We plotted the function of attachment rate using different previous periods and found the linear fit. The results of approximation with a linear functions show that Ak has a form k + k0, where k0 can be considered as the initial attractiveness. It should be noted that we have different estimations of Ak depending on T0.

When investigating an aging process we ignore the preferential attachment and explore age characteristics of nodes with the same degree. We study two distributions, S(t) and D(t). S(t) is the distribution of ages of citation from citing article to cited articles. D(t) is the distribution of ages to a cited article from citing articles. The numerical calculations show that S(t) increases during 2-3 years and then decays exponentially during approximately 25 years (irrespective of the selected t0), then decreases slower. So it can be assumed that the aging function f(t) = (exp αt), α < 0. The distribution D(t) shows the linear growth for 20 years and then the exponential decay.

Our experiment shows that the citation process can be described by the linear preferential attachment, but given the process of aging is consistent with the attachment kernel only for the first 20 years after publication. It should be noted that fluctuations in the distributions and the dependence on selected time windows are observed which may be due to incomplete data.

Key words: citation network, power law, preferential attachment, initial attractiveness, aging.

 

Bibliographic reference: Bredikhin S.V., Lyapunov V.M., Shcherbakova N.,G. Dynamics of the citation network of scientific articles //journal “Problems of informatics”. 2020, № 1. P. 5-20. DOI: 10.24411/2073-0667-2020-10001

article


O.,D. Sokolova, A.,V. Materukhin*

Institute of Computational Mathematics and Mathematical Geophysics SB RAS,630090, Novosibirsk, Russia
*Moscow State University of Geodesy and Cartography,105064, Moscow, Russia

ANALYTICAL REVIEW OF MODERN INFORMATION TECHNOLOGIES FOR COLLECTION, PROCESSING AND ANALYSIS OF DATA APPLIED FOR MONITORING ATMOSPHERIC POLLUTION AIR

UDK 528
DOI: 10.24411/2073-0667-2020-10002

 

The article is an analytical review of scientific publications on the topic of modern information technologies in the field of data collection, processing and analysis for monitoring air pollution in large cities. Modern large cities are the source of most emissions of harmful substances into the atmosphere. To monitor atmospheric pollution parameters, as a rule, either data obtained from few and fairly far from each other regular environmental monitoring stations are used, or data obtained as a result of expensive short-term field studies. This is not enough to obtain adequate information about the spatial distribution of pollutants or to identify sources of pollution. To monitor atmospheric pollution parameters with high spatial and temporal resolution are required.

Over the past two decades, progress has been made in the development of small-sized sensor devices with the added ability to determine their location, the so-called geosensors. The use of wireless geosensor networks for monitoring air quality is considered in many scientific publications. The aim of the review was to analyze modern scientific publications in this field of research. The problems of research are related to the tasks of managing of spatio-temporal data streams. The review presents articles that describe mathematical models and optimization algorithms for data collection processes, as well as their processing. Today, the scientific community has already reached a consensus about which information technologies are most preferable at this level-- data mining methods are the most promising tool for analyzing and predicting air pollution. The level of spatial and temporal granularity of data necessary for the application of these methods can be achieved using geosensor networks.

The main problems that arise during the collection and transmission of data in a wireless geosensor network are problems of ensuring energy efficiency, speed and accuracy of transmission.  For the effective functioning of monitoring networks, it is necessary to develop adequate mathematical models and methods that will optimize the data collection process. The review analyzes various approaches to solving this problem (for example, the use of mobile sinks or clustering of many geosensors). The review includes publications in which it is proposed to improve monitoring through the use of mobile nodes. The possibility of using Vehicle ad-hoc Sensor Networks (VANET) for monitoring air quality has been offered quite often in recent years-- sensors can be installed on cars that move in the transport network of a city. Thanks to mobility, the sensor node can take measurements in different places, thereby reducing the requirements for a large number of sensor nodes for monitoring the territory.

The bottleneck of most research projects in this area is the lack of the possibility of large-scale deployment of the proposed solutions. Therefore, the review focuses on the use of simulation to verify the effectiveness of various technological solutions. Some systems for modeling data streams from geosensors (for example, the MobGeoSensSimulation system), which measure air quality parameters over a large urban area and send data to a processing center, are analyzed.

The analysis of scientific publications carried out in the article suggests that the main trend in research in the field of monitoring atmospheric air pollution is the development of technological solutions associated with the use of wireless geosensor networks, and the current trend is the transition from stationary to mobile geosensors. Such a transition will allow the use of data mining methods with greater efficiency. Various solutions described in the modern scientific literature and intended for application at various levels of the atmospheric monitoring system are analyzed: at the level of data collection, at the level of data processing and at the level of data analysis. Another identified research trend is the development and study of computer modeling systems for monitoring systems.

Key words: monitoring of air pollution, wireless geo-sensor networks.

 

Bibliographic reference:  Sokolova O.,D., Materukhin A.,V. Analytical review of modern information technologies for collection, processing and analysis of data applied for monitoring atmospheric pollution air //journal “Problems of informatics”. 2020, № 1. P. 21-34. DOI: 10.24411/2073-0667-2020-10002

article


O.,Y. Tarkhanova, V.,V. Shakhov

Institute of Cytology and Genetics SB RAS,630090, Novosibirsk, Russia
* Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Novosibirsk State Technical University

ON PERFORMANCE EVALUATION OF WIRELESS SENSOR NETWORKS RENEWCOMMAND

 

UDK 004.942

DOI: 10.24411/2073-0667-2020-10003

This work was supported by the ICM&MG base project No 0315-2016-0006

Nowadays, Internet of Things (IoT) has received increased attention in many countries. IoT and Big Data technologies have a tremendous impact on all fields of human life. In the Russian Federation, a roadmap for the IoT development is presented by the Ministry of Industry and Trade. Issues of development and implementation of IoT technologies are in the focus of attention of senior government authorities. According to Cisco Systems, this year 50 billion devices will be connected to the Internet. McKinsey (an international consulting company specializing in strategic management advice) estimates the economic impact of IoT development at USD 11 trillion by 2025. Note that there are several standardization projects and initiatives in sensor networks that can ultimately converge with the Internet of things. For example, this concerns proposals developed as part of the projects of the 7th EU Framework Programme for the development of scientific research and technology (IoT-A). Other appropriate projects include COBIS, SENSEI, SARIF. The international non-profit organization Open Geospatial Consortium has developed and supported a set of standards for requesting, transmitting and processing sensor data via the Web. To standardize the inclusion of sensors in the IoT architecture, a number of initiatives have been proposed, for example, IEEE 1451, ISO / ICE 24753. Great progress in the field of micro-electro-mechanical technologies has enabled the development of inexpensive sensor nodes that communicate wirelessly. A wireless sensor network is a self-organizing group of spatially distributed sensors for monitoring environmental conditions and delivering the collected data in a decision center. Wireless sensor networks (WSNs) can be used in a wide range of applications, such as precision agriculture, air pollution monitoring, smart home etc. However, a sensor has to be compact and inexpensive. This is a stringent demand of market. A consequence of the low cost of the sensors is a significant limitation of their resources (battery capacity, computing capabilities, memory etc.). This restriction significantly affects the performance and reliability of both individual sensors and the network as a whole, which contradicts the requirements for technologies used in critical areas of human life. Thus, efficient tools for assessing the effectiveness of WSNs are required. In this paper we partially fill the mentioned gap.

We develop a theoretical approach for modeling and analyzing DoB attacks. The proposed methods are based on continuous time Markov chains. The theory of Markov process is generally used when the various quality metrics of WSNs are calculated. In previous papers the specifics of the network topology are often not taken into account. On the other hand, in a number of studies, focus has been placed on optimizing the network topology, while the characteristics of its elements are assumed to be given (e.g. the probabilities of nodes availability and their dynamic). An existent approach combining the accounting of the network topology, the details of the operation of its nodes, the quality of the channels is described as a general idea. Here, we apply this one for a specific application. Also, we investigate a situation when the optimal network topology does not exist in the general case. The best topology is determined by the parameters of the environment and duty, and also depends on time. The corresponding example has been provided. A method for WSNs performance analysis based on temporary colored Petri nets has been proposed. To verify the approach we use a typical LoRaWAN topology, for example the similar topology has been used in digital farming, Edge computing applications, etc. The results of performance analysis have been visualized and presented.

Key words: Wireless Sensor Networks, Performance Analysis, Markov chain, Petri Nets.

 
Bibliographic reference:  Tarkhanova O.,Y., Shakhov V.,V. On Performance Evaluation of Wireless Sensor Networks //journal “Problems of informatics”. 2020, № 1. P. 35-65. DOI: 10.24411/2073-0667-2020-10003
 

Institute of Computational Mathematics and Mathematical Geophysics SB RAS,630090, Novosibirsk, Russia
 
LUNA SYSTEM FOR AUTOMATIC CONSTRUCTION OFNUMERICAL PARALLEL PROGRAMS FOR MULTICOMPUTERS
 
UDK 004.4'242
DOI: 10.24411/2073-0667-2020-10004
 
Usage of supercomputers for numerical simulations implies the complexity of distributed parallel programs development. Non-functional requirements for such programs include efficiency, memory and network bandwidth economy, tuning to available resources, fault tolerance and check pointing support, dynamic workload balancing, etc. To satisfy these requirements one has to concern peculiarities of application algorithm, hardware configuration and data. The development requires skills and knowledge supercomputer users are unlikely to have. To reduce the complexity of parallel programs development, debugging and modification diverse systems and tools of programs construction automation exist and evolve. Such tools accept a high level description of an application algorithm, as well as hardware configuration description, to produce and execute a parallel program, which implements the algorithm and fits into the non-functional requirements. Thus some of the programmer's burden is eliminated. Also such systems and tools are necessary to implement an active knowledge technology, where application algorithm is synthesized automatically based on application problem specification and has then to be automatically executed.
Automatic parallel programs construction system has to employ a high-level description of an application algorithm. The algorithm representation should be independent from hardware configuration and allow tuning of algorithm execution to given hardware and data. The representation should comprise independently computable parts which system should be able to dynamically map to given resources.
According to such demands the fragmented programming technology (FPT) is being developed in ICMMG SB RAS, as well as LuNA system for automatic numerical parallel programs construction (LuNA stands for Language for Numerical Algorithms). In the FPT computations are represented as the fragmented algorithm (FA) -- a countable set of computational tasks called computational fragments (CFs). A CF is defined by two finite sets of input and output arguments -- immutable data objects called data fragments (DFs) -- and an operation on the DFs, which computes values of output DFs from values of input DFs. Execution of FA is the execution of all the CFs once their input DFs are computed. Operations are side-effect-free sequential subroutines, thus a CF can be executed on any computing node dynamically, provided its input DFs' values are transferred to the node. The job of the programming system is to dynamically map DFs and CFs to computing nodes, provide input DFs transfer to CFs and execute CFs to produce new DFs. During FA execution the system redistributes CFs and DFs in order to balance workload, employs replication to provide fault tolerance, saves DFs values for check pointing or performs other jobs to provide necessary non-functional properties of the program execution.
In order to improve non-functional properties of FA execution supplementary information called recommendations can be provided by the user. Recommendations are means to overcome the fundamental difficulty of efficient execution of an application algorithm, represented in a high-level language, such as LuNA. Using recommendations the user can explicitly declare properties of the algorithm, which are hard to obtain automatically, but can significantly affect execution (e.g. estimated operations execution time). Also the user can employ recommendations to express his insight on how the computations should be performed in order to achieve high efficiency (instead of programming it in a low-level language, such as C+MPI). The system will follow the recommendations in cases, where good efficiency is hard to achieve automatically.
An experiment of automatic CFs mapping to resources using profiling is concerned. An application (matrices multiplication test) was run multiple times in the profiling mode. After each run the CFs mapping to computing nodes was adjusted to avoid load imbalances, which occurred in the previous execution. After few executions the execution time of the program decreased to the time of an efficient hand-made implementation of the same algorithm. This shows the possibility to automatically tune a FA execution to hardware configuration.
Another experiment shows automatic provision of dynamic load balancing on the example application of self-gravitating dust cloud simulation using Particle-in-Cell method. The application is an example of a problem, where workload distribution cannot be predicted statically and depends on input data. Necessary dynamic workload balancing was provided by LuNA system automatically based on the Rope of Beads algorithm, which preserves distributed data structure when balancing.
A number of other models and real-life applications were developed using the LuNA system. The efficiency of the constructed program is generally lower, than the one of hand-coded implementation, but reduction of labourness of program development, debugging and modification is often worth it. In future it is planned to implement other system algorithms and heuristics to improve the quality of automatically constructed programs, and to use LuNA system as a basis for active knowledge system construction.
Key words: Fragmented programming technology, automatic parallel programs construction, high performance computing, LuNA system.

 

Bibliographic reference:  Perepelkin V.,A. LuNA System for Automatic Construction of Numerical Parallel Programs for Multicomputers //journal “Problems of informatics”. 2020, № 1. P. 66-74. DOI: 10.24411/2073-0667-2020-10004

article