1. Loukachevitch N. et al. SentiRuEval: testing object-oriented sentiment analysis systems in Russian // Proceedings of International Conference Dialog. 2015. P. 3–9.
2. Loukachevitch N., Rubtsova Y. Entity-Oriented Sentiment Analysis of Tweets: Results and Problems // Text, Speech, and Dialogue. Springer International Publishing, 2015. P. 551–559.
3. Chetviorkin I., Braslavskiy P., Loukachevitch N. Sentiment Analysis Track at ROMIP 2011 // Computational Linguistics and Intellectual Technologies: Annual International Conf. Dialogue, CoLing&InTel. N 11 (18). 2012. P. 739–746.
4. Chetviorkin I., Loukachevitch N. Sentiment Analysis Track at ROMIP 2012 // Proceedings of International Conference Dialog. V. 2. 2013. P. 40–50.
5. Amigó E. et al. Overview of RepLab 2012: Evaluating Online Reputation Management Systems // CLEF (Online Working Notes/Labs/Workshop). 2012.
6. Amigó E. et al. Overview of replab 2013: Evaluating online reputation monitoring systems // International Conference of the Cross-Language Evaluation Forum for European Languages. Springer Berlin Heidelberg. 2013. P. 333–352.
7. Loukachevitch, N., Rubtsova, Y. SentiRuEval-2016: Overcoming Time Gap and Data Sparsity in Tweet Sentiment Analysis // In Proceedings of International Conference on Computational Linguistics and Intellectual Technologies Dialog-2016. 2016. P. 375–384.
8. Rosenthal, S., Farra, N., & Nakov, P. SemEval-2017 task 4: Sentiment analysis in Twitter // In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). 2017. P. 502–518.
9. Pang B., Lee L., Vaithyanathan S. Thumbs up? : sentiment classification using machine learning techniques // Proceedings of the ACL-02 conference on Empirical methods in natural language processing. V. 10. Association for Computational Linguistics, 2002. P. 79–86.
10. Turney P. D. Thumbs up or thumbs down: semantic orientation applied to unsupervised classification of reviews // Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 2002. P. 417–424.
11. Wilson T., Wiebe J., Hoffmann P. Recognizing contextual polarity in phrase-level sentiment analysis // Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, 2005. P. 347–354.
12. Jiang L. et al. Target-dependent twitter sentiment classification // Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. V. 1. Association for Computational Linguistics, 2011. P. 151–160.
13. Lukashevich N., Rubtsova Yu. Obektno-orientirovannyj analiz tvitov po tonal’nosti: rezul’taty i problemy // Trudy Mezhdunarodnoj konferencii DAMDID/RCDL-2015. Obninsk, 2015. S. 499–507.
14. Klekovkina M. V., Kotel’nikov E. V. Metod avtomaticheskoj klassifikacii tekstov po tonal’nosti, osnovannyj na slovare ehmocional’noj leksiki // Trudy konferencii RCDL. 2012. S. 118–123.
15. Read J. Using emoticons to reduce dependency in machine learning techniques for sentiment classification // Proceedings of ACL-05, 43rd Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2005.
16. Rubtsova Yu. V. Metod postroeniya i analiza korpusa korotkih tekstov dlya zadachi klassifikacii otzyvov // EHlektronnye biblioteki: perspektivnye metody i tekhnologii, ehlektronnye kollekcii: Trudy XV Vserossijskoj nauchnoj konferencii RCDL’2013, Yaroslavl’, Rossiya, 14–17 oktyabrya 2013 g. Yaroslavl’: YarGU, 2013. S. 269–275.
17. Rubtsova Yu. V. Razrabotka i issledovanie predmetnonezavisimogo klassifikatora tekstov po tonal’nosti // Trudy SPIIRAN. 2014. T. 5. N 36. S. 59–77.
18. Rubtsova Yu. V. Avtomaticheskoe postroenie i analiz korpusa korotkih tekstov (postov mikroblogov) dlya zadachi razrabotki i trenirovki tonovogo klassifikatora // Inzheneriya znanij i tekhnologii semanticheskogo veba. 2012. T. 1. S. 109–116.
19. Rubtsova Y. Reducing the Degradation of Sentiment Analysis for Text Collections Spread over a Period of Time // International Conference on Knowledge Engineering and the Semantic Web. Springer, Cham, 2017. P. 3–13.
20. Rubtsova Y. Preodolenie degradacii rezul’tatov klassifikacii tekstov po tonal’nosti v kollekciyah, raznesennyh vo vremeni // Sistemnaya informatika. 2016. S. 45–68.
21. Titov, I. Modeling Online Reviews with Multi-grain Topic Models // Proceedings of the 17th International Conference on World Wide Web (WWW‘08). 2008. P. 111–120.
22. Levy, O. Improving Distributional Similarity with Lessons Learned from Word Embeddings // Transactions of the Association for Computational Linguistics. 2015. P. 211–225.
23. Mikolov, T., Chen, K., Corrado, G., & Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv: 1301.3781. 2013.
24. Mikolov T., Sutskever I., Chen K., Corrado G., and Dean J. Distributed Representations of Words and Phrases and their Compositionality // In Proceedings of NIPS, 2013. P. 3111–3119.
Bibliographic reference: Rubtsova Y. V. Neural network model for overcoming time gap of sentiment classification //journal “Problems of informatics”. 2018, № 2. P. 4-14.
Razakova M. G.
JSC National center of space researches and technologies, 050010, Almaty, Kazakhstan
RADAR REMOTE SENSING METHODS OF FOREST COVER
UDC 550.388.2
By analyzing the statistics of a set of radar images, we determine an optimal level of filtering for the automatic extraction of the main objects on the observed surface. The analysis is based on TerraSAR-X satellite data in both co-polarization and cross-polarization modes (VV, VH). Because radar imaging is performed at an oblique angle, the magnitudes must first be radiometrically calibrated; for this, we extract the sigma naught value. Many different filters can be used to eliminate high-frequency noise; the simplest is the averaging (mean) filter. To choose the degree of averaging, one can derive an analytical relationship between the smoothing size applied to the signature and the residual values of the radar magnitudes. The ability of radar to detect texture is a major advantage over other types of imagery, in which texture is not a quantitative characteristic. Filtering methods make it possible to isolate the object component of the signal.
There are three main groups of computer image processing algorithms: (a) preprocessing algorithms for restoration, removal of random noise, quality improvement, and correction of the geometric distortions of radar systems; (b) thematic processing and pattern recognition algorithms, which determine the parameters of image details and include finding regions of the image that are homogeneous in brightness and color, extracting shape features, identifying the coordinates of singular points of objects, and so on; (c) algorithms for isolating specific target objects by image binarization: according to a predetermined threshold, the elements whose values satisfy the conditions of the task at hand are selected.
We show that, when filtering methods are applied systematically to the radar data, the vegetation-covered area differs most from the classified forest area in the cross-polarization image.
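The two processing steps named above, mean filtering and threshold binarization, can be sketched as follows. This is a minimal illustration and not the author's processing chain: the function names `mean_filter` and `binarize`, the reflective edge padding, the window size, and the threshold value are all assumptions, and the input is taken to be an already calibrated 2-D array of backscatter magnitudes.

```python
import numpy as np

def mean_filter(image, size=3):
    """Smooth a 2-D array with a size x size moving-average window.

    `size` is assumed to be odd; edges are padded by reflection (an assumption).
    """
    image = np.asarray(image, dtype=float)
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")
    out = np.zeros(image.shape, dtype=float)
    rows, cols = image.shape
    # Sum the shifted copies of the image that fall inside the window.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (size * size)

def binarize(image, threshold):
    """Return a boolean mask of the elements above a predetermined threshold."""
    return np.asarray(image) > threshold

# Hypothetical usage on synthetic data standing in for a calibrated radar band.
rng = np.random.default_rng(0)
band = rng.random((64, 64))
mask = binarize(mean_filter(band, size=5), threshold=0.5)
```

A larger window suppresses more speckle-like noise but also blurs object boundaries, which is why the abstract treats the degree of averaging as a quantity to be determined rather than fixed in advance.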
Key words: radar satellite data, forest classification.
Bibliographic reference: Razakova M.G. Radar remote sensing methods of forest cover //journal “Problems of informatics”. 2018, № 2. P. 15-23.
Bredikhin S. V., Lyapunov V. M., Shcherbakova N. G.
Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia
SPECTRAL ANALYSIS OF THE JOURNAL CITATION NETWORK
UDC 001.12+303.2
In this paper we investigate spectral clustering methods for the analysis of journal citation networks. The clustering problem is reduced to min-cut graph partitioning: find a partition of the graph such that the edges between different groups have very low weights and the edges within a group have high weights. This means that objects in different clusters are dissimilar from each other and objects within the same cluster are similar to each other, see C. J. Alpert, S.-Z. Yao (1995). Graph partitioning problems of this kind generally cannot be solved exactly in polynomial time, so approximate solution methods have been developed for practical applications. One of the most widely used is the spectral partitioning method. Spectral methods usually involve taking the eigenvectors of some matrix based on relations between data elements. Most spectral clustering algorithms cluster the data with the help of eigenvectors of graph Laplacian matrices.
We study two major versions of spectral clustering, the so-called “unnormalized” and “normalized” spectral clustering, which reveal the relationship between the formulation of the objective function and the matrix used in the eigenvalue equation. Unnormalized spectral bi-clustering algorithms use the Laplacian matrix L = D - A to solve the problem Lv = λv and assign vertices to clusters according to the signs of the elements of the eigenvector v corresponding to the second smallest eigenvalue. Simplified versions of the unnormalized spectral bi-clustering method are presented as a way to confirm the consistency of the approach. As shown in M. E. J. Newman, M. Girvan (2004), this class of spectral clustering is consistent only under strong additional assumptions, which are not always satisfied in real data. Most normalized spectral bi-clustering algorithms use the symmetric normalized Laplacian matrix for these purposes, see J. Shi, J. Malik (2000). As shown in M. Meila, J. Shi (2001), the same results can be obtained by using the eigenvectors corresponding to the largest eigenvalues of the stochastic matrix P = D⁻¹A. Spectral k-way clustering uses not only the second but also the next few eigenvectors to construct a partition.
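The unnormalized bi-clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is given as a symmetric, connected, weighted adjacency matrix A, and the function name `spectral_bipartition` and the toy graph are hypothetical. It is not the implementation used in the paper, which relies on the igraph library.

```python
import numpy as np

def spectral_bipartition(A):
    """Split an undirected weighted graph in two by the sign of the Fiedler vector.

    A is assumed to be a symmetric adjacency matrix of a connected graph.
    """
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))      # diagonal matrix of vertex degrees
    L = D - A                       # unnormalized graph Laplacian
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    fiedler = eigenvectors[:, 1]    # eigenvector of the second smallest eigenvalue
    return fiedler >= 0             # boolean cluster label for each vertex

# Toy graph (hypothetical): two triangles joined by one weak edge of weight 0.1.
A = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1),
                (2, 3, 0.1)]:
    A[i, j] = A[j, i] = w
labels = spectral_bipartition(A)
```

The overall sign of an eigenvector is arbitrary, so which triangle receives the label `True` may differ between runs or libraries; only the grouping of vertices is meaningful.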
The journal citation network under study is built from bibliographic information extracted from the RePEc database. The main component of the corresponding weighted digraph G has 1729 vertices (journals) and 135702 arcs (citations). We analyze the work of two spectral clustering algorithms in the context of three versions of the transformation of digraph G to an undirected form. Accordingly, we examine the graphs represented by the matrices (graph ), (graph ) and (graph ), where A is the journal-journal citation matrix. The WTR algorithm (P. Pons, M. Latapy, 2005) is an agglomerative algorithm based on a random-walk matrix. The LEV algorithm (M. E. J. Newman, 2006) is a bi-clustering algorithm based on the modularity matrix. The algorithms are implemented using the igraph package (a C library). We use indexes as measures of the similarity of two data clusterings. For clustering the similarity is low, as an example . The greatest similarity is reached for graph . WTR clusters of small size (fewer than 200) can be interpreted in terms of thematic fields. The results are presented in Tables 1–6. We can see that the results depend strongly on the digraph transformation and on the algorithm used.
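The three matrices used in the paper are not recoverable from this text. As an illustration only, the sketch below shows three standard ways of turning a citation digraph into an undirected weighted graph, chosen because the key words mention co-citation and bibliographic coupling networks. The function name `symmetrize`, the dictionary keys, and the convention that A[i, j] counts citations from journal i to journal j are assumptions.

```python
import numpy as np

def symmetrize(A):
    """Return three symmetric matrices derived from a citation matrix A.

    Assumed convention: A[i, j] is the number of citations from journal i to journal j.
    """
    A = np.asarray(A, dtype=float)
    return {
        "sum": A + A.T,          # direct citations in either direction
        "coupling": A @ A.T,     # bibliographic coupling: i and k cite the same journals
        "cocitation": A.T @ A,   # co-citation: i and k are cited by the same journals
    }

# Hypothetical 3-journal example: journal 0 cites 1 and 2, journal 1 cites 2.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
graphs = symmetrize(A)
```

In this toy example journals 0 and 1 are coupled because both cite journal 2, and journals 1 and 2 are co-cited because both are cited by journal 0. Each transformation emphasizes a different relation, which is consistent with the abstract's observation that the clustering results depend strongly on the transformation chosen.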
Key words: journal citation network, co-citation network, bibliographic coupling network, weighted directed graph, graph partitioning, spectral clustering.
Bibliographic reference: Bredikhin S.V., Lyapunov V.M., Shcherbakova N.G. Spectral analysis of the journal citation network //journal “Problems of informatics”. 2018, № 2. P. 24-40.
Blagodarniy A. I.
Institute of Computational Technologies of SB RAS, 630090, Novosibirsk, Russia
SOFTWARE TOOLS FOR BUILDING AUTOMATED CONTROL SYSTEMS IN THE ENVIRONMENT OF THE DOMESTIC OPERATING SYSTEM
UDC 004.9
The article describes the architecture, the component interaction schemes, and the tool kernel of a SCADA system built on the Russian networked real-time operating system Neutrino KPDA.10964-01, which is developed and maintained by the Russian company SVD Installed systems Ltd. of St. Petersburg. Using a Russian operating system addresses the task of software import substitution, whose importance is increasing because of geopolitical risks and the growing pressure of economic sanctions from unfriendly states.
The product described in this article results from further development of the SCADA system BLACART, built at the ICT SB RAS and running under the QNX 4.25 operating system. That SCADA system has shown high performance in dozens of deployed control systems in various branches of industry, as a rule in hazardous production facilities. A certificate of conformity and a permit for its use at mining enterprises have been obtained.
Porting the BLACART SCADA system to the Neutrino operating system did not change its structure. Only the source code was changed; in particular, all calls to the various system subroutine libraries were brought into POSIX-compatible form. Because the structure, main characteristics, and functional capabilities of the SCADA system were preserved during the port to Neutrino, the product retains all the qualities that proved effective over many years of practical use in deployed control systems.
The newly developed SCADA system is implemented as a distributed technological computing network, which is at the same time a local area network based on the Qnet network protocol of the Neutrino operating system.
The SCADA software is a hierarchical combination of two subsystems, an upper-level subsystem and a lower-level subsystem, each running on one or more nodes of the technological network. The upper-level subsystem is the operator's automated workstation, which includes the operator's graphical interface and the management system for the operational and archive databases. The lower-level subsystem implements the interface to the equipment that monitors and controls the technological process.
Developing or modifying a specific process control project on the basis of the new SCADA system reduces to building the operator's graphical interface with the Photon Application Builder (a graphics editor) and preparing a set of text configuration files for the upper- and lower-level subsystems.
All operator workstations are fully equal with respect to displaying the state of the technological equipment and controlling it, and their local databases are synchronized with one another. The absence of a dedicated data server in the technological network is the first major distinguishing feature of the SCADA software. This approach implements the principle of multiple hot standby in automated control systems.
The described SCADA system was designed and developed as a real-time system and is intended for automating technological processes that place strict demands on the control system's reaction time to events. A guaranteed short reaction time to any single event is another major distinguishing feature of the SCADA system.
Protection of the new SCADA system against cyberattacks can be achieved only through strict, category-based control of user access to control functions and by blocking operators' access to all operating system functions. The properties of the Neutrino operating system itself also contribute to security. Software vulnerabilities can be readily detected and eliminated because the source code of the SCADA system is open.
Key words: SCADA, import substitution of software, QNX, Neutrino.
Bibliographic reference: Blagodarniy A.I. Software tools for building automated control systems in the environment of the domestic operating system //journal “Problems of informatics”. 2018, № 2. P. 41-51.
Kulikov I. M., Chernykh I. G.
Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, Novosibirsk, Russia
gooPhi: A NEW CODE FOR NUMERICAL MODELING OF ASTROPHYSICAL FLOWS ON INTEL XEON PHI SUPERCOMPUTERS
UDC 519.6, 524.3
This paper presents gooPhi, a new hydrodynamics code for simulating astrophysical flows on modern Intel Xeon Phi processors with the KNL architecture. As an astrophysical application, the formation of a jellyfish galaxy is considered. The main formation scenarios for such objects are known to be based either on the ram-pressure mechanism of the intergalactic gas or on the galactic wind driven by active galactic nuclei. However, the ram-pressure mechanism can also arise from a collision of galaxies with different masses. This scenario is investigated in the present work using the developed code. A new vectorized numerical method, implemented as program code for massively parallel architectures, is proposed. The hydrodynamic equations are solved numerically with a modification of the original numerical method based on a combination of the operator splitting method, the Godunov method, and the HLL solver. This method combines the advantages of all these methods and has a high degree of parallelism. The parallel implementation is based on a multi-level decomposition of the computation. At the first level, the computational domain is decomposed geometrically by means of the MPI library. At the second level, the computation is distributed among the threads of the Intel Xeon Phi accelerator by means of OpenMP. Within each thread, the computation is vectorized with AVX-512. Note that the construction of the numerical method permits all these kinds of decomposition. The results of verifying the numerical method on three Godunov tests and on the Sedov blast wave test are presented. The purpose of the first test is to check that the contact discontinuity is described correctly. Most methods for solving the hydrodynamic equations yield either oscillation or diffusion of shock waves.
The authors’ method diffuses the shock wave while correctly reproducing the locations of the shock wave and the contact discontinuity and the shape of the rarefaction wave. In the second test, a gas with uniform thermodynamic parameters expands in opposite directions, forming a rarefied region in the center. The test reveals the ability of a method to simulate such a situation in a physically plausible way. It is known from the literature that many methods give an erroneous (unphysical) temperature jump in the region of strong rarefaction, which distorts the resulting solution. The authors’ method successfully simulates the rarefaction region. The main idea of the third test is to check the stability of the numerical method. A pressure jump of five orders of magnitude should reveal the ability of the method to model strong perturbations stably, with the emergence of rapidly propagating shock waves. The authors’ method successfully simulates a strong wave. The Sedov blast wave test is a standard test that verifies the ability of a method and its implementation to reproduce strong shock waves with large Mach numbers. The authors’ numerical method reproduces the position of the shock wave and the density profile quite well. The parallel implementation of the code is described in detail. A performance of 173 gigaflops and a 48-fold speedup are obtained on a single Intel Xeon Phi processor. A scalability of 97 per cent is reached on 16 processors. We consider a scenario in which jellyfish-like galaxies form through the collision of two dwarf spheroidal (dSph) galaxies whose masses differ by an order of magnitude. We also consider the chemical processes taking place in the tails of the galaxies, using both the complete system of chemical reactions and a reduced version that allows an analytical solution to be constructed. It is worth noting that the asymptotics of these solutions are of the same nature.
Behind the front of the massive galaxy, a tail forms in which the Kelvin-Helmholtz instability develops an analog of turbulent flow; as a result, the tail fragments into the tentacles observed in jellyfish galaxies. For characteristic temperatures and a characteristic concentration of neutral atomic hydrogen in the tentacles, the evolution of the concentrations of the various forms of hydrogen was modeled with the ChemPAK code: the hydrogen was overwhelmingly ionized, and the molecular fraction was several thousandths of a percent. Evidently, the formation of molecular hydrogen plays a smaller role than the processes leading to the ionization of hydrogen. For this reason, an analytical solution for the ionization process is of primary interest.
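For readers unfamiliar with the HLL solver named in the abstract, a textbook HLL flux for the one-dimensional Euler equations can be sketched as follows. This is not the gooPhi scheme, which combines HLL with operator splitting and a Godunov-type method in a form not given here. The simple wave-speed estimates, the adiabatic index 1.4, and all function names are assumptions made for illustration.

```python
import numpy as np

GAMMA = 1.4  # adiabatic index of an ideal gas (assumed value)

def conserved(rho, u, p, gamma=GAMMA):
    """Conserved variables: density, momentum, total energy."""
    return np.array([rho, rho * u, p / (gamma - 1.0) + 0.5 * rho * u * u])

def physical_flux(rho, u, p, gamma=GAMMA):
    """Exact Euler flux for a single state."""
    energy = p / (gamma - 1.0) + 0.5 * rho * u * u
    return np.array([rho * u, rho * u * u + p, u * (energy + p)])

def hll_flux(left, right, gamma=GAMMA):
    """HLL flux between two states given as (rho, u, p) tuples."""
    rho_l, u_l, p_l = left
    rho_r, u_r, p_r = right
    c_l = np.sqrt(gamma * p_l / rho_l)   # sound speed on the left
    c_r = np.sqrt(gamma * p_r / rho_r)   # sound speed on the right
    # Simple estimates of the slowest and fastest signal speeds (an assumption).
    s_l = min(u_l - c_l, u_r - c_r)
    s_r = max(u_l + c_l, u_r + c_r)
    f_l = physical_flux(rho_l, u_l, p_l, gamma)
    f_r = physical_flux(rho_r, u_r, p_r, gamma)
    if s_l >= 0.0:
        return f_l                       # all waves move to the right
    if s_r <= 0.0:
        return f_r                       # all waves move to the left
    q_l = conserved(rho_l, u_l, p_l, gamma)
    q_r = conserved(rho_r, u_r, p_r, gamma)
    return (s_r * f_l - s_l * f_r + s_l * s_r * (q_r - q_l)) / (s_r - s_l)

# Initial states of the classical Sod shock tube problem.
sod_flux = hll_flux((1.0, 0.0, 1.0), (0.125, 0.0, 0.1))
```

The HLL flux averages over the wave fan between the two signal speeds, which makes it robust but diffusive at contact discontinuities. This is in line with the abstract's remark that the method diffuses rather than oscillates at discontinuities.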
Key words: Numerical modeling, computational astrophysics, Intel Xeon Phi.
Bibliographic reference: Kulikov I.M., Chernykh I.G. gooPhi: a new code for numerical modeling of astrophysical flows on Intel Xeon Phi supercomputers //journal “Problems of informatics”. 2018, № 2. P. 52-74.