Bibliographic reference: Kalimoldayev M. N., Mazakova A. T., Yashchenko R. V., Mazakov T. Zh., Abdildayeva A. A. Electronic database design for zoological collection of the Republic of Kazakhstan // Journal “Problems of Informatics”. 2021, № 4. P. 40-54. DOI: 10.24412/2073-0667-2021-4-40-54
A. Artiukhov
Novosibirsk State University, 630090, Novosibirsk, Russian Federation
ACTIVE KNOWLEDGE BASE PROTOTYPE ON THE BASIS OF COMPUTATIONAL MODELS
DOI: 10.24412/2073-0667-2021-4-55-66
The ever-growing volume of knowledge, for example in the field of programming, requires people to process knowledge faster, to master it more quickly, and to use it more efficiently. One possible solution is to automate the application of knowledge. However, in many spheres of activity, humanity still accumulates knowledge informally, by means of linguistic systems. Learning to read, understand, and correctly use knowledge presented in this form takes a long time. Consequently, knowledge accumulated in this form cannot be applied directly and automatically. Moreover, if a person who has mastered it does not use it often, it is gradually forgotten and may eventually be lost completely.
In this work, the application of knowledge is considered to be the creation of a computer program that utilizes it, and the automation of knowledge application is understood as the automatic synthesis of such programs.
To solve the problems mentioned above efficiently, the system must store not only the knowledge itself but also the functional connections between individual concepts. In addition, the system must be able to automatically construct, from a task specification, a software application that solves the task.
A system that allows knowledge to be applied automatically is called an Active Knowledge Base. This system is based on the theory of structural synthesis of programs. Knowledge in such a system is represented in the form of computational models, i.e., bipartite graphs in which the vertices of one set correspond to operations and the vertices of the other set correspond to variables. The edges of the graph determine whether a variable is an input of an operation or its result. Variables in such models correspond to values from the subject area being described, and operations are associated with particular program modules at the execution stage. The input and output parameters of the program modules are associated with the variables of the computational model.
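The bipartite structure described above can be sketched in a few lines of Python. This is only an illustration: the class names `Operation` and `ComputationalModel`, the method names, and the toy `right_triangle` model are assumptions made for the example and are not taken from the prototype.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """One vertex of the 'operations' side of the bipartite graph."""
    name: str
    inputs: frozenset   # names of variables the operation reads
    outputs: frozenset  # names of variables the operation computes


@dataclass
class ComputationalModel:
    """Hypothetical in-memory form of a computational model."""
    name: str
    variables: set = field(default_factory=set)
    operations: dict = field(default_factory=dict)

    def add_operation(self, name, inputs, outputs):
        op = Operation(name, frozenset(inputs), frozenset(outputs))
        # Every variable mentioned by an operation becomes a vertex
        # of the 'variables' side of the graph.
        self.variables |= op.inputs | op.outputs
        self.operations[name] = op
        return op

    def edges(self):
        """Yield directed edges: variable -> operation for inputs,
        operation -> variable for results."""
        for op in self.operations.values():
            for var in sorted(op.inputs):
                yield (var, op.name)
            for var in sorted(op.outputs):
                yield (op.name, var)


# Toy model (illustrative only): two operations over the legs a, b.
model = ComputationalModel("right_triangle")
model.add_operation("hypotenuse", ["a", "b"], ["c"])
model.add_operation("area", ["a", "b"], ["s"])
```

Edges never connect two variables or two operations directly, which is what makes the graph bipartite.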
One of the key steps in constructing a software application is building a computation plan based on the specification of the problem. A computation plan is a partially ordered set of operations, where the order relation is consistent with the information dependencies and is defined as "to compute F2, you need to compute F1".
Within this prototype, knowledge in the form of computational models is stored in a dedicated SQLite database, and the system itself is divided into three parts. The first part is a subsystem for storing computational models. It adds new computational models to the database and reads them from it. The second part is a subsystem for constructing computation plans, which builds computation plans according to the specification of the problem. The third part is an execution subsystem. The computation plan is not a complete program that could be executed by the user's OS. To execute it, one needs a system that interprets the plan. In this work, this is done by the execution subsystem.
Computational models are described in special files, where each file is a description of one entity, for example, a variable of a computational model or an operation.
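As a rough illustration of how a storage subsystem might keep such models in SQLite, the sketch below uses Python's standard `sqlite3` module. The table layout (`model`, `variable`, `operation`, `edge`) and the functions `save_model` and `load_model` are assumptions made for this example; the actual schema of the prototype is not given in the text.

```python
import sqlite3

# Hypothetical schema: one row per model, variable, and operation,
# plus one row per edge of the bipartite graph.
SCHEMA = """
CREATE TABLE model (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE variable (
    id       INTEGER PRIMARY KEY,
    model_id INTEGER NOT NULL REFERENCES model(id),
    name     TEXT NOT NULL,
    UNIQUE (model_id, name)
);
CREATE TABLE operation (
    id       INTEGER PRIMARY KEY,
    model_id INTEGER NOT NULL REFERENCES model(id),
    name     TEXT NOT NULL,
    UNIQUE (model_id, name)
);
CREATE TABLE edge (
    operation_id INTEGER NOT NULL REFERENCES operation(id),
    variable_id  INTEGER NOT NULL REFERENCES variable(id),
    role         TEXT NOT NULL CHECK (role IN ('input', 'output')),
    PRIMARY KEY (operation_id, variable_id, role)
);
"""


def save_model(conn, name, operations):
    """Store a model given as {operation: (input_names, output_names)}."""
    cur = conn.cursor()
    cur.execute("INSERT INTO model(name) VALUES (?)", (name,))
    model_id = cur.lastrowid
    var_ids = {}
    for op_name, (ins, outs) in operations.items():
        cur.execute(
            "INSERT INTO operation(model_id, name) VALUES (?, ?)",
            (model_id, op_name),
        )
        op_id = cur.lastrowid
        for role, names in (("input", ins), ("output", outs)):
            for var in names:
                if var not in var_ids:
                    cur.execute(
                        "INSERT INTO variable(model_id, name) VALUES (?, ?)",
                        (model_id, var),
                    )
                    var_ids[var] = cur.lastrowid
                cur.execute(
                    "INSERT INTO edge VALUES (?, ?, ?)",
                    (op_id, var_ids[var], role),
                )
    conn.commit()
    return model_id


def load_model(conn, name):
    """Read a model back as {operation: (input_names, output_names)}."""
    rows = conn.execute(
        """
        SELECT o.name, v.name, e.role
        FROM model m
        JOIN operation o ON o.model_id = m.id
        JOIN edge e ON e.operation_id = o.id
        JOIN variable v ON v.id = e.variable_id
        WHERE m.name = ?
        ORDER BY o.id, e.rowid
        """,
        (name,),
    )
    operations = {}
    for op_name, var_name, role in rows:
        ins, outs = operations.setdefault(op_name, ([], []))
        (ins if role == "input" else outs).append(var_name)
    return operations


# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_model(conn, "demo", {"F1": (["x"], ["y"]), "F2": (["y"], ["z"])})
print(load_model(conn, "demo"))
```

Storing edges in their own table keeps the graph structure queryable, at the cost of a join when a model is read back.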
To construct a computation plan, a problem specification must be described first. It consists of the name of the computational model, a set of input variables, and a set of output variables. Once the specification is complete, the file describing it is passed to the subsystem for constructing computation plans via command-line arguments. Having received it, the subsystem launches the construction algorithm, whose result is a file with a computation plan.
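The text does not spell out the construction algorithm, so the following is only a sketch of one common approach: a forward pass that fires every operation whose inputs are already known, followed by a backward pass that drops operations not needed for the requested outputs. The function name `build_plan`, the dictionary form of the specification, and the operation names are assumptions for illustration. The returned list is one linear order consistent with the relation "to compute F2, you need to compute F1".

```python
def build_plan(operations, spec):
    """Return an ordered list of operation names, or None if the
    requested outputs cannot be derived from the given inputs.

    operations: {name: (input_names, output_names)}
    spec: {"model": str, "inputs": [...], "outputs": [...]}
    """
    given = set(spec["inputs"])
    wanted = set(spec["outputs"])

    # Forward pass: repeatedly fire any operation whose inputs are known.
    known = set(given)
    order = []
    remaining = dict(operations)
    progress = True
    while progress and not wanted <= known:
        progress = False
        for name in sorted(remaining):
            ins, outs = remaining[name]
            if set(ins) <= known:
                del remaining[name]
                if not set(outs) <= known:  # skip operations that add nothing
                    order.append(name)
                    known |= set(outs)
                progress = True
                break

    if not wanted <= known:
        return None

    # Backward pass: keep only operations that contribute to the outputs.
    needed = wanted - given
    plan = []
    for name in reversed(order):
        ins, outs = operations[name]
        if needed & set(outs):
            plan.append(name)
            needed -= set(outs)
            needed |= set(ins) - given
    plan.reverse()
    return plan


# Illustrative model: F2 depends on the result of F1; F3 is irrelevant.
operations = {
    "F1": (["x"], ["y"]),
    "F2": (["y"], ["z"]),
    "F3": (["x"], ["w"]),
}
spec = {"model": "demo", "inputs": ["x"], "outputs": ["z"]}
print(build_plan(operations, spec))
```

In this example F1 precedes F2 in the plan because F2 consumes the variable that F1 produces, and F3 is left out because nothing requested depends on it.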
The general operating principle of the execution subsystem is similar to that of an interpreter. Operations are executed sequentially, as the variables on which they depend become ready. The execution subsystem supports initialization of the input variables of the computation plan with initial values passed to it; to do this, a special file with initialization parameters must be created and passed to the execution subsystem via the command line.
Execution of operations continues until one of two conditions is met: all operations from the computation plan have been executed, or all output variables of the computation plan have been computed. If all operations from the computation plan have been executed but not all output variables have been computed, the computation is terminated and a corresponding warning message is sent to the user. Otherwise, when all output variables have been computed, even if operations remain in the computation plan, plan execution ends normally, the output variables are saved in accordance with their types, and the user is informed of the result of the computation through the standard output stream.
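The execution loop and its stopping conditions can be sketched as follows. This is a minimal sketch under stated assumptions: the function `execute_plan`, the tuple form of a plan entry, and the convention that each program module is a Python callable returning a tuple of results are all hypothetical. The early exit when no operation is ready is also an addition of the sketch and is not described in the text.

```python
import sys


def execute_plan(plan, modules, initial, wanted):
    """Interpret a computation plan.

    plan:    list of (operation_name, input_names, output_names)
    modules: {operation_name: callable returning a tuple of results}
    initial: {variable_name: value} for the input variables
    wanted:  names of the output variables
    Returns (computed_outputs, missing_output_names).
    """
    values = dict(initial)
    pending = list(plan)

    # Stop when every output is computed or no operations are left.
    while pending and not all(var in values for var in wanted):
        ready = next(
            (op for op in pending if all(i in values for i in op[1])),
            None,
        )
        if ready is None:
            break  # assumption of this sketch: nothing can run, so stop
        name, ins, outs = ready
        results = modules[name](*(values[i] for i in ins))
        values.update(zip(outs, results))
        pending.remove(ready)

    missing = [var for var in wanted if var not in values]
    if missing:
        print("warning: not computed:", ", ".join(missing), file=sys.stderr)
    return {var: values[var] for var in wanted if var in values}, missing


# Illustrative modules and plan; F3 is never needed for output z.
modules = {
    "F1": lambda x: (x + 1,),
    "F2": lambda y: (y * 2,),
    "F3": lambda x: (x - 1,),
}
plan = [("F1", ["x"], ["y"]), ("F2", ["y"], ["z"]), ("F3", ["x"], ["w"])]
outputs, missing = execute_plan(plan, modules, {"x": 3}, ["z"])
print(outputs)
```

Here execution ends normally once `z` is available even though F3 is still pending, which mirrors the second stopping condition above.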
The paper presents the results of the development and implementation of the Active Knowledge Base system.
Key words: Active knowledge, Program synthesis, Active knowledge base, Structural synthesis of programs, Automatic programs construction, Knowledge base, Knowledge storage, Computational models.