
  • Moving from a university data warehouse to a lake: models and methods of big data processing

The article examines the transition of universities from data warehouses to data lakes, revealing their potential for processing big data. The introduction highlights the main differences between warehouses and lakes, focusing on their differing philosophies of data management: data warehouses typically hold structured data in a relational architecture, while data lakes store data in its raw form, supporting flexibility and scalability. The section "Data Sources used by the University" describes how universities manage data collected from various departments, including ERP systems and cloud databases. The comparison of data lakes and data warehouses highlights their key differences in data processing and management methods, along with the advantages and disadvantages of each. The article examines in detail the problems and challenges of the transition to data lakes, including security, scale, and implementation costs. Architectural models of data lakes such as "Raw Data Lake" and "Data Lakehouse" are presented, describing different approaches to managing the data lifecycle and aligning with business goals. Big data processing methods in lakes cover the Apache Hadoop platform and current storage formats, and processing technologies are described, including Apache Spark and machine learning tools. Practical examples of data processing and machine learning coordinated through Spark are proposed (an illustrative sketch follows the keywords below). In conclusion, the relevance of the transition to data lakes for universities is emphasized, security and governance challenges are noted, and cloud technologies are recommended to reduce costs and increase productivity in data management.

    Keywords: data warehouse, data lake, big data, cloud storage, unstructured data, semi-structured data
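
    A minimal PySpark sketch of the lake-style flow described in this abstract: raw files are landed unchanged, refined into a columnar format, and a simple model is fitted with the work coordinated through Spark. This is an illustration, not the article's actual pipeline; the storage paths and the columns gpa, credits, and dropout are invented for the example.

        # Raw zone -> curated Parquet -> simple MLlib model.
        from pyspark.sql import SparkSession
        from pyspark.ml.feature import VectorAssembler
        from pyspark.ml.classification import LogisticRegression

        spark = SparkSession.builder.appName("university-lake-sketch").getOrCreate()

        # Raw zone: ingest source files exactly as delivered, schema inferred on read.
        raw = (spark.read.option("header", True).option("inferSchema", True)
               .csv("s3a://university-lake/raw/enrollment/"))  # hypothetical path

        # Curated zone: rewrite into Parquet, a columnar format suited to analytics.
        raw.write.mode("overwrite").parquet("s3a://university-lake/curated/enrollment/")
        curated = spark.read.parquet("s3a://university-lake/curated/enrollment/")

        # Toy model: predict a dropout flag from two illustrative numeric columns.
        assembler = VectorAssembler(inputCols=["gpa", "credits"], outputCol="features")
        train = assembler.transform(curated).select("features", "dropout")
        model = LogisticRegression(labelCol="dropout").fit(train)

    Schema-on-read is the design point: the raw zone keeps every source intact, and structure is imposed only when the curated copy is produced.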

  • On the Development of Secure Applications Based on the Integration of the Rust Programming Language and PostgreSQL DBMS

Currently, key aspects of software development include the security and efficiency of the applications being created. Special attention is given to data security and operations involving databases. This article discusses methods and techniques for developing secure applications through the integration of the Rust programming language and the PostgreSQL database management system (DBMS). Rust is a general-purpose programming language that prioritizes safety as its primary objective. The article examines key concepts of Rust, such as strict typing, the RAII (Resource Acquisition Is Initialization) programming idiom, macro definitions, and immutability, and how these features contribute to the development of reliable and high-performance applications when interfacing with databases. The integration with PostgreSQL, shown to be both straightforward and robust, is analyzed, highlighting its capacity for efficient data management while maintaining a high level of security, thereby mitigating common errors and vulnerabilities. Rust is currently used less than popular languages such as JavaScript, Python, and Java, owing in part to its steep learning curve. However, major companies see its potential: Rust modules are being integrated into operating system kernels (Linux, Windows, Android), Mozilla is developing features for Firefox's Gecko engine, and Stack Overflow surveys show rising usage of Rust. A practical example involving the dispatch of information related to class schedules and video content illustrates the advantages of using Rust in conjunction with PostgreSQL to create a scheduling management system, ensuring data integrity and security (a language-neutral sketch of the parameterized-query pattern follows the keywords below).

    Keywords: Rust programming language, memory safety, RAII, metaprogramming, DBMS, PostgreSQL
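
    The abstract's memory-safety argument is specific to Rust, but its database-facing core, parameterized statements executed inside transactions, is language-neutral. The sketch below shows that pattern in Python with psycopg2 against a hypothetical schedule table; the article itself implements it in Rust.

        # Parameterized INSERT inside a transaction (an illustrative stand-in for
        # the Rust + PostgreSQL pairing discussed in the article).
        import psycopg2

        conn = psycopg2.connect("dbname=university user=app")  # hypothetical DSN
        try:
            with conn:  # commits on success, rolls back on exception
                with conn.cursor() as cur:
                    cur.execute(
                        "INSERT INTO schedule (course, starts_at, video_url)"
                        " VALUES (%s, %s, %s)",  # placeholders, never string concatenation
                        ("Databases", "2025-09-01 10:00", "https://example.org/lecture1"),
                    )
        finally:
            conn.close()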

  • Analysis of the directions of application of predictive analytics in railway transport

The railway transport industry has achieved significant results in various fields of activity through the introduction of predictive analytics. Predictive analytics systems use data from a variety of sources, such as sensor networks, historical records, and weather conditions. The article discusses the key areas of application of predictive analytics in railway transport, as well as the advantages, challenges, and prospects for further development of this technology in railway infrastructure (a toy forecasting sketch follows the keywords below).

    Keywords: predictive analytics in railway transport, passenger traffic forecasting, freight optimization, maintenance optimization, inventory and supply management, personnel management, financial planning, big data analysis
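
    As a toy illustration of the forecasting direction surveyed above (not a method from the article), the following scikit-learn snippet fits a regressor on lagged daily passenger counts; the data is synthetic.

        # Predict today's passenger count from the previous seven days.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        days = np.arange(400)
        daily = 1000 + 200 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 50, 400)

        lags = 7
        X = np.array([daily[i:i + lags] for i in range(len(daily) - lags)])
        y = daily[lags:]

        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(X[:-30], y[:-30])  # hold out the last 30 days for evaluation
        mae = np.abs(model.predict(X[-30:]) - y[-30:]).mean()
        print(f"mean absolute error on held-out days: {mae:.1f} passengers")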

  • Improving the efficiency of working with databases in PHP based on the use of PDO

PHP Data Objects (PDO) represents a significant advancement in PHP application development by providing a universal approach to interacting with database management systems (DBMSs). The article opens with an introduction describing the need for PDO, available as of PHP 5.1, which allows PHP developers to interact with different databases through a single interface, minimising the effort involved in portability and code maintenance. It discusses how PDO can improve security by supporting prepared queries, which are a defence against SQL injection. The main part of the paper analyses the key advantages of PDO, such as its versatility in connecting to multiple databases (e.g. MySQL, PostgreSQL, SQLite), the ability to use prepared queries to enhance security, improved error handling through exceptions, transactional support for data integrity, and the ease of learning the PDO API even for beginners. Practical examples are provided, including preparing and executing SQL queries, setting attributes via the setAttribute method, and performing operations in transactions, emphasising the flexibility and robustness of PDO. In addition, the paper discusses best practices for using PDO in complex and high-volume projects, such as using prepared queries for bulk data insertion, query optimisation, and stream processing for efficient handling of large amounts of data. The conclusion characterises PDO as the preferred tool for modern web applications, offering a combination of security, performance, and code quality enhancement. The authors also suggest directions for future research regarding security test automation and the impact of different data models on application performance (an illustrative sketch of the prepared-query pattern follows the keywords below).

    Keywords: PHP, PDO, databases, DBMS, security, prepared queries, transactions, programming
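
    PHP itself is outside the scope of this sketch, but Python's standard sqlite3 module follows the same prepare/bind/transaction pattern the abstract attributes to PDO, so it serves as a compact stand-in. Table and column names are invented.

        # PDO-style pattern: placeholders, bulk insert, one transaction, exceptions.
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

        rows = [(1, "alice"), (2, "bob"), (3, "carol")]
        try:
            with con:  # single transaction: commit on success, rollback on error
                con.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
        except sqlite3.IntegrityError as exc:  # errors surface as exceptions,
            print("bulk insert failed:", exc)  # much like PDO::ERRMODE_EXCEPTION

        # Parameterized read-back: the same defence against SQL injection.
        for (name,) in con.execute("SELECT name FROM users WHERE id = ?", (2,)):
            print(name)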

  • Reliability Model of RAID-60 Disk Arrays

The general characteristics of the innovative RAID-60 data storage system, which combines the best aspects of RAID-6 and RAID-0E technologies, are presented, together with a reliability model of this data storage system. The main purpose of this combination is to provide outstanding performance with maximum data redundancy. The article discusses in detail the structural analysis, the advantages, and various usage scenarios of the RAID-60 data storage system and the proposed model of its reliability. An important aspect is the comparison of the RAID-60 system with other widespread variants of data storage systems, such as RAID-0, RAID-1, and RAID-5, as well as with the reliability models of these systems. Particular attention is paid to the formula for calculating the mean time to failure of a disk array. For completeness of the analysis, attention is also paid to plotting the probability of a RAID-60 failure, P(t), over time t. This graph is an important tool for visualizing the dynamics of reliability of data storage systems (a standard approximation is sketched after the keywords below).

    Keywords: RAID-60, reliability, disk array, data redundancy, manufacturer, parity blocks, data storage
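
    For reference, a standard textbook approximation of the quantities this abstract mentions; the article's own model may differ in its details. Assuming the array stripes data across k independent RAID-6 groups of n disks each, with per-disk mean time to failure MTTF_d and mean repair time MTTR, the mean time to data loss and the failure-probability curve are approximately:

        \mathrm{MTTDL}_{\mathrm{RAID6}} \approx \frac{\mathrm{MTTF}_d^{3}}{n(n-1)(n-2)\,\mathrm{MTTR}^{2}},
        \qquad
        \mathrm{MTTDL}_{\mathrm{RAID60}} \approx \frac{\mathrm{MTTDL}_{\mathrm{RAID6}}}{k},
        \qquad
        P(t) \approx 1 - e^{-t/\mathrm{MTTDL}_{\mathrm{RAID60}}}

    The first expression counts the triple-failure sequences that defeat a single RAID-6 group; striping multiplies the exposure by the number of groups, and P(t) follows under a constant-failure-rate assumption.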