Engineering Journal of Don

Moving from a university data warehouse to a lake: models and methods of big data processing
The article examines the transition of universities from data warehouses to data lakes and the potential of data lakes for processing big data. The introduction outlines the main differences between warehouses and lakes, focusing on their differing philosophies of data management. Data warehouses typically hold structured data in a relational architecture, while data lakes store data in its raw form, which supports flexibility and scalability. The section "Data Sources used by the University" describes how universities manage data collected from various departments, including ERP systems and cloud databases. The comparison of data lakes and data warehouses covers their key differences in data processing and management methods, along with their advantages and disadvantages. The article examines in detail the challenges of the transition to data lakes, including security, scale and implementation costs. Architectural models of data lakes such as "Raw Data Lake" and "Data Lakehouse" are presented, describing different approaches to managing the data lifecycle and meeting business goals. Big data processing methods in lakes cover the use of the Apache Hadoop platform and current storage formats. Processing technologies are described, including Apache Spark and machine learning tools. Practical examples of data processing and of applying machine learning, with work coordinated through Spark, are proposed. The conclusion emphasizes the relevance of the transition to data lakes for universities, highlights security and management challenges, and recommends cloud technologies to reduce costs and increase productivity in data management.

Keywords: data warehouse, data lake, big data, cloud storage, unstructured data, semi-structured data
Reliability Model of RAID-60 Disk Arrays
The article presents the general characteristics of the RAID-60 data storage system, which combines RAID-6 and RAID-0 technologies, together with a reliability model of this system. The main purpose of the combination is to provide high performance with maximum data redundancy. The article discusses in detail the structure, advantages and usage scenarios of the RAID-60 system and the proposed reliability model. The RAID-60 system is also compared with other widespread storage configurations, such as RAID-0, RAID-1 and RAID-5, and with the reliability models of those systems. Particular attention is paid to the formula for calculating the mean time to failure of a disk array. To complete the analysis, the probability of a RAID-60 failure, P(t), is plotted as a function of time t. This graph is an important tool for visualizing how the reliability of data storage systems changes over time.

Keywords: RAID-60, reliability, disk array, data redundancy, manufacturer, parity blocks, data storage
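
The abstract does not reproduce the article's formula, so the following is only a minimal sketch of how a failure probability P(t) and a mean time to failure might be computed under a common textbook simplification, not the authors' model. It assumes independent disks with exponentially distributed lifetimes and no repair or rebuild; the function names and the parameters `groups`, `disks_per_group` and `failure_rate` are hypothetical.

```python
from math import comb, exp


def raid6_group_reliability(t, n_disks, failure_rate):
    """Probability that one RAID-6 group is still readable at time t.

    Assumption (not from the article): disks fail independently with an
    exponential lifetime and are never repaired. A RAID-6 group tolerates
    up to two failed disks.
    """
    r = exp(-failure_rate * t)  # survival probability of a single disk
    q = 1.0 - r                 # failure probability of a single disk
    # Sum the binomial probabilities of exactly 0, 1 or 2 failed disks.
    return sum(comb(n_disks, k) * q**k * r**(n_disks - k) for k in range(3))


def raid60_failure_probability(t, groups, disks_per_group, failure_rate):
    """P(t) for RAID-60: the RAID-0 stripe is lost as soon as any group is lost."""
    group_ok = raid6_group_reliability(t, disks_per_group, failure_rate)
    return 1.0 - group_ok**groups


def raid6_group_mttf(n_disks, failure_rate):
    """Mean time until the third disk failure in one group (no repair)."""
    # With k disks already failed, the next failure arrives at rate (n - k) * lambda.
    return sum(1.0 / ((n_disks - k) * failure_rate) for k in range(3))
```

Evaluating `raid60_failure_probability` over a range of `t` values gives the points for a P(t) curve of the kind the abstract mentions. A model with disk replacement and rebuild times would need a Markov chain rather than this closed form.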
