Research on Reliability Evaluation of Big Data System
Rui Cao
College of Computer and Information Engineering
Inner Mongolia Agricultural University
Hohhot, China
e-mail: 18248112616@163.com
Jing Gao*
College of Computer and Information Engineering
Inner Mongolia Agricultural University
Hohhot, China
e-mail: gaojing@imau.edu.cn
Abstract—Big data systems are now applied ever more pervasively, and their reliability is crucial to both academia and industry. However, few studies to date address the reliability of big data systems, and evaluation models are lacking. This paper uses a fault tree to model the reliability of a big data system deployed on the cloud. The fault types are summarized, and the causes of faults are analyzed through experiments. Fault tree analysis (FTA) is then used to evaluate the reliability of the big data system, providing a reference for fault handling and quality assurance of big data systems.
Keywords-big data system; reliability; fault tree; evaluation
I. INTRODUCTION
Following cloud computing, big data has attracted a great deal of attention and has become a focus and a key technology across many sectors of society. In real application environments, failures of various types (such as hardware failures and network failures) inevitably occur in all kinds of systems. Although big data computing systems are fault-tolerant and can tolerate many hardware and software failures, some tasks still fail or suffer performance degradation, which harms the business and the user experience. Reliability is therefore an important indicator of big data systems, and improving it is a basic guarantee of the system's proper operation. Considerable research has been carried out on this topic.
Jun Wang developed a reliability model to study the system reliability of multi-way declustering data layouts. The model is used to analyze the data loss rate and the system repair rate under different data layout schemes, and its key parameters are quantified. The three layouts most widely used in enterprise-scale massive duplex storage systems are compared, and the model is simulated to compare system reliability under different recovery bandwidths [1]. Yoshinobu proposed a jump diffusion model with two-dimensional Wiener processes and evaluated the stability of cloud software using sample paths obtained from the model; based on actual data, numerical examples of dependability optimization based on software maintenance cost were presented, considering big data on cloud computing [2]. Chen proposed a model to assess the big data structure of the Internet of Things; experiments demonstrated the practicability and efficiency of the model, contributing to the improvement of reliability work [3].
So far, few studies have addressed the reliability of large-scale big data systems. This paper therefore analyzes and summarizes the main fault types of big data computing systems and the logical relations among them, builds a reliability model using the fault tree, the technique most widely used in engineering practice, and applies fault tree analysis to study the reliability of big data systems, providing a reference for fault handling and reliability assurance.
II. FAULT TREE ANALYSIS
Fault tree analysis was developed at Bell Telephone Laboratories in the United States in 1961 [4]. It is a graphical deductive method that refines the causes of system failure step by step. The most undesired failure state of the system under study is taken as the target of the analysis (the top event), and the factors that cause it are traced level by level until no further decomposition is needed. It is an internationally recognized and effective method for reliability analysis and fault diagnosis that is intuitive, clear, and logically rigorous [5], and it supports both qualitative and quantitative analysis. The main flow of fault tree analysis is shown in Fig. 1.
[Figure 1 flowchart: Start; System analysis; Failure investigation; Determine the top event; Fault analysis; Determine basic events; Create fault tree; Qualitative analysis and quantitative analysis; Analysis report; End]
Figure 1. Flowchart of fault tree analysis
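To make the gate logic of a fault tree concrete, the following is a minimal sketch (not the authors' implementation) of a tree built from AND/OR gates over basic events. It checks whether the top event occurs for a given set of basic-event states and propagates probabilities upward. The `Gate` class, the function names, the tree shape, the event names X1 to X3, and the probabilities are all hypothetical. The probability routine assumes that basic events are independent and that each appears only once in the tree. When events repeat, quantitative analysis should work from the minimal cut sets instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Gate:
    kind: str               # "AND" or "OR"
    children: List["Node"]  # sub-gates or basic-event names


# A node is either a basic event (identified by name) or a logic gate.
Node = Union[str, Gate]


def occurs(node: Node, state: Dict[str, bool]) -> bool:
    """Return True if the event at `node` occurs, given the basic-event states."""
    if isinstance(node, str):
        return state[node]
    results = [occurs(child, state) for child in node.children]
    if node.kind == "AND":
        return all(results)   # AND gate: every input event must occur
    if node.kind == "OR":
        return any(results)   # OR gate: any single input event suffices
    raise ValueError(f"unknown gate type: {node.kind}")


def probability(node: Node, p: Dict[str, float]) -> float:
    """Propagate probabilities gate by gate.

    Exact only when basic events are independent and each appears once in the tree.
    """
    if isinstance(node, str):
        return p[node]
    probs = [probability(child, p) for child in node.children]
    if node.kind == "AND":
        result = 1.0
        for q in probs:
            result *= q             # all inputs occur
        return result
    if node.kind == "OR":
        none_occur = 1.0
        for q in probs:
            none_occur *= 1.0 - q   # no input occurs
        return 1.0 - none_occur
    raise ValueError(f"unknown gate type: {node.kind}")


# Hypothetical tree: TOP = (X1 AND X2) OR X3
top = Gate("OR", [Gate("AND", ["X1", "X2"]), "X3"])

print(occurs(top, {"X1": True, "X2": False, "X3": False}))  # False
print(occurs(top, {"X1": True, "X2": True, "X3": False}))   # True
print(round(probability(top, {"X1": 0.1, "X2": 0.2, "X3": 0.05}), 6))  # 0.069
```

The recursive structure follows the deductive, top-down reading of the tree: each gate's outcome depends only on the outcomes of its inputs.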
A. Qualitative Analysis
The main purpose of qualitative analysis is to find the minimal cut sets. A cut set is a set of basic events whose joint occurrence causes the top event. A minimal cut set is a cut set that cannot be reduced: if any one of its basic events is removed, the remaining events no longer cause the top event. Minimal cut sets represent the ways in which the system can fail, and they can be used to analyze and calculate the fault tree [6]. Many methods exist for finding minimal cut sets, such as the Monte Carlo method, the ascending method, the descending method, and the Boolean algebra method.
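The Boolean algebra method listed above can be sketched as follows. This is a minimal illustration, not the authors' procedure. The dictionary encoding of the tree, the function names, and the example tree are hypothetical. Each gate is expanded into a sum of products: an OR gate collects the cut sets of all its inputs, and an AND gate combines one cut set from each input. Absorption (A + AB = A) then removes any cut set that strictly contains another.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: gate name -> (gate type, input names);
# any name that is not a key of the dict is treated as a basic event.
Gates = Dict[str, Tuple[str, List[str]]]


def _minimize(sets: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Absorption (A + AB = A): drop any cut set that strictly contains another."""
    unique = set(sets)
    minimal = [s for s in unique if not any(other < s for other in unique)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


def minimal_cut_sets(name: str, gates: Gates) -> List[FrozenSet[str]]:
    """Expand the event `name` into its minimal cut sets of basic events."""
    if name not in gates:
        return [frozenset([name])]          # a basic event is its own cut set
    kind, inputs = gates[name]
    expanded = [minimal_cut_sets(child, gates) for child in inputs]
    if kind == "OR":
        # OR: any cut set of any input is a cut set of the gate
        result = [cs for sets in expanded for cs in sets]
    elif kind == "AND":
        # AND: combine one cut set from every input (Boolean product)
        result = [frozenset()]
        for sets in expanded:
            result = [a | b for a in result for b in sets]
    else:
        raise ValueError(f"unknown gate type: {kind}")
    return _minimize(result)


# Hypothetical tree with a repeated basic event X1:
# TOP = G1 AND G2, G1 = X1 OR X2, G2 = X1 OR X3
gates: Gates = {
    "TOP": ("AND", ["G1", "G2"]),
    "G1": ("OR", ["X1", "X2"]),
    "G2": ("OR", ["X1", "X3"]),
}
print([sorted(s) for s in minimal_cut_sets("TOP", gates)])  # [['X1'], ['X2', 'X3']]
```

In this example, the expansion first yields {X1}, {X1, X2}, {X1, X3}, and {X2, X3}. Absorption then keeps {X1}, since X1 alone causes the top event, and {X2, X3}. Full expansion can grow exponentially with tree size, so this direct approach suits small and medium trees.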
2018 the 3rd IEEE International Conference on Cloud Computing and Big Data Analysis
978-1-5386-4301-3/18/$31.00 ©2018 IEEE