control reflecting policies of all participating organizations,
distribution of servers with consideration of network
scalability, and so on.
In the current study, after defining the faults handled
by SPLICE/NM, we discuss the issues of cooperative
management among the distributed servers, and then
evaluate a working prototype based on the SPLICE/NM
design and identify its problems.
2. Fault Definition in SPLICE/NM
Network management covers a wide field, from the
configuration of network equipment to user account
generation and billing. Given the variety of existing
technologies and approaches, it is not feasible to discuss
network management as a whole.
Therefore, in OSI, network management as a whole is
subdivided into five areas: configuration management,
performance management, fault management, security
management, and accounting management. SPLICE/NM
concentrates on fault management and is intended to
automate the fault recovery tasks conventionally performed
by managers. However, many kinds of faults can occur on a
network, and different people perceive them differently.
For example, although network performance deterioration
might be considered a fault, it can be difficult to
separate cases of deterioration that are only a temporary
condition from those that, beyond a certain point, should
be regarded as a fault. Solving this problem would require
performance management technology, such as continuous
monitoring of network traffic. In addition, in a
large-scale network such as the Internet, some problems
cannot be solved without spending time and money, for
reasons such as communication bottlenecks located outside
the current organization.
Therefore, in SPLICE/NM, we have limited the fault
types and concentrated on network connectivity. In other
words, we define a fault as a condition in which a specific
host or network cannot be reached, or a specific service
cannot be used.
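As a minimal sketch of this definition, and assuming that the
service in question is offered over TCP, a connectivity fault
can be tested by attempting to open a connection. The names
Target, service_usable, and has_fault are hypothetical
illustrations and are not part of SPLICE/NM.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A host and the TCP port of one service it offers (hypothetical)."""
    host: str
    port: int


def service_usable(target: Target, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the service can be opened."""
    try:
        with socket.create_connection((target.host, target.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def has_fault(target: Target, timeout: float = 2.0) -> bool:
    """A fault, in the sense used here, is the service being unusable."""
    return not service_usable(target, timeout)
```

A failed connection attempt does not by itself tell whether the
host, the network path, or the service is at fault; separating
these cases belongs to the diagnosis phase.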
In addition, the network equipment targeted by
SPLICE/NM is a gateway functioning as a connecting point
between networks, or a server offering network services.
We use the term management object to refer to such
equipment.
3. Fault Management Model in SPLICE/NM
Viewed along the time axis, fault management in a
network can be broken down into the following phases:
1. Fault discovery
2. Fault diagnosis (identification of cause and development
of countermeasures)
3. Recovery operations
4. Recording of operations results
In a conventional fault management system, each
phase is handled by a manager or by a user of the network.
Among these phases, fault discovery has long been a
target of attempts at automation, resulting in network
monitoring systems based on SNMP. In addition, there is the
trouble ticket system, which provides an environment for
cooperative work among multiple managers. This system can
be used for recording the results of recovery operations.
The SPLICE/NM system proposed in the current
study concentrates on the phases of fault diagnosis and
recovery operations, aiming to automate the operations
required in these phases. Figure 1 illustrates a fault
management model for the case of a SPLICE/NM system combined
with a network monitoring system and a trouble ticket
system.
In the fault management model shown in Fig. 1, the
management object is normally monitored by the network
monitoring system. If a fault occurs in a management
object, it is detected by the network monitoring system
and reported to SPLICE/NM. SPLICE/NM diagnoses the fault
in the indicated management object and conducts recovery
operations, and the results are reported to the trouble
ticket system. If the fault cannot be fully recovered
during this process, the trouble ticket system contacts
the manager, and the recovery operations are delegated to
the manager. Finally, on recovering from the fault, the
manager records the results of the operations.
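The flow described above can be sketched as follows. This is
only an illustration under stated assumptions: the names
TroubleTicket and handle_fault, and the diagnose and recover
callbacks, are hypothetical and do not reflect the actual
SPLICE/NM interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TroubleTicket:
    """Record of one fault, as kept by a trouble ticket system (hypothetical)."""
    target: str
    log: List[str] = field(default_factory=list)
    escalated: bool = False  # True once the fault is handed to a manager


def handle_fault(
    target: str,
    diagnose: Callable[[str], str],
    recover: Callable[[str, str], bool],
) -> TroubleTicket:
    """Run diagnosis and recovery for a fault reported by the monitor."""
    ticket = TroubleTicket(target)

    # Phase 2: fault diagnosis (identify the cause).
    cause = diagnose(target)
    ticket.log.append(f"diagnosis: {cause}")

    # Phase 3: recovery operations.
    recovered = recover(target, cause)
    ticket.log.append(f"recovery: {'ok' if recovered else 'failed'}")

    # If automatic recovery fails, delegate to the manager.
    if not recovered:
        ticket.escalated = True
        ticket.log.append("escalated to manager")
    return ticket
```

Passing the diagnosis and recovery steps in as callbacks keeps
the sketch independent of how either step is actually carried
out, which the text has not yet specified at this point.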
For implementing the fault management model using
SPLICE/NM, it is important to know how the information
Fig. 1. Management model of SPLICE/NM.