3. Problems with Formal Verification
This section will examine some of the reasons why formal methods have failed to provide the silver bullet
which they initially seemed to promise.
3.1. Problems with Tools and Scalability
The tools used to support formal methods arose from an academic research environment characterised by a
small number of highly skilled users (usually the developers of the tools) and by extension an environment in
which it didn’t really matter if the tools weren’t quite production grade, difficult to use, slow, or extremely
resource-hungry (they're only research prototypes, after all). The experimental background of the tools
often led to a collection of poorly-integrated components built by different researchers, whose specification
languages varied over time, contained overlapping and unclear features contributed by various sources, or
differed depending on which researcher's verification tool was being employed. In systems like HDM this
led to assessments by independent observers that "at present an outsider can not use HDM to design,
implement, and verify a program from beginning to end" [34]. In addition the tools were
tested on small problems (usually referred to somewhat disparagingly as “toy problems”) which were
targeted more at exercising the tools than exercising the problem. This section covers some of the issues
which arose because of this.
Both of the formal methods endorsed for use with the Orange Book, Ina Jo/Ina Mod (collectively known as
the Formal Development Methodology, FDM) [35][36][37] and Gypsy (as part of the Gypsy Verification
Environment, GVE) [38][39][40] date from the 1970s and have seen little real development since then.
Both are interactive environments, which isn't a feature but a polite way of indicating that they require a lot
of tedious user intervention and manual effort in order to function. Not only do the tools require extensive
assistance from the user, but their difficult nature, and that of the task itself, requires expert users to work
with them, and once they're finished it requires another set of expert users to verify and evaluate the results
[41][42][43][44].
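As a small illustration of what "interactive" means in practice (a sketch using a modern proof assistant rather than the era's tools), even a trivial property of a two-line function can't simply be asserted; the user must choose the induction scheme and direct the rewriting by hand:

```lean
-- Hypothetical toy example: a user-defined list-append function.
def append : List Nat → List Nat → List Nat
  | [],      ys => ys
  | x :: xs, ys => x :: append xs ys

-- Even the "obvious" fact that appending the empty list is a no-op
-- must be proved by explicitly invoking induction and telling the
-- prover to unfold the definition and apply the hypothesis.
theorem append_nil (xs : List Nat) : append xs [] = xs := by
  induction xs with
  | nil => rfl
  | cons x xs ih => simp [append, ih]
```

Scaled up from a three-line lemma to a kernel specification, this kind of step-by-step guidance is where the tedious user intervention comes from.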
Another early effort, the Boyer-Moore theorem prover, has been described as “like trying to make a string go
in a certain direction by pushing it […] Proof becomes a challenge: to beat the machine at its own game (the
designers and some others have chalked up some very high scores though)" [45]. Attempts to use the BM
prover in practice led to the observation that "the amount of effort required to verify the system was very
large. The tools were often very slow, difficult to use, and unable to completely process a complex
specification. There were many areas where tedious hand analysis had to be used” [46].
Many of the tools originate from a research environment and are of a decidedly experimental nature, which
contributes to the difficulty of using them. Several kernel verifications have had to be abandoned (or at least
restarted so that they could be approached in a different manner) because the tools being used for the
verification weren't quite up to handling the problem. This wasn't helped by the fact that the tools were
often built using whatever other tools happened to be available or handy rather than the tools which would
typically be found in a production environment. For example the Gypsy compiler which was used in some
kernel validations was originally implemented as a cross-compiler into Bliss, which was only available on a
limited number of DEC platforms, making it extremely difficult even to get access to the right tools for the
job.
Working at a very high level of abstraction can produce a (hopefully) correct and (more or less) verifiable
specification which then needs to be completely rewritten by system developers in order to make it
implementable [47]. Some researchers have suggested performing the implementation directly in a
specification language; however, this is equivalent to a standard implementation created with an even-
higher-level language. Although this may eliminate many specification bugs, what will be left is a class of
even tougher specification bugs which require an even higher-level specification system to expose [48]. In
addition, to make it workable the specification language will typically have been modified to remove
features which are considered unsafe, but the downside of removing unsafe features such as pointer
variables is that all data structures which are manipulated by a routine will need to be passed in and out
explicitly as parameters, resulting in huge parameter lists for each routine and a high overhead from moving
all the data around.
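To illustrate the overhead (a hypothetical sketch in C, not any actual specification language), compare a routine that updates a record through a pointer with the same routine in a pointer-free style, where the whole structure must be passed in explicitly and the modified copy passed back out as the result:

```c
/* Hypothetical record table manipulated by a routine. */
struct table {
    int entries[64];
    int count;
};

/* With pointer variables: the routine mutates shared state in place,
   so only a single machine word (the address) is passed. */
void add_entry_inplace(struct table *t, int value)
{
    t->entries[t->count++] = value;
}

/* Without pointer variables: the entire structure must be passed in
   as a parameter and the modified version returned, so every call
   copies all of the table's data in both directions. */
struct table add_entry_copied(struct table t, int value)
{
    t.entries[t.count++] = value;   /* operates on a private copy */
    return t;                        /* whole table copied back out */
}
```

In a real system the "table" would be a substantial piece of system state, and with several such structures manipulated by each routine the parameter lists, and the copying cost, grow accordingly.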
Another approach which has been suggested is to automatically verify (in the “formal proof of correctness”
sense) the specification against the implementation as it is compiled [49]. This has the disadvantage that it
can't be done using any known technology.