High Assurance (for Security or Safety) and Free-Libre / Open Source Software (FLOSS)... with Lots on Formal Methods / Software Verification

by David A. Wheeler, 2006-06-02 (updated 2008-07-22)

This paper discusses some relationships between high assurance software (for security or safety) and free-libre / open source software (FLOSS). In particular, it shows that many tools for developing high assurance software have FLOSS licenses, by identifying FLOSS tools for software configuration management, testing, formal methods, analysis implementation, and code generation. It particularly focuses on formal methods, since formal methods are rarely encountered outside of high assurance. However, while high assurance components are rare, FLOSS high assurance components are even rarer. This is in contrast with medium assurance, where there are a vast number of FLOSS tools and FLOSS components, and the security record of FLOSS components is quite impressive. The paper then examines why this is the circumstance. The most likely reason for this appears to be that decision-makers for high assurance components are not even considering the possibility of FLOSS-based approaches. The paper concludes that in the future, those who need high assurance components should consider FLOSS-based approaches as a possible strategy. The paper suggests that government-funded software development in academia normally be released under a GPL-compatible FLOSS license (not necessarily the GPL), to enable others to build on what tax dollars have paid for, and to prevent the vast waste of effort caused by current processes. Finally, developers who want to start new FLOSS projects should consider developing new high-assurance components or tools; given the increasing attacks and dependence on computer systems, having more high assurance programs available will be vital to everyone’s future.

Introduction

This paper discusses some relationships between high assurance software (for security or safety) and free-libre / open source software (FLOSS). First, let’s define these key terms.

Definitions

Free-libre / open source software (FLOSS) is software whose license gives users the freedom to run the program for any purpose, to study and modify the program, and to redistribute copies of either the original or modified program (without having to pay royalties to previous developers). It’s also called libre software, Free Software, Free-libre software, Free-libre / Open source (FLOS) software, or open source software / Free Software (OSS/FS). The term “Free Software” can be confusing, because there may be a fee for the Free Software; the term “Free” is derived from “freedom” (libre), not from “no price” (gratis). More formal definitions of open source software and free software (in the sense of libre software) are available on the web. Examples include the Linux kernel, the gcc compilation suite, the Apache web server, and the Firefox web browser. Many FLOSS programs are commercial, while many others are not.

For purposes of this paper, let’s define “high assurance software” as software where there’s an argument that could convince skeptical parties that the software will always perform or never perform certain key functions without fail. That means you have to show convincing evidence that there are absolutely no software defects that would interfere with the software’s key functions. Almost all software built today is not high assurance; developing high assurance software is currently a specialist’s field (though I think all software developers should know a little about it). To develop high assurance software you must apply many development techniques much more rigorously, such as configuration management and testing. You need to use implementation tools you can trust your life to. And in practice, I believe that you need to use mathematical techniques called “formal methods” for a product to be high assurance, for the simple reason that it’s usually hard to create truly convincing arguments otherwise. A significant fraction of this paper covers formal methods, since they are rarely encountered outside of high assurance. There isn’t a single universal definition of the term high assurance, and products have been labelled “high assurance” without having any formal methods applied to them. But this definition should be sufficient for my purpose. Other terms used for this kind of software are “high integrity” and “high confidence” software.

Usually high assurance software is developed because of serious safety or security concerns. Strictly speaking, software by itself has no safety or security properties -- it can only be safe or secure in the context of a larger system. A nice discussion of this issue from the safety point of view is in Nancy Leveson’s book Safeware (see section 8.3). But software tends to control safety and security systems, and such software is often called “safe software” or “secure software”. For this paper I’ll talk about the security or safety of the software, with the understanding that this only makes sense if you understand the system that the software will be part of.

For purposes of this paper, the identity of the software’s supplier is not part of the definition of high assurance. By supplier, I mean the provenance (origin) and pedigree (lineage -- who the software passed through) of the software. By keeping the supplier identity out of the definition of high assurance, I can concentrate on technological issues. In reality, there may be some people who you wouldn’t trust even if they’d “proved” their code correct... so in practice it’s quite reasonable to ask questions like, “Who developed or modified the software? Can I trust them?” For both FLOSS and proprietary software, provenance and pedigree can be considered in exactly the same way -- in both cases, you’d consider who originally developed the software (in terms of each change), and who controlled the software from development through deployment to you. In particular, you’d consider who has rights to modify the software repository, and whether or not you trusted them. You might also consider how well the development environment itself is protected from attack. Don’t be fooled into thinking that FLOSS is “riskier” than proprietary software because it can be legally modified by anyone. Anyone can modify a proprietary program with a hex editor, too -- but that doesn’t mean you’ll use that modified version. The issue with suppliers is who controls your supply chain, and FLOSS often has an advantage in provenance and pedigree (because it is often easier with FLOSS to determine exactly who did what, and who has modification rights). But provenance and pedigree issues have to be handled on a case-by-case basis, and trying to cover those issues as well would over-complicate this paper. For example, the whole issue of “who trusts who” varies depending on the organizations and the circumstances. In an ideal world this wouldn’t matter, because the proofs would be true and could be rechecked everywhere. Given the massive move to globalization, I think that would be worth trying to make who created the software irrelevant. In any case, let’s concentrate on the technical aspects in this paper.

Contrasting levels of assurance

More generally, assurance is simply the amount of confidence we have that the software will do and not do the things it should and should not. Sometimes the things it should not do, what I call "negative requirements" , are the most important. Any particular piece of software can be considered by someone to be low, medium, or high assurance. This is obviously a qualitative difference; two products could be in the same assurance category, yet one be more secure than another.

For purposes of this paper, let’s define medium assurance to be software which doesn’t reach high assurance levels, but where there has been significant effort expended to find and remove important flaws through review, testing, and so on. Note that when creating medium assurance software, there’s no significant effort to prove that there are no flaws in it, merely an effort to find and fix the flaws.

Medium assurance software must undergo testing and/or peer review to reduce the number of flaws. Such mechanisms can be really valuable in reducing flaws, and eliminate a great many of them, but the normal method of using these mechanisms won’t guarantee their absence. You can eliminate some types of flaws completely by some activities, e.g., you can completely eliminate buffer overflows by choosing almost any programming language other than C or C++... but doing so would not eliminate all flaws! In particular, testing by itself is impractical to prove anything about real software. After all, exhaustively testing a program that just adds three numbers would take 2.5 billion years (assuming each number was 32 bits, you could run the program a billion times per second, and you used 1,000 computers for testing). Real programs are much more complicated than this, which is why testing by itself can’t reach the highest levels of assurance.

The differences between medium and high assurance (as I mean the terms in this paper) seem to confuse people, so let me contrast them directly. When developing a high assurance program, the program is presumed to be wrong (guilty) until a preponderance of evidence proves that it’s correct. When a medium assurance program is being developed, it is spot-checked in various ways throughout its development to try to detect and remove some of its worst defects. Medium assurance software development normally leave some defects in the program afterward. Few like the presence of latent defects, but few people are willing to pay for (or invest the time) for high assurance development techniques today in most software.

It’s reasonable to think that as technology improves, high assurance programs will become more common. But even today there are some situations where medium assurance is not enough. Typically this is where people’s lives, or the security of a nation, is at stake. In such cases, some of today’s customers need serious evidence that there are no critical defects of any kind. They need something different: High assurance.

High assurance challenges and standards

Ideally, all software would be high assurance, but ideally we’d all live in mansions. It’s very difficult to create truly high assurance software. The configuration management and testing requirements are usually more severe (and time-consuming) than those for other kinds of software. Applying formal methods requires significant mathematical training that most software developers don’t have, and can be very time-consuming.

Because of these challenges, high assurance software is usually only developed for critical security or safety components. When creating critical security or safety components, a number of regulations are often imposed.

High assurance software for security is the point of the Common Criteria for IT Security Evaluation (ISO standard 15408) when you select EAL 6 or higher -- and EAL 6 is really a compromise! For purposes of this paper, medium assurance software is in the EAL 4 to 5 range of the Common Criteria, so Red Hat Linux and Microsoft Windows would both be considered medium assurance products. I consider EAL 2 (or less) to be low assurance; EAL 3 is a compromise, but it’s basically low assurance.

Here are some other standards that are often mentioned in the security or safety world, which often impact this kind of development:

Organization of paper

The rest of this paper looks at FLOSS tools that can be used to create high assurance components (there are many), and FLOSS components that are high assurance themselves (they are rare). It then contrasts this situation with medium assurance -- there are many medium assurance FLOSS tools, and FLOSS components with impressive results. The paper then speculates why this is the circumstance, and then concludes.

Tools to create high assurance components

It turns out that there are a lot of FLOSS tools that can be used to help develop high assurance software. To prove that, I’ve identified a few important tool categories, and for each category I identify several FLOSS tools. The tool categories I discuss below are configuration management tools, testing tools, formal methods (specification and proof) tools, analysis implementation tools, and code generation tools.

There are many other categories of tools, and many other specific FLOSS tools, that are not listed below. But the discussion below should prove my point that there are many FLOSS tools that can be used to help develop high assurance components.

Configuration management tools

There are many FLOSS software configuration management (SCM) tools, indeed, I’ve written a review of several FLOSS SCM tools. (I’ve also written a paper discussing SCM and security.)

CVS is an old and still very widely-used SCM tool. I suspect that most software worldwide, both proprietary and FLOSS, is still managed by CVS as of 2006. Subversion (SVN) is the SCM tool rewritten as a replacement for CVS, and it’s widely used, too. But the list of FLOSS SCM tools is amazingly long, including GNU Arch, git/Cogito, Bazaar, Bazaar-NG, Monotone, mercurial, and darcs (see my paper for a longer list). Clearly, there’s no problem finding a FLOSS SCM tool.

Testing tools

All developers test their software, but high assurance software requires much more testing to gain confidence in it. But again, there’s a massive number of FLOSS tools that support testing. In fact, there are so many FLOSS tools for testing that there’s a website (opensourcetesting.org) dedicated to tracking them; as of April 2006 they list 275 tools! This ranges from bug-tracking tools like Bugzilla, to frameworks for test scripts like DejaGnu.

Many high assurance projects are required to meet specific measurable requirements on their tests. One common measure of testing is “statement coverage” (aka “line coverage”), the percentage of program statements that are exercised by at least one test. One problem with the statement coverage measure is that statements that have decisions, such as the “if” statement, can cause different paths. Thus, another common measure of testing is “branch coverage” the percentage of “branches” from decision points that are covered. Branch coverage has its weaknesses too, so there are many other test measures as well -- but statement and branch coverage are the two most commonly-used measures, so we’ll start with them.

Some experts believe that unit testing (low-level tests) should achieve 100% statement coverage and 100% branch coverage, with the simple argument that if you’re not even covering each statement and each branch, your testing is poor. Most others argue, however, that 80%-90% in each is adequate -- because the effort to create tests to meet the last percent is very large and less likely to find problems than by spending the effort in other ways. No matter what, in my opinion you should create your tests first and then measure coverage -- don’t write your tests specifically to get good coverage values. That way, you’ll often gain insight into what portions of the code are hard to test or don’t work the way you thought they would. That insight will help you create much better additional tests to bring the values up to whatever your project requires.

(Oh, and why measure both statement and branch coverage? It turns out it's possible to meet one without the other. For example, an "if" statement with a "then" clause but no "else" caluse might have all its tests yield true.. in which case all the statements are covered, but not all the braches are covered (the "false" branch is not covered). Normally, when you cover all branches you cover all statements, but there are special cases where that is not true. For example, if your program (or program fragment) doesn't contain any branches, or if there is an exception handler without any branches in its body, you can have all branches covered but not all statements covered. Exception handlers might be considered a branch, but that interpretation is not universal.)

There have been several recent developments in testing that improve test efficiency:

Even in the case of test case measurement, there are FLOSS tools that can meet this need. There are several FLOSS “test coverage” tools, such as gcov, that can report which statements or which branches were not exercised by your test suite.

Formal methods tools

Many software developers have no idea what “formal methods” are. Yet my definition of high assurance implies that we’ll usually need to use “formal methods” to create high assurance software. This section explains what formal methods are, shows that there are lots of FLOSS tools even in this area, and then discusses some of the implications.

Formal methods: Introduction

Formal methods, broadly, are the application of rigorous mathematical techniques to software development (see An International Survey of Industrial Applications of Formal Methods for a lengthier definition and discussion). Ideally, we’d like a rigorous mathematical specification stating exactly what we want the program to do and not do, and then prove all the way down to the machine code that the software meets the specification. This is normally hard to do, so various compromises are often made. Many have identified three different broad levels of the use of formal methods, in order of increasing cost and time:

The “levels” are a little misleading, because you can actually do things partially (perhaps only a part of the software is formally specified), and level 1 is somewhat ambiguous. But these levels give the basic flavor; there is a trade between rigor and effort.

Now we come to the decision of where to draw the line, and this isn’t an easy decision. For purposes of this paper, to count as “high assurance” there needs to be some carefully-reasoned explanation as to why the running code meets its key requirements. How much effort is needed for this justification depends on the risk you’re willing to take, and where you perceive the risks to be. Thus, while level 0 may be less costly, that is often not enough, so high assurance development often moves into level 1 and uses a focused application of level 2 on the parts that cannot be shown correct otherwise. Almost no one tries to prove all code down to the machine code; typically developers with such concerns will check the machine code by hand to ensure that it corresponds with the source code. Some may prove down to the source code, or at least the parts of the source code that are most worrisome. Others may use proofs to a detailed software design, and then use other less rigorous arguments to justify the source code. You can even back off further, using formal methods only for the specification (level 0), or not at all. In all cases, though, there needs to be some careful reasoning that convinces others that the code actually meets the key requirements, typically by showing a stepwise refinement from specification through to the code. Mantras such as “correct by construction” come into play in these kinds of systems.

We would love to formally prove that every line of code, down to the machine code, is correct; doing so has lots of benefits. Why is it so costly? Simply put, creating proofs is incredibly hard to do; often tools and knowledgeable humans must work together to create them. To prove code correct, you generally must write the code and proofs simultaneously (so that the code is in a form that is easier to prove). Requiring proofs also creates limits on the size of programs (and thus their functionality), because our ability to do proofs does not scale that well. Years ago, the old historical rule of thumb for the largest amount of code that can be reasonably proven correct all the way down to the code level was about 5,000 lines of code. Cleanly-separated components can be verified separately (e.g., a computer’s boot and initialization programs might be separable from an operating system kernel), and that helps. This rule of thumb is (I believe) historical; the tools for verifying code have improved, and good tools (including languages designed for provability) can help today’s developers go significantly beyond this scale. SPARK Ada’s developers in particular claim they can go way beyond this. But it’s not clear where the upper bounds really are, and it’s clear that formally proving code gets harder as the software gets larger. Typical operating systems have millions of lines of code and are growing fast, so no matter what the upper bound is, there is a real gap between typical commercial demands for functionality and the ability of today’s formal methods tools to verify it. Don’t expect Windows, MacOS, the Linux kernel, or *BSD kernels to be formally proved down to their code level. Proving only general models of code (instead of the system itself) eliminates this problem, but as I noted above, this doesn’t show that the code itself is highly assured.

Note that all formal methods have a basic weakness: They must make assumptions, because you have to start somewhere. In any such system, humans have to check the assumptions very, very carefully. If you start with a false assumption, a "proof" could produce an invalid conclusion. This problem -- that your assumptions may be invalid -- is a key reason that testing and other activities are still needed for high assurance software, even if you use formal methods extensively.

Another trade-off in formal methods is between expressiveness and analyzability. Fundamentally, any formal method has some sort of language, a set of axioms, and inference rules (the rules that let you determine if something else is true). A language that is extremely flexible (expressive) typically tends to be harder to analyze. As a result, there are many different languages, each better and different things.

For more information about formal methods, you can see the Wikipedia information on formal methods (particularly the main article on formal methods, automated theorem proving, and model checking), the Formal Methods Virtual Library, the NASA Langley Formal Methods Site, and the Formal Methods Education Resources. QPQ (“QED Pro Quo”) is intended to be an “online journal for publishing peer-reviewed source code for deductive software components”, and has links to various tools and papers. Aleksey Nogin’s “A Review of Theorem Provers” has a nice short summary comparing theorem provers. The Seventeen Provers of the World compares 17 proof assistants for mathematics, and some of which are relevant to software development. Most surveys seem to be old, unfortunately. A quick overview is available from the CMU 1996 paper “Strategic Directions in Computing Research Formal Methods Working Group” as E. Clarke and J. Wing’s 1996 “Formal Methods: State of the Art and Future Directions” [PDF]. I can point out “Notes on PVS from a HOL perspective” by Michael J.C. Gordon (1995), An Analysis of two Formal Methods: VDM and Z (1997), and Vienneau’s 1993 “A Review of Formal Methods” (even the full version is incomplete for its time, but at least it is easy to read). The History of the HOL System explains some of the convoluted history of LCF, HOL, and their various derivatives, and also notes some other systems. "Formal Methods for IT Security" (May 2007) has quick overview of tool types (one quibble: I agree that automatic theorem provers like prover9 take less effort than interactive tools like PVS and HOL, but automatic tools don't give less assurance - it's just that they cannot be effectively used on certain classes of problems). The “Handbook of Automated Reasoning” (Edited by J. Alan Robinson and Andrei Voronkov) is a survey. A short 2003 survey of tools commented on many tools, and the Formal Methods Framework report (1999) summarizes many tools. DACS’ list of formal methods literature is old, but it’s nicely focused on key works. Ingo Feinerer's "Formal Program Verification: A Comparison of Selected Tools and Their Theoretical Foundations" (2005) is a much more recent comparsion of formal methods tools (in this case, of the Frege Program Prover, KeY, Perfect Developer, Prototype Verification System). Griffioen and Huisman's 1998 work compares PVS and Isabelle; Zolda's 2004 work compares Isabelle and ACL2. Formal Methods Europe is an independent association with aim of stimulating the use of, and research on, formal methods for software development; their website has some summaries (though it has a very anemic list of tools as of May 2006). The mathematically-oriented papers Information technology implications for mathematics: a view from the French riviera and Deliverable 4.1: Survey of Existing Tools for Formal MKM compares tools’ mathematical foundations (these have the informal look of notes not intended for the masses, but they are still interesting). Johann Schumann’s Automated Theorem Proving in High-Quality Software Design discusses integrating automated theorem provers into larger development approaches and tools. High-Integrity System Specification and Design by Bowen and Hinchey is a collection of older key essays. Bowen and Hinchey’s “Ten Commandments of Formal Methods ...Ten Years Later” (IEEE Computer, January 2006) discusses previously-identified lessons learned (their “ten commandments” of their IEEE Computer April 1995 article) and argues that they have generally stood the test of time. Two oft-referenced formal methods advocacy pieces were published in IEEE Software: “Seven Myths of Formal Methods” by Anthony Hall (Sep/Oct 1990) and “Seven More Myths of Formal Methods” by Bowen and Hinchey (July 1995). Richard Sharpe argues that formal methods may be increasingly used in the future. Palshikar’s “An introduction to model checking” is a gentle introduction to that topic. Tony Hoare and Jay Misra have proposed a “Grand Challenge” effort to speed maturation of formal methods -- for their pitch, see Verified software: theories, tools, experiments (July 2005). The VSETTE conference of October 2005 has a response to this proposal, focusing on systemic methods for specifying, building, and verifying high-quality software. Shankar’s presentation The Challenge of Software Verification gives interesting comments on this challenge. The paper Formal specification and verification of data separation in a separation kernel for an embedded system (Constance L. Heitmeyer, Myla Archer, Elizabeth I. Leonard, and John McLean of the Naval Research Laboratory) describes a very promising approach to proving all the way down to the code. These tools generally presume that you already understand the basics of formal logic; if you don't, books such as P. D. Magnus' "forall x" may be of use to you. (A serious problem in the U.S. is that many software developers have never studied discrete math, including logic, even though that's the basis of their field; few would allow a civil engineer to design a bridge without first learning calculus.) No doubt there are many other sources of information. Peter Gutman's article on "Verification Techniques" (a chapter of his thesis) is a much more pessimistic view of verification techniques, and has important insights on the limitations of formal methods and some other verification techniques.

After I wrote this paper, I discovered the very interesting list Free software tools for formal verification of computer programs by David Mentré. You should definitely take a look at this paper as well if you're interested in the topic! Trac's list of Theorem Proving systems identifies their licenses, many of which are FLOSS.

Here are some other lists of formal methods related tools:

Note - don’t treat “formal methods” as a checklist item for high assurance (oh look, some math, we’re done!). The point in high assurance is to identify the risk areas, and then use tools (like formal methods) to convincingly show that there isn’t a problem. There is more than a little overlap between those developing high assurance software and the research community; applying these techniques can be difficult for some domains, if you need to get really high levels of confidence for complex systems.

Formal methods: Categories of tools

There are many different kinds of formal methods tools, which I will group into these categories:

Note that these are very rough and imprecise categories. All formal methods tools must support some kind of specification notation, tools often have multiple capabilities, and there is a general trend of combining these tools into larger interoperable capabilities. A general discussion about issues in integrating tools is in the paper “PVS: Combining Specification, Proof Checking, and Model Checking”. Thus, any categorization is imperfect, but hopefully this division will help.

There are some interesting competitions, particularly for the automated tools. Here are the CASC results of 2008 using TPTP, for example.

It turns out that there are many FLOSS tools that support using formal methods, in all of those categories, as the following sections will show.

Formal methods: Specification tools

All formal tools have some sort of specification language, but some languages are often focused on higher-level specifications -- helping users enter, syntactically check, and cleanly display the specifications with a minimum of effort. These are often used for level 0 and 1 type of work (though they can be used for more -- often by devising connections to other tools). Here is a partial list of specification languages, and FLOSS tools that support them:

Formal methods: Theorem Provers/Proof Checkers

Here are FLOSS theorem provers and checkers (increasingly they are combined with model checkers, in which case I list them here and not under model checkers):

I have not included some tools in this list because I can't confirm that they have a FLOSS license. Twelf has a "license" statement that doesn't give anyone the right to use the program, yet requires that users use it legally, so theoretically it's illegal to use it. I received an email that they had decided to release it under a BSD-style license, but haven't seen public evidence of that yet (hopefully that will change). MAYA (originally part of Inka, something that supports graphs and connects to various other useful components) has no license that I can find; its "mathweb" component is clearly GPL'ed, but it's unclear it's entirely GPLed, and it depends on the proprietary Allegro Common LISP. RRL has no license I can find, and I can't download it. The lesson here is that if you develop a tool, you need to clearly identify its license so that others can use it.

Formal methods: Model checkers

Here are tools that are model checkers that at least say they are FLOSS:

Formal methods: SAT Solvers

The Boolean satisfiability (SAT) problem is the problem of determining if the variables of a given Boolean formula (where all variables can only be true or false) can be assigned in such a way as to make the formula evaluate to TRUE; alternatively, it's to determine if no such assignments exist (i.e., if it's unsatisfiable). SAT programs are low-level programs/algorithms that many other formal methods tools (like theorem provers) build on. In the last number of years there have been a lot of improvements in SAT solvers, resulting in improvements on anything built on them. SAT is a big area; SAT live tracks SAT goings-on. Here are some SAT surveys. There are a number of competitions, including the International SAT competition,
  1. MiniSat (MIT license). In the SAT 2005 competition, MiniSAT all by itself won Silver in the industrial categories SAT+UNSAT and SAT. MiniSAT is a "conflict driven solver", one of main (modern) styles of SAT solvers. SatELiteGTI is the combination of SatELite (used as a preprocessor) with MiniSat (the “GTI” component). SatELiteGTI won Gold in all three industrial categories: SAT+UNSAT, SAT, and UNSAT. I cannot find the license for SatELite, but the developers are making SatELite obsolete anyway by incorporating its capabilities into their updated version of MiniSAT.
  2. MarchDL (GPLv2+) is a SAT solver based on the "look-ahead" approach (one of the other main modern styles of SAT solvers). It won a prize at the 2007 SAT competition.
  3. Fast SAT Solver (GPL) is a SAT solver based on genetic algorithms.
  4. PicoSAT (MIT-style) is a recent and strong SAT solver. It did very well in the SAT'07 SAT Solver competition; Version 535 won the category of "satisfiable industrial instances" and came second on all industrial instances (satisfiable and unsatisfiable).
  5. Vallst (Reciprocal Public License, a GPL-like but stricter OSI-certified license) is another SAT solver. It won two golds and one bronze in the SAT 2005 “world championships”.

Formal methods: SMT Solvers

The Satisfiability Modulo Theories (SMT) problem is an extension of the SAT problem (above). Basically, given expressions with boolean variables and/or predicates (functions that take potentially non-boolean values yet return boolean values), determine the conditions that would make it true (or conversely, show it's false). An SMT solver adds one or more "theories" for various predicates, e.g., it might add real numbers (adding predicates like less-than and equal-to), integers, lists, and so on. SMT solvers are sometimes implemented on top of SAT solvers.

Many other systems build on top of SMT solvers. The SMT-LIB: The Satisfiability Modulo Theories Library and SMT-COMP: The Satisfiability Modulo Theories Competition are important to many SMT solver implementors. Here is a list of SMT solvers (current and abandoned, FLOSS and not).

Examples include:

  1. Ergo (Alt-Ergo) (CeCILL-C license) is an automatic theorem prover focused on program verification. It supports equational theory (=) and linear arithmetic, and it's relatively small. One significant problem is that this is licensed under the extremely rare CeCILL-C license, not the CeCILL license. I can't find a major FLOSS organization who has ruled that the CeCILL-C license is FLOSS (including the FSF, OSI, Debian, or Fedora). This license is intended to be FLOSS, but that is as yet untested.
  2. Gappa (CeCILL or GPL, libraries LGPL) is a tool "intended to help verifying and formally proving properties on numerical programs dealing with floating-point or fixed-point arithmetic. It has been used to write robust floating-point filters for CGAL and it is used to certify elementary functions in CRlibm. It requires Coq support library 0.8. ("Why" can invoke Gappa.)
  3. The Decision Procedure Toolkit (DPT) (Apache license) is "a system of cooperating decision procedures for answering satisfiability queries. The DPT implementation in OCaml comprises a DPLL-style SAT solver with theory-specific decision procedures".
  4. Arithmetic and Boolean solver (ABSolver) (Common Public License 1.0) is a framework for combining other tools to solve mixed arithmetic and Boolean problems, and is designed to make it easy to add new solvers. ABSolver is remarkable in its ability to solve non-linear problems. However, "Efficient Solving of Large Non-linear Arithmetic Constraint Systems with Complex Boolean Structure" (Journal on Satisfiability, Boolean Modeling and Computation 1 (2007) 209–236) warns that ABSolver's "currently reported implementation uses the numerical optimization tool IPOPT (https://projects.coin-or.org/Ipopt) for solving the non-linear constraints. Consequently, it may produce incorrect results due to the local nature of the solver, and due to rounding errors."
  5. Argo-lib (GPLv2) is an SMT-LIB solver. It is "a C++ library which provides a generic support for using decision procedures in automated reasoning systems and also support for several schemes for combining and augmenting decision procedures. This platform follows the SMT-lib initiative which aims at establishing a library of benchmarks for satisfiability modulo theories. ARGO-lib platform can be easily integrated into other systems, but it should also enable comparison and unifying of different approaches, evaluation of new techniques and hopefully help advancing the field. ARGO-lib follows a range of techniques and different systems. The latest version of ARGO-lib provides support for DPLL(T) scheme and for producing object-level proofs."
  6. OpenSMT (GPLv3) is a "compact and open-source SMT-solver written in C++, with the main goal of making SMT-Solvers easy to understand. OpenSMT is built on top of MiniSAT (http://minisat.se)... Currently OpenSMT supports only the theory of Equality with Uninterpreted Functions [QF_UF]... In the future we plan to extend OpenSMT with other theories."
  7. Fx7 (BSD-like license).
  8. haRVey SMT prover has two branches: haRVey-FOL (LGPL) and haRVey-SAT (BSD license). From the website: "Current developments aim at merging both branches, and provide one uniform tool. The main issues are the logics are different (haRVey-SAT is multi-sorted, haRVey-FOL is not) [and] there is some technical and theoretical difficulties to combine first-order provers within a Nelson-Oppen scheme. So, haRVey is still in development stage...". haRVey downloads are available (but watch out, some links are broken, so it can be hard to find).
  9. STP (MIT license) is a Decision Procedure for Bitvectors and Arrays. "STP is a constraint solver (also referred to as a decision procedure or automated prover) aimed at solving constraints generated by program analysis tools, theorem provers, automated bug finders, intelligent fuzzers and model checkers. STP has been used in many research projects at Stanford, Berkeley, MIT, CMU and other universities. It is also being used at many companies such as NVIDIA, some startup companies, and by certain government agencies. The input to STP are formulas over the theory of bit-vectors and arrays (This theory captures most expressions from languages like C/C++/Java and Verilog), and the output of STP is a single bit of information that indicates whether the formula is satisfiable or not. If the input is satisfiable, then it also generates a variable assignment to satisfy the input formula. We are currently adding the theory of finite sets and the theory of uninterpreted functions to STP." There is a SourceForge home page for STP. It uses MINISAT.

Formal methods: Other Tools

Here are FLOSS tools that are don’t easily fit into the above categories: