You Don't Know Jack About Formal Verification

(queue.acm.org)

43 points | by eatonphil 5 hours ago

5 comments

dfabulich 1 hour ago
Formal verification is still too limited to be useful for most app developers. The article gives an example of an e-commerce platform using it to prove the correctness of managing refunds, but then acknowledges:
> As of today, the formally verified core can handle most effect-free logic—invariants, transitions, conflict resolution. But the UI, network calls, and database interactions typically sit outside the verification boundary. Verification makes the core airtight but doesn’t guarantee end-to-end correctness.
So you can formally prove that your e-commerce refund management logic is correct, except for proving that you processed the refund. You can't even prove anything about recording the refund in your database, say nothing of proving anything about your interactions with your payment processor.
If your app is mostly tricky logic with just a bit of I/O, your app is very unusual, and it's almost certainly not an e-commerce app. E-commerce apps are mostly CRUD apps; I/O with the database, the UI, and third-party APIs (e.g. payment processors) is 99% of the code.
Even property-based testing is mostly unhelpful for e-commerce apps like these.
Instead, think of formal verification as a runtime performance improvement of property-based testing. If property-based testing is useful for your app (it probably isn't), then you may be able to convert some of your property-based tests into formal verifications.
But, honestly, you probably can't do it, not even with a high budget of tokens.
I'd love to be proven wrong, but the way to do it would be to formally prove the correctness of non-trivial open-source code with property tests. Perhaps you could formally verify significant chunks of Postgres! (But I doubt it.)
[-]
- teach 1 hour ago
  So much this.
  I actually did take a formal verification course in college. Our final project was to use the techniques we'd been learning to verify some classic critical-section locking algorithm. I chose to verify an implementation of Lamport's bakery algorithm[0] in C (this was the 90s -- a lot of code was still being written in C).
  The problem is that Lamport's algorithm makes an assumption that the "ticket number" is unbounded and any finite implementation in C will almost certainly use a value which is limited to 32 bits or so.
  So I was able to formally verify that the algorithm fails to protect the critical section if enough processes are kept waiting to overflow the counter. :)
  This probably just means that Lamport's algorithm isn't a great choice for such environments, but I'm still bummed that the professor gave me a B.
  [0] https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
- gf000 58 minutes ago
  Well, I'm someone who barely knows more than jack about formal verification, but in pretty much every case you have to have some kind of model that you are actually verifying.
  How close that model sits to the real thing you have modeled is an important question, and you are free to be as close or distant as you want -- e.g. for verifying different properties of a programming language you might decide to not care about CPU instructions, registers, etc, and only care about the semantic model. This has absolutely many use cases (e.g. whether a particular optimization is sound) where this "model mismatch" doesn't matter, this doesn't make formal verification useless in any way or shape, imo.
  Getting back to at the "e-commerce refund management" -- you can absolutely have a model that does e.g. a particular database IO call that either succeeds or not. With such a model in place, you can have the rest of your codebase formally verified and know that 'with a properly working database it will always work correctly' [1]. Is that not a very significant and useful finding in and of itself? Would you be more confident in your end-to-end tested software than the above?
  Especially that one can then separately test that particular call site as deeply as they want, to determine that the assumed property (it either returns success or fails) is sound.
  [1] Given a correct specification, which is not easy to get right
- toast0 57 minutes ago
  The first part of formal verification is getting a formal specification. I don't know about most developers, but I rarely get a written specification for anything I work on, and when I do, it's no where near what would be needed to turn it into a formal specification.
  Anyway, the specification is subject to change at the whim of a hat, so putting a lot of effort into verifying it is foolish.
  I do see value in formal verification of IPC/threading communication primitives (locks, semaphores, queues, whatevs), but then formal verification usually require assumptions for hardware behavior and those aren't always correct, so. But I've never used formal methods outside exposure in an undergrad survey class, so I dunno.
- harperlee 1 hour ago
  I don't know a lot about formal verification, but:
  > So you can formally prove that your e-commerce refund management logic is correct, except for proving that you /processed the refund/. You can't even prove anything about recording the refund in your database, say nothing of proving anything about your interactions with your payment processor.
  You could say the same thing about the viability of functional programming on a CRUD webapp, but languages like clojure have been used to great effect here. The fact that thera are important, even fundamental, bits that you cannot verify, doesn't take out value from the fact that you can eliminate whole dimensions of issues.
- microgpt 55 minutes ago
  If you have a set of axioms that Postgres works as designed, you can prove that your code updates the database. If you define "the refund was processed" to mean "the refunded column of the order is true" you can prove that.
- gdogg24 1 hour ago
  [dead]
JacobAsmuth 22 minutes ago
Anyone interested in this should check out my Qed project I've been working on, a formally verified web frontend. https://github.com/JacobAsmuth/qed
pron 26 minutes ago
> It’s no longer just for safety-critical systems with the budget for specialized proof engineers. It’s for anyone who has a property worth proving
... and the budget to pay the AI to prove it.
I have quite a bit of experience with formal verification, but I don't understand the claim made in the article. As an aside, AI's ability to reliably prove the correctness of significantly large programs is still theoretical at this point, but let's assume it's possible. The claim in the article is that writing 10,000 lines of proof to prove a 100-line program was very expensive, and that's why it isn't done. But this increase in cost continues with AI! Whether you pay people to write the proofs or you pay an LLM to write the proof, you still have to pay for it. If I run a software company, saying that "verificaton is the AI's problem" isn't much different from saying, "it's the engineers' problem." Either way I'm not doing the work myself, but I am paying for it.
If the premise is that writing proofs was 100x more expensive than testing, I see nothing in this article to even suggest why it wouldn't still be 100x more expensive when an LLM is doing the work.
(BTW, the reason there aren't many specialised proof engineers is because they aren't in high demand; they're not being paid that much more than other engineers at a similar level)
[-]
- rurban 17 minutes ago
  > writing 10,000 lines of proof to prove a 100-line program was very expensive, and that's why it wasn't done.
  We are not that silly. We are writing compilers (ie model checkers) which translate the source code to formal proofs. No cost at all, you just need to limit loop sizes and function call depths, to keep the cost of the proof down. And then extrapolate the little proof to the general proof.
  [-]
  - pron 3 minutes ago
    Whatever the cost multiplier is that is prohibitive enough to be the reason not to do formal verification, I see no reason why that same multiplier won't remain with AI.
    Personally, I don't think that picture is quite accurate. Yes, there is a high cost multiplier for small programs, albeit perhaps not so prohibitive. But for large programs, that multiplier is, for most intents and purposes infinite, unless, perhaps, you have experts who know what's worth proving and what is not.
    Anyway, I'd like to see that put to the test. Have an LLM write a 50-100KLOC program and prove all correctness properties - with the properties themselves approved by an expert human - and tell us what it costs.
expo98 1 hour ago
I had fun in a college class that used Dafny, building a pseudo digital wallet, it wasn't the main focus of the class, so didn't get that much out of it
03284782470 1 hour ago
ACM now stooping to the level of clickbait youtubers. Just great.