ExVerus: Verus Proof Repair via Counterexample Reasoning

Yang, Jun; Sun, Yuechun; Wu, Yi; Caridad, Rodrigo; Yuan, Yongwei; Yao, Jianan; Lu, Shan; Pei, Kexin

TLDR: ExVerus is a counterexample-guided framework that lets LLMs use verifier feedback to refine proofs. Instead of static proof generation, it learns from failed attempts by generating and generalizing counterexamples into inductive invariants—yielding higher proof accuracy, robustness, and efficiency than prior LLM-based Verus systems.

Why Counterexample-Guided Formal Proof Generation?

Current LLM-based proof generation framework often treats verification as a static, end-to-end prediction over source code, relies on sparse verifier feedback, and lacks visibility into concrete program behavior. As a result, such systems struggle to reason about fine-grained semantics,especially when constructing inductive loop invariants that demand deeper behavioral understanding, and counterexample guided proof generation can help address these challenges.

Overview of ExVerus

ExVerus is a counterexample-guided proof-repair framework for Verus that transforms static proof generation into an iterative reasoning process. By automatically generating, validating, and generalizing counterexamples, it enables LLMs to refine loop invariants based on concrete program behaviors. It operates in mainly three phases:

Initial Generation: Produce an initial proof using a baseline LLM prompt; failed proofs trigger repair.
Counterexample Generation: Generate and validate SMT-based counterexamples that reveal real invariant failures.
Generalization & Repair: Classify errors, mutate invariants to block counterexamples, and select the best repair via blocking validation.

Motivating Examples

On how ExVerus iteratively repairs proofs through counterexample feedback.

Overall Pipeline
Generalization Details

How ExVerus generates, validates and blocks counterexamples. For details of step 3:CEX-guided proof repair, check Generalization Details.

Counterexample generation and validation workflow.

How ExVerus' CEX-guided proof repair works through Error Triangle, Counterexample-based mutation, and validation.

Counterexample generalization for proof repair visual.

Key Highlights

Counterexample‑Guided: Makes proof search iterative by generating, validating, and generalizing counterexamples.
Three‑Phase Repair: Initial proof → SMT counterexamples → invariant mutation.
Inductive Invariants: Generalizes counterexamples into invariants that block failures.
ObfsBench Robustness: 266 obfuscated tasks covering layout, data, and control‑flow changes.
Results: +45% over SOTA; 90% robust under obfuscation (vs. 33–50%); 70.2% counterexample‑blocking; ~75% lower tokens/time.

State-of-the-Art Results

ExVerus performs impressively across all evaluated benchmarks and base models, substantially surpassing previous SOTA LLM-based provers and strong iterative refinement baselines. It attains the highest verification success rates on VerusBench and DafnyBench (up to 95.5% with o4-mini) and solves significantly more tasks on HumanEval and LCBench, underscoring its superior generality and adaptability.

Exceptional Robustness

ExVerus demonstrates strong semantic resilience under deliberate code obfuscation. In the ObfsBench benchmark—where 266 verified Verus proofs were systematically transformed using layout, data, and control-flow obfuscations, 90% of ExVerus-verified proofs remain verifiable, compared to only 33% - 50% for AutoVerus.

Superior Efficiency

ExVerus significantly reduces token consumption compared to prior methods: around 75% fewer tokens than AutoVerus on DeepSeek and around 25% fewer tokens than AutoVerus on GPT-4o.

BibTeX

@inproceedings{
yang2026exverus,
title={ExVerus: Verus Proof Repair via Counterexample Reasoning},
author={Jun Yang and Yuechun Sun and Yi Wu and Rodrigo Caridad and Yongwei Yuan and Jianan Yao and Shan Lu and Kexin Pei},
booktitle={Forty-third International Conference on Machine Learning},
year={2026},
url={https://openreview.net/forum?id=FNDkXA0OUJ}
}