
How to Backtest a MEV Strategy Before Going Live in 2026


[Figure: Backtest pipeline showing historical block replay against strategy code]
By FRB Team, MEV Specialists

Answer first: A useful MEV backtest replays real historical blocks on a forked node, runs your strategy code against the same conditions a live searcher faced, and reports per-attempt P&L net of gas. Anything less (a P&L spreadsheet, a price-only simulation, an "average opportunity" model) produces fantasy returns that vanish in production. Spend at least 2 weeks on backtesting before committing live capital. The cost is cheap RPC calls; the alternative is losing real money.

Why Most MEV "Backtests" Are Useless

The five most common backtest illusions:

  1. Optimal-fill assumption. "I would have caught every arb." In reality, you would have lost roughly 70% of them to faster searchers.
  2. No gas modelling. Counts gross profit, ignores per-block gas regime.
  3. Static pool state. Assumes pool depth/price at one block holds for adjacent ones.
  4. No latency penalty. Treats your strategy as instantaneous against a real-time stream.
  5. Survivorship bias. Backtests only over months you remember being good.

Combined, these five illusions overstate returns by 3–10x. A live deployment that "should have made $50k/month" routinely makes $4k, or nothing.

The Three Tiers of Backtest

Tier 1: Spreadsheet replay (don't trust this). Pull historical opportunity data and multiply it by a capture rate. Useful for napkin sizing, not for a go/no-go decision.

Tier 2: Fork simulation per opportunity (acceptable). Replay each opportunity by forking a Reth or Geth node at the relevant block and simulating your tx. Captures gas, slippage, and revert paths.

Tier 3: Continuous replay with a latency model (the real thing). Replay a continuous block range, simulate your strategy executing alongside the historical competitor txs, and model latency loss and inclusion probability. Outputs a realistic per-fill capture rate.

Most institutional MEV firms run Tier 3. FRB Agent ships a Tier 3 replay engine; configure your strategy and pick a block range.

The Tier 3 Loop, Explained

For each historical block in [start, end]:
  fork = fork_node(block_number = block - 1)     // state just before the block
  state = fork.get_pool_state(target_pools)
  competitor_txs = block.transactions            // what actually happened
  for each potential opportunity in state:
    your_tx = your_strategy.build(state, opportunity)
    if your_tx is None: continue
    landing_block = simulate_inclusion(
      your_tx,
      competitor_txs,
      latency_ms = your_measured_latency,
      bid = your_bid_function(opportunity)
    )
    if landing_block:
      pnl = simulate_pnl(fork, your_tx, landing_block)
    else:
      pnl = 0    // missed (too slow or outbid); use gas burned instead if your route pays for failed txs
    log(block, opportunity, pnl)                 // log every attempt, not just the wins

The loop runs against weeks or months of history. Output is a P&L distribution, not an average.
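The loop above can be sketched as runnable Python if each opportunity has already been extracted from the forked state. This is a minimal sketch under stated assumptions, not FRB Agent's Replay Engine: the `Opportunity` schema, the toy `simulate_inclusion` rule (arrive before a recorded deadline and outbid the best historical competitor), and the `bid_half` bid function are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Opportunity:
    """One historical opportunity recovered from a forked replay (hypothetical schema)."""
    block: int
    gross_profit: float    # USD, simulated on the fork at block - 1
    gas_cost: float        # USD, at that block's actual gas price
    competitor_bid: float  # USD, best competing bid in the real block (0.0 if none)
    deadline_ms: float     # ms available before the winning bundle was sealed


def simulate_inclusion(opp: Opportunity, latency_ms: float, bid: float) -> bool:
    """Toy rule: land only if you arrive before the deadline AND outbid the competitor."""
    return latency_ms <= opp.deadline_ms and bid > opp.competitor_bid


def replay(
    history: list[Opportunity],
    latency_ms: float,
    bid_fn: Callable[[Opportunity], Optional[float]],
) -> list[float]:
    """Per-attempt P&L; misses are logged as 0.0 so the output stays a distribution."""
    pnl: list[float] = []
    for opp in history:
        bid = bid_fn(opp)
        if bid is None:  # strategy declined to build a tx
            continue
        if simulate_inclusion(opp, latency_ms, bid):
            pnl.append(opp.gross_profit - opp.gas_cost - bid)
        else:
            pnl.append(0.0)  # assumes a bundle route where a lost auction costs nothing
    return pnl


def bid_half(opp: Opportunity) -> Optional[float]:
    """Hypothetical bid function: bid 50% of gross, skip if gas eats the profit."""
    if opp.gross_profit <= opp.gas_cost:
        return None
    return opp.gross_profit * 0.5


history = [
    Opportunity(block=100, gross_profit=50.0, gas_cost=5.0, competitor_bid=20.0, deadline_ms=120.0),
    Opportunity(block=101, gross_profit=30.0, gas_cost=4.0, competitor_bid=25.0, deadline_ms=80.0),
    Opportunity(block=102, gross_profit=10.0, gas_cost=12.0, competitor_bid=0.0, deadline_ms=200.0),
]
print(replay(history, latency_ms=60.0, bid_fn=bid_half))  # [20.0, 0.0]
```

Because misses are kept as zeros rather than dropped, rerunning `replay` with a larger `latency_ms` directly shows how much of the distribution shifts toward zero.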

Picking the Right Block Range

Backtest at minimum:

  • 8 weeks of recent history, recent enough that the competitive landscape and pool depth resemble today's.
  • Both bull and choppy (sideways) regimes. A pure-bull backtest is misleading.
  • All days of the week. Weekend MEV differs from weekday MEV.

Avoid:

  • Short, hand-picked ranges that "look good"
  • Periods spanning major market structure changes (e.g. a fork or a major DEX deployment)
  • Extreme volatility weeks unless those are your target regime
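The range rules above can be checked mechanically before a long replay run starts. The sketch below assumes Ethereum mainnet's 12-second post-Merge slot time for the block-count estimate (other chains use different block times). The 30-day staleness threshold and the caller-supplied list of structural events are hypothetical choices, not rules from the text. An 8-week window automatically covers every day of the week, so no separate weekday check is needed.

```python
from datetime import date

SLOT_SECONDS = 12  # Ethereum mainnet post-Merge; other chains use different block times
MIN_WEEKS = 8


def max_blocks(days: int, slot_seconds: int = SLOT_SECONDS) -> int:
    """Upper bound on blocks in a window; missed slots make the real count lower."""
    return days * 24 * 3600 // slot_seconds


def check_range(
    start: date,
    end: date,
    today: date,
    structural_events: list[tuple[date, str]],
    max_staleness_days: int = 30,  # hypothetical "recent enough" threshold
) -> list[str]:
    """Return problems with a proposed backtest window; an empty list means it passes."""
    problems: list[str] = []
    days = (end - start).days + 1  # inclusive of both endpoints
    if days < MIN_WEEKS * 7:
        problems.append(f"only {days} days; need at least {MIN_WEEKS * 7}")
    if (today - end).days > max_staleness_days:
        problems.append(f"window ended {(today - end).days} days ago; not recent")
    for when, label in structural_events:
        if start <= when <= end:
            problems.append(f"window spans a structure change: {label} on {when}")
    return problems


print(max_blocks(8 * 7))  # 403200
print(check_range(date(2026, 1, 1), date(2026, 2, 25), date(2026, 3, 1), []))  # []
```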

Latency Calibration

This is the most-skipped step. Your real latency = mempool_observe_to_signed_tx + network_to_relay + relay_to_proposer. Measure all three:

  1. Observe-to-sign: timestamp from WSS pending event to your signed tx. Typical: 8–40ms.
  2. Network-to-relay: ping your relay endpoints. Typical: 4–60ms.
  3. Relay-to-proposer: out of your control, ~20–60ms.

Total: 30–160ms in 2026. Use your measured number in the simulator, not a hopeful one.
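One way to turn the three measurements into the single number the simulator needs is to take a high percentile of each component and sum them. This is a sketch under assumptions: it uses nearest-rank percentiles and p95 by default, and because relay-to-proposer is out of your control you may have to feed in assumed samples from the ~20–60ms range above rather than measurements.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_budget(
    observe_to_sign: list[float],
    network_to_relay: list[float],
    relay_to_proposer: list[float],
    p: float = 95.0,
) -> dict[str, float]:
    """Per-component percentile plus their sum, in ms.

    Summing per-component p95s is pessimistic (the components' worst cases
    need not coincide), which is the safe direction for a backtest input.
    """
    budget = {
        "observe_to_sign": percentile(observe_to_sign, p),
        "network_to_relay": percentile(network_to_relay, p),
        "relay_to_proposer": percentile(relay_to_proposer, p),
    }
    budget["total"] = sum(budget.values())
    return budget


observe = [10.0, 12.0, 15.0, 20.0, 40.0]  # WSS pending event -> signed tx
network = [5.0, 6.0, 8.0]                 # ping to relay endpoints
relay = [20.0, 30.0, 60.0]                # assumed, not measured
print(latency_budget(observe, network, relay)["total"])  # 108.0
```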

Gas Regime Modelling

Gas conditions during your backtest period will not match those during your next 8 weeks live. Solutions:

  1. Use block-actual gas prices for each replayed block. Backtest reflects historical regime.
  2. Stress test by inflating gas 1.5x, 2x, 3x to model unfavorable regimes.
  3. Stress-test shifts in the bid quantile needed for inclusion (e.g. landing now requires a 75th-percentile bid instead of a 60th-percentile one).

If your strategy stays profitable under a 2x gas regime, it has margin.
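A gas stress test can be as simple as re-pricing every replayed attempt at inflated gas. This sketch assumes each attempt is reduced to a hypothetical `(gross_profit, gas_cost)` pair and that gross profit does not change when gas rises, which is optimistic because competitor behaviour also shifts in expensive regimes.

```python
def net_pnl_under_gas(attempts: list[tuple[float, float]], multiplier: float) -> float:
    """Net P&L when every attempt's gas cost is scaled by `multiplier`.

    Each attempt is (gross_profit, gas_cost) in the same unit, e.g. USD.
    Attempts whose scaled gas exceeds gross profit are skipped, mimicking an
    atomic strategy that prices gas before it submits.
    """
    total = 0.0
    for gross, gas in attempts:
        cost = gas * multiplier
        if gross > cost:
            total += gross - cost
    return total


def gas_stress_table(
    attempts: list[tuple[float, float]],
    multipliers: tuple[float, ...] = (1.0, 1.5, 2.0, 3.0),
) -> dict[float, float]:
    """Net P&L at each gas multiplier."""
    return {m: net_pnl_under_gas(attempts, m) for m in multipliers}


attempts = [(100.0, 20.0), (50.0, 30.0), (10.0, 8.0)]
table = gas_stress_table(attempts)
print(table)  # {1.0: 102.0, 1.5: 75.0, 2.0: 60.0, 3.0: 40.0}
print("has margin at 2x gas:", table[2.0] > 0)
```

Note how the thin opportunities drop out first: by 2x gas only the largest fill remains, which is the kind of concentration the table is meant to expose.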

Simulating Failed Attempts

Real searchers experience:

  • Bundle reverts (other tx in bundle changes pool state mid-block)
  • Slippage breaches (your slippage cap blocked a fill that might have been profitable but was risky)
  • Unselected bundles (lost the auction)

Your backtest must include these failure modes. Each simulated attempt should resolve to one of:

  • Inclusion: your bid beats the marginal (clearing) bid
  • Loss to competitor: a simulated competitor bid is higher than yours
  • Self-revert: pool state changed mid-block

A backtest that shows 100% inclusion is broken.
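The three outcomes can be encoded so that every attempt resolves to exactly one of them, with a guard that refuses to report 100% inclusion. This is a sketch under assumptions: ties going to the competitor is a conservative choice, not a builder rule, and slippage-cap aborts are folded into the self-revert case.

```python
from enum import Enum


class Outcome(Enum):
    INCLUDED = "included"
    LOST = "lost_to_competitor"
    REVERTED = "self_revert"


def resolve_attempt(your_bid: float, best_competitor_bid: float, state_changed: bool) -> Outcome:
    """Map one simulated attempt to exactly one outcome."""
    if best_competitor_bid >= your_bid:  # ties go to the competitor (conservative assumption)
        return Outcome.LOST
    if state_changed:  # won the auction, but pool state moved mid-block (incl. slippage aborts)
        return Outcome.REVERTED
    return Outcome.INCLUDED


def inclusion_rate(outcomes: list[Outcome]) -> float:
    """Fraction included; refuses to report 100%, which signals a missing failure model."""
    if not outcomes:
        raise ValueError("no attempts")
    rate = sum(o is Outcome.INCLUDED for o in outcomes) / len(outcomes)
    if rate == 1.0:
        raise RuntimeError("100% inclusion: the backtest is probably broken")
    return rate


outcomes = [
    resolve_attempt(10.0, 8.0, False),
    resolve_attempt(10.0, 12.0, False),
    resolve_attempt(10.0, 5.0, True),
    resolve_attempt(9.0, 4.0, False),
]
print(inclusion_rate(outcomes))  # 0.5
```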

What "Good" Output Looks Like

A well-run Tier 3 backtest produces:

  • P&L distribution histogram (not a single number)
  • Inclusion rate by opportunity size
  • Drawdown curve over the period
  • Latency sensitivity table (how much you'd lose at +20ms, +50ms)
  • Gas sensitivity table (how much you'd lose at +50% gas)
  • Sharpe-like ratio (mean return / stddev)
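Several of these outputs fall straight out of the per-attempt P&L list the replay produces. The sketch below computes a bucketed histogram, the maximum drawdown of cumulative P&L (measured from a zero start), and the Sharpe-like ratio as described above. The ratio is per attempt and not annualized; bucket width is a free, hypothetical parameter.

```python
import math
import statistics
from collections import Counter


def pnl_histogram(pnl: list[float], bucket_width: float) -> Counter:
    """Attempts per bucket; key k covers [k * width, (k + 1) * width)."""
    return Counter(math.floor(x / bucket_width) for x in pnl)


def max_drawdown(pnl: list[float]) -> float:
    """Largest peak-to-trough drop in cumulative P&L, measured from a zero start."""
    cumulative = peak = worst = 0.0
    for x in pnl:
        cumulative += x
        peak = max(peak, cumulative)
        worst = max(worst, peak - cumulative)
    return worst


def sharpe_like(pnl: list[float]) -> float:
    """Mean per-attempt P&L over its sample standard deviation (needs 2+ attempts)."""
    return statistics.mean(pnl) / statistics.stdev(pnl)


pnl = [10.0, -5.0, 20.0, -15.0, 5.0]
print(max_drawdown(pnl))           # 15.0
print(round(sharpe_like(pnl), 3))  # 0.222
```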

Walk away from any "backtest" that just shows a green line going up. That's marketing material, not a backtest.

Walk-Forward Validation

Once the backtest looks good, run walk-forward validation:

  1. Tune strategy on weeks 1–6.
  2. Lock parameters.
  3. Run on weeks 7–8 with no further tuning.
  4. Compare results.

If weeks 7–8 underperform weeks 1–6 by more than 30% on a per-week basis, you've over-fitted. Adjust strategy abstractions, not strategy parameters.
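Because the tuning and holdout windows have different lengths, the comparison is easiest on mean weekly P&L. A minimal sketch, assuming you already have one P&L figure per week from the locked-parameter replay:

```python
from statistics import mean


def walk_forward_check(
    weekly_pnl: list[float], train_weeks: int = 6, max_shortfall: float = 0.30
) -> tuple[float, bool]:
    """Return (per-week shortfall of holdout vs tuning window, overfit flag)."""
    if len(weekly_pnl) <= train_weeks:
        raise ValueError("need at least one holdout week after the tuning window")
    train = mean(weekly_pnl[:train_weeks])    # weeks 1-6: parameters tuned here
    holdout = mean(weekly_pnl[train_weeks:])  # weeks 7-8: parameters locked
    if train <= 0:
        raise ValueError("not profitable in-sample; nothing to validate")
    shortfall = 1 - holdout / train
    return shortfall, shortfall > max_shortfall


shortfall, overfit = walk_forward_check([100.0, 110.0, 90.0, 100.0, 105.0, 95.0, 60.0, 70.0])
print(f"{shortfall:.0%} shortfall, overfit={overfit}")  # 35% shortfall, overfit=True
```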

From Backtest to Live: The Bridge

Move to live in three steps:

  1. Paper-trade in production — bot runs against live data, builds tx, but doesn't sign/submit. Compare paper P&L to live mempool outcomes.
  2. Tiny-capital live — 5–10% of intended bankroll, real submissions. Run for 7–14 days.
  3. Full deployment — once paper and tiny-capital metrics align with backtest, scale to target bankroll.

If paper or tiny-capital results underperform the backtest by more than 40%, do not scale. Diagnose first.
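The sizing and the 40% gate can be written down as a small check run at the end of each stage. This sketch assumes every stage's result has been normalized to the same unit as the backtest (e.g. P&L per observed opportunity). That normalization is an assumption, because MEV P&L rarely scales linearly with capital. `StageResult` and the stage names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResult:
    name: str
    normalized_pnl: float  # same unit as the backtest, e.g. P&L per observed opportunity


def tiny_capital_range(bankroll: float) -> tuple[float, float]:
    """5-10% of the intended bankroll for the tiny-capital stage."""
    return bankroll * 0.05, bankroll * 0.10


def scale_decision(backtest: float, stages: list[StageResult], max_shortfall: float = 0.40) -> str:
    """Block scaling if any stage underperforms the backtest by more than max_shortfall."""
    if backtest <= 0:
        raise ValueError("backtest must show positive P&L before going live")
    for stage in stages:
        shortfall = 1 - stage.normalized_pnl / backtest
        if shortfall > max_shortfall:
            return f"do not scale: {stage.name} is {shortfall:.0%} below backtest; diagnose first"
    return "scale to target bankroll"


print(tiny_capital_range(100_000))
print(scale_decision(10.0, [StageResult("paper", 8.0), StageResult("tiny-capital", 5.5)]))
```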

Common Diagnostic Failures

When live underperforms backtest, the typical causes (in order):

  1. Higher real latency than calibrated. Measure again.
  2. Competitor count grew. Backtest period had fewer searchers.
  3. Pool depth fell on your target pairs. Re-pick targets.
  4. New private order flow you can't see. Check builder/relay docs.
  5. Gas regime shift. Re-bid.

In our experience, 60%+ of underperformance traces to (1) and (2).

Tooling

In 2026, useful backtest tools:

  • Foundry's forge for fork simulation
  • Reth running in archive mode for historical state access
  • Anvil (also part of Foundry) for fast local forks
  • The Graph for indexed historical pool state
  • FRB Agent's Replay Engine for end-to-end Tier 3 replay (built-in)
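As one concrete option, Foundry's Anvil can play the role of `fork_node(block_number = block - 1)` in the Tier 3 loop. `--fork-url`, `--fork-block-number` and `--port` are real Anvil flags. The two wrapper functions and the placeholder RPC URL are hypothetical, and forking old blocks assumes an archive-capable RPC. The sketch does not wait for the port to open or handle errors.

```python
import subprocess


def anvil_fork_cmd(rpc_url: str, block_number: int, port: int = 8545) -> list[str]:
    """Anvil command that forks chain state as of `block_number`."""
    return [
        "anvil",
        "--fork-url", rpc_url,
        "--fork-block-number", str(block_number),
        "--port", str(port),
    ]


def fork_node(rpc_url: str, block_number: int, port: int = 8545) -> subprocess.Popen:
    """Start a local fork; requires `anvil` on PATH. Caller must wait for the port to open."""
    return subprocess.Popen(
        anvil_fork_cmd(rpc_url, block_number, port),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# Replaying block N: fork the state as of N - 1, simulate against it, then tear down.
# node = fork_node("https://YOUR-ARCHIVE-RPC.example", block_number=N - 1)
# ...
# node.terminate()
```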

Cost of a Real Backtest

Realistic resource cost:

  • Archive RPC: $50–200/month for backtest-grade access (or self-host an archive node)
  • Compute: 2–8 vCPU + 32–64GB RAM
  • Storage: 4–12TB for full archive (less if using hosted)
  • Time: 4–24 hours per backtest run on a target chain

Budget $500–2k one-time for setup and $100–300/month ongoing. That is cheap relative to the losses it prevents.

FAQ

Can I backtest without an archive node?

Only to a limited extent. Hosted archives (Alchemy, QuickNode) work but get expensive at high request volumes. For serious work, self-host a Reth archive node.

How long should my backtest period be?

Minimum 8 weeks, which leaves room for a 6-week tuning window plus a 2-week walk-forward holdout; 12+ weeks is better. Less than 4 weeks is statistically meaningless.

Should I backtest on testnet?

No. Testnet has different competition, gas dynamics, and pool state. Backtest on mainnet history.

Will FRB Agent backtest for me?

Yes — for built-in strategy modules (atomic arb, liquidations, JIT, sniping). Custom strategies require running the replay engine yourself.

What if my backtest shows huge returns?

Be skeptical. Verify Tier 3 properties (latency model, gas modelling, failed-attempt simulation). Most "huge return" backtests are broken in one of those dimensions.


This article describes engineering practice. Backtest results never guarantee live results. Not financial advice.
