Skip to content

emlis42/regex

 
 

Repository files navigation

Lean 4 Regex

A regular expression engine written in Lean 4.

This library is based on the Rust regex crate and extended for compatibility with PCRE2 (see Syntax).

Main restrictions:

  • focus on unicode data,
  • simple api and few configurations,
  • BoundedBacktracker as the single regex engine,
  • no optimizations.

Installation

Add the following dependency to lakefile.toml:

[[require]]
name = "Regex"
git = "https://github.com/bergmannjg/regex"

Documentation

The main documentation is in Regex.lean

Verification

Some parts of the regex compilation and execution are verified.

Example

Get captures of "∀ (n : Nat), 0 ≤ n" :

def Main : IO Unit := do
  let re := regex% r"^\p{Math}\s*.(?<=\()([a-z])[^,]+,\s*(\p{Nd})\s*(\p{Math})\s*\1$"
  let captures := Regex.captures "∀ (n : Nat), 0 ≤ n" re
  IO.println s!"{captures}"

Output is

fullMatch: '∀ (n : Nat), 0 ≤ n', 0, 22
groups: #[(some ('n', 5, 6)), (some ('0', 15, 16)), (some ('≤', 17, 20))]

Api

  • regex% : notation to build the regular expression at compile time
  • captures : searches for the first match of the regular expression

Components of regular expression:

  • \p{Math} : match all characters with the Math property
  • (?<=\() : lookbehind of char '('
  • (\p{Nd}) : capturing group of numeric characters
  • \1 : backreference to first capturing group

Test

The library is tested with the testdata of the PCRE2 library and the testdata of the Rust Regex crate.

The tests are run with

lake test

License

This project is licensed under the MIT license.

The data in Regex/Unicode/data/ is licensed under the Unicode License Agreement (LICENSE-UNICODE).

The .toml files in testdata/rust are licensed under the (LICENSE-RUST-REGEX).

About

A PCRE2 compatible regular expression engine written in Lean 4.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Lean 100.0%