linuxhd0/bqnpcre

bqnpcre

BQN FFI bindings for PCRE2 (Perl Compatible Regular Expressions).

bqnpcre provides a BQN interface to the PCRE2 regular expression library via •FFI. It exposes five functions: Search, Replace, Split, FindAll, and Compile.

Requirements

  • CBQN — the BQN implementation with •FFI support
  • PCRE2: libpcre2-8-dev (Debian/Ubuntu) or pcre2-devel (Fedora/RHEL)
  • A C compiler (gcc or clang)

Building

make        # produces libbqnpcre.so
make test   # build and run tests
make clean  # remove build artifacts

Installation

bqnpcre is designed to be used in-place or as a git submodule. There is no system install step.

In-place use

Clone the repo and build:

git clone https://github.com/linuxhd0/bqnpcre
cd bqnpcre
make

Import from your BQN program using the path to bqn/pcre2.bqn:

pcre ← •Import "/path/to/bqnpcre/bqn/pcre2.bqn"

As a git submodule

git submodule add https://github.com/linuxhd0/bqnpcre vendor/bqnpcre
cd vendor/bqnpcre && make && cd ../..
pcre ← •Import "vendor/bqnpcre/bqn/pcre2.bqn"

Custom .so location

The first line of bqn/pcre2.bqn sets the path to the shared library:

lib ← "../libbqnpcre.so"

This path is resolved relative to bqn/pcre2.bqn itself (not the working directory), so it works correctly regardless of where your program runs. If you move libbqnpcre.so to a different location, update this line. For example, to use an absolute path:

lib ← "/usr/local/lib/libbqnpcre.so"

Usage

pcre ← •Import "bqn/pcre2.bqn"

# Search - one-shot (compiles, matches, frees automatically)
# opts may be omitted: plain string or ⟨pattern, opts⟩ both work
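   result ← "(a+)b" pcre.Search "aaab"
   result
┌─            
╵ "aaab" 0 4  
  "aaa"  0 3  
             ┘
   ⊏result       # first row: the whole match
⟨ "aaab" 0 4 ⟩
   ⊏˘result      # column 0: matched strings
⟨ "aaab" "aaa" ⟩
   1⊏˘result     # column 1: start offsets
⟨ 0 0 ⟩
   2⊏˘result     # column 2: end offsets (exclusive)
⟨ 4 3 ⟩

# Replace - one-shot
# opts may be omitted: ⟨pattern, replacement⟩ or ⟨pattern, replacement, opts⟩
   ⟨"hello", "world"⟩ pcre.Replace "hello world"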
"world world"

# Split - one-shot
   "," pcre.Split "a,b,c"
⟨ "a" "b" "c" ⟩

# FindAll - one-shot
# Returns a list of n×3 arrays, one per match, to preserve per-match grouping
# when capture groups are present. Use ∾ to get a flat array when there are no
# capture groups.
   fa ← "a+" pcre.FindAll "aabaa"
   ⊑fa           # first match
┌─          
╵ "aa" 0 2  
           ┘
   1⊑fa          # second match
┌─          
╵ "aa" 3 5  
           ┘
   ∾fa           # join into one flat array (no capture groups)
┌─          
╵ "aa" 0 2  
  "aa" 3 5  
           ┘

# FindAll with capture groups - list of n×3 arrays preserves per-match grouping
   fa2 ← "([0-9]+)-([0-9]+)" pcre.FindAll "2024-03 and 2025-07"
   fa2
⟨ ┌─                ┌─                  ⟩
  ╵ "2024-03" 0 7   ╵ "2025-07" 12 19   
    "2024"    0 4     "2025"    12 16   
    "03"      5 7     "07"      17 19   
                  ┘                   ┘

# Full match string from each match
   (⊑⊏˘)¨fa2
⟨ "2024-03" "2025-07" ⟩

# Capture strings only (drop group 0 row)
   {0⊏˘1↓𝕩}¨fa2
⟨ ⟨ "2024" "03" ⟩ ⟨ "2025" "07" ⟩ ⟩

# Compile a pattern for reuse - returns a namespace
# opts may be omitted: plain string or ⟨pattern, opts⟩ both work
   pat ← pcre.Compile "(a+)b"
   pat.ngroups
2

# Match with compiled pattern
   result ← pat.Match "aaab"
   result
┌─            
╵ "aaab" 0 4  
  "aaa"  0 3  
             ┘

# FindAll with compiled pattern
   pat_a ← pcre.Compile "a+"
   pat_a.FindAll "aabaa"
⟨ ┌─           ┌─           ⟩
  ╵ "aa" 0 2   ╵ "aa" 3 5   
             ┘            ┘

# Split with compiled pattern
   pat_c ← pcre.Compile ","
   pat_c.Split "a,b,c"
⟨ "a" "b" "c" ⟩

# Replace with compiled pattern (𝕨=replacement, 𝕩=subject)
   pat_h ← pcre.Compile "hello"
   "world" pat_h.Replace "hello world"
"world world"

# Free compiled pattern when done (caller is responsible)
   pat.Free @
   pat_a.Free @
   pat_c.Free @
   pat_h.Free @

# Named capture groups
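   pat ← pcre.Compile "(?P<year>[0-9]+)-(?P<month>[0-9]+)"
   names ← pat.names
   names
┌─           
╵ "month" 2  
  "year"  1  
            ┘
   ⊏˘names       # group names, alphabetical
⟨ "month" "year" ⟩
   1⊏˘names      # corresponding group numbers
⟨ 2 1 ⟩

# Look up a named group and extract its match value
   result ← pat.Match "2024-03"
   year_num ← 1⊑⊏((⊏˘names)⊐⟨"year"⟩)⊏names   # row for "year", then its group number
   year_num⊑⊏˘result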
"2024"
   pat.Free @

Unicode

BQN characters are Unicode code points, so a BQN string is effectively UTF-32, while bqnpcre uses PCRE2's 8-bit library (libpcre2-8). bqnpcre transparently encodes all string arguments to UTF-8 before passing them to PCRE2 and decodes results back to BQN strings. All match offsets (start and end) are code point offsets, not byte offsets.
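The offset translation described above can be sketched in C, the language of src/bqnpcre.c. This is not code from bqnpcre: utf8_encode and byte_to_cp_offset are hypothetical helpers, and the sketch assumes valid code points and byte offsets that fall on code point boundaries. It encodes code points as UTF-8 and maps a byte offset back to a code point offset by counting the bytes that begin a code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from src/bqnpcre.c. */

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written. */
size_t utf8_encode(uint32_t cp, unsigned char *out) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert a byte offset into UTF-8 text to a code point offset by counting
   the bytes before it that start a code point (anything but 10xxxxxx). */
size_t byte_to_cp_offset(const unsigned char *s, size_t byte_off) {
    size_t cps = 0;
    for (size_t i = 0; i < byte_off; i++)
        if ((s[i] & 0xC0) != 0x80) cps++;
    return cps;
}
```

For the subject "a⌊b", ⌊ (U+230A) encodes as the three bytes E2 8C 8A, so a match at byte offsets 1 and 4 maps to code point offsets 1 and 2, the same values as the "⌊" 1 2 row in the Unicode examples.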

Literal Unicode characters in patterns and subjects work automatically:

   "⌊" pcre.Search "a⌊b"
┌─         
╵ "⌊" 1 2  
          ┘
   ⟨"⌊","★"⟩ pcre.Replace "a⌊b"
"a★b"

For patterns that use Unicode-aware constructs (character classes, dot, quantifiers applied to multi-byte characters), pass the "u" option to enable PCRE2's UTF mode:

   ⟨"[⌊★]","u"⟩ pcre.FindAll "a⌊b★c"
⟨ ┌─          ┌─          ⟩
  ╵ "⌊" 1 2   ╵ "★" 3 4   
            ┘           ┘

API

pcre.Compile pattern or pcre.Compile ⟨pattern, options⟩

Compiles a regex pattern for reuse. Returns a namespace. The caller is responsible for freeing the compiled code by calling pat.Free @ when done.

  • pattern: Regex pattern string
  • options: Options string (option letters are case-insensitive); omit to use no options:
    • i — Case insensitive
    • m — Multiline mode
    • s — Dotall (. matches newlines)
    • x — Extended mode
    • u — UTF mode
    • a — Anchored (match only at start of subject)

The returned namespace has the following fields:

Field     Description
pattern   Pattern string
opts      Options string
ngroups   Number of groups, including group 0
names     Named groups as a rank-2 array: each row is name‿group_number, sorted alphabetically by name; 0‿2⥊@ if there are none
Match     pat.Match subject; same return as pcre.Search
FindAll   pat.FindAll subject; same return as pcre.FindAll
Split     pat.Split subject; same return as pcre.Split
Replace   repl pat.Replace subject; same return as pcre.Replace
Free      pat.Free @; frees the compiled code
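PCRE2 exposes named groups through pcre2_pattern_info() with PCRE2_INFO_NAMECOUNT, PCRE2_INFO_NAMEENTRYSIZE and PCRE2_INFO_NAMETABLE. In the 8-bit library each fixed-size table entry holds the group number in its first two bytes (most significant byte first) followed by the NUL-terminated name, and PCRE2 keeps the entries in alphabetical order, which is likely where the ordering of names comes from. The C sketch below looks a name up in such a table; it uses a hand-built table rather than calling PCRE2, and name_to_group is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up a group number by name in a PCRE2-style
   8-bit name table of `count` entries, each `entry_size` bytes long.
   Bytes 0-1 of an entry are the group number (big-endian); the
   NUL-terminated name starts at byte 2. Returns -1 if the name is absent.
   A linear scan is used for simplicity, although the table is sorted. */
int name_to_group(const unsigned char *table, size_t count,
                  size_t entry_size, const char *name) {
    assert(entry_size >= 3);
    for (size_t i = 0; i < count; i++) {
        const unsigned char *e = table + i * entry_size;
        if (strcmp((const char *)(e + 2), name) == 0)
            return (e[0] << 8) | e[1];
    }
    return -1;
}
```

For "(?P<year>[0-9]+)-(?P<month>[0-9]+)" the longest name is "month", so each entry is 2 + 5 + 1 = 8 bytes, and "month" (group 2) sorts before "year" (group 1), the same order as pat.names in the Usage example.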

pattern pcre.Search subject or ⟨pattern, options⟩ pcre.Search subject

One-shot search: compiles, matches, and frees the compiled code automatically.

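Returns an n×3 array with one row per group, where n is ngroups: row 0 is the whole match and each following row is a capture group, in order. Each row is string‿start‿end, where start and end are code point offsets into subject and end is exclusive. A compiled pattern's Match returns the same. On no match, returns @. On error, throws.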
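After a successful pcre2_match(), PCRE2 reports group positions as start/end pairs in an ovector (see pcre2_get_ovector_pointer()). The C sketch below shows how such pairs could become rows of the shape described above. match_row and ovector_to_rows are hypothetical, the subject is assumed to be ASCII so byte and code point offsets coincide, and unset groups (which PCRE2 marks with PCRE2_UNSET) are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical row type: matched text, start, end (end exclusive). */
typedef struct {
    char  *text;
    size_t start, end;
} match_row;

/* Fill rows[0..n-1] from an ovector of 2*n offsets. Each rows[i].text is
   malloc'd and must be freed by the caller. Returns 0, or -1 if an
   allocation fails (rows built so far are freed). */
int ovector_to_rows(const char *subject, const size_t *ovector, size_t n,
                    match_row *rows) {
    for (size_t g = 0; g < n; g++) {
        size_t s = ovector[2 * g], e = ovector[2 * g + 1];
        assert(s <= e); /* unset groups are not handled in this sketch */
        char *text = malloc(e - s + 1);
        if (text == NULL) {
            while (g > 0) free(rows[--g].text);
            return -1;
        }
        memcpy(text, subject + s, e - s);
        text[e - s] = '\0';
        rows[g].text = text;
        rows[g].start = s;
        rows[g].end = e;
    }
    return 0;
}
```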

pattern pcre.FindAll subject or ⟨pattern, opts⟩ pcre.FindAll subject

Find all non-overlapping matches of pattern in subject.

  • pattern: Regex pattern string
  • opts: Options string (same flags as Compile); omit to use no options
  • subject: String to search

Returns a list of n×3 arrays, one per match, with the same row structure as Search. Offsets are absolute positions in the original subject. Each match is a separate array to preserve per-match grouping when capture groups are present; use ∾ to join them into one flat array when there are no capture groups. Returns ⟨⟩ if there are no matches. On error, throws.
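Non-overlapping iteration usually means resuming each search where the previous match ended. The C sketch below shows that loop with a literal-substring matcher (strstr) standing in for pcre2_match(); literal_match and find_all are hypothetical. Advancing one byte after an empty match is a simplification: PCRE2's pcre2demo program instead retries at the same position with PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED, and in UTF mode a step must cover a whole code point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in matcher: find the literal `needle` in `subject` at or after byte
   `from`; on success store the span in *s, *e and return 1, else return 0. */
static int literal_match(const char *subject, const char *needle, size_t from,
                         size_t *s, size_t *e) {
    const char *p = strstr(subject + from, needle);
    if (p == NULL) return 0;
    *s = (size_t)(p - subject);
    *e = *s + strlen(needle);
    return 1;
}

/* Hypothetical: collect up to `max` non-overlapping match spans. Each search
   resumes at the previous match's end; an empty match advances one byte so
   the loop always makes progress. Returns the number of spans stored. */
size_t find_all(const char *subject, const char *needle,
                size_t (*spans)[2], size_t max) {
    size_t len = strlen(subject), pos = 0, n = 0, s, e;
    while (n < max && pos <= len && literal_match(subject, needle, pos, &s, &e)) {
        spans[n][0] = s;
        spans[n][1] = e;
        n++;
        pos = (e > s) ? e : e + 1;
    }
    return n;
}
```

With "aa" as the needle, "aaaa" yields spans (0, 2) and (2, 4) but never (1, 3), which is what non-overlapping means here.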

pattern pcre.Split subject or ⟨pattern, opts⟩ pcre.Split subject

Split subject on all occurrences of pattern.

  • pattern: Regex pattern string
  • opts: Options string (same flags as Compile); omit to use no options
  • subject: String to split

Returns a list of strings. On no match, returns a list containing the whole subject unchanged. On error, throws.
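Given the match spans, the pieces a split returns are the gaps between them. The C sketch below computes those gaps as [start, end) byte ranges; split_pieces is hypothetical. It keeps empty pieces (for example between two adjacent matches); this README does not say whether bqnpcre does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: given n sorted, non-overlapping match spans in a subject of
   length len, write the n+1 gaps between them into pieces as [start, end)
   ranges and return n+1. With n == 0 the single piece is the whole subject. */
size_t split_pieces(size_t len, size_t (*spans)[2], size_t n,
                    size_t (*pieces)[2]) {
    size_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        assert(spans[i][0] >= prev && spans[i][1] <= len);
        pieces[i][0] = prev;        /* gap before this match */
        pieces[i][1] = spans[i][0];
        prev = spans[i][1];         /* next gap starts after the match */
    }
    pieces[n][0] = prev;            /* tail after the last match */
    pieces[n][1] = len;
    return n + 1;
}
```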

⟨pattern, replacement⟩ pcre.Replace subject or ⟨pattern, replacement, opts⟩ pcre.Replace subject

Replace all occurrences of pattern in subject with replacement.

  • pattern: Regex pattern string
  • replacement: Replacement string
  • opts: Options string (same flags as Compile); omit to use no options
  • subject: String to replace in

Returns the result string. On no match, returns the subject unchanged. On error, throws.
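Replacement can likewise be sketched as copying the gaps between match spans and inserting the replacement in place of each span. replace_spans is hypothetical and copies repl literally; whether bqnpcre expands group references such as $1, as PCRE2's pcre2_substitute() does, is not stated in this README.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: return a malloc'd copy of subject in which each of the n
   sorted, non-overlapping spans is replaced by repl (copied literally).
   Returns NULL on allocation failure; the caller frees the result. */
char *replace_spans(const char *subject, size_t (*spans)[2], size_t n,
                    const char *repl) {
    size_t len = strlen(subject), rlen = strlen(repl), out_len = len;
    for (size_t i = 0; i < n; i++)
        out_len = out_len - (spans[i][1] - spans[i][0]) + rlen;
    char *out = malloc(out_len + 1);
    if (out == NULL) return NULL;
    size_t prev = 0, w = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(out + w, subject + prev, spans[i][0] - prev); /* gap */
        w += spans[i][0] - prev;
        memcpy(out + w, repl, rlen);                         /* replacement */
        w += rlen;
        prev = spans[i][1];
    }
    memcpy(out + w, subject + prev, len - prev);             /* tail */
    w += len - prev;
    assert(w == out_len);
    out[w] = '\0';
    return out;
}
```

With no spans the result is an unchanged copy of the subject, mirroring the no-match behaviour described above.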

Limits

Passing a non-string value (number, nested array, etc.) as any string argument (pattern, options, subject, or replacement) throws "PCRE2: expected a string argument".

Files

  • src/bqnpcre.c — C wrapper around PCRE2
  • src/bqnpcre.h — Header file
  • bqn/pcre2.bqn — BQN FFI module
  • Makefile — Build system
  • test/test.bqn — Test suite
