Submitted: 20 April 2025
Posted: 22 April 2025
Abstract
Keywords:
1. Introduction
- High-performance JSON parser: We implement a JSON parser that leverages advanced LPeg optimizations to achieve performance comparable to hand-optimized Lua parsers. This case study showcases techniques including table construction optimization and substitution capture.
- Sophisticated Glob-to-LPeg converter: We develop a converter that transforms Glob patterns into equivalent LPeg patterns, showcasing LPeg’s flexibility in handling complex pattern matching scenarios. This study illustrates how clear separation of pattern matching rules and strategic optimization choices can lead to both performant and maintainable code.
2. Related Work
2.1. Overview of LPeg
- Pattern Matching: LPeg provides a concise and expressive syntax for defining patterns, enabling powerful and flexible pattern matching within Lua.
- Captures: LPeg allows capturing and extracting specific parts of the input during parsing, enabling the construction of structured representations of the parsed data, such as abstract syntax trees (ASTs) or Lua tables. Among its various capture types:
  - Group captures let you collect multiple related matches as a single unit.
  - Match-time captures provide dynamic control over the matching process by evaluating a function at runtime that can determine whether and how the match should proceed.
  - Accumulator captures modify previously captured values by applying a function that combines the last capture with the current one.
- Grammar Definition: LPeg grammars are defined using a set of rules, each associating a name with a parsing expression. These rules can be recursive, allowing the specification of complex language constructs. However, LPeg does not support left recursion and will detect and report such patterns as errors.
- Efficiency: LPeg is designed with performance in mind, featuring a fast pattern matching engine implemented in C. It employs various optimization techniques, such as special purpose instructions, tail call optimization and head-fail optimization, to minimize parsing overhead.
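The accumulator capture described above behaves like a left fold over the captured values. The following is a minimal sketch of that idea in Python rather than LPeg; the names `accumulate` and `add_pair` are hypothetical and are not part of LPeg's API.

```python
from functools import reduce


def accumulate(initial, captures, combine):
    """Fold captures left to right, the way an accumulator capture
    threads one running value through successive matches."""
    return reduce(combine, captures, initial)


def add_pair(obj, pair):
    """Combine the running object with the next (key, value) capture."""
    key, value = pair
    obj[key] = value
    return obj


# Illustrative captures, as a key/value rule might produce them.
pairs = [("name", "LPeg"), ("fast", True)]
result = accumulate({}, pairs, add_pair)
print(result)
```

The combining function receives the value accumulated so far plus the newest capture and returns the new accumulated value, so no intermediate list of pairs has to be kept once a pair has been folded in.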
2.2. JSON Parsers
- dkjson: A pure Lua JSON library that can optionally use LPeg to speed up parsing.
- cjson: A high-performance Lua-C library specifically designed for fast JSON parsing and generation.
- rxi_json: A lightweight JSON library for Lua that achieves very fast parsing performance in pure Lua.1
2.3. Glob Matchers
3. Notation and Conventions
- Repetition: p^n (exactly n times), p^+n (at least n times), p^-n (at most n times)
- Capture:
  - Simple capture: { p }
  - Anonymous group capture: {: p :}
  - Named group capture: {:name: p :} (where name is the name of the capture group)
  - Table capture: {| p |}
  - Constant/predefined capture: p -> function/query/string
  - Match-time capture: p => function
  - Accumulator capture: p >> function
  - Substitution capture: {~ p ~}
3.1. Symbols
- p denotes a pattern, which depending on context may refer to a PEG pattern or a Glob pattern
- s refers to the starting rule in a grammar, particularly when a grammar has only one rule
- denotes the PEG transformation of a Glob pattern p, resulting in an equivalent PEG expression
4. Case Studies
4.1. Case Study: JSON Parser - A Concise Example
Listing 1. PEG grammar for JSON


4.1.1. Optimizing Whitespace Handling


4.1.2. Accumulator vs Function-Based Object Construction




4.1.3. Substitution Capture


4.2. Case Study: Glob-to-LPeg Converter - An In-Depth Analysis
4.2.1. Overview of Glob Grammar
- * to match one or more characters in a path segment
- ? to match on one character in a path segment
- ** to match any number of path segments, including none
- {} to group conditions (e.g., *.{ts,js} matches all TypeScript and JavaScript files)
- [] to declare a range of characters to match in a path segment (e.g., example.[0-9] to match on example.0, example.1, …)
- [!...] to negate a range of characters to match in a path segment (e.g., example.[!0-9] to match on example.a, example.b, but not example.0)
- A Glob pattern must match an entire path, with partial matches considered failures.
- The pattern only determines success or failure, without specifying which parts correspond to which characters.
- A path segment is the portion of a path between two adjacent path separators (/), or between the start/end of the path and the nearest separator.
- The ** (globstar) pattern matches zero or more path segments, including intervening separators (/). Within pattern strings, ** must be delimited by path separators (/) or pattern boundaries and cannot be adjacent to any characters other than /. If ** is not the final element, it must be followed by /.
- {} (braced conditions) contains valid Glob patterns as branches, separated by commas. Commas are exclusively used for separating branches and cannot appear within a branch for any other purpose. Nested {} structures are allowed, but {} must contain at least two branches—zero or one branch is not permitted.
- In [] or [!...], a character range consists of character intervals (e.g., a-z) or individual characters (e.g., w). A range including / won’t match that character.
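The matching rules listed above can be illustrated with a small translator from brace-free Glob patterns to regular expressions. This is a sketch in Python under stated assumptions, not the LPeg-based converter discussed in this paper: `*` matches one or more characters as stated in the list, a final `**` is assumed to match the rest of the path, braced conditions are not handled, and the names `glob_to_regex`, `glob_match`, and `_class_items` are hypothetical.

```python
import re


def _class_items(body: str) -> str:
    """Escape the items of a character range, keeping intervals such as a-z."""
    items = []
    j = 0
    while j < len(body):
        if j + 2 < len(body) and body[j + 1] == "-":
            items.append(re.escape(body[j]) + "-" + re.escape(body[j + 2]))
            j += 3
        else:
            items.append(re.escape(body[j]))
            j += 1
    return "".join(items)


def glob_to_regex(glob: str) -> str:
    """Translate a brace-free Glob pattern into a regex string."""
    out = []
    i, n = 0, len(glob)
    while i < n:
        c = glob[i]
        if glob.startswith("**", i):
            before_ok = i == 0 or glob[i - 1] == "/"
            after = glob[i + 2:i + 3]
            if not before_ok or after not in ("", "/"):
                raise ValueError("'**' must be delimited by '/' or a pattern boundary")
            if after == "/":
                out.append(r"(?:[^/]+/)*")  # zero or more whole segments
                i += 3
            else:
                out.append(r".*")  # assumption: final globstar takes the rest
                i += 2
        elif c == "*":
            out.append(r"[^/]+")  # one or more characters within a segment
            i += 1
        elif c == "?":
            out.append(r"[^/]")  # exactly one character within a segment
            i += 1
        elif c == "[":
            end = glob.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated character range")
            body = glob[i + 1:end]
            negate = body.startswith("!")
            if negate:
                body = body[1:]
            if not body:
                raise ValueError("empty character range")
            # The lookahead keeps a range from ever matching '/'.
            out.append("(?!/)[" + ("^" if negate else "") + _class_items(body) + "]")
            i = end + 1
        else:
            out.append(re.escape(c))
            i += 1
    return "".join(out)


def glob_match(glob: str, path: str) -> bool:
    """A Glob pattern must match the entire path; partial matches fail."""
    return re.fullmatch(glob_to_regex(glob), path, flags=re.DOTALL) is not None
```

Using `re.fullmatch` mirrors the rule that a partial match counts as a failure, and the function returns only success or failure, in line with the rule that the pattern does not report which parts matched which characters.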
4.2.2. Pattern Analysis and Implementation Strategy
- Fixed-length patterns: Including string literals, single character wildcards (?), and character classes ([])
- Variable-length patterns: Including star (*) and globstar (**)
Segment-Level Pattern Matching




Cross-Segment Pattern Matching
Listing 2. The topmost rules of our Glob-to-LPeg converter, Peglob. The captures are removed to present a clear grammar. DSeg and DSEnd rules are used to process globstars.
4.2.3. Braced Conditions and Expansion


Corner Cases with Star and Globstar
- Scenario 1: ...*{*/p,...}, where expansion results in a star before and after the brace merging into a globstar.
- Scenario 2: ...**{*p,...}, where the globstar takes precedence over the star, converting into three tokens: **, *, and p.
- Scenario 3: ...**{**/p}, which, regardless of expansion, results in two globstars followed by /p.
- Scenario 4: ...q{/**/p,...}, where q is a Word not ending in a star or globstar. This scenario expands into a valid pattern (q/**/p).
- By keeping the content before the first brace unchanged and expanding the content afterward, star merging is prevented. While this approach may differ from some other Glob matchers, such patterns are rare in practice.
- Scenarios involving globstar precedence are avoided by our grammar constraints, as they would produce invalid patterns according to our globstar positioning rules.
- Double globstar cases, while syntactically valid, are semantically meaningless due to our constraints on globstar behavior. The system correctly identifies and rejects such patterns.
- Valid combinations with path separators match correctly, maintaining consistent matching semantics even with variable-length content.
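To make the corner cases above concrete, here is a minimal sketch of naive textual brace expansion in Python. It is an illustration only and is not the expansion strategy used by our converter, which keeps the content before the first brace unchanged; the function name `expand_braces` is hypothetical. It follows the rules stated earlier: commas separate branches, nesting is allowed, and a braced condition needs at least two branches.

```python
def expand_braces(pattern: str) -> list[str]:
    """Expand the first braced condition textually, then recurse."""
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    depth = 0
    branches = []
    branch_start = start + 1
    for i in range(start, len(pattern)):
        ch = pattern[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                branches.append(pattern[branch_start:i])
                break
        elif ch == "," and depth == 1:
            # Only top-level commas separate branches of this condition.
            branches.append(pattern[branch_start:i])
            branch_start = i + 1
    else:
        raise ValueError("unbalanced '{'")
    if len(branches) < 2:
        raise ValueError("a braced condition needs at least two branches")
    head, tail = pattern[:start], pattern[i + 1:]
    results = []
    for branch in branches:
        # Nested or later braced conditions are expanded by the recursion.
        results.extend(expand_braces(head + branch + tail))
    return results


print(expand_braces("a*{*/p,q}"))
```

With purely textual expansion, the pattern `a*{*/p,q}` yields `a**/p` as its first result, which is the star-merging situation of Scenario 1.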
Performance Optimization for Braced Conditions
- Constraint 1: After the pattern is brace expanded, no branch within the braced condition contains / or **.
- Constraint 2: The "tail" (all characters after the braced condition) has a prefix ending with /. This prefix, excluding its last character, contains neither /, {, nor **.
- represents the corresponding PEG of the Glob pattern p
- converts to PEG and transforms EOF matches into &’/’ lookahead predicates
- A is the fixed-length pattern before the grouping condition
- are the expanded branches
- T is the tail string
- P is the prefix of the tail string meeting the constraints, with the last / character removed
- Q is the remainder of T after removing P
- No branches contain / or **, so grouping conditions don’t cross segment boundaries
- Constraint 2 ensures P doesn’t cross segment boundaries
- remains within a segment
- The successful match of Q is independent of branch selection
- At least one matches the character before the first /
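The two constraints above can be checked mechanically before applying the optimization. The following Python sketch shows one way to do so; the helper names `branches_ok` and `split_tail` are hypothetical, and the sketch assumes that Q keeps the leading `/` that is removed from P.

```python
def branches_ok(branches):
    """Constraint 1: no expanded branch contains '/' or '**'."""
    return all("/" not in b and "**" not in b for b in branches)


def split_tail(tail: str):
    """Constraint 2: return (P, Q) if the tail qualifies, else None.

    P is the tail up to (not including) its first '/'.
    Q is the remainder of the tail, assumed here to start with that '/'.
    """
    slash = tail.find("/")
    if slash == -1:
        return None  # no prefix ending with '/'
    prefix = tail[:slash]
    if "{" in prefix or "**" in prefix:
        return None  # prefix would not stay within one segment
    return prefix, tail[slash:]


print(branches_ok(["ts", "js"]), split_tail(".map/dist/**"))
```

When both checks pass, P stays inside the current path segment, which is what allows the match of Q to be treated independently of the branch that was selected.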

5. Evaluation
5.1. JSON Parser Performance
5.2. Glob-to-LPeg Converter Evaluation
- Removed tests with invalid surrogate pairs that would trigger exceptions in Lua’s string handling
- Excluded bash-specific extension tests that fall outside our formal Glob grammar specification
- Accuracy:
- Precision:
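As a sketch of how these two metrics can be computed for a Glob matcher, the Python function below assumes the standard confusion-matrix definitions, accuracy as (TP + TN) / (TP + TN + FP + FN) and precision as TP / (TP + FP), where a positive is a path that the test suite expects the pattern to match. The function name and the sample data are hypothetical.

```python
def accuracy_and_precision(expected, actual):
    """Compare expected match results with a matcher's actual results."""
    pairs = list(zip(expected, actual))
    tp = sum(1 for e, a in pairs if e and a)          # true positives
    tn = sum(1 for e, a in pairs if not e and not a)  # true negatives
    fp = sum(1 for e, a in pairs if not e and a)      # false positives
    fn = sum(1 for e, a in pairs if e and not a)      # false negatives
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Illustrative data only, not measurements from the evaluation.
expected = [True, True, False, False]
actual = [True, False, False, True]
print(accuracy_and_precision(expected, actual))
```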
Listing 3. Patterns used to benchmark Glob matchers
6. Conclusion
7. Acknowledgement
Declaration of Generative AI in Scientific Writing
References
- Ierusalimschy, R. A text pattern-matching tool based on Parsing Expression Grammars. Software: Practice and Experience 2009, 39, 221–258.
- Medeiros, S.; Ierusalimschy, R. A parsing machine for PEGs. In Proceedings of the 2008 symposium on Dynamic languages, 2008, pp. 1–12.
- Ford, B. Parsing expression grammars: a recognition-based syntactic foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages, 2004, pp. 111–122.
- Linux man-pages project. glob - globbing pathnames. The Linux Documentation Project, 2023. Accessed: 2023-10-05.
- Manura, D. lua-glob-pattern: Converts file glob string to Lua pattern string. Available online: https://github.com/davidm/lua-glob-pattern.
- Sumneko. lua-glob. Available online: https://github.com/sumneko/lua-glob.
- NeoVim Contributors. NeoVim’s Glob Implementation. Available online: https://github.com/neovim/neovim/blob/f8cbdbb/runtime/lua/vim/glob.lua.
- Microsoft Language Server Protocol. Language Server Protocol Specification. Available online: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#documentFilter.
- Ecma International. The JSON Data Interchange Syntax. Available online: https://ecma-international.org/publications-and-standards/standards/ecma-404/. 2nd edition, December 2017.
- Yedidia, Z. Incremental PEG Parsing. Bachelor’s thesis, Harvard University, Cambridge, Massachusetts, 2021.
- Pall, M. LuaJIT Extensions: table.new. Available online: https://luajit.org/extensions.html#table_new.
- Cox, R. Glob Matching Can Be Simple And Fast Too. Available online: https://research.swtch.com/glob.
- Rosetta Code. Brace Expansion. Available online: https://rosettacode.org/wiki/Brace_expansion?oldid=366904.
- Medeiros, S.; Mascarenhas, F.; Ierusalimschy, R. From regular expressions to parsing expression grammars. In Proceedings of the Brazilian Symposium on Programming Languages; 2011.
- Langdale, G.; Lemire, D. Parsing gigabytes of JSON per second. The VLDB Journal 2019, 28, 941–960.
- Bun Team. bun.Glob class | API reference. Available online: https://bun.sh/reference/bun/Glob.
| 1 | |
| 2 | LPeg.re official documentation: https://www.inf.puc-rio.br/~roberto/lpeg/re.html |
| 3 | The fast_merge alias is used to better emphasize the intent behind using this function. |
| 4 | Even with the Section 4.2.3.2 optimizations, we consume at least to the segment’s end. |
| 5 | Using Bun v1.2.4-canary.13+6aa62fe4b, which includes the skip_brace bugfix after Bun 1.2.3 that ported the Rust crate fast-glob. |
| 6 | Using Minimatch 10.0.1, run in the Bun environment for consistent comparison. |
| 7 | GitHub issue: Inconsistent Glob Pattern Matching Results in Bun 1.2.2 and 1.2.3-canary. https://github.com/oven-sh/bun/issues/17512 |







Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


