Tstregex - A Hybrid Regex Diagnostic Tool (single-file library module and command-line tool)
Shows the longest regular expression match and highlights the rejected part.
Example:
# Above, the normal parts are the longest matching substring, while the bold parts highlight the rejected substring
$ tstregex 'regex' string1 string2 ... stringN
Shows this help.
Shows key information about the match or mismatch.
Triggers the Enriched Diagnostic View. It displays:
- The string with the failing part highlighted.
- The exact token in the regex that caused the break.
- A visual pointer (^--- HERE) aligned with the regex syntax.
- Execution time (useful for spotting ReDoS/exponential backtracking).
Misc: runs a large test suite (a broad collection of regexp tests) through Tstregex.
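The execution time reported by the Enriched Diagnostic View can be approximated outside the tool with a few lines of Perl. The sketch below is not taken from Tstregex: `timed_match` is a hypothetical helper, and the nested-quantifier pattern is only an illustrative ReDoS candidate. It simply wraps a match in `Time::HiRes` timestamps.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper (not part of Tstregex): run one match and
# return the match status together with the elapsed wall-clock time.
sub timed_match {
    my ($re, $string) = @_;
    my $start   = time();
    my $matched = ($string =~ $re) ? 1 : 0;
    return ($matched, time() - $start);
}

# A nested quantifier followed by a non-matching tail is a classic
# candidate for heavy backtracking.
my ($ok, $elapsed) = timed_match(qr/^(a+)+$/, ('a' x 20) . 'b');
printf "matched=%d elapsed=%.6fs\n", $ok, $elapsed;
```

A time that grows sharply as the input gets longer is the usual sign of exponential backtracking. The absolute figure depends on the Perl version and its internal optimisations.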
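use Tstregex;
# Build a diagnostic context from a delimited regex
my $ctx = tstregex_init_desc('/^\d{3}/');
# Run the diagnostic; the results are stored in the context
tstregex($ctx, '12a');
if (!tstregex_is_full_match($ctx))
{
    my $token = tstregex_get_fail_token($ctx);   # regex token that broke the match
    my $pos   = tstregex_get_match_len($ctx);    # length of the matching substring
    print "Failure on token '$token' at column $pos\n";
}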
Pre-parses the regex, handles delimiters (m!!, //, etc.), extracts modifiers (i, s, m, x), and prepares the nibbling steps. Returns a context hash.
Executes the diagnostic. Updates the context.
Returns the match status of the input string (boolean: 0 or 1).
Returns the matching portion in case of a full match (it may be shorter than the input string, depending on anchors).
Returns the length of the matching substring.
Returns the failing token in the regexp.
Returns the matching regexp subpart.
Returns the internal representation of the regexp.
Returns the offset of the original regexp in the raw regexp.
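To make the pre-parsing step described above more concrete, here is a minimal sketch of how a raw regex description could be split into pattern, modifiers, and offset. It is an assumption about the general approach, not the module's code: `split_regex_desc` and the hash keys are hypothetical names, and the sketch ignores edge cases such as delimiters escaped inside the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's tstregex_init_desc): split a raw
# description such as 'm!^abc$!ix' into pattern, modifiers and offset.
sub split_regex_desc {
    my ($desc) = @_;
    my %closer = ('(' => ')', '[' => ']', '{' => '}', '<' => '>');

    # Drop an optional leading "m" when it is followed by a delimiter.
    (my $rest = $desc) =~ s/^m(?=[^\w\s])//;

    my $open = substr($rest, 0, 1);
    return unless $open =~ /[^\w\s]/;          # not a delimiter
    my $close = exists $closer{$open} ? $closer{$open} : $open;

    my $end = rindex($rest, $close);
    return if $end < 1;                        # no closing delimiter

    my $modifiers = substr($rest, $end + 1);
    return unless $modifiers =~ /\A[ismx]*\z/; # only i, s, m, x accepted here

    return {
        pattern   => substr($rest, 1, $end - 1),
        modifiers => $modifiers,
        # Position of the pattern inside the raw description.
        offset    => length($desc) - length($rest) + 1,
    };
}

my $d = split_regex_desc('m!^abc$!ix');
printf "pattern=%s modifiers=%s offset=%d\n",
    $d->{pattern}, $d->{modifiers}, $d->{offset};
# prints: pattern=^abc$ modifiers=ix offset=2
```

Searching for the closing delimiter from the right keeps the sketch short, at the cost of misreading descriptions whose modifiers happen to contain the delimiter character.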
tstregex is designed to solve the "Black Box" problem of Regular Expressions. When a complex regex fails, Perl usually just says "No Match". This tool identifies exactly where and why it failed by finding the longest possible partial match.
$ perl lib/Tstregex.pm '/^[a-z]*\d{3}$/' 'abc123' 'abc12a'
abc123
abcB<12a> (B<^[a-z]*>\d{3}$)
The tool highlights the part of the string where the match failed.
The diagnostic logic uses a "Nibbling" (grignotage) strategy:
The engine breaks down your regex into a hierarchy of valid sub-patterns (lexical groups, atoms, and quantifiers) from longest to shortest.
It iteratively tests these sub-patterns against the input string. The goal is not just to check whether the start matches, but to find the longest sequence of instructions the engine could follow before hitting a wall.
Once the longest matching sub-pattern is found, the tool identifies the very next token in your regex syntax. This is your "Point of Failure".
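The three steps above can be illustrated with a much simplified sketch. It is not the module's implementation. The tokenizer only understands escapes, character classes, single characters, and simple quantifiers (no groups or alternation), and the names `tokenize` and `longest_prefix_match` are hypothetical. It tries regex prefixes from longest to shortest and reports the token that follows the longest prefix that still matches.

```perl
use strict;
use warnings;

# One token = an atom (escape, character class or single character)
# plus an optional quantifier. Groups are deliberately not handled.
my $TOKEN = qr/
    (?: \\. | \[ (?: \\. | [^\]\\] )* \] | [^\\\[] )    # atom
    (?: (?: [*+?] | \{ \d+ (?: , \d* )? \} ) [?+]? )?   # optional quantifier
/xs;

sub tokenize {
    my ($re) = @_;
    return $re =~ /($TOKEN)/g;
}

# Try prefixes of the token list from longest to shortest and return
# the first one that matches, plus the token right after it.
sub longest_prefix_match {
    my ($re, $string) = @_;
    my @tokens = tokenize($re);
    for (my $n = @tokens; $n >= 1; $n--) {
        my $prefix   = join '', @tokens[0 .. $n - 1];
        my $compiled = eval { qr/$prefix/ } or next;   # skip invalid prefixes
        if ($string =~ $compiled) {
            return {
                matched_regex => $prefix,
                matched_text  => substr($string, $-[0], $+[0] - $-[0]),
                fail_token    => $n < @tokens ? $tokens[$n] : undef,
            };
        }
    }
    return { matched_regex => '', matched_text => '', fail_token => $tokens[0] };
}

my $r = longest_prefix_match('^[a-z]*\d{3}$', 'abc12a');
printf "matched '%s' using %s, failed on %s\n",
    $r->{matched_text}, $r->{matched_regex}, $r->{fail_token};
# prints: matched 'abc' using ^[a-z]*, failed on \d{3}
```

Working on whole tokens keeps every candidate prefix syntactically valid. A finer-grained version would also try relaxing quantifiers inside the failing token before giving it up entirely.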
Olivier Delouya - 2026
Artistic Version 2