Summary of preg methods
Should you choose not to use Pattern
and other object-oriented functionalities of T-Regx,
you can continue to use preg
methods (preg_match()
, preg_replace()
, etc.). We recommend
using Pattern
as a standard solution, however preg::
is available as a legacy alternative.
The downside of PHP built-in preg_match()
functions is their interface, which is not really well-designed.
preg::
methods aim to be a reliable replacement.
preg_
methods#
Summary of Here's a simple summary and analysis of PHP built-in preg_
methods:
- Every
preg_
method acceptsstring $pattern
andstring $subject
. Some methods also accept an array of$pattern
and an array ofstring $subject
. This makes thepreg_
methods inconsistent. - All of
preg_
methods follow PHP duck-typing convention, so using incorrect types causespreg_
methods to silently cast them, making thepreg_
methods misbehaves in a very peculiar way. - Some methods return their values, and some populate it via
&$ref
argument. Other methods return amounts, and populate result via&$ref
instead. This is very counter-intuitive. - Some methods return
false
on error, and others returnnull
, instead of simply throwing an exception. preg_
methods actually have multiple ways of reacting to error: either by returning an "error value" (likefalse
,null
,0
,-1
or""
), or by issuing a PHP warning/notice/error, other methods behaves normally, but set the status code forpreg_last_error()
, yet another cases result in PHP Fatal Errors (which terminate the application), and recent updates of PHP actually throw errors on invalid arguments. This makespreg_
methods very hard to rely on what they will do for faulty input.preg_
functions have a large number of arguments, many have 5-6 arguments (most of which are optional). This makes an interface that is very complex.- Values populated via
&$match
inpreg_match()
behave differently than$match
frompreg_match_all()
and follows different rules and criteria, so one can't be always mapped to the other. - Argument
array $match
passed as argument frompreg_replace_callback()
behaves utterly differently than the previous two. preg_
functions accept magic values, for example passing-1
as$limit
in replacing (which is supposed to mean "no limit"). This is an influence from C-API that PHP uses internally, but shouldn't be exposed to the interface of PHP functions.- Additional functionality is exposed as C-style flags, like
PREG_SET_ORDER
,PREG_CAPTURE_OFFSET
, etc. for backwards-compatibility. It adds another level of complexity, which in fact isn't necessary. - Function
preg_match_all()
returns an array of arrays ofnull|string
, and when used withPREG_CAPTURE_OFFSET
, then it's an array of arrays of arrays ofnull|string|int
. - Matching methods (
preg_match()
andpreg_match_all()
) have a very hard time distinguishing an empty matched group and an unmatched group (because they're both returned as""
). There are ways to try and detect it, like passing flagPREG_CAPTURE_OFFSET
, but it has flaws of its one. Major flaw, is that it changes the type fromstring
to anarray
, with the exception if the match is not captured, the return values is still a string. SoPREG_CAPTURE_OFFSET
actually changes typestring
tostring|array
, because the result can still be astring
. preg_
methods can't distinguish an unmatched group from a nonexistent group, because the last group in the match, should it be unmatched, will not be present in the result (instead of being simply present like the remaining groups). This makes the interface less reliable and prone to "array undefined index" errors.- Different PHP versions react to different inputs in different ways. Error messages between PHP versions vary.
- Some method names are very unintuitive. For example
preg_filter()
appears as a method for filtering collections, but it actually replaces values. In fact, it's very similar topreg_replace()
, but the function names doesn't illustrate it at all. The function that actually filters an array by a regular expressions is calledpreg_grep()
, name borrowed from Unixgrep
command. - Different dialects of regular expressions are actually accepted, because PHP 7.4 uses PCRE2, whereas older versions use PCRE1.
- There is no standard way of using regular expressions with plain text. The only available method is
preg_quote()
, but it doesn't work at all with/x
modifier, as it doesn't escape whitespace and comment syntax (starting#
and ending newline), which makespreg_quote()
completely unreliable with extended mode. - The syntax of regular expressions and what is allowed in the expression changes between PHP versions.
Some flaws described above are solved by successor preg::match()
,preg::replace()
methods, but it's not always possible.
For example improving the interface of the methods is not really possible with preg::match()
. For that case Pattern
interface is available to solve all the other remaining issues and is the recommended approach.
#
About preg methods of T-RegxWhen T-Regx is added to a project, both methods can be used. The preferable method is using the standard Pattern
interface,
but preg::match()
remains available for legacy projects that are not ready to be migrated yet. Using Pattern
doesn't have anything to do with preg_
methods, so knowing them isn't required to use Pattern
.
Pattern
- the standard solution for regular expressionspreg::match()
and other methods - the wrapper on PHP functions (making thepreg_
methods throw exceptions).
Pattern
is the complete solution to using regular expressions in PHP. It solves all issues with preg_
methods above.
It's descriptive, simple and easy to learn. Pattern
uses exceptions, since warnings and errors are less reliable. It's
designed with care and dedication.
On the off-chance that using Pattern
is unwanted in a project (perhaps because migrating it would require too much effort),
preg::
alternative is available. preg::
functions' interface is very similar to preg_
functions' interface, so
adopting preg::
is simple and straightforward, and in return provides safety layer of protection, type-checking,
bugfixes and exceptions on preg_
methods. Once the migration is complete, a speedy eventual migration to Pattern
gives even more advantages.
Each preg_
method has a preg::
counter-part: preg_match()
-> preg::match()
, preg_replace()
-> preg::replace()
.
The preg::
counter-parts have a very similar interface (same arguments, same types, same names, similar behaviour).
The main difference between preg_
and preg::
functions is reacting to errors and faulty inputs. While preg_
methods
react in a number of different, inconsistent manners (returning null
/false
, issuing warning/notice, setting code for
preg_last_error()
), preg::
doesn't do any of those, and throws suitable exceptions instead. So while the return type
of preg_match()
is int|false
(since it can either return an integer, or false
on error) the return type of
preg::match()
is int
. preg::match()
never returns false
to indicate an error, because a suitable exception is thrown
in that case.
There are other advantages to using preg::
methods, for example bugfixes. PHP bugfixes are only applied to future PHP
versions. On the contrary, preg::
back-ports the bugfixes to earlier versions. That means, T-Regx can be used on PHP 7.1,
without being susceptible to the bug, that was only fixed in PHP 7.3. In fact, we believe each given T-Regx release will
behave exactly the same on all supported PHP versions. Of course there are changes between distinct T-Regx versions,
but each given T-Regx version should be agnostic to PHP version.
Another big advantage of using preg::
is the type-checking. preg_
methods will accept a wide range of types and then
silently cast them. preg::
will accept only the exact types, and throw PHP \InvalidArgumentException
instead. Passing
false
as $subject
is never a good idea. Callbacks passed to preg_replace_callback()
also perform silent type cast,
when the type isn't string
. Furthermore, passing improper values as a callback return value preg_replace_callback()
can result in a PHP fatal error, which terminates the application and can't be caught. preg::replace_callback()
type
checks return values, and allows only the allowed values, and throws an exception instead, preventing the fatal error.
Using preg::last_error()
is redundant, because preg::
methods will always throw an exception on error, so there isn't
a need of ever using preg::last_error()
.
With preg_
methods, there really isn't a good way to react to a malformed pattern. preg_match("/?/")
doesn't set
preg_last_error()
code. Granted, the case of using improper regular expression isn't a particularly frequent use-case,
nevertheless preg::
throws a proper MalformedPatternException
for that case. Errors should never pass silently,
unless explicitly silenced, which is now possible with preg::
exceptions and try
/catch
for example.
preg::
methods really are#
What preg::
methods are wrapper functions for each preg_
methods, with specific improvements on top:
- Handling regex-compile errors, like malformed patterns, and throwing proper exceptions, for example
MalformedPatternException
- Handling regex-runtime errors, like catastrophic backtracking and throwing proper
CatastrophicBacktrackingPregException
exception - Performing type-checks on input arguments and return values from callbacks
- Applies bugfixes to
preg_
methods from future PHP versions (or even bugfixes not yet applied to PHP). - Preventing fatal errors
Read on, to learn about advantages of using Pattern
, which supersedes preg_
/preg::
approach.
preg::
#
Error handling with With PHP preg_
methods, multiple preg_
methods used together, for instance calling preg_match()
,
preg_replace()
, and preg_split()
right after each other, may render the error handling really tricky.
T-Regx can always narrow down the error to the exact method to one particular call (even nested ones,
like malformed preg call inside preg_replace_callback()
):
In short, preg::
can isolate the preg call to a single method call and not influence each other.
Pattern
really is#
What Pattern
is a completely redesigned solution to using regular expressions in PHP. Software developer
working with Pattern
shouldn't know anything about preg_match()
,preg_replace()
, etc. because built-in
PHP preg_
methods are now nothing but an implementation detail. The interface of Pattern
is hermetic,
no prior knowledge of preg_
is necessary.
We meet with comments, such that Pattern
is to preg_match()
what PDO
was to mysql_query()
.
We don't exactly disagree.