Why PHP sucks?
If you'd like to learn the reasons behind certain T-Regx features, and how it manages to supersede PHP regular expressions, read on.
#
What's wrong with PHP Regular Expressions
The PHP regular expressions API is far from perfect. Here's only a handful of what's wrong with it:
#
PHP is Implicit
You are probably a PHP developer. I would like to get 'Robert likes apples'. Can you tell me which is the correct signature for this task?
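A minimal sketch of the question, assuming the task is to turn 'Bob likes apples' into 'Robert likes apples' (the input string is an assumption). Nothing in the call itself says which of the three strings is the pattern, which is the replacement and which is the subject:

```php
<?php
// preg_replace() takes ($pattern, $replacement, $subject), in that order.
$correct = preg_replace('/Bob/', 'Robert', 'Bob likes apples');
// $correct === 'Robert likes apples'

// Swapping the last two arguments is still a valid call: PHP searches for
// "Bob" in the subject 'Robert', finds nothing, and returns it unchanged.
$swapped = preg_replace('/Bob/', 'Bob likes apples', 'Robert');
// $swapped === 'Robert'
```

Both calls run without any notice or warning, so the mistake only shows up in the result.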
#
PHP is Unintuitive
Programming languages are tools created to solve problems. An experienced programmer should be able to look at the code and tell what it does.
- The whole PHP regular expressions API throws all kinds of notices, warnings, errors and fatal errors, and also silently ignores invalid data.
- The matching API has two functions: preg_match() (first match) or preg_match_all().
- The replacing API has four functions: preg_replace(), preg_replace_callback(), preg_replace_callback_array() and preg_filter().
- preg_replace() and other replacing functions have two optional int parameters, and I never know which is $limit and which is &$count.
- A function which does replacing is named preg_filter().
- Matching returns an array of arrays, which contain either a string, null, or an array of nulls, strings and ints. What type exactly is returned depends on the runtime subject and the order of the values.
- Functions with 4, 5, 6 parameters (3-4 of which are optional).
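To make two of the points above concrete, here is a small sketch in plain PHP (the patterns and subjects are made up for illustration). It shows the two optional integer parameters of preg_replace(), and how the shape of the matches array depends on which groups happened to match:

```php
<?php
// The 4th parameter is $limit (input), the 5th is &$count (output, by reference).
$count = null;
$replaced = preg_replace('/a/', 'b', 'aaa', 2, $count);
// $replaced === 'bba', $count === 2

// Group 2 is optional and sits in the middle: unmatched, it shows up as ''.
preg_match('/(a)(b)?(c)/', 'ac', $middle);
// $middle === ['ac', 'a', '', 'c']

// Group 2 is optional and comes last: unmatched, it is missing altogether.
preg_match('/(a)(b)?/', 'a', $trailing);
// $trailing === ['a', 'a']
```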
#
PHP is Messy
- PREG_OFFSET_CAPTURE is a nightmare! It changes return type from "an array of arrays" to "an array of arrays of arrays".
- PREG_SET_ORDER/PREG_PATTERN_ORDER change return values. It's either "groups of matches" or "matches of groups", depending on the flag.
The worst part? You find yourself looking at code such as $match[1], having no idea what. it. does. You have to see whether you're using preg_match() or preg_match_all(), and whether any of PREG_SET_ORDER/PREG_PATTERN_ORDER/PREG_OFFSET_CAPTURE were used.
And to refactor it, later? Replace $match[1] with array_map(..., $match). Good luck. With that.
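As an illustration of why such an index is hard to read, the sketch below evaluates the same expression, $m[1][0], under four combinations of function and flag. The pattern and subject are invented for the example:

```php
<?php
$subject = '1a 2b';
$pattern = '/(\d)(\w)/';

// Default PREG_PATTERN_ORDER: $m[group][match]
preg_match_all($pattern, $subject, $m);
$patternOrder = $m[1][0]; // '1': group 1 of the first match

// PREG_SET_ORDER: $m[match][group]
preg_match_all($pattern, $subject, $m, PREG_SET_ORDER);
$setOrder = $m[1][0]; // '2b': whole text of the second match

// preg_match() with PREG_OFFSET_CAPTURE: $m[group] = [text, offset]
preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
$offsetCapture = $m[1][0]; // '1': text of group 1

// preg_match_all() with PREG_OFFSET_CAPTURE: $m[group][match] = [text, offset]
preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
$both = $m[1][0]; // ['1', 0]: an array, not a string
```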
#
PHP is Inconsistent
- Matches returned from preg_match(), preg_match_all() and preg_replace_callback() each have completely different structures and each has its own magic values and rules. So when you, say, change preg_match() to preg_match_all(), there's a high chance you'll break something.
- For example, "" for preg_match() means "maybe matched empty string, maybe unmatched", but for preg_match_all() it means "definitely not matched".
- Flag PREG_UNMATCHED_AS_NULL works for preg_match()/preg_match_all(), but not for replacing.
- How do you get results and the count of the results?

| Value | preg_match() | preg_replace() |
|---|---|---|
| Count | Return type | Argument reference |
| Values | Argument reference | Return type |

- If you use PREG_OFFSET_CAPTURE and your subject isn't matched with the pattern, these are the results:

| Success | preg_match() | preg_match_all() |
|---|---|---|
| true | ['match', 2] | ['match', 2] |
| false | '' | [null, -1] |

- preg_quote() quotes different characters for different PHP versions.
- preg_match() signature states it returns int, but it returns false on error.
- PHP documentation promises that preg_filter() is identical to preg_replace() except it only returns the (possibly transformed) subjects... but preg_filter() and preg_replace() actually return completely different values for the same parameters.
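A short sketch of two of these inconsistencies in plain PHP, with made-up inputs: where the count and the values come from in preg_match() versus preg_replace(), and what preg_replace() and preg_filter() return for identical arguments:

```php
<?php
// preg_match(): the count is returned, the values come back by reference.
$matchCount = preg_match('/\d+/', 'order 66', $matchValues);
// $matchCount === 1, $matchValues === ['66']

// preg_replace(): the values are returned, the count comes back by reference.
$replaceValues = preg_replace('/\d+/', '#', 'order 66', -1, $replaceCount);
// $replaceValues === 'order #', $replaceCount === 1

// Same arguments, different results.
$subjects = ['a1', 'b'];
$replaced = preg_replace('/\d/', '#', $subjects); // ['a#', 'b']
$filtered = preg_filter('/\d/', '#', $subjects);  // ['a#'], only subjects that matched

$replacedString = preg_replace('/\d/', '#', 'b'); // 'b'
$filteredString = preg_filter('/\d/', '#', 'b');  // null
```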
#
PHP is Deliberately buggy
- preg_match() and preg_match_all() return either:
  - (int) x - a number of matches, if a match is found
  - (int) 0 - if no matches are found
  - (bool) false - if a runtime error occurred
  So if you only check whether the result is truthy, there's no way of knowing whether the call failed or whether it succeeded, but your subject isn't matched by your pattern. You need to remember to add an explicit !== false check each time you use it.
- All preg_* functions only return false/null/[] on error. You have to remember to call preg_last_error() to get some insight into the nature of your error. Of course, it only returns int! So you have to look up that 4 is "invalid utf8 sequence" and 2 is "backtrack limit exceeded".
- However, the false-check and preg_last_error() can only save you from runtime errors. So-called compile errors don't work that way and require either setting a custom error handler (bad idea) or reading and clearing just one of those errors (good luck with errors in preg_replace_callback(), for example).
- preg_filter() for arrays returns [] if an error occurred, even though [] is a perfectly valid result for this function. For example, it could have filtered out all values, or its input was an empty array right from the beginning.
- For certain parameter types, some PCRE methods (e.g. preg_filter()) raise fatal errors terminating the application.
- preg_quote() completely ignores whitespace, which should be quoted when used with the x flag.
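Two of the points above, sketched in plain PHP. The first part uses an invalid UTF-8 subject (chosen here only because it fails reliably) to show that a runtime error and "not matched" look the same to a truthiness check. The second part shows preg_quote() leaving a space unquoted under the x flag:

```php
<?php
// "\xFF" is not valid UTF-8, so matching with the u flag fails at runtime.
$result = preg_match('/./u', "\xFF");
$error = preg_last_error();

// A plain truthiness check cannot tell "failed" from "not matched".
$looksUnmatched = !$result; // true, although the match never ran to completion

$failed = ($result === false);                 // true
$isBadUtf8 = ($error === PREG_BAD_UTF8_ERROR); // true; the constant equals 4

// preg_quote() leaves whitespace alone, so under the x flag the space is ignored.
$quoted = preg_quote('a b');                                // 'a b'
$matchesOriginal = preg_match('/^' . $quoted . '$/x', 'a b'); // 0
$matchesCollapsed = preg_match('/^' . $quoted . '$/x', 'ab'); // 1
```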
#
PHP silently ignores invalid arguments
- preg_match() called with negative offset is simply ignored.
- preg_match() called with offset longer than the subject changes nothing, and preg_last_error() returns PREG_INTERNAL_ERROR code.
- preg_quote() accepts a single character as the second parameter, and simply ignores any longer string.
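The last two points can be reproduced with a few lines of plain PHP. The subject, the offset and the two-character delimiter string are arbitrary values picked for this sketch:

```php
<?php
// Offset 10 is past the end of the 3-byte subject.
$result = preg_match('/a/', 'abc', $match, 0, 10);
$error = preg_last_error();
// $result === false, $error === PREG_INTERNAL_ERROR

// Only the first character of the delimiter argument is used.
$quoted = preg_quote('a/b~c', '/~');
// $quoted === 'a\/b~c': the slash is escaped, the tilde is not
```

Neither call raises a warning about the argument, so the only hint that something was wrong is a generic error code or an unescaped character.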
#
T-Regx showcase
That's why T-Regx happened. It addresses all of the flaws of PHP regular expressions.
#
T-Regx eliminates gotchas
PHP PCRE API is full of false negatives and false positives. For example, a missing group in preg_match() doesn't necessarily mean the group doesn't exist or wasn't matched. It's just a "gotcha" set for you by PHP.
T-Regx performs all the necessary if-ology and checks to verify that methods that return true and false are really true or false. If T-Regx can't eliminate false negatives or false positives, its API simply doesn't include a method to verify that.
If, because of reasons, there isn't a way to determine something with absolute certainty (like the index of a group with the J modifier), then T-Regx API simply doesn't have an index() method for usingDuplicateName().group().
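The kind of checks described above can be sketched in plain PHP. This is not T-Regx's implementation: groupState() is a hypothetical helper written for this example, and it assumes PHP 7.4 or newer, where PREG_UNMATCHED_AS_NULL also reports trailing unmatched groups as null:

```php
<?php
// Hypothetical helper, not part of T-Regx or PHP. Assumes PHP 7.4+.
function groupState(string $pattern, string $subject, int $group): string
{
    $result = preg_match($pattern, $subject, $match, PREG_UNMATCHED_AS_NULL);
    if ($result === false) {
        // Runtime error: never report it as "unmatched".
        throw new RuntimeException('preg_match() failed, error code ' . preg_last_error());
    }
    if ($result === 0) {
        return 'subject not matched';
    }
    if (!array_key_exists($group, $match)) {
        // The pattern has no such group at all.
        return 'nonexistent';
    }
    // With PREG_UNMATCHED_AS_NULL, null means "group exists but did not take part".
    return $match[$group] === null ? 'unmatched' : 'matched';
}

$first = groupState('/(a)(b)?/', 'a', 1);   // 'matched'
$second = groupState('/(a)(b)?/', 'a', 2);  // 'unmatched'
$third = groupState('/(a)(b)?/', 'a', 3);   // 'nonexistent'
$none = groupState('/(a)(b)?/', 'x', 1);    // 'subject not matched'
```

Without the flag, the second and third calls would be indistinguishable, because both groups would simply be absent from the matches array.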
#
T-Regx maps warnings and errors to exceptions
If you try to use an invalid regular expression in Java or JavaScript, you would get an exception (PatternSyntaxException in Java, SyntaxError in JavaScript), so you'd be forced to handle it. Such things don't happen in PHP regular expressions.
T-Regx always throws an exception and never issues any warnings, fatal errors, errors or notices.
Furthermore, T-Regx throws different exceptions for different errors:
- SubjectNotMatchedException
- MalformedPatternException
- FlagNotAllowedException
- GroupNotMatchedException
- NonexistentGroupException
- InvalidReplacementException
- InvalidReturnValueException
- CatastrophicBacktrackingPregException
- RecursionLimitPregException
- Utf8OffsetPregException
They all extend PatternException, though.
Further, furthermore, if you pass an invalid data type to any of the T-Regx methods, \InvalidArgumentException is thrown.
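For a rough idea of what mapping warnings to exceptions involves, here is a minimal sketch in plain PHP. It is not how T-Regx is implemented: safeMatch() and both exception classes are hypothetical names invented for this example, and the sketch only covers preg_match():

```php
<?php
// Hypothetical exception types for this sketch only.
class SketchPatternException extends RuntimeException {}
final class SketchMalformedPatternException extends SketchPatternException {}

function safeMatch(string $pattern, string $subject): array
{
    // Turn compile-time warnings (e.g. an unclosed group) into an exception.
    set_error_handler(static function (int $severity, string $message): bool {
        throw new SketchMalformedPatternException($message);
    });
    try {
        $result = preg_match($pattern, $subject, $match);
    } finally {
        // Always put the previous handler back, even when an exception is thrown.
        restore_error_handler();
    }
    // Runtime errors raise no warning, so they need a separate check.
    if ($result === false) {
        throw new SketchPatternException('preg_match() failed, error code ' . preg_last_error());
    }
    return $match;
}

$found = safeMatch('/\d+/', 'abc 42'); // ['42']
```

A call such as safeMatch('/(/', 'abc') throws SketchMalformedPatternException, and safeMatch('/./u', "\xFF") throws SketchPatternException, so neither failure can be mistaken for "not matched".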
#
T-Regx is clean and simple
You will not find arrays, of arrays, of arrays in T-Regx API. Each functionality has a dedicated set of methods.
#
T-Regx unifies the differences between matching and replacing
Matching and replacing both work with the same Detail object. Read more about Detail.
#
T-Regx provides rich API for building patterns
Because of Pattern::inject(), Pattern::list(), Pattern::mask() and Pattern::template(), there is never a need to use preg_quote() yourself.
For example, to build a pattern with unsafe data, use one of these methods instead of building the pattern with preg_quote().
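To show the idea behind building a pattern from unsafe data, here is a minimal plain-PHP sketch. It is not the T-Regx API or its implementation: injectQuoted() is a hypothetical helper, and the "@" placeholder and "/" delimiter are assumptions made for this example:

```php
<?php
// Hypothetical helper: replaces each "@" in the template with a quoted value.
function injectQuoted(string $template, array $values, string $delimiter = '/'): string
{
    $parts = explode('@', $template);
    if (count($parts) - 1 !== count($values)) {
        throw new InvalidArgumentException('Placeholder count does not match value count');
    }
    // Start with the text before the first placeholder.
    $pattern = array_shift($parts);
    foreach (array_values($values) as $i => $value) {
        // Quote the value (including the delimiter), then append the next literal part.
        $pattern .= preg_quote($value, $delimiter) . $parts[$i];
    }
    return $delimiter . $pattern . $delimiter;
}

// The value contains "$", ".", "(" and ")", all of which are special in a pattern.
$pattern = injectQuoted('^price: @$', ['$5.00 (approx.)']);

$exact = preg_match($pattern, 'price: $5.00 (approx.)'); // 1
$loose = preg_match($pattern, 'price: $5x00 (approx.)'); // 0, the dot is literal
```

A helper like this keeps quoting in one place, though it inherits the limits of preg_quote(), such as unquoted whitespace under the x flag.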
#
T-Regx is really smart with its exceptions
We really did put a lot of thought into making T-Regx secure, so, for example, an invalid pattern()->match() call nested inside a pattern()->replace() call isn't a big deal.
In other words, warnings and flags raised by the invalid inner pattern()->match() call will be represented as MalformedPatternException, and won't interfere with the outer pattern()->replace().