Why PHP sucks?
If you'd like to learn the reasons behind certain T-Regx features, and see how it manages to supersede PHP regular expressions, read on.
What's wrong with PHP Regular Expressions:#
The PHP regular expressions API is far from perfect. Here is only a handful of the things that are wrong with it:
PHP is Implicit#
You are probably a PHP developer. I would like to get 'Robert likes apples'. Can you tell me which
is the correct signature for this task?
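As a minimal sketch of the question above, assume the starting text is 'Bob likes apples' (the input string and the variable names are assumptions; only the desired output is given above). Pattern, replacement and subject are all plain strings, so nothing in the call itself tells you whether they are in the right order:

```php
<?php
// Correct order: pattern, replacement, subject.
$correct = preg_replace('/Bob/', 'Robert', 'Bob likes apples');

// Replacement and subject swapped: the pattern is searched for in 'Robert',
// nothing matches, and the subject comes back unchanged without any warning.
$swapped = preg_replace('/Bob/', 'Bob likes apples', 'Robert');

var_dump($correct); // string(19) "Robert likes apples"
var_dump($swapped); // string(6) "Robert"
```

The swapped call is not an error as far as PHP is concerned; it is simply a replacement that found nothing to replace.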
PHP is Unintuitive#
Programming languages are tools created to solve problems. An experienced programmer should be able to look at the code and tell what it does.
- The whole set of regular expression functions in PHP throws all kinds of notices, warnings, errors and fatal errors, as well as silently ignoring invalid data.
- Matching API has two functions: preg_match() (first) or preg_match_all().
- Replacing API has four functions: preg_replace(), preg_replace_callback(), preg_replace_callback_array() and preg_filter().
- preg_replace() and other replacing functions have two optional int parameters, and I never know which is $limit and which is &$count.
- Function which does replacing is named preg_filter().
- Matching returns an array of arrays, which contain either a string, null, or an array of nulls, strings and ints. What type exactly is returned depends on the runtime subject and the order of the values.
- Functions with 4, 5, 6 parameters (3-4 of which are optional).
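To make the point about $limit and &$count concrete, here is a small sketch (the pattern and subject are made up for illustration). The fourth argument caps the number of replacements, and the fifth is filled in by reference with how many were actually made:

```php
<?php
// preg_replace(pattern, replacement, subject, limit, &count)
$count = 0;
$result = preg_replace('/a/', 'b', 'aaaa', 2, $count);

var_dump($result); // string(4) "bbaa"
var_dump($count);  // int(2)

// preg_match() works the other way round: the count is the return value
// and the matched values arrive through a reference argument.
$found = preg_match('/a+/', 'aaaa', $match);

var_dump($found);  // int(1)
var_dump($match);  // array with a single element, "aaaa"
```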
PHP is Messy#
- PREG_OFFSET_CAPTURE is a nightmare! It changes return type from "an array of arrays" to "an array of arrays of arrays".
- PREG_SET_ORDER/PREG_PATTERN_ORDER change return values. It's either "groups of matches" or "matches of groups", depending on the flag.
The worst part? You find yourself looking at this code:
having no idea what. it. does. You have to see whether you're using preg_match() or preg_match_all() and
whether any of PREG_SET_ORDER/PREG_PATTERN_ORDER/PREG_OFFSET_CAPTURE were used.
And to refactor it, later? Replace $match[1] with array_map($match, ...). Good luck. With that.
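As a sketch of why such code is unreadable in isolation, take a hypothetical access like $match[1][0] (the pattern, subject and variable names below are assumptions made for illustration). The same expression yields four different things depending on the function and flags that filled $match:

```php
<?php
$pattern = '/(\w+)@(\w+)/';
$subject = 'joe@home, ann@work';
$seen = [];

preg_match($pattern, $subject, $match);
$seen['match'] = $match[1][0];            // 'j': first character of group 1

preg_match($pattern, $subject, $match, PREG_OFFSET_CAPTURE);
$seen['match_offset'] = $match[1][0];     // 'joe': text of group 1

preg_match_all($pattern, $subject, $match);
$seen['all'] = $match[1][0];              // 'joe': group 1 of the first match

preg_match_all($pattern, $subject, $match, PREG_SET_ORDER);
$seen['all_set_order'] = $match[1][0];    // 'ann@work': whole second match

preg_match_all($pattern, $subject, $match, PREG_OFFSET_CAPTURE);
$seen['all_offset'] = $match[1][0];       // ['joe', 0]: text and offset pair

var_dump($seen);
```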
PHP is Inconsistent#
- Matches returned from preg_match(), preg_match_all() and preg_replace_callback() each have completely different structures and each has its own magic values and rules. So when you, say, change preg_match() to preg_match_all(), there's a high chance you'll break something.
- For example, "" for preg_match() means "maybe matched empty string, maybe unmatched", but for preg_match_all() it means "definitely not matched".
- Flag PREG_UNMATCHED_AS_NULL works for preg_match()/preg_match_all(), but not for replacing.
- How do you get results and the count of the results?

  | Value  | preg_match()       | preg_replace()     |
  |--------|--------------------|--------------------|
  | Count  | Return type        | Argument reference |
  | Values | Argument reference | Return type        |

- If you use PREG_OFFSET_CAPTURE and your subject isn't matched with the pattern, these are the results:

  | Success | preg_match()   | preg_match_all() |
  |---------|----------------|------------------|
  | true    | ['match', 2]   | ['match', 2]     |
  | false   | ''             | [null, -1]       |

- preg_quote() quotes different characters for different PHP versions.
- preg_match() signature states it returns int, but it returns false on error.
- PHP documentation promises that preg_filter() is identical to preg_replace() except it only returns the (possibly transformed) subjects... but preg_filter() and preg_replace() actually return completely different values for the same parameters.
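The last point can be checked with a short sketch (the pattern and subjects are made up for illustration). With identical arguments, the two functions disagree as soon as something is not matched:

```php
<?php
// No match: preg_replace() returns the subject, preg_filter() returns null.
$replaced = preg_replace('/\d/', '#', 'abc');
$filtered = preg_filter('/\d/', '#', 'abc');

// Array subjects: preg_replace() keeps every entry, preg_filter() drops the
// unmatched ones and keeps the original keys of the rest.
$replacedAll = preg_replace('/\d/', '#', ['a1', 'bc', 'd2']);
$filteredAll = preg_filter('/\d/', '#', ['a1', 'bc', 'd2']);

var_dump($replaced);    // string(3) "abc"
var_dump($filtered);    // NULL
var_dump($replacedAll); // [0 => "a#", 1 => "bc", 2 => "d#"]
var_dump($filteredAll); // [0 => "a#", 2 => "d#"]
```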
PHP is Deliberately buggy#
preg_match() and preg_match_all() return either:
- (int) x - a number of matches, if a match is found
- (int) 0 - if no matches are found
- (bool) false - if a runtime error occurred
So if you do just this:
there's no way of knowing whether your pattern is incorrect or whether it's correct, but your subject isn't matched by your pattern.
You need to remember to add an explicit !== false check each time you use it.
- All preg_* functions only return false/null/[] on error. You have to remember to call preg_last_error() to get some insight into the nature of your error. Of course, it only returns int! So you have to look up that 4 is "invalid utf8 sequence" and 2 is "backtrack limit exceeded".
- However, the false-check and preg_last_error() can only save you from runtime errors. So-called compile errors don't work that way and require either setting a custom error handler (bad idea) or reading and clearing just one of those errors (good luck with errors in preg_replace_callback() for example).
- preg_filter() for arrays returns [] if an error occurred, even though [] is a perfectly valid result for this function. For example, it could have filtered out all values or its input was an empty array right from the beginning.
- For certain parameter types, some PCRE methods (e.g. preg_filter()) raise fatal errors terminating the application.
- preg_quote() completely ignores whitespace, which should be quoted when used with the x flag.
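A minimal sketch of the runtime-error case described above, using a subject that is not valid UTF-8 (the subject and variable names are chosen for illustration):

```php
<?php
$invalidUtf8 = "\xFF"; // a lone 0xFF byte is never valid UTF-8

$result = preg_match('/./u', $invalidUtf8);

// A truthiness check cannot tell "no match" (0) from "error" (false).
$looksUnmatched = !$result;

// The explicit check plus preg_last_error() reveals what happened.
$failed = ($result === false);
$errorCode = preg_last_error(); // PREG_BAD_UTF8_ERROR, which is the int 4

// For comparison, a genuine "not matched" result.
$noMatch = preg_match('/\d/', 'abc');

var_dump($failed);    // bool(true)
var_dump($errorCode); // int(4)
var_dump($noMatch);   // int(0)
```

Note that preg_last_error() has to be read right after the failing call, because the next successful preg_*() call resets it.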
PHP silently ignores invalid arguments#
- preg_match() called with negative offset is simply ignored.
- preg_match() called with offset longer than the subject changes nothing, and preg_last_error() returns PREG_INTERNAL_ERROR code.
- preg_quote() accepts a single character as the second parameter, and silently ignores everything after the first character of any longer string.
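A short sketch of two of these cases (the subjects and the '/~' delimiter string are made up for illustration). Neither call raises a warning:

```php
<?php
// Offset 10 lies beyond the 3-byte subject: preg_match() just returns false.
$result = preg_match('/a/', 'abc', $match, 0, 10);
$errorCode = preg_last_error(); // the generic internal error code, per the text above

// Only the first character of the delimiter argument ('/') is honoured;
// '~' is left unescaped.
$quoted = preg_quote('a/b~c', '/~');

var_dump($result);    // bool(false)
var_dump($errorCode);
var_dump($quoted);
```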
T-Regx showcase#
That's why T-Regx happened. It addresses all of the flaws of PHP regular expressions listed above.
T-Regx eliminates gotchas#
The PHP PCRE API is full of false negatives and false positives. For example, a missing group in preg_match() doesn't
necessarily mean the group doesn't exist or wasn't matched. It's just a "gotcha" set for you by PHP.
T-Regx performs all the necessary ifology and checks to verify that methods that return true and false are really
true or false. If T-Regx can't eliminate false positives or false negatives, its API simply doesn't include a method to verify that.
If, because of reasons, there isn't a way to determine something with absolute certainty (like the index of a group with the J modifier),
then the T-Regx API simply doesn't have an index() method for usingDuplicateName().group().
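The "missing group" gotcha can be reproduced with plain preg_match(). This is a sketch of the PHP behaviour, not of T-Regx code, and the patterns are made up for illustration:

```php
<?php
// Group 2 exists in the pattern but did not take part in the match:
// as a trailing group it is simply absent from the result.
preg_match('/(a)(b)?/', 'a', $trailing);

// Group 1 did not take part in the match either, yet it shows up as ''.
preg_match('/(a)?(b)/', 'b', $unmatched);

// Group 1 matched an empty string: the result looks exactly the same.
preg_match('/()(b)/', 'b', $empty);

var_dump($trailing);  // ["a", "a"], no index 2 at all
var_dump($unmatched); // ["b", "", "b"]
var_dump($empty);     // ["b", "", "b"]
```

Neither isset($match[2]) nor a comparison with '' answers the question "was this group matched?" reliably, which is the kind of check a wrapper has to do on your behalf.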
T-Regx maps warnings and errors to exceptions#
If you try to use an invalid regular expression in Java or JavaScript, you get an exception (PatternSyntaxException in Java,
SyntaxError in JavaScript), so you are forced to handle it. Such things don't happen in PHP regular expressions.
T-Regx always throws an exception and never issues any warnings, fatal errors, errors or notices.
Furthermore, T-Regx throws different exceptions for different errors:
- SubjectNotMatchedException
- MalformedPatternException
- FlagNotAllowedException
- GroupNotMatchedException
- NonexistentGroupException
- InvalidReplacementException
- InvalidReturnValueException
- CatastrophicBacktrackingPregException
- RecursionLimitPregException
- Utf8OffsetPregException
They all extend PatternException though.
Further, furthermore, if you pass an invalid data type to any of the T-Regx methods, \InvalidArgumentException is thrown.
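How T-Regx does this internally is not shown here. The following is only a minimal plain-PHP sketch of the general idea, using hypothetical names (SketchPatternException, SketchMalformedPatternException, SketchRuntimeException and matchOrThrow()) that are not part of T-Regx. It assumes no custom error handler is installed, so that error_get_last() still sees the suppressed warning:

```php
<?php
// Hypothetical exception hierarchy for this sketch only; not T-Regx classes.
class SketchPatternException extends Exception {}
class SketchMalformedPatternException extends SketchPatternException {}
class SketchRuntimeException extends SketchPatternException {}

function matchOrThrow(string $pattern, string $subject): bool
{
    error_clear_last();
    // Silence the compile warning here and inspect it just below.
    $result = @preg_match($pattern, $subject);

    $error = error_get_last();
    if ($error !== null) {
        error_clear_last();
        throw new SketchMalformedPatternException($error['message']);
    }
    if ($result === false) {
        // Runtime errors raise no warning; only the error code is available.
        throw new SketchRuntimeException('preg_last_error() code: ' . preg_last_error());
    }
    return $result === 1;
}

$caught = null;
try {
    matchOrThrow('/(/', 'subject'); // unbalanced parenthesis: a compile error
} catch (SketchPatternException $e) {
    $caught = get_class($e);
}

var_dump($caught); // string(31) "SketchMalformedPatternException"
```

With this shape the caller can no longer forget the !== false check, because every failure path ends in an exception instead of a return value.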
T-Regx is clean and simple#
You will not find arrays, of arrays, of arrays in the T-Regx API. Each feature has a dedicated set of methods.
T-Regx unifies the differences between matching and replacing#
Matching
Replacing:
Read more about Detail.
T-Regx provides rich API for building patterns#
Thanks to Pattern::inject(), Pattern::list(), Pattern::mask() and Pattern::template(),
there is never a need to use preg_quote() yourself.
For example, to build a pattern from unsafe data, instead of escaping it with preg_quote(), simply use:
T-Regx is really smart with its exceptions#
We put a lot of thought into making T-Regx secure, so for example these code snippets aren't a big deal:
In other words, warnings and error flags raised by the invalid inner pattern()->match() call will be represented as
MalformedPatternException, and won't interfere with the outer pattern()->replace().