
Pattern helper functions

We have just created a new add-on to the core T-Regx library: pattern-helpers. It's a set of global PHP functions, complementary to the T-Regx library.

Some users prefer a more direct approach to regular expressions, and the object-oriented approach with Pattern and Matcher can become too verbose at times.

The new pattern-helpers package includes functions such as pattern_match_all(), pattern_replace(), pattern_replace_callback() and pattern_split(). These functions are much closer to the T-Regx philosophy and are meant to supersede the preg:: functions from the SafeRegex/ package. We'll deprecate the preg:: functions in future releases, and they won't be part of T-Regx 1.0.
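To give a rough idea of what a global helper "closer to the T-Regx philosophy" could look like, here is a hypothetical sketch. We assume, for illustration only, that this means failing loudly with an exception instead of returning null and emitting a warning like preg_replace() does. The name sketch_pattern_replace() and its signature are ours, not the pattern-helpers API. It needs PHP 8 for preg_last_error_msg().

```php
<?php
// Hypothetical helper in the spirit of pattern_replace(); the name and signature are
// assumptions, not the pattern-helpers API. It throws instead of returning null.
function sketch_pattern_replace(string $pattern, string $subject, string $replacement): string
{
    error_clear_last();
    // "@" silences the warning preg_replace() emits for a malformed pattern
    $result = @preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // compile errors leave a (suppressed) warning; runtime errors such as the
        // backtrack limit only set preg_last_error()
        $error = error_get_last();
        throw new RuntimeException($error['message'] ?? preg_last_error_msg());
    }
    return $result;
}

// sketch_pattern_replace('/\d+/', 'a1b22', '#') returns "a#b#"
```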

Find the new package here: https://github.com/t-regx/pattern-helpers

Documentation in source code

We're working very hard to bring T-Regx 1.0 to our users, and very little remains to be finished.

The biggest thing that's coming is documentation in the source code of the library, in the form of PhpDoc comments. It will complement the API reference on t-regx.com, but will be much easier to use and probably more helpful to programmers. We plan the documentation to be long and detailed: our aim is documentation that is sufficient for day-to-day development, so that developers don't have to search other sources in the browser, but can find many answers right in the method signature and PhpDoc.

We intend to include the tags @param, @return and @throws for the programming part, as well as @see and @link pointing to the relevant documentation for further reading.
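A docblock using those tags might look like the one below. The function first_match() is hypothetical and only illustrates the tag layout; it is not part of T-Regx, and the actual PhpDoc in the library may be worded and structured differently.

```php
<?php
/**
 * Returns the first match of $pattern in $subject.
 *
 * @param string $pattern a delimited PCRE pattern, e.g. "/\d+/"
 * @param string $subject the text to search in
 * @return string the text of the first match
 * @throws RuntimeException when the subject doesn't match the pattern
 * @see preg_match()
 * @link https://t-regx.com
 */
function first_match(string $pattern, string $subject): string
{
    if (preg_match($pattern, $subject, $match) !== 1) {
        throw new RuntimeException("Subject doesn't match the pattern");
    }
    return $match[0];
}
```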

Stay tuned!

Announcement - Prepared patterns revamp

Rawwrrrr!

Hello, dear regexp writers! For about 5 months now, we've been working really hard on rewriting prepared patterns, in order to introduce certain necessary features to them.

The biggest issue with prepared patterns in their current form is that the only way to keep a placeholder in a pattern from being injected was to escape it.

Pattern::inject('foo:@', ['bar']); // includes value
Pattern::inject('foo:\@', []); // doesn't include value

Of course, you could also escape the backslash itself, so foo:\\@ would include the value, foo:\\\@ wouldn't, and so on.
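The escape-only rule can be sketched as a single left-to-right scan that consumes backslash escapes in pairs, so an even run of backslashes leaves the following @ as a placeholder and an odd run escapes it. This is a hypothetical illustration, not T-Regx's implementation; the function name inject_escape_only() and the "/" delimiter passed to preg_quote() are assumptions.

```php
<?php
// Hypothetical sketch of the escape-only rule, not T-Regx's implementation:
// every "@" is a placeholder unless it belongs to a backslash escape.
function inject_escape_only(string $pattern, array $values): string
{
    $result = '';
    $length = strlen($pattern);
    for ($i = 0; $i < $length; $i++) {
        $char = $pattern[$i];
        if ($char === '\\' && $i + 1 < $length) {
            // copy a whole escape pair verbatim, so "\\" is consumed before the next "@"
            $result .= $char . $pattern[++$i];
        } elseif ($char === '@') {
            if (!$values) {
                throw new InvalidArgumentException('Not enough values for placeholders');
            }
            // quote the value so it's matched literally ("/" assumed as the delimiter)
            $result .= preg_quote((string) array_shift($values), '/');
        } else {
            $result .= $char;
        }
    }
    if ($values) {
        throw new InvalidArgumentException('Too many values for placeholders');
    }
    return $result;
}
```

Consuming escapes in pairs is what makes foo:\\@ inject (the pair \\ is copied, then @ is seen on its own) while foo:\\\@ doesn't (the pair \\ is copied, then \@ is copied as another pair).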

That's fine, but it's not everything. There are other cases where placeholders need special treatment, most notably [@], \Q@\E, (?#@) and #@\n (with the x flag). We knew about those cases, and we made sure that, while the placeholder would be injected in those cases, it wouldn't break the pattern or introduce any unexpected behaviour.

In other words, as long as users used the library according to the documentation, everything would be fine and every feature would work as usual.

The problem appears when a user uses the library in a way that isn't in accordance with the documentation. Ideally, we would throw an exception when the user's actions were invalid, and perform them when they were valid. Sadly, with the current implementation that turned out to be impossible. There's also another case: users can use in-pattern structures to enable or disable the x flag, turning a certain part of the pattern into a comment, or turning a comment off. There, handling the placeholder properly turned out to be virtually impossible, not only in corner cases but in standard cases as well. So we decided to spend months rewriting the prepared patterns internals, which allows us to handle the pattern building process much better.

The changes haven't been released yet, but they will be soon. Here they are:

  • Currently, \@ would be left untouched. This behaviour is unchanged.
  • Currently, [@], \Q@\E and \c@ would be injected. From now on, they won't be.
  • Currently, a placeholder @ in a comment would be injected. From now on, it won't be, regardless of the flags used in the main pattern or in any of the subpatterns.

In short, in the current version, the @ placeholder is replaced every time, unless escaped.

In the next release, @ will be replaced only if it's a literal in the pattern. So if @ is part of a character class ([@]), is quoted (\Q@\E), is escaped (\@), is in a comment ((?#@)), or is in an extended comment (#@\n, when the x flag is used), it won't be injected. The same goes for any other such case that comes up.
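The "inject only literals" rule can be sketched as a small scanner that copies non-literal constructs verbatim and injects only a bare @. This is a hypothetical illustration, not the library's parser: the names inject_literals_only() and character_class_end() are ours, the $extended parameter stands in for the x flag of the main pattern, and, unlike the revamp described above, it ignores in-pattern flag toggles such as (?x) and POSIX classes like [[:alpha:]].

```php
<?php
// Hypothetical sketch: "@" is injected only where it is a literal in the pattern.
// Simplifications (assumptions): no in-pattern (?x)/(?-x), no POSIX classes, "/" delimiter.
function inject_literals_only(string $pattern, array $values, bool $extended = false): string
{
    $n = strlen($pattern);
    // index just past the first $needle at or after $from, or the end of the pattern
    $pastNext = function (string $needle, int $from) use ($pattern, $n): int {
        $pos = strpos($pattern, $needle, $from);
        return $pos === false ? $n : $pos + strlen($needle);
    };
    $out = '';
    $i = 0;
    while ($i < $n) {
        $c = $pattern[$i];
        if (substr($pattern, $i, 2) === '\Q') {
            $stop = $pastNext('\E', $i + 2);          // quoted sequence: \Q...\E
        } elseif (substr($pattern, $i, 2) === '\c') {
            $stop = min($i + 3, $n);                   // control character: \c@
        } elseif ($c === '\\') {
            $stop = min($i + 2, $n);                   // escape: \@, \\ and so on
        } elseif (substr($pattern, $i, 3) === '(?#') {
            $stop = $pastNext(')', $i + 3);           // comment group: (?#...)
        } elseif ($extended && $c === '#') {
            $stop = $pastNext("\n", $i + 1);          // extended comment: #...\n
        } elseif ($c === '[') {
            $stop = character_class_end($pattern, $i); // character class: [...]
        } elseif ($c === '@') {
            if (!$values) {
                throw new InvalidArgumentException('Not enough values for placeholders');
            }
            $out .= preg_quote((string) array_shift($values), '/');
            $i++;
            continue;
        } else {
            $stop = $i + 1;                            // any other byte is copied as-is
        }
        $out .= substr($pattern, $i, $stop - $i);
        $i = $stop;
    }
    if ($values) {
        throw new InvalidArgumentException('Too many values for placeholders');
    }
    return $out;
}

// Index just past the "]" closing the character class that starts at $start.
function character_class_end(string $pattern, int $start): int
{
    $n = strlen($pattern);
    $j = $start + 1;
    if ($j < $n && $pattern[$j] === '^') $j++;
    if ($j < $n && $pattern[$j] === ']') $j++; // "]" right after "[" or "[^" is literal
    while ($j < $n) {
        if ($pattern[$j] === '\\') { $j += 2; continue; }
        if ($pattern[$j] === ']') return $j + 1;
        $j++;
    }
    return $n;
}
```

For example, inject_literals_only('[@]\Q@\E\@\c@(?#@)', []) leaves the pattern untouched, while inject_literals_only('foo:@', ['bar']) gives foo:bar.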

Announcement - Prepared patterns simplification

Rawwrrrr!

Hello, dear regexp writers! Again! After the revamp of prepared patterns, the interface of the prepared pattern methods will change as well. Simply put, we'll simplify them.

Reconcile Pattern::inject() vs Pattern::bind()

When prepared patterns first came to be, the idea behind Pattern::bind() was that we could name our placeholders, so that the regular expression would be more readable. Named placeholders could also be reused.

However, after a year of production use, it turned out that naming placeholders brings less utility than the robustness it costs the patterns. And reusing placeholders proved to be even less frequent.

For example, instead of

Pattern::bind('http://@animal.site.com/@animal', ['animal' => $animal]);

one could simply use

Pattern::inject('http://@.site.com/@', [$animal, $animal]);

There have been debates as to which of those approaches is "cleaner", and the majority decided that Pattern::inject() is cleaner, despite the duplicated value, on the rationale that if a placeholder is used twice, the injected value should be passed twice as well.

All in all, we concluded that Pattern::bind() doesn't bring any more utility than Pattern::inject(), and there's nothing you could do with Pattern::bind() that you couldn't do with Pattern::inject(), so we decided to remove it from the library.
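The claim that Pattern::inject() covers everything Pattern::bind() did can be shown mechanically: every named placeholder can be rewritten into a positional @ plus an ordered list of values. The helper bind_to_inject() below is hypothetical, assumes names of the form [a-zA-Z_]\w*, and ignores escapes and other special contexts for brevity.

```php
<?php
// Hypothetical sketch: turn a bind()-style pattern with "@name" placeholders into an
// inject()-style pattern with positional "@" placeholders and an ordered value list.
function bind_to_inject(string $pattern, array $named): array
{
    $values = [];
    $positional = preg_replace_callback('/@([a-zA-Z_]\w*)/', function (array $m) use ($named, &$values): string {
        if (!array_key_exists($m[1], $named)) {
            throw new InvalidArgumentException("Missing value for placeholder '$m[1]'");
        }
        $values[] = $named[$m[1]]; // a reused name simply yields the value again
        return '@';
    }, $pattern);
    return [$positional, $values];
}
```

Applied to the example above, bind_to_inject('http://@animal.site.com/@animal', ['animal' => $animal]) yields exactly the arguments of the Pattern::inject() call: 'http://@.site.com/@' and [$animal, $animal].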

Bad design of Pattern::template()

Some time back, we introduced Pattern::template() as a way of building patterns using a fluent builder. You could specify a template with @ and & placeholders inside. @ placeholders would be injected with values, while & placeholders would be replaced with patterns, such as masks.

After reviewing the interface, we admit it was bad from the start. We didn't think it through.

We decided that having two placeholders, @ and &, was superfluous, and that we could easily achieve the same effect with just one. Additionally, we decided that we shouldn't have tied the template to Pattern::inject() in such a crude way.

Pattern::template('&, @, &, @')
->literal() // replace the first "&" with "&"
->mask($mask, $keywords) // replace the second "&" with the mask
->inject([$first, $second]); // replace the first and the second "@" with values

We admit that this design was as bad as it could be; we hated using it in production. It must be eliminated.

Instead, the new API will look similar to this one:

Pattern::template('@, @, @, @')
->literal('&') // replace the first "@" with "&"
->literal($first) // replace the second "@" with value
->mask($mask, $keywords) // replace the third "@" with the mask
->literal($second) // replace the fourth "@" with value
->build();

We believe it looks cleaner, is more descriptive, conveys intention better and is less prone to bugs.
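To make the "one placeholder, filled in order" idea concrete, here is a hypothetical, self-contained builder in plain PHP. It is not T-Regx's Pattern::template(): the class TemplateSketch is ours, it treats every @ in the template as a placeholder, uses "/" as the delimiter, and assumes mask($mask, $keywords) replaces each keyword occurring in $mask with its pattern while quoting everything else. It needs PHP 8 for constructor property promotion.

```php
<?php
// Hypothetical sketch of a positional template builder (not T-Regx's implementation):
// each literal()/mask() call fills the next "@" of the template, in order.
final class TemplateSketch
{
    /** @var string[] pattern fragments, one per "@" in the template */
    private array $fills = [];

    public function __construct(private string $template) {}

    /** Fills the next "@" with $text, quoted so it matches literally. */
    public function literal(string $text): self
    {
        $this->fills[] = preg_quote($text, '/');
        return $this;
    }

    /** Fills the next "@" with $mask: keywords become their patterns, the rest is quoted. */
    public function mask(string $mask, array $keywords): self
    {
        if (!$keywords) {
            return $this->literal($mask);
        }
        // longest keywords first, so "%ww" would win over "%w"
        $names = array_keys($keywords);
        usort($names, fn($a, $b): int => strlen((string) $b) <=> strlen((string) $a));
        $split = '/(' . implode('|', array_map(fn($k): string => preg_quote((string) $k, '/'), $names)) . ')/';
        $fragment = '';
        foreach (preg_split($split, $mask, -1, PREG_SPLIT_DELIM_CAPTURE) as $index => $part) {
            // with one capturing group, odd indexes are keywords and even ones literal text
            $fragment .= $index % 2 === 1 ? $keywords[$part] : preg_quote($part, '/');
        }
        $this->fills[] = $fragment;
        return $this;
    }

    /** Interleaves the template text with the fills and adds "/" delimiters. */
    public function build(): string
    {
        $pieces = explode('@', $this->template);
        if (count($pieces) - 1 !== count($this->fills)) {
            throw new LogicException('Every "@" in the template needs exactly one fill');
        }
        $body = array_shift($pieces);
        foreach ($pieces as $index => $piece) {
            $body .= $this->fills[$index] . $piece;
        }
        return '/' . $body . '/';
    }
}
```

A usage mirroring the example above: (new TemplateSketch('@, @, @, @'))->literal('&')->literal('a.b')->mask('%w-%d', ['%w' => '\w+', '%d' => '\d+'])->literal('x')->build() produces /&, a\.b, \w+\-\d+, x/. Because every call consumes exactly one @, a missing or extra fill is caught in build() instead of silently producing a wrong pattern.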

Templates and builders

Rawwrrrr!

We've released T-Regx 0.11.0.

This is more of a maintenance release. Most of our development time is spent on the inject issue #91, which is quite a heavy feature: it requires us to completely rewrite our Prepared Patterns and use our own dedicated regular expression parser, since none of the parsers available on the internet matched our needs. It will probably be released as T-Regx 1.0, because it introduces too many breaking changes. (It was actually released as 0.12.0.)

Another time-consuming task is rewriting the t-regx.com website from scratch; you can expect it in a few months.

In this release, we simplified PatternBuilder into Pattern, simplified the template() and mask() methods, unified Pattern/PatternImpl/PatternInterface into one, and added a PCRE version helper.

As always, everything about the release is described in ChangeLog.md on GitHub.

Implicit all() in replace()

Rawwrrrr!

We've released T-Regx 0.10.2.

Until now, when doing replacements, you always had to specify the number of replacements explicitly:

  • replace()->all()->with()
  • replace()->first()->by()
  • replace()->only(2)->focus()

Since 0.10.2, you can skip the quantifier and just use with()/callback()/by()/focus() or any other replace method, like so:

  • replace()->with()
  • replace()->by()
  • replace()->focus()

And they will replace every occurrence, just like all().

Don't worry, we don't use any kind of meta-programming with magic methods or anything. We used simple polymorphism and design patterns (delegation and adapter in this case), so if you press Ctrl+B/Go to declaration in your IDE, you will see exactly what code is being run.
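The delegation part of that idea can be sketched in a few lines: the quantifier-less with() is an ordinary method that forwards to all()->with(), so "Go to declaration" lands on real code. The classes ReplacePattern and LimitedReplace below are hypothetical stand-ins, not T-Regx's classes; they only cover with(), treat the replacement literally (so $1 is not a group reference), and omit the adapter part.

```php
<?php
// Hypothetical sketch of delegation (not T-Regx's classes): no magic methods involved.
final class LimitedReplace
{
    public function __construct(private string $pattern, private string $subject, private int $limit) {}

    /** Replaces up to $limit occurrences (-1 means all) with $replacement, taken literally. */
    public function with(string $replacement): string
    {
        return preg_replace_callback($this->pattern, fn(array $m): string => $replacement, $this->subject, $this->limit);
    }
}

final class ReplacePattern
{
    public function __construct(private string $pattern, private string $subject) {}

    public function all(): LimitedReplace { return new LimitedReplace($this->pattern, $this->subject, -1); }

    public function first(): LimitedReplace { return new LimitedReplace($this->pattern, $this->subject, 1); }

    public function only(int $limit): LimitedReplace { return new LimitedReplace($this->pattern, $this->subject, $limit); }

    /** No quantifier given: behave exactly like all()->with(). */
    public function with(string $replacement): string
    {
        return $this->all()->with($replacement);
    }
}
```

With this shape, (new ReplacePattern('/\d/', 'a1b2c3'))->with('#') and ->all()->with('#') both return a#b#c#, while ->first()->with('#') returns a#b2c3.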

Additionally, we customized some exception messages. Now, depending on the nature of your exception, you will see one of these additional exception messages:

  • Expected to get the 3-nth element from fluent pattern, but the subject backing the feed was not matched
  • Expected to get the first match as integer, but subject was not matched
  • Expected to get the first element from fluent pattern, but the elements feed has 0 element(s)
  • and more. You can see them all on GitHub in /CleanRegex/Internal/Exception/Messages

As always, everything is described in ChangeLog.md on GitHub.

Valentine's release

Rawwrrrr!

We've released T-Regx 0.10.1.

This time, we've updated match filtering. Previously, filter() used on a regular match pattern would only filter Detail objects and expose exactly the same interface as the match pattern itself (like a filtering decorator), whereas fluent()->filter() simply removed entries from the fluent stream. We didn't like that difference.

So we renamed match()->filter() to match()->remaining(), which better reflects that it's a decorator, and added a new match()->filter() method, which works like all() but returns only the items matching the predicate (like array_filter()).
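The new filter() semantics, "like all(), but keep only what the predicate accepts", can be sketched with plain preg_match_all() and array_filter(). The function match_filter() is hypothetical; in T-Regx the predicate receives Detail objects, while here it receives the matched strings for simplicity.

```php
<?php
// Hypothetical sketch of filter() semantics (not T-Regx's implementation):
// collect all matches, then keep only those accepted by the predicate.
function match_filter(string $pattern, string $subject, callable $predicate): array
{
    preg_match_all($pattern, $subject, $matches);
    // array_values() re-indexes the kept matches as 0, 1, 2, ...
    return array_values(array_filter($matches[0], $predicate));
}
```

For example, match_filter('/\d+/', '1 22 333 4444', fn(string $m): bool => strlen($m) % 2 === 0) returns ['22', '4444'].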

Apart from that, we fixed a bug lurking in fluent()->flatMap() (don't worry, it's gone now :) and improved fluent()->first(). Now, when filtering a fluent stream, first() calls preg_match() first, and if the resulting Detail matches the predicate, it's simply returned. If it doesn't, first() calls preg_match_all() and returns the first Detail that matches the predicate.
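That two-step lookup can be sketched as follows: try the cheaper preg_match() first, and fall back to preg_match_all() only when the first match is rejected. The function first_matching() is hypothetical and works on strings rather than Detail objects; it returns null when nothing qualifies, whereas the library reports that case with an exception.

```php
<?php
// Hypothetical sketch of the first()-with-filter optimization (not T-Regx's code).
function first_matching(string $pattern, string $subject, callable $predicate): ?string
{
    if (preg_match($pattern, $subject, $match) !== 1) {
        return null; // nothing matched at all
    }
    if ($predicate($match[0])) {
        return $match[0]; // the first match already satisfies the predicate
    }
    // fall back to scanning every match, only when really needed
    preg_match_all($pattern, $subject, $matches);
    foreach ($matches[0] as $candidate) {
        if ($predicate($candidate)) {
            return $candidate;
        }
    }
    return null;
}
```

In the common case where the first match passes the predicate, the subject is matched only once, which is the point of the improvement.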

As always, everything is described in ChangeLog.md on GitHub.