Sacrificial Rabbit: Code.  Art.  Perversion.  Madness.

Tuesday, February 28. 2006

Introducing BunnyRegex: easy regular expressions, and mini-languages inside of PHP.


Trackbacks

BunnyRegex for PHP4
When I coded BunnyRegex, I just coded it with no thought towards those who use PHP4. I actually don't feel bad about this, there are really so many good reasons to upgrade to PHP5. At any rate, I did decide that making a PHP4 version wouldn't be to diff
Weblog: Sacrificial Rabbit
Tracked: Mar 05, 01:04
Bunnies bunnies everywhere
I've long been convinced that coincidence is never an accident.  On a day to day basis, the questions that come through ##php will gather in bunches to the point where I can often just up-key to find the last time I answered the question, and state it a
Weblog: Sara Golemon
Tracked: Apr 10, 13:45

Comments

I think having this syntax around is a pretty good idea. There is a similar structure in CL-PPCRE, except that it's implemented in reverse -- the regex strings get translated to the parse-tree, which is then processed.

You might consider adding two optional arguments to each construct to replace the chained repetition. For one thing, this prevents the broken output of "\d{4}{4}", and it also stays consistent and short.

Instead of

$pattern->bol()
->digit()->moreThan(3) // future proofing
->string('/')
->digit()->exactly(2)
->string('/')
->digit()->exactly(2)
->eol();

maybe something like

$pattern->bol()
->digit(4, '+')
->string('/')
->digit(2)
->string('/')
->digit(2)
->eol();
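
A minimal sketch of how optional repetition arguments like these might work. The class name `QuantifiedPattern` and its `compile()` method are hypothetical and not part of the real BunnyRegex API; the sketch assumes `digit($count, '+')` means "$count or more" and `digit($count)` means "exactly $count". Because each construct emits its own quantifier exactly once, a doubled quantifier such as "\d{4}{4}" cannot be produced.

```php
<?php
// Hypothetical sketch only: NOT the real BunnyRegex class.
class QuantifiedPattern
{
    private $parts = array();

    public function bol() { $this->parts[] = '^'; return $this; }
    public function eol() { $this->parts[] = '$'; return $this; }

    // Append a literal string, escaped for use inside /.../ delimiters.
    public function string($literal)
    {
        $this->parts[] = preg_quote($literal, '/');
        return $this;
    }

    // digit()       : one digit
    // digit(2)      : exactly two digits
    // digit(4, '+') : four or more digits (assumed meaning of '+')
    public function digit($count = null, $modifier = '')
    {
        $this->parts[] = '\d' . $this->quantifier($count, $modifier);
        return $this;
    }

    private function quantifier($count, $modifier)
    {
        if ($count === null) {
            return '';
        }
        if ($modifier === '+') {
            return '{' . (int) $count . ',}';
        }
        return '{' . (int) $count . '}';
    }

    public function compile()
    {
        return '/' . implode('', $this->parts) . '/';
    }
}

$pattern = new QuantifiedPattern();
$regex = $pattern->bol()
    ->digit(4, '+')
    ->string('/')
    ->digit(2)
    ->string('/')
    ->digit(2)
    ->eol()
    ->compile();

var_dump(preg_match($regex, '2006/02/28') === 1);
```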

I would suggest keyword args, but I don't think PHP has them. Another syntactic issue is the grouping. What if someone starts the group, but forgets to end it?

$pattern->startGroup()
->upper()
->lower(1, '+');

I'm not quite sure how to fix this in PHP. I recently fought with a similar thing while writing a query language, but the solution was Lisp (in which it's easy to do) and to not allow access to the grouping from the wrappers in other languages. That probably won't fly in a regex syntax. So, you may very well have it as good as it gets.
#1 Greg Pfeil (Link) on 2006-03-02 09:15 (Reply)
At this point, we move away from a wrapper around the regex sigils, and into the territory of parsing and (de?) linting.

It was certainly on my mind. In fact, if you do a checkout of the SVN, you can find a class that is half written called RabbitRegex. RabbitRegex isn't a subclass of BunnyRegex, but a class that sits on top of BunnyRegex. It actually is a fine example of AOP in PHP. (Something that I should really expand upon in a new entry, but I digress). At any rate, it is (or rather, will be) a fairly simplistic parser, keeping track of its state inside of a stack and a 'register' containing the previous sigil.
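
To make the stack-and-register idea concrete, here is a rough sketch. It is not the RabbitRegex code from SVN; the class `PatternChecker` and all of its method names are hypothetical. It assumes two checks only: a quantifier must follow something repeatable, and every opened group must be closed before the pattern is compiled.

```php
<?php
// Hypothetical sketch only: NOT the actual RabbitRegex implementation.
class PatternChecker
{
    private $groups = array(); // stack of currently open groups
    private $previous = null;  // the 'register': kind of the previous token
    private $out = '';

    // Append a raw regex atom such as '\d' or '[A-Z]'.
    public function atom($regex)
    {
        $this->out .= $regex;
        $this->previous = 'atom';
        return $this;
    }

    // Repeat the previous atom or group exactly $n times.
    public function exactly($n)
    {
        if ($this->previous !== 'atom' && $this->previous !== 'groupEnd') {
            // Catches chained repetition such as \d{4}{4}.
            throw new LogicException('Quantifier has nothing valid to repeat');
        }
        $this->out .= '{' . (int) $n . '}';
        $this->previous = 'quantifier';
        return $this;
    }

    public function startGroup()
    {
        $this->groups[] = strlen($this->out); // remember where the group opened
        $this->out .= '(';
        $this->previous = 'groupStart';
        return $this;
    }

    public function endGroup()
    {
        if (count($this->groups) === 0) {
            throw new LogicException('endGroup() without a matching startGroup()');
        }
        array_pop($this->groups);
        $this->out .= ')';
        $this->previous = 'groupEnd';
        return $this;
    }

    public function compile()
    {
        if (count($this->groups) > 0) {
            throw new LogicException('Unclosed group in pattern');
        }
        return '/' . $this->out . '/';
    }
}

$checker = new PatternChecker();
$valid = $checker->startGroup()
    ->atom('[A-Z]')
    ->atom('[a-z]')->exactly(2)
    ->endGroup()
    ->compile();
```

Both failure cases from the comment above surface as exceptions in this sketch: calling `exactly()` twice in a row, and calling `compile()` while a group is still open.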

I abandoned it for now because I wasn't 100% positive of its usefulness. I figure BunnyRegex is about 80% complete. Adding syntax checking seems like a 20% gain for another 80% of effort.

But it sure is interesting to think about! :)
#1.1 Jonnay on 2006-03-02 09:55 (Reply)
Another thing about CL-PPCRE ... it's not built on top of PCRE, it's pure Common Lisp. The string syntax is built on top of the s-expr syntax, which is a pretty powerful thing, I think. The test will come once Perl 6 is out. It brings a lot of improvements in the regex space, but it will require a new generation of RE engines.

I imagine that CL-PPCRE's structure will make it easy to just add another syntactic transform that takes different strings and translates them to the same s-expr. It's a really clean division of syntax and behavior. I don't know how PCRE or any non-PCRE libraries are implemented, but I'm guessing there will be a bit more work involved in making them do Perl6 regex.
#2 Greg Pfeil (Link) on 2006-03-02 09:58 (Reply)
how about the following idea:

dateRE->pattern = "__/__/__"
dateRE->length = 8
dateRE->chars = "00/00/00"

emailRE->pattern = "__@__.__"
emailRE->chars = "AA@AA.AA"

This would only work for very simple expressions, but I think this is far more concise and readable than the (evidently more powerful) class.
#3 Michiel van der Blonk (Link) on 2006-03-02 10:29 (Reply)
It sounds like a good idea, it looks like a mask in reverse. But there are a few things to think about:

Does dateRE match "0000////" or "000/0/00"? Each subject has 8 characters in it, but the slashes are in different spots.

What about when you need to use the _ in part of your expression? Do you need to escape it? Is it possible?

The second example wouldn't be too hard to build in BunnyRegex (or straight up PCRE for that matter).
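
For illustration, here is one way the mask idea could be translated into straight PCRE. Everything in it is an assumption, since the proposal does not define its semantics: the function name `maskToRegex` is made up, `_` in the pattern marks a placeholder position, and the character at the same position in the chars string picks the class ('0' for a digit, 'A' for a letter). Under that reading the slashes are fixed in place, and a literal underscore cannot be expressed at all without inventing an escape rule.

```php
<?php
// Hypothetical sketch of the mask proposal; the semantics are assumed.
function maskToRegex($pattern, $chars)
{
    if (strlen($pattern) !== strlen($chars)) {
        throw new InvalidArgumentException('pattern and chars must have the same length');
    }

    // Assumed meaning of the template characters.
    $classes = array('0' => '\d', 'A' => '[A-Za-z]');

    $regex = '';
    for ($i = 0; $i < strlen($pattern); $i++) {
        if ($pattern[$i] === '_') {
            // Placeholder: the class comes from the chars template.
            if (!isset($classes[$chars[$i]])) {
                throw new InvalidArgumentException('Unknown template character: ' . $chars[$i]);
            }
            $regex .= $classes[$chars[$i]];
        } else {
            // Anything else is a literal that must appear at this position.
            $regex .= preg_quote($pattern[$i], '/');
        }
    }

    return '/^' . $regex . '$/';
}

$dateRE = maskToRegex('__/__/__', '00/00/00');

var_dump(preg_match($dateRE, '28/02/06') === 1);
var_dump(preg_match($dateRE, '0000////') === 0);
```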
#3.1 Jonnay on 2006-03-02 11:51 (Reply)
There's kind of an assumption that regexes are hard to read, or difficult to learn. I'd say having to trawl through an API to find out what functions do what is almost as difficult, and less rewarding than investing some time in learning how regexes work.

Great implementation though. It's very readable when you put it all together!
#4 Ironik on 2006-03-02 14:12 (Reply)
I like that your site is in 2 point font... If I squint I can almost see that there are letters in the code areas and they aren't just grey splotches.
#5 SHOE on 2006-03-02 16:50 (Reply)
Is sarcasm really necessary? I don't use absolute sizes anywhere, so you are more than welcome to increase and decrease the size within your browser as you wish.
#5.1 Jonnay (Link) on 2006-03-02 17:43 (Reply)
Bravo, you are a true king bunny.

Great idea, and though I admire your idea enormously - I just have a hard time imagining I can teach noobs on forums about what an object is rather than explain what a regex is...

Nevertheless, I think it's cool and can think of some places where I can compose and aggregate this class to fit user needs.

I am really interested in seeing your thoughts on AOP in PHP.

Please keep sharing like this...
#6 Paul G on 2006-03-03 07:18 (Reply)
Thanks for your kind comments!

My rantings on AOP in PHP are finished and are available:

http://blog.jonnay.net/archives/637-Aspect-Oriented-Programming-in-PHP-as-a-contrast-to-other-languages..html

You're going to have to cut and paste. Stupid spammers are making blog comment sections and trackbacks less useful every day. >:P
#6.1 Jonnay on 2006-03-06 13:22 (Reply)
Right out of the box I get this error: Parse error: parse error, expecting `T_OLD_FUNCTION' or `T_FUNCTION' or `T_VAR' or `'}'' in BunnyRegex.php on line 38
What version of PHP is required for use?
#7 Brian on 2006-03-03 15:41 (Reply)
Whups. I'll upload a new version with a similar class that should work with PHP4. I'll need someone to test it, though.
#7.1 Jonnay on 2006-03-05 00:52 (Reply)

