Tuesday, February 21. 2006
Deep PHP Function Voodoo
Okay, the title is a bit of a misnomer, because what I am about to explain to you isn't deep, let alone voodoo, if you have any kind of experience with a language like Scheme or you know what "first class functions" and "higher order functions" are. This entry is monstrous, so instead of filling up my main page with PHP ranting, I am going to use the super-awesome-Serendipity-read-more-feature.
Higher Order Functions, Anonymous Functions and Closures
What is a higher order function? A higher order function is one that takes a function as an argument, and/or returns a new function as a result. In some languages (like Scheme and JavaScript) this is really an easy thing to do, because functions are just values; they don't have to be bound to a name in particular. Functions are said to be first class members. Just like you can bind the number 5 to the variable $foo, you can bind function ($somearg) {...do some stuff..} to the variable $foo as well. Also, both JavaScript and Scheme allow you to make "anonymous functions" (called lambda in Scheme). In the expression strlen('somestring'), the value 'somestring' is anonymous; that is, it is not bound to a variable. An anonymous function is the same idea: a function value that is used without ever being given a name. This really shines when using higher order functions, especially in Scheme:
(define some-list '(1 2 3 4 5))
(for-each (lambda (x) (print x)) some-list)
The first line just binds the list (pretend for a moment it is like an array) to the variable some-list. On the next line we have the magic. for-each is a function in Scheme that takes 2 arguments, a function and a list, and calls the function on every member of the list. Our function is (lambda (x) (print x)), which is a fancy-schmancy way of saying: an anonymous function (lambda) that takes one argument (x) and whose body is (print x).
In fact, that is a highly redundant piece of code, because the lambda does nothing except hand its argument to print. This would work even better:
(for-each print some-list)
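For readers more at home in JavaScript, the same two calls can be sketched roughly as follows. This is only an illustration: record and seen are made-up names standing in for print, and Array.prototype.forEach plays the role of for-each.

```javascript
const someList = [1, 2, 3, 4, 5];
const seen = [];

// Hypothetical stand-in for Scheme's print: remembers each value it is given.
function record(x) {
  seen.push(x);
}

// An anonymous function wrapping record, like (lambda (x) (print x)).
someList.forEach(function (x) { record(x); });

// Passing the function itself, like (for-each print some-list).
// forEach also passes an index and the array; record simply ignores them.
someList.forEach(record);
```

After both calls, seen holds the five values twice, which suggests the anonymous wrapper added nothing.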
They broke PHP...
PHP, however, is somewhat broken (in my opinion). There are in fact all kinds of higher order functions defined in the language, a callback pseudo-type, and even a way to create anonymous functions with create_function(). However, in PHP functions are second class entities. First of all, they must be bound to a name, and the name that they are bound to lives in a different namespace from regular variables (thus, we cannot bind them to a variable). For instance, this is not valid PHP:
$foo = function ($x) { echo "x is $x ";}
$foo(5); // Output: x is 5
$foo = function ($x) { echo "down with $x";}
$foo(5); // Output: down with 5
However, we can bind a function name to a variable, and call a variable like a function:
function foo($x) {echo "x is $x";}
$foo = "foo";
$foo(5); // Output: x is 5
function bar($x) { echo "down with $x";}
$foo = "bar";
$foo(5); // Output: down with 5
There is a subtle "use-mention" thing going on here. Languages with functions that are first class entities use the function directly, whereas languages with second class functions can only mention functions by name.
Create Function... a band-aid.
As I mentioned earlier, there is a function in PHP that will automatically create a function for you, give it a unique name, and return that name as a string: create_function. Let's rewrite our example above using it:
$foo = create_function('$x','echo "x is $x";');
$foo(5); // Output: x is 5
$foo = create_function('$x','echo "down with $x";');
$foo(5); // Output: down with 5
"But wait... " you must be saying, "how is this any different from the invalid hypothetical example above, except for some syntactic sugar?" Well, the first major difference is that in the hypothetical example above, $foo is bound to a function; a call to gettype($foo) would (theoretically) return 'function'. Instead, what we have here is $foo bound to a string containing a unique id which is the name of the function, so gettype($foo) returns 'string'.
The second difference is that you are passing create_function 2 strings: the arguments and the function body, instead of the straight-up function definition. This means you have to be careful about the strings you pass to create_function. If you use double quotes, PHP interpolates variables into the string before create_function ever sees it, so you need to escape each variable (\$x) unless you explicitly want its current value embedded. This also has a big effect on code readability. It is not as easy to parse quoted code (unless it is Scheme, but that is a different tag for a different day).
Closures
The third, and perhaps the most fundamental, difference here is that in the first (hypothetical) example, the function is created in the current execution environment, whereas by using create_function, an entirely new environment is created, including a fresh PHP parser. That means that even if you call create_function inside of a class, your scope returns to the global scope. This means you can't use create_function to dynamically add new methods to an object. JavaScript and Scheme have no such limitations.
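JavaScript happens to have a close cousin of create_function in its Function constructor, which also takes argument and body strings and compiles the body in the global scope rather than the current one. The sketch below is an analogy only, not a description of PHP internals; makeFromStrings, makeInline, and secret are made-up names.

```javascript
function makeFromStrings() {
  const secret = 'local value';
  // The body string is compiled in the global scope,
  // so the local variable `secret` is not visible inside it.
  return new Function('return typeof secret;');
}

function makeInline() {
  const secret = 'local value';
  // A function expression is created in the current scope and keeps access to it.
  return function () { return typeof secret; };
}

const builtFromStrings = makeFromStrings();
const builtInline = makeInline();

builtFromStrings(); // "undefined": the string-built function cannot see secret
builtInline();      // "string": the inline function can
```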
When you create a function inside of another function's execution context, and the new function keeps access to that context, this is called a closure. Here is an example of a closure in JavaScript:
// JavaScript
function createCounterClosure(start)
{
    var counter = start;
    return function ()
    {
        return counter++;
    };
}
var a = createCounterClosure(5);
var b = createCounterClosure(23);
a(); // returns 5
b(); // returns 23
a(); // returns 6
a(); // returns 7
b(); // returns 24
You might notice (especially if you are good at JavaScript) that the syntax looks really similar to how JavaScript handles objects. In fact, you can view objects as nothing more than closures that act on "messages". I built my own object system in Scheme that utilizes closures; details are in my entry: One macro makes you small... Scheme and the BunnyObject System.
You can in fact fudge closures in PHP to a certain extent. Because the current scope is available to you when you are passing arguments to create_function, you can embed a variable's value inside of the function body. But this is rather limited, because you are only embedding the values, not the variables themselves.
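The difference between embedding a value and capturing a variable can be sketched in JavaScript, using the Function constructor as a rough stand-in for create_function. createCounterClosure is the closure from the example above; createEmbeddedCounter is a made-up counterpart that pastes the value of start into the body text.

```javascript
// The real closure from the example above: counter lives on between calls.
function createCounterClosure(start) {
  var counter = start;
  return function () {
    return counter++;
  };
}

// Hypothetical string-built version: only the *value* of start is embedded,
// so every call starts again from that value.
function createEmbeddedCounter(start) {
  var body = 'var counter = ' + JSON.stringify(start) + '; return counter++;';
  return new Function(body);
}

var realCounter = createCounterClosure(5);
var embeddedCounter = createEmbeddedCounter(5);

realCounter();     // returns 5
realCounter();     // returns 6
embeddedCounter(); // returns 5
embeddedCounter(); // returns 5 again, nothing was remembered
```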
Using anonymous functions?
Contrasting Scheme and JavaScript with PHP
In Scheme or JavaScript, because functions are first class entities, it is easier to debug, syntax check, and even read anonymous functions because they are inline code. For instance, filtering the empty values out of a list looks like this in pseudo-Scheme:
(define filtered
  (list-filter alist
               (lambda (x) (not (empty x)))))
or pseudo-JavaScript:
var filtered = array_filter(anArray,
    function (x) { return !empty(x); });
PHP's anonymous functions are not the most readable in the world. While something simple like this:
$dim = array_filter($dim,
    create_function('$str', 'return (!empty($str));'));
is readable, the readability issue is magnified when you have to start escaping characters. However, some things can only be done with create_function, especially once we get into the realm of higher-order functions.
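To get a feel for the escaping problem, here is a small JavaScript sketch that filters the same list twice: once with an inline function, and once with a function built from strings through the Function constructor (used here only as an analogy for create_function). The names words, keptInline, keepFromStrings, and keptFromStrings are made up for the illustration.

```javascript
const words = ['', "it's", 'ok'];

// Inline: the code reads as code, and nothing needs escaping.
const keptInline = words.filter(function (str) {
  return str !== '' && str !== "it's";
});

// Built from strings: every quote inside the body has to be escaped
// so that it survives the outer string literal.
const keepFromStrings = new Function(
  'str',
  'return str !== \'\' && str !== "it\'s";'
);
const keptFromStrings = words.filter(keepFromStrings);
```

Both calls keep only 'ok', but the second version already needs three backslashes for a one-line body.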
In my REST API Framework Meditation, I have a facility set up so that the programmer can set up "Request Processors" which do generalized processing on a request. Meditation is also built so that the user can use procedural code or object oriented code, as they see fit (well, actually, they are forced into interacting with meditation in an OO way, but their own application code can be in any paradigm they wish.) The snag came when I wanted to allow for procedural processors, especially if you wanted that processor to act only on certain methods for certain resources, but on different methods for different resources. In a nutshell, it was easy to build a processor class, and tell it to only process on certain methods. But building a generalized processor function that allowed you to select which method you wanted was hard. Perhaps some code would clear the situation up:
// the resource 'foo'
$pathinfotran = new PathInfoTranslator(); // a processor to translate the pathinfo into something
$pathinfotran->processOn('GET'); // tell the processor to process on the GET method only.
Lotus::getInstance()->addRequestProcessor($pathinfotran);
...
// the resource 'bar'
$pathinfotran = new PathInfoTranslator(); // a processor to translate the pathinfo into something
$pathinfotran->processOn('GET'); // tell the processor to process on the GET method
$pathinfotran->processOn('PUT'); // tell the processor to process on the PUT method
Lotus::getInstance()->addRequestProcessor($pathinfotran);
...
As you can see, the generalized PathInfoTranslator class is very flexible. But if we just had a procedure for a request processor, we would have to do our method checking inside of the procedure, making generalization very hard.
Or would we? Higher order functions inside of PHP.
The solution is not that difficult: we can build a new function, that takes the processor callback, and a list of request methods to run it on, and returns a callback that checks to see if the current method is in the list of methods to run on. It isn't hard, but it does take a little bit of care.
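In a language with real closures the whole idea fits in a few lines, which makes a useful reference point for the PHP version. The following is a JavaScript sketch under that assumption; the name composeProcessor is borrowed from the PHP function in this entry, while handled and processGetOrPut are made up.

```javascript
// Takes a processor callback and a list of request methods, and returns
// a new function that only runs the callback for those methods.
function composeProcessor(callback, methods = []) {
  return function (method) {
    if (methods.includes(method)) {
      callback(method);
    }
  };
}

const handled = [];
const processGetOrPut = composeProcessor(
  function (method) { handled.push(method); },
  ['GET', 'PUT']
);

processGetOrPut('GET');    // callback runs
processGetOrPut('DELETE'); // callback is skipped
processGetOrPut('PUT');    // callback runs
```

Because the returned function closes over callback and methods directly, it does not matter whether callback is a plain function or a bound object method.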
The first round. Good habits and readability
The thing you must recognize about using higher order functions, especially in PHP, is that the readability and understandability of your application can suffer. So you must take care to format your code as cleanly as possible. This (in my opinion) means using multi-line strings. Heredoc syntax might also be a good choice, but you would need to escape your variable names.
So back to our function: basically we want a function that takes a function as a parameter, plus some extra arguments, and then builds a function that will only execute the callback if some condition is satisfied. Both the function that composeProcessor receives and the one that it returns take a single argument, which is the current request method.
function composeProcessor($callback, $methods = array())
{
    $conditional = '($method == \'';
    $conditional .= implode('\') or ($method == \'', $methods);
    $conditional .= '\')';
    $functionArgs = '$method';
    $functionBody =
        "if ($conditional)\n".
        '{'."\n".
        "call_user_func('$callback', \$method);\n".
        '}'."\n".
        'return;'."\n";
    return create_function($functionArgs, $functionBody);
}
You'll notice three things right off the bat. First, each line of code for my anonymous function (the content of $functionBody) is given its own indented line in the source code. Second, I explicitly give each line a line break inside of the string. While this is not actually necessary, it makes debugging the script and spotting parse errors much easier, because there are physical lines for the parser to report. Third, I am switching between single quotes and double quotes. There is a good reason for this: $callback and $conditional are variables inside the execution environment of composeProcessor, whereas $method is a parameter of our anonymous function. Double quotes are a subtle hint that I am embedding the values of $callback and $conditional; single quotes (or an escaped \$method) mean the text is passed through untouched.
This function works fine when $callback is a string. But callbacks can be arrays too: either an array of 2 strings (for a static call of a class method) or an array of an object and a string (for a call to a method on an object). Handling the case of 2 strings in an array is dirt easy, but what about when the user passes in an object method as a callback?
Emulating closures
It walks like a dog, barks like a dog... but really it is an Aibo in a fursuit.
Closures can be emulated in PHP. The absolute simplest way to do it is to have what I call a "Registry Function". Back in the before-time of PHP4, classes couldn't have static variables. For those few cases when you really needed them, you could fudge them in by using a static variable inside of a combined getter/setter method. If you wanted to set the variable, you would call it with one argument, and it would set the static variable to the new value. If you wanted to retrieve the variable, then you would call it with no arguments and it just returned the static variable. "Registry Functions" are similar, except they hold an array, and you get from them with one argument (the key) and set with 2 (key, value). I use them in Meditation, mostly to hide private keys from a print_r, var_dump or var_export().
So with our registry, we will be able to work with an object reference, even object references like $this, inside of create_function. There are some garbage collection issues, because the static array will hold the object reference till the script is finished executing.
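The shape of such a combined getter/setter can be sketched in JavaScript. This is an illustration only: JavaScript has no static function variables, so a Map hidden inside a closure plays that role here, and objectRegistry and translator are made-up names.

```javascript
// One argument: get. Two arguments: set.
const objectRegistry = (function () {
  const store = new Map(); // stands in for a PHP static array
  return function (key, value) {
    if (arguments.length < 2) {
      return store.get(key);
    }
    store.set(key, value);
  };
})();

const translator = { name: 'PathInfoTranslator' };
objectRegistry('abc123', translator);

// The registry hands back the very same object, not a copy,
// and keeps holding on to it until the script ends.
objectRegistry('abc123') === translator; // true
```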
Here is our code with the registry function (ComposeProcessorRegistry). Note that error checking is removed for brevity.
function composeProcessor($callback, $methods = array())
{
    // Turn the callback into PHP source text that rebuilds it inside the new function.
    if (is_array($callback))
    {
        if (is_object($callback[0]))
        {
            // An object cannot be written out as source, so park it in the registry.
            $id = uniqid();
            ComposeProcessorRegistry($id, $callback[0]);
            $callback = "array(ComposeProcessorRegistry('$id'), '{$callback[1]}')";
        }
        else
        {
            $callback = "array('{$callback[0]}','{$callback[1]}')";
        }
    }
    else
    {
        $callback = "'$callback'";
    }
    $conditional = '($method == \''.implode('\') or ($method == \'', $methods).'\')';
    $functionArgs = '$method';
    $functionBody =
        "if ($conditional)"."\n".
        '{'."\n".
        "call_user_func($callback, \$method);\n".
        '}'."\n".
        'return;'."\n";
    return create_function($functionArgs, $functionBody);
}
/**
 * A registry for the objects that are passed into composeProcessor.
 * Call with (id) to get a stored value, or with (id, value) to set one.
 */
function ComposeProcessorRegistry($uniqueId, $value = null)
{
    static $registry;
    if (null === $value)
        return $registry[$uniqueId];
    else
        $registry[$uniqueId] = $value;
}
Notice that we spend a lot of code determining the type of callback that $callback is. We don't really have to do that, we could just store the value of callback inside of the registry and leave it at that.
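That simplification can be sketched in JavaScript, again using the Function constructor as a rough analogy for create_function. Everything here is hypothetical: composeWithRegistry, callbackRegistry, and nextCallbackId are made-up names, and a global Map stands in for the registry function because string-built functions can only see global names.

```javascript
// Global registry: the only place a string-built function can reach.
globalThis.callbackRegistry = new Map();
let nextCallbackId = 0;

function composeWithRegistry(callback, methods = []) {
  // Store the callback as-is, whatever kind of callback it is.
  const id = 'cb' + (nextCallbackId++);
  globalThis.callbackRegistry.set(id, callback);

  // The generated body only contains the id and the method list as text.
  const body =
    'if (' + JSON.stringify(methods) + '.includes(method)) {\n' +
    '  callbackRegistry.get(' + JSON.stringify(id) + ')(method);\n' +
    '}\n';
  return new Function('method', body);
}

const target = {
  seen: [],
  handle(method) { this.seen.push(method); }
};
const registryProcessor = composeWithRegistry(target.handle.bind(target), ['GET']);

registryProcessor('GET'); // runs target.handle
registryProcessor('PUT'); // skipped
```

The trade-off is the same as in the PHP version: every stored callback stays in the registry until something removes it.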
When to use Create Function and (emulated) Closures
If you want to sound like you know something about PHP, then you should parrot the common wisdom, and say "Never!" Be sure and question the programming skills and/or design of anyone who would be so rash as to use such dangerous language constructs.
The two Scheme interpreters
I have used both of the two Scheme interpreters I linked to. The first one, Scheme48, is for hard-core Unix hackers: people who run X Windows just to have more terminal windows available on the screen. It is very powerful, and very command-line based. There is a Unix shell based on s48. It won't be the easiest way to learn Scheme, but I would be remiss in not mentioning it.
The second link, PLT Scheme, is an interpreter, editor, and kitchen sink that is explicitly for learning Scheme. In fact, I think that it comes with a specialized Scheme interpreter for use with SICP. This is good for people who like their GUIs, and good for learning with.
If you want to know something about programming and know when you should use create_function(), then you should really learn a language with first-class functions and then come back to PHP. You might find that you don't want to for a couple of months—that's okay. I suggest finding a Scheme interpreter, and downloading the SICP book and/or video lectures.
If you would like a rule of thumb as to when create_function is appropriate: generally, if you are working with a function that takes a callback as an argument or returns one, create_function might be applicable.
As for emulated closures? I think that their uses are few and far between. In fact, I have left the name of the registry function I wrote as "ComposeProcessorRegistry" instead of generalizing it so that I am not as tempted to just go ahead and willy-nilly use closures whenever I jump to the conclusion that they are needed. To be sure, they are quite handy. However seeing as how this emulation is a hacky kludge, I am of the opinion that they should be used sparingly. The combined getter/setter makes for difficult to read code. I chose to use a closure in this situation by weighing in the pros and cons: the con of reduced readability was outweighed by the pro of enhanced flexibility for a library that may end up being used by thousands (I can dream, can't I?).




