Y

How can a subroutine call itself without it knowing its own name?

Contents

Problem

This is Perl.

In reality, my recursive subroutine was one designed to find the differences between two XML documents, but I don't have space for that, so instead I'll use, ugh, Fibonacci numbers:

sub fib {
	my $n = shift;
	return $n == 0 ? 0 :
	       $n == 1 ? 1 :
	       fib($n-1) + fib($n-2);
}

print fib(17); # "1597"

One day I renamed the subroutine and forgot to refactor all of the calls to it, resulting in a runtime error:

sub fibonacci {
	my $n = shift;
	return $n == 0 ? 0 :
	       $n == 1 ? 1 :
	       fib($n-1) + fib($n-2);
}

print fibonacci(17); # "Undefined subroutine &main::fib called at asdf.pl line 3"

This is a trivial defect to spot in this situation because it stands out a mile off, but my XML differ was a few pages long and I'd forgotten that it was recursive, so I got tripped up. This is something that has tripped me up exactly once ever, due entirely to my own incompetence, so of course I was given to wonder how I could fix it programmatically so that it never happened again.

Fix #1: Symbolic references

Two useful pieces of information led to an initial fix.

Firstly, Perl has a concept called symbolic references. Any time you would use a bareword to refer to a package variable (not a lexical variable!) or to a subroutine, you can instead put a scalar containing that package variable or subroutine's name as a string. Here's what happens if we do this to a package variable $main::hats:

our $hats = 78;
my $varname = "hats";

print $varname;      # "hats"
print $hats;         # "78"
print ${ $varname }; # "78"
print $$varname;     # "78" since you can omit the braces

Similarly for a subroutine main::scarfs:

sub scarfs { return 56; }
my $subname = "scarfs";

print $subname;        # "scarfs"
print scarfs();        # "56"
print &scarfs();       # "56", using the now-optional "&" sigil to explicitly denote a subroutine call
print &{ $subname }(); # "56"; the "&" here is not optional
print &$subname();     # "56"; nor here
print $subname->();    # "56"; this also works

Secondly, Perl has the useful built-in function, caller, which returns a list of useful items of data about the current stack trace. caller 0 will give you information about the current stack frame, caller 1 for the frame above that, and so on (give or take the complicated edge-case behaviour that all Perl functions display). The stack frame that we care about is frame 0, the current subroutine call, and the item in the returned list that we care about is at index 3, which returns the current subroutine name, including package:

sub scarfs {
	my $subname = (caller 0)[3];
	return $subname;
}

print scarfs(); # "main::scarfs"

So here's a full solution:

sub fibonacci {
	my $subname = (caller 0)[3];
	my $n = shift;
	return $n == 0 ? 0 :
	       $n == 1 ? 1 :
	       $subname->($n-1) + $subname->($n-2); # no explicit "fibonacci" required
}

print fibonacci(17); # "1597"

Fix #2: strictness

But I've forgotten a couple of things. As we all know, every Perl script or module should always begin with the same two lines:

use strict;
use warnings;

sub fibonacci {
	my $subname = (caller 0)[3];
	my $n = shift;
	return $n == 0 ? 0 :
	       $n == 1 ? 1 :
	       $subname->($n-1) + $subname->($n-2);
}

print fibonacci(17); # 'Can't use string ("main::fibonacci") as a subroutine ref while "strict refs" in use at asdf.pl line 7'

Whoops!

So now we've learned something about Perl that's worth knowing. The strict pragma invoked by use strict; introduces three different types of strictness, which you can actually enable (or disable) separately if needed. One of these is called strict refs, and what this does is explicitly disable the symbolic reference behaviour shown above, which is actually considered unsafe compared to Perl's more explicit and sensible reference-making behaviour, which is always available even with strict refs enabled, and which works like this:

my $hats = 78; # works for both lexical and package variables
my $varref = \$hats;

print $varref;      # e.g. "SCALAR(0x4afeb0)"
print $hats;        # "78"
print ${ $varref }; # "78"
print $$varref;     # "78"
sub scarfs { return 56; }
my $subref = \&scarfs;

print $subref;        # e.g. "CODE(0x4afec8)"
print scarfs();       # "56"
print &scarfs();      # "56"
print &{ $subref }(); # "56"
print &$subref();     # "56"
print $subref->();    # "56"

So now it looks like we're out of luck. Perl's caller function cheerfully returns a subroutine name for the current stack frame, but it does not return a subroutine reference, nor is there any other way in Perl to get hold of this from inside the call itself.

Oh wait, yes there is.

use strict;
use warnings;

sub fibonacci {
	my $subname = (caller 0)[3];
	my $subref = \&$subname; # uh, what
	my $n = shift;
	return $n == 0 ? 0 :
	       $n == 1 ? 1 :
	       $subref->($n-1) + $subref->($n-2);
}

print fibonacci(17); # "1597"

Due to a wacky series of misadventures, or possibly for perfectly good reasons that aren't very obvious, the line my $subref = \&$subname, which uses a symbolic reference to a subroutine (that is, its name) in order to access it, is still apparently considered to be safe Perl, even with symbolic references otherwise disabled by having strict refs turned on. I'd love to know the reason for this, but in the meantime the loophole is wide enough for me to make use of. So in conclusion, how do you call a subroutine from inside the subroutine without it knowing its own name? Using the code above, or either of the equally elementary one-liners:

&{\&{(caller 0)[3]}}(@args);

or:

(\&{(caller 0)[3]})->(@args);

Peace out.

Fix #3: Anonymity

No. Wait a second. What if the subroutine's anonymous?

Perl does allow this. And this section is not a totally arbitrary problem arising entirely from my personal foolishness/inability to code. This is a real thing. A genuine example where you might end up with a totally anonymous subroutine is if it were given to you by a third party. Like this:

use strict;
use warnings;
package ThirdParty;

sub dispense_bespoke_recursive_subroutine {
	return sub {
		my $subname = (caller 0)[3];
		my $subref = &$subname;
		my $n = shift;
		return $n == 0 ? 0 :
		       $n == 1 ? 1 :
		       $subref->($n-1) + $subref->($n-2);
	};
	
	# or return a special version of &xml_diffs or &tree_walk or
	# whatever the supplied arguments indicate
}

return 1;

Your code would be:

use strict;
use warnings;
require ThirdParty;

my $fibonacci = ThirdParty->dispense_bespoke_recursive_subroutine();

print $fibonacci->(17); # "Undefined subroutine &ThirdParty::__ANON__ called at ThirdParty.pm line 8"

Daaaaang, we just ran headlong into one of Perl's classic and numerous edge cases. An anonymous subroutine has the name __ANON__, which is the name of a subroutine which doesn't exist. (Or, worse, which does exist. Yes, you can declare a subroutine called __ANON__ in whatever package you like. And yes, it would get called in this situation.)

So how can you and ThirdParty collectively get around this?

Here's one way, although this is pretty clunky. Simply pass in the subroutine reference as one of its own arguments, and pass that reference onwards as you recurse. Third party code:

use strict;
use warnings;
package ThirdParty;

sub dispense_bespoke_recursive_subroutine {
	return sub {
		my ($subref, $n) = @_;
		return $n == 0 ? 0 :
		       $n == 1 ? 1 :
		       $subref->($subref, $n-1) + $subref->($subref, $n-2);
	};
}

return 1;

Your code:

use strict;
use warnings;
require ThirdParty;

my $fibonacci = ThirdParty->dispense_bespoke_recursive_subroutine();

print $fibonacci->($fibonacci, 17); # "1597"

But this is rather presumptuous of ThirdParty. "Oh, please also pass this subroutine into itself as a reference every time you call it, otherwise it won't work! Thanks!!" What we want is to shift all of that burden to ThirdParty, so that our own code just looks like this:

use strict;
use warnings;
require ThirdParty;

my $fibonacci = ThirdParty->dispense_bespoke_recursive_subroutine()

print $fibonacci->(17); # "1597"

Is this possible?

Fix #4: Fixed-point combinators

There is no easy answer to this question. This remains true even if &{\&{(caller 0)[3]}}(@args); is counted as an easy answer. There is, however, a canonical hard answer. You, my friend, have just run into the reason for the existence of the fixed-point combinator.

Firstly, a definition. A fixed point of a function f is a value x such that x = f(x).

Now, rewrite our anonymous subroutine to look like this.

my $fib_incomplete = sub {
	my $subref = shift;
	return sub {
		my $n = shift;
		return $n == 0 ? 0 :
		       $n == 1 ? 1 :
		       $subref->($n-1) + $subref->($n-2);
	};
};

Here, f is a subroutine $fib_incomplete, whose argument x is a subroutine reference $subref. In this situation, a fixed point of $fib_incomplete would be a subroutine reference $subref such that

$subref

equals

$fib_incomplete->($subref)

, which itself coincidentally equals

sub {
	my $n = shift;
	return $n == 0 ? 0 :
	       $n == 1 ? 1 :
	       $subref->($n-1) + $subref->($n-2);
}

, as you can see from the code.

Wait! This means that a fixed point $subref of $fib_incomplete would have the property that

$subref = sub {
	my $n = shift;
	return $n == 0 ? 0 :
	       $n == 1 ? 1 :
	       $subref->($n-1) + $subref->($n-2);
}

which is coincidentally exactly the Fibonacci subroutine that we're looking for. What a well-chosen subroutine $fib_incomplete is! All we need to do is find $fib_incomplete's fixed point - if there is one? - and we're home and dry.

Is there such a fixed point? In fact, mathematically, we can guarantee that it exists, no matter what form $fib_incomplete happens to take. And can we produce it? Yes, we can, using a fixed-point combinator. A fixed-point combinator is a magical function which takes another function as input and returns that function's fixed point as its output.

There are infinitely many fixed-point combinators and all of them are brain-wrenchingly complicated. One of them is known as the Y combinator, and is supplied below.

use strict;
use warnings;
package ThirdParty;

# Take a subroutine as input, return its fixed point as output
sub Y {
	my $subref = shift;
	my $Y2 = sub {
		my $x = shift;
		my $Y3 = sub {
			return $x->($x)->(@_);
		};
		return $subref->($Y3);
	};
	return $Y2->($Y2);
}

sub dispense_bespoke_recursive_subroutine {
	# Start with an anonymous subroutine whose fixed point happens
	# to be the desired Fibonacci subroutine
	my $fib_incomplete = sub {
		my $subref = shift;
		return sub {
			my $n = shift;
			return $n == 0 ? 0 :
			       $n == 1 ? 1 :
			       $subref->($n-1) + $subref->($n-2);
		};
	};

	# Find that subroutine's fixed point
	my $fibonacci = Y($fib_incomplete);

	# Return it
	return $fibonacci;
}

return 1;

Your code is simply:

use strict;
use warnings;
require ThirdParty;

my $fibonacci = ThirdParty->dispense_bespoke_recursive_subroutine();

print $fibonacci->(17); # "1597"

Fix #5: That was all a complete waste of time

This essay is actually my third and hopefully final attempt to decode for myself exactly what it is that the Y combinator does, and why it is useful. During this long period of incomprehension, I decided, among other things, that it would probably be easier for me to directly modify my programming language of choice to be able to call <the current subroutine> using some special new keyword, than for me to actually understand how the Y combinator works.

It's fitting, then, that this has now happened. As of Perl version 5.16, you can just do:

use strict;
use warnings;
package ThirdParty;

sub dispense_bespoke_recursive_subroutine {
	return sub {
		my $n = shift;
		return $n == 0 ? 0 :
		       $n == 1 ? 1 :
		       __SUB__->($n-1) + __SUB__->($n-2);
	};
}

return 1;

We're stuck on 5.8 at work, and many programming languages still lack this feature. Nevertheless, I'm glad to be rid of this ridiculous problem once and for all.

Next: Variadic fixed-point combinators

Discussion (11)

2013-01-26 14:37:29 by Homer:

Regarding your original problem: What IDE do you use? Eclipse at least offers a (depending on the language) reasonably reliable rename refactoring (plus hotkey) that updates references and comments.

2013-01-26 15:14:33 by jason:

Does this have anything to do with your Ra story? It sound suspiciously similar to that quine and True Name aliasing technique.

2013-01-26 22:06:48 by Baughn:

That was hilarious, do it again! In all seriousness, if you want to understand the Y combinator, I'd suggest looking at Haskell instead. The Haskell version (fix) is defined much more directly, as 'fix f = let x = f x in x', leaning on laziness to make it work. One of the common uses is, indeed, to get a reference to an anonymous function.

2013-01-27 06:47:09 by Isaac:

I know nothing of perl yet still enjoyed reading this for some reason.

2013-01-27 10:15:49 by aaroncrane:

Thanks, I enjoyed this piece. It is deliberate that Perl's "strict refs" allows symbolic references to subroutines. The documentation (see "perldoc strict") says that this "is allowed so that "goto &$AUTOLOAD" would not break under stricture." (Myself, I'd prefer "strict refs" to complain about "&$name", but I think that's quite unlikely to change.)

2013-02-02 15:11:30 by qntm:

Having just read the documentation about AUTOLOAD and the "goto &$subname;" formation, I see the uses of both. I agree with you that "goto &$subname;" isn't a compelling enough reason to break strict subs, though. Surely it would be easy enough to modify the goto built-in so that it also allows "goto &$subref;"?

2013-02-02 22:03:20 by aaroncrane:

Hi, Sam. I’m not sure I follow your question — "goto &$subref" already works. More generally, I definitely agree that the use case — letting people do "goto &$AUTOLOAD" (and therefore "goto &$subname") in an AUTOLOAD routine with "strict refs" in scope — could have been solved in a way more targetted to that particular goal. (Or, hey, such people could be invited to do "no strict 'refs'" before such a goto; that’s hardly a significant burden in code that’s already doing something crazy enough to need an AUTOLOAD.) But, still, the fact that "use strict" has had its current behaviour for a couple of decades means that we’re probably stuck with it the way it is.

2013-02-02 22:15:07 by qntm:

Ah, "goto &$subref;" does work. I thought it didn't. Wait, so if "goto &$subref;" works, why does anybody care about "goto &$subname;" being disabled?

2013-02-04 17:00:30 by PhantomHoover:

Er, doesn't the Y combinator only work in languages with non-strict evaluation?

2013-02-04 18:38:14 by qntm:

In its original form, yes, but here I use the eta-expanded version from the bottom of this section: https://en.wikipedia.org/wiki/Fixed-point_combinator#Y_combinator

2013-02-22 00:45:33 by Veky:

> Wait, so if "goto &$subref;" works, why does anybody care about "goto &$subname;" being disabled? I see you still haven't got the answer to this question. It's because $AUTOLOAD is an ordinary scalar. You _can_ goto subref if you _have_ a subref, but if all you have is a string (as is the case with $AUTOLOAD), you have to have a safe way to convert it to a subref. Your next question will probably be one of these: * why don't we change $AUTOLOAD to be a subref instead of a string? (Because you'd still have to change goto &$AUTOLOAD to goto $AUTOLOAD.) * why don't we make & idempotent for subrefs? (Because it would be highly magical. Perl is already magical enough, and adding more is considered bad taste.) BTW your "sqrt" textbox is very inflexible. I suggest accepting j, (0,1) or even opposites of these. It is also imprecise: square root of minus one can be two, mod 5. :-]

New comment by :

Plain text only. Line breaks become <br/>
The square root of minus one: