Why is this regex not greedy?
In this regex
$line = 'this is a regular expression';
$line =~ s/^(\w+)\b(.*)\b(\w+)$/$3 $2 $1/;
print $line;
Why is $2 equal to " is a regular " ? My thought process is that (.*) should be greedy and match all characters until the end of the line and therefore $3 would be empty.
That's not happening, though. The regex matcher is somehow stopping right before the last word boundary and populating $3 with what's after the last word boundary and the rest of the string is sent to $2.
Any explanation? Thanks.
$3 can't be empty when using this regex because the corresponding capturing group is (\w+), which must match at least one word character or the whole match will fail.
So what happens is that (.*) first matches " is a regular expression", \b matches at the end of the string, and (\w+) fails to match because there is nothing left for it. The regex engine then backtracks, making (.*) give up one character at a time, until (.*) matches " is a regular " (note the match includes the trailing space), \b matches the word boundary before e, and (\w+) matches "expression".
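To check the captures described above, here is a minimal sketch that runs the same pattern as a plain match instead of a substitution. The variable names $first, $middle and $last are just illustrative; they are not in the original question.

```perl
use strict;
use warnings;

my $line = 'this is a regular expression';

# Same pattern as in the question, used as a match so the three
# captures can be inspected before anything is rewritten.
my ($first, $middle, $last) = $line =~ /^(\w+)\b(.*)\b(\w+)$/
    or die "pattern did not match";

print "1: [$first]\n";    # [this]
print "2: [$middle]\n";   # [ is a regular ]
print "3: [$last]\n";     # [expression]
```

The brackets in the output make the leading and trailing spaces in the second capture visible.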
If you change (\w+) to (\w*) then you will end up with the result you expected, where (.*) consumes the rest of the string and $3 is empty.
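A small sketch of that variant, assuming the same input string as in the question (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $line = 'this is a regular expression';

# Last group changed from (\w+) to (\w*), so it may match nothing.
my ($first, $middle, $last) = $line =~ /^(\w+)\b(.*)\b(\w*)$/
    or die "pattern did not match";

print "1: [$first]\n";    # [this]
print "2: [$middle]\n";   # [ is a regular expression]
print "3: [$last]\n";     # []
```

Here the second \b still succeeds, because the end of the string right after the word character n counts as a word boundary.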
Greedy doesn't mean it gets to match absolutely everything. It just means it can take as much as possible and still have the regex succeed .
This means that since you use the + in group 3 it can't be empty and still succeed as + means 1 or more .
If you want $3 to be empty, just change (\w+) to (\w?). Now since ? means 0 or 1 it can be empty, and therefore the greedy .* takes everything. Note: This seems to work only in Perl, due to how perl deals with lines.
In order for the regex to match the whole string, ^(\w+)\b requires that the entire first word be $1. Likewise, \b(\w+)$ requires that the entire last word be $3. Therefore, no matter how greedy (.*) is, it can only capture ' is a regular ', otherwise the pattern won't match. At some point while matching the string, .* probably did take up the entire ' is a regular expression', but then it found that it had to backtrack and let the \w+ get its match too.
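One way to see that the anchors and word boundaries, rather than greediness, decide the result is to compare the greedy pattern with a non-greedy one. This is only a sketch; the captures() helper is a hypothetical name introduced for the comparison.

```perl
use strict;
use warnings;

my $line = 'this is a regular expression';

# Hypothetical helper: returns the three captures joined with '|',
# or the string 'no match' if the pattern fails.
sub captures {
    my ($re) = @_;
    my @c = $line =~ $re;
    return @c ? join('|', @c) : 'no match';
}

my $greedy = captures(qr/^(\w+)\b(.*)\b(\w+)$/);
my $lazy   = captures(qr/^(\w+)\b(.*?)\b(\w+)$/);

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "same captures\n" if $greedy eq $lazy;
```

Both patterns give the same three captures for this input, since only one split of the string satisfies ^, both \b assertions and $ at the same time.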
