What difference does the lazy quantifier make in this specific regex?

2018-06-27 12:57:29

I am reading about a specific example/exercise on regular expressions.
The sentence to process is:

<b>Billions</b> and <b>Zillions</b> of suns

The match wanted is Billions, i.e. the text between <b> and </b>.
The solution proposes 2 regexes:
First:

<b>((?!<b>).)*?</b>

I do not understand why the lazy quantifier is needed here; it seems redundant to me.
The solution then proposes the following, in order to be able to remove the lazy quantifier:
Second:

<b>((?!</?b>).)*</b>

I can understand the second as a solution, but to me it seems irrelevant to any issue related to laziness. I mean that this:

<b>((?!<b>).)*</b>

will, as far as I can tell, match Billions just fine. It will greedily reach up to the <b> of Zillions, then start backtracking until it reaches the </b> of Billions and achieves the match.

Example:

$ perl -e '
my $var = "<b>Billions</b> and <b>Zillions</b> of suns";
$var =~ /<b>(((?!<b>).)*)<\/b>/; print "$1\n";
'
Billions

Am I misunderstanding something here?
Could it be the case that the author tried to write a regex that is valid for all tools?

The difference between ((?!<b>).)*? and ((?!<b>).)* is only about performance and the amount of backtracking involved.

The first regex will match <b>Billions</b> in your sample sentence and stop there.

The second regex will first consume "<b>Billions</b> and " (everything up to the second <b>), and only then start backtracking before finding a match. The second is thus less efficient. But if you look again, the regex could also be considered equivalent to .*? in terms of the number of characters matched, if you include the characters it backtracks over, provided there are no nested tags (e.g. nested <b> tags somewhere in "Billions and Zillions of suns", but that's just silly since nested <b> tags don't change the formatting).
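The behaviour described above can be checked with a minimal sketch. It uses Python's re module purely for convenience (an assumption; the question itself uses Perl), and the variable names are illustrative. Dropping the closing </b> from the greedy pattern shows how far the star runs before any backtracking is needed, while both complete patterns end up with the same match.

```python
import re

text = "<b>Billions</b> and <b>Zillions</b> of suns"

lazy = re.compile(r"<b>((?!<b>).)*?</b>")
greedy = re.compile(r"<b>((?!<b>).)*</b>")

# Without the closing </b>, the greedy star shows how far it runs
# before the engine has to backtrack: right up to the second <b>.
greedy_run = re.match(r"<b>((?!<b>).)*", text).group(0)
print(repr(greedy_run))

# Both complete patterns settle on the same final match.
lazy_match = lazy.search(text).group(0)
greedy_match = greedy.search(text).group(0)
print(lazy_match)
print(greedy_match)
```

The greedy run includes the trailing " and ", which is exactly the stretch the engine has to give back before </b> can match.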

I would myself use:

<b>((?!</b>).)*</b>

as the regex. The </b> in the negative lookahead prevents the group from matching </b>, and in the end this is a little more efficient than the first regex.
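To illustrate why this variant should need no backtracking on the sample sentence, here is a small sketch (again Python's re, an assumption not taken from the question; names are illustrative). Run without the closing tag, the star already stops right before the first </b>, so the literal </b> that follows can match at the first attempt.

```python
import re

text = "<b>Billions</b> and <b>Zillions</b> of suns"

# The star on its own stops right before the first </b>,
# so nothing has to be given back when </b> is matched next.
star_only = re.match(r"<b>((?!</b>).)*", text).group(0)
print(star_only)

# The complete pattern then matches the first element.
full = re.search(r"<b>((?!</b>).)*</b>", text).group(0)
print(full)
```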

For instance, you can see the number of 'steps' taken until a match is obtained for the:

first regex (50 steps)

second regex (66 steps)

my regex (48 steps)
