What difference does the lazy quantifier make in this specific regex?

I am reading about a specific example/exercise on regular expressions.
The sentence to process is:

<b>Billions</b> and <b>Zillions</b> of suns   

The match wanted is Billions, i.e. the text between <b> and </b>.
The solution proposes 2 regexes:
First:

<b>((?!<b>).)*?</b>   

I do not understand why the lazy quantifier is needed here. It seems redundant to me.
The second solution then proposes the following in order to be able to remove the lazy quantifier:
Second:

<b>((?!</?b>).)*</b>   

I can understand the second as a solution, but to me it seems irrelevant to any issue related to laziness. I mean that this:

<b>((?!<b>).)*</b>   

as far as I can tell, will match Billions just fine. It will greedily run up to the <b> of Zillions, then backtrack until it reaches the </b> of Billions and achieve the match.

Example:

$ perl -e '  
my $var = "<b>Billions</b> and <b>Zillions</b> of suns";  
$var =~ /<b>(((?!<b>).)*)<\/b>/; print "$1\n";  
'  
Billions  

Am I misunderstanding something here?
Could it be the case that the author tried to write a regex that is valid for all tools?


For your sample sentence, the difference between <b>((?!<b>).)*?</b> and <b>((?!<b>).)*</b> is only about performance and the amount of backtracking involved.

The first regex will match Billions in your sample sentence and stop there.

The second regex will first consume Billions</b> and (everything up to the <b> of Zillions), and only then start backtracking until it finds the </b> of Billions. The second is thus less efficient. But if you look again, the first regex matches the same text as <b>.*?</b> would, provided there are no nested tags (e.g. <b>Billions and <b>Zillions</b></b> of suns, but that's just silly since nested <b> tags don't change the formatting).
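To make the backtracking visible, here is a minimal sketch in Python's re module (my choice for illustration; the lookahead and quantifiers behave the same way as in Perl for these patterns). The variable names are made up, and I wrapped the repetition in one capturing group, as in your Perl test, so that group 1 holds the whole text between the tags. The last pattern drops the closing </b> on purpose, to show how far the greedy loop runs before it has to give characters back.

```python
import re

text = "<b>Billions</b> and <b>Zillions</b> of suns"

# Lazy and greedy versions of the same pattern; the inner group is
# non-capturing so that group(1) is the full text between the tags.
lazy = re.compile(r"<b>((?:(?!<b>).)*?)</b>")
greedy = re.compile(r"<b>((?:(?!<b>).)*)</b>")

lazy_match = lazy.search(text).group(1)
greedy_match = greedy.search(text).group(1)

# Same greedy loop without the trailing </b>: this is what the loop
# consumes on its own, before any backtracking takes place.
greedy_reach = re.match(r"<b>((?:(?!<b>).)*)", text).group(1)

print(lazy_match)          # Billions
print(greedy_match)        # Billions
print(repr(greedy_reach))  # 'Billions</b> and '
```

Both versions end up capturing the same text; the greedy one just gets there after overshooting to the next <b> and handing characters back one at a time.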

I would myself use this regex:

<b>((?!</b>).)*</b>

The </b> in the negative lookahead stops the repetition right before the closing tag, so the greedy quantifier never runs past it, and in the end this is a little more efficient than the first regex.
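As a rough check, again sketched in Python's re module with made-up variable names, the snippet below compares the first regex from the question with the one I suggest. It also runs both on the nested example from above, which is the one case where they pick different text.

```python
import re

sample = "<b>Billions</b> and <b>Zillions</b> of suns"
nested = "<b>Billions and <b>Zillions</b></b> of suns"

# First regex from the question: lazy, forbids an opening <b> inside.
no_open = re.compile(r"<b>((?:(?!<b>).)*?)</b>")
# Regex suggested in this answer: greedy, forbids a closing </b> inside.
no_close = re.compile(r"<b>((?:(?!</b>).)*)</b>")

sample_results = (no_open.search(sample).group(1),
                  no_close.search(sample).group(1))
nested_results = (no_open.search(nested).group(1),
                  no_close.search(nested).group(1))

# The greedy loop alone already stops right before the first </b>,
# so there is nothing to give back afterwards.
close_reach = re.match(r"<b>((?:(?!</b>).)*)", sample).group(1)

print(sample_results)  # ('Billions', 'Billions')
print(nested_results)  # ('Zillions', 'Billions and <b>Zillions')
print(close_reach)     # Billions
```

On the sample sentence both give Billions. On the nested input the first regex skips to the innermost pair, while mine runs from the first <b> to the first </b>, so pick whichever behaviour you actually want if nesting can occur.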

For instance, you can see the number of 'steps' taken until a match is obtained for the:

  • first regex (50 steps)
  • second regex (66 steps)
  • my regex (48 steps)