std::regex_match and lazy quantifier with strange behavior

I know that a lazy quantifier matches as few characters as possible (shortest match).

I also know the signature of the constructor:

basic_regex( ...,
            flag_type f = std::regex_constants::ECMAScript );

ECMAScript supports non-greedy matches, and the ECMAScript regex "<tag[^>]*>.*?</tag>"
would match only until the first closing tag ... (en.cppreference)

At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ... (en.cppreference)

Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences ... (en.cppreference, std::regex_match)

Here is my code:

#include <iostream>
#include <string>
#include <regex>

int main(){

        std::string string( "s/one/two/three/four/five/six/g" );
        std::match_results< std::string::const_iterator > match;
        std::basic_regex< char > regex ( "s?/.+?/g?" );  // non-greedy
        bool test = false;

        using namespace std::regex_constants;

        // OK: recognizes the lazy operator .+?
        test = std::regex_search( string, match, regex );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
        // seems to ignore the lazy operator .+?
        test = std::regex_match( string, match, regex, match_not_bol | match_not_eol );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';

        return 0;
}

and the output:

1
s/one/
1
s/one/two/three/four/five/six/g

Process returned 0 (0x0)   execution time : 0.008 s
Press ENTER to continue.

I expected std::regex_match not to match anything, and to return 0, with the non-greedy quantifier .+?

In fact, the non-greedy .+? quantifier behaves the same as the greedy one here: /.+?/ and /.+/ match the same string, even though they are different patterns. So why is the question mark ignored?


Fast test:

$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
s/one/
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
s/one/two/three/four/five/six/g

This regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); (non-greedy)
and this one: std::basic_regex< char > regex ( "s?/.+/g?" ); (greedy)
produce the same output with std::regex_match: both match the entire string!
With std::regex_search, however, the outputs differ.
Adding or removing s? or g? makes no difference, and with /.*?/ the match still covers the entire string!

More Detail

g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901

I don't see any inconsistency. regex_match has to match the whole string, so the lazy .+? in s?/.+?/g? keeps expanding until the rest of the pattern can reach the end of the string.

These "diagrams" (for regex_search) should help illustrate greediness. In each line, | marks how far the pattern (left) and the string (right) have been consumed:


a.*?a: ababa
a|.*?a: a|baba
a.*?|a: a|baba  # ok, let's try .*? == "" first
# can't go further, backtracking
a.*?|a: ab|aba  # let's try .*? == "b" now
a.*?a|: aba|ba
# If the regex were a.*?a$, there would be two extra backtracking
# steps such that .*? == "bab".


a.*a: ababa
a|.*a: a|baba
a.*|a: ababa|  # try .* == "baba" first
# backtrack
a.*|a: abab|a  # try .* == "bab" now
a.*a|: ababa|

And in this case, regex_match(abc) behaves like regex_search(^abc$).

