Match comma separated list with Ruby Regex

Given the following string, I'd like to match the elements of the list and parts of the rest after the colon:

foo,bar,baz:something

That is, I expect the first three match groups to be "foo", "bar", and "baz", with no commas and no colon. The list has at least one element and can have arbitrarily many. Assume no whitespace and lower case only.

I've tried this, which should work, but doesn't populate all the match groups for some reason:

^([a-z]+)(?:,([a-z]+))*:(something)

That matches foo in 1 and baz (or whatever the last element is) in 2. I don't understand why I don't get a match group for bar.
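
A minimal reproduction, as a sketch: it assumes the pattern above and the sample string from the top of the question, and the variable names are only for illustration.

```ruby
# Repeated capture group: group 2 sits inside (?: ... )*, so each iteration
# overwrites the previous value and only the last one is kept.
pattern = /^([a-z]+)(?:,([a-z]+))*:(something)/
match = pattern.match('foo,bar,baz:something')

captures = match.captures
p captures  # "bar" is missing from the captures
```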

Any ideas?

EDIT: Ruby 1.9.3, if that matters.

EDIT2: Rubular link: http://rubular.com/r/pDhByoarbA

EDIT3: Add colon to the end, because I am not just trying to match the list. Sorry, oversimplified the problem.


This expression works for me: /(\w+)/i
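
A rough sketch of how that expression might be used, assuming it is applied with String#scan rather than a single match. The capturing group is dropped here so that scan returns a flat array, and the variable names are made up for illustration. Note that \w also matches digits and underscores, which is broader than the lower-case-only input described in the question.

```ruby
input = 'foo,bar,baz:something'

# scan returns every non-overlapping match, left to right.
words = input.scan(/\w+/)

# Everything but the last word is the list; the last word follows the colon.
*items, rest = words

p items
p rest
```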


Maybe split would be a better fit for this case?

'foo,bar,baz'.split(',')
=> ["foo", "bar", "baz"]

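For the full string from the question, which also has a part after the colon, one possible sketch is to split on the colon first and then on the commas. This assumes the input contains exactly one colon; the variable names are only illustrative.

```ruby
input = 'foo,bar,baz:something'

# Split into at most two pieces: the list and whatever follows the colon.
list, rest = input.split(':', 2)

# Then split the list part on commas.
items = list.split(',')

p items
p rest
```
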
If you want to do it with regex, how about this?

(?<=^|,)("[^"]*"|[^,]*)(?=,|$)

This matches comma-separated fields, including fields where commas appear inside quoted strings, as in 123,"Yes, No".

More verbosely:

(?<=^|,)       # Must be preceded by start-of-line or comma
(
    "[^"]*"|   # A quote, followed by a bunch of non-quotes, followed by quote, OR
    [^,]*      # OR anything until the next comma
)
(?=,|$)        # Must end with comma or end-of-line

Usage would be with something like Python's re.findall() , which returns all non-overlapping matches in the string (working from left to right, if that matters.) Don't use it with your equivalent of re.search() or re.match() which only return the first match found.

(NOTE: This actually doesn't work in Python because the lookbehind (?<=^|,) isn't fixed width. Grr. Open to suggestions on this one.)


Edit: Use a non-capturing group to consume start-of-line or comma, instead of a lookbehind, and it works in Python.

>>> test_str = '123,456,"String","String, with, commas","Zero-width fields next",,"",nyet,123'
>>> m = re.findall('(?:^|,)("[^"]*"|[^,]*)(?=,|$)',test_str)
>>> m
['123', '456', '"String"', '"String, with, commas"',
 '"Zero-width fields next"', '', '""', 'nyet', '123']

Edit 2: The Ruby equivalent of Python's re.findall(needle, haystack) is haystack.scan(needle).
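
As a sketch of that equivalence, here is the Python example above translated to Ruby, using the same test string and the non-capturing-group version of the pattern. Because the pattern contains one capture group, scan returns an array of one-element arrays, so the result is flattened; the variable names are just for illustration.

```ruby
test_str = '123,456,"String","String, with, commas","Zero-width fields next",,"",nyet,123'

# (?:^|,) consumes start-of-line or a comma; the group captures the field.
field_pattern = /(?:^|,)("[^"]*"|[^,]*)(?=,|$)/

# One capture group means scan yields [[field], [field], ...]; flatten it.
fields = test_str.scan(field_pattern).flatten

p fields
```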
