split string with regex using a release character and separators
I need to parse an EDI file, where the separators are + , : and ' signs and the escape (release) character is ? . You first split into segments
var data = "NAD+UC+ABC2378::92++XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71+Duzce+Seferihisar / IZMIR++35460+TR";
var segments = data.Split('\'');
then each segment is split into segment data elements by + , then segment data elements are split into component data elements via : .
var dataElements = segments[0].Split('+');
The sample string above is not parsed correctly because of the release character. I have special code dealing with this, but I think this should all be doable using
Regex.Split(data, separator);
I am not familiar with regexes and could not find a way to do this so far. The best I have come up with so far is
string[] lines = Regex.Split(data, @"[^?]\+");
which omits the character before + sign.
NA
U
ABC2378::9
+XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 7
Duzc
Seferihisar / IZMI
+3546
TR
Correct Result Should be:
NAD
UC
ABC2378::92
XYZ Corp.:Tel ?: ?+90 555 555 11 11:Mobile1?: ?+90 555 555 22 22:Mobile2?: ?+90 555 555 41 71
Duzce
Seferihisar / IZMIR
35460
TR
So the question is: is this doable with Regex.Split, and what should the regex separator look like?
I can see that you want to split around plus signs + only if they are not preceded (escaped) by a question mark ? . This can be done using the following:
(?<!\?)\++
This matches one or more + signs if they are not preceded by a question mark ? .
Edit: The problem with the previous expression is that it doesn't handle situations like ??+ or ???+ or ????+ , in other words it doesn't handle situations where ? s are used to escape themselves.
We can solve this problem by noticing that if there is an odd number of ? preceding a + then the last one is definitely escaping the + so we must not split, but if there is an even number of ? before a + then they cancel each other out in pairs, leaving the + unescaped, so we should split around it.
From the previous observation we should come up with an expression that matches a + only if it is preceded by an even number of question marks ? , and here it is:
(?<!(^|[^?])(\?\?)*\?)\+
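As a quick check of the odd/even rule, here is a minimal C# sketch. The class and method names (ReleaseAwareSplit, SplitElements) and the test string are made up for illustration. It also swaps the groups for non-capturing ones, a small variation on the expression above, so that Regex.Split has no captured text to add to the result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReleaseAwareSplit
{
    // Split on '+' unless it is preceded by an odd-length run of '?'.
    // The lookbehind describes "odd run": a non-'?' character (or the start),
    // then zero or more "??" pairs, then one final '?'.
    public static string[] SplitElements(string segment)
    {
        return Regex.Split(segment, @"(?<!(?:^|[^?])(?:\?\?)*\?)\+");
    }

    public static void Main()
    {
        // "A?+B": one '?' escapes the '+', so there is no split inside it.
        // "C??" : the two '?' form an escaped '?', so the '+' after it is a real separator.
        foreach (var part in SplitElements("A?+B+C??+D"))
        {
            Console.WriteLine(part);
        }
        // Prints:
        // A?+B
        // C??
        // D
    }
}
```

Note that this only splits; the release characters are still present in the pieces and would need to be removed in a separate step.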
string[] lines = Regex.Split(data, @"\+");
Would that meet the requirement?
Here is the edit for escaping the '?' before '+'.
string[] lines = Regex.Split(data, @"(?<!\?)[+]+");
The '+' at the end matches multiple consecutive occurrences of the separator '+' as one. If you want empty entries between consecutive separators instead, use:
string[] lines = Regex.Split(data, @"(?<!\?)[+]");
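The difference between the two patterns shows up on the tail of the sample data, which contains an empty element (the ++ before 35460). The sketch below uses hypothetical names (EmptyElementDemo, SplitCollapsing, SplitKeepingEmpty) and only illustrates that one point. Like the patterns above, it does not handle a ? that escapes another ?.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyElementDemo
{
    // A run of unescaped '+' counts as ONE separator, so empty elements disappear.
    public static string[] SplitCollapsing(string s) => Regex.Split(s, @"(?<!\?)[+]+");

    // Every unescaped '+' is a separator, so empty elements are kept as "".
    public static string[] SplitKeepingEmpty(string s) => Regex.Split(s, @"(?<!\?)[+]");

    public static void Main()
    {
        const string tail = "Seferihisar / IZMIR++35460+TR";

        Console.WriteLine(SplitCollapsing(tail).Length);   // 3
        Console.WriteLine(SplitKeepingEmpty(tail).Length); // 4, the second entry is ""
    }
}
```

If the position of a data element within the segment matters, which is usually the case for EDI, the second form is probably the safer choice because it preserves the element indexes.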
