Matches and Groups in the Text To XML Preprocessor
For more on Text to XML, view the following: Text to XML Reference
Values from text files are transferred into XML format as Parameters. Initially, Parameters could only be read from Lines by starting column and number of characters. To handle widely varying column positions, as seen in Unicode-based text files, Matches and Groups were added to the Parameter selection process.
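The difference between the two selection methods can be sketched in a few lines of Python. This is only an illustration, not the preprocessor's own code: the function names `by_columns` and `by_match`, the sample lines, and the pattern are all hypothetical.

```python
import re

def by_columns(line, start, length):
    """Read a value by fixed starting column (0-based) and character count."""
    return line[start:start + length]

def by_match(line, pattern, match_index, group):
    """Read a value from the Nth regular-expression match and a chosen group."""
    matches = list(re.finditer(pattern, line))
    return matches[match_index].group(group)

# Two hypothetical lines whose second field starts at different columns.
aligned = "ACME" + " " * 6 + "1200"       # value starts at column 10
shifted = "ACME Ltd." + " " * 3 + "1200"  # value starts at column 12

print(repr(by_columns(aligned, 10, 4)))   # the value, because the column is right
print(repr(by_columns(shifted, 10, 4)))   # two spaces plus part of the value

# A pattern anchored on "two or more spaces, then non-spaces" finds the
# value in both lines regardless of the column it starts in.
pattern = r"\s{2,}(\S+)"
print(by_match(aligned, pattern, 0, 1))
print(by_match(shifted, pattern, 0, 1))
```

The column-based read only works when every line places the value in the same position, while the match-based read follows the value wherever it appears.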
After a line is matched while in a State, the Parameters of that line are found. If the Matches and Groups method is used, a regular expression is applied repeatedly to the line until no more matches are found. Each occurrence of the regular expression in the line is termed a “Match”.
The regular expression used for the match can include one or more “Groups”. Each group is delimited by parentheses. Groups are numbered starting with one. Group zero is the complete Match.
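As a rough sketch of this numbering, the Python snippet below applies a pattern repeatedly to a line and lists every Match with its groups. It uses Python's `re` module purely for illustration; the helper names `find_matches` and `dump`, the sample pattern, and the sample line are hypothetical and are not part of the preprocessor.

```python
import re

def find_matches(pattern, line):
    """Return one list per Match: [group 0, group 1, ..., group N]."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, line)]

def dump(pattern, line):
    """Print each Match and its groups as 'match:group{value}'."""
    for i, groups in enumerate(find_matches(pattern, line)):
        print(f"match {i} with {len(groups) - 1} groups {{{groups[0]}}}")
        for j, value in enumerate(groups):
            # A group that did not take part in the Match has no value.
            shown = "null" if value is None else value
            print(f"{i}:{j}{{{shown}}}")

# Two parenthesized groups: a name and a number.
dump(r"(\w+)=(\d+)", "width=80 height=24")
```

Here the line yields two Matches. In each, group zero is the whole `name=number` text, group one is the name, and group two is the number.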
Examples
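The following expression defines two groups: an outer group (group one) and a group nested inside it (group two).
\s{3,}(\S{1,}(\s{1,2}\S{1,}){0,})
Groups zero, one, and two may all be selected. Group zero includes the leading spaces, of which there are at least three. Group one starts after all leading spaces and continues through any number of runs of non-spaces separated by one or two spaces. Group two holds the last repetition of the nested group, or null if the nested group did not match.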
Input: “one two three four five”
Regex: “\s{3,}(\S{1,}(\s{1,2}\S{1,}){0,})”
match 0 with 2 groups { one two three}
0:0{ one two three}
0:1{one two three}
0:2{ three}
match 1 with 2 groups { four}
1:0{ four}
1:1{four}
1:2{null}
match 2 with 2 groups { five}
2:0{ five}
2:1{five}
2:2{null}
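The listing above can be approximated with Python's `re` module. This is a sketch under assumptions: the exact spacing of the input line is not recoverable from the listing, so the line below is a hypothetical one with three spaces before “one”, “four”, and “five”, and Python reports an unmatched group as `None` rather than null.

```python
import re

# Hypothetical input: three spaces before "one", "four" and "five",
# single spaces inside "one two three".
line = " " * 3 + "one two three" + " " * 3 + "four" + " " * 3 + "five"

pattern = re.compile(r"\s{3,}(\S{1,}(\s{1,2}\S{1,}){0,})")

# Collect group zero (whole Match), group one (outer) and group two (nested).
results = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(line)]

for index, (whole, outer, nested) in enumerate(results):
    print(index, repr(whole), repr(outer), repr(nested))
```

With this input there are three Matches. Only the first has a value in group two, because the nested group repeats there and keeps its last repetition; in the other two Matches it never matches and is reported as `None`.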
Regex: “(\s*\S*)”
match 0 with 1 groups { one}
0:0{ one}
0:1{ one}
match 1 with 1 groups { two}
1:0{ two}
1:1{ two}
match 2 with 1 groups { three}
2:0{ three}
2:1{ three}
match 3 with 1 groups { four}
3:0{ four}
3:1{ four}
match 4 with 1 groups { five}
4:0{ five}
4:1{ five}
match 5 with 1 groups {}
5:0{}
5:1{}
Regex: “(\s*\S*)(\s*\S*)(\s*\S*)”
match 0 with 3 groups { one two three}
0:0{ one two three}
0:1{ one}
0:2{ two}
0:3{ three}
match 1 with 3 groups { four five}
1:0{ four five}
1:1{ four}
1:2{ five}
1:3{}
match 2 with 3 groups {}
2:0{}
2:1{}
2:2{}
2:3{}
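The empty final Match in the last two listings is what a pattern produces when every part of it is optional. The Python sketch below shows the effect, assuming the quantifiers in those patterns are `*` (zero or more) and using a hypothetical input line with three spaces before “one”, “four”, and “five”. The helper name `group_rows` is also hypothetical.

```python
import re

# Hypothetical input: three spaces before "one", "four" and "five".
line = " " * 3 + "one two three" + " " * 3 + "four" + " " * 3 + "five"

def group_rows(pattern, text):
    """Return one list per Match: [group 0, group 1, ..., group N]."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, text)]

single = group_rows(r"(\s*\S*)", line)
triple = group_rows(r"(\s*\S*)(\s*\S*)(\s*\S*)", line)

# Because every part of the pattern may match nothing, one zero-length
# Match is found at the end of the line. Filter it out if it is not wanted.
non_empty = [row for row in single if row[0] != ""]

print(len(single), len(non_empty))
for row in triple:
    print([repr(value) for value in row])
```

In this sketch the single-group pattern yields six Matches, the last of them empty, and the three-group pattern yields three Matches, with group three empty in the second one because the line runs out after “five”.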