MATCHES Predicate
The MATCHES predicate can perform complex pattern matches on text columns, including "group" or "count" matches. Group matches search for alternative groups of characters in the column, while count matches search for a specified number of occurrences of a match pattern.
Following is the syntax for the MATCHES predicate:
…WHERE MATCHES(<column>,'<group_match> | <count_match>')
The column can be a regular or delimited identifier, and must be single-valued. That is, the column cannot be a multi-valued column. The column is also limited to the properties in the property store.
You specify the text that the MATCHES predicate searches for by using a string literal enclosed in quotation marks. The MATCHES predicate can perform either group matching or count matching. However, you cannot mix the two matching styles.
The following table describes the two matching styles.
Matching style | Description |
---|---|
Group matching | Consists of search patterns that are matched zero or more times. Also supports defining alternative patterns. |
Count matching | Consists of search patterns that are matched a specified number of times. |
Escaping Special Characters
In the matches string, all the characters with special meaning must be "escaped" from their simple character meaning. The vertical bar (|) character is the escape character, and must appear immediately before the special character. The special characters that must be escaped are the asterisk (*), the question mark (?), the plus sign (+), the right and left parentheses (( and )), the right and left braces ({ and }), and the right and left square brackets ([ and ]). When a comma (,) is used inside a pattern to delimit alternative patterns, the vertical bar must escape the comma.
Creating Search Patterns
A search pattern is the basic unit of text that the MATCHES predicate searches for in the specified column. When wildcards are in the MATCHES predicate, they apply to the most recently defined pattern.
For example, the plus sign is a wildcard that matches one or more of the preceding patterns. The following pattern matches values that contain one or more "M" characters.
'M|+'
If more than one character appears before the wildcard, only the last character is affected by the wildcard. The following example matches values that begin with "softwar" followed by one or more "e" characters.
'software|+'
To increase the size of a pattern that applies to a wildcard beyond one character, enclose the pattern in escaped parentheses. For example, the following string matches columns that have one or more sequential occurrences of "Comp."
'|(Comp|)|+'
This would match "Comp", "CompComp", "CompCompComp", and other similar permutations.
You can indicate alternatives inside a pattern by separating them with an escaped comma. For example, the following string matches columns that have one or more sequential occurrences of "Computer" or "Software".
'|(Computer|,Software|)|+'
It would match "Computer", "ComputerComputer", "ComputerSoftware", "SoftwareComputer", "SoftwareSoftware", "SoftwareComputerComputerSoftwareSoftware", and so on.
In addition to defining sequences of characters to include, a pattern can also define a range or set of characters to include or exclude from the pattern. Place the desired characters inside escaped square brackets ([ and ]). To exclude a range or set of characters, place a caret (^) as the first character inside the escaped square brackets. The caret does not need to be escaped.
For example, the following string matches columns containing the letter c, followed by a vowel (a, e, i, o, or u), followed by the letter p.
'c|[aeiou|]p'
The string matches "cap", "cep", "cip", "cop", or "cup".
Similarly, the following string matches columns that have one or more of the letters in the specified range, such as "ca", "cb", "cc", "caa", "cab", "cac", "cba", "cbbbbcccaaaa", and so on.
'c|[a-c|]|+'
Using Group Matching
A group match search string consists of one or more patterns and escaped wildcards. The following table shows and describes the defined group match wildcard characters.
Character | Description |
---|---|
* | Asterisk matches zero or more occurrences of the previous search pattern. |
? | Question mark matches zero or one occurrence of the previous search pattern. |
+ | Plus sign matches one or more occurrences of the previous search pattern. |
(...) | Parentheses delimit the search pattern if the pattern consists of more than one character. They also surround a set of alternative patterns. |
{...} | Braces delimit the count parameters in a count-matching string. |
[...] | Square brackets delimit a range or set of characters in a group-matching string. |
, | Comma separates alternative patterns inside a group-matching pattern. |
The MATCHES predicate can have more than one group-matching, pattern-wildcard pair, but you cannot mix group matching with count matching.
The following example matches columns that have the word "computer" followed by zero or more occurrences of the number "75".
'computer|(75|)|*'
It matches "computer", "computer75", "computer7575", and so on.
The following example matches columns that have the word "computer" followed by zero or one "s". The pattern matches "computer" or "computers".
'computers|?'
The following example matches columns that have the word "soft" and one or more occurrences of "ware".
'soft|(ware|)|+'
The pattern matches "software", "softwareware", "softwarewareware", and so on.
The following example matches columns that have one or more sequential occurrences of "Computer" or "Software".
'|(Computer|,Software|)|+'
It matches "Computer", "ComputerComputer", "ComputerSoftware", "SoftwareComputer", "SoftwareSoftware", "SoftwareComputerComputerSoftwareSoftware", and so on.
Using Count Matching
The MATCHES predicate count matching style supports matching the search pattern a specific number of times, or a range of times. Braces ({…}) are used to indicate a count match.
A count match string consists of one or more patterns and count specifiers. You can specify a required number of matches by including the required number in the braces. The MATCHES predicate can match counts from 0 to 256.
The following example matches values that contain the pattern "software" exactly twice. That is, it matches "softwaresoftware". It does not match "software" or "softwaresoftwaresoftware".
'|(software|)|{2|}'
To specify that the pattern must be found at least the specified number of times, follow the count with a comma. The following example matches "computercomputer", "computercomputercomputer", and other similar permutations.
'|(computer|)|{2,|}'
Note Inside a pattern, the comma must be escaped to indicate it is separating two alternatives. The comma should not be escaped when it appears in a count specifier.
To specify that the pattern must be matched within a range of occurrences, include the minimum and maximum count values in the braces, separated by a comma. The following example matches from 1 to 5 occurrences of the letter o in the word "computer". It matches "computer", "coomputer", "cooomputer", "coooomputer", and "cooooomputer".
'co|{1,5|}mputer'
Note If the pattern is not surrounded by escaped parentheses, the wildcard and count specifier apply to the immediately preceding single character.