Example: Scanning for HREFs

The following example searches an input string and prints out all the href="..." values and their locations in the string. It does this by constructing a compiled Regex object and then using a Match object to iterate through all the matches in the string.

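In this example, the metacharacter \s matches any white-space character, and \S matches any non-white-space character. The pattern accepts either a quoted value or an unquoted run of non-white-space characters, and captures whichever one matched into group 1.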

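[Visual Basic]
    Imports System
    Imports System.Text.RegularExpressions

    Sub DumpHrefs(inputString As String)
        Dim r As Regex
        Dim m As Match
    
        ' Build the pattern once; group 1 captures a quoted or unquoted value.
        r = New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
            RegexOptions.IgnoreCase Or RegexOptions.Compiled)
    
        ' Step through the matches, printing each value and its position.
        m = r.Match(inputString)
        While m.Success
            Console.WriteLine("Found href " & m.Groups(1).Value _
                & " at " & m.Groups(1).Index.ToString())
            m = m.NextMatch()
        End While
    End Sub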
[C#]
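    using System;
    using System.Text.RegularExpressions;

    void DumpHrefs(string inputString)
    {
        // Build the pattern once; group 1 captures a quoted or unquoted value.
        Regex r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
            RegexOptions.IgnoreCase | RegexOptions.Compiled);

        // Step through the matches, printing each value and its position.
        for (Match m = r.Match(inputString); m.Success; m = m.NextMatch())
        {
            Console.WriteLine("Found href " + m.Groups[1].Value + " at "
                + m.Groups[1].Index);
        }
    }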
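
To make the reported locations concrete, the following sketch applies the same pattern to a short sample string and returns the lines that DumpHrefs would print. The class name HrefScanDemo, the method FindHrefs, and the sample input are illustrative assumptions, not part of the original example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefScanDemo
{
    // Same pattern as DumpHrefs: group 1 holds the quoted or unquoted value.
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical helper: collects the text DumpHrefs would write to the console.
    public static List<string> FindHrefs(string input)
    {
        var lines = new List<string>();
        for (Match m = HrefPattern.Match(input); m.Success; m = m.NextMatch())
        {
            // Index is the zero-based offset of the captured value in the input.
            lines.Add("Found href " + m.Groups[1].Value + " at " + m.Groups[1].Index);
        }
        return lines;
    }
}
```

For the input `<a href="index.html">Home</a> <a HREF = about.html >About</a>`, this should yield "Found href index.html at 9" and "Found href about.html at 40". The second value is matched by the unquoted alternative, which stops at the first white-space character.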

Compiled Pattern

Before beginning the loop to search the string, this code example creates a Regex object for storing the compiled pattern. Because it takes some time to parse, optimize, and compile a regular expression, these tasks are done outside the loop so that they are not repeated.

Instances of the Regex class are immutable; each one corresponds to a single pattern and is stateless. This allows a single Regex instance to be shared by different functions or even by different threads.
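
As a rough illustration of sharing, the sketch below builds the pattern once in a static field and then uses that single instance from several threads through PLINQ. The names SharedRegexDemo, CountHrefs, and CountAll are hypothetical, and PLINQ is only one convenient way to get concurrent callers.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SharedRegexDemo
{
    // Constructed once; the instance holds no per-search state,
    // so every caller can use it without locking.
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Counts the href attributes in one input string.
    public static int CountHrefs(string input)
    {
        return HrefPattern.Matches(input).Count;
    }

    // Counts hrefs in many inputs in parallel, all through the same Regex.
    // AsOrdered keeps the results in the same order as the inputs.
    public static int[] CountAll(string[] documents)
    {
        return documents
            .AsParallel()
            .AsOrdered()
            .Select(doc => CountHrefs(doc))
            .ToArray();
    }
}
```

Each call produces its own MatchCollection, so the per-search results are never shared between threads even though the Regex is.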

Match Result Class

The results of a search are returned in a Match object, which provides access to all the substrings extracted by the search. It also remembers the string being searched and the regular expression being used, so that a call to NextMatch can perform another search starting where the last one ended.
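
The following sketch shows what a Match exposes for each hit: the whole matched text with its position and length, and the captured group with its own position. MatchInfoDemo and Describe are hypothetical names chosen for this illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchInfoDemo
{
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: one descriptive line per match in the input.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        Match m = HrefPattern.Match(input);
        while (m.Success)
        {
            // m.Value is the entire matched text; Groups[1] is only the captured value.
            lines.Add("match '" + m.Value + "' at " + m.Index + " length " + m.Length
                + "; group 1 '" + m.Groups[1].Value + "' at " + m.Groups[1].Index);

            // NextMatch returns a new Match that resumes after the current one,
            // using the input string and pattern the current Match remembers.
            m = m.NextMatch();
        }
        return lines;
    }
}
```

For the input `<a href="a.html">`, the single line produced should describe the match `href="a.html"` at index 3 with length 13, and group 1 `a.html` at index 9.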

Explicitly Named Captures

In traditional regular expressions, capturing parentheses are automatically numbered sequentially. This leads to two problems. First, if a regular expression is modified by inserting or removing a set of parentheses, all code that refers to the numbered captures must be rewritten to reflect the new numbering. Second, because different sets of parentheses are often used to provide two alternative expressions for an acceptable match, it can be difficult to determine which of the two expressions actually returned a result.

To address these problems, Regex supports the syntax (?<name>...) for capturing a match into a specified slot. The slot can be named using a string or an integer; integers can be recalled more quickly. Alternative matches for the same string can therefore all be directed to the same slot. If more than one match is written to a slot, the last one is the value the slot reports. (However, a complete list of the matches for a single slot is available. See the Group.Captures collection for details.)
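
A minimal sketch of both behaviors follows. The slot names url and digit, and the class and method names, are assumptions made for this illustration. ExtractUrl directs both alternatives into one string-named slot, and the two digit methods contrast the slot's reported value with its full Group.Captures list.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedCaptureDemo
{
    // Both alternatives write to the slot named "url", so the caller
    // does not need to know which alternative matched.
    public static string ExtractUrl(string input)
    {
        Match m = Regex.Match(input,
            "href\\s*=\\s*(?:\"(?<url>[^\"]*)\"|(?<url>\\S+))",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["url"].Value : null;
    }

    // The group is repeated, so it captures several times;
    // Value reports only the last capture written to the slot.
    public static string LastDigit(string input)
    {
        return Regex.Match(input, @"(?<digit>\d)+").Groups["digit"].Value;
    }

    // Captures lists every capture written to the slot, in order.
    public static List<string> AllDigits(string input)
    {
        var digits = new List<string>();
        Match m = Regex.Match(input, @"(?<digit>\d)+");
        foreach (Capture c in m.Groups["digit"].Captures)
        {
            digits.Add(c.Value);
        }
        return digits;
    }
}
```

With the input "x123", LastDigit should return "3" while AllDigits returns "1", "2", and "3".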

See Also

.NET Framework Regular Expressions