            Klarke's C/C++ Home

            The regexp Command

            The regexp command provides direct access to the regular expression matcher. Not only does it tell you whether a string matches a pattern, it can also extract one or more matching substrings. The return value is 1 if some part of the string matches the pattern; it is 0 otherwise. Its syntax is:

            regexp ?flags? pattern string ?match sub1 sub2...?
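The return value is 1 or 0, so regexp can serve directly as the condition of an if command. The following is a minimal sketch, assuming Tcl 8.x. The answer string is a made-up input.

```tcl
# regexp returns 1 if some part of the string matches, 0 otherwise,
# so its result can be used directly as a condition.
set answer "Yes, please"
if {[regexp {^[yY]} $answer]} {
    puts "affirmative: $answer"
} else {
    puts "not affirmative: $answer"
}
```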
            

            The flags are described in Table 11-6:

            Table 11-6. Options to the regexp command

            -nocase

            Lowercase characters in pattern can match either lowercase or uppercase letters in string.

            -indices

            Each match variable contains a pair of numbers: the indices of the first and last characters of the match within string. Without -indices, the matching string itself is copied into the match variables.

            -expanded

            The pattern uses the expanded syntax discussed on page 154.

            -line

            The same as specifying both -lineanchor and -linestop.

            -lineanchor

            Change the behavior of ^ and $ so they are line-oriented as discussed on page 153.

            -linestop

            Change matching so that . and character classes do not match newlines as discussed on page 153.

            -about

            Useful for debugging. It returns information about the pattern instead of trying to match it against the input.

            --

            Signals the end of the options. You must use this if your pattern begins with -.
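The sketch below exercises several of the flags in Table 11-6 on tiny made-up inputs. It assumes Tcl 8.1 or later, where -line, -lineanchor, and -linestop are available.

```tcl
# -nocase: the lowercase pattern also matches uppercase text.
puts [regexp -nocase {yes} "YES please"]

# -indices: the match variable gets the indices of the first and last
# matched characters instead of the matched text.
regexp -indices {b+} "abbbc" span
puts "b+ spans $span"

# -line (-lineanchor plus -linestop): ^ may also match just after a newline.
puts [regexp {^b} "a\nb"]
puts [regexp -line {^b} "a\nb"]

# -linestop: by default . matches a newline; with -linestop it does not.
puts [regexp {a.b} "a\nb"]
puts [regexp -linestop {a.b} "a\nb"]

# --: needed because this pattern starts with a dash; without it,
# regexp would try to parse -v as a flag and raise an error.
puts [regexp -- {-v} "prog -v"]
```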

            The pattern argument is a regular expression as described earlier. If string matches pattern, then regexp stores the results of the match in the variables provided. These match variables are optional. If present, match is set to the part of the string that matched the pattern. The remaining variables are set to the substrings of string that matched the corresponding subpatterns in pattern. The correspondence is based on the order of left parentheses in the pattern to avoid ambiguities that can arise from nested subpatterns.
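Here is a small sketch of the left-parenthesis rule applied to nested groups. The pattern and the string are made up for illustration.

```tcl
# Match variables are assigned in the order of the left parentheses:
# the outer group ((a+)(b+)) comes first, then (a+), then (b+).
regexp {((a+)(b+))c} aabbc match outer aPart bPart
puts "match=$match outer=$outer aPart=$aPart bPart=$bPart"
```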

            Example 11-2 uses regexp to pick the hostname out of the DISPLAY environment variable, which has the form:

            hostname:display.screen
            
            Example 11-2 Using regular expressions to parse a string
            set env(DISPLAY) sage:0.1
            regexp {([^:]*):} $env(DISPLAY) match host
            => 1
            set match
            => sage:
            set host
            => sage
            

            The pattern involves a complementary set, [^:], to match anything except a colon. It uses repetition, *, to repeat that zero or more times. It groups that part into a subexpression with parentheses. The literal colon ensures that the DISPLAY value matches the format we expect. The part of the string that matches the complete pattern is stored into the match variable. The part that matches the subpattern is stored into host. The whole pattern has been grouped with braces to quote the square brackets. Without braces it would be:

            regexp (\[^:\]*): $env(DISPLAY) match host
            

            With advanced regular expressions the nongreedy quantifier *? can replace the complementary set:

            regexp (.*?): $env(DISPLAY) match host
            

            This is quite a powerful statement, and it is efficient. If we had only had the string command to work with, we would have needed to resort to the following, which takes roughly twice as long to interpret:

            set i [string first : $env(DISPLAY)]
            if {$i >= 0} {
                set host [string range $env(DISPLAY) 0 [expr {$i - 1}]]
            }
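One way to compare the three DISPLAY-parsing approaches above is to wrap each one in a proc and measure it with Tcl's built-in time command. This is only a sketch. The proc names hostByRegexp, hostByNongreedy, and hostByString are hypothetical. Measured timings depend on the Tcl version and the machine, so they may not reproduce the roughly twofold difference described above.

```tcl
# Hypothetical wrappers (not from the original text) around the
# three approaches, so they can be checked and timed side by side.
proc hostByRegexp {display} {
    if {[regexp {([^:]*):} $display match host]} {
        return $host
    }
    return ""
}
proc hostByNongreedy {display} {
    if {[regexp {(.*?):} $display match host]} {
        return $host
    }
    return ""
}
proc hostByString {display} {
    set i [string first : $display]
    if {$i >= 0} {
        return [string range $display 0 [expr {$i - 1}]]
    }
    return ""
}

foreach p {hostByRegexp hostByNongreedy hostByString} {
    # time runs the script 10000 times and reports microseconds per iteration
    puts "$p: [$p sage:0.1] ([time {$p sage:0.1} 10000])"
}
```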
            

            A Pattern to Match URLs

            Example 11-3 demonstrates a pattern with several subpatterns that extract the different parts of a URL. There are lots of subpatterns, and you can determine which match variable is associated with which subpattern by counting the left parentheses. The pattern is discussed in more detail after the example:

            Example 11-3 A pattern to match URLs
            set url http://www.beedub.com:80/index.html
            regexp {([^:]+)://([^:/]+)(:([0-9]+))?(/.*)} $url \
            match protocol server x port path
            => 1
            set match
            => http://www.beedub.com:80/index.html
            set protocol
            => http
            set server
            => www.beedub.com
            set x
            => :80
            set port
            => 80
            set path
            => /index.html
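When the optional port subpattern of Example 11-3 does not take part in the match, Tcl still sets its match variables. Normally they get the empty string, and with -indices they get {-1 -1}. The sketch below uses a port-less variant of the URL, made up for illustration.

```tcl
set url http://www.beedub.com/index.html
regexp {([^:]+)://([^:/]+)(:([0-9]+))?(/.*)} $url \
    match protocol server x port path
puts "server=$server x='$x' port='$port' path=$path"

# With -indices, a subpattern that did not participate reports {-1 -1}.
regexp -indices {([^:]+)://([^:/]+)(:([0-9]+))?(/.*)} $url \
    imatch iprotocol iserver ix iport ipath
puts "port indices: $iport"
```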
            

            Let's look at the pattern one piece at a time. The first part looks for the protocol, which is separated by a colon from the rest of the URL. The first part of the pattern is one or more characters that are not a colon, followed by a colon. This matches the http: part of the URL:

            [^:]+:
            

            Using the nongreedy +? quantifier, you could also write that as:

            .+?:
            

            The next part of the pattern looks for the server name, which comes after two slashes. The server name is followed either by a colon and a port number, or by a slash. The pattern uses a complementary set that specifies one or more characters that are not a colon or a slash. This matches the //www.beedub.com part of the URL:

            //[^:/]+
            

            The port number is optional, so a subpattern is delimited with parentheses and followed by a question mark. An additional set of parentheses is added to capture the port number without the leading colon. This matches the :80 part of the URL:

            (:([0-9]+))?
            

            The last part of the pattern is everything else, starting with a slash. This matches the /index.html part of the URL:

            /.*
            

            Use subpatterns to parse strings.


            To make this pattern really useful, we delimit several subpatterns with parentheses:

            ([^:]+)://([^:/]+)(:([0-9]+))?(/.*)
            

            These parentheses do not change the way the pattern matches. Only the optional port number really needs the parentheses in this example. However, the regexp command gives us access to the strings that match these subpatterns. In one step regexp can test for a valid URL and divide it into the protocol part, the server, the port, and the trailing path.
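For a rough validity check, the same pattern can be wrapped in a proc that returns regexp's 0 or 1 result. The proc name looksLikeUrl and the test strings are hypothetical. The pattern is not anchored with ^ and $, so it also accepts strings with extra text before the protocol. Treat it as a loose check, not a complete URL validator.

```tcl
# Hypothetical helper (not from the original text): returns 1 if the
# string contains the protocol://server[:port]/path shape, 0 otherwise.
proc looksLikeUrl {s} {
    regexp {([^:]+)://([^:/]+)(:([0-9]+))?(/.*)} $s
}

foreach s {
    http://www.beedub.com:80/index.html
    ftp://ftp.example.com/pub/
    www.beedub.com/index.html
    http://www.beedub.com
} {
    puts "[looksLikeUrl $s] $s"
}
```

The last two strings fail: one has no protocol, and the other has no path starting with a slash.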

            The parentheses around the port number include the : before the digits. We've used a dummy variable that gets the : and the port number, and another match variable that just gets the port number. By using noncapturing parentheses in advanced regular expressions, we can eliminate the unused match variable. We can also replace both complementary character sets with a nongreedy .+? match. Example 11-4 shows this variation:

            Example 11-4 An advanced regular expression to match URLs
            set url http://www.beedub.com:80/book/
            regexp {(.+?)://(.+?)(?::([0-9]+))?(/.*)$} $url \
            match protocol server port path
            => 1
            set match
            => http://www.beedub.com:80/book/
            set protocol
            => http
            set server
            => www.beedub.com
            set port
            => 80
            set path
            => /book/
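The port group in Example 11-4 uses noncapturing (?:...) parentheses, so only four match variables follow match. The sketch below applies the pattern to a made-up URL with no port and shows that port is then set to the empty string.

```tcl
set url http://www.beedub.com/book/
regexp {(.+?)://(.+?)(?::([0-9]+))?(/.*)$} $url \
    match protocol server port path
puts "protocol=$protocol server=$server port='$port' path=$path"
```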
            

            Bugs When Mixing Greedy and Non-Greedy Quantifiers

            If a regular expression pattern uses both greedy and non-greedy quantifiers, you can quickly run into trouble. In complex cases there can be ambiguous ways to resolve the quantifiers. What tends to happen in practice is that Tcl makes either all the quantifiers greedy or all of them non-greedy: the pattern as a whole takes its preference for the longest or shortest match from the first quantifier that has one. Example 11-4 has a $ at the end to force the last greedy term to go to the end of the string. In theory, the greediness of the last subpattern should match all the characters out to the end of the string. In practice, the first quantifier in Example 11-4 is the non-greedy .+?, so Tcl treats all the quantifiers as non-greedy, and the anchor is necessary to force the pattern to match to the end of the string.
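The following sketch shows the effect with a made-up pattern rather than the URL example. The first quantifier, .*?, is non-greedy, so the whole pattern prefers the shortest match, and the greedy [0-9]+ gives up after one digit. Anchoring the end with $, as Example 11-4 does, forces the match out to the end of the string. The sketch assumes Tcl 8.1 or later, where *? is available.

```tcl
# The first quantifier (.*?) is non-greedy, so the whole pattern prefers
# the shortest overall match, even though [0-9]+ on its own is greedy.
regexp {.*?([0-9]+)} abc123 match digits
puts "unanchored: match=$match digits=$digits"

# Anchoring with $ forces the match to run to the end of the string.
regexp {.*?([0-9]+)$} abc123 match digits
puts "anchored:   match=$match digits=$digits"
```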

            Sample Regular Expressions

            The table in this section lists regular expressions as you would use them in Tcl commands. Most are quoted with curly braces to turn off the special meaning of square brackets and dollar signs. Other patterns are grouped with double quotes and use backslash quoting because the patterns include backslash sequences like \n and \t. In Tcl 8.0 and earlier, these must be substituted by Tcl before the regexp command is called. In these cases, the equivalent advanced regular expression is also shown.

            Table 11-7. Sample regular expressions

            {^[yY]}

            Begins with y or Y, as in a Yes answer.

            {^(yes|YES|Yes)$}

            Exactly "yes", "Yes", or "YES".

            "^\[^ \t:\]+:"

            Begins with colon-delimited field that has no spaces or tabs.

            {^\S+?:}

            Same as above, using \S for "not space".

            "^\[ \t]*$"

            A string of all spaces or tabs.

            {(?n)^\s*$}

            A blank line using newline sensitive mode.

            "(\n|^)\[ \t\]*(\n|$)"

            A blank line, the hard way.

            {^[A-Za-z]+$}

            Only letters.

            {^[[:alpha:]]+$}

            Only letters, the Unicode way.

            {[A-Za-z0-9_]+}

            Letters, digits, and the underscore.

            {\w+}

            Letters, digits, and the underscore using \w.

            {[][${}\\]}

            The set of Tcl special characters: ] [ $ { } \

            "\[^\n\]*\n"

            Everything up to a newline.

            {.*?\n}

            Everything up to a newline using nongreedy *?

            {\.}

            A period.

            {[][$^?+*()|\\]}

            The set of regular expression special characters: ] [ $ ^ ? + * ( ) | \

            <H1>(.*?)</H1>

            An H1 HTML tag. The subpattern matches the string between the tags.

            <!--.*?-->

            HTML comments.

            {[0-9a-fA-F][0-9a-fA-F]}

            2 hex digits.

            {[[:xdigit:]]{2}}

            2 hex digits, using advanced regular expressions.

            {\d{1,3}}

            1 to 3 digits, using advanced regular expressions.
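The sketch below spot-checks a few patterns from Table 11-7 against sample strings. The inputs are made up for illustration.

```tcl
puts [regexp {^[yY]} "Yes"]                  ;# begins with y or Y
puts [regexp {^(yes|YES|Yes)$} "yEs"]        ;# not one of the three accepted spellings
puts [regexp {^[[:alpha:]]+$} "abc1"]        ;# the digit rules out "only letters"
regexp {\w+} "foo_bar9" word                 ;# \w covers letters, digits, underscore
puts $word
regexp {[[:xdigit:]]{2}} "0x1F" hex          ;# first run of two hex digits
puts $hex
# The nongreedy (.*?) stops at the first closing tag.
regexp {<H1>(.*?)</H1>} "<H1>One</H1><H1>Two</H1>" match title
puts $title
```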

            posted on 2010-09-26 17:22 Klarke