
不會飛的鳥

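December 10, 2010 ... Never mind them!!! I am going to build my own search engine on a distributed file system, a distributed scheduling system and a distributed retrieval system that I develop myself!!! Big fish have big ambitions!!! ---楊書童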

A simple regular expression processing module ([Repost] Fast-regular-expressions)

Original source: http://www.codeproject.com/Articles/798/Fast-regular-expressions

Fast regular expressions

By , 29 Oct 2000
 

Sample Image - RexSearch.jpg

Introduction

Regular expressions are a well recognized way for describing string patterns. The following regular expression defines a floating point number with a (possibly empty) integer part, a non empty fractional part and an optional exponent:

[0-9]* \.[0-9]+ ([Ee](\+|-)?[0-9]+)?

The rules for interpreting and constructing such regular expressions are explained below. A regular expression parser takes a regular expression and a source string as arguments and returns the source position of the first match. Regular expression parsers either interpret the search pattern at runtime or compile the regular expression into an efficient internal form (known as a deterministic finite automaton). The regular expression parser described here belongs to the second category. Besides being quite fast, it also supports dictionaries of regular expressions. With the definitions $Int= [0-9], $Frac= \.[0-9]+ and $Exp= ([Ee](\+|-)?[0-9]+), the above regular expression for a floating point number can be abbreviated to $Int* $Frac $Exp?.
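To make the idea of a compiled form concrete, here is a minimal hand-written sketch of a deterministic finite automaton for the floating point pattern above. It is not the code from RexAlgorithm.cpp; the function name matchFloatHere and the state names are invented for this illustration.

```cpp
#include <cctype>

// Hypothetical helper: returns the length of the longest prefix of `s` that
// matches  [0-9]* \.[0-9]+ ([Ee](\+|-)?[0-9]+)?  or -1 if no prefix matches.
int matchFloatHere(const char* s) {
    enum State { Int, Dot, Frac, ExpMark, ExpSign, ExpDigits, Dead };
    State state = Int;
    int lastAccept = -1;  // length of the longest match seen so far
    for (int i = 0; s[i] != '\0' && state != Dead; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        const bool digit = std::isdigit(c) != 0;
        switch (state) {
            case Int:       state = digit ? Int : (c == '.' ? Dot : Dead); break;
            case Dot:       state = digit ? Frac : Dead; break;
            case Frac:      state = digit ? Frac
                                  : ((c == 'e' || c == 'E') ? ExpMark : Dead); break;
            case ExpMark:   state = digit ? ExpDigits
                                  : ((c == '+' || c == '-') ? ExpSign : Dead); break;
            case ExpSign:   state = digit ? ExpDigits : Dead; break;
            case ExpDigits: state = digit ? ExpDigits : Dead; break;
            case Dead:      break;
        }
        // Frac and ExpDigits are the two accepting states.
        if (state == Frac || state == ExpDigits) lastAccept = i + 1;
    }
    return lastAccept;
}
```

Each input character causes exactly one state transition, which is why a compiled matcher of this kind never has to back up over the source text.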

Interface

I separated algorithmic from interface issues. The files RexAlgorithm.h and RexAlgorithm.cpp implement the regular expression parser using only standard C++ (relying on STL), whereas the files RexInterface.h and RexInterface.cpp contain the interfaces for the end user. Currently there is only one interface, implemented in the class REXI_Search. Interfaces for replace functionality and for programming language scanners are planned for future releases.

struct REXI_DefErr{
    enum{eNoErr,eErrInName,eErrInRegExp} eErrCode;
    string  strErrMsg;
    int     nErrOffset;
};
class REXI_Search : public REXI_Base
{
public:
    REXI_Search(char cEos='\0');
    REXI_DefErr AddRegDef   (string strName,string strRegExp);
    inline REXI_DefErr SetRegexp (string strRegExp);
    bool    MatchHere   (const char*& rpcszSrc, int& nMatchLen,bool& bEos);
    bool    Find        (const char*& rpcszSrc, int& nMatchLen,bool& bEos);
private:
    bool    MatchHereImpl();
    int     m_nIdAnswer;
};

Example usage

#include <cassert>
#include <iostream>
#include "RexInterface.h"
using namespace std;

int main(int argc, char* argv[])
{
    const char szTestSrc[]= "3.1415 is the same as 31415e-4";
    const int ncOk= REXI_DefErr::eNoErr;
    REXI_Search rexs;
    REXI_DefErr err;
    // Register the named building blocks first.
    err= rexs.AddRegDef("$Int","[0-9]+");  assert(err.eErrCode==ncOk);
    err= rexs.AddRegDef("$Frac","\\.[0-9]+"); assert(err.eErrCode==ncOk);
    err= rexs.AddRegDef("$Exp","([Ee](\\+|-)?[0-9]+)");
    assert(err.eErrCode==ncOk);
    // Compile the search expression built from those definitions.
    err= rexs.SetRegexp("($Int? $Frac $Exp?|$Int \\. $Exp?|$Int $Exp)[fFlL]?");
    assert(err.eErrCode==ncOk);
    const char*     pCur= szTestSrc;
    int             nMatchLen;
    bool            bEosFound= false;
    cout    <<  "Source text is: \""    <<  szTestSrc   << "\"" <<  endl;
    // After each successful Find, pCur points just past the match.
    while(rexs.Find(pCur,nMatchLen,bEosFound)){
        cout <<  "Floating point number found  at position "
             <<  ((pCur-szTestSrc)-nMatchLen)
             <<  " having length "  <<  nMatchLen  <<  endl;
    }
    // Wait for input so the console window stays open.
    int i;
    cin >> i;
    return 0;
}

Performance issues

A call to the member function REXI_Search::SetRegexp(strRegExp) involves quite a lot of computing. The regular expression strRegExp is analyzed and, after several steps, transformed into a compiled form. Because of this preprocessing work, which is not needed in the case of an interpreting regular expression parser, this regular expression parser shows its efficiency only when you apply it to large input strings or when you search again and again for the same regular expression. A typical application which profits from the preprocessing needed by this parser is a utility which searches all files in a directory.
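The compile-once, search-many usage pattern described here can be sketched with the standard library's std::regex as a stand-in. This is only an illustration of the call shape: std::regex is not the parser from this article and is typically not DFA-based, and the function name countMatchingLines is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Counts the lines that contain at least one match. The caller compiles the
// pattern once; the same compiled object is then reused for every line.
int countMatchingLines(const std::regex& compiled,
                       const std::vector<std::string>& lines) {
    int count = 0;
    for (const std::string& line : lines) {
        if (std::regex_search(line, compiled)) ++count;
    }
    return count;
}
```

Constructing the std::regex inside the loop would repeat the expensive preprocessing for every line, which is the cost this section warns about.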

Limitations

Currently Unicode is not supported. There is no fundamental reason for this limitation and I think that a later release will correct this. I simply have not yet found an efficient representation of a compiled regular expression which supports Unicode.

Constructing regular expressions

Regular expressions can be built from characters and special symbols. There are some similarities between regular expressions and arithmetic expressions. The most basic elements of arithmetic expressions are numbers and expressions enclosed in parens ( ). The most basic elements of regular expressions are characters, regular expressions enclosed in parens ( ) and character sets. On the next higher level, arithmetic expressions have '*' and '/' operators, whereas regular expressions have operators indicating the multiplicity of the preceding element.

Most basic elements of regular expressions

  • Individual characters. e.g. "h" is a regular expression. In the string "this home" it matches the beginning of 'home'. For non printable characters, one has to use either the notation \xhh where h means a hexadecimal digit or one of the escape sequences \n \r \t \v known from "C". Because the characters * + ? . | [ ] ( ) - $ ^ have a special meaning in regular expressions, escape sequences must also be used to specify these characters literally: \*  \+  \?  \.  \|  \[  \]  \(  \)  \-  \$  \^ . Furthermore, use '\ ' to indicate a space, because this implementation skips spaces in order to support a more readable style.
  • Character sets enclosed in square brackets [ ]. e.g. "[A-Za-z_$]" matches any alphabetic character, the underscore and the dollar sign (the dash (-) indicates a range), so it matches "B", "b", "_", "$" and so on. A ^ immediately following the [ of a character set means 'form the inverse character set'. e.g. "[^0-9A-Za-z]" matches non-alphanumeric characters.
  • Expressions enclosed in round parens ( ). Any regular expression can be used on the lowest level by enclosing it in round brackets.
  • the dot . It means 'match any character'.
  • an identifier prefixed by a $. It refers to an already defined regular expression. e.g. "$Ident" stands for a user defined regular expression previously defined. Think of it as a regular expression enclosed in round parens, which has a name.
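The last bullet, a $-prefixed name standing for a parenthesized regular expression, can be illustrated by plain textual substitution. The sketch below is an assumption about how such a dictionary could behave, not the article's implementation; expandDefinitions is a hypothetical helper, and unlike AddRegDef it stores names without the leading $.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: replaces every $Name in `pattern` by "(definition)".
// Escaped characters (such as \$) and the contents of [ ] character sets
// are copied through unchanged.
std::string expandDefinitions(const std::string& pattern,
                              const std::map<std::string, std::string>& defs) {
    std::string out;
    bool inSet = false;  // true while inside a [ ] character set
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size()) {  // keep escape pairs intact
            out += c;
            out += pattern[++i];
            continue;
        }
        if (c == '[') inSet = true;
        else if (c == ']') inSet = false;
        if (c != '$' || inSet) {
            out += c;
            continue;
        }
        // Scan the identifier that follows the '$'.
        std::size_t j = i + 1;
        while (j < pattern.size() &&
               (std::isalnum(static_cast<unsigned char>(pattern[j])) ||
                pattern[j] == '_')) {
            ++j;
        }
        const std::string name = pattern.substr(i + 1, j - i - 1);
        const auto it = defs.find(name);
        if (it == defs.end()) {
            throw std::invalid_argument("undefined name: $" + name);
        }
        out += "(" + it->second + ")";
        i = j - 1;  // resume after the identifier
    }
    return out;
}
```

Wrapping each definition in parens keeps a following operator such as ? or * applied to the whole definition rather than to its last element.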

Operators indicating the multiplicity of the preceding element

Any of the above five basic regular expressions can be followed by one of the special characters * + ? \i

  • * meaning repetition (possibly zero times); e.g. "[0-9]*" not only matches "8" but also "87576" and even the empty string "".
  • + meaning at least one occurrence; e.g. "[0-9]+" matches "8", "9185278", but not the empty string.
  • ? meaning at most one occurrence; e.g. "[$_A-Z]?" matches "_", "U", "$", .. and ""
  • \i meaning ignore case
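The examples in this list can be checked with std::regex, which uses the same meaning for * + and ?. This is a sketch with a stand-in library: std::regex has no \i operator, so the icase flag is used as the closest analogue, and the helper name fullMatch is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true if `text` as a whole matches `pattern`.
// ignoreCase plays the role of the \i operator (an approximation: the flag
// applies to the whole pattern, not to a single element).
bool fullMatch(const std::string& pattern, const std::string& text,
               bool ignoreCase = false) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignoreCase) flags = flags | std::regex::icase;
    return std::regex_match(text, std::regex(pattern, flags));
}
```

For instance, fullMatch("[0-9]*", "") is true while fullMatch("[0-9]+", "") is false, which mirrors the difference between the first two bullets.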

Catenation of regular expressions

The regular expressions described above can be catenated to form longer regular expressions. E.g. "[_A-Za-z][_A-Za-z0-9]*" is a regular expression which matches any identifier of the programming language "C", namely the first character must be alphabetic or an underscore and the following characters must be alphanumeric or an underscore. "[0-9]*\.[0-9]+" describes a floating point number with an arbitrary number of digits before the decimal point and at least one digit following the decimal point. (The decimal point must be preceded by a backslash, otherwise the dot would mean 'accept any character at this place'). "(Hallo (,how are you\?)?)\i" matches "Hallo" as well as "Hallo, how are you?" in a case insensitive way.

Alternative regular expressions

Finally - on the top level - regular expressions can be separated by the | character. The two regular expressions on the left and right side of the | are alternatives, meaning that either the left expression or the right expression should match the source text. E.g. "[0-9]+ | [A-Za-z_][A-Za-z_0-9]*" matches either an integer or a "C"-identifier.
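As a small illustration of alternatives, the integer-or-identifier expression can be written for std::regex with one capture group per alternative, so the caller can tell which side matched. std::regex is a stand-in here (it does not skip spaces in patterns, so they are removed), and classifyToken is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Classifies a whole token as "integer", "identifier" or "none" using the
// alternation  [0-9]+ | [A-Za-z_][A-Za-z_0-9]*
std::string classifyToken(const std::string& token) {
    static const std::regex re("([0-9]+)|([A-Za-z_][A-Za-z_0-9]*)");
    std::smatch m;
    if (!std::regex_match(token, m, re)) return "none";
    // Group 1 belongs to the left alternative, group 2 to the right one.
    return m[1].matched ? "integer" : "identifier";
}
```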

A complex example

The programming language "C" defines a floating point constant in the following way: A floating point constant has the following parts: An integer part, a decimal point, a fraction, an exponential part beginning with e or E followed by an optional sign and digits, and an optional type suffix formed by one of the characters f, F, l, L. Either the integer part or the fractional part can be absent (but not both). Either the decimal point or the exponential part can be absent (but not both).

The corresponding regular expression is quite complex, but it can be simplified by using the following definitions:

$Int = "[0-9]+"
$Frac= "\.[0-9]+"
$Exp = "([Ee](\+|-)?[0-9]+)"

So we get the following expression for a floating point constant:

($Int? $Frac $Exp?|$Int \. $Exp?|$Int $Exp)[fFlL]?
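For readers without the REXI classes at hand, the same expression can be tried out with std::regex by expanding the three definitions by hand and dropping the spaces. This is a sketch using a different library, not the article's parser; the function name findFloatConstants is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (position, length) for every floating point constant in `text`.
std::vector<std::pair<int, int>> findFloatConstants(const std::string& text) {
    static const std::regex re(
        "(?:(?:[0-9]+)?\\.[0-9]+(?:[Ee][+-]?[0-9]+)?"  // $Int? $Frac $Exp?
        "|[0-9]+\\.(?:[Ee][+-]?[0-9]+)?"               // $Int \. $Exp?
        "|[0-9]+[Ee][+-]?[0-9]+)"                      // $Int $Exp
        "[fFlL]?");
    std::vector<std::pair<int, int>> hits;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        hits.emplace_back(static_cast<int>(it->position()),
                          static_cast<int>(it->length()));
    }
    return hits;
}
```

Applied to the test string from the example usage, "3.1415 is the same as 31415e-4", this yields two hits: position 0 with length 6 and position 22 with length 8.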

Posted on 2013-01-08 19:45 by 不會飛的鳥
