
不會飛的鳥

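10 December 2010 ... Never mind them!!! I want to build my own search engine with a distributed file system, a distributed scheduling system and a distributed retrieval system that I developed myself!!! A big fish has big ambitions!!! ---楊書童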

A simple regular expression processing module ([Repost] Fast-regular-expressions)

Original source: http://www.codeproject.com/Articles/798/Fast-regular-expressions

Fast regular expressions

By , 29 Oct 2000
 

Sample Image - RexSearch.jpg

Introduction

Regular expressions are a well recognized way for describing string patterns. The following regular expression defines a floating point number with a (possibly empty) integer part, a non empty fractional part and an optional exponent:

[0-9]* \.[0-9]+ ([Ee](\+|-)?[0-9]+)?

The rules for interpreting and constructing such regular expressions are explained below. A regular expression parser takes a regular expression and a source string as arguments and returns the source position of the first match. Regular expression parsers either interpret the search pattern at runtime or compile the regular expression into an efficient internal form (known as a deterministic finite automaton). The regular expression parser described here belongs to the second category. Besides being quite fast, it also supports dictionaries of regular expressions. With the definitions $Int= [0-9], $Frac= \.[0-9]+ and $Exp= ([Ee](\+|-)?[0-9]+), the above regular expression for a floating point number can be abbreviated to $Int* $Frac $Exp?.

Interface

I separated algorithmic from interface issues. The files RexAlgorithm.h and RexAlgorithm.cpp implement the regular expression parser using only standard C++ (relying on STL), whereas the files RexInterface.h and RexInterface.cpp contain the interfaces for the end user. Currently there is only one interface, implemented in the class REXI_Search. Interfaces for replace functionality and for programming language scanners are planned for future releases.

struct REXI_DefErr{
    enum{eNoErr,eErrInName,eErrInRegExp} eErrCode;
    string  strErrMsg;
    int     nErrOffset;
};

class REXI_Search : public REXI_Base
{
public:
    REXI_Search(char cEos='\0');
    REXI_DefErr         AddRegDef   (string strName,string strRegExp);
    inline REXI_DefErr  SetRegexp   (string strRegExp);
    bool    MatchHere   (const char*& rpcszSrc, int& nMatchLen,bool& bEos);
    bool    Find        (const char*& rpcszSrc, int& nMatchLen,bool& bEos);
private:
    bool    MatchHereImpl();
    int     m_nIdAnswer;
};

Example usage

#include <cassert>
#include <iostream>
#include "RexInterface.h"
using namespace std;

int main(int argc, char* argv[])
{
    const char szTestSrc[]= "3.1415 is the same as 31415e-4";
    const int ncOk= REXI_DefErr::eNoErr;
    REXI_Search rexs;
    REXI_DefErr err;
    // Register the named sub-expressions, then the pattern that refers to them.
    err= rexs.AddRegDef("$Int","[0-9]+");  assert(err.eErrCode==ncOk);
    err= rexs.AddRegDef("$Frac","\\.[0-9]+"); assert(err.eErrCode==ncOk);
    err= rexs.AddRegDef("$Exp","([Ee](\\+|-)?[0-9]+)");
    assert(err.eErrCode==ncOk);
    err= rexs.SetRegexp("($Int? $Frac $Exp?|$Int \\. $Exp?|$Int $Exp)[fFlL]?");
    assert(err.eErrCode==ncOk);
    const char*     pCur= szTestSrc;
    int             nMatchLen;
    bool            bEosFound= false;
    cout    <<  "Source text is: \""    <<  szTestSrc   << "\"" <<  endl;
    // The position arithmetic below implies that Find() leaves pCur just past the match.
    while(rexs.Find(pCur,nMatchLen,bEosFound)){
        cout <<  "Floating point number found  at position "
             <<  ((pCur-szTestSrc)-nMatchLen)
             <<  " having length "  <<  nMatchLen  <<  endl;
    }
    // Wait for input so that the console window stays open.
    int i;
    cin >> i;
    return 0;
}

Performance issues

A call to the member function REXI_Search::SetRegexp(strRegExp) involves quite a lot of computing. The regular expression strRegExp is analyzed and, after several steps, transformed into a compiled form. Because of this preprocessing work, which an interpreting regular expression parser does not need, this parser pays off only when you apply it to large input strings or search repeatedly for the same regular expression. A typical application that profits from the preprocessing is a utility that searches all files in a directory.

Limitations

Currently Unicode is not supported. There is no fundamental reason for this limitation, and I think that a later release will correct this. I have just not yet found an efficient representation of a compiled regular expression that supports Unicode.

Constructing regular expressions

Regular expressions can be built from characters and special symbols. There are some similarities between regular expressions and arithmetic expressions. The most basic elements of arithmetic expressions are numbers and expressions enclosed in parens ( ). The most basic elements of regular expressions are characters, regular expressions enclosed in parens ( ) and character sets. On the next higher level, arithmetic expressions have '*' and '/' operators, whereas regular expressions have operators indicating the multiplicity of the preceding element.

Most basic elements of regular expressions

  • Individual characters, e.g. "h" is a regular expression. In the string "this home" it matches the "h" in "this" and the "h" at the beginning of "home". For non-printable characters, one has to use either the notation \xhh, where h means a hexadecimal digit, or one of the escape sequences \n \r \t \v known from "C". Because the characters * + ? . | [ ] ( ) - $ ^ have a special meaning in regular expressions, escape sequences must also be used to specify these characters literally: \*  \+  \?  \.  \|  \[  \]  \(  \)  \-  \$  \^ . Furthermore, use '\ ' to indicate a space, because this implementation skips spaces in order to support a more readable style.
  • Character sets enclosed in square brackets [ ], e.g. "[A-Za-z_$]" matches any alphabetic character, the underscore and the dollar sign, such as "B", "b", "_" and "$" (the dash (-) indicates a range). A ^ immediately following the [ of a character set means 'form the inverse character set', e.g. "[^0-9A-Za-z]" matches non-alphanumeric characters.
  • Expressions enclosed in round parens ( ). Any regular expression can be used on the lowest level by enclosing it in round parens.
  • The dot . It means 'match any character'.
  • An identifier prefixed by a $. It refers to an already defined regular expression, e.g. "$Ident" stands for a user-defined regular expression previously defined. Think of it as a regular expression enclosed in round parens, which has a name.

Operators indicating the multiplicity of the preceding element

Any of the above five basic regular expressions can be followed by one of the special symbols * + ? \i

  • * meaning repetition (possibly zero times); e.g. "[0-9]*" not only matches "8" but also "87576" and even the empty string "".
  • + meaning at least one occurrence; e.g. "[0-9]+" matches "8", "9185278", but not the empty string.
  • ? meaning at most one occurrence; e.g. "[$_A-Z]?" matches "_", "U", "$", .. and ""
  • \i meaning ignore case

Catenation of regular expressions

The regular expressions described above can be catenated to form longer regular expressions. E.g. "[_A-Za-z][_A-Za-z0-9]*" is a regular expression which matches any identifier of the programming language "C", namely the first character must be alphabetic or an underscore and the following characters must be alphanumeric or an underscore. "[0-9]*\.[0-9]+" describes a floating point number with an arbitrary number of digits before the decimal point and at least one digit following the decimal point. (The decimal point must be preceded by a backslash, otherwise the dot would mean 'accept any character at this place'). "(Hallo (,\ how\ are\ you\?)?)\i" matches "Hallo" as well as "Hallo, how are you?" in a case-insensitive way (the spaces to be matched are written as '\ ', because unescaped spaces are skipped).

Alternative regular expressions

Finally - on the top level - regular expressions can be separated by the | character. The two regular expressions on the left and right side of the | are alternatives, meaning that either the left expression or the right expression should match the source text. E.g. "[0-9]+ | [A-Za-z_][A-Za-z_0-9]*" matches either an integer or a "C"-identifier.

A complex example

The programming language "C" defines a floating point constant in the following way: A floating point constant has the following parts: an integer part, a decimal point, a fraction, an exponential part beginning with e or E followed by an optional sign and digits, and an optional type suffix formed by one of the characters f, F, l, L. Either the integer part or the fractional part can be absent (but not both). Either the decimal point or the exponential part can be absent (but not both).

The corresponding regular expression is quite complex, but it can be simplified by using the following definitions:

$Int = "[0-9]+"
$Frac= "\.[0-9]+"
$Exp = "([Ee](\+|-)?[0-9]+)"

So we get the following expression for a floating point constant:

($Int? $Frac $Exp?|$Int \. $Exp?|$Int $Exp)[fFlL]?

Posted on 2013-01-08 19:45 by 不會飛的鳥

