diceidea

parser

留言簿(1)

隨筆分類

文章分類

Dev log (rss)

收藏夾

隨筆檔案

閱讀排行榜

評論排行榜

常用鏈接

Others

我的外文地址 (rss)
C++ blog這里主要是國內交流，外語的嘛在msn空間里，目前只有英語，以后可能會學西班牙語，或者法語，我想學的好多啊

有用的東西

DFA和lexical analysis

對于hand written的lexical analyzer來說，NFA和DFA的運用是不可避免的，除非你的grammer十分簡單。
一旦給出了source program(也就是你想處理的character stream)的一個pattern的正則表達式，就可以構造對應的NFA，然后轉換為DFA，這個DFA就可以用來處理你的source program, 將里面能夠match這個pattern的lexeme全都找出來。按照這樣的流程，對于一種編程語言，不管是常用的語言，還是腳本語言，只要對所有的pattern構造DFA，就能夠寫出自己的lexical analyzer了。
有兩篇關于正則表達式到DFA的文章寫的很好：
1.Writing own regular expression parser By Amer Gerzic英文的
http://www.codeproject.com/KB/recipes/OwnRegExpressionsParser.aspx
有源碼
2. 《構造正則表達式引擎》新鮮出爐啦！中文的，by vczh,華南理工大學
http://m.shnenglu.com/vczh/archive/2008/05/22/50763.html
閱讀完上面兩篇文章，寫個能夠運行的lexer就不成問題了。
另外附上龍書（Compilers, principles techniques and tools）里一段token,pattern和lexeme術語的區別：
1. A t o k e n is a pair consisting of a token name and an optional attribute
value. The token name is an abstract symbol representing a kind of
lexical unit(lexeme), e.g., a particular keyword, or a sequence of input characters
denoting an identifier. The token names are the input symbols that the
parser processes. In what follows, we shall generally write the name of a
token in boldface. We will often refer to a token by its token name.
2. A pattern is a description of the form that the lexemes of a token may take.
In the case of a keyword as a token, the pattern is just the sequence of
characters that form the keyword. For identifiers and some other tokens,
the pattern is a more complex structure that is matched by many strings.
3. A lexeme is a sequence of characters in the source program that matches
the pattern for a token and is identified by the lexical analyzer as an
instance of that token.
notes:
1. more than one lexeme can match a pattern
2. 看看example 3.1

posted on 2008-05-24 13:59 diceidea 閱讀(542) 評論(0) 編輯收藏引用所屬分類: Dev log

只有注冊用戶登錄后才能發表評論。


相關文章: DFA和lexical analysis dev log(1)

網站導航: 博客園 IT新聞 BlogJava 博問 Chat2DB 管理

青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品