
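    Today, while testing the regular expression interface wrapped in FreeScript, I found a bug in the garbage collector. It was easy to track down, so I fixed it right away. The cause: during collection only the contents of the operand stack were marked, and the contents of the call stack were never marked.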
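The fix amounts to making both stacks part of the collector's root set. Below is a minimal mark-and-sweep sketch in Python showing why that matters. It is not FreeScript's collector: `Obj`, `Frame`, `mark` and `collect` are hypothetical names, and the object model is reduced to "an object has a list of references".

```python
from typing import List, Set


class Obj:
    """Hypothetical heap object: a name plus outgoing references."""

    def __init__(self, name: str):
        self.name = name
        self.refs: List["Obj"] = []


class Frame:
    """Hypothetical call-stack frame holding the function's local values."""

    def __init__(self, locals_: List[Obj]):
        self.locals = locals_


def mark(roots: List[Obj]) -> Set[int]:
    # Iterative depth-first traversal from the roots; returns ids of reachable objects.
    marked: Set[int] = set()
    work = list(roots)
    while work:
        obj = work.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        work.extend(obj.refs)
    return marked


def collect(heap: List[Obj], operand_stack: List[Obj], call_stack: List[Frame]) -> List[Obj]:
    # The root set must contain BOTH stacks. Leaving out call_stack reproduces
    # the bug: objects referenced only by a frame's locals would be swept.
    roots = list(operand_stack)
    for frame in call_stack:
        roots.extend(frame.locals)
    marked = mark(roots)
    # Sweep: keep only reachable objects.
    return [obj for obj in heap if id(obj) in marked]
```

For example, if `a` is referenced only from a local variable in a call frame and `a` points to `b`, then `collect` keeps `a` and `b` only when the call stack is scanned.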

    The new Syngram contains three tools: regular expressions, a lexer and a parser.

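    There are three kinds of regular expression: pure, safe and greedy. A pure regular expression can only be used for matching. It is very fast (earlier tests showed about 440,000 matches per second), but it has no lookahead, captures or similar features. Safe and greedy regular expressions match a string using two different search strategies; they are somewhat slower, but they support lookahead and captures. A previous article discussed their speed on a test case with little backtracking and many captures: with the safe strategy backtracking takes a lot of time, whereas with the greedy strategy backtracking costs almost nothing.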
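As a rough illustration of what the two families return (not of how Syngram implements either of them), here is a sketch that uses Python's `re` module as a stand-in. `pure_match_whole` only answers yes or no, like a pure expression; `capturing_match` returns fields loosely modeled on the match-result fields listed in the library code later in this post. `MatchInfo` and both functions are hypothetical names. Python's `re` does not allow repeated group names, so `storages` is a plain dict here, whereas Syngram's `Storages` is a multi-map.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MatchInfo:
    """Loosely modeled on RegexpMatch; a sketch, not FreeScript's API."""
    matched: bool
    text: str = ""
    position: int = -1
    captures: List[Optional[str]] = field(default_factory=list)      # anonymous groups
    storages: Dict[str, Optional[str]] = field(default_factory=dict)  # named groups


def pure_match_whole(pattern: str, text: str) -> bool:
    # "Pure" style: a yes/no answer for the whole string, nothing captured.
    return re.fullmatch(pattern, text) is not None


def capturing_match(pattern: str, text: str) -> MatchInfo:
    # Capture-capable style: first match with its position, text and groups.
    m = re.search(pattern, text)
    if m is None:
        return MatchInfo(matched=False)
    named = set(m.re.groupindex.values())
    anonymous = [m.group(i) for i in range(1, m.re.groups + 1) if i not in named]
    return MatchInfo(True, m.group(0), m.start(), anonymous, m.groupdict())
```

With the pattern `(?P<year>\d{4})-(\d{2})(?=-)` on `"posted 2008-05-19"`, the lookahead `(?=-)` requires a `-` after the month without consuming it, so the match text is `2008-05` at position 7, with `05` as the anonymous capture and `year` = `2008` as the named one.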

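    The lexer takes a set of regular expressions, cuts a string into segments that do and do not match, and for every matching segment tells you which regular expression it actually matched. This is very handy whenever you have to analyze lots of strings.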
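A minimal sketch of that behavior in Python. The post does not say how conflicts are resolved, so two policies here are my own assumptions: at each position the longest match wins, with ties going to the pattern added first, and consecutive unmatched characters are collected into one segment whose `data` is `None` (similar to how `Lexer` in the library code leaves `Data` unset when the token type is -1). `SimpleLexer` and `Token` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Token:
    position: int
    text: str
    data: Optional[Any]  # None marks a segment that no pattern matched


class SimpleLexer:
    def __init__(self) -> None:
        self._rules: List[tuple] = []  # (compiled pattern, user data)

    def add(self, expression: str, data: Any) -> None:
        self._rules.append((re.compile(expression), data))

    def parse(self, text: str) -> List[Token]:
        tokens: List[Token] = []
        pos = 0
        unmatched_start: Optional[int] = None
        while pos < len(text):
            # Longest non-empty match wins; ">" keeps the earlier rule on ties.
            best_len, best_data = 0, None
            for pattern, data in self._rules:
                m = pattern.match(text, pos)
                if m and len(m.group()) > best_len:
                    best_len, best_data = len(m.group()), data
            if best_len == 0:
                # Nothing matches here: grow the current unmatched segment.
                if unmatched_start is None:
                    unmatched_start = pos
                pos += 1
                continue
            if unmatched_start is not None:
                tokens.append(Token(unmatched_start, text[unmatched_start:pos], None))
                unmatched_start = None
            tokens.append(Token(pos, text[pos:pos + best_len], best_data))
            pos += best_len
        if unmatched_start is not None:
            tokens.append(Token(unmatched_start, text[unmatched_start:], None))
        return tokens
```

With the rules `\d+` → `"number"` and `[a-z]+` → `"word"`, the input `"ab 12!"` becomes `ab` (word), ` ` (unmatched), `12` (number) and `!` (unmatched).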

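    As for the parser, it implements a context-free grammar library. It accepts a context-free grammar that supports operations such as Choice, Sequence and Option (a notation somewhere between BNF and EBNF), uses it to parse a string, and then executes user-specified semantic rules. A major advantage of my earlier Syngram for C++ is that it supports left recursion. If you use Boost::Spirit to parse a grammar that is clearly left-recursive, you have to accept a flat list where there should really be a tree, which makes many complex analyses hard to express concisely. Syngram for C++ solves this problem. So I wrapped Syngram for C++ into FreeScript, and now scripts can also parse complex strings that have a recursive structure.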
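The post does not show how Syngram handles left recursion internally. For contrast, here is the standard workaround in a recursive-descent parser, the kind of parser where a rule like `EXP -> EXP add FACTOR` would otherwise recurse forever: the rule is rewritten as a loop, and the loop folds each new operand onto the left, so `1-2-3` still becomes the tree `((1-2)-3)` instead of a flat list. This is a Python sketch; `TreeParser` is a hypothetical name, and unary operators are left out for brevity.

```python
import re
from typing import List, Optional, Tuple, Union

# A tree is either a number leaf or (operator, left subtree, right subtree).
Tree = Union[float, Tuple[str, "Tree", "Tree"]]


class TreeParser:
    """EXP -> EXP add FACTOR | FACTOR;  FACTOR -> FACTOR mul TERM | TERM;
    TERM -> number | ( EXP ).  Left recursion is replaced by left-folding loops."""

    def __init__(self, text: str):
        self.tokens: List[str] = re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)
        self.pos = 0

    def _peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _take(self) -> Optional[str]:
        tok = self._peek()
        self.pos += 1
        return tok

    def exp(self) -> Tree:
        node = self.factor()
        while self._peek() in ("+", "-"):
            op = self._take()
            node = (op, node, self.factor())  # old tree becomes the LEFT child
        return node

    def factor(self) -> Tree:
        node = self.term()
        while self._peek() in ("*", "/"):
            op = self._take()
            node = (op, node, self.term())
        return node

    def term(self) -> Tree:
        tok = self._take()
        if tok == "(":
            node = self.exp()
            if self._take() != ")":
                raise SyntaxError("missing right parenthesis")
            return node
        if tok is None or not tok[0].isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok)

    @staticmethod
    def parse(text: str) -> Tree:
        parser = TreeParser(text)
        tree = parser.exp()
        if parser._peek() is not None:
            raise SyntaxError(f"unexpected token {parser._peek()!r}")
        return tree
```

`TreeParser.parse("1-2-3")` yields `("-", ("-", 1.0, 2.0), 3.0)`: the nesting, and therefore left associativity, is preserved without the caller having to rebuild it from a list.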

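    Here is an example. Everyone is familiar with regular expressions, so I'll skip those; instead, here is FreeScript code that uses the parser to analyze arithmetic expressions with the four basic operations: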

_run(_global,readfile(apppath++"Syngram.free"));
using Syngram;

Parser=Syner.new();

/* Lexical part: whitespace is discarded, everything else must be one of these tokens */
Parser.SetDiscard("\\s+");
Parser.SetToken("number","\\d+(\\.\\d+)?");
Parser.SetToken("left","\\(");
Parser.SetToken("right","\\)");
Parser.SetToken("add","\\+|\\-");
Parser.SetToken("mul","\\*|/");
/* Messages reported when parsing fails */
Parser.SetDefaultError("Unknown error.");
Parser.SetUnexpectedEndError("The expression ended too early.");

/* TERM -> number | left EXP right | add TERM */
Parser.SetRule("TERM","number",func(items)
{
    return items[0];
});
Parser.SetRule("TERM","left EXP:\"Missing expression after left parenthesis.\" right:\"Missing right parenthesis.\"",func(items)
{
    return items[1];
});
Parser.SetRule("TERM","add TERM:\"Missing expression after unary operator.\"",func(items)
{
    if(items[0]=="+")
        return items[1];
    else
        return -items[1];
});
/* FACTOR -> TERM | FACTOR mul TERM (left-recursive) */
Parser.SetRule("FACTOR","TERM",func(items)
{
    return items[0];
});
Parser.SetRule("FACTOR","FACTOR mul TERM:\"Missing expression after binary operator.\"",func(items)
{
    if(items[1]=="*")
        return items[0]*items[2];
    else
        return items[0]/items[2];
});
/* EXP -> FACTOR | EXP add FACTOR (left-recursive) */
Parser.SetRule("EXP","FACTOR",func(items)
{
    return items[0];
});
Parser.SetRule("EXP","EXP add FACTOR:\"Missing expression after binary operator.\"",func(items)
{
    if(items[1]=="+")
        return items[0]+items[2];
    else
        return items[0]-items[2];
});

/* EXP is the start symbol */
Parser.Initialize("EXP");

/* Read an expression, then print its value or the configured error message */
try_catch(
    func()
    {
        writeln(Parser.Parse(read("Enter an arithmetic expression: ")));
    },
    func(errmsg)
    {
        writeln("Format error: ",errmsg);
    }
);

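This program reads an arithmetic expression. If the input is malformed, it prints the matching error message configured above; otherwise it evaluates the whole expression using the bound semantic rules (the func(items) passed to Parser.SetRule). Grammars in Syngram for C++ are not written as strings, but when Syner was developed FreeScript did not yet support operator overloading, so I didn't bother wrapping that style again. Instead, the wrapper layer uses Syngram for C++ to implement a parser that turns a string into a grammar, and then builds a new Syngram for C++ parser on top of it to analyze whatever the FreeScript code wants to analyze. So this parser is really Syngram².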
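For readers without FreeScript, here is a hypothetical Python analogue of the example above: the same TERM/FACTOR/EXP grammar, the semantic actions computed inline, and the same error messages. Which message is reported at which point is my own guess; the post does not describe how Syngram chooses between a rule-specific message, the default error and the unexpected-end error. `evaluate` and `ExprError` are hypothetical names.

```python
import re


class ExprError(Exception):
    """Carries a human-readable message, like the strings configured on Syner."""


def evaluate(text: str) -> float:
    tokens = re.findall(r"\d+(?:\.\d+)?|\S", text)  # numbers, or single non-space characters
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def term(missing: str) -> float:
        # TERM -> number | ( EXP ) | add TERM
        tok = peek()
        if tok is None or tok in (")", "*", "/"):
            raise ExprError(missing)
        advance()
        if tok == "(":
            value = exp("Missing expression after left parenthesis.")
            if peek() != ")":
                raise ExprError("Missing right parenthesis.")
            advance()
            return value
        if tok in ("+", "-"):
            value = term("Missing expression after unary operator.")
            return value if tok == "+" else -value
        if tok[0].isdigit():
            return float(tok)
        raise ExprError("Unknown error.")

    def factor(missing: str) -> float:
        # FACTOR -> FACTOR mul TERM | TERM, as a left-folding loop
        value = term(missing)
        while peek() in ("*", "/"):
            op = advance()
            rhs = term("Missing expression after binary operator.")
            value = value * rhs if op == "*" else value / rhs
        return value

    def exp(missing: str) -> float:
        # EXP -> EXP add FACTOR | FACTOR, as a left-folding loop
        value = factor(missing)
        while peek() in ("+", "-"):
            op = advance()
            rhs = factor("Missing expression after binary operator.")
            value = value + rhs if op == "+" else value - rhs
        return value

    result = exp("The expression ended too early.")
    if peek() is not None:
        raise ExprError("Unknown error.")
    return result
```

For instance, `evaluate("2*(3+4)")` returns `14.0`, while `evaluate("1+")` raises `ExprError("Missing expression after binary operator.")`.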

Now, here is the code of Syngram for FreeScript:

/****************************************************************
This library requires [Collections.free].

RegexpMatch: result of a regular expression match
    Captures                        : read-only list of anonymous captures
    Storages                        : multi-map of named captures
    Position                        : position of the match
    Text                            : matched text
    Matched                         : whether the match succeeded
RegexpBase: base class of the regular expressions
    Find({value}Text)               : read-only list of all matches in the string
    Split({value}Text)              : read-only list of the pieces of the string split by the expression
    Cut({value}Text)                : read-only list of the parts of the string that do and do not match
    Match({value}Text)              : first match found in the string
    MatchHead({value}Text)          : match starting at the first character
    MatchWhole({value}Text)         : match covering the whole string

RegexpPure: pure (match-only) regular expression
    constructor({value}Expression)  : construct from a pattern string
RegexpSafe: safe regular expression
    constructor({value}Expression)  : construct from a pattern string
RegexpGreed: greedy regular expression
    constructor({value}Expression)  : construct from a pattern string

LexerToken: lexer token
    Data                            : user-defined data
    Position                        : position
    Text                            : token text
Lexer: lexer
    constructor()                   : construct a lexer
    Add({value}Exp,Data)            : add a token type and bind user-defined data to it
    Initialize()                    : initialize
    Parse({value}Input)             : parse a string, returning a read-only list of LexerToken


Syner: context-free grammar parser
    SetDiscard(Regex)               : regular expression of tokens to drop after lexing
    SetToken(Name,Regex)            : name and regular expression of a significant token
    SetRule(Name,Rule,Func)         : name of a production, the production, and its semantic callback
    Initialize(Nonterminator)       : set the start symbol and initialize
    IsReady()                       : whether the parser has been initialized
    Parse(Text)                     : parse a string
    SetDefaultError(Text)           : message returned for general errors
    SetUnexpectedEndError(Text)     : message returned when the input ends too early
****************************************************************/
Syngram=namespace
{
    fixed RegexpMatch=class()
    {
        local Captures=null;
        local Storages=null;
        local Position=null;
        local Text=null;
        local Matched=null;

        local constructor=func(Item)
        {
            Matched=matched(Item);
            Text=text(Item);
            Position=pos(Item);
            Captures=ReadonlyList.new(catched(Item));
            Storages=MultiMap.new();
            for(name in allstorages(Item))
                Storages.Add(name,storage(Item,name));
        };
    };

    fixed RegexpBase=class()
    {
        local Find=null;
        local Split=null;
        local Cut=null;
        local Match=null;
        local MatchHead=null;
        local MatchWhole=null;

        local constructor=func()
        {
            local Engine=null;

            local TransformResult=multifunc
            {
                func({array}Items)
                {
                    return ReadonlyList.new(Items).Map(func(Item)return RegexpMatch.new(Item););
                }
                func(Item)
                {
                    return RegexpMatch.new(Item);
                }
            };

            Find=func({value}Text)
            {
                return TransformResult(find(Engine,Text));
            };

            Split=func({value}Text)
            {
                return TransformResult(split(Engine,Text));
            };

            Cut=func({value}Text)
            {
                return TransformResult(cut(Engine,Text));
            };

            Match=func({value}Text)
            {
                return TransformResult(match(Engine,Text));
            };

            MatchHead=func({value}Text)
            {
                return TransformResult(matchhead(Engine,Text));
            };

            MatchWhole=func({value}Text)
            {
                return TransformResult(matchwhole(Engine,Text));
            };

            return func(Regexp)
            {
                Engine=Regexp;
            };
        }();
    };

    fixed RegexpPure=class(RegexpBase)
    {
        local constructor=func({value}Expression)
        {
            base.constructor(regexppure(Expression));
        };
    };

    fixed RegexpSafe=class(RegexpBase)
    {
        local constructor=func({value}Expression)
        {
            base.constructor(regexpsafe(Expression));
        };
    };

    fixed RegexpGreed=class(RegexpBase)
    {
        local constructor=func({value}Expression)
        {
            base.constructor(regexpgreed(Expression));
        };
    };

    fixed LexerToken=class()
    {
        local Data=null;
        local Position=-1;
        local Text="";
    };

    fixed Lexer=class()
    {
        local Add=null;
        local Initialize=null;
        local Parse=null;

        local constructor=func()
        {
            local DataMap=Map.new();
            local Engine=lexercreate();

            local TransformResult=func(Item)
            {
                local Result=LexerToken.new();
                Result.Position=Item.Position;
                Result.Text=Item.Text;
                if(Item.Type!=-1)
                    Result.Data=DataMap[Item.Type];
                return Result;
            };

            Add=func({value}Expression,Data)
            {
                DataMap.Add(lexeradd(Engine,Expression),Data);
            };

            Initialize=func()
            {
                lexerbuild(Engine);
            };

            Parse=func({value}Text)
            {
                return ReadonlyList.new(lexerparse(Engine,Text)).Map(TransformResult);
            };

            return func()
            {
            };
        }();
    };

    fixed Syner=class()
    {
        local SetDiscard=null;              /*set the token types to drop after lexing*/
        local SetToken=null;                /*set a significant token*/
        local SetRule=null;                 /*set a production rule and bind its semantic handler*/
        local Initialize=null;              /*set the start nonterminal and build the whole parser*/
        local IsReady=null;                 /*return whether the parser has been built*/
        local Parse=null;                   /*parse a string and return the result produced by the semantic handlers*/
        local SetDefaultError=null;         /*set the exception thrown for general errors*/
        local SetUnexpectedEndError=null;   /*set the exception thrown when the expression is incomplete*/

        local constructor=func()
        {
            local _IsReady=false;
            local _Grammar="";
            local _Processors=[];
            local _RuleCount=0;
            local _Analyzer=null;

            /*escape double quotes so a message can sit inside a quoted string in the grammar text*/
            local _TextProcess=func(Text)
            {
                local Result="";
                for(c in Text)
                {
                    if(c=="\"")Result=Result++"\\\"";
                    else             Result=Result++c;
                }
                return Result;
            };

            SetDiscard=func(Regex)
            {
                if(!_IsReady)
                {
                    _Grammar=_Grammar++"discard "++Regex++"\r\n";
                }
            };

            SetToken=func(Name,Regex)
            {
                if(!_IsReady)
                {
                    _Grammar=_Grammar++Name++"="++Regex++"\r\n";
                }
            };

            SetRule=func(Name,Rule,Func)
            {
                if(!_IsReady)
                {
                    /*give every rule a unique name (e.g. EXP._0) and record its callback under that name*/
                    local NonTerminator=Name++"._"++_RuleCount;
                    _RuleCount=_RuleCount+1;
                    _Grammar=_Grammar++NonTerminator++"->"++Rule++"\r\n";
                    _Processors[#_Processors:0]=[[NonTerminator,Func]];
                }
            };

            Initialize=func(NonTerminator)
            {
                if(!_IsReady)
                {
                    _Grammar=_Grammar++"init "++NonTerminator;
                    _Analyzer=buildsyner(_Grammar,_Processors);
                    _IsReady=true;
                }
            };

            IsReady=func()
            {
                return _IsReady;
            };

            Parse=func(Text)
            {
                return runsyner(_Analyzer,Text);
            };

            SetDefaultError=func(Text)
            {
                if(!_IsReady)
                {
                    _Grammar=_Grammar++"default \""++_TextProcess(Text)++"\"\r\n";
                }
            };

            SetUnexpectedEndError=func(Text)
            {
                if(!_IsReady)
                {
                    _Grammar=_Grammar++"end \""++_TextProcess(Text)++"\"\r\n";
                }
            };
        };
    };
};
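Before `Initialize`, most of what Syner does is string concatenation: each setter appends one line to `_Grammar`, and `Initialize` hands that text plus the collected rule callbacks to `buildsyner`. The following Python mirror reproduces only that string-building step, so you can see the grammar text a program like the arithmetic example generates. `GrammarBuilder` and its method names are hypothetical, and `buildsyner` is not called; the line format is copied from the concatenations in the FreeScript code above.

```python
from typing import Any, Callable, List, Tuple


class GrammarBuilder:
    """Hypothetical Python mirror of the text Syner accumulates in _Grammar."""

    def __init__(self) -> None:
        self.grammar = ""
        self.processors: List[Tuple[str, Callable[..., Any]]] = []
        self.rule_count = 0
        self.ready = False

    @staticmethod
    def _escape(text: str) -> str:
        # Like _TextProcess: only double quotes are escaped.
        return text.replace('"', '\\"')

    def set_discard(self, regex: str) -> None:
        if not self.ready:
            self.grammar += "discard " + regex + "\r\n"

    def set_token(self, name: str, regex: str) -> None:
        if not self.ready:
            self.grammar += name + "=" + regex + "\r\n"

    def set_rule(self, name: str, rule: str, func: Callable[..., Any]) -> None:
        if not self.ready:
            nonterminal = f"{name}._{self.rule_count}"  # e.g. EXP._0, EXP._1
            self.rule_count += 1
            self.grammar += nonterminal + "->" + rule + "\r\n"
            self.processors.append((nonterminal, func))

    def set_default_error(self, text: str) -> None:
        if not self.ready:
            self.grammar += 'default "' + self._escape(text) + '"\r\n'

    def set_unexpected_end_error(self, text: str) -> None:
        if not self.ready:
            self.grammar += 'end "' + self._escape(text) + '"\r\n'

    def initialize(self, start: str) -> str:
        # Syner would call buildsyner(_Grammar, _Processors) here; the sketch returns the text.
        if not self.ready:
            self.grammar += "init " + start
            self.ready = True
        return self.grammar
```

As in Syner, every setter is ignored once the builder is initialized, so the grammar text cannot change after it has been handed over.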

posted on 2008-05-19 00:56 by 陳梓瀚 (vczh) · Views (1650) · Comments (4) · Category: Vczh Free Script

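Comments:

# re: Syngram library for Vczh Free Script 2.0 completed 2008-05-19 01:56 | foxtail

It feels like you have a very comprehensive network mapped out in your head.

# re: Syngram library for Vczh Free Script 2.0 completed 2008-05-19 06:29 | 陳梓瀚(vczh)

I built it myself, so of course I know it well.

# re: Syngram library for Vczh Free Script 2.0 completed 2008-05-19 07:56 | 空明流轉

The way you posted this code is pretty weird, though.

# re: Syngram library for Vczh Free Script 2.0 completed [not logged in] 2008-05-19 09:20 | Alex

Nice work, you have my support.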
