How to Write a Spelling Corrector (implementing a Baidu-style "Did you mean?" feature)
Posted by 漂漂, Sat, 25 Jun 2011 09:29:00 GMT
http://m.shnenglu.com/AutomateProgram/archive/2011/06/25/149446.html
Original article: http://blog.csdn.net/deadspace/archive/2011/02/17/6190810.aspx
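Last week, two of my friends, Dean and Bill, each told me how amazed they were by Google's fast, high-quality spelling correction. Type in a search like [speling], and in less than 0.1 seconds Google comes back with: Did you mean [spelling]? (Yahoo! and Microsoft have similar features.) What surprised me a little was that I had expected Dean and Bill, both very capable engineers and mathematicians, to have a professional intuition for how a spelling corrector is built from a statistical language model. They did not seem to. Thinking it over later, I realized they have no particular reason to be familiar with statistical language models. The problem was not with their knowledge but with my assumption.

I figured that they, and many others, would benefit from an explanation. The full details of an industrial-strength spelling corrector like Google's would be more confusing than enlightening, however. So a few days ago, on the plane ride home, I wrote a toy spelling corrector in a few dozen lines of code. It processes roughly 10 or more words per second and reaches 80% to 90% accuracy. Here is my code: 21 lines of Python 2.5 that make up a complete spelling corrector.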

import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words(open('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    n = len(word)
    return set([word[0:i] + word[i+1:] for i in range(n)] +                      # deletion
               [word[0:i] + word[i+1] + word[i] + word[i+2:] for i in range(n-1)] +  # transposition
               [word[0:i] + c + word[i+1:] for i in range(n) for c in alphabet] +    # alteration
               [word[0:i] + c + word[i:] for i in range(n+1) for c in alphabet])     # insertion

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])

榪欐浠g爜瀹氫箟浜嗕竴涓嚱鏁板彨 correct , 瀹冧互涓涓崟璇嶄綔涓鴻緭鍏ュ弬鏁? 榪斿洖鏈鍙兘鐨勬嫾鍐欏緩璁粨鏋? 姣斿璇?

>>> correct( 'speling' )
'spelling'
>>> correct( 'korrecter' )
'corrector'

 

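How the spelling corrector works: some basic probability theory

Here is a brief explanation of how it works. Given a word, our task is to choose the correctly spelled word that is most similar to it. (If the word is already spelled correctly, the closest word is the word itself.) Of course, there is no way to know the answer for certain. Given lates, for example, should it be corrected to late or to latest? Difficulties like this suggest that we need probabilities rather than rule-based decisions. We say: given a word w, among all correctly spelled words we want the correction c that maximizes the conditional probability of c given w, that is:

argmax_c P(c | w)

By Bayes' theorem, this is equivalent to:

argmax_c P(w | c) P(c) / P(w)

P(w) does not depend on c, so it is the same for every candidate correction. We can therefore drop it and write:

argmax_c P(w | c) P(c)

This expression has three parts. From right to left, they are:

1. P(c), the probability that a correctly spelled word c appears in text. In other words, how likely is c to occur in English writing? Because this probability is determined entirely by the English language, we call it the language model. For instance, P('the') is relatively high, while P('zxzxzxzyy') is close to 0 (assuming the latter is a word at all).

2. P(w | c), the probability that the user types w when they meant c. Because it describes how likely the user is to mistype c as w, it is called the error model.

3. argmax_c, which enumerates all possible values of c and picks the one with the highest probability. The reasoning is this: if a correct word occurs frequently, and users are likely to mistype it as a particular wrong word, then that wrong word should be corrected to this correct one.

Someone is bound to ask: why replace one simple expression, P(c | w), with two more complicated ones? The answer is that P(c | w) inherently depends on both factors at once, so splitting it in two actually makes it easier to handle. Take the misspelled word thew. It looks as if thaw should be the correction, since the only change is an a typed as an e. But the user may also have meant the, because the is a very common English word and a finger can easily slip from e onto w while typing. So to estimate P(c | w) here, we have to consider both how probable c is and how probable it is to get from c to w. Splitting one term into two makes the problem easier and clearer.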

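Now let's look at how the program actually works. First, computing P(c): we read in a large text file, big.txt, which contains a few million words and serves as our corpus. The file is built from a number of books available from Project Gutenberg, plus word lists from Wiktionary and the British National Corpus. (On the plane all I had was the complete Sherlock Holmes. I added the rest later, until the results stopped improving noticeably.)

Next, a function called words extracts every word from the corpus, converts it to lowercase, and drops any non-letter characters inside words. Each word thus becomes a sequence of letters, so don't turns into don and t. Then we train a probability model. Don't let the term scare you: it simply means counting how many times each word occurs. That is what the train function does.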

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words(open('big.txt').read()))
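To see concretely what the tokenizer keeps and discards, here is a small self-contained sketch. It repeats the words function from the listing; the sample sentence is an arbitrary assumption, and the print call assumes Python 3.

```python
import re

def words(text):
    # Lowercase the text, then keep only maximal runs of the letters a-z.
    return re.findall('[a-z]+', text.lower())

print(words("Don't panic: it's 2011!"))  # ['don', 't', 'panic', 'it', 's']
```

Apostrophes split a word in two, and digits and punctuation disappear entirely, which is good enough for a toy corrector but would need more care in a production tokenizer.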

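In effect, NWORDS[w] records how many times the word w was seen in the corpus (plus one, because of the smoothing described next). One problem remains: what about new words that we have never seen? Suppose a word is spelled perfectly correctly but does not occur in the corpus, and so never appears in the training data. We would then report its probability as 0. That is not good: a probability of 0 means the event can never happen, whereas in our model we want such cases to carry a very small probability instead. There are several standard approaches to this problem. We take the simplest: every word we have never seen is treated as if it had been seen once. This process is usually called smoothing, because it replaces the zeros in the probability distribution with small values. In code, we can use the defaultdict class from Python's collections module. It behaves like the standard Python dict (what other languages might call a hash table), except that it lets us supply a default value for any missing key. In our case we use the anonymous function lambda: 1, which sets the default value to 1.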
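The smoothing behaviour is easy to check on a tiny input. The sketch below reuses train as listed; the three-word sample is an assumption made for illustration, and the print calls assume Python 3. It also shows one property of defaultdict worth knowing: looking up a missing key with square brackets stores that key, so a later membership test will find it, whereas .get does not.

```python
import collections

def train(features):
    # Every word starts from a default count of 1 (the smoothing step).
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

counts = train(['the', 'the', 'cat'])
print(counts['the'])          # 3  (seen twice, plus the default 1)
print(counts['cat'])          # 2
print(counts.get('dog', 1))   # 1  (.get does not insert the key)
print('dog' in counts)        # False
print(counts['dog'])          # 1  (indexing inserts the default)
print('dog' in counts)        # True
```

If membership in the dictionary is later used to decide whether a word is known, as the known function does, it may be safer to read counts with .get for words that might be absent.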


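The next question is: given a word w, how do we enumerate all of its possible correct spellings? This has been studied thoroughly, and the key concept is edit distance. The edit distance between two words is the number of edits needed to turn one into the other, where an edit is an insertion (add one letter), a deletion (remove one letter), a transposition (swap two adjacent letters), or an alteration (change one letter to another).
The following function returns the set of all words at edit distance 1 from w: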

def edits1(word):
    n = len(word)
    return set([word[0:i] + word[i+1:] for i in range(n)] +                      # deletion
               [word[0:i] + word[i+1] + word[i] + word[i+2:] for i in range(n-1)] +  # transposition
               [word[0:i] + c + word[i+1:] for i in range(n) for c in alphabet] +    # alteration
               [word[0:i] + c + word[i:] for i in range(n+1) for c in alphabet])     # insertion

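Clearly this set is large. For a word of length n there are n deletions, n-1 transpositions, 26n alterations (translator's note: really 25n, since replacing a letter with itself changes nothing), and 26(n+1) insertions (translator's note: really fewer, since inserting a letter immediately before or after an identical letter yields the same word). That gives 54n + 25 cases in total, with a few duplicates among them. For example, this formula gives 511 words at edit distance 1 from something, while the actual number of distinct results is 494.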
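The arithmetic for something can be checked directly. This sketch builds the same four lists as edits1 but keeps them separate so each can be counted; the helper name edit_lists is hypothetical, and the print calls assume Python 3.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edit_lists(word):
    # The same four comprehensions as edits1, returned separately for counting.
    n = len(word)
    deletions = [word[:i] + word[i+1:] for i in range(n)]
    transpositions = [word[:i] + word[i+1] + word[i] + word[i+2:] for i in range(n - 1)]
    alterations = [word[:i] + c + word[i+1:] for i in range(n) for c in alphabet]
    insertions = [word[:i] + c + word[i:] for i in range(n + 1) for c in alphabet]
    return deletions, transpositions, alterations, insertions

parts = edit_lists('something')
sizes = [len(p) for p in parts]
print(sizes)                         # [9, 8, 234, 260]
print(sum(sizes))                    # 511, i.e. 54 * 9 + 25
print(len(set().union(*parts)))      # 494 distinct strings
```

The 17 missing strings are the duplicates mentioned in the translator's notes: the 9 alterations that reproduce something itself collapse into one entry, and 9 of the insertions duplicate a neighbouring insertion of the same letter.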

涓鑸鎷煎啓媯鏌ョ殑鏂囩尞瀹gО澶х害80-95% 鐨勬嫾鍐欓敊璇兘鏄粙浜庣紪璇戣窛紱?1 浠ュ唴. 鐒惰屼笅闈㈡垜浠湅鍒? 褰撴垜瀵逛簬涓涓湁270 涓嫾鍐欓敊璇殑璇枡鍋氬疄楠岀殑鏃跺? 鎴戝彂鐜板彧鏈?6% 鐨勬嫾鍐欓敊璇槸灞炰簬緙栬緫璺濈涓? 鐨勯泦鍚? 鎴栬鏄垜閫夊彇鐨勪緥瀛愭瘮鍏稿瀷鐨勪緥瀛愰毦澶勭悊涓鐐瑰惂. 涓嶇鎬庢牱, 鎴戣寰楄繖涓粨鏋滀笉澶熷ソ, 鍥犳鎴戝紑濮嬭冭檻緙栬緫璺濈涓?2 鐨勯偅浜涘崟璇嶄簡. 榪欎釜浜嬫儏寰堢畝鍗? 閫掑綊鐨勬潵鐪? 灝辨槸鎶?edit1 鍑芥暟鍐嶄綔鐢ㄥ湪 edit1 鍑芥暟鐨勮繑鍥為泦鍚堢殑姣忎竴涓厓绱犱笂灝辮浜? 鍥犳, 鎴戜滑瀹氫箟鍑芥暟 edit2:

def edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1))

榪欎釜璇彞鍐欒搗鏉ュ緢綆鍗? 瀹為檯涓婅儗鍚庢槸寰堝簽澶х殑璁$畻閲? 涓?something 緙栬緫璺濈涓? 鐨勫崟璇嶅眳鐒惰揪鍒頒簡 114,324 涓? 涓嶈繃緙栬緫璺濈鏀懼鍒? 浠ュ悗, 鎴戜滑鍩烘湰涓婂氨鑳借鐩栨墍鏈夌殑鎯呭喌浜? 鍦?70 涓牱渚嬩腑, 鍙湁3 涓殑緙栬緫璺濈澶т簬2. 褰撶劧鎴戜滑鍙互鍋氫竴浜涘皬灝忕殑浼樺寲: 鍦ㄨ繖浜涚紪杈戣窛紱誨皬浜? 鐨勮瘝涓棿, 鍙妸閭d簺姝g‘鐨勮瘝浣滀負鍊欓夎瘝. 鎴戜滑浠嶇劧鑰冭檻鎵鏈夌殑鍙兘鎬? 浣嗘槸涓嶉渶瑕佹瀯寤轟竴涓緢澶х殑闆嗗悎, 鍥犳, 鎴戜滑鏋勫緩涓涓嚱鏁板彨鍋?known_edits2 , 榪欎釜鍑芥暟鍙繑鍥為偅浜涙紜殑騫朵笖涓?w 緙栬緫璺濈灝忎簬2 鐨勮瘝鐨勯泦鍚?

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

Now, for the something example, known_edits2('something') returns just 3 words: 'smoothing', 'something' and 'soothing', rather than the full set of 114,324 strings at edit distance 1 or 2. This optimization speeds things up by about 10%.
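The effect of filtering can be reproduced without big.txt. The sketch below swaps NWORDS for a small hand-picked set, VOCAB, which is purely an assumption for illustration; with the real corpus the result depends on which words the corpus contains. The print calls assume Python 3.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

# Hypothetical toy vocabulary standing in for the keys of NWORDS.
VOCAB = {'something', 'smoothing', 'soothing', 'nothing', 'spelling'}

def edits1(word):
    n = len(word)
    return set([word[:i] + word[i+1:] for i in range(n)] +                      # deletion
               [word[:i] + word[i+1] + word[i] + word[i+2:] for i in range(n - 1)] +  # transposition
               [word[:i] + c + word[i+1:] for i in range(n) for c in alphabet] +      # alteration
               [word[:i] + c + word[i:] for i in range(n + 1) for c in alphabet])     # insertion

def edits2(word):
    # Every string reachable by two successive single edits.
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1))

def known_edits2(word):
    # Same enumeration, but only vocabulary words are kept in the result set.
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in VOCAB)

print(len(edits2('something')))           # size of the unfiltered set
print(sorted(known_edits2('something')))  # ['smoothing', 'something', 'soothing']
```

Here nothing is excluded because it is three edits away from something (two deletions plus one alteration), so it never appears among the distance-2 strings.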

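All that remains is the error model, P(w | c). This is the part that stumped me at the time. I was on a plane with no network connection, so I had no data from which to build a model of spelling errors. I did have some common-sense intuitions: mistaking one vowel for another is more likely than mistaking one consonant for another (people often type hallo for hello, for example); getting the first letter of a word wrong is relatively unlikely; and so on. But I had no concrete numbers to back these intuitions up. So I chose a simple approach: known words at edit distance 1 take priority over known words at edit distance 2, and a known word at edit distance 0 takes priority over those at edit distance 1. In code, that is:

(Translator's note: here the author exploits a neat property of Python, short-circuit evaluation. In the code below, if known(...) returns a non-empty set, candidates takes that set and the remaining expressions are never evaluated. Short-circuiting `or` thus gives the author a very simple way to implement the priority order.)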

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])

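The correct function picks the most probable word from a set of candidates. In practice, that means picking the candidate with the largest P(c). All the P(c) values are stored in the NWORDS structure.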
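The priority order can be exercised end to end on a miniature corpus. The sketch below is a self-contained variant of the listing with two labeled deviations: SAMPLE is a hypothetical one-line text used in place of big.txt, and the final max reads counts with NWORDS.get(w, 1) instead of NWORDS[w], so that scoring an unknown word does not insert it into the dictionary. The print calls assume Python 3.

```python
import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

# Hypothetical miniature corpus used in place of big.txt.
SAMPLE = "the the the thaw spelling spell corrector"
NWORDS = train(words(SAMPLE))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    n = len(word)
    return set([word[:i] + word[i+1:] for i in range(n)] +
               [word[:i] + word[i+1] + word[i] + word[i+2:] for i in range(n - 1)] +
               [word[:i] + c + word[i+1:] for i in range(n) for c in alphabet] +
               [word[:i] + c + word[i:] for i in range(n + 1) for c in alphabet])

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(ws): return set(w for w in ws if w in NWORDS)

def correct(word):
    # `or` returns its first non-empty operand: distance 0, then 1, then 2, then give up.
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    # Deviation from the listing: .get avoids inserting unknown words into NWORDS.
    return max(candidates, key=lambda w: NWORDS.get(w, 1))

print(correct('thaw'))       # thaw       (distance 0: already a known word)
print(correct('thew'))       # the        (distance 1: 'the' outscores 'thaw')
print(correct('korrecter'))  # corrector  (distance 2)
print(correct('qqqq'))       # qqqq       (nothing known within distance 2)
```

Because every candidate in a given tier is treated as equally likely under the error model, the choice within a tier comes down to word frequency alone, which is why thew becomes the rather than thaw in this toy corpus.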

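Evaluation

Now let's see how well the algorithm works. On the plane I tried several examples, and the results seemed fine. After the plane landed, I downloaded Roger Mitton's Birkbeck spelling error corpus from the Oxford Text Archive. From it I extracted two test sets of corrections. The first is for reference during development; the second is the final test set. In other words, I do not look at the second set until the program is finished, and I report the program's performance on it as the final result. Keeping one set for development and one held out for evaluation is good practice. At the very least, it keeps me from fooling myself by tuning the program to one specific data set. Here I give an example of a test and an example of running it. The complete test cases and the program are available in spell.py.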
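The test listing itself did not survive in this copy of the article, so the following is only a sketch of what such a harness might look like, not the author's spell.py. It assumes a test set shaped as a dictionary from each intended word to a space-separated string of misspellings. The names evaluate, toy_correct, LOOKUP and dev_set, and all of the data, are hypothetical; the lookup-table corrector merely stands in for correct so the example runs on its own under Python 3.

```python
def evaluate(correct_fn, tests):
    """Score a corrector; tests maps each intended word to its misspellings."""
    total = bad = 0
    for target, misspellings in tests.items():
        for wrong in misspellings.split():
            total += 1
            if correct_fn(wrong) != target:
                bad += 1
    return {'n': total, 'bad': bad, 'accuracy': (total - bad) / float(total)}

# Hypothetical stand-in corrector: a fixed lookup table, unknown words unchanged.
LOOKUP = {'acess': 'access', 'speling': 'spelling', 'korrecter': 'corrector'}

def toy_correct(word):
    return LOOKUP.get(word, word)

# Hypothetical development set in the assumed format.
dev_set = {'access': 'acess', 'spelling': 'speling spelin', 'corrector': 'korrecter'}

print(evaluate(toy_correct, dev_set))  # {'n': 4, 'bad': 1, 'accuracy': 0.75}
```

The same function would be called once more, at the very end, on the held-out set, which is what keeps the reported accuracy honest.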

This article comes from a CSDN blog. Please credit the source when reposting: http://blog.csdn.net/deadspace/archive/2011/02/17/6190810.aspx



Posted by 漂漂, 2011-06-25 17:29
]]>