sunrise

Keep learning every day, and you will keep improving.


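See: http://www.pythonclub.org/python-basic/codec

This post covers Python's encoding mechanism and how to convert between unicode, utf-8, utf-16, GBK, GB2312, ISO-8859-1, and other encodings. All of the code is Python 2.

Common encoding conversions fall into the following cases:
1. Detecting a string's encoding automatically: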

#coding:utf8
# chardet can be downloaded from http://pypi.python.org/pypi/chardet
import urllib
import chardet

rawdata = urllib.urlopen('http://www.google.cn/').read()
print chardet.detect(rawdata)

Output:

# confidence is how sure the detector is; encoding is the detected encoding
{'confidence': 0.99, 'encoding': 'utf-8'}
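When chardet is not installed, a crude stand-in is to try a short list of candidate encodings in order and keep the first one that decodes cleanly. This is not how chardet works (chardet scores the byte patterns statistically), and the function name `guess_encoding` and its candidate list are assumptions made for illustration. The sketch is written for Python 3, unlike the Python 2 snippets in this post.

```python
# Python 3 sketch. guess_encoding and the candidate list are hypothetical,
# not part of chardet or the standard library.
def guess_encoding(data, candidates=("ascii", "utf-8", "gb2312")):
    """Return the first candidate that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes, try the next one
        return name
    return None


utf8_bytes = "中文".encode("utf-8")
gb_bytes = "中文".encode("gb2312")

print(guess_encoding(utf8_bytes))  # utf-8
print(guess_encoding(gb_bytes))    # gb2312
print(guess_encoding(b"hello"))    # ascii
```

The order of the candidates matters: a permissive codec such as ISO-8859-1 accepts any byte sequence, so it would only make sense as the last entry.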

2. Converting unicode to another encoding

#coding:utf8
a = u'中文'
a_gb2312 = a.encode('gb2312')
print a_gb2312

Output (on a console that displays GB2312/GBK):

中文
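For comparison, here is a rough Python 3 version of the same step, extended to a few of the other encodings named in the introduction. It assumes Python 3, where `str` is already Unicode text and `encode` returns `bytes`; the variable names are illustrative only.

```python
# Python 3 sketch: str is already Unicode text, so no u'' prefix is needed.
text = "中文"

gb_bytes = text.encode("gb2312")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark

print(gb_bytes)     # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)   # b'\xe4\xb8\xad\xe6\x96\x87'
print(utf16_bytes)  # b'-N\x87e'

# ISO-8859-1 has no Chinese characters, so a strict encode fails.
try:
    text.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # True

# errors="replace" substitutes "?" for each character that cannot be encoded.
fallback = text.encode("iso-8859-1", errors="replace")
print(fallback)  # b'??'
```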

3. Converting another encoding to unicode

#coding:utf8
a = u'中文'
a_gb2312 = a.encode('gb2312')
print a_gb2312
# a_gb2312 is GB2312-encoded; to convert it to unicode, use
# unicode(a_gb2312, 'gb2312') or a_gb2312.decode('gb2312')
print [unicode(a_gb2312,'gb2312')]
print [a_gb2312.decode('gb2312')]

Output:

中文
[u'\u4e2d\u6587']
[u'\u4e2d\u6587']
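The decoding step only works when the codec matches the bytes. The following Python 3 sketch (an assumption, since the post itself uses Python 2) shows the matching case, a mismatched codec that raises, and a mismatched codec that silently produces garbled text.

```python
# Python 3 sketch: bytes.decode() plays the role of Python 2's unicode(s, enc).
gb_bytes = b"\xd6\xd0\xce\xc4"  # "中文" encoded as GB2312

text = gb_bytes.decode("gb2312")
same_text = str(gb_bytes, "gb2312")  # equivalent constructor form
print([text], text == same_text)     # ['中文'] True

# Decoding with the wrong codec raises instead of returning corrupt text.
try:
    gb_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)  # True

# ISO-8859-1 maps every byte to a character, so it never fails,
# but the result is garbled text rather than the original string.
mojibake = gb_bytes.decode("iso-8859-1")
print(mojibake)  # ÖÐÎÄ
```

Because ISO-8859-1 decoding is lossless at the byte level, `mojibake.encode("iso-8859-1")` gives back the original bytes, which can then be decoded with the right codec.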

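4. Converting between non-unicode encodings

#coding:utf8
a = u'中文'
a_gb2312 = a.encode('gb2312')
print a_gb2312
# To convert encoding 1 to encoding 2, decode to unicode first, then encode to encoding 2
a_unicode = a_gb2312.decode('gb2312')
print [a_unicode]
a_utf8 = a_unicode.encode('utf8')
# The DOS console does not recognize utf8, so printing a_utf8 directly would show garbled text
print [a_utf8]

Output:

中文
[u'\u4e2d\u6587']
['\xe4\xb8\xad\xe6\x96\x87']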
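The decode-then-encode pattern above can be wrapped in a small helper. This is a minimal Python 3 sketch; the name `transcode` and its parameters are hypothetical and do not come from the post or from any library.

```python
# Python 3 sketch. transcode is a hypothetical helper name.
def transcode(data, source, target):
    """Re-encode bytes from the source encoding to the target encoding.

    The bytes are decoded to Unicode text first, then encoded again,
    which is the same two-step route the post describes.
    """
    return data.decode(source).encode(target)


gb_bytes = "中文".encode("gb2312")
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")
print([utf8_bytes])  # [b'\xe4\xb8\xad\xe6\x96\x87']

# Going back the other way recovers the original GB2312 bytes.
round_trip = transcode(utf8_bytes, "utf-8", "gb2312")
print(round_trip == gb_bytes)  # True
```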

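5. Determining a string's type (str or unicode)

#coding:utf8
# isinstance(s, str) checks whether s is an ordinary (byte) string
# isinstance(s, unicode) checks whether s is unicode
# Converting a string that is already unicode can fail: unicode(s, encoding)
# raises TypeError, and s.decode(encoding) fails when s contains non-ASCII characters
def u(s, encoding):
    if isinstance(s, unicode):
        return s
    else:
        return unicode(s, encoding)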
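In Python 3 the two types are `bytes` and `str`, so the same guard looks slightly different. The sketch below assumes Python 3; `to_text` and its default encoding are hypothetical choices for illustration.

```python
# Python 3 sketch. to_text is a hypothetical counterpart of u() above.
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it only when it is a byte string."""
    if isinstance(value, str):
        return value  # already text, nothing to decode
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_text("中文"))                              # 中文
print(to_text(b"\xd6\xd0\xce\xc4", "gb2312"))       # 中文
print(to_text(b"\xe4\xb8\xad\xe6\x96\x87"))         # 中文
```

Raising `TypeError` for anything else is a design choice made here so that unexpected inputs such as `None` are reported instead of being passed through.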

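6. Converting Chinese characters to unicode code points

#coding:utf8
# I did not fully understand this method; keeping it here for reference
name = '中国'
name = name.decode('utf8')
print name
tmpname = ""
for c in name:
    c = "%%u%04X" % ord(c)
    tmpname += c
print tmpname

Output:

中国
%u4E2D%u56FD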
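What the loop does: `ord(c)` returns the character's code point, `%04X` formats it as at least four uppercase hex digits, and `%%u` emits a literal `%u` in front. The resulting `%uXXXX` notation resembles what JavaScript's legacy `escape()` function emits for non-Latin characters. A Python 3 sketch of the same conversion, together with its inverse, might look like this; both function names are hypothetical.

```python
# Python 3 sketch. to_percent_u and from_percent_u are hypothetical names.
import re


def to_percent_u(text):
    """Write every character as %uXXXX.

    Only four digits are guaranteed, so this is meant for code points
    up to U+FFFF; characters beyond that would produce five or more digits.
    """
    return "".join("%%u%04X" % ord(ch) for ch in text)


def from_percent_u(escaped):
    """Turn each %uXXXX sequence back into its character; other text is kept."""
    return re.sub(
        r"%u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        escaped,
    )


encoded = to_percent_u("中国")
print(encoded)                  # %u4E2D%u56FD
print(from_percent_u(encoded))  # 中国
```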

Posted on 2012-12-27 16:11 by SunRise_at. Category: 可愛的python (Lovely Python)
