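decode: turn a byte string (str) into unicode
encode: turn unicode into a byte string (str) in a given encoding
Unicode is a character encoding standard; the details are easy to find with a web search (Baidu, for example).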
# coding: UTF-8
# Python 2 code
u = u'汉'
print repr(u)   # u'\u6c49'
s = u.encode('UTF-8')
print repr(s)   # '\xe6\xb1\x89'
u2 = s.decode('UTF-8')
print repr(u2)  # u'\u6c49'

# Calling decode on a unicode object is a mistake
# s2 = u.decode('UTF-8')
# Likewise, calling encode on a str is a mistake
# u2 = s.encode('UTF-8')
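s = u.encode('UTF-8') encodes the unicode object u into a UTF-8 byte string.
u2 = s.decode('UTF-8') decodes the UTF-8 byte string s back into a unicode object.
On Windows with a Chinese locale, the console encoding is usually GBK, so a byte string typed there has to be decoded with .decode('gbk'), as shown below (note that u in this session is a byte str, despite its name):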
>>> u='格式'
>>> u.decode('gbk')
u'\u683c\u5f0f'
>>> u.decode('utf-8')
Traceback (most recent call last):
  File "", line 1, in
    u.decode('utf-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 0: invalid start byte
>>>
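When it is not known ahead of time whether some bytes are UTF-8 or GBK, one possible approach is to try each codec in turn and catch UnicodeDecodeError. The sketch below is only an illustration: decode_with_fallback and its encodings parameter are made-up names, not part of Python, and the code is written so that it should run on both Python 2.7 and Python 3. UTF-8 is tried first because GBK will accept many byte sequences that are really UTF-8 and silently produce the wrong characters.

```python
# -*- coding: utf-8 -*-
# A minimal sketch; decode_with_fallback is a hypothetical helper name.

def decode_with_fallback(raw, encodings=('utf-8', 'gbk')):
    """Try each codec in order and return (text, codec_used)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            # These bytes are not valid in this codec; try the next one.
            continue
    raise ValueError('none of the codecs could decode the input')

text = u'\u683c\u5f0f'             # the two characters from the session above
gbk_bytes = text.encode('gbk')     # the bytes a GBK console would produce
utf8_bytes = text.encode('utf-8')  # the same text encoded as UTF-8

print(decode_with_fallback(gbk_bytes))
print(decode_with_fallback(utf8_bytes))
```

Both calls recover the same unicode text; only the reported codec differs. A guess like this is a fallback at best, so when the source of the bytes is known (a file with a declared encoding, a console with a known code page), passing that encoding explicitly is the safer choice.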