Re: [Question] A question about unicode

Board: Python | Author: pkyosx (Insomnia) | Time: 2007/01/06 03:00 | Votes: 1 (1 push, 0 boo, 0 neutral)

1 comment, 1 participant | Thread 4/18

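I kept getting errors reading a utf-8 file, so in a fit of anger I finally ran some experiments.

The error message:
>>> file("d:\\utf8.txt","r").read().decode('utf8').encode('big5')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

First: on Windows, if you save a file as "Unicode", it is actually saved as UTF-16.
Second: unless you change it, the Python editor defaults to big5.
Third: UTF-16 and UTF-8 files both have 0xFEFF at the very start of the file. I'm guessing it's a header (U+FEFF is the byte order mark, BOM).

Experiment 1:
I put a single character "我" in a Notepad file and saved it once as UTF-8 and once as Unicode (UTF-16).
Read the files and decode:
>>> file("d:\\unicode.txt","r").read().decode('utf-16')
u'\u6211'
>>> file("d:\\utf8.txt","r").read().decode('utf8')
u'\ufeff\u6211'

WTF point 1: the utf-8 file decodes with an extra 0xFEFF in front.

Experiment 2:
Rule out the possibility that Python itself reads utf-8 incorrectly.
>>> file("d:\\unicode.txt","w").write('我'.decode('big5').encode('utf-16'))
>>> file("d:\\utf8.txt","w").write('我'.decode('big5').encode('utf-8'))
>>> file("d:\\unicode.txt","r").read().decode('utf-16')
u'\u6211'
>>> file("d:\\utf8.txt","r").read().decode('utf8')
u'\u6211'

Just in case, I copied the files to another folder and read them again:
>>> file("d:\\wow\\utf8.txt","r").read().decode('utf8')
u'\u6211'
>>> file("d:\\wow\\unicode.txt","r").read().decode('utf-16')
u'\u6211'

Strange, everything is normal!!

Next I opened both files in Notepad and just saved them, without changing the content:
>>> file("d:\\unicode.txt","r").read().decode('utf-16')
u'\u6211'
>>> file("d:\\utf8.txt","r").read().decode('utf8')
u'\ufeff\u6211'

WTF point 2: after saving with Notepad, the 0xFEFF is back.

I verified this directly in UltraEdit's hex mode:
Before saving:
=> FF FE 11 62
After saving: I finally found that the problem is Notepad writing extra stuff when it saves as UTF-8!!
=> FF FE FF FE 11 62
But Notepad saving as Unicode (UTF-16), and UltraEdit saving as UTF-8 or UTF-16, all have no problem:
=> FF FE 11 62

Conclusion: if you're in the habit of opening documents with Notepad, be careful = =" ...TMD

--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 140.113.128.52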
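
A minimal sketch of one way to read such files without the stray 0xFEFF, building on the experiments above. It is written in Python 3 syntax (an assumption; the post uses Python 2, where the 'utf-8-sig' codec has been available since 2.5). The helper name decode_text is hypothetical, and the byte strings are hand-built stand-ins for what Notepad writes, not dumps of the poster's actual files. On disk, the UTF-8 form of the BOM is the three bytes EF BB BF.

```python
import codecs

def decode_text(data, fallback="utf-8"):
    """Hypothetical helper: pick a codec from the BOM, if there is one.

    Only UTF-8 and UTF-16 BOMs are checked; UTF-32 is out of scope here.
    """
    if data.startswith(codecs.BOM_UTF8):
        # 'utf-8-sig' decodes UTF-8 and drops a leading BOM.
        return data.decode("utf-8-sig")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf-16' codec reads the BOM to choose the byte order and strips it.
        return data.decode("utf-16")
    # No BOM found: assume the fallback encoding.
    return data.decode(fallback)

# Stand-in for a Notepad "UTF-8" file containing the character U+6211.
notepad_utf8 = codecs.BOM_UTF8 + "\u6211".encode("utf-8")
# Stand-in for a Notepad "Unicode" (UTF-16 little-endian) file.
notepad_utf16 = codecs.BOM_UTF16_LE + "\u6211".encode("utf-16-le")

# The plain 'utf-8' codec keeps the BOM as a leading U+FEFF character.
plain = notepad_utf8.decode("utf-8")

from_utf8 = decode_text(notepad_utf8)
from_utf16 = decode_text(notepad_utf16)
```

When the encoding is already known to be UTF-8, calling data.decode('utf-8-sig') directly should be enough: it accepts input both with and without the BOM.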