Python 2.7 UnicodeDecodeError: 'ascii' codec can't decode byte

I have been parsing some docx files (UTF-8 encoded XML) that contain special characters (Czech letters). Printing to stdout works fine, but I cannot write the data to a file:

Traceback (most recent call last):
  File "./test.py", line 360, in <module>
    ofile.write(u'\t\t\t\t\t<feat att="writtenForm" val="'+word+u'"/>\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 37: ordinal not in range(128)

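Although I explicitly convert the word variable to the unicode type (type(word) returns unicode) and have tried encoding it with .encode('utf-8'), I am still stuck with this error.

Here is the code as it stands now: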

for word in word_list:
    word = unicode(word)
    #...
    ofile.write(u'\t\t\t\t\t<feat att="writtenForm" val="'+word+u'"/>\n')
    #...

I have also tried the following:

for word in word_list:
    word = word.encode('utf-8')
    #...
    ofile.write(u'\t\t\t\t\t<feat att="writtenForm" val="'+word+u'"/>\n')
    #...

And even a combination of the two:

word = unicode(word)
word = word.encode('utf-8')

I was getting a bit desperate, so I even tried encoding the word variable inside ofile.write():

ofile.write(u'\t\t\t\t\t<feat att="writtenForm" val="'+word.encode('utf-8')+u'"/>\n')

I would appreciate any hints about what I am doing wrong.

4 Answers

0
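ofile is a byte stream, and you are writing a character (unicode) string to it. Python 2 therefore tries to cover for the mistake by implicitly encoding the string to a byte string with the ASCII codec, which is only safe for ASCII characters. Since word contains non-ASCII characters, it fails: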
```
>>> open('/dev/null', 'wb').write(u'ä')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0:
                    ordinal not in range(128)
```
Make ofile a text stream by opening the file with io.open, using a mode such as 'wt' and an explicit encoding:
```
>>> import io
>>> io.open('/dev/null', 'wt', encoding='utf-8').write(u'ä')
1L
```
Alternatively, you can use codecs.open, which has nearly the same interface, or encode every string manually with encode.
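Applied to the loop from the question, the io.open approach could look roughly like the sketch below. It assumes word_list already holds unicode strings; the two sample words and the temporary out.xml path are made up for illustration and are not from the original post.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample data standing in for the question's word_list
# (the Czech words "příliš" and "žluťoučký", written with escapes).
word_list = [u'p\u0159\xedli\u0161', u'\u017elu\u0165ou\u010dk\xfd']

# Hypothetical output path; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), 'out.xml')

# io.open returns a text stream: it accepts unicode and encodes to UTF-8 on write.
with io.open(out_path, 'wt', encoding='utf-8') as ofile:
    for word in word_list:
        ofile.write(u'\t\t\t\t\t<feat att="writtenForm" val="' + word + u'"/>\n')

# Read the file back as raw bytes to inspect the encoded result.
with io.open(out_path, 'rb') as check:
    data = check.read()
```

Because the stream does the encoding, every string passed to write() has to be unicode; no .encode() call is needed inside the loop.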
Answered 2024-05-06T17:50:55+08:00
2
Phihag's answer is correct. I just want to suggest converting the unicode string to a byte string manually, with an explicit encoding:
```
ofile.write((u'\t\t\t\t\t<feat att="writtenForm" val="' +
             word + u'"/>\n').encode('utf-8'))
```
(Maybe you would like to know how to do it with the basic mechanisms rather than with advanced wizardry and black magic like io.open.)
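A small sketch of the manual route, assuming the file is opened in binary mode: the whole line is built as unicode first and encoded exactly once. Mixing already-encoded bytes with u'' literals, as in the question's later attempts, makes Python 2 decode those bytes implicitly with ASCII, which is where a UnicodeDecodeError comes from. The helper name feat_line, the sample word, and the out.xml path are hypothetical.

```python
import os
import tempfile


def feat_line(word):
    """Build one <feat> line as unicode, then encode it to UTF-8 bytes."""
    line = u'\t\t\t\t\t<feat att="writtenForm" val="' + word + u'"/>\n'
    return line.encode('utf-8')


# Hypothetical output path in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'out.xml')

# 'wb' gives a byte stream, so only byte strings are written to it.
with open(out_path, 'wb') as ofile:
    for word in [u'p\u0159\xedli\u0161']:  # sample word, not from the post
        ofile.write(feat_line(word))
```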
Answered 2024-05-06T17:50:55+08:00
11
I ran into a similar error when writing to a Word document (.docx), specifically with the euro symbol (€).
```
x = "€".encode()
```
This gave the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

This is how I solved it:
```
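# Pass the codec explicitly (assumes UTF-8 source bytes); a bare .decode() falls back to ASCII in Python 2
x = "€".decode('utf-8')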
```
I hope this helps!
Answered 2024-05-06T17:50:55+08:00
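The reason an encode() call reports a decode error: in Python 2, "€" is a byte string, and calling .encode() on a byte string first decodes it implicitly with the ASCII codec. The sketch below spells out that hidden step, assuming the source file is saved as UTF-8 so that the euro sign is the bytes E2 82 AC; the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes of the euro sign; this is what "€" holds in a UTF-8 Python 2 source file.
euro_bytes = b'\xe2\x82\xac'

# Python 2 performs this ASCII decode implicitly when .encode() is called
# on a byte string; byte 0xe2 is outside the ASCII range, so it fails.
try:
    euro_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the codec the bytes were written in yields the unicode character.
euro_text = euro_bytes.decode('utf-8')

# Encoding the unicode string again gives back the original bytes.
round_trip = euro_text.encode('utf-8')
```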
2
The best solution I found on Stack Overflow is in this post: How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte". Put the following at the beginning of your code and the default encoding will be utf8:
```
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
```
Answered 2024-05-06T17:50:55+08:00
