Python - 'ascii' codec can't decode byte

105

I'm really confused. I tried to encode, but the error says can't decode...

>>> "你好".encode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

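I know how to avoid the error by putting the "u" prefix on the string. I'm just wondering why the error says "can't decode" when encode is called. What is Python doing under the hood?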

7 Answers

0
```
"你好".encode('utf-8')
```
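encode converts a unicode object to a str (byte string) object. But here you have called it on a str object, because the literal has no u prefix. So Python first has to convert the str to a unicode object. What it does is therefore equivalent to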
```
"你好".decode().encode('utf-8')
```
But the decode fails because the string is not valid ASCII. That is why you get a complaint about not being able to decode.
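To look at the hidden step in isolation, here is a minimal sketch in Python 3 syntax, where text and bytes are separate types and nothing is converted implicitly. The helper name py2_style_encode is hypothetical; it only imitates what Python 2 does when .encode is called on a byte string, assuming the default codec is 'ascii'.

```python
def py2_style_encode(raw, encoding, default="ascii"):
    """Imitate Python 2's str.encode: decode implicitly, then encode."""
    text = raw.decode(default)      # hidden step; this is the one that fails
    return text.encode(encoding)    # the step the caller actually asked for


# The bytes that a Python 2 "你好" literal holds when typed in a UTF-8 terminal.
raw = "你好".encode("utf-8")

try:
    py2_style_encode(raw, "utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```

The printed message should match the wording of the traceback in the question, because the failure comes from the ASCII decode and not from the UTF-8 encode.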
Answered on 2024-04-20T03:15:35+08:00
3
Always encode from unicode to bytes.
In this direction, you get to choose the encoding.
```
>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print _
你好
```
The other way is to decode from bytes to unicode.
In this direction, you have to know what the encoding is.
```
>>> bytes = '\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print bytes
你好
>>> bytes.decode('utf-8')
u'\u4f60\u597d'
>>> print _
你好
```
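For readers on Python 3, the same two directions can be sketched as follows. This is an illustration only, not part of the original session: in Python 3 the text type is str and the byte type is bytes, so the u prefix is not needed.

```python
text = "你好"                       # str: already decoded text
data = text.encode("utf-8")         # encoding: you choose the codec
print(data)                         # b'\xe4\xbd\xa0\xe5\xa5\xbd'

restored = data.decode("utf-8")     # decoding: you must know the codec
print(restored == text)             # True

# Decoding with the wrong codec does not always raise an error.
# latin-1 accepts every byte value, so the result is silently garbled.
wrong = data.decode("latin-1")
print(wrong == text)                # False
```

The last lines show why knowing the encoding matters: a wrong guess can produce garbage without any exception.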
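This cannot be stressed enough. If you want to avoid playing unicode "whack-a-mole", it is important to understand what is happening at the data level. Here it is explained another way:
- A unicode object is already decoded; you never want to call decode on it.
- A bytestring object is already encoded; you never want to call encode on it.
Now, on seeing .encode on a byte string, Python 2 first tries to implicitly convert it to text (a unicode object). Similarly, on seeing .decode on a unicode string, Python 2 implicitly tries to convert it to bytes (a str object).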

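These implicit conversions are why you can get a UnicodeDecodeError when you have called encode. Encoding normally accepts a parameter of type unicode; when it receives a str parameter, the str is implicitly decoded into a unicode object before being re-encoded with the requested encoding. This conversion uses the default 'ascii' decoder†, which gives you a decoding error inside an encoder.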
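The mirrored case, calling .decode on text, fails in the opposite way. The sketch below uses Python 3 syntax and a hypothetical helper, py2_style_decode, to imitate the implicit ASCII encode that Python 2 performs before decoding a unicode object; it assumes the default codec is 'ascii'.

```python
def py2_style_decode(text, encoding, default="ascii"):
    """Imitate Python 2's unicode.decode: encode implicitly, then decode."""
    raw = text.encode(default)      # hidden step; fails for non-ASCII text
    return raw.decode(encoding)     # the step the caller actually asked for


try:
    py2_style_decode("你好", "utf-8")
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```

So a decode call can surface an encode error, for the same reason that an encode call can surface a decode error.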

In fact, in Python 3 the methods str.decode and bytes.encode do not even exist. Their removal was a [controversial] attempt to avoid this common confusion.

† ...or whatever encoding sys.getdefaultencoding() reports; usually this is 'ascii'
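A quick way to check these statements on a Python 3 interpreter is sketched below. The dictionary name checks is just an illustrative choice.

```python
import sys

checks = {
    "str.encode": hasattr(str, "encode"),      # text -> bytes exists
    "str.decode": hasattr(str, "decode"),      # removed in Python 3
    "bytes.decode": hasattr(bytes, "decode"),  # bytes -> text exists
    "bytes.encode": hasattr(bytes, "encode"),  # removed in Python 3
}

for name, present in checks.items():
    print(name, present)

# In Python 3 this reports 'utf-8' rather than 'ascii'.
default_encoding = sys.getdefaultencoding()
print(default_encoding)
```

With only one method per type, there is nothing left for the interpreter to convert implicitly.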
Answered on 2024-04-20T03:15:35+08:00
35
You can try this
```
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
```
Or you can also try the following: add this line at the top of your .py file.
```
# -*- coding: utf-8 -*-
```
Answered on 2024-04-20T03:15:35+08:00

150

You need to tell the interpreter that your string literal is Unicode by prefixing it with a u:

Python 2.7.2 (default, Jan 14 2012, 23:14:09) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "你好".encode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'

Further reading: Unicode HOWTO.

Answered on 2024-04-20T03:15:35+08:00

2
You use u"你好".encode('utf8') to encode a unicode string. But if you want to represent "你好" as text, you should decode it. Like this:
```
"你好".decode("utf8")
```
You will get what you want. It may be worth learning more about encoding and decoding.
Answered on 2024-04-20T03:15:35+08:00
48
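If you are dealing with Unicode, sometimes instead of encode('utf-8') you can also try to ignore the special characters, e.g.
```
# The u prefix matters in Python 2: without it, the implicit ASCII decode fails before 'ignore' applies.
u"你好".encode('ascii','ignore')
```
Or use something.decode('unicode_escape').encode('ascii','ignore') as suggested here.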

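This is not particularly useful in this example, since every character is dropped, but it can work better in other scenarios where only some special characters cannot be converted.

Alternatively, you can consider replacing a particular character using replace().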
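The difference between the lossy options can be sketched as follows, in Python 3 syntax. The sample string is made up for illustration; the error handler names 'ignore', 'replace', and 'backslashreplace' are standard codec error handlers.

```python
# "café 你好", written with escapes so the exact code points are unambiguous.
text = "caf\u00e9 \u4f60\u597d"

ignored = text.encode("ascii", "ignore")            # drops unencodable characters
replaced = text.encode("ascii", "replace")          # one "?" per unencodable character
escaped = text.encode("ascii", "backslashreplace")  # keeps the information as escapes
swapped = text.replace("\u00e9", "e")               # targeted fix for one character

print(ignored)    # b'caf '
print(replaced)   # b'caf? ??'
print(escaped)    # b'caf\\xe9 \\u4f60\\u597d'
print(swapped)
```

All of these except backslashreplace lose information, so they are best reserved for cases where the output really must be ASCII.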
Answered on 2024-04-20T03:15:35+08:00
8
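If you are starting the Python interpreter from a shell on Linux or a similar system (BSD; not sure about Mac), you should also check the default encoding of the shell.

Call locale charmap from the shell (not from the Python interpreter) and you should see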
```
[user@host dir] $ locale charmap
UTF-8
[user@host dir] $
```
If this is not the case, you will see something else, e.g.
```
[user@host dir] $ locale charmap
ANSI_X3.4-1968
[user@host dir] $
```
Python will (at least in some cases, such as mine) inherit the shell's encoding and will not be able to print (some? all?) unicode characters. Python's own default encoding, which you see via sys.getdefaultencoding() and control via sys.setdefaultencoding(), is ignored in this case.
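To see which encodings the interpreter actually picked up from its environment, something like the following sketch can help. The function name describe_encodings is hypothetical; the calls it wraps are standard library functions, and the values they return depend on the system and on the Python version.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings relevant to printing text from this process."""
    return {
        # Codec used for implicit conversions (Python 2) or as the str default.
        "default": sys.getdefaultencoding(),
        # Codec used when printing; may be None if stdout has been replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Codec derived from the locale settings inherited from the shell.
        "locale": locale.getpreferredencoding(False),
    }


for name, value in describe_encodings().items():
    print(name, "=", value)
```

If the stdout or locale entry reports an ASCII variant such as ANSI_X3.4-1968, printing non-ASCII text is likely to fail regardless of the default entry.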

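If you find yourself with this problem, you can fix it with
```
[user@host dir] $ export LC_CTYPE="en_US.UTF-8"
[user@host dir] $ locale charmap
UTF-8
[user@host dir] $
```
(Or choose whichever locale you want instead of en_US; it has to be one that is installed on the system, which locale -a lists.) You can also edit /etc/locale.conf (or whichever file governs the locale definition on your system) to correct this.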
Answered on 2024-04-20T03:15:35+08:00
