What is the quickest way to HTTP GET in Python?

478

What is the quickest way to HTTP GET in Python if I know the content will be a string? I am searching the documentation for a quick one-liner like:

contents = url.get("http://example.com/foo/bar")

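But all I can find using Google are httplib and urllib, and I am unable to find a shortcut in those libraries.

Does standard Python 2.5 have a shortcut in some form like the above, or should I write a function url_get?

I would prefer not to capture the output of shelling out to wget or curl.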

10 Answers

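theller's wget solution is really useful, but I found that it does not print the progress while the download is running. It works perfectly if you add one line after the print statement in reporthook.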

import sys, urllib

def reporthook(a, b, c):
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
    sys.stdout.flush()
for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print
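
For readers on Python 3: the three reporthook arguments can be read as the number of blocks transferred so far, the block size in bytes, and the total size in bytes. Below is a minimal sketch of the same percentage calculation. `format_progress` is a hypothetical helper name, and the bytes-only fallback is my own assumption about how to handle a total size that is not known (the urlretrieve docs note it may be -1), since dividing by it would give nonsense.

```python
import sys

def format_progress(block_count, block_size, total_size):
    """Return a progress string for urlretrieve's reporthook arguments."""
    done = block_count * block_size
    if total_size <= 0:
        # Total size unknown: report bytes only instead of dividing by it.
        return "%d bytes" % done
    # The last block may overshoot the total, so cap at 100%.
    percent = min(100.0, done / total_size * 100)
    return "%5.1f%% of %d bytes" % (percent, total_size)

def reporthook(block_count, block_size, total_size):
    # "\r" returns to the start of the line so each update overwrites the last.
    sys.stdout.write("\r" + format_progress(block_count, block_size, total_size))
    sys.stdout.flush()
```

It would be passed the same way as above, e.g. as the third argument of urllib.request.urlretrieve.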

Answered 2024-04-26T15:11:13+08:00

Here is a wget script in Python:

# From python cookbook, 2nd edition, page 487
import sys, urllib

def reporthook(a, b, c):
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print

Answered 2024-04-26T15:11:13+08:00

3
If you want the httplib2 solution to be a one-liner, consider instantiating an anonymous Http object:
```
import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")
```
Answered 2024-04-26T15:11:13+08:00
6
Have a look at httplib2, which, alongside a lot of very useful features, provides exactly what you want.
```
import httplib2

resp, content = httplib2.Http().request("http://example.com/foo/bar")
```
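Here content is the response body (as a string), and resp contains the status and response headers.

It is not included in a standard Python install (although it only requires standard Python), but it is definitely worth checking out.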
Answered 2024-04-26T15:11:13+08:00
3
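If you are working with HTTP APIs specifically, there are also more convenient choices such as Nap.

For example, here is how to get gists from GitHub since May 1st, 2014: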
```
from nap.url import Url
api = Url('https://api.github.com')

gists = api.join('gists')
response = gists.get(params={'since': '2014-05-01T00:00:00Z'})
print(response.json())
```
More examples: https://github.com/kimmobrunfeldt/nap#examples
Answered 2024-04-26T15:11:13+08:00
702
This solution works (for me) without any additional imports, and it also works with https:
```
try:
    import urllib2 as urlreq # Python 2.x
except ImportError:
    import urllib.request as urlreq # Python 3.x
req = urlreq.Request("http://example.com/foo/bar")
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36')
urlreq.urlopen(req).read()
```
I often have trouble fetching content when no "User-Agent" is specified in the headers. The request is then usually rejected with something like: urllib2.HTTPError: HTTP Error 403: Forbidden or urllib.error.HTTPError: HTTP Error 403: Forbidden.
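If you would rather inspect such a 403 than let it propagate, here is a minimal Python 3 sketch. `fetch` and `DEFAULT_USER_AGENT` are hypothetical names I made up for illustration, and returning the error status instead of raising is just one possible design, not how urllib itself behaves.
```python
import urllib.error
import urllib.request

# Hypothetical User-Agent string; any value the server accepts would do.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-fetcher/0.1)"

def fetch(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Return (status, body_bytes); HTTP errors are returned, not raised."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # e.g. 403 Forbidden when the server rejects the request
        return err.code, err.read()
```
With this, `status, body = fetch("http://example.com/foo/bar")` lets the caller check the status before using the body.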
Answered 2024-04-26T15:11:13+08:00

324

You can use a library called requests.

import requests
r = requests.get("http://example.com/foo/bar")

It is quite easy. Then you can do this:

>>> print(r.status_code)
>>> print(r.headers)
>>> print(r.content)

Answered 2024-04-26T15:11:13+08:00

How to also send headers

Python 3:

import urllib.request
contents = urllib.request.urlopen(urllib.request.Request(
    "https://api.github.com/repos/cirosantilli/linux-kernel-module-cheat/releases/latest",
    headers={"Accept": "application/vnd.github.full+json"}
)).read()
print(contents)

Python 2:

import urllib2
contents = urllib2.urlopen(urllib2.Request(
    "https://api.github.com",
    headers={"Accept": "application/vnd.github.full+json"}
)).read()
print(contents)

Answered 2024-04-26T15:11:13+08:00

Excellent solutions, Xuan and Theller.

To make it work with Python 3, make the following changes:

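import sys, urllib.request

def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size in bytes, c: total size in bytes
    # end="" keeps the cursor on the same line so "\r" can overwrite the progress
    print("% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c), end="")
    sys.stdout.flush()
for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print(url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
print()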

Also, the URL you enter should be preceded by "http://", otherwise it returns an unknown url type error.
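
If the URLs come from the command line, one way to avoid that error is to add the scheme when it is missing. This is only a sketch: `ensure_scheme` is a hypothetical helper, and defaulting to "http" is an assumption (you may well prefer "https").

```python
def ensure_scheme(url, default_scheme="http"):
    """Prefix url with default_scheme:// when it has no scheme of its own."""
    if "://" in url:
        # Already looks like scheme://..., leave it untouched.
        return url
    return "%s://%s" % (default_scheme, url)
```

In the loop above, the download call could then use `ensure_scheme(url)` instead of `url`.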

Answered 2024-04-26T15:11:13+08:00

Python 2.x:

import urllib2
contents = urllib2.urlopen("http://example.com/foo/bar").read()

Python 3.x:

import urllib.request
contents = urllib.request.urlopen("http://example.com/foo/bar").read()

See the documentation for urllib.request and read.

How is that?
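
One thing to keep in mind on Python 3: read() returns bytes, not str, so if you want a string you still have to decode. Here is a minimal sketch of the url_get function the question mentions. Falling back to UTF-8 when the server declares no charset is my assumption, not something urllib does for you.

```python
import urllib.request

def url_get(url, default_encoding="utf-8", timeout=10):
    """GET url and return the response body decoded to str."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Charset from the Content-Type header, if the server declared one.
        charset = resp.headers.get_content_charset() or default_encoding
        return resp.read().decode(charset)
```

Usage is then the one-liner from the question: `contents = url_get("http://example.com/foo/bar")`.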

Answered 2024-04-26T15:11:13+08:00
