Web Crawler - Ignore the robots.txt file? - Java 学习之路

Some servers have a robots.txt file to stop web crawlers from crawling their sites. Is there a way to make a web crawler ignore the robots.txt file? I am using Mechanize for Python.

2 Answers

The mechanize documentation has the following sample code:

import mechanize

br = mechanize.Browser()
....
# Ignore robots.txt.  Do not do this without thought and consideration.
br.set_handle_robots(False)

This does exactly what you want.
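For context on what gets switched off, here is a rough sketch of the kind of check a crawler makes when it honors robots.txt. It uses the standard library's urllib.robotparser rather than mechanize itself, and the robots.txt content, the "my-crawler" user agent, the example.com URLs and the is_allowed helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A path under /private/ is disallowed for every user agent.
print(is_allowed(ROBOTS_TXT, "my-crawler", "http://example.com/private/page.html"))
# Any other path is not covered by a Disallow rule.
print(is_allowed(ROBOTS_TXT, "my-crawler", "http://example.com/index.html"))
```

With robots handling turned off, a crawler skips a check like this and requests the page regardless, which is why the comment in the documentation asks for thought and consideration first.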

Answered on 2024-05-21T07:43:32+08:00