"URL blocked by robots.txt" message in Google Webmaster Tools

I have a WordPress site on the root domain. I have now added a forum in a subfolder, mydomain/forum, and it generates a sitemap at mydomain/forum/sitemap_index.xml. After I submitted that sitemap to Google, it looks like Google cannot access the child sitemaps and reports "URL blocked by robots.txt" - Value: mydomain/forum/sitemap-forums.xml?page=1 --- Value: mydomain/forum/sitemap-index.xml?page=1.

Here is my robots.txt:

```
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

Sitemap: mydomain/sitemap_index.xml
Sitemap: mydomain/forum/sitemap_index.xml
```

What should I add to robots.txt? Any help would be appreciated. Thanks in advance.

1 Answer

1
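Just to clarify, I'm assuming 'mydomain' in your examples is a stand-in for the scheme plus the fully qualified domain name, right? (e.g. "http://www.whatever.com", not "whatever.com" or "www.whatever.com") I think that must be the case, because you show it in the same format in the Google error message.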

The error message suggests that Google is getting the URL from somewhere other than your robots.txt file. The robots.txt file lists the sitemap URL as:
```
mydomain/forum/sitemap_index.xml
```
But the error message shows that Google is trying to load this URL:
```
mydomain/forum/sitemap-index.xml?page=1
```
The second URL is blocked because your robots.txt file blocks any URL that contains a question mark:
```
Disallow: /*?*
Disallow: /*?
```
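(By the way, these two lines do exactly the same thing, since a trailing wildcard is implied. You can safely remove the first one.) But Google should still be able to read the sitemap file using the simpler URL, so the pages will probably still get crawled. If you really want to get rid of the error message, you can always add: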
```
Allow: /forum/sitemap-index.xml?page=1
```
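This overrides the Disallow for that one sitemap URL only, because when an Allow and a Disallow both match, Google applies the most specific (longest) rule. (This works for Google at least; YMMV with any other search engine.)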
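If you want to check this before touching the live file, here is a rough Python sketch of the matching logic as I understand Google's documented behaviour: `*` matches any run of characters, a trailing `$` anchors the end, and the longest matching pattern wins, with Allow winning ties. The names `pattern_to_regex`, `is_allowed`, `RULES` and `PATCHED` are made up for this example, and it only models the `User-agent: *` group from the question, so treat it as an approximation rather than Google's actual parser.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a robots.txt path pattern into a regex (hypothetical helper).

    '*' becomes '.*', a trailing '$' anchors the end of the URL,
    everything else is matched literally. Patterns match as prefixes.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Apply longest-match precedence; an Allow wins a length tie."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The "User-agent: *" group from the question.
RULES = [
    ("Disallow", "/cgi-bin"),
    ("Disallow", "/wp-admin"),
    ("Disallow", "/wp-includes"),
    ("Disallow", "/wp-content/plugins"),
    ("Disallow", "/wp-content/cache"),
    ("Disallow", "/wp-content/themes"),
    ("Disallow", "/trackback"),
    ("Disallow", "/feed"),
    ("Disallow", "/comments"),
    ("Disallow", "/category/*/*"),
    ("Disallow", "*/trackback"),
    ("Disallow", "*/feed"),
    ("Disallow", "*/comments"),
    ("Disallow", "/*?*"),
    ("Disallow", "/*?"),
    ("Allow", "/wp-content/uploads"),
]

# Same rules plus the Allow line suggested above.
PATCHED = RULES + [("Allow", "/forum/sitemap-index.xml?page=1")]

print(is_allowed(RULES, "/forum/sitemap_index.xml"))            # True
print(is_allowed(RULES, "/forum/sitemap-index.xml?page=1"))     # False
print(is_allowed(PATCHED, "/forum/sitemap-index.xml?page=1"))   # True
print(is_allowed(PATCHED, "/forum/sitemap-forums.xml?page=1"))  # False
```

The last line is worth noting: under this model the other URL from your error message, /forum/sitemap-forums.xml?page=1, stays blocked, so it would need its own Allow line if you want that message gone too.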
Answered 2024-05-04T07:55:42+08:00
