Classic WordPress robots.txt Writing Guide: Essential Settings

Updated 2024-09-20 · 863 B · TXT

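The robots.txt file plays a central role in managing search engine optimization (SEO) and crawler access on a WordPress site. It tells search engine crawlers which paths they may and may not request, which helps keep duplicate and low-value pages out of the crawl. The file is advisory: well-behaved crawlers honor it, but it does not secure private data on its own. A classic robots.txt for a WordPress site follows these rules:

1. **General rules for all crawlers**:
   - `User-agent: *` uses the wildcard to address every crawler that has no more specific group of its own. The directives that follow it apply to those crawlers.
   - `Disallow: /search.html` blocks the search results page so that crawlers do not pick up duplicate content.
   - `/404.shtml` and `/wp-admin/` are blocked because the error page and the admin back end do not need to be indexed.
   - `/wp-` and `/wp-includes/` cover the WordPress core files. Rules match by path prefix, so `/wp-` alone already covers every path that begins with `/wp-`, including `/wp-admin/` and `/wp-includes/`.
   - `/index.php?` and `/?` block dynamically generated URLs with query strings, which tend to duplicate other pages.
   - `Allow: /wp-content/uploads/` carves an exception out of the `/wp-` rule so that uploaded files, which may include user-generated content, stay crawlable.
2. **Rules for specific crawlers**:
   - The Googlebot group blocks URLs ending in `.php`, `.js`, `.inc`, `.css`, `.gz`, `.wmv`, `.cgi`, and `.xhtml` (written as `/*.php$` and so on) to keep script and archive files out of the index. Googlebot follows only this group and not the `*` group, because a crawler obeys the most specific group that names it.
   - `Disallow: /*?*` blocks every URL that contains a query string, which avoids crawling parameterized duplicates of the same page.
   - `duggmirror` is a specific crawler that the file bans from the whole site. Banning a crawler everywhere takes `Disallow: /`; an empty `Disallow:` blocks nothing.
3. **Google image crawler**:
   - For Googlebot-Image, the empty `Disallow:` directive lets the crawler fetch everything, so that images are indexed properly.
4. **Advertising and statistics tracking**:
   - The AdSense crawler (`Mediapartners-Google*`) also gets an empty `Disallow:`. An empty value excludes nothing, so the ad crawler can read the entire site.
   - `Allow: /*` explicitly permits every path for the crawler named in its group. It does not affect other crawlers.

This classic WordPress robots.txt gives a basic and flexible way to steer crawler behavior, keeping low-value pages out of search results while letting search engines index the pages that matter. Adjust it to the needs of the individual site, and review it regularly so that it keeps up with changes in the site structure.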
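
The description leans on the difference between an empty `Disallow:` and `Disallow: /`. The following minimal sketch checks that difference with Python's standard `urllib.robotparser`. The crawler names `blocked-bot` and `open-bot` and the rules string are hypothetical, written only for this illustration; they are not taken from the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not part of the original file).
RULES = """\
User-agent: blocked-bot
Disallow: /

User-agent: open-bot
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "Disallow: /" blocks every path for the named crawler.
print(parser.can_fetch("blocked-bot", "/any/page.html"))  # False

# An empty "Disallow:" blocks nothing, so the crawler may fetch everything.
print(parser.can_fetch("open-bot", "/any/page.html"))     # True

# A crawler that no group names, with no "*" group present, is unrestricted.
print(parser.can_fetch("other-bot", "/any/page.html"))    # True
```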

User-agent: *
Disallow: /search.html
Disallow: /404.shtml
Disallow: /wp-admin/
Disallow: /wp-
Disallow: /wp-includes/
Disallow: /index.php?
Disallow: /?
Allow: /wp-content/uploads/
Disallow: /feed
Disallow: /trackback
Disallow: /date/
Disallow: /page/
Sitemap: http://www.yourURL.com/sitemap_baidu.xml

User-agent: Googlebot
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$

# disallow all files with ? in url
Disallow: /*?*
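
To see how the two groups listed above behave, here is a minimal sketch of the matching rules that Google documents and that RFC 9309 describes: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and `Allow` wins a tie. The function names `pattern_to_regex` and `is_allowed` are assumptions made for this example, and the sketch is not a complete parser. Python's standard `urllib.robotparser` matches plain path prefixes and does not expand `*` or `$`, so the sketch implements the matching itself.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """Return True if url_path may be crawled under one group's rules.

    rules is a list of (directive, pattern) pairs from a single group.
    The longest matching pattern wins; Allow wins when lengths are equal.
    """
    best_len, allowed = -1, True  # no matching rule means crawling is allowed
    for directive, pattern in rules:
        if not pattern:  # an empty value matches nothing
            continue
        if pattern_to_regex(pattern).match(url_path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# Rules transcribed from the "User-agent: *" group in the listing.
STAR_GROUP = [
    ("Disallow", "/search.html"), ("Disallow", "/404.shtml"),
    ("Disallow", "/wp-admin/"), ("Disallow", "/wp-"),
    ("Disallow", "/wp-includes/"), ("Disallow", "/index.php?"),
    ("Disallow", "/?"), ("Allow", "/wp-content/uploads/"),
    ("Disallow", "/feed"), ("Disallow", "/trackback"),
    ("Disallow", "/date/"), ("Disallow", "/page/"),
]

# Rules transcribed from the visible part of the "User-agent: Googlebot" group.
GOOGLEBOT_GROUP = [
    ("Disallow", "/*.php$"), ("Disallow", "/*.js$"), ("Disallow", "/*.inc$"),
    ("Disallow", "/*.css$"), ("Disallow", "/*.gz$"), ("Disallow", "/*.wmv$"),
    ("Disallow", "/*.cgi$"), ("Disallow", "/*.xhtml$"), ("Disallow", "/*?*"),
]

# The longer Allow pattern beats the shorter "/wp-" Disallow.
print(is_allowed(STAR_GROUP, "/wp-content/uploads/2024/photo.png"))  # True
print(is_allowed(STAR_GROUP, "/wp-content/themes/style.css"))        # False
print(is_allowed(STAR_GROUP, "/?s=robots"))                          # False
print(is_allowed(GOOGLEBOT_GROUP, "/index.php"))                     # False
print(is_allowed(GOOGLEBOT_GROUP, "/index.php?p=1"))                 # False
# The Googlebot group does not inherit the "*" group's /wp-admin/ rule.
print(is_allowed(GOOGLEBOT_GROUP, "/wp-admin/"))                     # True
```

The last call illustrates why a crawler-specific group should repeat any general rules it still needs: a crawler that finds a group naming it ignores the `*` group.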


liuweok
