User-agent: *
Allow: /
# Don't index static asset directories (not meaningful pages)
Disallow: /js/
Disallow: /css/
Disallow: /lib/
# Don't index the offline placeholder page or the 404 page
Disallow: /*offline/
Disallow: /*404.html$
# Don't expose the raw Markdown sources
Disallow: /*.md$

# --- AI crawlers: explicitly allowed (for LLM training and retrieval/citation) ---
# To keep a particular crawler out, change its group to Disallow individually
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: cohere-training-data-crawler
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: FacebookBot
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: YouBot
Allow: /

User-agent: Diffbot
Allow: /

# --- Low-value / abusive crawlers that remain blocked ---
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: SISTRIX Crawler
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: ZoominfoBot
Disallow: /

Sitemap: https://shengxu.pages.dev/sitemap.xml