
Combination of Crawl-delay and badbot Disallow results in blocking of Googlebot #51

@mojmirdurik

Description

For example, Googlebot gets blocked by the following robots.txt (you can check it in Google's robots.txt testing tool):

# Slow down bots
User-agent: *
Crawl-delay: 10

# Disallow: Badbot
User-agent: badbot
Disallow: /

# allow explicitly all other bots
User-agent: *
Disallow:
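
As a cross-check outside Google's tool, here is a minimal sketch using Python's standard urllib.robotparser (my own assumption for illustration; it follows the original REP grouping rules rather than Googlebot's behaviour, and example.com is just a placeholder host):

from urllib import robotparser

# The problematic robots.txt from above, verbatim.
ROBOTS_TXT = """\
# Slow down bots
User-agent: *
Crawl-delay: 10

# Disallow: Badbot
User-agent: badbot
Disallow: /

# allow explicitly all other bots
User-agent: *
Disallow:
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check the site root for a few user agents. With this parser, Googlebot
# and any other regular bot come out allowed and only badbot is blocked,
# which is the behaviour I would expect from Googlebot as well.
for agent in ("Googlebot", "badbot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "https://example.com/"))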

If you remove the Crawl-delay directive, Googlebot passes. This works:

# Disallow: Badbot
User-agent: badbot
Disallow: /

# allow explicitly all other bots
User-agent: *
Disallow:

And so does this:

# Disallow: Badbot
User-agent: badbot
Disallow: /

If you want to use the Crawl-delay directive without blocking Googlebot, you have to add an Allow directive:

# Slow down bots
User-agent: *
Crawl-delay: 10

# Disallow: Badbot
User-agent: badbot
Disallow: /

# allow explicitly all other bots
User-agent: *
Disallow:

# allow explicitly all other bots (supported only by google and bing)
User-agent: *
Allow: /
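
The same kind of sketch (same assumptions as above, again only an illustration) for the variant with the explicit Allow rule, this time also reading the advertised crawl delay via crawl_delay(), which urllib.robotparser exposes since Python 3.6:

from urllib import robotparser

# The robots.txt from above, with the explicit "Allow: /" group appended.
ROBOTS_TXT = """\
# Slow down bots
User-agent: *
Crawl-delay: 10

# Disallow: Badbot
User-agent: badbot
Disallow: /

# allow explicitly all other bots
User-agent: *
Disallow:

# allow explicitly all other bots (supported only by google and bing)
User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Fetch permission for the site root plus the crawl delay each agent would see.
for agent in ("Googlebot", "badbot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "https://example.com/"),
          parser.crawl_delay(agent))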

Both Crawl-delay and Allow are unofficial directives. Crawl-delay is widely supported (except by Googlebot). Allow is supported only by Googlebot and Bingbot (AFAIK). Normally, Googlebot should be allowed by every robots.txt shown above. For example, if you choose Adsbot-Google in the Google testing tool, it passes for all of them, while all other Google bots fail in the same way. We first noticed this unexpected behaviour at the end of 2021.
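
To illustrate what a crawler that honours Crawl-delay is expected to do with the value (a purely generic sketch, not tied to any particular bot; the URLs are placeholders):

import time
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse("""\
User-agent: *
Crawl-delay: 10
""".splitlines())

delay = parser.crawl_delay("SomeOtherBot") or 0  # 10 seconds here

for url in ("https://example.com/a", "https://example.com/b"):
    # A fetch of `url` would go here; a polite, Crawl-delay-aware bot waits
    # between requests, while Googlebot ignores this directive entirely.
    time.sleep(delay)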

Is this a mistake in Googlebot's parsing of robots.txt, or am I just missing something?
