robots.txt cheatsheet
Sitemap: https://example.com/sitemap_index.xml

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
examples:
User-agent: *
Disallow: /admin/
Disallow: /assets/components/
Disallow: /core/
Disallow: /connect/
Disallow: /index.php
Disallow: /*?
Allow: /*.js
Allow: /*.css
Host: example.com
Sitemap: http://example.com/sitemap.xml

Note: Host was a Yandex-specific directive and is now deprecated; most crawlers ignore it.
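Python's standard-library urllib.robotparser can sanity-check simple prefix rules like the ones above. It implements only the original spec, so the * and $ wildcard extensions are not understood; the rules and URLs below are illustrative:

```python
from urllib import robotparser

# Illustrative prefix rules mirroring the example above.
lines = """\
User-agent: *
Disallow: /admin/
Disallow: /core/
Disallow: /connect/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(lines)

print(rp.can_fetch("*", "/admin/login"))  # False: /admin/ is disallowed
print(rp.can_fetch("*", "/blog/post"))    # True: no rule matches
```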
// * wildcard
The * wildcard character matches any sequence of characters. This is useful whenever there are clear URL patterns you want to disallow, such as filters and parameters.
// $ wildcard
The $ wildcard character denotes the end of a URL. This is useful for matching specific file types, such as .pdf.
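Neither wildcard is part of the original robots.txt standard, but the major search engines support both. The matching semantics can be sketched by translating a pattern into a regular expression (rule_matches and the sample URLs below are illustrative, not a library API):

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """True if a robots.txt path pattern matches the given URL path.

    '*' matches any sequence of characters; a trailing '$' anchors
    the match at the end of the URL. Otherwise it is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each '*' into '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# '*' matches any sequence of characters
print(rule_matches("/*?", "/search?kw=shoes"))           # True
# '$' anchors the pattern at the end of the URL...
print(rule_matches("/*.pdf$", "/files/report.pdf"))      # True
# ...so a .pdf URL with parameters appended is NOT matched
print(rule_matches("/*.pdf$", "/files/report.pdf?v=2"))  # False
```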
Block search engines from accessing any URL that has a ? in it:
User-agent: *
Disallow: /*?

Block search engines from crawling any search results page URL (query?kw=):
User-agent: *
Disallow: /query?kw=

Block search engines from crawling any URL with the ?color= parameter in it, except for ?color=blue:
User-agent: *
Disallow: /*?color=
Allow: /*?color=blue

Block search engines from crawling comment feeds in WordPress:
User-agent: *
Disallow: /comments/feed/

Block search engines from crawling URLs in a common child directory:
User-agent: *
Disallow: /*/child/

Block search engines from crawling URLs in a specific directory that contain three or more dashes:
User-agent: *
Disallow: /directory/*-*-*-

Block search engines from crawling any URL that ends with ".pdf". Note: if parameters are appended to the URL, this rule will not prevent crawling, since the URL no longer ends with ".pdf":
User-agent: *
Disallow: /*.pdf$
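Several examples above pair a broad Disallow with a narrower Allow. Under Google's documented precedence, the most specific (longest) matching rule wins, and on a tie Allow beats Disallow. A self-contained sketch of that model (the helper names and sample rules are illustrative):

```python
import re

def _matches(pattern: str, path: str) -> bool:
    # '*' matches any sequence; a trailing '$' anchors at the URL end.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "^" + ".*".join(re.escape(p) for p in body.split("*"))
    regex += "$" if anchored else ""
    return re.match(regex, path) is not None

def is_allowed(rules, path):
    """rules: list of ('allow'|'disallow', pattern) pairs.

    The longest matching pattern wins; on a length tie, Allow beats
    Disallow. A path matched by no rule is allowed.
    """
    best = None  # (pattern length, is_allow) -- tuple order encodes precedence
    for kind, pattern in rules:
        if _matches(pattern, path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("disallow", "/*?color="), ("allow", "/*?color=blue")]
print(is_allowed(rules, "/shirts?color=red"))   # False: only the Disallow matches
print(is_allowed(rules, "/shirts?color=blue"))  # True: the longer Allow rule wins
```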