Remove URLs using a robots.txt file
It’s likely that you have pages on your website that you don’t want indexed by search engines. These could be pages like your privacy policy, or simply pages you don’t want surfaced to the public. If a page is linked and publicly accessible on your website, you can block crawlers from it using a robots.txt file.
Creating a robots.txt File
A robots.txt file is a standard text file that can be created with any text editor, such as Notepad, and saved with the .txt extension. Upload the robots.txt file to the root of your website so search engines can find it at https://www.domain.com/robots.txt.
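For example, a minimal robots.txt file that allows every crawler to access everything looks like this (an empty Disallow value blocks nothing, so this is a safe starting point before adding rules):
User-agent: *
Disallow: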
Denying Bots from Indexing Using a robots.txt File
To deny bots from accessing an entire website:
User-agent: *
Disallow: /
To deny all bots from indexing a specific page:
User-agent: *
Disallow: /page.html
To deny all bots from indexing a folder:
User-agent: *
Disallow: /folder/
To deny all bots from indexing any URL containing ‘monkey’ using a wildcard:
User-agent: *
Disallow: /*monkey
To deny dynamic URLs that contain a ‘?’, use the following method:
User-agent: *
Disallow: /*?
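Google also supports a $ wildcard that anchors a rule to the end of a URL. For example, to block only URLs that end with a question mark, rather than any URL containing one:
User-agent: *
Disallow: /*?$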
To specify which bot you want to block, change the User-agent line. For example, to deny Googlebot:
User-agent: Googlebot
Disallow: /page.html
Disallow: /folder/
Disallow: /*monkey
Disallow: /*?
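Rules are grouped by user-agent, with a blank line separating each group, so bot-specific and catch-all rules can live in the same file. A sketch (the paths here are placeholders):
User-agent: Googlebot
Disallow: /folder/

User-agent: *
Disallow: /page.html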
How to Remove a Page from Google That Has Been Indexed
To remove a page that has already been indexed by Google, add the noindex directive in the HTML <meta> tag of the page or serve it via an X-Robots-Tag HTTP header. Note that crawlers must be able to fetch the page to see the directive, so the page must not also be blocked in robots.txt. You can then log in to Google Search Console, go to the Removals tool, and request removal of the URL.
Example using the <meta> tag:
<meta name="robots" content="noindex">
Example using an X-Robots-Tag via the HTTP header:
X-Robots-Tag: noindex
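The header approach is especially useful for non-HTML resources such as PDFs or images, which have no HTML where a <meta> tag could go.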
When Not to Use a robots.txt File
Since anyone can view your robots.txt file, don’t use it to hide a private page: listing the URL there actually advertises its existence. There’s also no need to block a page that isn’t linked from your website, since bots can’t find it in the first place.
Another issue is that not all dynamic URLs follow a pattern that can be matched easily by robots.txt rules. For such cases, you can use another method: setting an X-Robots-Tag header.
Using X-Robots-Tag
Setting an X-Robots-Tag is a more discreet way of blocking a URL, since the directive travels in the HTTP response rather than sitting in a publicly readable file. You can verify the header with any tool that shows HTTP request and response headers.
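For example, with curl (the -I flag requests only the response headers; the URL is this article’s placeholder domain):
curl -I https://www.domain.com/page.html
If the header is set correctly, X-Robots-Tag: noindex will appear among the response headers.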
With PHP, you can tell bots not to index, archive, or show a snippet of the page, and not to follow its links:
header("X-Robots-Tag: noindex, nofollow, noarchive, nosnippet", true);
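As with any call to header(), this must run before the script sends any output; otherwise PHP will warn that headers have already been sent.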
Using a .htaccess file, you can do the same using FilesMatch:
<FilesMatch "page\.html">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>
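Note that the Header directive requires Apache’s mod_headers module to be enabled, and the FilesMatch pattern is a regular expression, which is why the dot in the filename is escaped.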
By using these methods, you can better manage which pages of your website are indexed by search engines and keep private or sensitive information from being accessible through search results.