Robots.txt: What It Is, Why You Need It, How to Set It Up

Getting too many requests from Google's crawlers? It's time to start using robots.txt (also known as the robots exclusion protocol or standard). Adding this small text file to your site is an easy, quick way to up your SEO game.

What is Robots.txt?

Robots.txt is a file that tells Google's crawlers which pages they can or can't request from your website. A robots.txt file does not hide your pages from the search engine results pages (SERPs); it just tells the crawlers how to go through your website when selecting content to show users.

What are Google crawlers, or crawler traffic? These robots go through your website and pick up the headings and keywords that people search for. It isn't just Google, either; all the major search engines respect these files. Unfortunately, spammers run crawlers too, but we're only talking about the good guys here.

Why You Need Robots.txt

A robots.txt file is a way to stop an overwhelming number of requests and to manage which pages get crawled and considered for the SERPs. Most importantly, it's a great way to enhance your SEO strategy.

Your site may be blocking the robots from crawling it without you realizing it, leaving you clueless as to why you're not showing up in the SERPs. This could be why! You want the crawlers to go through your site and pick up your pages and posts so that others can find them.

You don't want Google to crawl every single URL, though, such as your backend URLs. Even if you don't think you have any to worry about, it's worth checking, and worth using robots.txt to keep irrelevant pages from being crawled. Why? Because Google caps how many of your pages it will crawl in a given period, a limit known as your crawl budget. You can't publish a zillion pages and say, "now I'll definitely be ranked at the top," because the crawlers won't necessarily get through them all. In fact, padding your site with low-value pages can hurt you, since the bots spend their limited time on those instead of the pages you actually want indexed. The cap isn't fixed, either; Google raises or lowers it based on how quickly your site responds and how much demand there is for your content.

There's actually enough to say about crawl limits to fill an entire blog post. If you want to learn all the ins and outs from Google itself, check out its blog post on crawl budget. Ultimately, though, you want the Google robot to use its time on your website wisely, meaning it crawls only your top pages.

How to Set Up Robots.txt

To see if you already have a robots.txt file on your site, type your site's URL into your browser and add /robots.txt to the end. This may turn up the contents of the file, an empty page, or a 404 error. You can also use this Google tool to check it.
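For instance, using example.com as a stand-in for your own domain, the address to check would be:

https://example.com/robots.txt

If a plain text page with lines like User-agent and Disallow appears, you already have one.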

Whatever you find, you can change it if you need to. If something does turn up, it's likely the default settings from when you created your website. But if the method above shows you don't have a robots.txt file, then you'll need to set one up.

If you do have a robots.txt file, it'll be in your root directory. Log in to your hosting account's website, then go to the file manager or FTP section. When you find the file, delete the text inside it but keep the file itself. If you're using WordPress, something may come up when you visit your robots.txt address in the browser even though there's no file in the root directory. That's because WordPress generates a virtual robots.txt when a real one doesn't exist, so you still need to create an actual file there.
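For reference, the virtual file a default WordPress install serves usually looks something like this (the exact contents can vary with your version and plugins):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php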

When creating a robots.txt file, use a plain text editor, and only a plain text editor; word processors can add formatting that breaks the file. The simplest starting point for your robots.txt file looks like this:

User-agent: *
Disallow:

With this, your entire website is free to be crawled. Now you want to add paths to the Disallow line to tell the crawlers what not to look at. For instance, you may not want the search engine robots to go through your login page or your thank-you-for-signing-up page. That might look like:

User-agent: *
Disallow: /wp-admin/

Or

User-agent: *
Disallow: /thank-you/
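You can also stack several Disallow lines under one User-agent group, and point the crawlers at your sitemap while you're at it (the sitemap URL below is just a placeholder for your own):

User-agent: *
Disallow: /wp-admin/
Disallow: /thank-you/
Sitemap: https://example.com/sitemap.xml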

You may also see advice to keep a page off the SERPs by adding a Noindex line to your robots.txt file, like so:

User-agent: *
Disallow: /thank-you/
Noindex: /thank-you/

Don't rely on it: Google no longer supports the Noindex rule in robots.txt, and a Disallow on its own doesn't guarantee the page stays out of search results if other sites link to it. To truly keep a page out of the SERPs, add a noindex meta tag (<meta name="robots" content="noindex">) to the page itself and leave it crawlable so Google can see the tag.

Indexing is a whole other discussion, but that should point you in the right direction. Anyway, to understand all the syntax rules for creating robots.txt files, go to the source and read all the Google rules on their blog.

If this doesn't make sense to you, like I'm speaking a different language, or it seems way over your head, feel free to reach out to us at Firon Marketing. Our website developers, designers, and experts can shape up your website, from looks to SEO, no problem.