How to create a robots.txt file for a website - a correct robots.txt




Detailed instructions on how to create a robots.txt file for a site. Robots.txt is one of the most essential aspects of full-fledged search engine optimization and of the security of your site. By following the rules for the proper use of this file, you can achieve a noticeable positive effect on the site.

In it you can specify a variety of instructions for most search engines, telling the search bot which pages, directories or sections of the site it may or may not crawl.


Robots.txt File - Basic Definition

Robots.txt implements the robots exclusion standard for search agents (bots), which was adopted in 1994. The file is somewhat reminiscent of the .htaccess file (rules are also written into it). The rules of this file are followed voluntarily by the most common search engines. The file can consist of one or more rules, each of which blocks or allows the crawler access to certain paths on the site.

By default, this file is not present on the site, which gives all search engines full permission to index all of the site's content. Such permission can lead to technical pages of the site ending up in the search engine index, where they should not be.
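
For reference, a minimal robots.txt that explicitly grants the same full permission looks like this (an empty value after Disallow means nothing is blocked):

# equivalent to having no robots.txt at all
User-agent: *
Disallow: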

Why do we need Robots.txt on the site - its impact on promotion in search engines

Robots.txt is one of the most important factors in search engine optimization of a site. Thanks to a properly written set of rules for search bots, you can achieve a noticeable improvement in the site's ranking in search. What do these instructions give:

  1. Blocking certain pages, sections and directories of the site from indexing;
  2. Excluding pages that do not contain useful content;
  3. Eliminating duplicate pages, and more.

For most sites such indexing restrictions are simply necessary; small single-page sites can do without them. However, certain directives should be added to every site, for example bans on indexing the following (a minimal sketch follows the list):

  1. Registration pages, admin login, password recovery;
  2. Technical directories;
  3. RSS feeds of the site;
  4. Replytocom pages, and more.
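
A minimal sketch of such bans (the paths are illustrative and assume a typical WordPress-style layout; adjust them to your CMS):

User-agent: *
Disallow: /wp-login.php     # admin login
Disallow: /wp-register.php  # registration
Disallow: /wp-admin         # technical directory
Disallow: */feed            # RSS feeds
Disallow: /*?replytocom     # replytocom duplicates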

How to create a Robots.txt file yourself, plus examples

Creating a Robots.txt file presents no difficulty, even for beginners. It is enough to follow a certain sequence of actions:

  1. Robots.txt is a text document and can be created in any available text editor;
  2. The file extension must be .txt;
  3. The file name must be exactly robots;
  4. Only one such file is allowed per site;
  5. It is placed only in the root directory of the site;

Use an ordinary text editor (Notepad will do). Create a .txt document named robots, save it, and transfer it with an FTP client to the root directory of the site. These are the main steps to follow.
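
If you prefer to script the upload instead of using an FTP client, here is a hedged sketch using Python's standard ftplib; the host name and credentials are placeholders for your own hosting details:

import ftplib

# Connect to the hosting FTP server (placeholder credentials)
with ftplib.FTP("ftp.your_site.ru") as ftp:
    ftp.login(user="your_login", passwd="your_password")
    # Upload robots.txt into the site root
    with open("robots.txt", "rb") as f:
        ftp.storbinary("STOR robots.txt", f)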

Examples of standard Robots.txt for popular CMS

In the original article, ready-made robots.txt examples for popular CMS are provided as images: Amiro.CMS, Bitrix, DLE (DataLife Engine), Drupal, HostCMS, Joomla 3, Joomla, MODX Evo, MODX, NetCat, OpenCart, TYPO3, UMI and WordPress.

Here is an example of the file from my own WordPress site:

# robots.txt
# (for brevity, the bot-specific groups below are condensed: each repeats the
#  rules of the User-Agent: * group, and only the differences are listed)

User-Agent: *
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /template.html
Disallow: /wp-admin
Disallow: */trackback
Disallow: */comments*
Disallow: *comments_*
Disallow: /search
Disallow: /author/*
Disallow: /users/
Disallow: /*?replytocom
Disallow: /*?replytocom*
Disallow: /comment-page*
Disallow: */tag/*
Disallow: /tag/*
Disallow: /?s=*
Disallow: /?s=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /?p=
Disallow: *.php
Disallow: /ads.txt
Disallow: */stylesheet
Disallow: */stylesheet*
Allow: /wp-content/uploads/
Allow: /wp-includes
Allow: /wp-content
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: Yandex
# (the same rules as in the User-Agent: * group above, plus:)
Disallow: */amp
Disallow: */amp?
Disallow: */amp/

User-agent: Mail.Ru
# (the same rules as in the User-Agent: * group above)

User-agent: ia_archiver
# (the same rules as in the User-Agent: * group above, plus:)
Allow: */?amp

User-agent: SputnikBot
# (the same rules as in the User-Agent: * group above, plus:)
Allow: */?amp

User-agent: Bingbot
# (the same rules as in the User-Agent: * group above, plus:)
Allow: */?amp

User-agent: Googlebot
# (the same rules as in the User-Agent: * group above, without Disallow: /ads.txt, plus:)
Allow: */?amp
Allow: */*/?amp
Allow: */tag/?amp
Allow: */page/?amp

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Yandex-Images
Allow: /wp-content/uploads/

User-agent: Mail.Ru-Images
Allow: /wp-content/uploads/

User-agent: ia_archiver-Images
Allow: /wp-content/uploads/

User-agent: Bingbot-Images
Allow: /wp-content/uploads/

Host: https://nicola.top
Sitemap: https://nicola.top/sitemap_index.xml
Sitemap: https://nicola.top/?feed=googleimagesitemap

I hope it will be useful to you. Apply the rules according to the needs of your own site: each resource calls for its own approach.

At the moment my file has been shortened to a generic version. You can view it at nicola.top/robots.txt

How to create a Robots.txt file using online services

This method is the easiest and fastest, and it suits those who are afraid to create Robots.txt on their own or are simply lazy. There are many services offering to create this file, but some nuances of this method are worth considering. For example:

  • Think through in advance exactly what you want to prohibit or allow for the agent.
  • The finished file must be checked before it is uploaded to the site.
  • Be careful: an incorrectly generated Robots.txt file will lead to a deplorable result, with technical and other pages that have no place in search ending up in the index.
  • All the same, it is better to spend the time and effort to create a correct custom robots.txt. This way you can build a well-grounded structure of prohibitions and permissions appropriate for your site.

Editing and Correct Syntax of the Robots.txt File

Once Robots.txt has been created, you can easily edit and change it as you like; certain rules and correct syntax should be kept in mind when doing so. Over time you will change this file repeatedly, but do not forget that after editing you need to upload the file to the site again, thereby updating its content for search robots.

Writing Robots.txt is very simple, because the structure of this file is quite straightforward. The main thing when writing rules is to follow a strictly defined syntax. These rules are voluntarily followed by almost all major search engines. Here is a list of rules that will help you avoid most errors in the Robots.txt file (a short example follows the list):

  1. There must not be more than one directive on one line;
  2. Each rule starts on a new line;
  3. A space at the beginning of a line is not allowed;
  4. Comments are allowed after the # character;
  5. An empty Robots.txt counts as permission to index everything;
  6. The file name is only valid in the form “robots”;
  7. The file size should not exceed 32 KB;
  8. Only one rule is allowed per Allow or Disallow directive. An empty value after Disallow: is equivalent to full permission;
  9. All rules must be written in lower case;
  10. The file must always be available to robots;
  11. An empty line after a group of rules marks the end of that User-agent group;
  12. It is desirable to write the rules for each search engine separately;
  13. If the rule refers to a site directory, be sure to put a slash (/) before its name;
  14. Quotation marks must not be used in a line or a rule;
  15. Build a strict structure of rules that matches your site and nothing more;
  16. Robots.txt should be minimalistic and clearly convey the intended meaning;
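
A short illustration of this syntax (the site paths are placeholders):

# comments start with the hash character
User-agent: Yandex
Disallow: /admin/          # a directory rule starts with a slash
Disallow: /search
Allow: /admin/uploads/

User-agent: Googlebot
Disallow: /search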

Proper configuration of the Robots.txt file - correct spelling of commands

To get a positive result from robots.txt, you need to configure it properly. All the main commands of this file, together with their instructions, are followed by the largest search engines, Google and Yandex; other search engines may ignore some instructions. How do you make robots.txt work for most search engines? You need to understand the basic rules for working with this file, which were discussed above.
Consider the basic commands (a combined example follows the list):

  • User-agent: * - the instructions will apply to absolutely all search bots. It is also possible to address specific search engines separately, for example: User-agent: Googlebot and User-agent: YandexBot. In this way the rules for the important search engines are designated correctly.
  • Disallow: - completely prohibits crawling and indexing (of a page, directory or file).
  • Allow: - fully allows crawling and indexing (of a page, directory or file).
  • Clean-param: - needed to exclude site pages with dynamic content (a directive recognized by Yandex). Thanks to this rule, you can get rid of duplicate content on the site.
  • Crawl-delay: - sets the time interval at which search bots may download documents from the site, which lets you significantly reduce the load on the server. For example, “Crawl-delay: 5” tells the bot that documents may be downloaded from the site no more than once every 5 seconds.
  • Host: your_site.ru - responsible for the main site mirror (a directive recognized by Yandex). Here you specify the priority version of the site.
  • Sitemap: http://your_site.ru/sitemap.xml - as you might guess, this directive tells the bot about the presence of a Sitemap on the site.
  • # - allows you to leave comments. Commenting is only possible after the pound sign; it can be placed on its own line or as a continuation of a directive. Bots ignore comments when reading the instructions.
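
Putting these directives together, here is a hedged sketch of a combined file (your_site.ru, the paths and the 5-second delay are placeholders):

User-agent: *
Disallow: /admin/                 # do not crawl the admin section
Allow: /admin/uploads/            # except the uploads inside it
Clean-param: utm_source&utm_medium /catalog/   # ignore tracking parameters on /catalog/ pages
Crawl-delay: 5                    # at most one download every 5 seconds
Host: your_site.ru                # preferred mirror (Yandex)
Sitemap: http://your_site.ru/sitemap.xml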

How to check Robots.txt using Google or Yandex

To check this file, the Google and Yandex webmaster panels are all you really need, and they make it much easier to find errors.

  • Google Webmaster - select "Crawling" in the left menu and then the "robots.txt file check tool" tab. Then, in the bottom line of the window that appears, add the file name, click "Check" and see how the Google bot sees your robots.txt.
  • Yandex Webmaster - in the left menu, select "Tools" and then "Robots.txt analysis". After that, in the window that appears, simply click the "Check" button.

It is worth noting that there are many online validators for checking this file; I have described the most accessible ones, which are always at hand.
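
Besides the webmaster panels, a quick local sanity check can be scripted with Python's standard library; the URL and paths below are only illustrations:

from urllib.robotparser import RobotFileParser

# Point the parser at the live file (illustrative URL)
rp = RobotFileParser()
rp.set_url("https://your_site.ru/robots.txt")
rp.read()

# Ask whether a given bot may fetch a given path
print(rp.can_fetch("Googlebot", "https://your_site.ru/wp-admin/"))  # False if the path is disallowed
print(rp.can_fetch("*", "https://your_site.ru/some-article/"))      # True if it is not blocked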

Conclusion

It is impossible to write one perfect robots.txt for all sites. The reason is the sites themselves: some are hand-made, others run on different CMS, and absolutely all sites have different directory structures and other peculiarities.

Therefore, every webmaster simply has to create his own unique set of rules for search bots. Such a file will match your priorities and will not let confidential information get into search results. Thanks to this, the index will contain high-quality content without unnecessary garbage. I also recommend setting up the necessary redirects on your site: this helps to avoid duplicates and transfers weight to the right pages.


Thanks for reading: SEO HELPER | NICOLA.TOP

