How to create a robots.txt file for a website - a correct robots.txt
Detailed instructions on how to create a robots.txt file for a site. Robots.txt is one of the most essential aspects of full-fledged search engine optimization and of the security of your site. By following the conditions for the proper use of this file, you can achieve a certain positive effect on the site.
It is possible to specify a variety of instructions for most search engines, which indicate to the search bot the necessary restrictions or permissions for crawling pages, directories, or sections of the site.
The content of the article:
- Robots.txt file main definition
- Why Robots.txt is needed on the site - its impact on search engine promotion
- How to create a Robots.txt file yourself plus examples
- How to create a Robots.txt file using online services
- Editing and Correct Syntax of the Robots.txt File
- Proper configuration of the Robots.txt file - correct spelling of commands
- Conclusion
Robots.txt File - Basic Definition
Robots.txt implements the exclusion standard for search agents (bots), which was adopted in 1994. The file is somewhat reminiscent of the .htaccess file (rules are also written into it). The rules of this file are voluntarily followed by the most common search engines. The file can consist of one or more rules, each of which blocks or allows a crawler access to certain paths on the site.
By default, this file is absent from the site, which gives all search engines full permission to index all of the site's content. Such permission can lead to the inclusion in the search engine index of important technical pages of the site that should not be there.
Why do we need Robots.txt on the site - its impact on promotion in search engines
Robots.txt is one of the most important factors in search engine optimization of a site. Thanks to a properly written set of rules for search bots, you can achieve a certain increase in the ranking of a site in search. What do these instructions give:
- Blocking certain pages, sections, and directories of the site from indexing;
- Exclusion of pages that do not contain useful content;
- Elimination of duplicate pages and more.
For most sites, such indexing restrictions are simply necessary; for small single-page sites they are optional. However, certain directives should be added to every site. For example, bans on indexing (a minimal sketch follows this list):
- Registration pages, admin login, password recovery;
- Technical directories;
- RSS feeds of the site;
- Replytocom pages, and more.
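As a hedged illustration, a minimal sketch of such bans might look like the following; the specific paths (/wp-login.php, /wp-admin, /password-recovery/) are hypothetical and depend on your CMS:

User-agent: *
Disallow: /wp-login.php        # registration and admin login page (hypothetical path)
Disallow: /wp-admin            # technical directory (hypothetical path)
Disallow: /password-recovery/  # password recovery page (hypothetical path)
Disallow: */feed               # RSS feeds of the site
Disallow: /*?replytocom        # replytocom duplicates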
How to create a Robots.txt file yourself plus examples
Creating a Robots.txt file should not cause difficulties even for beginners. It is enough to follow a certain sequence of actions:
- Robots.txt is a text document and is created by any available text editor;
- The file extension must be .txt;
- The file name must be robots;
- Per site, only one such file is allowed;
- Placed only in the root directory of the site;
Use an ordinary text editor (Notepad will do as an alternative). Create a .txt document named robots, save it, and transfer the document with an FTP client to the root directory of the site. These are the main steps to follow.
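If you only need a valid placeholder at this stage, the simplest version of the file, assuming you want to keep the default behavior of allowing everything, looks like this:

User-agent: *
Disallow:

An empty value after Disallow: means nothing is blocked, so all content remains open to indexing until you add real rules.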
Examples of standard Robots.txt for popular CMS
Robots.txt example for amiro.cms:
An example of robots.txt for bitrix:
Robots.txt example for dle:
Drupal robots.txt example:
Robots.txt example for hostcms:
Robots.txt example for joomla3:
Example robots.txt for joomla:
Robots.txt example for modx evo:
Robots.txt example for modx:
Robots.txt example for netcat:
Robots.txt example for opencat:
Robots.txt example for typo3:
Robots.txt example for umi:
Example robots.txt for wordpress:
Here is an example of my WordPress CMS site file:
# robots.txt
User-Agent: *
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /template.html
Disallow: /wp-admin
Disallow: */trackback
Disallow: */comments*
Disallow: *comments_*
Disallow: /search
Disallow: /author/*
Disallow: /users/
Disallow: /*?replytocom
Disallow: /*?replytocom*
Disallow: /comment-page*
Disallow: */tag/*
Disallow: /tag/*
Disallow: /?s=*
Disallow: /?s=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /?p=
Disallow: *.php
Disallow: /ads.txt
Disallow: */stylesheet
Disallow: */stylesheet*
Allow: /wp-content/uploads/
Allow: /wp-includes
Allow: /wp-content
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: Yandex
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /template.html
Disallow: /wp-admin
Disallow: */trackback
Disallow: */comments*
Disallow: *comments_*
Disallow: /search
Disallow: /author/*
Disallow: /users/
Disallow: /*?replytocom
Disallow: /*?replytocom*
Disallow: /comment-page*
Disallow: */tag/*
Disallow: /tag/*
Disallow: /?s=*
Disallow: /?s=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /?s=
Disallow: /?p=
Disallow: *.php
Disallow: /ads.txt
Disallow: */amp
Disallow: */amp?
Disallow: */amp/
Disallow: */stylesheet
Disallow: */stylesheet*
Allow: /wp-content/uploads/
Allow: /wp-includes
Allow: /wp-content
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: Mail.Ru
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /template.html
Disallow: /wp-admin
Disallow: */trackback
Disallow: */comments*
Disallow: *comments_*
Disallow: /search
Disallow: /author/*
Disallow: /users/
Disallow: /*?replytocom
Disallow: /*?replytocom*
Disallow: /comment-page*
Disallow: */tag/*
Disallow: /tag/*
Disallow: /?s=*
Disallow: /?s=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /?s=
Disallow: /?p=
Disallow: *.php
Disallow: /ads.txt
Disallow: */stylesheet
Disallow: */stylesheet*
Allow: /wp-content/uploads/
Allow: /wp-includes
Allow: /wp-content
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: ia_archiver
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /template.html
Disallow: /wp-admin
Disallow: */trackback
Disallow: */comments*
Disallow: *comments_*
Disallow: /search
Disallow: /author/*
Disallow: /users/
Disallow: /*?replytocom
Disallow: /*?replytocom*
Disallow: /comment-page*
Disallow: */tag/*
Disallow: /tag/*
Disallow: /?s=*
Disallow: /?s=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /?s=
Disallow: /?p=
Disallow: *.php
Disallow: /ads.txt
Disallow: */stylesheet
Disallow: */stylesheet*
Allow: */?amp
Allow: /wp-content/uploads/
Allow: /wp-includes
Allow: /wp-content
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: SputnikBot
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /template.html
Disallow: /wp-admin
Disallow: */trackback
Disallow: */comments*
Disallow: *comments_*
Disallow: /search
Disallow: /author/*
Disallow: /users/
Disallow: /*?replytocom
Disallow: /*?replytocom*
Disallow: /comment-page*
Disallow: */tag/*
Disallow: /tag/*
Disallow: /?s=*
Disallow: /?s=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /?s=
Disallow: /?p=
Disallow: *.php
Disallow: /ads.txt
Disallow: */stylesheet
Disallow: */stylesheet*
Allow: */?amp
Allow: /wp-content/uploads/
Allow: /wp-includes
Allow: /wp-content
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: Bingbot
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /template.html
Disallow: /wp-admin
Disallow: */trackback
Disallow: */comments*
Disallow: *comments_*
Disallow: /search
Disallow: /author/*
Disallow: /users/
Disallow: /*?replytocom
Disallow: /*?replytocom*
Disallow: /comment-page*
Disallow: */tag/*
Disallow: /tag/*
Disallow: /?s=*
Disallow: /?s=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /?s=
Disallow: /?p=
Disallow: *.php
Disallow: /ads.txt
Disallow: */stylesheet
Disallow: */stylesheet*
Allow: */?amp
Allow: /wp-content/uploads/
Allow: /wp-includes
Allow: /wp-content
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: Googlebot
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /template.html
Disallow: /wp-admin
Disallow: */trackback
Disallow: */comments*
Disallow: *comments_*
Disallow: /search
Disallow: /author/*
Disallow: /users/
Disallow: /*?replytocom
Disallow: /*?replytocom*
Disallow: /comment-page*
Disallow: */tag/*
Disallow: /tag/*
Disallow: /?s=*
Disallow: /?s=
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /?s=
Disallow: /?p=
Disallow: *.php
Disallow: */stylesheet
Disallow: */stylesheet*
Allow: */?amp
Allow: */*/?amp
Allow: */tag/?amp
Allow: */page/?amp
Allow: /wp-content/uploads/
Allow: /wp-includes
Allow: /wp-content
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Yandex-Images
Allow: /wp-content/uploads/

User-agent: Mail.Ru-Images
Allow: /wp-content/uploads/

User-agent: ia_archiver-Images
Allow: /wp-content/uploads/

User-agent: Bingbot-Images
Allow: /wp-content/uploads/

Host: https://nicola.top
Sitemap: https://nicola.top/sitemap_index.xml
Sitemap: https://nicola.top/?feed=googleimagesitemap
I hope it will be useful to you. Apply the rules according to the needs of your own site; each resource requires its own approach.
At the moment my file has been shortened to a more generic version. You can view it at nicola.top/robots.txt.
How to create a Robots.txt file using online services
This method is the easiest and fastest, and it suits those who are afraid to create Robots.txt on their own or are simply lazy. There are many services offering the creation of this file. But it is worth considering some nuances of this method. For example:
- Decide in advance exactly what you want to prohibit or allow for the agent;
- The finished file must be checked before uploading it to the site;
- Be careful: an incorrectly generated online Robots.txt file will lead to a deplorable situation, since technical and other pages of the site, which a priori should not be there, can end up in the search index;
- All the same, it is better to spend the time and effort to create a correct custom robots.txt. That way you can build a well-grounded structure of prohibitions and permissions appropriate for your site.
Editing and Correct Syntax of the Robots.txt File
Once Robots.txt has been created successfully, you can easily edit and change it as you like. In doing so, certain rules and correct syntax should be taken into account. Over time you will change this file repeatedly, but do not forget that after editing you need to upload it to the site again, thus updating its content for search robots.
Writing Robots.txt is very simple, because the structure of this file is quite simple. The main thing when writing rules is to use a strictly defined syntax, which almost all major search engines voluntarily follow. Here is a list of rules that helps avoid most errors in the Robots.txt file (a short example follows the list):
- No more than one directive may be specified per line;
- Each rule starts on a new line;
- A line must not begin with a space;
- Comments are allowed after the # character;
- An empty Robots.txt counts as permission to index everything;
- The only valid name for this file is "robots";
- The file size should not exceed 32 KB;
- Only one rule is allowed in each Allow or Disallow directive. An empty value after Allow: or Disallow: is equivalent to full permission;
- All rules must be written in lower case;
- The file must always be available;
- An empty line after the specified rules indicates the end of the rules for the current User-agent directive;
- It is desirable to write the rules for each search engine separately;
- If the rule refers to a site directory, be sure to put a slash (/) before its name;
- There should be no quotation marks in a line or in a rule;
- The structure of the rules should strictly match your site and nothing more;
- Robots.txt should be minimalistic and clearly convey the intended meaning.
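To tie these syntax rules together, here is a short hedged sketch (the /admin/ and /tmp/ paths are purely illustrative): each bot gets its own block, every rule starts on a new line without a leading space, comments follow the # character, and an empty line closes a User-agent block.

# Rules for all search bots
User-agent: *
Disallow: /admin/        # technical directory (illustrative path)
Disallow: /search

# A separate block for a specific search engine
User-agent: Googlebot
Disallow: /tmp/
Allow: /tmp/public/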
Proper configuration of the Robots.txt file - correct spelling of commands
To get a positive result when using robots.txt, you need to configure it properly. All the main commands of this file are followed by the largest search engines, Google and Yandex; other search engines may ignore some instructions. How do you make robots.txt work for most search engines? You need to understand the basic rules for working with this file, which were discussed above.
Consider the basic commands (a combined example follows this list):
- User-agent: * - the instructions apply to absolutely all search bots. You can also address specific search engines separately, for example: User-agent: Googlebot and User-agent: YandexBot. In this way, the rules for the important search engines are designated correctly.
- Disallow: - completely prohibits crawling and indexing (of a page, directory, or files).
- Allow: - fully allows crawling and indexing (of a page, directory, or files).
- Clean-param: - needed to exclude site pages with dynamic content (supported by Yandex). Thanks to this rule, you can get rid of duplicate content on the site.
- Crawl-delay: - specifies the time interval at which search bots may download documents from the site, which lets you significantly reduce the load on the server. For example, "Crawl-delay: 5" tells the search robot that documents may be downloaded from the site no more than once every 5 seconds.
- Host: your_site.ru - responsible for the main site mirror. In this directive, you specify the priority version of the site.
- Sitemap: http://your_site.ru/sitemap.xml - as you might guess, this directive tells the search bot about the presence of a Sitemap on the site.
- # - allows you to leave comments. A comment is only possible after the pound sign and can be placed either on a new line or as a continuation of a directive. All of these variants are ignored by bots when reading the instructions.
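Putting these commands together, a hedged example might look like the following; the domain your_site.ru is the same placeholder used above, and the ref parameter, /articles/ path, and /admin/ directory are assumptions for illustration only:

User-agent: *
Disallow: /admin/                 # prohibit crawling of the technical directory (hypothetical path)
Allow: /admin/admin-ajax.php      # but allow this single file (hypothetical path)
Clean-param: ref /articles/       # strip the dynamic ref parameter (Yandex)
Crawl-delay: 5                    # at most one download every 5 seconds

Host: your_site.ru                # main site mirror
Sitemap: http://your_site.ru/sitemap.xml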
How to check Robots.txt using Google or Yandex
Oddly enough, all you need to check this file are the Google and Yandex webmaster panels, which makes it much easier to find errors.
- Google Webmaster - select "Crawl" in the left menu and then the "robots.txt Tester" tab. Then, in the line at the bottom of the window that appears, add the file name, click "Check", and see how the Google bot sees your robots.txt.
- Yandex Webmaster - in the left menu, select "Tools" and then "Robots.txt analysis". After that, in the window that appears, simply click the "Check" button.
It is worth noting that there are many online validators for checking this file. I have covered the most accessible ones, which are always at hand.
Conclusion
It is impossible to write one perfect robots.txt for all sites. The reason is the sites themselves: some are made by hand, others run on various CMS platforms. All sites differ in directory structure and other details.
Therefore, every webmaster is simply obliged to create his own unique set of rules for search bots. Such a file will reflect your priorities and keep confidential information out of the search results. Thanks to this, the index will contain high-quality content without unnecessary garbage. I also recommend setting up the necessary redirects on your site; this avoids duplicates and transfers weight to the right pages.