Robots.txt file

  • Question
  • Updated 10 years ago
  • Answered
Google diagnostics has reported that 2 pages on my site are blocked. It says I should add a robots.txt file for those pages. I have read your earlier answer to a similar problem, where you say not to add a robots.txt file if you want the crawlers to crawl the site. This is only happening to the children's jewellery and adults' jewellery pages. What can be done to fix the problem?

Candeez

  • 21 Posts
  • 1 Reply Like

Posted 10 years ago


Ed-a-Torials @ Honey Bear Playhomes, Champion

  • 2582 Posts
  • 279 Reply Likes
Hi there Candeez

Sorry you are having problems

Have you tried the robots.txt checker in Google Webmaster Tools? Go to the Google Webmaster Tools site, click the (Tools) button on the middle left, then click the (Analyze robots.txt) button. In the very bottom box, paste the URLs of all your pages, one per line. Leave the drop-down box alone for now, then press (Check).

If all your URLs are green (Allowed) then you are fine. If the two that you mentioned are not (Allowed), check that you don't have a robots.txt file in your file manager that may be affecting those two pages. If there is one, you could try removing it from your file manager, then update your site and run the robots.txt test once more.
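For anyone who would like to run the same kind of check away from the Google dashboard, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the `www.example.com` URLs are made up for illustration, and `check_urls` is a hypothetical helper name. Python's parser does not match rules in exactly the same way Googlebot does, so treat the result as a rough guide only.

```python
from urllib.robotparser import RobotFileParser


def check_urls(robots_lines, urls, agent="Googlebot"):
    """Return {url: True/False}: may `agent` crawl each URL under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return {url: parser.can_fetch(agent, url) for url in urls}


# Hypothetical rules and URLs, for illustration only.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
urls = [
    "http://www.example.com/Adults_jewellery.php",
    "http://www.example.com/private/draft.php",
]

for url, allowed in check_urls(rules, urls).items():
    print("Allowed" if allowed else "Blocked", url)
```

Only the path of each URL is compared against the rules, so the rules passed in should be the ones from the same host as the URLs being tested.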

As for the drop-down box I said not to touch earlier: you can select each entry from the list and test it individually if you like.

I hope that offered some help but I am off to sleep now and real tired...

Good luck.

Ed

Monique, VP of Customer Support

  • 6294 Posts
  • 446 Reply Likes
Thanks for the suggestions, Ed - definitely a good place to start. I have pursued another line of questioning below, because we are seeing a lot of cases where Google Webmaster tries to crawl our preview URLs (which are blocked by SynthaSite's robots.txt file in order to prevent Google from indexing unpublished sites). We usually find that the person has either submitted their preview URL to Google by mistake, or they have a link to their preview URL somewhere on their site. However, in Candeez's situation I don't know for sure if any of this is the case, so I need to get more information. I am very interested to find out the answer, both to help in this instance and in other similar cases!

Monique, VP of Customer Support

Hi Candeez

Does the error message from Google say: "Reason blocked: robots.txt file"? If so, Google is telling you to remove the robots.txt file, not asking you to add one.

What are the URLs of the blocked pages? Please check that they are not preview URLs (the long URL you see when you preview your page from the site builder). Make sure each one is a published URL - the URL you see in your browser window when you navigate to that page on your published site.

If it is your preview URL that would explain why the pages are blocked, but I would like to know why Google Webmaster has attempted to crawl those pages.

If it is your published URL then I need to find out why Google is referencing a robots.txt file (unless you have uploaded one, of course).

I would really appreciate it if you could get back to me with the answer to these questions. Either post the exact error message from Google or email it to us. I would like to get to the bottom of exactly which URLs Google is referencing and why you are getting a message about a robots.txt file.

Candeez

This is what I have got

Blocked URL | Reason Blocked | Last Crawl Attempt
http://ide.synthasite.com/sites/De77/D73b/D197/D524/U402881b21909fb5f0119425791b3777e/402881b21909fb5f011946fbcbf01a96/ | Reason: Robots.txt File | Last crawled 23 Sep 2008
Adults_jewellery.php | Robots.txt File | 23 Sep 2008
Childrens_jewellery.php | Robots.txt File | 23 Sep 2008

Monique, VP of Customer Support

That message is telling you what I explained above: that you have a robots.txt file on your site and you need to remove it in order for the crawler to access those pages. The first URL - the long one - is your preview URL, and it is expected that this one will be blocked. So that is nothing to worry about.
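For anyone sorting through a longer list of blocked URLs, here is a rough sketch of how to separate preview URLs from published ones. It assumes that preview URLs always live on the `ide.synthasite.com` host, which is taken from the blocked URL reported in this thread and is not a documented rule. `is_preview_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Assumption: the preview host seen in the blocked URL reported in this thread.
PREVIEW_HOST = "ide.synthasite.com"


def is_preview_url(url):
    """True if the URL points at the site-builder preview host."""
    host = urlparse(url).hostname or ""  # relative entries have no host
    return host.lower() == PREVIEW_HOST


blocked = [
    "http://ide.synthasite.com/sites/De77/D73b/D197/D524/"
    "U402881b21909fb5f0119425791b3777e/402881b21909fb5f011946fbcbf01a96/",
    "Adults_jewellery.php",
    "Childrens_jewellery.php",
]

for url in blocked:
    kind = "preview (expected to be blocked)" if is_preview_url(url) else "published page"
    print(kind, "-", url)
```

Entries that are just a file name, like the two jewellery pages, carry no host at all, so the sketch treats them as published pages.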

The other two pages, however, should not be blocked. I also could not find a robots.txt file on your site (so I gather you have not tried to upload one yet? Correct me if I am wrong). I am not sure why those two pages are being blocked. I will have to get some advice on this.

In the meantime please do not try to upload a robots.txt file as that is the exact opposite of what you want to be doing! I am sorry that the message is ambiguous. However, for the sake of anyone with the same question the message from Google is saying:
"That page on your site is blocked because you have a robots.txt file instructing the search engines not to crawl it - remove this file in order to allow access to this page."
NOT
"that page is blocked and you need to upload a robots.txt file in order to allow access."
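To illustrate the difference between the two readings, here is a small sketch using Python's standard `urllib.robotparser`. It compares a site with no rules at all against a site whose robots.txt disallows one page. The page URL and the blocking rule are invented for the example, `can_crawl` is a hypothetical helper name, and an empty rule set is used as a stand-in for "no robots.txt uploaded".

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_text, url, agent="Googlebot"):
    """Check one URL against the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical published page and rules, for illustration only.
page = "http://www.example.com/Childrens_jewellery.php"
no_file = ""  # stand-in for a site with no robots.txt
blocking_file = "User-agent: *\nDisallow: /Childrens_jewellery.php\n"

print("No robots.txt:", can_crawl(no_file, page))
print("Blocking robots.txt:", can_crawl(blocking_file, page))
```

With no rules the page is crawlable. Adding a robots.txt file can only take access away, which is why uploading one does not fix a "blocked" message.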

Candeez

Thanks for your response. I will await your further instructions.

Monique, VP of Customer Support

I had a colleague take a look at that message. It is a little hard to figure out without being able to see the actual table on your Google dashboard. However, it looks like the message is saying that Google attempted to crawl your preview URL and was blocked by a robots.txt file (as it should have been), and that it successfully crawled your Adults_jewellery.php and Childrens_jewellery.php pages on 23 September. (This is indicated by the "Last Crawl Attempt" and "Last Crawled" messages associated with the preview URL and the published URLs respectively.)

I checked on this by googling some of the unique text on your page, and your site came up first in the search results, which indicates that the page IS being crawled.



So basically I have no further instructions for you, other than to confirm that the correct pages on your site are being crawled. I hope this clears things up for you. Once again, I am sorry that it is a bit confusing but from everything that I can see things are functioning exactly as they should.