What does double slash (//) directory mean in robots.txt?























Running the following command prints IBM's robots.txt:



curl https://www.ibm.com/robots.txt


I have removed many lines and kept only the relevant part:



User-agent: *
Disallow: //
Disallow: /account/registration
Disallow: /account/mypro
Disallow: /account/myint

# Added to block site mirroring
User-agent: HTTrack
Disallow: /
#


I understand that / means the root directory, but what does the double slash (//) mean here in robots.txt?






























  • 2




    It could be a typo; I can't find a single reference to a double slash in any of the official Robots Exclusion documents.
    – Michael Frank
    Nov 27 at 1:51










  • @MichaelFrank Typo or a coding fluke made by an automated system generating a robots.txt on demand.
    – JakeGould
    Nov 27 at 2:00















linux home-directory














edited Nov 27 at 1:52









JakeGould











asked Nov 27 at 1:44









scrapy



















1 Answer




















accepted










This seems like a mistake:



Disallow: //


The thing is that the robots.txt spec (as outlined in the linked reference) clearly states:




Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".




However, some people claim otherwise. For example, this site states that the major search engines, Google included, can handle pattern matching:




Pattern matching: At this time, pattern matching appears to be usable by the three majors: Google, Yahoo, and Live Search. The value of pattern matching is considerable. Let’s look first at the most basic of pattern matching, using the asterisk wildcard character.




Regardless, // contains no wildcard (*), so it can only be read literally: a path that begins with two slashes, i.e. a directory with an empty name directly under the root. And // just seems odd.
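To make the literal reading concrete, here is a minimal sketch in Python of plain prefix matching, which is how the original spec describes Disallow values. The function name is_disallowed and the sample paths are hypothetical, and real crawlers may normalize URLs or support wildcards, so treat this as an illustration under those assumptions rather than a description of any particular crawler.

```python
from typing import Iterable

def is_disallowed(path: str, disallow_rules: Iterable[str]) -> bool:
    """Return True if `path` starts with any non-empty Disallow value.

    Assumption: plain prefix matching with no wildcard support,
    as in the original robots.txt spec.
    """
    return any(path.startswith(rule) for rule in disallow_rules if rule)

# Disallow values from the "User-agent: *" group quoted in the question.
rules = ["//", "/account/registration", "/account/mypro", "/account/myint"]

# Hypothetical sample paths.
for path in ["//thing", "/thing", "/account/registration/step1", "/"]:
    print(path, is_disallowed(path, rules))
```

Under this reading, only paths that literally begin with two slashes (such as //thing) are affected by the // rule; the root / and ordinary paths such as /thing are not.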



My guess is that it's a mistake of some sort. Yes, even an IBM webmaster can make mistakes! I would also guess that the robots.txt is generated automatically by some system, and that a path such as /*/ was somehow converted to // during generation.
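As a purely hypothetical illustration of how such a conversion could happen (nothing here is known about IBM's actual tooling), imagine a generator that strips wildcard characters because the original spec does not support them. The function name strip_wildcards is made up for this sketch.

```python
def strip_wildcards(pattern: str) -> str:
    """Hypothetical sanitizer: remove every '*' from a Disallow pattern."""
    return pattern.replace("*", "")

# A wildcard path collapses into a bare double slash.
print(strip_wildcards("/*/"))  # prints //
```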



























  • Either that, or the entry is there specifically to prevent mistaken URLs with a redundant slash from being indexed.
    – grawity
    Nov 27 at 5:51










  • @grawity Fair enough, but I am not sure what the benefit would be of having a URL like example.com//thing as some odd method of obscuring data from crawlers.
    – JakeGould
    Nov 27 at 16:28











answered Nov 27 at 1:58









JakeGould






























