One way to tell search engines which pages on a website they should leave out is the robots meta tag. However, not every user agent (search engine crawler) reads meta tags, so a robots meta tag can simply go unnoticed. And because the tag lives inside each individual page, it cannot cover a whole folder at once. A more effective way to tell search engines which folders and files on a site to avoid is a robots.txt file.
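To show what "reading" a robots meta tag involves, here is a minimal Python sketch of the check a crawler might run on each page it downloads, using only the standard library's html.parser. It assumes the common form <meta name="robots" content="noindex">. The names RobotsMetaParser and page_is_noindex are hypothetical, and the sketch only looks for the noindex directive. A crawler that never runs a check like this ignores the tag entirely.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Attributes without a value come through as None, hence the `or ""`.
        if (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def page_is_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```

For example, page_is_noindex('<head><meta name="robots" content="noindex, nofollow"></head>') returns True, while a page with no robots meta tag returns False.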
So, what is a robots.txt file?
A robots.txt file is a plain text file (not HTML) that site owners place on their sites to tell search robots which pages they should not visit while crawling. User agents (search engines) are not compelled to obey a robots.txt file, but in general, reputable search engines do follow its instructions.

However, it is important to understand that this file is by no means a way of stopping user agents from crawling or indexing your website. A robots.txt file is not password protection or a firewall. Putting one on your site is like hanging a “Please don’t come in” note on a door that is not locked: honest visitors will respect the note, but it cannot stop a thief from walking in. Worse, the file itself is publicly readable, so listing a sensitive folder in it can point people straight to that folder. If you have sensitive data or other information that you don’t want to make public, don’t depend on robots.txt to keep it out of search engines.
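The voluntary nature of robots.txt is easy to see in code. The sketch below uses Python's standard urllib.robotparser module to parse a small robots.txt and then filters a crawl queue through can_fetch(). The file contents, the /private/ and /drafts/ paths, the user agent name MyBot and the helper names build_parser and urls_to_crawl are all hypothetical, chosen only for illustration. A polite crawler performs this check before each request; a badly behaved one simply skips it, and nothing in the file can stop that.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/old-post.html
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    # parse() takes the file's lines; read() would fetch the file over HTTP instead.
    parser.parse(robots_text.splitlines())
    return parser


def urls_to_crawl(parser: RobotFileParser, user_agent: str, candidates: list[str]) -> list[str]:
    """Keep only the URLs the rules allow.

    This filtering is entirely up to the crawler: a crawler that skips it
    can still request every URL in `candidates`.
    """
    return [url for url in candidates if parser.can_fetch(user_agent, url)]
```

With the rules above, filtering the URLs https://www.example.com/blog/post.html, https://www.example.com/private/report.pdf and https://www.example.com/drafts/old-post.html for MyBot leaves only the blog post.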
Where you put the robots.txt file on your site is essential. It must sit in the root (main) directory of your site, for example https://www.example.com/robots.txt; otherwise search engines won’t find it. Search engines don’t dig through the entire site looking for a robots.txt file. Instead, they request it only from the root directory, and if it isn’t there, they assume that nothing on the site is off limits and may crawl and index everything they find. It is therefore important to place the robots.txt file in the proper location so that search engines can find it and follow its instructions.
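Because crawlers look in only one place, that place can be computed from any page URL: keep the scheme and host, and replace the path with /robots.txt. A file placed at, say, https://www.example.com/blog/robots.txt is therefore never requested. The Python sketch below does this with urllib.parse. It also uses urllib.robotparser to show that an empty rule set allows every URL, which mirrors the “nothing is protected” assumption described above. Python's RobotFileParser.read(), for instance, treats a 404 response as “allow everything”. The function names robots_url_for and allowed_without_rules are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the single robots.txt location a crawler checks for this page's host."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (plus any port); drop the page's path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def allowed_without_rules(url: str, user_agent: str = "*") -> bool:
    """Check a URL against an empty rule set, i.e. a site with no robots.txt."""
    parser = RobotFileParser()
    parser.parse([])  # no lines at all, so nothing is disallowed
    return parser.can_fetch(user_agent, url)
```

For example, robots_url_for("https://www.example.com/blog/2020/post.html?ref=home#top") returns "https://www.example.com/robots.txt", and allowed_without_rules("https://www.example.com/private/report.pdf") returns True.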