Hiding yourself from Search Engines…
It’s not often you hear someone saying that they want to hide a website which they are working on from Google completely, but that was the scenario which faced me earlier this week. I have a client who runs a couple of websites, each for a specific service his company offers.
While he works on the website content we generally don’t want it indexed, because search engines could pick up incomplete or incorrect information, and we have no guarantee how quickly it would be updated once the site was complete. In the past, to avoid this, I have set up the new website on a separate subdomain, for instance development.domain.com, which allows him to use the site normally but makes it unlikely (though not impossible) for a search engine spider to find it.
The Move To WordPress
However, I want to move him over to using WordPress installations, because they are easier to maintain, and with the new custom taxonomies and post types in v3, WordPress really comes of age as a CMS (content management system).
I have set up a WordPress install for him to work on his next site, but WordPress isn’t good at changing the URL at which it is hosted, because when you upload images into posts or pages it inserts them with a full reference to the URL which the site is currently at.
So a picture uploaded to a page during development at development.domain.com will stop working when you change the site over to www.domain.com, because the development address it points to no longer exists.
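To make the problem concrete, here is a minimal Python sketch of what an embedded image reference looks like and what a naive fix involves. The post content, the file name team.jpg and the function name rewrite_base_url are made up for illustration; this is not how WordPress or any particular plugin does it, just the general idea of swapping one base URL for another.

```python
# Assumed base URLs, taken from the example domains used in this post.
DEV_BASE = "http://development.domain.com"
LIVE_BASE = "http://www.domain.com"

# Hypothetical post content, as it might be saved during development.
# The image is referenced by its full URL, not a relative path.
post_content = (
    '<p>Meet the team</p>'
    '<img src="http://development.domain.com/wp-content/uploads/team.jpg" alt="Team">'
)


def rewrite_base_url(content: str, old_base: str, new_base: str) -> str:
    """Replace every occurrence of the old base URL with the new one."""
    return content.replace(old_base, new_base)


live_content = rewrite_base_url(post_content, DEV_BASE, LIVE_BASE)

if __name__ == "__main__":
    print(live_content)
```

A plain text swap like this is fine for post content, but it has to be run against every place the old URL was saved, which is exactly the part I would rather not get wrong on a client's first site.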
WordPress touch on this in the final paragraph of their advice on changing URLs, but not really any more than to say it needs some thought.
There are several plugins which claim to go through the database and fix this for you, but I can’t afford to have the first WordPress install go anything less than swimmingly for this chap. I don’t want him to lose confidence in it; he’s not technical (that’s what he pays me for!).
My solution is therefore temporary in its nature. I have installed his development WordPress site at the full URL which it will run at when it goes live, but to prevent it being indexed I have created a robots.txt file which instructs search engine spiders to keep out of the entire site:
User-agent: *
Disallow: /
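If you want to check that the rule really does block everything before relying on it, one option is Python's standard urllib.robotparser module. The sketch below feeds it the same two directives and asks whether a couple of well-known spiders may fetch a page. The helper name is_blocked and the page URL are my own inventions for the example, and it only tells you how a well-behaved spider should read the file, not what any given search engine will actually do.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt file: one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page on the live domain used in this post.
    page = "http://www.domain.com/any-page/"
    for agent in ("Googlebot", "bingbot"):
        print(agent, "blocked:", is_blocked(ROBOTS_TXT, page, agent))
```

Once the site is ready to go live, the file needs removing (or the Disallow line emptying), otherwise the finished site stays hidden too.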
I am hoping this is temporary as it’s not the neatest fix in the world – but it will work! Anyone else have any other suggestions?