SpiceSEO

Seo Robots.txt

Q&A: SEO: Robots.txt file: In the user-agent command, what are the names of specific bots I can specify?

by on Sep.05, 2010, under Seo Robots.txt

Question by b1911: SEO: Robots.txt file: In the user-agent command, what are the names of specific bots I can specify?
For example, in the user-agent command, I can specify Google by typing in Googlebot, or specify MSN by typing in msnbot.

Is there a list for all the search engines? What would it be for Yahoo? AltaVista? And so on.

Best answer:

Answer by strayinma
You can find an up to date list here:

http://www.robotstxt.org/wc/active/html/contact.html

But, mostly you will encounter these:

Search Engine: User-Agent
AltaVista: Scooter
Infoseek: Infoseek
Hotbot: Slurp
AOL: Slurp
Excite: ArchitextSpider
Google: Googlebot
Goto: Slurp
Lycos: Lycos
MSN: Slurp
Netscape: Googlebot
NorthernLight: Gulliver
WebCrawler: ArchitextSpider
Iwon: Slurp
Fast: Fast
DirectHit: Grabber
Looksmart Web Pages: Slurp
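
To see how a user-agent name from a list like this is matched, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the paths, and the allowed() helper are made up for illustration; only Googlebot and Slurp come from the list above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot gets its own record, every other bot
# falls back to the "*" record.
RULES = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /drafts/
Disallow: /reports/
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if the named user-agent may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "Slurp"):
        print(agent, allowed(agent, "/reports/q1.html"))
```

A bot that has its own record follows only that record, so in this sketch Googlebot may fetch /reports/ while Slurp, which falls under "*", may not.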


Using Robots.txt to Increase Your Website’s Indexability

by on Aug.31, 2010, under Seo Robots.txt

What is website indexability? In a nutshell, indexability is how efficiently search engine robots (also called spiders or crawlers) can read a website's pages so that the search engine can index them and rank them in the results it returns to a user. One way to improve it is to add a robots.txt file. Place the file in the root of the website to tell search engine robots which pages to skip, which helps keep the index focused on the most relevant pages.

Not all robots will read a robots.txt file. Unfortunately, most malware and malicious crawlers will ignore it, because they don’t care whether they are supposed to stay out of certain pages. The main purpose of the robots.txt file is to tell well-behaved search engine crawlers which pages they should ignore while indexing the site. This is helpful in the case of an infinite URL space. An example might be a site where users upload files into an online document repository, which can grow without limit. These documents are considered media rather than content, so the webmaster should add a line to the robots.txt file that disallows access to the root URL of the repository. The more near-identical results a site returns under the same URL root, the more the ranking of its individual pages tends to suffer.

Test pages are another example of parts of a website that should be passed over, as is any content not meant for visitors. This could include web pages that were added to the site to calibrate the appearance of the pages but are not ready to be shared. A page that appears in search results but can’t be accessed directly by the user undermines the credibility of those results and will begin to degrade the page ranking.

To increase website indexability, the robots.txt file can also give instructions to specific search engine robots. Search engines use different algorithms to index web pages, so one set of pages may need to be restricted for one search engine while remaining accessible to another. Lines can be added to the robots.txt file that list instructions for a specific crawler by naming its user-agent, such as Googlebot for Google, Slurp for Yahoo!, or bingbot for Bing. This could be very important in determining the rank of your pages for specific keywords, depending on the rules and methods used by the search engine.
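
As a rough illustration of per-robot instructions, the following Python sketch builds the text of a robots.txt file from a mapping of user-agent names to disallowed paths. The build_robots_txt() function and the /archive/ path are hypothetical, not part of any tool mentioned here.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text with one record per user-agent.

    `rules` maps a user-agent name to the paths that agent should skip.
    """
    records = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means nothing is blocked for this agent.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        records.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    print(build_robots_txt({"Googlebot": ["/archive/"], "*": []}))
```

Listing the named robots before the catch-all "*" record keeps the file easy to read, although a well-behaved crawler should pick the record that names it regardless of order.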

The purpose of a robots.txt file is to give robots and crawlers important information about how the site should be indexed. Its main purpose is to keep the lesser pages from being indexed so that the more pertinent pages are. Another important function is to give specific robots their own instructions for indexing the website. This helps ensure that the most important pages are indexed, which will hopefully increase page rankings.

SEO White Hats is an SEO blog committed to delivering white hat advice on content optimization, SEO robots & crawlers, keyword research, and so much more. For great SEO techniques, free articles, and product reviews, visit SEO White Hats.


The Easy Guide to Making a Robots.txt File

by on Aug.30, 2010, under Seo Robots.txt

If you have a website you really need to have a robots.txt file. It gives search engine spiders specific commands and it is easy to use and easy to maintain. Here is an easy guide to a robots.txt file in five minutes.

There are times when you don’t want a search engine to index a page or a folder on your website. Maybe you have some information you just don’t want to show up in Google. This may include your statistics page, a page of notes, or a dynamic page. And, importantly, if you use Google AdSense and the search tool that displays search results on your website, Google requires you to exclude that results page from search engines, which in practice means you need a robots.txt file.

A robots.txt file is a plain text document named robots.txt and saved in the root folder of your website. Search engines look for it and follow the commands it contains. Create a plain text document using a text editor such as Notepad (not a word processor, which adds formatting) and put these two lines in it:

User-agent: *
Disallow:

The first line tells all spiders that the commands that follow apply to them. The second line is the Disallow command; left empty, as here, it blocks nothing. After it you put the URL path of any page you don’t want spidered, one Disallow line per path. So if you wanted the spiders to skip your private page, it looks like this:

Disallow: /privatepage.htm

If you want the spiders to skip a whole folder you put the url of that folder with a slash like this:

Disallow: /privatefolder/

Simply place this text file in the root folder of your website and you are done. In the future you can add and remove commands easily.
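
Before uploading the file, you can check that it blocks what you expect. This is a minimal sketch using Python's standard urllib.robotparser module; the is_crawlable() helper and the "ExampleBot" name are made up for illustration, and the rules are the two Disallow examples from this guide.

```python
from urllib.robotparser import RobotFileParser

# The two Disallow examples from this guide, combined into one record.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privatepage.htm
Disallow: /privatefolder/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a spider obeying ROBOTS_TXT may fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.htm", "/privatepage.htm", "/privatefolder/notes.htm"):
        print(path, is_crawlable(path))
```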

The robots.txt file is very easy to write and maintain, and it is a powerful tool that will help you interact successfully with search engines. The Disallow command is the simplest and most used command, but there are other commands you can use, and if you have a website it is well worth your time to have a robots.txt file and even to research it a bit further.

For more interesting insights into being a creative webmaster and making your website work for you, visit the author’s site at: The Creative Webmaster – Forging the Iron of Creativity on the Anvil of a Website

For more practical advice on how to earn money with your small website, visit the author’s tutorial website at: Earning Money with your small website


Q&A: SEO question about the robots.txt file?

by on Aug.29, 2010, under Seo Robots.txt

Question by parlanchina: SEO question about the robots.txt file?
Hi, I made the silliest SEO mistake I have ever made. I blocked spiders from coming to my website by using:

User-agent: *
Disallow: /

But I should have used:

User-agent: *
Disallow:

I made this mistake last Thursday and discovered it on Sunday. Over those few days the organic ranking of my website dropped significantly.

My question is, how long will it take for Google to recrawl my pages and, most importantly, to regain the search positions I had before?

Thanks!

Best answer:

Answer by Bhanu P
Hi,

I am sorry about what happened because of your mistake.

Anyhow, there have been some updates to the Google algorithm in the last few weeks, so that might also be a reason for the drop in your site’s organic ranking. To raise it again, you can do some link building and submit your site to the search engines, and they will crawl your site again.
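
For reference, the difference between the two files in the question can be checked with a short Python sketch built on the standard urllib.robotparser module. The can_crawl() helper and the two constant names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"  # the mistake
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"    # the intended file


def can_crawl(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if the agent may fetch the path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(can_crawl(BLOCK_EVERYTHING, "/index.html"))
    print(can_crawl(ALLOW_EVERYTHING, "/index.html"))
```

A single "/" after Disallow matches every path on the site, while an empty value matches nothing.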


Robots.txt Recruiter: Daily Mail Uses Robots.txt File To Find SEO

by on Aug.29, 2010, under Seo Robots.txt

# August 12th, MailOnline are looking for a talented SEO Manager so if you found this then you’re the kind of techie we need! # Send your CV to holly dot ward at mailonline dot co dot uk
Read more on Search Engine Land

British Newspaper Daily Mail Plants Job Advert In Robots.txt File
Perhaps unsurprising for a newspaper that probably has more SEO staff than, well, actual journalists, the UK’s Daily Mail is hiring a new Search Engine Optimization manager. Interestingly, however, the job advert itself in fact appears in the newspaper website’s robots.txt file, which isn’t usually designed to be read by humans but is targeted at search engine bots to tell them what content …
Read more on TechCrunch


Google Webmaster Tools robots.txt generator feature

by on Aug.27, 2010, under Seo Robots.txt

Some cool seo robots.txt images:

Google Webmaster Tools robots.txt generator feature

Image by suzukik
⇒ robots.txt generator feature added to Google Webmaster Tools – 海外SEO情報ブログ

No Love From Google’s Spiders

Image by HubSpot
Getting No Love From Google?

Learn more about how you can optimize your site to rank higher in search engines so you get found by more qualified prospects.

Download our search engine optimization kit.
www.hubspot.com/search-engine-optimization-kit/


Importance of the Robots.txt File

by on Aug.08, 2010, under Seo Robots.txt

Despite the importance of the robots.txt file in getting your website indexed by the major search engines, many webmasters don’t have one on their site. What is the robots.txt file, you ask? If you don’t know, you are far from alone. The robots.txt file is a simple text file (no HTML) that is placed in your website’s root directory to tell the search engines which pages to index and which to skip.

When a search engine sends its webcrawler to your site, one of the first things the webcrawler will do is look in the root directory for the robots.txt file. A correctly formatted robots.txt file consists of one or more records, each providing instructions for a particular search-bot. A record generally has two components. The first is the User-agent line, where the name of the search-bot is listed. The second consists of one or more “Disallow” lines, which tell the webcrawler which files or folders should not be indexed (e.g., a cgi-bin folder).

If you currently have a website and do not have a robots.txt file, you can create one easily. As mentioned earlier, the file is plain text, so just open up Notepad and save the file as robots.txt. Most webmasters can use one record that will apply to all of the search engine crawlers. Once you have opened Notepad, enter the following:

User-agent: *
Disallow:

The “*” applies this rule to all bots. In this example, nothing is listed in the Disallow line, which tells the robot it may index the entire site. You can also enter a folder path here, such as “/private”, if there is a folder that shouldn’t be indexed. This can be very useful if you are still testing a portion of your website or if a section is still under construction.

Now that you know what should go into your robots.txt file, here are several common mistakes people make when creating these files. Do not put free-form notes into the file, as they can confuse the webcrawler; a note is only safe as a comment, on a line that starts with “#”. Also, the format should always be the User-agent on the first line, followed by the Disallow line(s). Do not reverse the order. Another common mistake involves using the incorrect case. Paths are case-sensitive, so if the disallowed folder is /private, make sure your robots.txt file does not list the folder as /Private. It seems like a very minor issue, but it will cause problems if done incorrectly. Finally, the original robots.txt standard has no Allow command, only Disallow. Major crawlers such as Googlebot accept Allow as an extension, but you cannot rely on every webcrawler to honor it.
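
The case-sensitivity pitfall is easy to demonstrate. Below is a minimal sketch using Python's standard urllib.robotparser module; the blocked() helper, the "ExampleBot" name, and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A "#" comment line is skipped by the parser; the rule below is
# matched as a case-sensitive path prefix.
RULES = """\
# Keep crawlers out of the section that is still being tested.
User-agent: *
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def blocked(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules above keep the agent away from the path."""
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/private/page.html", "/Private/page.html"):
        print(path, blocked(path))
```

Because the rule is a prefix match, "/private" in this sketch also covers a path such as /private-notes.html, so end the path with a slash if you mean only the folder.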

If you are still curious about the robots.txt file, you can find many more complex examples online. Just try one of your favorite websites and look for its robots.txt file. For example, you can go to http://www.cnn.com/robots.txt. If you need help creating a robots.txt file for your site, there are plenty of places online that will create the file for you for free. One example is http://www.seochat.com/seo-tools/robots-generator/. Despite its apparent simplicity, this file can make or break your site’s chances with the search engines. Make sure you have your robots.txt file in place and correctly formatted today.

Justin Scarborough is founder of the Affiliate Marketing Linx internet marketing directory. His goal with this website is to create a very selective, human-edited directory that will help others find quality links and information relating to affiliate and internet marketing.


Online Marketing Quick Tip #1 – Search Engine Optimization – Robots.txt files

by on Aug.05, 2010, under Seo Robots.txt

The first installment of weekly online marketing quick tips hosted by Mike Rynchek of Spyder Trap Online Marketing. To learn more about Online Marketing Quick Tips, or about Spyder Trap Online Marketing, visit www.spydertrap.com

Chris reviewing creating a robots.txt file with Lytico.com

Should I block duplicate pages using robots.txt?

by on Jul.31, 2010, under Seo Robots.txt

Halfdeck from Davis, CA asks: “If Google crawls 1000 pages/day, Googlebot crawling many dupe content pages may slow down indexing of a large site. In that scenario, do you recommend blocking dupes using robots.txt or is using META ROBOTS NOINDEX,NOFOLLOW a better alternative?” Short answer: No, don’t block them using robots.txt. Learn more about duplicate content here: www.google.com
