Google Spider And PageRank
This article deals with off-page optimization: what we can do outside our own webpage to improve its standing with the search engines.
It is amazing how much the Internet borrows from the living world. Take the web, for example. We are so used to talking about the web of the Internet that we never stop to think about the spider web in the garden. And yet it is from that garden spider web that the name is derived. Like the spider web in the garden, the web of cyberspace consists of interconnecting threads, or links, holding one website to another. The main difference is size: the web of the Internet consists of billions and billions of links. And it’s nowhere near as neat as its cousin in the garden.
To make sense of this massive, entangled heap of threads, we use search engines such as Google to find what we want. We go to the Google search engine, type in what we want to find, and it serves us a list of results. Have you ever wondered how Google does it?
Google uses a program that browses through the World Wide Web to find, index and compile content. Just as a spider crawls through the web in the garden, the program that crawls through the World Wide Web is also called a “spider”. A spider program, also known as a web crawler or web spider, is used by all the search engines to look for content and compile it.
If you want your website or webpage to come up when someone uses a search engine, you need to make sure the web spider has crawled your website. For that to happen, there has to be a link from some other website to yours. Websites that stand alone never appear in the search rankings. That’s not what you want. You want people to find your website, and to find it repeatedly.
On the Internet today, there are millions and millions of websites. Some are linked to one another while others stand alone. A link exists when you link to somebody, when somebody links to you, or when you link both ways. This linkage, called a hyperlink, is a very important component of the Internet, for it allows the spider to reach your website or webpage and ultimately assign it a ranking. Websites that link to nobody, and that nobody links to, stand alone and are virtually ignored by the spider. Quite simply, they are out of reach.
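To make the idea of being "out of reach" concrete, here is a minimal sketch in Python. It is not how any real search engine is built. The link graph and the site names in it are made up for illustration, and the function name reachable_sites is my own. It simply follows links outward from one starting site and reports which sites it manages to find.

```python
from collections import deque

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "hub.example": ["news.example", "blog.example"],
    "news.example": ["blog.example"],
    "blog.example": ["hub.example"],
    "island.example": [],  # links to nobody, and nobody links to it
}

def reachable_sites(links, seed):
    """Return every site that can be reached by following links from seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        site = queue.popleft()
        for target in links.get(site, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

found = reachable_sites(LINKS, "hub.example")
print(sorted(found))
print("island.example" in found)
```

The site called island.example never turns up in the result, because no chain of links leads to it. That is the situation a standalone website is in.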
What is a web spider? It is a program or automated script that browses through the World Wide Web in a methodical, automated manner. The process of browsing through the pages is called web crawling or web spidering.
All the major search engines, such as Google, Yahoo and MSN, have their own web spiders. Google’s spider is called Googlebot. There are in fact two types of Googlebot, called deepbot and freshbot. The deepbot is a spider that tries to follow every link on your webpage. It brings the information back to the Google indexers to analyze and index. The freshbot is a spider that crawls through the web looking for new content, and may visit your website frequently.
In order to determine the importance of every website on the Internet, Google devised a ranking system called PageRank. The name PageRank is a trademark of Google, while the patent on the method is assigned to Stanford University, where the founders of Google, Larry Page and Sergey Brin, developed it. PageRank ensures that the most important websites are duly accorded their place on the Internet. Every page of every website on the web is assigned a PageRank from 0 to 10, with 0 being the least important and 10 being the most. Every new website starts at 0 and tries to work its way up.
How do the spiders work? I am simplifying things, but basically the spider starts from the websites with the highest ranking, say PageRank 10, and works its way down towards PageRank 0. Pages with PageRank 10 get enormous attention from the spiders: the freshbot spider might visit them many, many times in an hour. On the other hand, PageRank 0 sites might not get any attention at all. The spiders read through a page from the top left to the bottom right. If they encounter a hyperlink, they follow that link to the next page and start reading there. What you want is for the spider to follow a hyperlink from someone else’s website to yours. It doesn’t help you to place a link from your website to someone else’s; the link must come from somebody else to you.
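For the curious, the "read the page and pick up the hyperlinks" step can be sketched with Python's standard html.parser module. This is only an illustration, not Googlebot's actual code. The sample page, its URLs and the names LinkCollector and extract_links are all invented for the example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in the order the tags appear."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page content, standing in for a page the spider has fetched.
SAMPLE_PAGE = """
<html><body>
  <p>Read <a href="http://first.example/">this</a> first.</p>
  <p>Then visit <a href="http://second.example/page">that</a>.</p>
</body></html>
"""

def extract_links(html):
    """Return the hyperlinks found in html, from top to bottom."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

links = extract_links(SAMPLE_PAGE)
print(links)
```

A real spider would then fetch each of those addresses in turn and repeat the process on the pages it finds there.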
When someone links to your website, the page that carries the link casts a “vote” for your webpage. The more votes you collect, the higher your PageRank. And votes are not equal. A webpage with a high PageRank casts a stronger vote than a page with a low PageRank. If you can get important pages to link to you, you earn their strong votes, elevating your webpage’s PageRank position. At the same time, you should not link generously to other websites, because you bleed away your PageRank in doing so. In short, you want incoming links from high-PageRank webpages, and you do not want to hand out outgoing links freely.
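The voting idea can be tried out with a small Python sketch. It follows the simplified formula that Page and Brin published while at Stanford, not whatever Google actually runs today, which is far more involved. The tiny link graph, the page names and the function name pagerank are made up for the example, and the damping value of 0.85 is the one suggested in that original paper. Each page splits its vote evenly among the pages it links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a rank for each page by repeatedly passing votes along links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page's vote is divided among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: site_a and site_b each have exactly one incoming link,
# but site_a's link comes from a page that itself has three incoming links.
LINKS = {
    "popular": ["site_a"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "fan3": ["popular"],
    "obscure": ["site_b"],
    "site_a": [],
    "site_b": [],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, site_a and site_b have one incoming link each, yet site_a ends up with the higher score because its single vote comes from a better-ranked page.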
This is a mistake that I made in the first few years of running my website, AsiaExplorers. I linked to anybody and everybody who asked me to link to them. Now I am very careful about who I link to.
It is enormously difficult to reach PageRank 10. In fact, the only website I can think of with PageRank 10 is the Google homepage itself. The Yahoo, MSN, Dell and Apple Computer homepages all carry PageRank 9. These are some of the biggest players on the Internet. Most important websites have a PageRank between 4 and 6. Your goal, therefore, is to reach that level. And mind you, it gets harder and harder to go from one PageRank to the next. PageRank uses a scale similar to the Richter scale: each step up is far harder to reach than the one before it.
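To get a feel for what a Richter-like scale means, here is a small Python sketch. Google has not published how the 0 to 10 figure is worked out, so the base of 10 used here is purely my assumption, and the names ASSUMED_BASE, raw_value_needed and score_for are invented for the illustration. The "raw value" is likewise a stand-in for whatever underlying measure of importance is being compressed onto the scale.

```python
ASSUMED_BASE = 10  # assumption for illustration only; the real figure is not public

def raw_value_needed(score, base=ASSUMED_BASE):
    """Raw importance needed to reach a given score on the assumed scale."""
    return base ** score

def score_for(raw_value, base=ASSUMED_BASE, max_score=10):
    """Map a raw importance value onto the assumed 0-10 logarithmic scale."""
    score = 0
    # Climb one level at a time while the raw value clears the next threshold.
    while score < max_score and raw_value >= base ** (score + 1):
        score += 1
    return score

for level in range(0, 7):
    print(level, raw_value_needed(level))
```

Under that assumption, moving from 4 to 5 takes ten times as much as it took to reach 4 in the first place, which is why each step feels so much harder than the last.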
A fast way to view the PageRank of any page on the web is to install the Google Toolbar, which displays the PageRank of whatever page you load. To get it, just search for “Google Toolbar” on Google.
Now that we understand how Google ranks webpages, the next big question is: how do we get important pages to link to us? That’s a good question, and I’ll address it in my next article.