Understanding Crawling and Indexing

Many people understand, at least on the surface, how Google and Yahoo operate. It’s fairly easy to see what happens: you type in a keyword phrase and the most “relevant” results are returned to you. However, understanding how crawling and indexing work is a bit more complex. Let’s break the process down into simple terms, as this will help SEO clients understand why SEO marketing is necessary.

First, understand that search sites employ applications known as crawlers and indexers to help them “snapshot” the Internet at any given time. The spider, or crawler, is a program that “crawls” the Internet, discovering new web pages by following the links that webmasters insert into their existing pages. If a spider does not crawl a new web page, that page cannot appear in search results, so as far as searchers are concerned it might as well not exist!
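To make link-following concrete, here is a minimal sketch in Python. It assumes a tiny in-memory "web" (the hypothetical `TOY_WEB` dictionary, where each page name maps to the links found on that page) rather than real HTTP requests, and the `crawl` function is illustrative only, not how any particular search site implements its spider.

```python
from collections import deque

# Hypothetical toy "web": each page maps to the links found on that page.
TOY_WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["post-1"],
    "orphan": ["home"],  # no other page links here, so it is never discovered
}


def crawl(start, web):
    """Return pages in the order they are discovered by following links."""
    discovered = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # Follow every link on the current page; skip pages already seen.
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                queue.append(link)
    return discovered


found = crawl("home", TOY_WEB)
print(found)
```

Starting from "home", the sketch reaches every page that some other page links to, but never reaches "orphan", because nothing links to it. That is the practical reason new pages need inbound links before a spider can find them.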

Spiders work their way across the Internet continuously, but they do not make a daily or even weekly habit of revisiting every old web page and checking it for new links. So you might have to be patient while waiting for these crawlers to discover your new content! The good news is that these spiders are hungry for new content, because today’s Internet users are always typing in unusual keyword phrases in the hope of finding something new and exciting.

The indexer is another program used by search companies, and it is the one that actually helps create the search results you rely on every day. The indexer is responsible for collecting and storing all new data so that it can be easily retrieved for the benefit of users. The search company keeps this index of content and will often return to it and reference it when new search queries are made. As new pages are discovered by the spider application, the index is updated. According to Wikipedia, search companies may use automated indexing or even agent-based indexing, the latter of which is done in “real time.” Meta search engines, however, do not store a local index the way “cache” search sites do; instead, they use the technology of other search sites to create their own results pages.
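One common way to store page data so that it can be retrieved quickly is an inverted index, which maps each word to the set of pages that contain it. The sketch below is a simplified illustration in Python; the page names, sample text, and the functions `tokenize`, `add_page`, and `search` are all assumptions made for this example, not a description of any real search site's indexer.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def add_page(index, url, text):
    """Record every word on the page so the page can be found later."""
    for word in tokenize(text):
        index[word].add(url)


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages handed over by the crawler.
index = defaultdict(set)
add_page(index, "page-a", "Fresh SEO tips for small business")
add_page(index, "page-b", "Small business marketing guide")

print(search(index, "small business"))

# When the spider discovers a new page, the index is simply updated.
add_page(index, "page-c", "SEO marketing basics")
print(search(index, "seo marketing"))
```

Because the lookup goes from word to pages, answering a query does not require rereading every stored page, which is why an index can be consulted again and again as new searches come in.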

One factor is still missing: what determines whether indexed content is really worthy of the top 10? That is the job of a search site’s algorithm. The algorithm (unique to each search site) determines which sites are the most relevant, how often to crawl them, and how many of their pages to index. Search giants like Google reportedly consider approximately 200 different factors, so don’t underestimate the complexity of this process!
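Real ranking algorithms are proprietary and far more elaborate, but the basic idea of combining many factors into one score can be sketched with a simple weighted sum. Everything in the Python example below is hypothetical: the three factor names, their weights in `WEIGHTS`, and the values assigned to each page are invented purely for illustration.

```python
# Hypothetical factors and weights; real search sites use many more,
# and they do not publish the exact formulas.
WEIGHTS = {"keyword_match": 0.5, "inbound_links": 0.3, "freshness": 0.2}


def score(page_factors, weights=WEIGHTS):
    """Combine a page's factor values (0.0 to 1.0) into a single score."""
    return sum(weights[name] * page_factors.get(name, 0.0) for name in weights)


def rank(pages, weights=WEIGHTS):
    """Return page names ordered from highest score to lowest."""
    return sorted(pages, key=lambda url: score(pages[url], weights), reverse=True)


# Hypothetical indexed pages that all match the same query.
CANDIDATES = {
    "page-a": {"keyword_match": 0.9, "inbound_links": 0.2, "freshness": 0.5},
    "page-b": {"keyword_match": 0.6, "inbound_links": 0.9, "freshness": 0.1},
    "page-c": {"keyword_match": 0.3, "inbound_links": 0.1, "freshness": 0.9},
}

ranking = rank(CANDIDATES)
print(ranking)
```

Even in this toy version, changing one weight can reorder the results, which hints at how difficult it is to predict rankings when roughly 200 factors are in play.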

Once you understand how these processes work, you will be in a better position to study your SEO campaign and plan a new direction for your web marketing.