7:29 PM on Jun. 19, 2008
How does a search engine select the pages to show for a given query? How is a specific query processed? How does a search engine find pages online?
This blog briefly explains how a search engine works.
1. Discovery
Search engines use automated programs (called spiders or bots) that explore the web, jumping from one page to another by following the links they find.
2. Index
When a page is found, or a known page is revisited, its content is saved in the search engine's database so it can be accessed faster in the future.
3. Returning Results
When a query is sent to the search engine (i.e. when a user hits the “search” button on the search engine's homepage), the matching pages are selected and ranked with a specific algorithm (every search engine has its own super-secret ranking algorithm), and the pages are returned to the user in descending order of importance.
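To make the three steps concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny in-memory `TOY_WEB` stands in for the real web, the `.example` URLs are made up, and the functions `crawl`, `build_index` and `search` are hypothetical names. Real search engines are far more complex, and the ranking step is left out here.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("spiders follow links between pages", ["c.example"]),
    "c.example": ("an index makes search fast", ["a.example"]),
    "d.example": ("this page has no inbound links", []),
}

def crawl(seed):
    """Discovery: breadth-first walk over links, starting from a seed URL."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        _, links = TOY_WEB[url]
        for link in links:
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def build_index(urls):
    """Index: map each word to the set of URLs whose text contains it."""
    index = {}
    for url in urls:
        text, _ = TOY_WEB[url]
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Returning results: URLs containing every query word (sorted, not ranked)."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

pages = crawl("a.example")
index = build_index(pages)
print(sorted(pages))
print(search(index, "search"))
```

Note that `d.example` is never discovered, because no crawled page links to it. A page the spider cannot reach never makes it into the index, so it can never be returned for any query.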
Ranking Criteria
There are enormous differences in the ranking algorithms used by search engines, but all of them are based on relevance and popularity.
These are terms from Information Retrieval, of which search engines are one of the most visible applications.
Basically, higher relevance means that the document is more focused on the given search term, and higher popularity means that the document is cited more often by other sources.
In terms of search engines, relevance is evaluated by analyzing:
- the page's textual content
- the pages that provide inbound links, by:
  - reading the anchor text used to link to the document
  - reading the text surrounding the link
  - evaluating the linking pages
This means, for example, that a page can rank well for a phrase or keyword even if that phrase never appears on that page. (One famous case is Bush's bio ranking #1 for “miserable failure” on Google… this was the result of a massive use of “miserable failure” as anchor text for www.whitehouse.gov/president/).
Popularity is evaluated by counting the number of links to the given page (more links means more popularity).
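Here is a small sketch of these two signals in Python. The link data, the `.example` page names and the function names `popularity` and `anchor_matches` are all made up for illustration. It only shows the idea of counting inbound links and matching anchor text, not how any real engine does it.

```python
from collections import Counter

# Hypothetical link graph: (source page, target page, anchor text).
LINKS = [
    ("blog1.example", "bio.example", "miserable failure"),
    ("blog2.example", "bio.example", "Miserable Failure"),
    ("news.example", "bio.example", "official biography"),
    ("blog1.example", "other.example", "some other page"),
]

def popularity(links):
    """Popularity: count inbound links per target page."""
    return Counter(target for _, target, _ in links)

def anchor_matches(links, phrase):
    """Relevance from inbound links: count, per target page,
    the links whose anchor text contains the phrase."""
    phrase = phrase.lower()
    return Counter(
        target for _, target, anchor in links if phrase in anchor.lower()
    )

inbound = popularity(LINKS)
matches = anchor_matches(LINKS, "miserable failure")
print(inbound["bio.example"], matches["bio.example"])
```

In this toy data `bio.example` gets credit for “miserable failure” purely from the anchor text of the pages linking to it. Its own content is never looked at.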
Given these two main criteria, each search engine adds its own interpretation, for example giving more weight to some “trusted” sites (.edu and .gov domains and sites with higher popularity are considered more trusted), or giving different weights to each page element (page title, body, heading tags…).
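As a rough illustration of what “different weights” could look like, here is a sketch in Python. The numbers in `FIELD_WEIGHTS` and `TRUST_BONUS` are invented for this example and are not taken from any real search engine (the real values are part of the secret sauce). The page dictionaries, the `.example` domains and the `score` function are hypothetical as well.

```python
# Hypothetical weights, invented for illustration only.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}
TRUST_BONUS = 1.5  # assumed multiplier for "trusted" domains

def is_trusted(domain):
    """Treat .edu and .gov domains as trusted (a simplifying assumption)."""
    return domain.endswith((".edu", ".gov"))

def score(page, term):
    """Sum weighted occurrences of the term in each page element,
    then boost the total for trusted domains."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        occurrences = page.get(field, "").lower().split().count(term)
        total += weight * occurrences
    if is_trusted(page["domain"]):
        total *= TRUST_BONUS
    return total

page_a = {
    "domain": "shop.example.com",
    "title": "Cheap bikes",
    "heading": "Bikes",
    "body": "We sell bikes and bike parts",
}
page_b = {
    "domain": "cycling.example.edu",
    "title": "Bikes",
    "heading": "Bikes history",
    "body": "bikes",
}

print(score(page_a, "bikes"), score(page_b, "bikes"))
```

Both pages mention the term once in the title, once in a heading and once in the body, so they get the same base score. Only the assumed trust bonus puts the .edu page ahead.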
The obvious consequence is that if you want to get higher rankings you have to:
- allow search engines to find your site
- make it easy for the spiders to understand the structure of the pages
- increase your relevance
- increase your popularity
We’ll talk about how to do this in the next blog.