The search engine
Most people use Google only as a search engine; Google Search is certainly the best-known and most widely used of Google's services across the world.
Index size
At its start in 1998, Google claimed to index 25,000,000 web pages. By June 2005, this number had grown to 8,058,044,651 web pages, as well as 1,187,630,000 images, 1 billion Usenet messages, 6,600 print catalogs, and 4,500 news sources.
Web:
January 1998: 25,000,000
August 2000: 1,060,000,000
January 2002: 2,073,000,000
February 2003: 3,083,000,000
September 2004: 4,285,000,000
November 2004: 8,058,044,651
June 2005: 8,058,044,651
February 2006: 20,000,000,000
May 2006: 25,270,000,000
Images:
May 2006: 15,400,000
Physical structure
Google employs data centers full of low-cost commodity computers running a customized version of Red Hat Linux in several locations around the world to respond to search requests and to index the web. The server farms in the data centers are built using a shared-nothing architecture. The indexing is performed by a program named Googlebot, which periodically requests new copies of web pages it already knows about. The more often a page updates, the more often Googlebot visits it. The links in these pages are examined to discover new pages to be added to Google's internal database of the web. This index database and web page cache are several terabytes in size. Google has developed its own file system, the Google File System, for storing all this data.
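The link-discovery process described above can be sketched as a breadth-first crawl. This is a hedged illustration only: the in-memory "web" (a dict mapping page names to the links on each page) stands in for real HTTP fetches, and the page names are invented for the example.

```python
from collections import deque

# Hypothetical pages and their outgoing links, in place of real fetches.
toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["d.example"],
    "d.example": [],
}

def crawl(seeds, fetch_links):
    """Breadth-first discovery of all pages reachable from the seed set."""
    index = set()
    frontier = deque(seeds)
    while frontier:
        url = frontier.popleft()
        if url in index:
            continue
        index.add(url)                    # "index" the page
        for link in fetch_links(url):     # examine its links for new pages
            if link not in index:
                frontier.append(link)
        # A real crawler would also revisit known pages periodically,
        # more often for pages that change frequently.
    return index

discovered = crawl(["a.example"], lambda u: toy_web.get(u, []))
```

Starting from a single seed, the crawler discovers all four toy pages by following links, which is the essence of how Googlebot grows its database of the web.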
PageRank
Google uses an algorithm called PageRank to rank web pages that match a given search string. The PageRank algorithm computes a recursive figure of merit for web pages, based on the weighted sum of the PageRanks of the pages linking to them. The PageRank thus derives from human-generated links, and correlates well with human concepts of importance. Previous keyword-based methods of ranking search results, used by many search engines that were once more popular than Google, would rank pages by how often the search terms occurred in the page, or how strongly associated the search terms were within each resulting page. In addition to PageRank, Google also uses other secret criteria for determining the ranking of pages on result lists.
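The "weighted sum of the PageRanks of the pages linking to them" can be made concrete with a small power-iteration sketch. The three-page graph is invented for illustration, and the 0.85 damping factor is the value from the original PageRank paper; Google's production ranking uses many additional, undisclosed signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                    # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:                               # share rank among outlinks
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Toy web: C is linked to by both A and B, so it ends up ranked highest.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
```

Because every page's rank is redistributed each round, the ranks always sum to 1; a page's final score reflects how much rank flows into it from pages that are themselves well linked, which is the recursive figure of merit the text describes.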
Google indexes and caches not only HTML files but also 13 other file types, including PDF, Word documents, Excel spreadsheets, Flash SWF, and plain text files. Except in the case of text and SWF files, the cached version is a conversion to HTML, allowing those without the corresponding viewer application to read the file.
Users can customize the search engine somewhat. They can set a default language, use "SafeSearch" filtering technology (which is set to 'moderate' by default), and set the number of results shown on each page. Google has been criticized for placing long-term cookies on users' machines to store these preferences, a tactic which also enables the company to track a user's search terms over time. For any query (of which only the first 32 keywords are taken into account), up to the first 1,000 results can be shown, with a maximum of 100 displayed per page.
Despite its immense index, there is also a considerable amount of data in databases, which are accessible from websites by means of queries, but not by links. This so-called deep web is minimally covered by Google and contains, for example, catalogs of libraries, official legislative documents of governments, phone books, and more.
Google optimization
Since Google is the most popular search engine, many webmasters have become eager to influence their website's Google rankings. An industry of consultants has arisen to help websites raise their rankings on Google and on other search engines. This field, called search engine optimization, attempts to discern patterns in search engine listings, and then develop a methodology for improving rankings.
One of Google's chief challenges is that as its algorithms and results have gained the trust of web users, the profit to be gained by a commercial website in subverting those results has increased dramatically. Some search engine optimization firms have attempted to inflate specific Google rankings by various artifices, and thereby draw more searchers to their client's sites. Google has managed to weaken some of these attempts by reducing the ranking of sites known to use them.
Search engine optimization encompasses both "on page" factors (like body copy, title tags, H1 heading tags and image alt attributes) and "off page" factors (like anchor text and PageRank). The general idea is to affect Google's relevance algorithm by incorporating the keywords being targeted in various places "on page," in particular the title tag and the body copy (note: the higher up in the page, the better its keyword prominence and thus the ranking). Too many occurrences of the keyword, however, cause the page to look suspect to Google's spam checking algorithms.
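The interplay of on-page factors can be sketched as a toy scorer. Everything here is illustrative: the weights, the keyword-density threshold, and the function itself are invented for the example and are not Google's actual rules, which are secret.

```python
def on_page_score(title, body, keyword, spam_density=0.15):
    """Toy on-page relevance score: rewards the keyword in the title and
    early in the body, but zeroes out pages that look keyword-stuffed.
    All weights and thresholds are made up for illustration."""
    words = body.lower().split()
    occurrences = words.count(keyword.lower())
    density = occurrences / len(words) if words else 0.0
    if density > spam_density:          # too many repeats looks suspect
        return 0.0
    score = 0.0
    if keyword.lower() in title.lower():
        score += 2.0                    # title tag carries extra weight
    if occurrences:
        first = words.index(keyword.lower())
        score += 1.0 / (1 + first)      # earlier occurrence, more prominence
        score += occurrences * 0.1
    return score

normal = on_page_score(
    "Cheap widgets online",
    "widgets for sale here at the best widget prices anywhere today",
    "widgets",
)
stuffed = on_page_score("Widgets", "widgets widgets widgets buy widgets", "widgets")
```

The "normal" page scores positively because the keyword appears in the title and at the start of the body, while the stuffed page trips the density check and scores zero, mirroring the trade-off the paragraph describes.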
One "off page" technique that works particularly well is Google bombing, in which many websites link to another site using a particular phrase in the anchor text, in order to give the target site a high ranking when that phrase is searched for.
Google publishes a set of guidelines for website owners who would like to raise their rankings while using legitimate optimization consultants.
Uses of Google
A corollary use of Google (and other internet search engines) is that it can help translators determine the most common way of expressing an idea in English or another language. This is generally done by taking a 'count' of different variants, thereby establishing which expression is more common. While this approach requires careful judgment, it does improve the ability of non-native translators to use more idiomatic English expressions.
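The 'count' technique can be sketched in a few lines. The tiny corpus below stands in for live search-result counts, which is an assumption made purely for the example.

```python
# Hypothetical mini-corpus standing in for web search results.
corpus = [
    "we made a decision to proceed",
    "they took the decision quickly",
    "she made a decision yesterday",
]

def variant_count(phrase, texts):
    """Count how many times a phrase variant occurs across the texts."""
    return sum(t.count(phrase) for t in texts)

made = variant_count("made a decision", corpus)
took = variant_count("took the decision", corpus)
```

Comparing the two counts, "made a decision" occurs more often in this sample, so a translator applying the technique would prefer that variant.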
From Wikipedia, the free encyclopedia