Unix Formula - UNIX Pro








 

Is Google Broken?

1 January 1970

Between Aug 04, 2003 and Aug 25, 2003 (just 21 days), Google added a little over 1.2 billion Web pages to their index. But between Aug 25, 2003 and today, Google hasn't added a single Web page to their index (at least according to Google's own count).

Today, Google's home page states:

2004 Google - Searching 4,285,199,774 web pages

Now let's look at the history of Google's home page using web.archive.org (the Internet Archive keeps copies of Web pages that you can review online as they appeared in the past):

Aug 25, 2003

2003 Google - Searching 4,285,199,774 web pages

Now isn't this strange? The number of Web pages is exactly the same today as it was a year ago, and next week it will still be the same. Let's go back and see when this number was different:

Aug 04, 2003

2003 Google - Searching 3,083,324,652 web pages

So what does this mean? It means either Google is lying to us all or they have been dropping pages as fast as they have been adding them.

My guess is that on Aug 25, 2003 Google's index was full. Why do I say this? Because Google's white papers were freely available to anyone. This meant that you could read the actual documents published by Google's founders before Google became public and get a glimpse of how Google was created. According to these documents, Google was written in ANSI C and C++ on Linux. The database was constructed using a document_ID that is associated with each Web page. This document_ID was published as being a 4-byte unsigned long integer. This means that for every single Web page Google has in their index, an ID was created to identify it. But like everything, there is a limit: a 4-byte unsigned long integer can hold only 4,294,967,296 distinct values (0 through 4,294,967,295). So if no changes have been made to their database structure, Google has probably reached this threshold, and as new pages are added, old pages are removed (disappear). Quite alarming, isn't it?
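The arithmetic behind this guess is easy to check. The short Python sketch below uses only the page counts quoted above and the size of a 4-byte unsigned integer; the variable names are made up for illustration.

```python
# Page counts quoted from Google's home page in the article.
COUNT_AUG_04_2003 = 3_083_324_652
COUNT_AUG_25_2003 = 4_285_199_774

# A 4-byte unsigned integer has 2**32 distinct values (0 .. 2**32 - 1).
UINT32_VALUES = 2 ** 32
UINT32_MAX = UINT32_VALUES - 1

growth = COUNT_AUG_25_2003 - COUNT_AUG_04_2003
headroom = UINT32_VALUES - COUNT_AUG_25_2003

print(f"pages added in 21 days: {growth:,}")
print(f"largest 32-bit ID:      {UINT32_MAX:,}")
print(f"IDs left before wrap:   {headroom:,}")
```

By this count the index grew by 1,201,875,122 pages in three weeks and then stopped with fewer than 10 million 32-bit IDs to spare.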

So Google may have a serious flaw in their database structure and design. Google has used a 4-byte unsigned long integer to store the document ID for every page in their index. In Linux (which is what Google uses), this variable is 4 bytes long on 32-bit systems and tops out at roughly 4.29 billion (4,294,967,295) before it rolls over to zero. This may also be one of the reasons pages appear to be dropping from Google's index at an alarming rate (I can show this happening across tens of thousands of search results). They may have already run out of IDs, so that a document_ID is no longer associated with the content stored in the database, which in turn would return empty results for a particular URL.
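To make the rollover concrete, here is a minimal Python sketch of what could happen if a 32-bit ID counter wraps. The `index` dictionary, the `next_doc_id` helper, and the example URLs are all hypothetical; this is not Google's code, only an illustration of the failure mode described above.

```python
UINT32_MASK = 0xFFFFFFFF  # keeps only the low 32 bits, like a 4-byte unsigned int

def next_doc_id(current):
    """Return the next ID, wrapping to 0 after 4,294,967,295."""
    return (current + 1) & UINT32_MASK

# Hypothetical index: doc ID -> URL. Not Google's actual data structure.
index = {0: "http://example.com/old-page"}

last_id = UINT32_MASK           # 4,294,967,295, the largest 32-bit value
new_id = next_doc_id(last_id)   # wraps around to 0

# The new page reuses ID 0, silently replacing the old page.
index[new_id] = "http://example.com/new-page"
print(new_id, index[0])
```

In this toy model the old page simply vanishes from the index once its ID is reused, which is one way the "new pages in, old pages out" behavior could arise.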

Can this problem be corrected? Sure it can, but Google has 15,000+ Linux servers and 4.2 billion document_IDs to convert. This is not going to be an easy task at this point. Also, every single word in their inverted index is associated with a document ID, so the conversion will probably take months, if not a great deal longer.
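A rough Python sketch shows why the document ID width touches the whole inverted index. The three tiny documents and the fixed-width, uncompressed encoding are assumptions made for illustration; a real search engine would compress its posting lists, so the numbers here only show the direction of the effect.

```python
import struct
from collections import defaultdict

# Hypothetical documents keyed by doc ID.
docs = {
    1: "linux server index",
    2: "linux search index",
    3: "search engine",
}

# Inverted index: word -> sorted list of doc IDs containing it.
inverted = defaultdict(list)
for doc_id in sorted(docs):
    for word in sorted(set(docs[doc_id].split())):
        inverted[word].append(doc_id)

def encode_postings(doc_ids, fmt):
    """Pack doc IDs with a fixed-width format: '<I' = 4 bytes, '<Q' = 8 bytes."""
    return b"".join(struct.pack(fmt, d) for d in doc_ids)

size_32 = sum(len(encode_postings(ids, "<I")) for ids in inverted.values())
size_64 = sum(len(encode_postings(ids, "<Q")) for ids in inverted.values())
print(dict(inverted))
print(size_32, size_64)
```

Every entry in every posting list stores a document ID, so widening the ID from 4 to 8 bytes means rewriting each of those entries, not just one table of IDs.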

In addition to this major problem, there are other major flaws with Google. One of these is with their PageRank algorithm.

According to a recent study, 75% of keyword searches on the Web are handled by Google. First off, let me say that while Google may indeed handle 75% of keyword searches, you also have to consider how many of these people looked elsewhere as well. Yahoo! claims an Internet reach of over 80%, so they too are handling these same requests, but probably delivering better, less biased results.

The fact that Google returns currently "popular" pages at the top of search results only proves that Google is unfairly penalizing newly created pages that are not yet "popular". While this statement may be an exaggeration, it does contain an alarming bit of truth. To find a Web page, many users go to Google (or another search engine using Google's index, like AOL) and issue a keyword query.

If users cannot find relevant pages after several different keyword queries, they are likely to give up and stop looking. So a Web page not indexed by Google, or ranked poorly by Google (low PageRank, poor Google popularity), is unlikely to be viewed by many users. And because of this, it will never become popular, according to Google's own admissions.

While Google takes more than 100 different factors into account in determining the final ranking of a Web page, the core of their ranking algorithm is based on a metric called PageRank. PageRank is nothing more than a "link popularity" metric, where a page is considered more important if it is linked to by many other pages on the Web that Google also considers important (popular and already in Google's index). Google puts a page at the top of search results when it contains the keywords the searcher is looking for, or when those keywords appear in the anchor text of pages linking to it. The more popular the pages linking to this Web page, the more popular this Web page will be. So the popular continue to get "more" popular, and the less fortunate ones that are new to the Web continue to be held back in this popularity game Google is playing.
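The link-popularity idea can be sketched in a few lines of Python. This is a simplified textbook version of PageRank computed by power iteration, not Google's production ranking; the four-page `web` graph is hypothetical, and the damping factor of 0.85 is the value commonly cited from the original PageRank paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical four-page web: three established pages link to each other,
# and nobody links to the new page yet.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "new": ["c"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page with no inbound links ends up with the minimum possible score, no matter what it contains, while the pages that link to one another keep most of the rank.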

It is important to understand the distinction between the "importance or quality" of a Web page and its "popularity". What do you want, I ask? Do you want to see the same old popular sites day in and day out, or would you like to see relevant, content-rich, newly discovered Web pages? As long as you continue to use Google, you will be promoting this popularity game, and your competition will continue to rise above you, not to mention that you will be missing out on the new stuff.

Since popular pages are repeatedly returned by Google as top results, they are also the easiest for users to discover, which increases their popularity even further. In contrast, a currently "unpopular" page is often not returned by Google, so few new links will be created to the page, keeping the page's ranking down. This "rich-get-richer" scheme can and does destroy the quality of search results.
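The feedback loop can be illustrated with a deliberately crude Python model. Everything here is an assumption made for illustration: only the top `shown` pages by link count are ever displayed, and each round's new links are split among those pages in proportion to the links they already have. It says nothing about real user behavior, only about how such a loop behaves once it is in place.

```python
def simulate(links, shown=2, new_links_per_round=10, rounds=20):
    """Each round, only the `shown` most-linked pages appear in results,
    and new links are split among them in proportion to their link counts."""
    links = dict(links)
    for _ in range(rounds):
        visible = sorted(links, key=links.get, reverse=True)[:shown]
        visible_total = sum(links[p] for p in visible)
        # Compute all gains first so each round uses the same starting counts.
        gains = {p: new_links_per_round * links[p] / visible_total for p in visible}
        for p, gain in gains.items():
            links[p] += gain
    return links

start = {"popular_a": 100, "popular_b": 80, "new_page": 1}
end = simulate(start)
for page in start:
    print(f"{page}: {start[page]} -> {end[page]:.1f}")
```

Under these assumptions the two popular pages absorb every new link, and the new page stays at its single link for as long as the simulation runs.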

PageRank is an unfortunate algorithm for both users and Web page authors. Useful information is being ignored by Google simply because a new page or site has not had a chance to get noticed and, under the PageRank algorithm, never will.

A recent article at motleyfool.com stated that 98% of Google's revenues come from their advertisers. This would mostly consist of AdWords and AdSense. But all it would take is for a firewall company, a virus protection company, AOL or Microsoft to simply create a Google ad blocker, and it would be the end of Google overnight. These companies, as well as Google, already provide pop-up and pop-under blockers, and writing a Google ad blocker would be even simpler.

I have months of research to prove my statements.

Just my two cents for today!
Anthony Federico

Source: W3Reports


All trademarks and copyrighted information contained herein are the property of their respective owners.

 