Search Engines

Hi...

I just found an article on relevancy problems in search results and how they are being addressed (see https://www.channelpartner.de/news/245588/index.html, in German).

As is widely known, the relevancy of a page is measured by counting the references to that page, weighted by the rank of the pages pointing to it and flattened by a little multiplier, the damping factor. This is called the PageRank algorithm; you can find a little introduction at https://pr.efactory.de/e-pagerank-algorithm.shtml.
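To make the idea concrete, here is a minimal sketch of that calculation in Python. It assumes the commonly cited form PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), with d as the damping factor and C(T) as the number of outgoing links of page T. The tiny four-page web, the page names and the function name `pagerank` are made up for illustration; a real search engine certainly does far more than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate page ranks.

    `links` maps each page to the list of pages it links to
    (hypothetical toy data, not a real crawl).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Sum the rank handed over by every page linking to `page`;
            # each linking page splits its rank over its outgoing links.
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - damping) + damping * incoming
        rank = new_rank
    return rank


web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "gibbs_paper": ["A"],  # links out, but nobody links to it
}
ranks = pagerank(web)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```

In this sketch a page that nobody links to can never rise above the baseline of 1 - d, whatever its content is worth.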

This is an enormously successful algorithm, no question. Still, it is based on a purely probabilistic approach. Why is this a problem? Do you know Gibbs?

I mean Josiah Willard Gibbs (see https://en.wikipedia.org/wiki/Josiah_Willard_Gibbs ), who is now called the greatest thermodynamicist of them all. For those of you not really at home in physics: you will find applied thermodynamics in every steam engine or refrigerator or... to make it simple: there would be no modern life without it.

Gibbs published a 300-page (!!) paper called On the Equilibrium of Heterogeneous Substances in a somewhat obscure journal, the Transactions of the Connecticut Academy (see https://en.wikipedia.org/wiki/On_the_Equilibrium_of_Heterogeneous_Substances ), which, according to what I have found so far, even the Connecticut Academy didn't read.

You see the problem with page ranking here. Since nobody would reference such a paper, nobody would take notice of it until those people not using a search engine started to reference it. OK, it is a special domain, and Gibbs is also known for his hard-to-read language. But in the end the purely probabilistic approach is a start... and nothing more. Search engines must become "combiners" in the sense that they combine different sources of information (which is already in use, or at least in research). Since the semantic web is still some time away, the combination will keep relying on probabilistic approaches for the time being. Because finding the real nuggets means discovering new...

Coming back to the article: knowing how this algorithm works gives people the opportunity to spam it. Paid web pages that do nothing but link to others to increase their rank are the result. Refining the algorithm will only lead to refining the appearance of the web pages. Until the computer "understands" what it has found.
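How little it takes to game a purely link-based score can be sketched in a few lines of Python. This is only a toy, assuming the simple textbook form of the rank formula with a damping factor of 0.85; the page names, the `farm` pages and the number 20 are all invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy rank computation; `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        rank = {
            page: (1 - damping) + damping * sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in pages
        }
    return rank


# An honest little web: "target" and "rival" are treated exactly alike.
honest_web = {
    "news": ["target", "rival"],
    "target": ["news"],
    "rival": ["news"],
}

# The same web plus 20 paid pages whose only job is to link to "target".
link_farm = {f"farm{i}": ["target"] for i in range(20)}
spammed_web = {**honest_web, **link_farm}

honest = pagerank(honest_web)
spammed = pagerank(spammed_web)

print(f"honest:  target={honest['target']:.3f} rival={honest['rival']:.3f}")
print(f"spammed: target={spammed['target']:.3f} rival={spammed['rival']:.3f}")
```

Each farm page carries only the baseline rank, yet twenty of them together are enough to lift "target" well above "rival" without a single word of its content changing.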

CU

0xff