PR is short for PageRank, a ranking method that was granted a US patent in September 2001. The patent names Larry Page, one of the founders of Google, as its inventor. The "Page" in PageRank therefore refers not to a web page but to Larry Page: the ranking method is named after him.
Algorithm introduction
PageRank
Basic idea: if web page T links to web page A, the owner of T is taken to consider A important, so T passes part of its own importance score to A. The size of that contribution is PR(T)/C(T),
where PR(T) is the PageRank value of T and C(T) is the number of outgoing links on T. The PageRank value of A is the accumulation of such contributions from every page that links to it:
PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
A is the page being scored.
PR(A) is the PageRank value of page A.
d is the damping factor, usually set to d = 0.85.
T1 … Tn are the pages that link to page A.
C(Ti) is the number of outgoing links on page Ti; for example, C(T1) is the number of outgoing links on page T1.
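As the formula shows, the PR value of a page depends on the PR values of the pages that link to it, so the values must be computed iteratively: start from an initial guess and reapply the formula until the values stop changing.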
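The iteration can be sketched in a few lines of Python. This is a minimal illustration of the formula above, not a production implementation: the function name pagerank, the dictionary-based graph format, the starting value of 1.0, and the tol and max_iter stopping parameters are all assumptions made for the example. Pages without outgoing links simply pass on no score here, which is a simplification.

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T)/C(T)) over all pages.

    links maps each page to the pages it links to (hypothetical format).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # C(T): number of distinct outgoing links on each page.
    out_count = {p: len(set(links.get(p, ()))) for p in pages}

    # For each page A, the pages T1 ... Tn that link to it.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in set(targets):
            inbound[dst].append(src)

    pr = {p: 1.0 for p in pages}  # initial guess (an assumption)
    for _ in range(max_iter):
        new_pr = {
            p: (1 - d) + d * sum(pr[t] / out_count[t] for t in inbound[p])
            for p in pages
        }
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:  # values have stopped changing
            break
    return pr


# Small made-up graph: A links to B and C, B links to C, C links to A.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, score in sorted(pagerank(example).items()):
    print(page, round(score, 4))
```

In this toy graph, C receives links from both A and B and so ends up with the highest value, while B, which receives only half of A's score, ends up with the lowest.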
Advantages: it is a static, query-independent algorithm, so the PageRank values of all web pages can be calculated offline. This reduces the amount of computation at query time and shortens the query response time.
Disadvantages: user queries are topical, but PageRank ignores topical relevance, which lowers the relevance of the results to the query. PageRank also puts new web pages at a strong disadvantage, since they have had little time to attract inbound links.
Topic-Sensitive PageRank
Basic idea: proposed to address PageRank's neglect of topics. The core idea is to compute a set of PageRank vectors offline, each associated with one topic, so that every page has a separate score for each topic. The algorithm has two stages: computing the topic-specific PageRank vectors offline, and determining the topic of the query online.
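The two stages can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published algorithm in full: it uses the normalized form of PageRank in which the (1 - d) share is given only to the pages of one topic, the function names biased_pagerank and topic_sensitive_score are made up for the example, and the topic weights of a query are supplied by hand rather than inferred from the query text.

```python
def biased_pagerank(links, topic_pages, d=0.85, iterations=100):
    """Offline stage: one PageRank vector biased toward one topic.

    links maps each page to the pages it links to; topic_pages is a
    non-empty set of pages assigned to the topic (both hypothetical).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    out = {p: sorted(set(links.get(p, ()))) for p in pages}

    # The (1 - d) share goes only to pages of this topic.
    bias = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
            for p in pages}
    pr = dict(bias)
    for _ in range(iterations):
        new_pr = {p: (1 - d) * bias[p] for p in pages}
        for src in pages:
            if out[src]:
                share = d * pr[src] / len(out[src])
                for dst in out[src]:
                    new_pr[dst] += share
            else:
                # Page without outgoing links: hand its score to the topic pages.
                for p in pages:
                    new_pr[p] += d * pr[src] * bias[p]
        pr = new_pr
    return pr


def topic_sensitive_score(page, topic_vectors, topic_weights):
    """Online stage: mix the per-topic scores using the query's topic weights."""
    return sum(w * topic_vectors[t][page] for t, w in topic_weights.items())


# Made-up graph with two topic clusters joined by a hub page.
links = {"s1": ["s2"], "s2": ["s1", "hub"],
         "c1": ["c2"], "c2": ["c1", "hub"],
         "hub": ["s1", "c1"]}
topic_vectors = {
    "sports": biased_pagerank(links, {"s1", "s2"}),
    "cooking": biased_pagerank(links, {"c1", "c2"}),
}
# Assume the online stage judged the query to be mostly about sports.
weights = {"sports": 0.9, "cooking": 0.1}
for page in sorted(links):
    print(page, round(topic_sensitive_score(page, topic_vectors, weights), 4))
```

With the query weighted toward sports, the sports pages s1 and s2 score higher than their cooking counterparts c1 and c2, even though the link structure of the two clusters is symmetric.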
Advantages: the topic of the query (the user's interest) is judged from the query itself and its context, so the returned results match what the user is looking for more closely.
Disadvantages: the relevance of topics is not used to improve the accuracy of link scoring.
Hilltop
Basic idea: unlike PageRank, only links from expert pages are considered. The algorithm has two main steps: finding the expert pages, and ranking the target pages they link to.
Advantages: the results are highly relevant and accurate.
Disadvantages: finding and selecting expert pages is the key step, and the quality of the expert pages determines the accuracy of the algorithm, yet their quality and fairness are difficult to guarantee. Ignoring the large number of non-expert pages means the result cannot reflect the opinion of the internet as a whole. When there are not enough expert pages, Hilltop returns an empty result, so it is best suited to refining an existing query ranking.
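The target-ranking step, including the empty result when too few experts are available, can be illustrated with a heavily simplified sketch. Everything specific here is an assumption made for the example: the function name hilltop_rank, the input format, the use of a plain sum of expert scores, and the min_experts threshold of 2. The expert-search step is not modelled; the expert pages and their scores for the query are taken as given.

```python
def hilltop_rank(expert_scores, expert_links, min_experts=2):
    """Rank target pages using links from expert pages only.

    expert_scores maps each expert page to its score for the query.
    expert_links maps each expert page to the target pages it links to.
    Both inputs and the min_experts threshold are hypothetical.
    """
    # Gather, for each target, the scores of the experts that link to it.
    support = {}
    for expert, score in expert_scores.items():
        for target in expert_links.get(expert, ()):
            support.setdefault(target, []).append(score)

    # Keep only targets backed by enough experts; with too few experts
    # nothing qualifies and the result is empty.
    ranked = [(target, sum(scores))
              for target, scores in support.items()
              if len(scores) >= min_experts]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


experts = {"e1": 0.9, "e2": 0.6, "e3": 0.3}
links = {"e1": {"a", "b"}, "e2": {"a"}, "e3": {"b", "c"}}
print(hilltop_rank(experts, links))          # c is dropped: only one expert links to it
print(hilltop_rank({"e1": 0.9}, links))      # a single expert is not enough: empty list
```

Links from pages that are not in expert_scores never enter the computation, which is the sense in which non-expert pages are ignored.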