PageRank, also known as web page ranking, page level, or Google left-side ranking, is a technology used by search engines to rank web pages based on the hyperlinks between them. It was developed by Google and is named after Larry Page. The technique is closely associated with search engine optimization and is used by Google to reflect the relevance and importance of web pages. Google founders Larry Page and Sergey Brin developed the technology at Stanford University in 1998. [1]
PageRank determines a page's rank from the vast web of hyperlink relationships on the Internet. Google interprets a link from page A to page B as page A casting a vote for page B. Google then determines the new rank from the sources of the votes (and even the sources of those sources, i.e., the pages linking to page A) as well as the rank of the page being voted for. Simply put, a highly ranked page can raise the rank of the lower-ranked pages it links to.
PageRank lets links "vote"
The "votes" of a page are determined by the importance of all pages linking to it. A hyperlink to a page is equivalent to Cast a vote on this page. The PageRank of a page is obtained by a recursive algorithm based on the importance of all pages that link to it ("linked pages"). A page with more links will have a higher rank, whereas a page without any links will have no rank.
In early 2005, Google introduced the nofollow attribute for links, which lets webmasters and bloggers create links that Google does not count, i.e., links that do not cast a "vote." The nofollow setting helps combat comment spam.
The PageRank indicator on the Google Toolbar ranges from 0 to 10 and appears to use a logarithmic scale; the details are not public. PageRank is a trademark of Google, and the technology is patented.
A closely related link-analysis algorithm, HITS, was proposed by Jon Kleinberg.
PageRank Algorithm
Simple
Assume a small universe of four pages: A, B, C and D. If all of them link to A, then the PR (PageRank) value of A is the sum of the PR values of B, C and D.
PR(A) = PR(B) + PR(C) + PR(D)
Next, assume that B also links to C, and that D links to three pages, including A. A page cannot vote more than once, so B gives each page it links to half a vote. By the same logic, only one third of D's vote counts towards A's PageRank.
PR(A)= \frac{PR(B)}{2}+ \frac{PR(C)}{1}+ \frac{PR(D)}{3}
In other words, a page's PR value is divided evenly among all of its outbound links.
PR(A)= \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}
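For example, if B, C and D each currently have a PR value of 0.25 (an arbitrary value chosen purely for illustration) and link out to 2, 1 and 3 pages respectively, then
PR(A) = \frac{0.25}{2} + \frac{0.25}{1} + \frac{0.25}{3} \approx 0.458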
Finally, all of this is converted into a percentage and multiplied by a damping coefficient q (about 0.85 in practice). With the algorithm below, no page will have a PageRank of 0, because the mathematical system gives every page a minimum value of 1 − q.
PR(A)=\left( \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}+\,\cdots \right) q + 1 - q
So the PageRank of a page is calculated from the PageRank of other pages, and Google continuously recalculates the PageRank of each page. If every page is given an arbitrary (non-zero) starting PageRank value, then after repeated recalculation the PR values of the pages converge to stable values. This is what makes the method usable by search engines.
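As an illustration of this repeated recalculation (a minimal sketch, not Google's actual implementation), the damped formula above can be iterated on the four-page example in a few lines of Python. The exact link structure beyond what the text states, namely which two pages D links to besides A and the fact that A has no outbound links, is assumed here.

```python
# A minimal sketch of the iterative calculation, using the simplified damped
# formula PR(p) = (sum of incoming votes) * q + (1 - q) with q = 0.85.
# The link structure beyond what the text states is an assumption.

q = 0.85  # damping coefficient; 1 - q = 0.15 is the minimum PR of any page

# outbound links: B -> A and C; C -> A; D -> A, B and C; A -> (nothing)
links = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

# start every page at an arbitrary non-zero value
pr = {page: 1.0 for page in links}

for _ in range(100):
    new_pr = {}
    for page in links:
        # sum of votes from every page that links to `page`
        incoming = sum(pr[src] / len(links[src])
                       for src in links if page in links[src])
        new_pr[page] = (1 - q) + q * incoming
    # stop once the values have stabilised
    if max(abs(new_pr[p] - pr[p]) for p in links) < 1e-8:
        pr = new_pr
        break
    pr = new_pr

print({page: round(value, 4) for page, value in pr.items()})
```

Whatever non-zero values the pages start with, the loop settles on the same stable ranking, with A highest because every other page votes for it.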
Complete
This equation introduces the notion of random browsing: imagine a surfer who idly opens pages and clicks on links at random. The PageRank value of a page also reflects the probability that it will be reached by such random browsing. For ease of understanding, assume the surfer keeps clicking links on each page until reaching a page with no outbound links, at which point the surfer jumps to a random page and starts browsing again.
To be fair to pages that do have outbound links, a value of q = 0.15 is applied to all pages; in the formula below q is the probability of such a random jump (the complement of the damping coefficient of about 0.85 used above), estimating the chance that a page is visited directly by a surfer, for example from a bookmark.
So, the equation is as follows:
{\rm PageRank}(p_i) = \frac{q}{N} + (1 - q) \sum_{p_j \in M(p_i)} \frac{{\rm PageRank}(p_j)}{L(p_j)}
where p_1, p_2, \ldots, p_N are the pages under study, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links of p_j, and N is the total number of pages.
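A minimal sketch of this complete formula in Python is given below. It assumes every page has at least one outbound link (dangling pages would need extra handling), and the function name pagerank is purely illustrative.

```python
# Sketch of the complete formula: PR(p_i) = q/N + (1 - q) * sum over pages
# p_j in M(p_i) of PR(p_j) / L(p_j). Assumes every page has outbound links.

def pagerank(links, q=0.15, iterations=100, tol=1e-8):
    """links maps each page to the list of pages it links out to."""
    n = len(links)
    pr = {page: 1.0 / n for page in links}  # any non-zero start works
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # M(page): every page that links to `page`
            incoming = sum(pr[src] / len(links[src])
                           for src in links if page in links[src])
            new_pr[page] = q / n + (1 - q) * incoming
        if max(abs(new_pr[p] - pr[p]) for p in links) < tol:
            return new_pr
        pr = new_pr
    return pr

# example: the four-page graph from above, with A assumed to link back to B
print(pagerank({"A": ["B"], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}))
```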
The PageRank values are the entries of an eigenvector of a particular matrix. This eigenvector is
\mathbf{R} = \begin{bmatrix} {\rm PageRank}(p_1) \\ {\rm PageRank}(p_2) \\ \vdots \\ {\rm PageRank}(p_N) \end{bmatrix}
R is the solution of the equation
\mathbf{R} = \begin{bmatrix} {q / N} \\ {q / N} \\ \vdots \\ {q / N} \end{bmatrix} + (1-q) \begin{bmatrix} \ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\ \ell(p_2,p_1) & \ddots & & \\ \vdots & & \ell(p_i,p_j) & \\ \ell(p_N,p_1) & & & \ell(p_N,p_N) \end{bmatrix} \mathbf{R}
where \ell(p_i,p_j) equals 0 if p_j does not link to p_i, and the entries are normalized so that, for every j,
\sum_{i = 1}^N \ell(p_i,p_j) = 1,
that is, each column of the matrix sums to 1.
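The matrix form can also be sketched directly (using NumPy, and reusing the hypothetical four-page graph from above, except that A is assumed to link back to B so every column of the matrix can be normalized):

```python
import numpy as np

# Column j of `ell` distributes page p_j's vote evenly over the pages it
# links to, so each column sums to 1. R is the fixed point of
# R = q/N * 1 + (1 - q) * ell @ R, found here by repeated application.

q = 0.15
pages = ["A", "B", "C", "D"]
out_links = {"A": ["B"], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}

N = len(pages)
ell = np.zeros((N, N))
for j, src in enumerate(pages):
    for dst in out_links[src]:
        ell[pages.index(dst), j] = 1.0 / len(out_links[src])

R = np.full(N, 1.0 / N)            # any non-zero starting vector works
for _ in range(100):
    R = q / N + (1 - q) * ell @ R  # converges to the eigenvector-like solution

print(dict(zip(pages, R.round(4))))
```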
The main disadvantage of this technique is that older pages tend to rank higher than new ones, because even a very good new page usually has few inbound links unless it is part of an existing site.
This is why PageRank needs to be combined with other algorithms. PageRank also appears to favor Wikipedia pages, which often appear ahead of most or all other results when searching for an entry's name; the main reasons are that Wikipedia articles link heavily to one another and that many external sites link to Wikipedia.
Google often penalizes attempts to inflate PageRank maliciously. How it distinguishes normal link exchange from abnormal link accumulation remains a trade secret.