
PageRank, also rendered as "page ranking" or "Google left-side ranking", is a page-ranking technology that search engines compute from the hyperlinks between pages; it is named after Larry Page, one of Google's founders. The technology is commonly associated with search engine optimization, and Google uses it to reflect the relevance and importance of web pages. Google's founders Larry Page and Sergey Brin invented it at Stanford University in 1998. [1]

PageRank determines a page's rank from the web's vast network of hyperlinks. Google interprets a link from page A to page B as page A casting a vote for page B. Google then determines the new rank according to the source of each vote (and even the source of that source, i.e. the pages that link to page A) and the rank of the vote's target. Simply put, a highly ranked page can raise the rank of other, lower-ranked pages.

PageRank relies on links as "votes"

The number of votes a page receives is determined by the importance of all the pages that link to it: a hyperlink to a page counts as a vote for that page. A page's PageRank is computed recursively from the importance of all the pages linking to it (its "inbound links"). A page that many pages link to receives a higher rank; conversely, a page with no inbound links receives no rank.

At the beginning of 2005, Google introduced the nofollow attribute for web links, which lets webmasters and bloggers create links that Google does not count as votes. Marking links as nofollow helps resist comment spam.

The PageRank indicator on the Google Toolbar ranges from 0 to 10. It appears to use a logarithmic scale, but the details are not public. PageRank is a trademark of Google, and the technology is patented.

A similar link-analysis algorithm, HITS, was proposed by Jon Kleinberg.

PageRank algorithm

Simplified version

Suppose a small universe of four pages: A, B, C and D. If all of them link to A, then the PR (PageRank) value of A is the sum of the PR values of B, C and D:

PR(A) = PR(B) + PR(C) + PR(D)

Now suppose further that B also links to C, and that D links to all three other pages, including A. A page cannot cast more than one vote, so B gives half a vote to each page it links to. By the same logic, only one third of D's vote counts toward A's PageRank:

PR(A) = \frac{PR(B)}{2} + \frac{PR(C)}{1} + \frac{PR(D)}{3}
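For instance, with purely hypothetical values PR(B) = 0.5, PR(C) = 0.3 and PR(D) = 0.6, this splitting of votes gives

PR(A) = \frac{0.5}{2} + \frac{0.3}{1} + \frac{0.6}{3} = 0.25 + 0.3 + 0.2 = 0.75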

In other words, a page's PR value is divided equally among all of its outbound links.

PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}

Finally, the sum is multiplied by a coefficient q, because under the algorithm below a page with no inbound links would otherwise end up with a PageRank of 0. Google therefore gives every page a minimum value (which, in the formula below, is 1 - q):

PR(A) = \left( \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)} + \cdots \right) q + 1 - q

So the PageRank of a page is calculated from the PageRank of other pages. Google repeatedly recalculates the PageRank of every page: if each page is given an arbitrary (non-zero) starting PageRank, then after enough iterations the PR values converge to stable values. This is what makes the measure usable by a search engine.
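As a rough illustration of this repeated calculation, the following Python sketch iterates the damped formula above on the four-page example. The graph, the starting values and the choice q = 0.85 as the link-following weight are assumptions made only for illustration; the dangling page A is simply left without outbound votes.

# Sketch: iterate PR(p) = q * sum(PR(j)/L(j) for pages j linking to p) + (1 - q)
# on the hypothetical four-page graph used in the example above.
links = {                # page -> pages it links to (assumed graph)
    "A": [],             # A has no outbound links
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}
q = 0.85                 # assumed link-following weight
pr = {page: 1.0 for page in links}   # arbitrary non-zero starting values

for _ in range(50):      # repeated calculation; the values stabilise quickly
    new_pr = {}
    for page in links:
        incoming = [j for j in links if page in links[j]]
        new_pr[page] = q * sum(pr[j] / len(links[j]) for j in incoming) + (1 - q)
    pr = new_pr

print(pr)                # approximate stable PageRank values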

Implementation

This equation introduces the idea of random browsing: a bored internet user opens pages at random and clicks on random links. A page's PageRank also reflects the probability that it will be reached by such random browsing. To make this concrete, suppose the surfer keeps clicking links until arriving at a page with no outbound links; at that point the surfer jumps to another page at random and starts browsing again.

To be fair to pages with links, a term with q = 0.15 (see the discussion of q above; here q denotes the probability that the surfer jumps to a random page rather than following a link) is applied to every page, estimating the probability that a page is visited directly, for example from a bookmark.

The full equation is therefore as follows:

{\rm PageRank}(p_i) = \frac{q}{N} + (1-q) \sum_{p_j \in M(p_i)} \frac{{\rm PageRank}(p_j)}{L(p_j)}

Here p_1, p_2, ..., p_N are the pages under consideration, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links on page p_j, and N is the total number of pages.

The PageRank values are the entries of the dominant eigenvector of a modified adjacency matrix. This eigenvector is

\mathbf{R} = \begin{bmatrix} {\rm PageRank}(p_1) \\ {\rm PageRank}(p_2) \\ \vdots \\ {\rm PageRank}(p_N) \end{bmatrix}

where R is the solution of the equation

\mathbf{R} = \begin{bmatrix} q/N \\ q/N \\ \vdots \\ q/N \end{bmatrix} + (1-q) \begin{bmatrix} \ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\ \ell(p_2,p_1) & \ddots & & \\ \vdots & & \ell(p_i,p_j) & \\ \ell(p_N,p_1) & & & \ell(p_N,p_N) \end{bmatrix} \mathbf{R}

where the adjacency function \ell(p_i, p_j) equals 0 if page p_j does not link to p_i, and is normalised such that, for every j,

\sum_{i = 1}^N \ell(p_i,p_j) = 1,

that is, the entries of each column sum to 1.
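As a sketch of this matrix formulation, the NumPy code below builds the matrix \ell for the hypothetical four-page example and then repeatedly applies the equation above with q = 0.15 until \mathbf{R} stabilises. Treating the dangling page A as if it linked uniformly to every page is an assumption made here only so that every column sums to 1, as required above.

import numpy as np

# Sketch: matrix form of PageRank for the hypothetical four-page example.
# ell[i, j] is 1/L(p_j) if page p_j links to page p_i, and 0 otherwise.
pages = ["A", "B", "C", "D"]
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
N = len(pages)

ell = np.zeros((N, N))
for j, pj in enumerate(pages):
    outgoing = links[pj]
    if not outgoing:                       # dangling page: assumed uniform weights
        ell[:, j] = 1.0 / N
    else:
        for target in outgoing:
            ell[pages.index(target), j] = 1.0 / len(outgoing)

assert np.allclose(ell.sum(axis=0), 1.0)   # each column sums to 1, as required

q = 0.15                                   # probability of a jump to a random page
R = np.full(N, 1.0 / N)                    # arbitrary starting vector
for _ in range(100):                       # power iteration converges to the solution
    R = q / N + (1 - q) * ell @ R

print(dict(zip(pages, R)))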

The main disadvantage of this technique is that older pages tend to rank higher than new ones, because even a very good new page usually has few inbound links, unless it is part of an existing site.

This is why PageRank has to be combined with other algorithms. PageRank appears to favour Wikipedia pages, which rank ahead of most or all other results when people search for the name of a topic, mainly because so many sites link to Wikipedia.

Google often penalizes malicious attempts to inflate PageRank, but how it distinguishes normal link exchanges from abnormal accumulation of links remains a trade secret.