Understanding TrustRank
Disclaimer: This is our INTERPRETATION of the data that
we read in this paper. As we are not programmers, mathematicians,
or IR specialists, we have used our knowledge of marketing
and SEO to extrapolate meaning and ideas from the information
in this paper. Feel free to email us if you disagree with
our conclusions or if you would like to give us your own
ideas.
Abstract
“Web spam pages use various techniques to achieve
higher-than-deserved rankings in a search engine’s
results. While human experts can identify spam, it is too
expensive to manually evaluate a large number of pages. Instead,
we propose techniques to semi-automatically separate reputable,
good pages from spam.
We first select a small set of seed pages to be evaluated
by an expert. Once we manually identify the reputable seed
pages, we use the link structure of the web to discover other
pages that are likely to be good. In this paper we discuss
possible ways to implement the seed selection and the discovery
of good pages. We present results of experiments run on the
World Wide Web indexed by AltaVista and evaluate the performance
of our techniques. Our results show that we can
effectively filter out spam from a significant fraction of
the web, based on a good seed set of less than 200 sites.”
- Preliminaries
a. Web Model
i. The web is modeled as a graph consisting of pages and a set of directed links that connect pages.
ii. Self links and multiple links from the same site are removed
b. PageRank
i. The proposed algorithm relies on PageRank (the importance of a page influences and is being influenced by the importance of other pages)
ii. Regular PageRank assigns the same static score to each page, but a biased PageRank version may break this rule: a non-zero static score can be assigned to a set of special pages only. The score of these pages is then spread during the iterations to the pages they point to.
- Assessing Trust
a. Oracle and Trust Functions
i. The oracle assigns a value of 0 if a page is bad and 1 if a page is good
ii. As this is expensive and time consuming, the oracle should only review a subset of pages
iii. Approximate isolation of the good set: good pages seldom link to bad pages
iv. Trust Function: yields a value between 0 and 1. Ideally, it gives the probability that a page is good.
b. Ordered Trust Property: the trust function should predict the likelihood of a page being good, so the results can be ranked by their trust value (a high probability of being good means a page gets ranked higher in a list, and vice versa)
c. Threshold Trust Property: if a page receives a score above a given threshold, it is considered good
- Evaluation Metrics
a. Pairwise Orderedness: signals if a bad page received an equal or higher trust score than a good page (a violation of the ordered trust property). This evaluates the accuracy of the trust function T
b. Precision: the fraction of good pages among all pages in the sample set X that have a trust score above a threshold
c. Recall: the ratio between the number of good pages with a trust score above a threshold and the total number of good pages in X
- Computing Trust
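Whatever way trust ends up being computed, it gets judged with the three metrics just described. As a rough illustration, here is a minimal Python sketch of them. The function names, the five sample pages, and the 0.5 threshold are our own assumptions for the example and are not taken from the paper.

```python
from itertools import combinations

def pairwise_orderedness(pages, oracle, trust):
    """Share of page pairs that the trust scores do NOT misorder."""
    pairs = list(combinations(pages, 2))
    violations = 0
    for p, q in pairs:
        if oracle[p] == oracle[q]:
            continue  # two good or two bad pages cannot be misordered
        good, bad = (p, q) if oracle[p] == 1 else (q, p)
        # Violation: the bad page scores at least as high as the good one.
        if trust[bad] >= trust[good]:
            violations += 1
    return (len(pairs) - violations) / len(pairs)

def precision(pages, oracle, trust, threshold):
    """Fraction of good pages among those scoring above the threshold."""
    above = [p for p in pages if trust[p] > threshold]
    if not above:
        return 0.0
    return sum(oracle[p] for p in above) / len(above)

def recall(pages, oracle, trust, threshold):
    """Fraction of all good pages that score above the threshold."""
    good = [p for p in pages if oracle[p] == 1]
    if not good:
        return 0.0
    return sum(1 for p in good if trust[p] > threshold) / len(good)

# Hypothetical sample X: oracle verdicts (1 = good, 0 = bad) and trust scores.
oracle = {"a": 1, "b": 1, "c": 0, "d": 1, "e": 0}
trust = {"a": 0.9, "b": 0.6, "c": 0.7, "d": 0.2, "e": 0.1}
sample = list(oracle)

print(pairwise_orderedness(sample, oracle, trust))
print(precision(sample, oracle, trust, 0.5))
print(recall(sample, oracle, trust, 0.5))
```

On this made-up sample, bad page "c" outscores good pages "b" and "d", so 2 of the 10 pairs are misordered and the pairwise orderedness is 0.8. Precision and recall at the 0.5 threshold both come out at 2/3.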
a. Ignorant Trust Function: pages that have not been reviewed and given a value by the oracle are given a value of ½, which means that nothing is known about those pages
b. Trust Propagation: The oracle is invoked to check a random selection of L pages. Then, expecting that good pages only link to good pages, we assign a score of 1 to all pages that are reachable from a page with positive trust in M or fewer steps (1 and 2 steps gave the best results)
i. The problem with this is that sometimes
good pages link to bad pages. The further away we are
from good pages, the less certain we are that a page
is good.
c. Trust Attenuation: it is essential to reduce trust the further we are from the seed pages
i. Trust Dampening: the trust factor is reduced the further away we are from a good site. So if good seed A has a score of 1, site B (linked from A) has a score of b < 1, and site C (linked from B) has a score of b * b (reduced more the further away you are from the good site)
ii. Trust Splitting: This handles pages with multiple outlinks. That is, if a good page has only a handful of outlinks, then it is likely that the pointed pages are also good. However, if a good page has hundreds of outlinks, it is more probable that some of them will point to bad pages.
1. The trust score is split amongst the outbound links based on their number. So if a good seed has 2 outbound links, its trust score of 1 is split into 2, so each page gets .5 trust points.
2. The actual score of the page will be the sum of the score fractions received through its inlinks. The more “credit” a page receives, the likelier it is to be good.
iii. Trust splitting can be combined with trust dampening.
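To make dampening and splitting concrete, here is a small Python sketch under our own assumptions. The function names and the four-page graph are hypothetical, `beta` plays the role of the factor b above, and giving a page the score of its shortest path from the seed is just one possible choice when several paths exist.

```python
def dampened_trust(graph, seed, beta=0.85, max_steps=3):
    """Trust dampening: a page k links away from the seed gets beta ** k."""
    trust = {seed: 1.0}
    frontier = [seed]
    for step in range(1, max_steps + 1):
        next_frontier = []
        for page in frontier:
            for target in graph.get(page, []):
                # Assumption: keep the score from the shortest path only.
                if target not in trust:
                    trust[target] = beta ** step
                    next_frontier.append(target)
        frontier = next_frontier
    return trust

def split_trust(graph, scores, beta=1.0):
    """Trust splitting: each page divides its score equally among its
    outlinks, and a page's new score is the sum of what it receives.
    Passing beta < 1 combines splitting with dampening."""
    received = {}
    for page, score in scores.items():
        targets = graph.get(page, [])
        for target in targets:
            share = beta * score / len(targets)
            received[target] = received.get(target, 0.0) + share
    return received

# Hypothetical graph: A is the good seed; A links to B and C, and so on.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": []}

dampened = dampened_trust(graph, "A", beta=0.85, max_steps=2)
split_once = split_trust(graph, {"A": 1.0})
split_twice = split_trust(graph, split_once)
combined = split_trust(graph, {"A": 1.0}, beta=0.85)

print(dampened)
print(split_once, split_twice)
print(combined)
```

Here `split_once` reproduces the example above: seed A has 2 outbound links, so B and C each receive .5 trust points. With `beta=0.85` added, each of them receives .425 instead.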
- The TrustRank Algorithm
a. Select Seeds: used to identify desirable pages for the seed set (the most useful in identifying good pages). The seed set needs to be relatively small.
i. Inverse PageRank: based on outbound links (the more outlinks a page has, the more likely it is to get picked). The importance of a page depends on its outlinks, not on its inlinks.
ii. High PageRank: preference is given to pages with high PageRank as they are more likely to link to other high PageRank pages.
b. Generate a corresponding ordering of the pages according to their desirability as seeds
c. Select Good Seeds: invokes the oracle, so a person reviews those sites and gives each one a value.
d. Normalize the static score distribution vector: the entries are scaled so that they sum to 1, so no page can have a trust of more than 1
e. Compute TrustRank scores: uses a biased PageRank computation, with the uniform distribution replaced by the normalized seed distribution from step d.
i. Uses trust dampening and splitting, where the trust score of a page is split amongst its neighbors and dampened by a factor
ii. TrustRank “refines” the original scores given by the oracle according to the structure of the links, since it has more information to use.
f. Unreferenced pages have a score of 0, unless they are selected as seeds.
g. Pages can be organized first by PageRank, and only pages with a high enough PageRank are used to compute TrustRank; otherwise it’s a waste of resources.
Conclusion:
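Pulling steps a through g together, the whole computation can be sketched in a few lines of Python. This is only our rough reading of the steps, not the authors' implementation: the function names, the tiny example graph, the stand-in `oracle` function, and the values alpha=0.85 and 20 iterations are assumptions made for illustration, and the PageRank pre-filter from step g is left out.

```python
def biased_pagerank(graph, static, alpha=0.85, iterations=20):
    """Repeat: new score = alpha * (score split over inlinks)
    + (1 - alpha) * static score."""
    pages = list(graph)
    scores = dict(static)
    for _ in range(iterations):
        nxt = {p: (1 - alpha) * static[p] for p in pages}
        for page in pages:
            targets = graph[page]
            for target in targets:
                # Splitting (divide by outlinks) and dampening (alpha).
                nxt[target] += alpha * scores[page] / len(targets)
        scores = nxt
    return scores

def reverse(graph):
    """Flip every link, so that outlinks become inlinks."""
    rev = {p: [] for p in graph}
    for page, targets in graph.items():
        for target in targets:
            rev[target].append(page)
    return rev

def trustrank(graph, oracle, seed_count, alpha=0.85, iterations=20):
    pages = list(graph)
    uniform = {p: 1 / len(pages) for p in pages}
    # a. Seed desirability via inverse PageRank (PageRank on flipped links).
    desirability = biased_pagerank(reverse(graph), uniform, alpha, iterations)
    # b. Order the pages by desirability.
    ranked = sorted(pages, key=desirability.get, reverse=True)
    # c. Ask the oracle about the top candidates only; keep the good ones.
    good_seeds = [p for p in ranked[:seed_count] if oracle(p) == 1]
    if not good_seeds:
        raise ValueError("no good seed found among the reviewed pages")
    # d. Static scores: equal weight on good seeds, summing to 1.
    static = {p: 0.0 for p in pages}
    for p in good_seeds:
        static[p] = 1 / len(good_seeds)
    # e. Biased PageRank with the seed distribution instead of the uniform one.
    return biased_pagerank(graph, static, alpha, iterations)

# Hypothetical web: g* are good pages, s* are spam. Every page is a key.
graph = {
    "g1": ["g2", "g3"],
    "g2": ["g3"],
    "g3": ["g1"],
    "s1": ["s2", "g1"],
    "s2": ["s1"],
}

def oracle(page):
    """Stand-in for the human reviewer: 1 = good, 0 = bad."""
    return 1 if page.startswith("g") else 0

scores = trustrank(graph, oracle, seed_count=3)
print(scores)
```

In this toy graph the two spam pages end up with a trust score of exactly 0, because no good page links to them, even though one of them links out to a good page.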
Basically, this enables them to modify PageRank. PageRank can be easily manipulated as it doesn’t care about quality. By using a combination of both, the basic PageRank formula can be used (it is cheap to use and works well), then modified according to trust factors.
Adding human interaction enables them to then compute scores automatically. People manually review sites and assign a trust score. Then, each site’s trust score is split amongst its outbound links using the algorithm. So the trust score of other sites, even if they are not manually reviewed, is then based upon the trust score received from other sites (with a max of 1). Sites with a higher trust score can then rank higher.
What this means for SEOs
- Try to identify good sites in your industry. These sites were chosen by number of outbound links as well as by high PageRank scores. Remember that those pages would’ve been reviewed by a person, so only select sites that are genuinely valuable.
- Good sites are bound to be ranking in the SERPs as they will have high TrustScores, thus modifying their PageRank and excluding spam sites
- Try to get links from those good sites, or at least from pages that are receiving links from good sites.
- Use up to 3 levels of separation from the good sites
- The more links you receive from good sites, the higher your TrustScore.
- If you have too many links from bad sites, it’ll lower your score. Bad sites can be sites considered “unworthy” by human reviewers, or sites that received low points from other sites
- Avoid having too many links from bad sites, as the more you have, the more it’ll work against you based on your “trust score”.
- Try to only have links from trusted sites, and to have as few links as possible from bad sites
- The higher your trust score, the higher you rank, as it’ll modify your PageRank score positively.
- PageRank is still used, so you still need to get links, but try to get links mainly from good sites.
- They are “collapsing” multiple links from one URL, and only counting it as one link
- Self links are removed and only links from external sites are taken into account
Make sure this is an important aspect of your SEO campaign,
for these trusted links enable you to get past the sandbox
and to boost your rankings significantly.