Understanding Kleinberg’s Hubs and Authorities


Definitions:

Abundance Problem: The number of pages that could reasonably
be returned as relevant is far too large for a human user
to digest. To provide effective search methods under these
conditions, one needs a way to filter from a huge collection
of relevant pages, a small set of the most authoritative
or definitive ones.

Abstract

“Hyperlinks encode a considerable amount of latent
human judgment. By creating links to another page, the creator
of that link has “conferred authority” on the
target page. Links afford us the opportunity to find potential
authorities purely through the pages that point to them.

In this work we propose a link-based model for the conferral
of authority, and show how it leads to a method that consistently
identifies relevant, authoritative WWW pages for broad
search topics. Our model is based on the relationship that
exists between the authorities for a topic and those pages
that link to many related authorities – we refer to pages
of this latter type as hubs. We observe that a certain natural
type of equilibrium exists between hubs and authorities in
the graph developed by the link structure, and we exploit
this to develop an algorithm that identifies both types of
pages simultaneously. The algorithm operates on focused
subgraphs of the WWW that we construct from the output of a
text-based WWW search engine; our technique for constructing
such subgraphs is designed to produce small collections of
pages likely to contain the most authoritative pages for a
given topic.”

Constructing a Focused Subgraph of the WWW

  1. query string q
  2. build a set of pages Sq with three properties:
    a. relatively small
    b. rich in relevant pages
    c. contains most of the strongest authorities
  3. collect the t highest ranked pages for query q from
    a search engine (root set Rq)
  4. Expand Rq along the links that enter and leave it
  5. Sq = Rq plus any page pointed to by a page in Rq,
    plus any page that points to a page in Rq (with a cap,
    d, on how many in-linking pages any single page in Rq
    can bring into the set Sq)
  6. Sq is the base set for q
  7. Intrinsic links (links from within the same site)
    are deleted from the graph, keeping only links to and
    from external domains
  8. Gq is the graph on Sq with the intrinsic links removed
  9. Pages are then given a value as an authority weight
    or a hub weight. If a page points to many pages with
    large authority values, then it receives a high hub value;
    if a page is pointed to by many pages with large hub
    values, then it should receive a large authority value.
  10. Report the C pages with the largest authority weights
    and the C pages with the largest hub weights
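Steps 3 through 8 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kleinberg's implementation: `build_base_graph`, the `links` edge set, and the example URLs are hypothetical names invented here, the cap `d` is resolved by keeping the first `d` in-linking pages in sorted order (any choice of `d` pages would satisfy the description), and "same site" is approximated by comparing host names.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_base_graph(root_set, links, d=50):
    """Expand a root set Rq into a base set Sq and return (Sq, edges of Gq).

    root_set: URLs returned by a text search engine (assumed given).
    links:    set of (source_url, target_url) pairs from a crawl (assumed given).
    d:        cap on how many in-linking pages one root page may bring in.
    """
    out_links, in_links = defaultdict(set), defaultdict(set)
    for src, dst in links:
        out_links[src].add(dst)
        in_links[dst].add(src)

    base = set(root_set)
    for page in root_set:
        # Every page that a root page points to joins the base set.
        base |= out_links[page]
        # At most d of the pages pointing at a root page join the base set.
        # sorted() only makes the arbitrary choice reproducible.
        base |= set(sorted(in_links[page])[:d])

    def host(url):
        return urlparse(url).netloc

    # Keep links between base-set pages, dropping intrinsic (same-host) links.
    edges = {
        (src, dst)
        for src, dst in links
        if src in base and dst in base and host(src) != host(dst)
    }
    return base, edges


if __name__ == "__main__":
    example_links = {
        ("http://a.example/1", "http://b.example/1"),
        ("http://a.example/1", "http://a.example/2"),  # intrinsic link
        ("http://c.example/1", "http://a.example/1"),
    }
    sq, gq = build_base_graph({"http://a.example/1"}, example_links)
    print(len(sq), "pages,", len(gq), "links")
```

Passing the links as one set of pairs keeps the in-link and out-link views consistent with each other; a real crawler would have to fetch both directions separately.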

Summary

Basically, they build a small, query-specific subgraph of the web and run link
analysis on it to rank pages by authority. To do this, they take the top X
ranked pages that a search engine returns for the query. Instead of ranking
those directly, they use them as a root set. Then they add the pages that link
to or from that root set (capping at Y how many pages any one page can bring
in). Then, based on how these pages link to each other, they determine the hub
and authority weight of each page. The two definitions depend on each other:
a page that links OUT to many good authorities is a good hub, and a page with
links IN from many good hubs is a good authority. The weights are computed by
repeatedly updating each one from the other until they settle, and the pages
with the largest authority weights are returned as the authorities for the
query.
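The mutual reinforcement between hub and authority weights can be sketched as a short iteration. The code below is an illustrative sketch, not the paper's code: the function names `hits`, `normalize`, and `top`, the fixed iteration count of 20, and the toy page names are all assumptions made for this example. Each round sets a page's authority weight to the sum of the hub weights of the pages linking to it, then sets each hub weight to the sum of the authority weights of the pages it links to, and rescales both so their squares sum to 1.

```python
import math


def normalize(weights):
    """Scale weights so that the sum of their squares is 1."""
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {page: w / norm for page, w in weights.items()}


def hits(edges, iterations=20):
    """Return (authority, hub) weight dicts for a collection of (src, dst) links.

    edges must be a list or set (it is traversed more than once).
    """
    pages = {page for edge in edges for page in edge}
    auth = dict.fromkeys(pages, 1.0)
    hub = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        # Authority update: add up the hub weights of pages linking in.
        new_auth = dict.fromkeys(pages, 0.0)
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: add up the (new) authority weights of pages linked to.
        new_hub = dict.fromkeys(pages, 0.0)
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def top(weights, c):
    """Return the c pages with the largest weights."""
    return sorted(weights, key=weights.get, reverse=True)[:c]


if __name__ == "__main__":
    toy_edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a1")]
    authority, hubs = hits(toy_edges)
    print("top authority:", top(authority, 1))
    print("top hub:", top(hubs, 1))
```

In the toy graph, a1 is linked from all three hubs and h1 links to both authorities, so those two pages come out on top. A fixed number of rounds is used here for simplicity; stopping when the weights change by less than some tolerance would be an equally reasonable choice.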

What this means for SEOs

  • Make sure your site has links from sites that are relevant to your
    keywords.
  • Identify the “hubs” for your keywords – sites
    that have a lot of outbound links, and get as many links
    as possible from those sites.
  • If possible, identify “authorities” in
    your keyword sector – sites with many links in from
    various hubs – and get links from those sites.
