ABSTRACT
“Link Analysis has shown great potential in improving the performance
of web search. PageRank and HITS are two of the most
popular algorithms. Most of the existing link analysis algorithms
treat a web page as a single node in the web graph. However, in
most cases, a web page contains multiple semantics and hence the
web page might not be considered as the atomic node. In this
paper, the web page is partitioned into blocks using the vision-based
page segmentation algorithm. By extracting the page-to-block,
block-to-page relationships from link structure and page
layout analysis, we can construct a semantic graph over the
WWW such that each node exactly represents a single semantic
topic. This graph can better describe the semantic structure of the
web. Based on block-level link analysis, we proposed two new
algorithms, Block Level PageRank and Block Level HITS, whose
performances we study extensively using web data.”
Introduction
Many search engines use link-based algorithms, such as PageRank and HITS, to evaluate web pages and rank them on search results pages. Link-based algorithms are used because a link is taken as a human endorsement of the page it points to, and because pages that link to each other are assumed to be topically related (so a site about dogs would be likely to link to other sites about dogs).
The problem, however, is that in many cases these assumptions are not valid. Many websites, such as news sites, link to unrelated content, and they also contain many navigational and advertisement links.
These problems arise because a web page usually contains multiple semantics: it can be broken down into units, some more important than others. If the web page is treated as the smallest unit, every part of it has to be given the same level of importance, which is often not appropriate.
In this paper, the authors propose two link analysis algorithms, Block Level PageRank and Block Level HITS, which treat semantic blocks as the information units. This is accomplished by using the Vision-based Page Segmentation algorithm to partition each page into blocks, and then extracting page-to-block and block-to-page relationships from the page layout and the link structure. These algorithms can improve search relevance because they reduce the problems mentioned above.
Vision-Based Page Segmentation
The Vision-based Page Segmentation (VIPS) algorithm aims to extract the semantic structure of a web page based on its visual presentation. It works by:
1. extracting all the blocks from the HTML code
2. finding the separators between these blocks
Based on these separators, the semantic tree of the web page is constructed. Thus, a web page can be represented as a set of blocks.
This segmentation allows information such as advertisements, navigation, and decoration to be identified and separated from the blocks that carry the page's real content. Content on different topics can be placed in separate blocks and treated separately.
Block level PageRank
This algorithm is similar to the original PageRank algorithm; the key difference is that Block Level PageRank (BLPR) models the web's link structure at the block level.
As a result, a web page that is linked to by many advertisement links may not be assigned a high value: the ad links come from less important blocks, so they pass on fewer points. Thus, ad links would not significantly affect Block Level PageRank.
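The idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the data layout, the damping factor of 0.85, and the block importance weights are all assumptions made for illustration. Each page splits its vote across its blocks according to block importance, and each block splits its share evenly across the pages it links to.

```python
def block_level_pagerank(importance, block_links, n_pages, damping=0.85, iters=100):
    """Illustrative sketch only (names and defaults are assumptions).

    importance[p]  -> {block_id: weight} for the blocks on page p
    block_links[b] -> list of page indices that block b links to
    """
    # Page-to-page weights: page p passes weight to page q through each of
    # its blocks, scaled by how important that block is within page p.
    weights = [[0.0] * n_pages for _ in range(n_pages)]
    for p in range(n_pages):
        for block, block_weight in importance.get(p, {}).items():
            targets = block_links.get(block, [])
            for q in targets:
                weights[p][q] += block_weight / len(targets)

    # Normalise each row; pages with no outgoing weight spread it uniformly.
    for p in range(n_pages):
        total = sum(weights[p])
        if total > 0:
            weights[p] = [w / total for w in weights[p]]
        else:
            weights[p] = [1.0 / n_pages] * n_pages

    # Standard PageRank power iteration on the block-derived page graph.
    rank = [1.0 / n_pages] * n_pages
    for _ in range(iters):
        rank = [
            (1 - damping) / n_pages
            + damping * sum(rank[p] * weights[p][q] for p in range(n_pages))
            for q in range(n_pages)
        ]
    return rank


# Page 0 links to page 1 from its main body and to page 2 from an ad block.
importance = {0: {"body": 0.9, "ads": 0.1}}   # hypothetical importance values
block_links = {"body": [1], "ads": [2]}
ranks = block_level_pagerank(importance, block_links, n_pages=3)
print(ranks)
```

In this toy example page 1 ends up with a higher score than page 2, even though each receives exactly one link, because the link to page 2 sits in a low-importance block.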
Block Level HITS
HITS assigns two values to each page: an authority value and a hub value. Hubs and authorities exhibit a mutually reinforcing relationship: a good hub is a page that links to many good authorities, and a good authority is a page that is linked to by many good hubs.
As has been established, there are usually multiple semantic regions in one page. Some hyperlinks in a page, such as those in banners, navigation panels, and advertisements, do not convey human endorsement. Thus, letting every link in a page contribute equally to the mutual reinforcement might not be suitable.
In BLHITS, the authority-hub reinforcing idea is the same as in the original HITS. The main difference is that in BLHITS, a page has only an authority score and a block has only a hub score. Assigning hub scores to blocks allows authority scores to be adjusted according to the importance of the blocks that contain the links.
In Block Level HITS, different parts of a page carry different importance values. Links in different blocks are therefore weighted differently, which changes the authority-hub reinforcing process.
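A rough Python sketch of this reinforcement loop follows. It is an assumption-laden illustration rather than the paper's exact formulation: the function name, the way block importance scales the authority update, and the example weights are all hypothetical. Pages carry only authority scores and blocks carry only hub scores.

```python
import math


def block_level_hits(block_links, block_importance, n_pages, iters=50):
    """Illustrative sketch only (names and weighting scheme are assumptions).

    block_links[b]      -> list of page indices that block b links to
    block_importance[b] -> importance of block b within its own page
    """
    blocks = list(block_links)
    hub = {b: 1.0 for b in blocks}
    authority = [1.0] * n_pages

    for _ in range(iters):
        # Authority of a page: importance-weighted hub scores of the
        # blocks that link to it.
        authority = [0.0] * n_pages
        for b in blocks:
            for q in block_links[b]:
                authority[q] += block_importance[b] * hub[b]
        norm = math.sqrt(sum(a * a for a in authority)) or 1.0
        authority = [a / norm for a in authority]

        # Hub score of a block: authority of the pages it links to.
        hub = {b: sum(authority[q] for q in block_links[b]) for b in blocks}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {b: h / norm for b, h in hub.items()}

    return authority, hub


# Two body blocks link to page 1; one navigation block links to page 2.
block_links = {"body_a": [1], "nav_a": [2], "body_b": [1]}
block_importance = {"body_a": 0.9, "nav_a": 0.1, "body_b": 0.8}  # hypothetical
authority, hub = block_level_hits(block_links, block_importance, n_pages=4)
print(authority, hub)
```

Here page 1 gains authority from the two body blocks, while page 2, linked only from a navigation block, gains very little, and the navigation block in turn ends up with a low hub score.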
Conclusion
Based on web page segmentation (VIPS) techniques, web pages are treated as sets of blocks, and links run from blocks to pages rather than from pages to pages. From the page-to-block relationship (obtained by page layout analysis) and the block-to-page relationship (obtained by link analysis), a new page-to-page graph and a block-to-block graph can be constructed. Based on these new graphs, the Block Level PageRank and Block Level HITS algorithms can be implemented.
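One way to write this construction down is as a pair of matrix products. The symbols below ($X$, $Z$, $f_p(b)$, $s_b$, $W_P$, $W_B$) are notation chosen here for illustration and may not match the paper's exactly: $f_p(b)$ stands for the importance of block $b$ within page $p$, and $s_b$ for the number of pages that block $b$ links to.

```latex
X_{pb} =
\begin{cases}
  f_p(b) & \text{if block } b \text{ is part of page } p \\
  0      & \text{otherwise}
\end{cases}
\qquad
Z_{bq} =
\begin{cases}
  1/s_b & \text{if block } b \text{ links to page } q \\
  0     & \text{otherwise}
\end{cases}

W_P = X Z \quad \text{(page-to-page graph)}
\qquad
W_B = Z X \quad \text{(block-to-block graph)}
```

Since $X$ is pages by blocks and $Z$ is blocks by pages, $W_P$ is a pages-by-pages matrix and $W_B$ is a blocks-by-blocks matrix.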
Experiments show that Block Level PageRank outperforms PageRank and Block Level HITS outperforms HITS.
What this means for SEOs
These algorithms give the search engines the ability to break up the pages into individual blocks, and to assign different levels of importance to each block. Thus, inbound link scores are modified based on the placement of the links on the page.
This has a significant impact on your link building campaign. It shows that search engines have a computational strategy for identifying where links are placed on a page, and for giving more value to certain links based on that placement.
Additionally, it allows the engines to assign a semantic topic to each block, and to use this when weighing semantic relatedness. So if your link about dogs is placed in a “block” that is about houses, the relevance weight would be diminished, because the link and the block are not semantically related.
It is reasonable to infer that the blocks receiving the most importance are those containing the actual text of the page, that is, the main body, since this is where most of the content sits. The first paragraphs would likely carry the most importance, with importance diminishing further down the page, and advertising and navigational units would receive the least.
How this should modify your SEO campaign
1. Your links should be in the main body of the page
2. Your links should NOT be within navigational or advertisement blocks
3. Your links should be in areas that are contextually relevant
4. Make sure people use descriptions with your links (instead of just a few words in anchor text) so that the description makes the block contextually relevant
5. Avoid having your links in footers, or other areas that do not contain important content
Remember, links will still count if they are within undesirable areas, but they will receive fewer points and will be considered less important.
To get the most out of your link or ad campaigns, try to make sure you are receiving links from contextually relevant sites, and in prominent placements.


