How Web Pages Are Ranked

UT's search engine is constantly spidering through UT web space, indexing new pages and re-indexing existing pages that have changed. When indexing a page, the engine not only catalogues the words it encounters but also assigns them relative weights, which affect the ranking of the corresponding document. Here, we describe how the engine weights words, an important consideration for webmasters who want their sites to be appropriately ranked on results pages.




A word's weight is proportional to its frequency in a document. This is only natural and is the primary basis for the specificity of search results. However, words are given more heft depending on where they are located. For example, an instance of a word found inside the title tags of a document is weighted 8 times as heavily as an instance of the same word found inside the body. Consequently, a word's weight is a function of both its frequency and its tag association. A complete listing of tags and weights is shown below.

tag            weight
title          8
description    4
keywords       4
alt            1
body           1
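
The weighting rule above can be sketched in a few lines of Python. This is only an illustration, not the engine's actual code: the names TAG_WEIGHTS and weighted_score are hypothetical, and the sketch assumes that a word's weight is simply the sum, over all tags, of the number of occurrences in that tag multiplied by the tag's weight from the table.

```python
# Tag weights copied from the table above.
TAG_WEIGHTS = {"title": 8, "description": 4, "keywords": 4, "alt": 1, "body": 1}


def weighted_score(counts):
    """Return the weight of a word given its occurrence count per tag.

    Assumed rule (not confirmed engine behavior): sum of count * tag weight.
    """
    return sum(TAG_WEIGHTS[tag] * n for tag, n in counts.items())


# Hypothetical word that appears once in the title and twice in the body.
print(weighted_score({"title": 1, "body": 2}))  # 8*1 + 1*2 = 10
```

Under this assumption, moving a single occurrence from the body into the title raises a word's weight by 7.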



To illustrate, we generated three simple web pages, each containing 3 instances of the nonsensical string "w89z32a12", and then allowed the search engine to index them. Their raw source is shown here:

PAGE 1

<html>
<title>PAGE 1:w89z32a12 w89z32a12</title>
<body>
This page has three instances of w89z32a12:
one in the body and two
in the title.
</body>
</html>

PAGE 2

<html>
<title>PAGE 2:w89z32a12</title>
<body>
<meta name="description"
content="This page has three instances
of w89z32a12: one in the body,
one in the title and one in the description tag">
w89z32a12
</body>
</html>

PAGE 3

<html>
<title>PAGE 3</title>
<body>
This page has three instances
of w89z32a12, all in the body.
w89z32a12 w89z32a12
</body>
</html>


Note that all three pages contain the same number of instances of "w89z32a12" (namely three), but they differ in where the strings are located. In page 1, "w89z32a12" occurs twice in the title and once in the body. So, according to the rules outlined above, page 1 should exhibit a weight of 2 × 8 + 1 × 1 = 17 with respect to this particular string. In page 2, the string is found once in the title, once in the description tag and once in the body. Hence, it should rate a weight of 8 + 4 + 1 = 13. In page 3, the string is found all three times inside the body. Consequently, it warrants a weight of 3 × 1 = 3. Therefore, when queried with "w89z32a12", the search engine should rank the pages in order of decreasing weight: page 1 first, page 2 second and page 3 last.
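
The arithmetic above can be checked with a short Python sketch that parses the three test pages and applies the tag weights. It is a rough approximation under stated assumptions, not the engine's implementation: the names TermCounter, score and PAGES are hypothetical, occurrences are counted by plain substring matching, and any text outside the title is treated as body text.

```python
from html.parser import HTMLParser

# Tag weights from the table in this document.
TAG_WEIGHTS = {"title": 8, "description": 4, "keywords": 4, "alt": 1, "body": 1}


class TermCounter(HTMLParser):
    """Count occurrences of a term per weighted location (hypothetical helper)."""

    def __init__(self, term):
        super().__init__()
        self.term = term
        self.counts = {tag: 0 for tag in TAG_WEIGHTS}
        self.in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name") in ("description", "keywords"):
            self.counts[attrs["name"]] += (attrs.get("content") or "").count(self.term)
        elif tag == "img":
            self.counts["alt"] += (attrs.get("alt") or "").count(self.term)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Assumption: everything outside <title> counts as body text.
        self.counts["title" if self.in_title else "body"] += data.count(self.term)


def score(html, term):
    """Weighted score of a term in a page: sum of count * tag weight."""
    parser = TermCounter(term)
    parser.feed(html)
    parser.close()
    return sum(TAG_WEIGHTS[tag] * n for tag, n in parser.counts.items())


TERM = "w89z32a12"
PAGES = {
    "page 1": """<html>
<title>PAGE 1:w89z32a12 w89z32a12</title>
<body>
This page has three instances of w89z32a12:
one in the body and two
in the title.
</body>
</html>""",
    "page 2": """<html>
<title>PAGE 2:w89z32a12</title>
<body>
<meta name="description"
content="This page has three instances
of w89z32a12: one in the body,
one in the title and one in the description tag">
w89z32a12
</body>
</html>""",
    "page 3": """<html>
<title>PAGE 3</title>
<body>
This page has three instances
of w89z32a12, all in the body.
w89z32a12 w89z32a12
</body>
</html>""",
}

# Rank pages by decreasing weighted score.
ranking = sorted(PAGES, key=lambda name: score(PAGES[name], TERM), reverse=True)
for name in ranking:
    print(name, score(PAGES[name], TERM))
```

Under these assumptions the sketch reproduces the predicted weights of 17, 13 and 3, and therefore the predicted order of page 1, page 2, page 3.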

Go ahead and submit "w89z32a12" and see if we're right: