Unlike Web pages, there are no links in books for Google to index and
use to calculate PageRank. There’s no anchor text in links to use as
if it were metadata about the pages being pointed to. Books aren’t
broken down into separate pages that have a somewhat independent
existence of their own the way that Web pages do, with unique title
elements and meta descriptions and headings. There isn’t a structure of
internal links in a book, with file and folder names between pages or
sections that a search engine might use to try to understand and
classify different sections of a book, like it might with a website.
A Google patent granted today describes some of the methods that
Google might follow to index content found in books that people might
search for. It’s probably not hard for the search engine to perform
simple text-based matching to find a specific passage that might be
mentioned in a book. It’s probably also not hard to find all of the
books that include a term or phrase in their title or text or which were
written by a specific author. But how do you rank those? How do you
decide which to show first, and which should follow?
Google was granted a patent today on “Query-independent entity importance in books,”
originally filed on July 25, 2010. The inventors include David
Petrou, Chiu-Ki Chan, Daniel Loreto, Jeffrey C. Reynar, and Nikola
Jevtic.
Google’s indexing of books explores and collects information about
entities, or specific people, places, dates, events and things mentioned
in those books.
An importance score might be created for each of those entities based upon a number of factors, such as:
1. How much information about a specific entity is included in the book and where
The patent tells us that the appearance of an entity in different
sections of a book may influence how much weight each entity might
carry, such as inclusion of the entity in places like:
- Front and back covers,
- Book flap,
- Copyright page,
- Table of contents,
- Foreword or afterword,
- Index,
- Reference section,
- Chapter heading,
- Chapters,
- Special pages within chapters (such as the first page of the chapter), and
- Atypical pages (such as pages that do not contain much text).
The patent does provide some hints as to which locations might carry
more weight (such as a mention in the first part of the first chapter
being very significant) and which parts might carry much less weight,
such as in the copyright notice.
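As a rough illustration of how location-based weighting could work, here is a minimal Python sketch. The section names, the numeric weights, and the location_score function are all assumptions made up for this example; the patent only hints at which locations matter more or less and gives no numbers.

```python
# Hypothetical weights per book section. The patent suggests that the
# opening of the first chapter matters a lot and the copyright notice
# matters little, but it does not publish actual values.
SECTION_WEIGHTS = {
    "first_chapter_opening": 3.0,
    "chapter_heading": 2.0,
    "table_of_contents": 2.0,
    "chapter": 1.0,
    "index": 0.5,
    "copyright_page": 0.1,
}
DEFAULT_WEIGHT = 1.0  # assumed weight for sections not listed above


def location_score(mentions):
    """Return a weighted sum of mention counts.

    `mentions` maps a section name to the number of times the entity
    appears in that section.
    """
    return sum(
        SECTION_WEIGHTS.get(section, DEFAULT_WEIGHT) * count
        for section, count in mentions.items()
    )


# One mention at the start of chapter one outweighs two mentions
# on the copyright page.
score = location_score({"first_chapter_opening": 1, "copyright_page": 2})
```

A simple weighted sum is only one way to combine these signals; the patent leaves the exact formula open.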
2. Whether third-party references point to a particular book and to its mention of specific entities.
These references can include things such as:
- Book reviews,
- “About the book” information,
- Book citations,
- Scholarly citations, and
- World-Wide Web references.
If these references are frequently referenced themselves elsewhere, they may carry more weight. As we’re told in the patent:
For example, if a scholarly article cites a particular chapter of a book, and the article mentions an entity that is also mentioned in the cited chapter, the references module elevates the importance score of the entity.
In one embodiment, the third party references considered by the references module have a greater influence on the importance score than the intra-book references considered by the book context module. Third-party references are considered less partial and, therefore, are considered better signals of the importance of a section or entity in the book.
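The scholarly-article example from the patent can be sketched in a few lines of Python. The Reference class, its weight field, and the reference_boost function are hypothetical names invented for this illustration, and the idea of adding up reference weights is an assumption rather than something the patent spells out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A third-party reference to a book (hypothetical structure)."""
    cited_chapter: int    # chapter of the book that the reference cites
    entities: frozenset   # entities mentioned in the reference itself
    weight: float = 1.0   # assumed: larger if the reference is itself widely cited


def reference_boost(entity, entity_chapters, references):
    """Sum the weights of references that both mention `entity` and
    cite a chapter in which the book mentions it."""
    return sum(
        ref.weight
        for ref in references
        if entity in ref.entities and ref.cited_chapter in entity_chapters
    )


refs = [
    Reference(3, frozenset({"New York City"}), weight=2.0),  # counts
    Reference(5, frozenset({"New York City"})),              # wrong chapter
    Reference(3, frozenset({"Boston"})),                     # wrong entity
]
boost = reference_boost("New York City", {3}, refs)
```

Only the first reference contributes here, because it is the only one that cites a chapter where the entity appears and also mentions that entity.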
3. Whether or not the sections of a book that include that entity are accessed more than other sections of the book
If people search for the book online and access different parts of
it, what do they look at? Are there some sections that get visited more
than others? If so, what are those?
4. How frequently that entity is mentioned in the book compared to how frequently the entity is mentioned in other books in the collection of books.
So, for instance, a book that mentions New York City more frequently
than other books that mention New York City might be seen as having a
higher importance score for the entity “New York City.”
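One way to picture this comparison is as a ratio of mention rates. The sketch below is an assumption about how such a comparison might be computed; the relative_frequency function and its parameters are hypothetical, and the patent does not commit to this formula.

```python
def relative_frequency(book_mentions, book_words, collection_mentions, collection_words):
    """Compare how often an entity appears in one book with how often
    it appears across the whole collection.

    Returns the book's mention rate divided by the collection's mention
    rate, so a value above 1 means the book mentions the entity more
    often than the collection does on average.
    """
    if book_words == 0 or collection_words == 0 or collection_mentions == 0:
        return 0.0  # nothing to compare against
    book_rate = book_mentions / book_words
    collection_rate = collection_mentions / collection_words
    return book_rate / collection_rate


# A 10,000-word book with 50 mentions of "New York City", in a
# collection of 1,000,000 words with 100 mentions overall.
ratio = relative_frequency(50, 10_000, 100, 1_000_000)
```

Normalizing by word count keeps a long book from scoring higher simply because it has more room for mentions.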
Other Ranking Considerations
The patent also describes some different ways that this kind of
information might be presented, such as showing location information on a
map, or event information on a timeline, or facts about a person in
text or a table.
A search engine might also look at metadata about books that it
indexes, which is often presented as structured data such as Author’s
name, publisher, year published, number of pages, edition, Dewey Decimal
Classification, Library of Congress classification, ISBN number, and
more.
Other query-independent factors that a search engine might consider
can include a book’s sales volume, or its current position on a
best-seller list.
An overall ranking for a book in response to a query might include
both these query-independent scores as well as query-dependent signals,
such as the number of terms in a query that match those in a book,
synonym matching, and other information retrieval techniques.
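To make the mix of the two kinds of signals concrete, here is a small Python sketch. The linear blend, the alpha parameter, and both function names are assumptions chosen for illustration; the patent does not say how the scores are combined, and real term matching would be far more sophisticated than splitting on whitespace.

```python
def query_match_fraction(query, book_text):
    """Fraction of distinct query terms that also appear in the book text
    (a crude stand-in for a query-dependent signal)."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    book_terms = set(book_text.lower().split())
    return len(query_terms & book_terms) / len(query_terms)


def overall_score(query, book_text, importance, alpha=0.5):
    """Blend a query-independent importance score with a query-dependent
    match score. `alpha` is a hypothetical mixing weight between 0 and 1."""
    return alpha * importance + (1 - alpha) * query_match_fraction(query, book_text)


score = overall_score(
    "new york history",
    "A history of New York City",
    importance=0.5,
)
```

In this toy case every query term appears in the text, so the match fraction is 1.0 and the blended score lands halfway between the importance score and a perfect match.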
Conclusion
I’m a believer in the idea that if you want to understand something
well, you need to be able to step a little outside of it and look at it
from a different perspective. If you want to be a good writer, it helps
to learn a different language and see how its rules and manners of
expression differ from your own. If you want a good understanding of how an
operating system works, it helps to install and learn about a
different operating system so that you can understand the similarities
and differences between the two.
I don’t think that people will start thinking about these types of
“ranking signals” when they start drafting a book anytime soon. I have
brushed over a number of the approaches that Google’s patent describes
on how they might rank books in a book search. I pointed out at the
beginning of this post a number of the differences between books and web
pages, and how those might make ranking them different. But there are
also many similarities, and it helps to understand both the how and why
of those differences.
Google has been using information extraction approaches to collect
information about entities that it finds on the Web, and it wouldn’t be
surprising to see more of the ideas behind how books might be indexed
flowing over into how web pages and websites are indexed.