Title: Amazon

Publisher: Amazon, Inc.

URL: http://www.amazon.com

Tested: continuously

Price: free

Every year, in the December issue of my column, I try to review ready reference sources that also help librarians in their quest for holiday gifts. Few are as widely appropriate as the world’s undoubtedly best virtual bookstore (and arguably the best virtual department store), which keeps getting better and better now that Amazon has re-focused its attention on its flagship department.



There are many web shops which sell books, but none of them does it as well as Amazon. Indeed, “nobody does it better”. Amazon provides the most pleasant total bookstore (and library) experience, from browsing around, to thumbing through pages, to reading reviews, to getting the books delivered in gift packaging if that’s what you want. It also has the best prices and the best tracking and account management system. I keep buying a variety of products through Amazon, but books are the closest to my heart.

Amazon spread itself thin a few years ago when it started to sell everything from baby cribs to dumbbells and yoga mats to pet food. It has been a pleasure to see the new enhancements related to books introduced since my last review.



A few years ago Barnes and Noble challenged Amazon for using the tag line “the world’s largest bookstore”. Only the lawyers involved in the matter enjoyed and benefited from the issue. It reminded me of the argument during the cold war era that the total length of the bookshelves in the Lenin Library of the Soviet Union (as it was called then) exceeded that of the Library of Congress. It was akin to the claim that the Soviet basketball team had the tallest player. It was not mentioned that he was the least skilled player of all the teams. Barnes and Noble is miles behind Amazon in this marathon competition in functionality, grace and dexterity.

Amazon changed its tag line to Earth’s Biggest Selection and trademarked it. I can’t tell you how many books Amazon has now, but beyond tens of millions it does not really matter when comparing web bookstores. What matters much more (among other things to be discussed later) is that Amazon certainly has the largest variety of additional, useful components and content features (from both a bibliophile’s and a buyer’s point of view), such as the largest collection of digitally browsable, legitimately digitized and accessible pages from copyrighted books, by far the most full text or excerpted book reviews from respected traditional book review sources, and now the best-implemented book-to-book citation index.

If you look up the same edition of the same book, “In Search of Excellence”, in Amazon and B&N, only some of the differences are apparent, such as the $10.87 price tag at Amazon versus the $15.99 price ($14.39) at B&N. Amazon alerts you to the many different editions, and to the lowest used copy price (1 cent, but you must add shipping to this) at Amazon partners. B&N does not indicate the lowest used copy partner price. When you look up the used copy alternatives at B&N there is only one, for $9.60, a rather steep price for a used copy in acceptable quality as described by the seller. Although sellers’ descriptions are more accurate than landlords’ euphemistic adjectives about their apartments (where cozy usually means so small that you may get into it only sideways, and preferably on all fours if you are taller than 6 feet), “acceptable” quality is not very encouraging.


Traditional enhancement elements

Until recently, most Web bookstores had more information than the basic, subscription-based Books in Print. Bowker has had such a strong position in this market for decades that it needed a loud wake-up call, and a long time to wake up and offer enhancements to the classic bibliographic descriptions, such as table of contents pages, back-of-the-book indexes, and reviews (featured in the more expensive edition, BIP with Reviews). These enhancements have by now become rather standard (to differing extents) in the largest web bookstores.

At first impression, B&N may seem to have the advantage, showing the availability of an annotation, a publisher blurb, critiques, customer reviews, a sample chapter, and the table of contents page. On a closer look, however, the advantages disappear in comparison with Amazon, which has the same publisher blurb (actually a quotation from a Wall Street Journal review) and a better description than B&N. Amazon’s back cover has a more informative publisher blurb and photos of the authors. Both have the same cover page, and neither has full book reviews (which in other cases are typically a forte of Amazon for more current books).

B&N claims to have 4 critiques, but to start with, two of them are duplicates, as shown in the earlier screenshots. They are also just the punch lines from reviews. One of them appears on the front page of the book, but Amazon does not count it among the editorial materials as B&N does for this book. Both have a sample chapter, but in B&N it is the 1-page preface, while in Amazon it is 4 pages of Part I of the book.

Both Amazon and B&N offer the Table of Contents, but once again, scratching the surface reveals a big difference: B&N omitted the titles of chapters 5-10, another really cheap trick. Amazon shows the entire Table of Contents. B&N does not have the back cover page or, very importantly, the useful back-of-the-book index.

I don’t care too much about customer reviews, but sometimes a useful one may show up. I understand that reading other customers’ reviews is a possible social link for many users, a way to reach out and touch someone with seemingly similar opinions and taste as reflected by the review. With that said, there are 6 customer reviews in B&N, and 39 in Amazon.


Novel enhancements in Amazon

The traditional content enhancements represent only the tip of the iceberg for Amazon. Once again, it has novel and unique features, some of them light-hearted, even frivolous, others very useful and informative. All these are made possible by the large-scale digitization of the full text of books in cooperation with the publishers who hold the copyright and participate in the Search Inside the Book project. This idea was recently embraced also by Google, Yahoo, and MSN, but the Google Book Search (formerly Google Print) project is very controversial, and the Association of American Publishers (AAP) filed a lawsuit against Google a month ago for copyright infringement by unauthorized digital copying.

Amazon started the full text digitization project several years ago on a very selective basis by launching the Look Inside the Book feature, then enhanced it in 2003 by introducing the Search Inside the Book feature. I reviewed the splendid features of this enhanced service two years ago.

Earlier this year several new features were launched, which now seem to be available for about every second book that I looked up. And I looked up a lot, because I tried to reconstruct the most essential part of the professional library collection in my on-campus office, which was destroyed along with all of our offices, classrooms and computer labs a year ago by a flash flood that caused tremendous damage in the basement of the central library of the University of Hawaii.

There is a concordance list of the 100 most often occurring words in the text of the book at hand. The most common stop words (such as the, a, from) are excluded, but Amazon should eliminate many additional words, prepositions, adjectives and numbers which have no content meaning by themselves (without context), such as almost, another, day, does, down, to name a few from the beginning of the alphabet. Many of the words are useful to give, indirectly at least, a sense of the major topics of the book, in a visually interesting style. The larger the font, the more often the word occurs.

If you want to know the absolute score of a word, you just need to hover above it, and the number will appear, such as companies (787 times), people (469 times), product (305 times), and business (292 times). Clicking on a word will show its occurrences in context, which can be very informative.
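How such a concordance might be built can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop word list, the function names and the linear font scaling are hypothetical, and nothing here is Amazon's actual method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "from", "of", "and", "to", "in"}

def concordance(text, top_n=100, stop_words=STOP_WORDS):
    """Return the top_n most frequent non-stop words as (word, count) pairs."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(top_n)

def font_size(count, max_count, smallest=10, largest=36):
    """Scale a count linearly to a font size in points (assumed scaling)."""
    return smallest + round((largest - smallest) * count / max_count)

sample = "Companies need people. People build companies and companies sell the product."
top = concordance(sample, top_n=2)
print(top)
print([font_size(count, top[0][1]) for _, count in top])
```

Extending the stop word set is all it would take to drop words such as almost, another or day from the list.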

Looking up the pages where IBM is mentioned brings up 94 pages, shown as a kind of KWIC (keyword-in-context) index with the matching word highlighted. Clicking on the quick index entry will display the page itself. This is very convenient and often better than even the best back-of-the-book index (BOBI). I know the difference: I teach an abstracting and indexing course and appreciate good traditional BOBIs, but I still enjoy this approach. A plain list version with the scores and hotlinks may be more scholarly, but Amazon should not be denied the right to play around a bit.
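For readers who have not met a KWIC index, the idea can be sketched as follows. The page texts, the function name and the bracket highlighting are hypothetical; this is a minimal illustration of keyword-in-context listing, not a description of how Amazon implements it.

```python
import re

def kwic(pages, keyword, width=30):
    """Build a keyword-in-context list.

    pages maps a page number to the text of that page. Each hit is returned
    as (page, snippet), with the matched word marked by square brackets.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for page in sorted(pages):
        text = pages[page]
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            hits.append((page, f"{left}[{match.group(0)}]{right}"))
    return hits

# Hypothetical page texts for illustration only.
pages = {12: "At IBM the customer comes first.", 3: "No match here."}
for page, snippet in kwic(pages, "IBM", width=5):
    print(page, snippet)
```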

The function of showing the statistically most improbable (SIP) and the capitalized (CAP) terms is somewhat similar. Based on the relative rarity of such terms as executive champion or Harvard Business School in the entire full-text digital collection, the rarest ones appearing in the book at hand are displayed, hinting at the specialty aspects of the book. Clicking on the term will display a KWIC list of other books in which the term appears the most often. This playfulness applies even more to another new feature, which shows how much bang you get for your buck by calculating how many words and ounces of text you get for one dollar.

The readability statistics of the text are undoubtedly serious and highly useful. Such statistics have long been used in judging the grade level of classroom readings, and are appreciated in the school-oriented databases of EBSCO by teachers and librarians (including myself). All three classic readability indexes (Fog, Flesch, Flesch-Kincaid) are shown, along with a small graph that indicates the standing of this book in terms of readability against the other books in full text digital format. The comparison should be limited to books on the same subject, and Amazon could easily do this filtering based on the subject headings. The details of the components which are considered in calculating the readability level are also shown, such as the complex words, syllables per word, and words per sentence, as well as the number of characters, words and sentences.
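The three indexes are simple functions of the component counts listed above. The sketch below uses the standard published formulas; the function name and the sample counts are hypothetical, and whether Amazon counts syllables and "complex" words (conventionally those of three or more syllables) in exactly this way is an assumption.

```python
def readability(words, sentences, syllables, complex_words):
    """Compute the three classic readability indexes from raw counts."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return {
        # Flesch Reading Ease: higher scores mean easier text.
        "flesch": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Flesch-Kincaid: approximate U.S. school grade level.
        "flesch_kincaid": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        # Gunning Fog: years of schooling needed to follow the text.
        "fog": 0.4 * (words_per_sentence + 100 * complex_words / words),
    }

# Hypothetical counts for a short passage.
scores = readability(words=100, sentences=5, syllables=150, complex_words=10)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```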

Within a few years B&N will also offer such functions, but it will be applauded by most journalists and web bloggers when Google borrows the idea, and then many other information services will hop on the bandwagon.


Cited and Citing References

This feature belongs with the ones mentioned in the previous section, but it deserves special treatment: citation matching is a hot, often misunderstood, and elsewhere brutally ill-implemented feature, and Amazon does it well (with few exceptions) even in the early days of its debut. Web of Science does not have books as source documents because developers at the Institute for Scientific Information are aware of the difficulties of correctly matching citations given to various editions of the same book, in various formats and styles, along with many other vagaries of book citations. These are much worse than the inaccuracies and inconsistencies in citations of journal articles and conference papers (which are themselves far less accurate and consistent than the syntactically enforced, but non-standard, citation styles of publishers’ guidelines would suggest). Scopus had records for nearly 20,000 books at the end of November, but only about 65 had cited references, probably for the same reasons. The smartest ones recognize their own limitations. PsycINFO has 223,675 records for books and book chapters, and has cited references for 34,444 of them, but none has the “cited by” feature, not even in its two best implementations by CSA Illumina and Web of Knowledge. CiteSeer is the only service I have seen that handles book citations with aplomb, and shows the context of the citations efficiently for many but not all of the citing documents, along with the full text of those documents.

I know that I expose myself to the wrath of millions of Google Scholar fans who take its hit counts and citation counts at face value. Still, one must recognize that Google Scholar plays fast and loose with its numbers, Boolean operations and citation matching algorithms, as I illustrated in an impromptu interview with an editor of The Scientist, and as I experience day in, day out. You need to scratch the surface of Google Scholar to see the grave problems before you write your comparative evaluation article or post an “I feel lucky and happy and so does my auntie with Google Scholar” message on a blog.

With few exceptions (to be discussed in the software section), Amazon seems to match the citing and cited books very well, and presents the results in an impressively informative format. For the book I used as a simple illustration for this column, Amazon lists 51 books that are cited by In Search of Excellence, and a whopping 576 books which cite In Search of Excellence. Amazon is the first to point out that this is not a comprehensive list. “Only” those books are listed which are part of the Search Inside the Book sub-collection of Amazon, i.e. the ones whose full text is digitally available. But trust me, it is huge. In my ballpark estimate, there may be 22-25 million full text books in Amazon, and about 10-12% of those items have cited/citing references.

The number of citing references which cited the book varied from 1  to 3,842. The top cited book by books in the SIB sub collection is the one about the Declaration of Independence. This is an obvious outlier, but there were many books in Amazon with a citedness score in the mid-hundreds.

The scores cannot be compared directly with the scores reported by Google Scholar for several reasons. One is that the citedness scores in Google Scholar include journal articles, conference papers, books, PowerPoint presentations, course reading lists, student papers, and a variety of other ephemeral materials gathered from the Web.

For example, the In Search of Excellence book in Google Scholar has a citedness score of 969. However, only 46 of the citing sources are books, and that’s what needs to be compared against Amazon’s more than 500 citations from books. (There is a reason I don’t use the reported citedness score of 576 here.) The citing books are presented in a highly efficient format. If you want to see more details, a click on the page numbers listed will bring up the page with the citation, no muss, no fuss. Amazon always shows you the money; Google Scholar does not.

In Google Scholar, the citations in the purportedly citing articles, books and conference papers cannot be looked up unless one has access privileges to them through a library subscription. Citations received from a book must not be taken for granted. They can be looked up (if the book is in the Google Book Search program), but you need to make an additional search for the cited title in the book to show the citations, if for nothing more than corroboration. (Inclusion of results from GBS in Google Scholar is a new feature. After in-depth testing of GBS I will review it. Based on samples, the GBS scores and matches are much more credible than the ones offered by Google Scholar.)



Amazon’s software is among the most appealing and intuitive I have ever used. It really leads users by the hand through the myriad functions it offers, and always provides the comfort of jumper links and signs to go to another function or to retrace one’s steps after being lured down some interesting, and often rewarding, path. It adds to Amazon’s appeal that, as opposed to most library automation systems, it is flexible. For example, it does not require that you go to the Power Mode template to type the ISBN in the ISBN cell, and drop or reproduce exactly the dashes in the identifier. No matter how you enter the ISBN 0395938473 for David Macaulay’s splendid book The New Way Things Work, it works with or without dashes, in the right or the wrong place. (The leading zero, however, is still required. Well, no one is perfect.)
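This kind of forgiving ISBN entry is easy to sketch: strip the dashes and spaces wherever they fall, then verify the standard ISBN-10 check digit. The function names are hypothetical and this illustrates the general technique, not Amazon's code.

```python
def normalize_isbn10(raw):
    """Drop dashes and spaces wherever they occur; uppercase a trailing x."""
    return "".join(ch for ch in raw if ch not in "- ").upper()

def is_valid_isbn10(isbn):
    """Check length, characters and the ISBN-10 mod-11 check digit."""
    digits = "0123456789"
    if len(isbn) != 10:
        return False
    if any(ch not in digits for ch in isbn[:9]) or isbn[9] not in digits + "X":
        return False
    # Weights run from 10 down to 1; a final X stands for the value 10.
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn))
    return total % 11 == 0

for entry in ("0395938473", "0-395-93847-3", "03-9593-8473", "395938473"):
    cleaned = normalize_isbn10(entry)
    print(entry, "->", cleaned, is_valid_isbn10(cleaned))
```

The last entry fails for the reason noted above: without the leading zero the identifier is only nine characters long.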

The citedness scores are excellently traceable (a huge advantage over Google Scholar). This establishes their credibility and provides the perfect path to explore the citations in both directions, through the cited and citing references. Precisely for the sake of credibility, Amazon should not inflate its citedness scores by listing both the hardcover and paperback editions of the same work in the “Books that cited this book” list. It does not always do that, which adds to the confusion and makes it appear inconsistent. Of course, this is easier said than done, but Amazon has enough experience with books not to be fazed by such a task.

In the above case of the 576 reported citations, there are 46 duplicate entries, and one triplicate, so the correct number is 552. Not a big deal in this case, but such duplicates also occur (although rarely) when there are two citations from the same book. The label with the claim “2 books that cite this book” definitely puzzles users who see the same book listed twice, showing, for example, one citation from the back cover and another from the back flap.
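Collapsing editions before counting could look like the sketch below. The matching key (normalized title plus author) and the sample entries are hypothetical, and real edition matching is far messier than this, as discussed earlier.

```python
def distinct_citing_works(entries):
    """Count citing works once each, however many editions are listed.

    entries is an iterable of (title, author, binding) tuples; the binding
    (hardcover, paperback) is ignored when deciding what counts as one work.
    """
    seen = set()
    for title, author, _binding in entries:
        seen.add((title.strip().lower(), author.strip().lower()))
    return len(seen)

# Hypothetical citing list: one work appears in two editions.
citing = [
    ("Managing Change", "A. Author", "hardcover"),
    ("Managing Change", "A. Author", "paperback"),
    ("Quality First", "B. Writer", "paperback"),
]
print(len(citing), "entries,", distinct_citing_works(citing), "distinct works")
```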

On the other hand, when the details of the books are shown, the citedness score should be shown consistently, irrespective of the particular edition. There were some examples where only the paperback edition showed the list of citing books but the hardcover edition did not. These are intimidating tasks, but not for Amazon, which tackles similar problems well, for example when the hardcover edition has no cover image or foreword but the paperback edition does.


Seven years ago I wrote a piece in my Internet Insights column for Information Today. I outlined what I would do “If I Were Jeff Bezos”. The essence of the column was that, with some adaptation, the Amazon software could very well replace, at a fraction of the cost, the variety of Online Public Access Catalogs and computerized acquisition and circulation systems used in all public and school libraries (and in most college and special libraries).

Much of the functionality is there; some functions just need to be renamed, from buying to borrowing, from wish list to reserve list, etc. ALA could have coordinated the deal on behalf of libraries. As you know, this did not happen. I wish even more today than seven years ago to see such state-of-the-art software in every library. It would especially be a godsend for libraries in the developing countries which cannot even think about library automation.

At least on the OPAC front there is an excellent alternative, which many librarians have discovered and recommend to patrons. Using the ISBN bookmarklet developed by Jon Udell, you can enjoy all the benefits of searching Amazon, and once you find what you are looking for but don’t want to buy it, you click on your bookmarklet, which automatically extracts the ISBN and passes it on to your preferred library or libraries (such as your college library and your nearest public library, through two clicks and two bookmarklets).
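The bookmarklet itself is a snippet of JavaScript that runs in the browser. The same idea is sketched here in Python rather than reproducing Udell's code: pull a 10-character ISBN out of the product page address and hand it to a library catalog. The URL pattern and the catalog address (catalog.example.org) are assumptions for illustration only.

```python
import re
from urllib.parse import quote

# A path segment of nine digits followed by a digit or X is assumed to be the ISBN.
ISBN_IN_URL = re.compile(r"/(\d{9}[\dXx])(?:[/?]|$)")

def isbn_from_url(url):
    """Return the ISBN found in a product page URL, or None if there is none."""
    match = ISBN_IN_URL.search(url)
    return match.group(1).upper() if match else None

def library_lookup_url(isbn, template="https://catalog.example.org/search?isbn={isbn}"):
    """Build a catalog search URL; the template is a placeholder, not a real OPAC."""
    return template.format(isbn=quote(isbn))

page = "http://www.amazon.com/exec/obidos/ASIN/0395938473/"
isbn = isbn_from_url(page)
if isbn:
    print(library_lookup_url(isbn))
```

Each library needs its own template, which is why a patron ends up with one bookmarklet per catalog.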

This is the third time that I have reviewed Amazon, but I am not apologizing for it. For a librarian it is an especially precious open access reference source and gift hunting tool. I bet even the employees at Google and Bowker use Amazon when they need as much information as swiftly as possible about books at the best price and the best shipping costs (free for most orders above $25). Now there are additional reasons to do so, and perhaps even the most committed Google Scholar bloggers and aficionados will recognize that when it comes to matching cited and citing references and calculating citedness scores, not all the rays of the sun shine from behind the Googleplex. I am sure that Eugene Garfield is also delighted to see how well his idea of citation indexing, published 50 years ago, is implemented in a widely used open access resource.
