|
Title: Windows Live Academic
Publisher: Microsoft
Tested: April 14-18, 2006
Cost: Free |
|
This beta version of a free indexing/abstracting service, offering
purportedly 6 million (but in reality only 4 million) records about the
journal articles and conference papers in the subscription-based digital
archives of some academic publishers, is too little, too late from an
industry leader. While Windows Live Academic makes it easy to find the
bibliographic records for some scholarly papers on a topic in computer
science, electrical engineering and physics, it is nothing new: not
evolutionary, let alone revolutionary. Its software fails to provide
appropriate options for even the most elementary search functions, such as
searching by journal name and publication year, and cannot even do well
many of the operations that it promises to perform.
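
The gap between the claimed and the estimated database size can be put in
one number. The following is a back-of-the-envelope sketch, not anything
WLA itself reports: it uses only the two figures quoted above, and the
function name is made up for illustration.

```python
def overstatement(claimed: int, estimated: int) -> float:
    """Return how far the claimed count exceeds the estimate,
    expressed as a fraction of the estimate."""
    if estimated <= 0:
        raise ValueError("estimated must be positive")
    return (claimed - estimated) / estimated


# Figures taken from this review; both are approximations.
claimed_records = 6_000_000    # size announced for WLA
estimated_records = 4_000_000  # upper bound suggested by the test searches

ratio = overstatement(claimed_records, estimated_records)
print(f"Claim exceeds estimate by {ratio:.0%}")
```

Since 4 million is an upper bound on the estimate, the real overstatement
is, if anything, larger than the value printed here.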

THE CONTEXT

Indexing/abstracting (I/A) print publications and databases have been the
staple of every library for providing information about the literature.
Their sizes range from 15,000 to 17,000,000 records, with prices from $500
to $5,000 per year. Seemingly, it is good news that there is now a new I/A
database, Windows Live Academic (WLA), with purportedly 6 million records
about journal articles and conference papers primarily in physics,
electrical engineering and computer science (and related fields, such as
information science) for free, often with links to the full text of
articles and conference papers (mostly accessible to members of
subscribing libraries). In reality, this is not a big deal in 2006,
especially from Microsoft, for several reasons.

There
are many large, open access I/A databases by the government in several
disciplines (medicine, agriculture, education, criminal justice,
transportation). Some of these have better software, and many have better
content (such as large integrated or linked open access full-text
collections) than their subscription-based counterparts.

More
importantly from the perspective of WLA, there are huge open access
databases in physics, computer science, economics, library and information
science, and (to a lesser extent) in engineering, often with substantial
open access full-text collections. In addition, Elsevier's free Scirus
database has for years been offering a multidisciplinary I/A service for
journal archives, repositories, and (less successfully) the purportedly
scientific information on individuals' websites. It is far larger than WLA
and far better at finding open access bibliographic records (with a higher
ratio of open access abstracts) on every subject. In Scirus, there are
more records in the ScienceDirect subset alone than in WLA, and the total
number of records for journal articles alone is 25 million.

THE CONTENT

As
for the content of WLA, I looked at the subject scope, the size of the
database, the record content provided, the publishers which offered access
to their archives, and the number and type of journals and conference
proceedings covered.

Subject scope

Microsoft
deserves credit for admitting that the initial beta version of WLA covers
only the fields of physics, electrical engineering and computer science.
Actually, the scope of coverage is broader. You will find records for
publications in the fields of medicine, nursing, life sciences,
psychology, sociology, economics, women's studies, and a variety of fields
within the arts and humanities. This is obvious even if you only look at
the list of journals covered, whether you glance at the beginning, the
middle or the end of the list.

Sample
searches confirm the multidisciplinary coverage. You will find more than
500 records for articles and conference papers about toxoplasmosis, more
than 7,000 about multiple sclerosis, nearly 40,000 for the word
psychology, 300,000 for the word education, and more than 10,000 for the
word honesty. Some of these may be in the context of physics, engineering
and education, but the vast majority are not from those fields.

Database size

WLA
claims to have about 6 million records. There is no foolproof way in WLA
to determine the exact number of records (as is possible in most
professional databases), but it seems that Microsoft increased the size of
WLA by 50%, if only in its announcement.

My
test searches using the most common words in the full text of the WLA
records clearly indicate that the actual number of records is more likely
to be below 4 million. If I extrapolate from the number of duplicate
records I found in my first tests, the total number of unique records is
even smaller.

There
are many duplicates (as there are also in Scirus and Google Scholar). They
are not easy to spot because they are scattered across the result list
(which is strange for allegedly relevance-ranked records).

They
may not look like obvious duplicates because of incomplete and wrong data
in the record pair, such as the omission of one author in the second
record and the wrong publication year in the first record of this pair,
which are shown juxtaposed only because I did an exact known-item search.

Looking
at the source information of these two records, one can’t help being
concerned: how could the software extract the wrong year for the first
record, ignore the second author in the other record, and extract only his
surname? Such errors are also a problem in Scirus and Google Scholar, but
not in the CiteSeer database, whose crawlers do the best job in every
regard.

Record Content

The
records include the usual bibliographic information, the
chronological-numerical designations of the source documents, and, very
laudably, the Digital Object Identifier of the articles and conference
papers (when available).

It
is deeply disappointing that the indexing software apparently can’t
reliably determine the availability of abstracts. Seeing so often the
false claim that the "abstract is not available" is alarming, given that,
for example, the majority of articles in the Journal of the American
Society for Information Science & Technology do have abstracts, and they
are clearly labeled as such in the source. Once again, as you could see in
a previous screenshot of the CiteSeer database, CiteSeer correctly
recognizes, collects and identifies the abstracts.

Source coverage

Microsoft
claims to have collected, in the fields of computer science, electrical
engineering, and physics, "more than 6 million records from
approximately 4300 journals and 2000 conferences".

Microsoft
does not specifically mention the number of publishers whose archives it
crawled for collecting data, but the page about publishers, journals and
conferences at http://academic.live.com/journals
has about 120 entries in the publisher section. However, these include a
lot of weird combinations of publisher names. True, there are publications
which are jointly published by two or more publishers, such as the volumes
of the Joint Conference on Digital Libraries, a cooperation between ACM
and IEEE, or by a commercial publisher on behalf of a scholarly society,
such as the Journal of Digital Information, which originated from
the British Computer Society and Oxford University Press. True,
there are journals which were published first, say, by Elsevier, then sold
to Kluwer, such as Scientometrics.
For articles in such journals and conference proceedings the joint
listing would be understandable, but none of the above-mentioned sources
is covered by WLA.

What
I am referring to is the nonsensical pairing of publishers, such as the
one for Science magazine. In more than 10,225 records the publisher field
lists both the Nature Publishing Group (NPG) and the American Association
for the Advancement of Science (AAAS) as the publishers. Science is
published by AAAS; NPG has nothing to do with it. These two archrival
publishers of the most cited journals (Nature and Science, respectively)
form a really odd couple as presented by WLA.

In
the source list there are more than 4,300 journals identified, but this
number is also grossly exaggerated. There are hundreds of identical
journals appearing twice in slightly different spelling, such as
Scandinavian Journal of Medicine and Science in Sports and Scandinavian
Journal of Medicine & Science in Sports, or Planning Theory and Practice
versus Planning Theory & Practice. Others appear in both British and
American spelling, such as Paediatric Anaesthesia versus Pediatric
Anesthesia, which are less apparent duplicates. There are also duplicates
caused by typographical errors, such as this journal whose misspelled
variant is automatically corrected by Word. The combination of the above
errors and inconsistencies makes some journals appear three times, four
times or even five times in the journal list. The variety is well
demonstrated by these nicely juxtaposed journal names.

There
are also journals listed which are not covered at all.

It is as interesting as it is discouraging to see how many high impact
factor, influential journals are not included in WLA. For an obvious
example, the key research journals of IBM are entirely ignored. It adds
insult to injury that all the articles in the 45 volumes of IBM Systems
Journal and in the 50 volumes of IBM Journal of Research & Development are
offered by IBM in full-text format.

As
for the conference proceedings, indeed there are more than 2,000 listed.
Once again, the numbers can be easily misunderstood. Counting each of the
yearly proceedings (the 3rd, 4th, 5th) of a conference separately inflates
the number of sources, and is akin to counting the volumes of journals.

THE SOFTWARE

This
is the home turf of Microsoft, but it does not show in WLA. Essential
search features are missing from the repertoire of the WLA software. I
could not find a truncation operation. There is no way to refine a search
by limiting it to a publication year or year range.

There
is no good way to search by journal for two reasons. One is that there is
no option to search for the word(s) or exact name of a journal title. You
may use the journal name as a search criterion, but the result will
include every item which matches the search term(s) anywhere in the
record. This is especially frustrating for journals whose name is a single
word, like Science, or whose name, although it consists of multiple words,
is still not distinctive enough, such as Evidence-based Cardiovascular
Medicine.

The other is that there
is no way to restrict a search to the journal as the source journal, so
you get many hits where your journal is the cited journal. For example,
searching for items published in the Annual Review of Information Science
and Technology (ARIST), currently the #1 periodical publication in the
field of Library and Information Science, you get a list of 235 hits.
However, about 200 of them aren’t records for chapters published in ARIST,
but for articles citing a chapter published in ARIST. You have to scroll
through 80 hits before the first record for an ARIST chapter appears in
the result list. The items in the hit list are not numbered.

If
you believe that this can be improved by sorting the result list by
journal name, give up your hopes. The sort puts at the top of the list the
records for articles published in the Journal of the American Society for
Information Science and Technology, an obviously incorrect sort procedure.

In
addition, it should be noted how few ARIST chapters have records in WLA.
For perspective, Web of Science has 484 records for chapters published in
ARIST. The reason for this enormous difference is that the publisher has
digital versions of only 10% of this high-impact periodical for Microsoft
to collect and index, whereas ISI has been indexing all 40 volumes of
ARIST.

There
is no citedness score listed yet for the items retrieved, but Microsoft
promises to work on it. It is good that the results can be downloaded.

The
output is a continuous stream, which appeals to some reviewers. I find it
distracting, as after a while you have no idea where you are in the result
list. More frustratingly, if you click on an item to get to the source
document and then return to the result list, you are positioned at the top
of the result list, not where you left it. Finding your jump-off item
again is annoying.

You
may choose from three different output formats (short, medium and full)
using a slider. This is a gimmick; three small clickable icons would be as
good, if not better. There is a side panel to show the complete record
parallel to the list. Its most serious deficiency is that it does not show
the abstract, even if there is an abstract. It claims that the abstract is
not available. Someone should have caught this glitch, but apparently
everyone was working on the gimmicky gizmos, and no one paid enough
attention to what is displayed.

You
may limit your search to the title field, but it doesn’t work
consistently. For example, searching for “odd couple” in the title field
finds no matching record. Searching for the same term without the title
field restriction retrieves 29 records. The first dozen items have the
exact term “odd couple” in the title field. Similarly, the query in
title:“medical informatics” finds only 9 records. None of them has the
phrase in the title. Four of them have the database name BioMed Central in
the title field instead of the actual title of the article.

The
much-touted side panel also offers the options to display the record in
BibTeX and EndNote format. The latter is an insult. When the software can
identify and retrieve the abstract, it includes the first 50 or so
characters of it. It does not include the title, the journal name, the
authors, or the DOI. This is the record in the default mode, and this is
in the EndNote format. This is a useless feature that Microsoft should
hide instead of bragging about it.

A
good feature is the use of the Digital Object Identifier, which links the
user to the most authentic version, the one posted by the publisher. The
full text is available if it is an open access document, or if your
library subscribes to the journal and qualifies for access to the specific
issue.

Windows
Live Academic is a deeply disappointing product and service even for a
beta release. There may have been more time and effort spent on PR
materials (which many journalists happily gobble up and parrot in their
journals and newspapers) than on testing WLA for functionality. The
sloppiness and incompetence of the programming work is appalling, and
undermines the reputation of good Microsoft products. The propaganda
material is as accurate as the statements of spokespersons of
malfunctioning government agencies. If this is what Microsoft is capable
of doing in 2006, the company is in big trouble.
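
A footnote on the duplicate journal titles described under Source
coverage: most of the variants quoted there differ only in "and" versus
"&" or in British versus American spelling, and a crude normalization key
is enough to group them. The sketch below is only an illustration under
stated assumptions. It is not how WLA or any other service builds its
journal list, the function names are hypothetical, and the two-entry
spelling map is a hand-picked stand-in for a real one.

```python
import re
from collections import defaultdict

# Hypothetical, hand-picked British -> American spelling map;
# a usable one would need to be far longer.
SPELLING = {"paediatric": "pediatric", "anaesthesia": "anesthesia"}


def normalize_title(title: str) -> str:
    """Reduce a journal title to a crude comparison key."""
    key = title.lower().replace("&", " and ")
    words = re.findall(r"[a-z0-9]+", key)  # drops punctuation and extra spaces
    return " ".join(SPELLING.get(word, word) for word in words)


def group_variants(titles):
    """Group titles that share a key; keep only groups with several spellings."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    return [group for group in groups.values() if len(group) > 1]


# Title pairs quoted in this review, plus one title without a variant.
titles = [
    "Scandinavian Journal of Medicine and Science in Sports",
    "Scandinavian Journal of Medicine & Science in Sports",
    "Planning Theory and Practice",
    "Planning Theory & Practice",
    "Paediatric Anaesthesia",
    "Pediatric Anesthesia",
    "Scientometrics",
]

for group in group_variants(titles):
    print(group)
```

A key like this does not catch the typographical variants mentioned in the
same section; those would need approximate string matching and a manual
check, since two distinct journals can have very similar names.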
|
back to "Peter's Digital Reference Shelf" GaleNet