Title: Windows Live Academic

Publisher: Microsoft

URL: http://academic.live.com

Tested: April 14-18, 2006

Cost: Free

This beta version of a free indexing/abstracting service, purportedly covering 6 million (but in reality only about 4 million) records about journal articles and conference papers in the subscription-based digital archives of some academic publishers, is too little, too late from an industry leader. While Windows Live Academic makes it easy to find the bibliographic records for some scholarly papers on a topic in computer science, electrical engineering and physics, it is nothing new, not evolutionary, let alone revolutionary. Its software fails to provide appropriate options for even the most elementary search functions, such as searching by journal name and publication year, and cannot perform well many of the operations it promises.

 

THE CONTEXT

Indexing/abstracting (I/A) print publications and databases have been a staple of every library for providing information about the literature. Their sizes range from 15,000 to 17,000,000 records, with prices from $500 to $5,000 per year. Seemingly, it is good news that there is now a new, free I/A database, Windows Live Academic (WLA), with purportedly 6 million records about journal articles and conference papers primarily in physics, electrical engineering and computer science (and related fields, such as information science), often with links to the full text of articles and conference papers (mostly accessible to members of subscribing libraries). In reality, this is not a big deal in 2006, especially from Microsoft, for several reasons.

There are many large, open access I/A databases produced by the government in several disciplines (medicine, agriculture, education, criminal justice, transportation). Some of these have better software, and many have better content, such as large integrated or linked open access full-text collections, than their subscription-based counterparts.

More importantly from the perspective of WLA, there are huge open access databases in physics, computer science, economics, library and information science, and, to a lesser extent, in engineering, often with substantial open access full-text collections. In addition, the free Scirus database of Elsevier has been offering for years a multidisciplinary I/A service for journal archives, repositories, and (less successfully) purportedly scientific information from individuals' websites; it is far larger and far better at finding open access bibliographic records (with a higher ratio of open access abstracts) on every subject than WLA. In Scirus, there are more records in the ScienceDirect subset alone than in all of WLA, and the total number of records for journal articles alone is 25 million.

 

THE CONTENT

As for the content of WLA, I looked at the subject scope, size of the database, the record content provided, the publishers which offered access to their archives, as well as the number and type of journals and conference proceedings covered.

 

Subject scope

Microsoft deserves credit for admitting that the initial beta version of WLA covers the fields of physics, electrical engineering and computer science. Actually, the scope of coverage is broader. You will find records for publications in the fields of medicine, nursing, life sciences, psychology, sociology, economics, women's studies, and a variety of fields within arts and humanities. This is obvious even if you only look at the list of journals covered, whether you glance at the beginning, the middle or the end of the journal list.

Sample searches confirm the multidisciplinary coverage. You will find more than 500 records for articles and conference papers about toxoplasmosis, more than 7,000 about multiple sclerosis, nearly 40,000 for the word psychology, 300,000 for the word education, and more than 10,000 for the word honesty. Some of these may be in the context of physics, engineering and computer science, but the vast majority are not from those fields.

 

Database size

WLA claims to have about 6 million records. There is no fool-proof way in WLA to determine the exact number of records (as is possible in most professional databases), but it seems that Microsoft inflated the size of WLA by 50% in its announcement.

My test searches using the most common words in the full text of the WLA records clearly indicate that the actual number of records is more likely below 4 million. If I extrapolate from the number of duplicate records I found in my first tests, the total number of unique records is even lower.
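The arithmetic behind such an extrapolation is simple. A minimal sketch, using illustrative figures rather than my actual test data:

```python
# Estimate unique records from a measured total and an observed
# duplicate rate. All numbers below are illustrative assumptions,
# not the actual WLA test figures.

def estimate_unique_records(measured_total, duplicate_rate):
    """Extrapolate a duplicate rate observed in a sample to the
    measured total to approximate the count of unique records."""
    return int(measured_total * (1 - duplicate_rate))

claimed = 6_000_000    # the size announced by Microsoft
measured = 4_000_000   # ceiling suggested by common-word searches
dup_rate = 0.05        # e.g., 5% duplicates observed in a sample

print(estimate_unique_records(measured, dup_rate))  # 3800000
```

Even a modest duplicate rate pushes the count of unique records well below the measured ceiling, let alone the claimed 6 million.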

There are many duplicates (as there are also in Scirus and Google Scholar). They are not easy to spot because they are scattered in the result list (which is strange for allegedly relevance-ranked records). They may not look like obvious duplicates because of incomplete and wrong data in the record pair, such as the omission of one author in the second record and the wrong publication year in the first record of this pair, which are shown juxtaposed only because I did an exact known-item search.
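Spotting such pairs requires comparing normalized fields rather than exact strings, since the author and year fields disagree. A minimal sketch of the idea, with a hypothetical record pair (not an actual WLA record):

```python
import re

def norm(text):
    """Lowercase and strip punctuation/extra whitespace for fuzzy comparison."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def likely_duplicates(rec_a, rec_b):
    """Flag two records as probable duplicates when their titles match
    after normalization, even if the author or year fields disagree --
    exactly the kind of pair scattered through a result list."""
    return norm(rec_a["title"]) == norm(rec_b["title"])

# Hypothetical pair with a dropped author and a wrong year:
r1 = {"title": "Citation Analysis: A Case Study",
      "authors": ["Smith, J.", "Doe, A."], "year": 1999}
r2 = {"title": "Citation analysis: a case study.",
      "authors": ["Smith"], "year": 2001}
print(likely_duplicates(r1, r2))  # True
```

A deduplication pass of roughly this shape is what one would expect a relevance-ranked service to run before presenting results.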

Looking at the source information of these two records, one can't help but wonder how the software could extract the wrong year in the first record, and ignore the second author in the other record while extracting only the first author's surname. These problems also plague Scirus and Google Scholar, but not the CiteSeer database, whose crawlers do the best job in every regard.

 

Record Content

The records include the usual bibliographic information, chronological-numerical designations of the source documents, and very laudably the Digital Object Identifier of the articles and conference papers (when available). 

It is deeply disappointing that the indexing software apparently can't reliably determine the availability of abstracts. Seeing so often the false claim that the "abstract is not available" is alarming, knowing, for example, that the majority of articles in the Journal of the American Society for Information Science & Technology do have abstracts, and they are clearly labeled as such in the source. Once again, as you could see in a previous screenshot, the CiteSeer database correctly recognizes, collects and identifies the abstracts.

 

Source coverage

Microsoft claims to have collected, in the fields of computer science, electrical engineering, and physics, "more than 6 million records from approximately 4300 journals and 2000 conferences".

Microsoft does not specifically mention the number of publishers whose archives it crawled for collecting data, but the page about publishers, journals and conferences at http://academic.live.com/journals has about 120 entries in the publisher section. However, these include a lot of weird combinations of publisher names. True, there are publications which are jointly published by two or more publishers, such as the volumes of the Joint Conference on Digital Libraries, a cooperation between ACM and IEEE, or by a commercial publisher on behalf of a scholarly society, such as the Journal of Digital Information, which originated from the British Computer Society and Oxford University Press.

True, there are journals which were published first by, say, Elsevier, then sold to Kluwer, such as Scientometrics. For articles in such journals and conference proceedings the joint listing would be understandable, but none of the above-mentioned sources are covered by WLA.

What I am referring to is the nonsensical pairing of publishers, such as the one for Science magazine. In more than 10,225 records the publisher field lists both the Nature Publishing Group (NPG) and the American Association for the Advancement of Science (AAAS) as the publishers. Science is published by AAAS; NPG has nothing to do with it. These two archrival publishers of the most cited journals (Nature and Science, respectively) form a really odd couple as presented by WLA.

 

In the source list there are more than 4,300 journals identified, but this number is also grossly exaggerated. Hundreds of identical journals appear twice with slightly different spelling, such as Scandinavian Journal of Medicine and Science in Sports versus Scandinavian Journal of Medicine & Science in Sports, or Planning Theory and Practice versus Planning Theory & Practice; or in British versus American spelling, such as Paediatric Anaesthesia versus Pediatric Anesthesia, which are less apparent duplicates. There are also duplicates caused by typographical errors, such as this journal whose misspelled variant is automatically corrected by Word. The combination of these errors and inconsistencies makes some journals appear three, four or even five times in the journal list. The variety is well demonstrated by these nicely juxtaposed journal names.
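Variant entries of this kind could have been caught with even a crude normalization pass over the journal list. A sketch, with the spelling table deliberately limited to the examples just mentioned:

```python
import re

# Minimal British-to-American spelling table, covering only the
# example pair cited above; a real pass would need a fuller table.
SPELLING = {"paediatric": "pediatric", "anaesthesia": "anesthesia"}

def normalize_journal(name):
    """Canonicalize a journal title so variant entries collapse to one key."""
    name = name.lower().replace("&", "and")
    name = re.sub(r"[^a-z ]", " ", name)
    words = [SPELLING.get(w, w) for w in name.split()]
    return " ".join(words)

variants = [
    "Scandinavian Journal of Medicine and Science in Sports",
    "Scandinavian Journal of Medicine & Science in Sports",
    "Paediatric Anaesthesia",
    "Pediatric Anesthesia",
]
# The four list entries collapse to just two unique journals:
print(len({normalize_journal(v) for v in variants}))  # 2
```

Ten lines of preprocessing would have deflated the inflated journal count considerably.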

There are also journals listed which are not covered at all.

It is as interesting as it is discouraging to see how many high-impact, influential journals are not included in WLA. For an obvious example: the key research journals of IBM are entirely ignored. It adds insult to injury that all the articles in the 45 volumes of the IBM Systems Journal and in the 50 volumes of the IBM Journal of Research & Development are offered by IBM in full-text format.

As for the conference proceedings, indeed there are more than 2,000 listed. Once again, the numbers can easily be misunderstood. Counting each yearly proceedings volume of a conference (the 3rd, 4th, 5th, and so on) as a separate source inflates the number of sources, and is akin to counting the volumes of journals.

 

THE SOFTWARE

This is the home turf of Microsoft, but it does not show in WLA. Essential search features are missing from the repertoire of the WLA software. I could not find a truncation operator. There is no way to refine a search by limiting it to a publication year or year range.

There is no good way to search by journal, for two reasons. One is that there is no option to search for the word(s) or exact name of a journal title. You may use the journal name as a search criterion, but the result will include every item which matches the search term(s) anywhere in the record. This is especially frustrating for journals whose name is a single word, like Science, or whose name, although consisting of multiple words, is still not distinctive enough, such as Evidence-based Cardiovascular Medicine.

The other is that there is no way to distinguish in a search between the source journal and a cited journal, so you get many hits where your journal is merely the cited journal. For example, searching for items published in the Annual Review of Information Science and Technology (ARIST), currently the #1 periodical publication in the field of library and information science, you get a list of 235 hits. However, about 200 of them are not records for chapters published in ARIST, but for articles citing a chapter published in ARIST. You have to scroll through 80 hits before the first record for an ARIST chapter appears in the result list. The items in the hit list are not numbered.

If you believe that this can be improved by sorting the result list by journal name, give up hope. The sort puts at the top of the list the records for articles published in the Journal of the American Society for Information Science and Technology, an obviously incorrect sort procedure.

In addition, it is worth noting how few ARIST chapters have records in WLA. For perspective, Web of Science has 484 records for chapters published in ARIST. The reason for this enormous difference is that the publisher has digital versions of only 10% of this high-impact periodical that Microsoft could collect and index, whereas ISI has been indexing all 40 volumes of ARIST.

There is no citedness score listed yet for the items retrieved, but Microsoft promises to work on it. It is good that the results can be downloaded.

The output is a continuous stream, which appeals to some reviewers. I find it distracting, as after a while you have no idea where you are in the result list. More frustratingly, if you click on an item to get to the source document and then return to the result list, you are positioned at the top of the result list, not where you left off. Finding your jump-off item again is annoying.

You may choose from three different output formats (short, medium and full) using a slider. This is a gimmick; three small clickable icons would be just as good, if not better. There is a side panel to show the complete record parallel to the list. Its most serious deficiency is that it does not show the abstract, even if there is one; it claims that the abstract is not available. Someone should have caught this glitch, but apparently everyone was working on the gimmicky gizmos, and no one paid enough attention to what is displayed.

You may limit your search to the title field, but it doesn't work consistently. For example, searching for "odd couple" in the title field finds no matching record. Searching for the same term without the title-field restriction retrieves 29 records, and the first dozen items have the exact term "odd couple" in the title field. Similarly, the query in title:"medical informatics" finds only 9 records, none of which have the phrase in the title. Four of them have the database name BioMed Central in the title field instead of the actual title of the article.
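The contract of an in-title search is trivial to state and to test: every returned record's title must contain the phrase. A sketch of such a check, with hypothetical records mimicking the failure described above:

```python
def title_matches(records, phrase):
    """Keep only the records whose title field actually contains the
    phrase -- the guarantee a working in-title search should provide."""
    p = phrase.lower()
    return [r for r in records if p in r["title"].lower()]

# Hypothetical results: one genuine match, one record where the
# database name has leaked into the title field.
results = [
    {"title": "Advances in Medical Informatics"},
    {"title": "BioMed Central"},  # database name, not the article title
]
print(len(title_matches(results, "medical informatics")))  # 1
```

A post-retrieval filter of this one-line kind is exactly the sanity check WLA's title restriction fails.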

The much-touted side panel also offers options to display the record in BibTeX and EndNote format. The latter is an insult. When the software can identify and retrieve the abstract, it includes only the first 50 or so characters of it. It does not include the title, the journal name, the authors or the DOI. This is the record in the default mode, and this is it in the EndNote format. This is a useless feature that Microsoft should hide instead of bragging about.

A good feature is the use of the Digital Object Identifier which links the user to the most authentic version, the one posted by the publisher. The full text is available if it is an open access document, or if your library subscribes to the journal and qualifies for access to the specific issue.
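A DOI resolves through the central proxy server (dx.doi.org at the time), which redirects the user to the publisher's copy, so constructing the link from a record is trivial. A sketch, using a made-up DOI for illustration:

```python
def doi_to_url(doi):
    """Build the resolver URL that redirects to the publisher's copy.
    In 2006 the canonical proxy was http://dx.doi.org/."""
    return "http://dx.doi.org/" + doi

print(doi_to_url("10.1000/xyz123"))  # made-up DOI for illustration
```

This is why the DOI links in WLA land on the most authentic, publisher-posted version regardless of which archive was crawled.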

 

Windows Live Academic is a deeply disappointing product and service, even for a beta release. More time and effort may have been spent on PR materials (which many journalists happily gobble up and parrot in their journals and newspapers) than on testing WLA for functionality. The sloppiness and incompetence of the programming work is appalling, and undermines the reputation of good Microsoft products. The propaganda material is as accurate as the statements of spokespersons of malfunctioning government agencies. If this is what Microsoft is capable of in 2006, the company is in big trouble.
