Title: PubMed Central  

Publisher: National Institutes of Health

Cost: free

Tested: continuously

URL: http://pubmedcentral.nih.gov

This outstanding open access full text database of articles from high impact factor medical and other life science journals is another jewel in the crown of the National Institutes of Health.



PubMed Central has been around for several years, and I should have reviewed it earlier. Others should have reviewed it too, but even cursory references to this essential open access resource are still few and far between. I will try to make up for the delay now.

There are only two other databases in the open access category which are in the same league as PMC. One is BioMed Central (BMC), the other is HighWire Press (HWP). The former is the digital host of about 160 journals (including the 60 titles of the BMC series, ranging from BMC Anesthesiology to BMC Women’s Health).

BMC (which I will review later) deserves particularly strong acknowledgement, as journals published by BMC make up a significant part (more than 60%) of the journal base of PMC. However, it must be emphasized that in terms of the number of documents the proportion of the BMC journals in PMC is far lower (about 5%). The reason for these seemingly strange ratios is that many of the BMC journals started publication only a few years or in some cases only a few months ago, and therefore have very few articles. BMC still stands on its own as a peer of PMC with its several services, including a database of about 30,000 full text articles, most of which are open access and available through PMC.

HighWire Press is the digital facilitator for more than 1,000 medical, technology, science, and social science journals. Of the 3.5 million articles it hosts, more than a third are open access on HWP. There is no way to determine exactly the number of free full text life science articles, but I estimate it to be more than half a million, which is very comparable to the size of the PMC collection. Some journals are available through both HWP and PMC, although not always for the same time span. PMC usually has longer open access retrospective coverage than HWP, as is the case with the Journal of Clinical Investigation, which has comprehensive coverage in PMC but not in HWP. The same is true for The EMBO Journal.



PMC has a total of 654,948 items. Of these, 649,677 have the free full text of the documents, and 460,085 have abstracts. 630,763 of the items are in PDF format. The items include manuscripts of articles accepted for publication in a peer reviewed journal. They are submitted to PMC primarily by authors whose research is sponsored by the National Institutes of Health. Currently, there are only 3,256 such items, but this number will certainly grow rapidly.
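To put these counts in proportion, the shares can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: the figures are the ones quoted above, while the names (TOTAL_ITEMS, COUNTS, share_percent, SHARES) are my own and purely illustrative.

```python
# Item counts quoted in the review above.
TOTAL_ITEMS = 654_948
COUNTS = {
    "free full text": 649_677,
    "abstract": 460_085,
    "PDF format": 630_763,
    "author manuscripts": 3_256,
}


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Illustrative lookup: label -> percentage of all PMC items.
SHARES = {label: share_percent(n, TOTAL_ITEMS) for label, n in COUNTS.items()}

if __name__ == "__main__":
    for label, pct in SHARES.items():
        print(f"{label}: {pct:.1f}% of all items")
```

By this reckoning practically every item has free full text and most are in PDF format, while the author manuscripts are still well under one percent of the collection.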

There are also supplemental materials supporting the articles, usually in a separate “background” file not published with the articles in print format. In a stricter sense, there are 588,233 articles published in print and/or online. Most of the items are research articles, but there are also short communications, correction notes, and letters to the editor.

Either way you look at the numbers, PMC is quite impressive by its size alone. While some of the materials were digitally deposited by cooperating publishers, the bulk of the content (406,359 items) was scanned by NIH with permission of the publishers, the copyright holders. That is quite a task when you consider that tens of thousands of pages were scanned from printed articles published in the late 1890s and early 1900s. For example, the Bulletin of the Medical Library Association started publication in 1911, so much of its content had to be scanned, OCR-ed, and converted to PDF format in order to have the entire run of this excellent journal.

The exact number of journals depends on how you count the journals which changed title, such as the Journal of the Medical Library Association from Bulletin of the Medical Library Association (an understandable change), the Annals of General Psychiatry from Annals of General Hospital Psychiatry (changed after its 3rd volume), or CMAJ: Canadian Medical Association Journal which changed its title from Canadian Medical Journal 20 years ago (go figure the logic behind this change).

The number of journals  may not be that many (especially considering that the bulk of its source base is the mostly very new set of journals published by BMC), but there are two aspects which make PMC a very precious archive. One is that many of the journals are in their fields of specialization among the most respected, most cited, highest impact factor journals (I am not just piling up these qualifiers, these traits are related but not identical). Take as an example the American Journal of Human Genetics. It has an impact factor of 12.649, in the group of 124 journals within the Genetics & Heredity category of the most current edition of the ISI Journal Citation Reports. The median impact factor in this group is 2.626.

Among the 41 other journals with the word genetics in their title, it has the 4th highest impact factor. (This is admittedly a simplified, quick-and-dirty clustering method, but it avoids the distortion caused by the super-journals, like Science and Nature, which appear not only in the multidisciplinary category but also in the Genetics & Heredity category.)

The journal is ranked 3rd in this cluster by the number of total citations received in the previous two years. It takes the #4 position in the group by the Immediacy Index, which indicates how "hot" the journal is (by measuring the ratio between the number of citations received by the journal's articles in the year they were published and the number of articles published in that year). In this case the index value is close to 3, as the 193 articles published in 2005 were cited 571 times in 2005.
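The Immediacy Index arithmetic can be checked with a small Python sketch. It assumes the standard definition (citations received in a year by articles published that same year, divided by the number of articles published that year); the function and variable names are illustrative only.

```python
def immediacy_index(citations_in_year: int, articles_in_year: int) -> float:
    """Citations received in a year by the articles published in that
    same year, divided by the number of articles published that year."""
    if articles_in_year <= 0:
        raise ValueError("articles_in_year must be positive")
    return citations_in_year / articles_in_year


# Figures quoted above for the American Journal of Human Genetics in 2005.
ajhg_2005 = immediacy_index(citations_in_year=571, articles_in_year=193)

if __name__ == "__main__":
    print(f"Immediacy Index: {ajhg_2005:.2f}")
```

With 571 citations to 193 articles the ratio comes to roughly 2.96, which is the "close to 3" mentioned above.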

In the past 5 years the journal’s impact factor steadily rose, partly due to the fact that all of the articles in its  issues of the last seven years are available freely through PMC – except for the most current 6 months (more about this later). At the publisher sites only the abstracts are open access, not the full text articles.

The other important aspect in judging the value of PMC is that very often the full text coverage goes back to the very first issue of the journal ever published, providing complete digital coverage. This is the case with 200 journals (plus some others which changed title but continued the numerical designation of the volumes and issues).

The opposite end of the spectrum, how soon the articles become accessible in PMC, is also important. The good news is that the vast majority of the journals become freely accessible at the time of publication. This is obvious, of course, in the case of the open access Web journals, like all the ones in the BMC series. Thirteen journals have a 6-month moratorium, another 13 have a 1-year moratorium, ten have 4 months, two have 3 months, one has 1 month, and Health Services and Research stands out with a 2-year delay.
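The moratorium figures above can be tallied with a short Python sketch. The counts are the ones listed in the paragraph; the dictionary and variable names are my own, and the mean is only a rough summary of the journals that impose any delay at all.

```python
# Moratorium length in months -> number of journals, as listed above.
EMBARGO_JOURNALS = {1: 1, 3: 2, 4: 10, 6: 13, 12: 13, 24: 1}

# Number of journals that impose some delay before free access.
delayed_total = sum(EMBARGO_JOURNALS.values())

# Average delay, in months, across those delayed journals only.
mean_delay = (
    sum(months * n for months, n in EMBARGO_JOURNALS.items()) / delayed_total
)

if __name__ == "__main__":
    print(f"Journals with a moratorium: {delayed_total}")
    print(f"Mean delay among them: {mean_delay:.1f} months")
```

That adds up to 40 journals with some delay, and the single 2-year case has little effect on the average.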

Some of the most influential journals have the largest collection of articles. The absolute leader is the Proceedings of the National Academy of Sciences of the U.S.A. (PNAS). This is of key importance for several reasons. It is a multidisciplinary journal, and among the 48 journals in that category it is ranked #3 by an impact factor of just above 10, right after Science and Nature. "Right after" may be a bit misleading, as those two journals have had an impact factor hovering around 30 for several years.

But then again, the #4 journal by impact factor, Section A of the Philosophical Transactions of the Royal Society (dedicated to practical issues in math, physics and engineering in spite of its title), is not "right after" PNAS either, with its impact factor of 2.224. PNAS is #2 by the total number of citations received, preceded by Nature and followed by Science (in this regard quite closely). Quite tellingly, the citedness scores of the 4th and 5th ranked journals are an order of magnitude lower than the citedness scores of Nature, PNAS and Science.

PNAS is by far the most productive multidisciplinary periodical due to its fortnightly publication pattern (and scholarly popularity and reasonable price). This, along with the fact that it is covered from its very first issue in 1915, makes it precious to have 84,232 open access articles in PMC. In fairness, HighWire Press had 87,631 items from PNAS at the end of July 2006. A simple test search for documents with the word genetic in the title found 1,349 records in HWP, and only 1,302 in PMC. A similar search for the word toxoplasma in the title found 25 items in PNAS through HWP, and 23 through PMC. In this case, it was obvious that two articles of the most current issue were not yet added to PMC. (The larger discrepancies in the numbers may have to do with how the corrections and author auxiliary materials are counted, as my preliminary test suggests, but this requires further testing.)

By any count, PNAS was the first journal in PMC, and has remained by far the largest content contributor. It was a very smart move by the proponents of PMC, as after PNAS, publishers of other top ranking journals may have felt much more inclined to deposit their archive fully or partially, with or without significant restrictions.

In addition to PNAS, the following ten journals have the largest collection of articles in PMC: Biochemical Journal (48,342), Journal of Bacteriology (43,034), BMJ (29,999), Journal of Virology (29,572), Nucleic Acids Research (27,922), Annals of Surgery (25,838), Infection and Immunity (24,882), Journal of Clinical Investigation (24,093), Plant Physiology (23,847), and The Journal of Physiology (20,672).

The publishers with the largest contribution of articles include Oxford University Press; Lippincott, Williams and Wilkins; the BMJ Publishing Group; the Nature Publishing Group; and a variety of scholarly associations and societies. By far the largest contribution comes from the American Society for Microbiology (ASM). It has 15 journals (11 of them with complete coverage from the first issue), with a total of close to 190,000 articles. Four of the ASM journals make the articles available immediately when they are published; the others have 4, 6 and 12 month delays. This is a highly commendable approach.



The PubMed software is used for searching the PMC database. PMC is listed as one of the databases on the pull-down menu. It may be a better option to choose the PubMed database and check the PubMed Central box to limit the search to the full text archive. This way, it is faster to run the search also in MEDLINE and in the old MEDLINE databases if the PMC search does not yield enough results. It is also more convenient to use other limits, such as restricting the records to those that have an abstract.

The software belongs to the increasingly rare species of information retrieval software (at least in the open access sphere) which offers a browsable index for practically all the data elements which can be searched, as well as for the limit criteria which are used to filter the result set.

Not all the documents are deposited within PMC. About 22,500 articles are not stored or not yet stored within PMC, but the software links to the free full text document and even indicates if and when the publisher will provide a delayed deposit copy.

The most powerful features of the software are the ones which work behind the scenes. For example, there are 577,369 links to PubMed records for the full text PMC articles, and 541,224 links to PubMed records for the references which appear in the bibliography of the full text articles. It would be very interesting to see which are the most cited PubMed items, even if this is limited to articles in journals covered by PMC. Hopefully, this will be the next step in the development of the outstanding PubMed line within the family of the open access NIH products.

PMC shows example after excellent example of the best way to use taxpayers’ money on worthy projects. I still can't get it out of my head what NIH could have done with the few cool millions that Congress earmarked for the fatally mismanaged PubScience project of the Department of Energy a few years ago.

