Title: PubMed Central
Publisher: National Institutes of Health
Cost: free
Tested: continuously
This outstanding open access full-text database of articles from high-impact-factor medical and other life science journals is another jewel in the crown of the National Institutes of Health.

THE CONTEXT

PubMed Central has been around for several years, and I should have reviewed it earlier. Others should have reviewed it too, but even cursory references to this essential open access resource are still few and far between. I try to make up for the
belatedness now.

There are only two other databases in the open access category that are in the same league as PMC. One is BioMed Central (BMC), the other is HighWire Press (HWP). The former is the digital host of about 160 journals (including the 60 titles of the BMC series, ranging from BMC Anesthesiology to BMC Women's Health). BMC (which I will review later) deserves particularly strong acknowledgement, as journals published by BMC make up a significant part (more than 60%) of the journal base of PMC. However, it must be emphasized that in terms of the number of documents, the proportion of the BMC journals in PMC is far lower (about 5%). The reason for these seemingly strange ratios is that many of the BMC journals started publication only a few years, or in some cases only a few months, ago and therefore have very few articles. BMC still stands on its own as a peer of PMC with its several services, including a database of about 30,000 full-text articles, most of which are open access and available through PMC.

HighWire Press is the digital facilitator for more than 1,000 medical, technology, science, and social science journals. Of the 3.5 million articles it hosts, more than a third are open access. There is no way to determine exactly the number of free full-text life science articles on HWP, but I estimate it to be more than half a million, very comparable to the size of the PMC collection. Some journals are available both through HWP and PMC, although not always for the same time span. PMC usually has longer open access retrospective coverage than HWP, as is the case with the Journal of Clinical Investigation, which has comprehensive coverage in PMC but not in HWP. The same is true for The EMBO Journal.

THE CONTENT

PMC has a total of 654,948 items.
649,677 of them have the free full text of the document, and 460,085 of them have an abstract. 630,763 of the items are in PDF format. The items also include author manuscripts of articles accepted for publication in a peer-reviewed journal. These are submitted to PMC primarily by authors whose research is sponsored by the National Institutes of Health. Currently there are only 3,256 such items, but this number will certainly grow rapidly. There are also supplemental materials supporting the articles, usually in a separate "background" file not published with the articles in print format. In a stricter sense, there are 588,233 articles published in print and/or online. Most of the items are research articles, but there are also short communications, correction notes, and letters to the editor. Either way you look at the numbers, PMC is quite impressive by its size alone.
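The counts above are easier to compare as shares of the whole collection. Below is a minimal Python sketch that uses only the figures quoted in this review and divides each count by the 654,948 total. The categories overlap (an item can be a PDF with an abstract), so the shares are not meant to add up to 100%.

```python
# Counts quoted in this review for the PMC collection at the time of testing.
TOTAL_ITEMS = 654_948

counts = {
    "free full text": 649_677,
    "with abstract": 460_085,
    "PDF format": 630_763,
    "articles published in print and/or online": 588_233,
}


def percent_of_total(count: int, total: int = TOTAL_ITEMS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Overlapping categories: each share is computed independently.
shares = {label: percent_of_total(n) for label, n in counts.items()}

if __name__ == "__main__":
    for label, pct in shares.items():
        print(f"{label}: {pct}%")
```

Running it shows that about 99.2% of the items carry free full text, while roughly 70.2% have an abstract.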

While some of the materials were digitally deposited by cooperating publishers, the bulk of the content (406,359 items) was scanned by NIH with the permission of the publishers, the copyright holders. That's quite a lot of homework when you consider that tens of thousands of pages were scanned from printed articles published in the late 1890s and early 1900s. For example, the Bulletin of the Medical Library Association started publication in 1911, so much of its content had to be scanned, OCR-ed, and converted to PDF format in order to have the entire run of this excellent journal.

The exact number of journals depends on how you count the journals that changed title, such as the Journal of the Medical Library Association (formerly the Bulletin of the Medical Library Association, an understandable change), the Annals of General Psychiatry (formerly the Annals of General Hospital Psychiatry, changed after its 3rd volume), or CMAJ: Canadian Medical Association Journal, which changed its title from Canadian Medical Journal 20 years ago (go figure the logic behind this change).

The number of journals may not be that large (especially considering that the bulk of its source base is the mostly very new set of journals published by BMC), but two aspects make PMC a very precious archive. One is that many of the journals are among the most respected, most cited, highest impact factor journals in their fields of specialization (I am not just piling up these qualifiers; these traits are related but not identical). Take as an example the American Journal of Human Genetics. It has an impact factor of 12.649 in the group of 124 journals within the Genetics & Heredity category of the most current edition of the ISI Journal Citation Reports, where the median impact factor is 2.626. It has the 4th highest impact factor in the cluster formed by it and the 41 other journals with the word genetics in their title. (This is admittedly a simplified, quick-and-dirty clustering method, but it avoids the distortion caused by super-journals like Science and Nature, which appear not only in the multidisciplinary category but also in the Genetics & Heredity category.)
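For reference, here are the two Journal Citation Reports ratios behind these rankings, written in their standard ISI form. The symbols are introduced here for illustration only: C_Y(t) is the number of citations received in year Y by items published in year t, and N(t) is the number of citable items published in year t. The second line applies the Immediacy Index to the American Journal of Human Genetics. It uses the 2005 figures given in the paragraph that follows and assumes the 193 articles are all counted as citable items.

```latex
\[
\mathrm{IF}_{Y} = \frac{C_{Y}(Y-1) + C_{Y}(Y-2)}{N(Y-1) + N(Y-2)},
\qquad
\mathrm{II}_{Y} = \frac{C_{Y}(Y)}{N(Y)}
\]
\[
\mathrm{II}_{2005} = \frac{571}{193} \approx 2.96
\]
```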

The journal is ranked 3rd in this cluster by the total number of citations received in the previous two years. It takes 4th position in the group by the Immediacy Index, which indicates how "hot" a journal is. The index measures the ratio of the citations received by the journal's articles in the year they were published to the number of those articles. In this case the index value is close to 3, as the 193 articles published in 2005 were cited 571 times in 2005. In the past 5 years the journal's impact factor rose steadily, partly due to the fact that all of the articles in its issues of the last seven years are freely available through PMC, except for the most recent 6 months (more about this later). At the publisher's site only the abstracts are open access, not the full-text
articles.

The other important aspect in judging the value of PMC is that very often the full-text coverage goes back to the very first issue of the journal, providing complete digital coverage. This is the case with 200 journals (plus some others that changed title but continued the numbering of their volumes and issues).

The opposite end of the spectrum, how soon articles become accessible in PMC, is also important. The good news is that the vast majority of the journals become freely accessible at the time of publication. This is obvious, of course, in the case of open access Web journals, like all the ones in the BMC series. Thirteen journals have a 6-month moratorium, another 13 have a 1-year moratorium, ten have 4 months, two have 3 months, one has 1 month,
and Health Services Research stands out with a 2-year delay.

Some of the most influential journals have the largest collections of articles in PMC. The absolute leader is the Proceedings of the National Academy of Sciences of the U.S.A. (PNAS). This is of key importance for several reasons. It is a multidisciplinary journal, and among the 48 journals in that category it is ranked #3, with an impact factor of just above 10, right after Science and Nature. "Right after" may be a bit misleading, as the impact factors of those two journals have been hovering around 30 for several years. But then again, the #4 journal by impact factor, Series A of the Philosophical Transactions of the Royal Society (dedicated to practical issues in math, physics and engineering in spite of its title), is not right after PNAS either, with its impact factor of 2.224. PNAS is #2 by the total number of citations received, preceded by Nature and followed (quite closely) by Science. Quite tellingly, the citedness scores of the 4th- and 5th-ranked journals are an order of magnitude lower than the citedness scores
of Nature, PNAS and Science. PNAS is by far the most productive multidisciplinary periodical, due to its fortnightly publication pattern (and its scholarly popularity and reasonable price). This, along with the fact that it is covered from its very first issue in 1915, makes it precious to have 84,232 open access PNAS articles in PMC. For fairness, HighWire Press had 87,631 items from PNAS at the end of July 2006. A simple test search for documents with the word genetic in the title found 1,349 records in HWP and only 1,302 in PMC. A similar search for the word toxoplasma in the title found 25 PNAS items through HWP and 23 through PMC. In this case it was obvious that two articles from the most recent issue had not yet been added to PMC. (The larger discrepancies in the numbers may have to do with how corrections and author auxiliary materials are counted, as my preliminary test suggests, but this requires further testing.)
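Putting the three PNAS comparisons side by side makes the size of the gap easier to judge. Below is a minimal Python sketch that uses only the HWP and PMC counts quoted above. For each comparison it reports how many items PMC lacks and what percentage of the HWP count PMC holds. The toxoplasma search is tiny, so its percentage swings with a single article.

```python
# PNAS counts from the July 2006 comparison of HighWire Press (HWP)
# and PubMed Central (PMC); each value is (HWP count, PMC count).
comparisons = {
    "all PNAS items": (87_631, 84_232),
    "'genetic' in title": (1_349, 1_302),
    "'toxoplasma' in title": (25, 23),
}


def pmc_coverage(hwp: int, pmc: int) -> tuple[int, float]:
    """Return (items missing from PMC, PMC count as % of the HWP count)."""
    return hwp - pmc, round(100 * pmc / hwp, 1)


results = {query: pmc_coverage(hwp, pmc) for query, (hwp, pmc) in comparisons.items()}

if __name__ == "__main__":
    for query, (missing, pct) in results.items():
        print(f"{query}: {missing} fewer in PMC ({pct}% of HWP)")
```

In all three cases PMC holds between 92% and 96.5% of the HWP count.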

At any count, PNAS was the first journal in PMC, and it has remained by far the largest content contributor. It was a very smart move by the proponents of PMC, as after PNAS, publishers of other top-ranking journals may have felt much more inclined to deposit their archives fully or partially, with or without significant restrictions.

In addition to PNAS, the following ten journals have the largest collections of articles in PMC: Biochemical Journal (48,342), Journal of Bacteriology (43,034), BMJ (29,999), Journal of Virology (29,572), Nucleic Acids Research (27,922), Annals of Surgery (25,838), Infection and Immunity (24,882), Journal of Clinical Investigation (24,093), Plant Physiology (23,847), and The Journal of Physiology (20,672).

The publishers with the largest contributions of articles include Oxford University Press, Lippincott Williams & Wilkins, the BMJ Publishing Group, the Nature Publishing Group, and a variety of scholarly associations and societies. By far the largest contribution comes from the American Society for Microbiology (ASM). It has 15 journals (11 of them with complete coverage from the first issue), with a total of close to 190,000 articles. Four of the ASM journals make the
articles available immediately upon publication; the others have a 4-, 6- or 12-month delay. This is a highly commendable approach.

THE SOFTWARE

The PubMed software is used for searching the PMC database, which is one of the databases on the pull-down menu. It may be a better option to choose the PubMed database and check the PubMed Central box to limit the search to the full-text archive. This way it is faster to run the search also in MEDLINE and in OLDMEDLINE if the PMC search does not yield enough results. It is also more convenient to use other limits, such as restricting the
records to those that have an abstract.

The software belongs to the increasingly rare species of information retrieval software (at least in the open access sphere) that offers a browsable index for practically all the searchable data elements, as well as for the limit criteria used to filter the result set.

Not all the documents are deposited within PMC. About 22,500 articles are not (or not yet) stored within PMC, but the software links to the free full-text document and even indicates if and when the publisher will provide a delayed deposit copy.

The most powerful features of the software are the ones that work behind the scenes. For example, there are 577,369 links to PubMed records for the full-text PMC articles, and 541,224 links to PubMed records for the references that appear in the bibliographies of the full-text articles. It would be very interesting to see which are the most cited PubMed items, even if this is limited to articles in journals covered by PMC. Hopefully, this will be the next step in the development of the outstanding PubMed line within the family of open access NIH products.
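A title search like the PNAS tests described above can also be run outside the web form through NCBI's E-utilities. The sketch below is an illustration under stated assumptions and was not part of the testing for this review. It builds an ESearch request against the pmc database that asks only for the hit count. The [Title] and [Journal] field tags and the journal abbreviation "Proc Natl Acad Sci U S A" are assumptions, so check them against NCBI's PMC field list. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pmc_query(title_word: str, journal: str) -> str:
    """Hypothetical helper: a word in the title, limited to one journal.

    The [Title] and [Journal] field tags are assumptions to verify.
    """
    return f'{title_word}[Title] AND "{journal}"[Journal]'


def esearch_url(db: str, term: str) -> str:
    """Return an ESearch URL that requests only the hit count, as JSON."""
    params = {"db": db, "term": term, "retmax": 0, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def hit_count(url: str) -> int:
    """Fetch the URL (needs network access) and return the reported count."""
    with urlopen(url, timeout=30) as response:
        data = json.load(response)
    return int(data["esearchresult"]["count"])


if __name__ == "__main__":
    term = build_pmc_query("toxoplasma", "Proc Natl Acad Sci U S A")
    url = esearch_url("pmc", term)
    print(url)
    # print(hit_count(url))  # today's count will differ from the 2006 figures
```

Setting retmax to 0 keeps the response small, since only the count is needed for a coverage comparison.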

PMC offers one excellent example after another of how best to use taxpayers' money on worthy projects. I still can't get out of my head what NIH could have done with the few cool millions that Congress earmarked for the fatally mismanaged PubScience project of the Department of Energy a few years ago.
back to "Peter's Digital Reference Shelf" GaleNet