Comparing All-author and First-author Co-citation
Analyses of Information Science
Dangzhi Zhao*,
Andreas Strotmann
Email: dzhao@ualberta.ca.
Email: andreas.strotmann@ualberta.ca.
Ever since its
introduction by White & Griffith (1981), author co-citation analysis (ACA)
has been a primary research tool for the study of the intellectual structure of
research fields, and of the social structure of the underlying communities of
researchers. Studies since then have usually applied the general steps and techniques of classic ACA, as defined there, to a variety of scholarly fields, although a few studies have proposed new techniques for mapping author clusters (White, 2003b), different statistical methods for processing co-citation counts (Ahlgren, Jarneving & Rousseau, 2003), or variations on the statistical procedures used in ACA such as MDS (Leydesdorff & Vaughan, 2006). Few studies have dealt with the problematic definition of co-citation counts themselves – a much more fundamental aspect of ACA, as it defines the raw data from which all statistical author co-citation analyses and mappings are derived.
We recently
reported on a preliminary study that aimed to contribute to filling this gap
(Zhao, 2006). The present study was conducted as a follow-up verification study
that seeks to further contribute to understanding the role of different
co-citation counts in ACA.
ACA defines two
authors as co-cited when at least one document from each author’s oeuvre occurs
in the same reference list, and their co-citation count as the number of
different publications that co-cite them in this sense. Classic ACA defines an author's oeuvre as all the works with that author as first author (McCain, 1990), and thus uses only information on the first author of a publication when calculating author co-citation counts.
This definition stems largely from the relative ease with which first-author co-citation counts can be retrieved directly from the main data source for ACA studies – the citation indexes of the Institute for Scientific Information (ISI) (now Thomson Scientific) – because these indexes record only the first author of each cited reference. The only way to find the full set of authors
of a cited document in these citation indexes is by matching the reference code
to a corresponding source paper – an operation that is laborious and
error-prone, and only possible if the cited reference refers to a document that
also happens to have been indexed as a source paper (citing paper). As a
result, it has been practically impossible to go beyond first-author
co-citation counting using these databases.
Rousseau &
Zuccala (2004) explored different ways of defining author co-citation counts and
strategies for retrieving these counts from the ISI databases, but did not use
them in actual ACA studies. A brief study by Persson (2001) is the only study
we know of that attempts to compare first-author and all-author co-citation
analysis using pure ISI data – all others, including our own, have had to rely
on other sources of citation data.
Persson (2001) retrieved all 7,001 articles in the LIS field that were indexed in the ISI databases between 1986 and 1996, and calculated how these articles co-cited each other. He observed similar subfield structures derived by
first- or all-author co-citation analysis, and noticed that many well-known
authors on the all-author co-citation map were excluded from the map based on
first-author co-citation counts. Unfortunately, more than 90% of cited
references in that dataset refer to documents outside it, whose authors
the study therefore disregards, which severely limits the reliability of the
results of his courageous first attempt. In addition, this brief communication
provided little detail on how co-citation counts were defined and calculated.
Recently,
alternatives to the ISI citation indexes have become available that provide better support for collecting the co-citation data needed to perform true all-author co-citation analysis studies, i.e., studies that rely on a dataset that includes all authors of all cited references.
Zhao (2006) used
one of these new databases, namely CiteSeer,
for this purpose, and conducted a preliminary study that compared first-author
and all-author[1] co-citation analyses of
the XML research field, a subfield of computer science, the research field
covered by that citation index. It was found that, at least in the XML research
field, the all-author ACA resulted in a clearer picture of the intellectual
structure than the classic first-author ACA did, at the price of identifying
fewer specialties if the same number of most highly cited authors was selected.
Two possible
definitions were identified in Zhao (2006) that extend first-author to
all-author co-citation counts, and results of the corresponding ACAs were
compared.
(a) Inclusive all-author co-citation derives from classic first-author co-citation by
simply redefining an author’s oeuvre as everything that lists the author as one of its authors rather than as
its first author, and by retaining the definition of co-citedness
between two authors as the number of papers that reference both authors'
oeuvres. This definition therefore includes cited co-authorship in
co-citation, because two authors are also considered co-cited when a paper they
co-authored is cited.
(b) Exclusive all-author co-citation derives from inclusive all-author co-citation by excluding
pure cited co-authorship and re-defining co-citedness between two authors as
the number of papers that contain distinct references to both authors'
oeuvres, i.e., references to these authors' oeuvres that do not both refer to
the same document co-authored by these two authors.
Note that
inclusive and exclusive author co-citation counts are identical when calculated
for distinct authors who both only ever wrote single-authored papers.
Same-author co-citation counts, however, will differ – inclusive counts would
result in an author's number of citations, while exclusive counts strictly
count the number of times that that author is co-cited with him- or herself.
The latter therefore provides a meaningful diagonal even for a first-author
co-citation matrix. Zhao (2006) found in favor of exclusive co-citation counts
when studying intellectual structures, although inclusive co-citation analysis
showed some promise for studying social relationships using ACA.
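The two counting rules, including their behavior on the diagonal, can be sketched in a few lines of Python. This is an illustrative reconstruction of the definitions above, not the authors' actual software; each citing paper is assumed to be represented simply as a list of cited references, each reference being the tuple of all its authors.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers):
    """Inclusive and exclusive all-author co-citation counts.

    citing_papers: list of reference lists; each reference is a tuple
    of all author names of the cited document.
    """
    inclusive = Counter()
    exclusive = Counter()
    for refs in citing_papers:
        # every author cited anywhere in this reference list
        cited_authors = {a for ref in refs for a in ref}
        for a, b in combinations(sorted(cited_authors), 2):
            # inclusive: a single co-authored cited paper suffices
            inclusive[(a, b)] += 1
            # exclusive: two *distinct* references must cover a and b
            if any((a in r1 and b in r2) or (b in r1 and a in r2)
                   for r1, r2 in combinations(refs, 2)):
                exclusive[(a, b)] += 1
    return inclusive, exclusive
```

For example, a citing paper whose only relevant reference is co-authored by A and B increments the inclusive count for (A, B) but not the exclusive count, while a citing paper with separate references to A and to B increments both.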
Schneider,
Larsen, & Ingwersen (2007) built on Zhao (2006), and further explored
inclusive all-author co-citation analysis using a large citation index
generated from an XML version of the IEEE Computer Society journals for the
years 1995-2004. They found that all-author-based ACA can help produce MDS maps
that better fit the underlying data, and may lead to stronger concentration in
the maps.
Eom (2008) used
a hand-made citation database that covered 692 citing papers in the decision
support systems area during 1971-1990, in an attempt to compare first-author
and all-author-based ACA results. Instead of controlling for the effect of the
total number of authors included in these two types of ACA, which, experience
tells us, tends to be correlated significantly with the number of specialties
identified as a result of an ACA, Eom's study used the same absolute citedness
threshold for both types of ACA, resulting, of course, in a significantly
smaller number of authors in his first-author ACA study than in his all-author
ACA study. By failing to control for such a significant variable, his study was
unfortunately not able to actually show the differences resulting from
different author co-citation counting methods.
The present
study revisits the question of first- vs. all-author co-citation analysis and
of different types of all-author co-citation counting in a different, less
collaborative, field (i.e., Information Science) than those targeted by
previous studies, using yet another citation index to collect data, namely,
Elsevier’s Scopus. The research fields studied previously, namely subfields of computer
science and engineering, have a considerably higher level of collaboration as
measured by the average number of coauthors in their literature than the IS
field studied here. Unlike our previous preliminary study, which only counted
up to five authors of a cited reference, and possibly like some of the other
previous studies reviewed above, the present study strictly counts every author
of every cited reference in order to eliminate any potential problems with
previous simplified approaches. Based upon more than 3,000 citing papers, the
scale of the present study is also considerably larger than the preliminary
study and most of the previous studies. In addition, we test in practice a way
of treating diagonal values of author co-citation matrices in ACA that has been
identified as theoretically optimal but has not been empirically studied.
Specifically,
the research questions addressed in this study are:
a) Does all-author-based ACA produce a clearer picture of the intellectual structure of the IS field, with higher concentration?
b) What are the differences between different ways of calculating author co-citation counts, in terms of the intellectual structure of the IS field they reveal?
Addressing these
research questions will contribute to a better understanding of ACA in general,
and of all-author-based ACA in particular. The latter is particularly important
because all-author-based ACA, on the one hand, is theoretically optimal, and on
the other hand, can be expected to be necessary to do justice to researchers in
the highly collaborative “big sciences”, which are of particular interest to
science and technology policy makers.
The research
area we analyze in the present study is Information Science (IS). As in White
& McCain (1998), we define the IS research field by its 12 core journals
listed in Table 1 (White & McCain,
1998, p. 330). Our citation window is a ten-year period (i.e., 1996 –
2005) following that studied in White & McCain (1998).
We use
Elsevier’s Scopus database to retrieve citing papers in the IS field thus
defined, along with their reference lists. At the time of data collection,
Scopus indexed journal articles published in 1996 or later, and was thus well able to provide the intended dataset. It is similar to the ISI citation
indexes in that it indexes cited references of the (citing) papers it covers,
but it also provides, directly, more information on cited references than the
ISI citation indexes do, including their full titles and the names of up to
eight authors (the first seven and the last author) in each cited reference.
Journal by
journal, we retrieved this information for all papers published in the 12 IS
journals during 1996-2005 that are indexed in Scopus as “articles”, and
exported them in “RIS format” as “Full Documents” to a local computer. This
way, we collected 3828 records of citing papers that have references. These
papers included 110,785 references altogether, i.e., 29 references per citing
paper on average.
For the few
cited references that had more than eight authors, we manually completed the author
lists, by searching for these papers as citing papers in a number of data
sources. This is only feasible in a research field like IS where large-group
collaboration is uncommon, of course, but it does eliminate completely any
potential side effects of disregarding any authors at all in cited references.
In more intensely collaborative research fields, the limitations imposed by Scopus may either be acceptable to those studying them using ACA, or it may be necessary to develop additional methodology to match cited references to metadata in the same or other bibliographic databases that do include all their authors.
We developed
software to parse the downloaded and hand-completed records, and to store the
resulting data fields such as author names, publishing sources, and years of
publication of both source papers and cited references in a structure
(essentially, a pair of tables, one for citing papers, one for cited
references) that simplified the subsequent data analysis procedures, e.g.,
counting citations and co-citations.
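The pair-of-tables structure described above might be sketched as follows; the field names and sample records are hypothetical, for illustration only.

```python
# Hypothetical minimal schema: one table of citing papers, one of
# cited references, linked by the citing paper's id.
citing_papers = [
    {"id": 1, "authors": ["Zhao.D", "Strotmann.A"],
     "source": "JASIST", "year": 2008},
]
cited_references = [
    {"citing_id": 1, "authors": ["White.H", "McCain.K"], "year": 1998},
    {"citing_id": 1, "authors": ["Persson.O"], "year": 2001},
]

def references_of(paper_id):
    """All cited references attached to one citing paper."""
    return [r for r in cited_references if r["citing_id"] == paper_id]
```

Keeping all authors of every cited reference in the second table is what makes both citation counting and co-citation counting straightforward lookups later on.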
We noticed that
the number of papers retrieved from Scopus is slightly smaller than that from
the ISI databases across the 12 journals. For example, in the journal Information Processing & Management
we retrieved 479 and 506 papers, respectively, from Scopus and from the ISI
databases (through Web of Science); for JASIST, we retrieved 976 and 1022,
respectively, from these sources. There was no immediately discernible pattern
as to which articles are indexed in one but not the other, but the differences
appeared minor. In addition, a comparison between the classic
first-author-based ACA results using Scopus on the one hand and Web of Science
data on the other, as represented in Figure 1 in this paper and Figure 2 in
Zhao & Strotmann (2008) respectively, indicates that the specialty
structures revealed from these two data sources are very similar. We are therefore
confident that using Scopus for ACA studies can provide us with a view of the
intellectual structure of the IS field that matches closely results that the
ISI databases might have provided.
We conducted
ACAs using factor analysis based on first-author, inclusive all-author, and
exclusive all-author co-citation counts, as perceived by the authors of these 3828 publications in their role as citers.
We followed
commonly accepted steps and techniques of ACA (McCain, 1990; White &
McCain, 1998; Zhao, 2003) except for the different definitions of co-citation
as discussed earlier. Core sets of
authors were selected based on “citedness” – the number of citations they
received. Two sets of highly visible authors were thus selected using two
different citation counting methods – first-author counts and complete counts.
Simply put, when a paper with N authors is cited, with first-author counts,
only the number of citations of the first author of this paper increases by 1,
and with complete counts, full credit is given to all authors of the paper,
i.e. the number of citations of each of its N authors increases by 1.
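The two citation counting methods can be sketched as follows, assuming each cited reference is given as its full ordered author list (an illustrative sketch, not the authors' software):

```python
from collections import Counter

def citation_counts(cited_references, complete=False):
    """Tally author citedness from a list of cited references.

    cited_references: list of full ordered author lists, one per
    cited document.
    complete=False: first-author counting (only the first author of
    each cited paper is credited); complete=True: complete counting
    (every author of the cited paper receives one citation).
    """
    counts = Counter()
    for authors in cited_references:
        credited = authors if complete else authors[:1]
        for a in credited:
            counts[a] += 1
    return counts
```

With first-author counting, a three-author cited paper credits only its first author; with complete counting, all three authors are credited once each.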
There are no
strict rules regarding thresholds for citation-based author selection in author
co-citation analysis studies (McCain, 1990). Assuming that a research field is represented better the more authors are included, the present study used 165 as the number of authors in the final factor analyses, a number larger than the 120 considered adequate for this type of study in White & McCain (1998).
Software was
developed to count co-citation frequencies by the three methods discussed
above, and to record them in three separate matrices. These co-citation
matrices were then used as input to the Factor Analysis (FA) procedure in SPSS.
As usual, the diagonal values in these matrices were deleted from the input files to the FA routine in SPSS, where they were treated as missing values and replaced by the mean.
In addition, we
performed a successful experiment with more meaningful diagonal values of the
co-citation matrix in the case of exclusive all-author co-citation counts. In
this case, instead of treating the diagonal as missing values, its values were
determined as follows: the diagonal value for author A (i.e., author A’s
auto-co-citation count) increases by 1 when at
least two different articles with author A as one of the authors
appear in the same reference list. This definition is consistent with the
off-diagonal values because, according to the definition of exclusive
all-author co-citation counts, the co-citation count of author A and author B
increases by 1 when a citing article’s reference list contains at least one
article with author A as one of the authors and at least one additional article with author B as one
of the authors. Otherwise, authors A and B would just be co-authors of a single
cited paper, which per our definition does not count. In fact, in our computer
program, the code for counting exclusive all-author co-citations is identical
for all cells of the matrix, including the diagonal. This way of treating
diagonal values of co-citation matrices in ACA is ideal in theory (Ahlgren,
Jarneving & Rousseau, 2003; White, 2003a), but has not before been
empirically studied as to whether it is indeed better than other ways of
dealing with diagonal values in the study of the intellectual structure of
research fields.
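The point that one counting rule covers diagonal and off-diagonal cells alike can be illustrated with a small sketch (an assumed representation, not the authors' program: each reference is the set of all its authors):

```python
from itertools import combinations

def exclusive_count(refs, a, b):
    """Exclusive all-author co-citation contribution of one citing
    paper's reference list for authors a and b: 1 if two distinct
    references cover a and b, else 0. For a == b this same
    expression yields the meaningful diagonal value: 1 exactly when
    at least two different cited articles each list author a.
    """
    return int(any((a in r1 and b in r2) or (b in r1 and a in r2)
                   for r1, r2 in combinations(refs, 2)))
```

A reference list containing one paper by A and B together and one further paper by A alone contributes 1 to the (A, B) cell and 1 to the (A, A) diagonal, but 0 to (B, B), since B appears in only one cited document.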
As with the other matrices, we used the resulting exclusive all-author co-citation matrix with meaningful
diagonal values as input to the FA routine in SPSS. As shown below, this matrix
of co-citation counts produces good results.
In all cases,
factors were extracted by Principal Component Analysis (PCA) with an oblique
rotation (SPSS Direct OBLIMIN). An oblique rotation was chosen because it is
often more appropriate than orthogonal rotations when it can be expected
theoretically that the resulting factors (in this case, specialties) would in
reality be correlated (Hair et al., 1998). The number of factors extracted was
determined based on Kaiser’s rule of eigenvalue greater than 1 because the
resulting model fit was good in all four cases as represented by total variance
explained, communalities, and correlation residuals (Hair et al., 1998). The
factor models produced this way are shown in Table 2 along with their model
fits. For example, a factor analysis of the first-author co-citation matrix resulted in a 14-factor model that explained 84% of the total variance, and the differences between observed and implied correlations were smaller than 0.05 in almost all cases. The communalities ranged from 0.60 to 0.94; only 6 (4%) were below 0.7 and 28 (17%) were below 0.8.
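Kaiser's rule itself is easy to state operationally: retain as many factors as there are eigenvalues of the correlation matrix greater than 1. A minimal numpy sketch (not the SPSS procedure used in this study):

```python
import numpy as np

def kaiser_n_factors(corr):
    """Number of factors by Kaiser's rule: count the eigenvalues of
    the correlation matrix that exceed 1."""
    eigvals = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    return int((eigvals > 1.0).sum())
```

For a correlation matrix with one strongly correlated pair of variables and one nearly independent variable, only one eigenvalue exceeds 1, so one factor is retained.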
The numbers in
Table 2 indicate that exclusive all-author-based ACA appears to produce stronger results in terms of the amount of variance explained
by factor models extracted according to identical criteria, especially when
meaningful diagonal values are used for the factor analysis.
Recently a
novel way of visualizing factor analysis results in ACA was introduced in Zhao
& Strotmann (2008) that provides very informative two-dimensional visual
maps to aid the interpretation of results, and that allows us to report results
of multiple ACA studies in a single paper, as we do here. To prepare these maps, factor labels were assigned, as usual, after examining the frequently cited articles written by the authors who load highly on each factor. In these
graphs, authors are represented by square nodes, and factors are represented by
circular nodes. Factor and author nodes are connected by lines if the author
loads sufficiently on the factor (i.e., 0.3 or higher in this study). The width
of such a line is proportional to the value of the author's loading on this
factor, as is its gray-scale value. Node sizes are accumulated from loadings,
and are intended to increase with the importance of a node in the factor
structure map.
The layout of
these maps is an automatically generated Kamada-Kawai graph layout using
loadings as similarity measures between author and factor nodes, produced by
Pajek (Batagelj & Mrvar, 2007). The result is an intellectual structure map
of the field that is visually informative and true to the factorization it
represents, but it is also quite different from the MDS maps with author clusters
from cluster analysis in a traditional ACA.
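The graph construction and layout step can be approximated in Python with networkx, which also provides a Kamada-Kawai layout. The author names and loading values below are made up for illustration; Kamada-Kawai treats edge weights as distances, so a high loading (similarity) is converted into a short edge.

```python
import networkx as nx

# Hypothetical pattern-matrix fragment: (author, factor) -> loading.
loadings = {
    ("White.H", "Scientometrics"): 0.82,
    ("McCain.K", "Scientometrics"): 0.74,
    ("McCain.K", "Information behavior"): 0.35,
    ("Dervin.B", "Information behavior"): 0.71,
}

G = nx.Graph()
for (author, factor), w in loadings.items():
    if w >= 0.3:  # loading threshold used for the maps in this study
        # convert similarity (loading) into a distance for the layout
        G.add_edge(author, factor, weight=1.0 - w)

pos = nx.kamada_kawai_layout(G)  # dict: node -> (x, y) coordinates
```

In the real maps the node sizes, line widths, and gray-scale values are then derived from the loadings as described above; this sketch covers only the bipartite author-factor graph and its layout.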
We will first
discuss the intellectual structure of the Information Science field based on
the factor analysis results from classic first-author-based ACA (Figure 1).
Then, we will examine how this structure compares with the intellectual
structures shown from other types of ACA, i.e., results from an exclusive
all-author co-citation analysis with meaningful diagonal values (Figure 2),
those from an inclusive all-author co-citation analysis (Figure 3), and those
from an exclusive all-author co-citation analysis with missing diagonal values
(Figure 4). All these maps are visualizations of the pattern matrix of the
corresponding factor analysis with an oblique rotation.
A factor
analysis of the first-author co-citation matrix of the 165 most highly-cited
authors in the IS field in the years 1996-2005 identifies 14 specialties
(Figure 1). The four specialties that were most active during this time period
are Experimental retrieval, Information behavior, Scientometrics/citation
analysis, and Webometrics. More than half (56%) of the authors analyzed belong
to these four specialties. The smaller specialties identified are OPACs and
online retrieval, Bibliometrics, Science communication, Co-citation mapping,
E-resources organization and retrieval, IS theories, Relevance, Interface
design, Knowledge Management, and Research methods.
Compared with the situation of the field 10 years earlier, as described in White & McCain (1998) using equivalent methodology, we can see some trends in the development
of the IS field. Both the Experimental retrieval specialty and the
Scientometrics/citation analysis specialty remain active research areas.
Research on information behavior or user theory has grown dramatically, and
Webometrics has emerged as a new, active and distinct specialty of study.
The
Bibliometrics and Science communication specialties remain largely unchanged.
OPACs and Online retrieval have merged into a single specialty which has been
joined by authors who study Web searching, indicating perhaps the trend of
integrating OPACs and bibliographic databases into library Web portals. By
contrast, research on Co-citation mapping has been separated out from the
general Scientometrics/citation analysis specialty as a new research focus that
applies co-citation analyses to the mapping of science.
It appears that the Imported ideas group of White & McCain (1998) has split into two groups, one of which is IS theories.
In addition to
Webometrics, two other new specialties have emerged: Relevance and Knowledge
management. The Knowledge management specialty, although small, is quite clear,
reflecting the research reality in the IS field quite well. Another small group
composed of Budd, Cohen and McClure is quite vague, but appears to be about
Research methods.
It appears that
the ACA results reflect the development of the IS field over the past 10 years quite
well. It would be interesting to examine how individual authors have changed
their specialty concentrations. We will leave it to readers who are familiar
with the authors on the map to see how well the ACA results match their
perceptions in this regard, as our focus here is the comparison between
different types of ACA.
A factor analysis of an exclusive all-author co-citation matrix of 165 authors with meaningful diagonal values identifies 13 specialties (Figure 2). The four
specialties identified in this analysis that were most influential in the past
ten years are the same as those from first-author-based ACA as shown in Figure
1. In this analysis, about 60% of the authors analyzed represent these four
specialties. Most of the smaller specialties identified are also the same as in
the first-author co-citation analysis, including the Co-citation mapping,
Interface design, Science communication, Bibliometrics, Relevance, and Knowledge
Management specialties.
Here in the
results from exclusive all-author-based ACA, only one of the five authors who
primarily loaded on the IS theories specialty in the results from
first-author-based ACA made the list of top 165 authors (i.e., Buckland), and
many authors in the Bibliometrics specialty did not show up either (e.g., Zipf,
Bradford, Lotka, Burrell). As a result, these two groups merged into a single group led by Brookes, who had done significant research in both areas. It appears that this group of authors represents the foundations and history of IS, especially its quantitative aspects. We thus tentatively label it “IS foundations”.
This and the disappearance
of the small and vague group on Research methods identified by the first-author-based
ACA indicate that first-author-based ACA may better represent the theoretical
and methodological aspects of the field. It is possible that this is due in
part to the use of full citation counts for ranking highly-cited authors in the
field. This method disadvantages authors who publish alone, as theorists are
wont to do. It remains to be seen in later studies whether fractional citation count rankings would retain these specialties.
The focus of
the OPAC/online/Web searching specialty seen in the results from
first-author-based ACA has shifted from OPACs/online retrieval systems to Web
search engines in the results from exclusive all-author ACA, resulting in the
distinct Web searching specialty. This suggests that first-author ACA
represents older research better whereas all-author ACA reflects current
studies more clearly.
An examination of the highly cited publications by authors in the Web searching specialty shows that several authors in this specialty, such as Wolfram, Jansen, Spink, Bateman, and Saracevic, co-authored papers. A similar observation applies to the
Evaluation/Policy specialty where McClure and Hernon co-authored highly cited
works. It therefore appears that all-author-based ACA can pick out some tightly
connected research groups or projects.
The results
from an inclusive all-author-based ACA as shown in Figure 3 are very similar to
those from exclusive all-author-based ACA with meaningful diagonal values as
shown in Figure 2, except that 12 rather than 13 specialties are identified and
that the Information behavior specialty is much smaller after several authors
moved to a specialty that is a mix of authors from the Evaluation/Policy group,
the Knowledge management specialty, and Science Communication in Figure 2. The resulting
mix does not make much sense to us at first glance. An examination of their
highly cited works suggests that this group of authors appear to have been
cited as relevant to the study of Knowledge management in a wider sense, such
as Davenport and Rogers’ works on innovation, Hernon and McClure’s studies on
evaluation of library services, Savolainen’s everyday life information seeking,
Choo’s environment scanning, Taylor’s information use environment, Garvey’s
science communication, and Strauss’ grounded theory. Readers who have deeper
knowledge of these authors’ research areas may make better sense of it. We thus
tentatively label this group Knowledge management and list its authors here:
Davenport.T, Hernon.P, Savolainen.R, Rogers.E, McClure.C, Choo.C, Taylor.R,
Garvey.W, Line.M, Dervin.B, and Strauss.A.
The results
from an exclusive all-author-based ACA with missing diagonal values (Figure 4)
are very similar to those from inclusive all-author-based ACA shown in Figure
3, except that 11 rather than 12 specialties are identified, with the
Information behavior specialty splitting into two groups which merge into the Relevance
specialty (BrooksH, CoolC, JarvelinK, VakkariP, OddyR, KuhlthauC, BelkinN) and
into the Web searching specialty (AllenB, EllisD, BatesM), respectively. The
group tentatively labeled Knowledge management appears here again as a mix of
authors from several of the specialties identified in the results from
exclusive all-author-based ACA with meaningful diagonal values: Knowledge
management (Davenport.T, Rogers.E, BrownJ), Evaluation/Policy (Hernon.P,
McClure.C), Information behavior (Savolainen.R, Choo.C, Taylor.R, Dervin.B,
WilsonT), and Science communication (Garvey.W).
Comparing
results from the two types of exclusive all-author-based ACA, that with
meaningful diagonal values (Figure 2) vs. that without diagonal values (Figure
4), we find the factor analysis results become less informative as information
is discarded with the deletion of diagonal values in the co-citation matrix.
The Evaluation/Policy group merges into the Knowledge management specialty; the
Information behavior specialty in Figure 2 disappears into three other
specialties: Relevance, Web searching, and Knowledge management. As most of
these authors load low (less than 0.5) on the factors they merge into, they do
not add much information to the picture. Deleting the diagonal values and
letting SPSS replace them with the mean therefore appears to cause a noticeable
loss of information from the co-citation matrix.
The major
specialty structure of the Information Science field produced by author
co-citation analysis is largely the same whether it is first-author- or all-author-based, confirming findings from Persson’s (2001) simplified study.
This is due most likely to the relatively low number of co-authored papers in
this field, which generally makes for relatively small differences between first-
and all-author co-citation counts. Nevertheless, even in this field with a low
level of collaboration, there are noticeable differences in results from
different types of ACAs. First-author ACA appears to highlight the theoretical
and methodological aspects of the field, while all-author ACA results appear to
be clearer when it comes to representations of
current trends (e.g., Web searching), and appear to pick out highly
collaborative research groups.
In our
preliminary all-author ACA study of the XML field, we noticed that all-author
ACA tended to result in a smaller number of specialties than first-author ACA
when the same number of most highly cited authors was selected for analysis.
Although this is confirmed here for the IS field, the strength of this effect
depends heavily on the particular choice of all-author ACA methodology:
compared to the 14 IS specialties identified through a classic first-author ACA
with missing diagonal values in the co-citation matrix, exclusive all-author
ACA yielded a mere 11 in the case of missing diagonal values and 13 when
meaningful diagonal values were used, while inclusive all-author ACA identified
12 specialties.
All in all,
classic first-author ACA results in a picture of the intellectual structure of
the IS research field that is of similar structure and comparable clarity to
that found by exclusive all-author co-citation analysis with meaningful
diagonal values. In the case of inclusive all-author co-citation analysis and
exclusive all-author ACA with missing diagonal values, on the other hand, it is
first-author ACA that appears to produce a clearer picture, unless we correctly
identified the common interest of a somewhat mixed group of cited authors that
we tentatively labeled Knowledge management.
Given the low
level of collaborative research and publishing in the IS field (less than two
citing authors on average), especially with respect to the cited references
that we actually analyze in ACA (close to 1.5 cited authors per reference on
average), we have retrospectively and independently validated the use of
first-author instead of all-author ACA in fields like IS, in that it appears to
produce quite similar results to an ACA that fully takes into account all
authors of all referenced articles. This assumes that, in general, it is
preferable to include all available information in a statistical
analysis rather than biasing it by arbitrarily ignoring some of the data, and
that therefore, in our case, the results from an all-author co-citation
analysis that uses meaningful values in the diagonal of the co-citation count
matrix are as close to the “true” intellectual structure of a field as an ACA
can get. Indeed, Rousseau & Zuccala (2004) identified all-author ACA with
the meaningful diagonal values we used here as theoretically optimal.
Conversely, the
close correspondence between the results of the classic first-author ACA and
those of the all-author ACA that uses an exclusive all-author co-citation matrix with
meaningful diagonal values provides evidence that the latter is an excellent
candidate for valid co-citation analyses that do take all authors
of all cited papers into account: we verified that, in a situation where
similar results are to be expected from both, this alternative method works
just as well as the classic first-author ACA, whose validity has been amply
demonstrated in a range of research fields like IS.
In the XML
field that we investigated in our preliminary study, we find about three
authors per citing paper on average, and about 2.5 authors on average per cited
reference. Unlike in the IS field, the average number of cited authors per
reference in the XML field is thus substantially larger than 1.0 (the
corresponding value for first-author ACA), so that we can expect a much
larger difference (by a factor of three, in fact[2])
between all-author and first-author analyses in the XML field than in the IS
field, and that is indeed what we found. In the XML field, even a
“lesser” type of all-author ACA tended to produce a clearer picture of the
intellectual structure, which led us to conclude that it is important to count all authors
of cited papers in co-citation analysis studies. In this paper, we find that we
need to amend this statement slightly: the higher the level of collaborative
research in a field, the more important it is to find ways to include the
names of all authors of all referenced papers in an all-author ACA. To
further test (and strengthen) this statement, a truly highly collaborative
research field should be chosen.
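The factor-of-three expectation follows directly from the excess of cited authors per reference over the single author that first-author counting credits; a quick check using the averages reported above:

```python
# Average number of cited authors per reference (values from the text).
is_avg, xml_avg = 1.5, 2.5
first_author = 1.0  # first-author counting credits exactly one author

# Extra author credits per reference beyond what first-author ACA uses.
extra_is = is_avg - first_author    # 0.5 in IS
extra_xml = xml_avg - first_author  # 1.5 in XML

# The excess is three times larger in XML than in IS (footnote [2]).
assert extra_xml / extra_is == 3.0
```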
This underscores
suspicions voiced by White (2004) that it may not be entirely straightforward
to extend first-author ACA methodology to areas where collaboration is the norm
rather than the exception, although he appears to be more concerned with the
problem of reliably studying individual citation identities when
practically all of an author's papers are multi-authored. Since “big sciences” of this
sort are of particular interest to social scientists and to science and technology
policy researchers, however, we suspect that it is very important to develop a
better understanding of all-author citation analysis methodologies; we hope
this paper contributes to that understanding.
Further
research will be needed, however, to clarify the findings of our
two studies and of others that are beginning to appear. In particular, it
is important to test whether, in an increasingly collaborative academic environment,
citation and co-citation analyses do indeed need to upgrade from first-author
to all-author counting methods if they are to yield realistic indicators of
scholarly communication.
In this study,
we find from several ACA analyses that the information technology revolution in
general, and the World Wide Web in particular, have had a significant impact on
the field, as witnessed by the emergence of the Webometrics specialty, the shift of
traditional OPAC research towards Web searching, and the budding-off from
Scientometrics of the compute- and Web-resource-intensive Co-citation mapping
specialty.
Many studies of
the various ways of allocating credit among co-authors have shown that counting all
authors can produce author rankings by citation or publication counts that
differ greatly from those obtained by counting first authors alone. Results from the present
study indicate, however, that counting all authors in ACA may produce
specialty structures very similar to those of classic ACA that only counts
first authors, provided the level of collaboration in the field studied
is sufficiently low.
Nevertheless,
all-author and first-author ACAs do appear to pick out a small number of
different research foci while maintaining the same major specialty structure
(e.g., theoretical and methodological background vs. recent trends).
All-author ACA also tends to be more sensitive to collaborative research
projects and groups. A complete view of the intellectual structure of a
research field may therefore require both types of ACA.
The present
study also confirms a theoretical prediction (Rousseau & Zuccala, 2004;
White, 2003a) that, for exclusive all-author co-citation count matrices,
computing meaningful diagonal values produces a statistically more meaningful
picture of the intellectual structure of a research field than treating the
diagonal as missing values does. We suspect that this is because meaningful
diagonal values retain information that is thrown away when the diagonal
is instead filled with statistically generated values (e.g., the row mean, or
the average of the three highest off-diagonal values).
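The two diagonal treatments can be illustrated on a small co-citation matrix. The sketch below uses purely illustrative numbers (not data from this study) and contrasts keeping the meaningful diagonal with the common workaround of substituting, for each author, the mean of that author's three highest off-diagonal co-citation counts:

```python
def replace_diagonal_with_top3_mean(matrix):
    """Treat the diagonal as missing and fill each diagonal cell with
    the mean of that author's three highest off-diagonal counts."""
    out = [row[:] for row in matrix]  # work on a copy
    for i, row in enumerate(out):
        off = [v for j, v in enumerate(row) if j != i]
        top3 = sorted(off, reverse=True)[:3]
        row[i] = sum(top3) / len(top3)
    return out

# Toy exclusive all-author co-citation matrix for four authors.
# Off-diagonal cells: co-citation counts. Diagonal cells: "meaningful"
# values, e.g. the number of papers co-citing two distinct works by
# that author (illustrative numbers only).
C = [[9, 6, 4, 1],
     [6, 8, 5, 2],
     [4, 5, 7, 3],
     [1, 2, 3, 5]]

C_missing = replace_diagonal_with_top3_mean(C)
# C keeps its meaningful diagonal; C_missing discards it in favor of
# values derived from the off-diagonal counts.
```

Whatever information the meaningful diagonal carried (here, self-co-citation intensity) is lost in `C_missing`, which is the intuition behind the statistical advantage observed above.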
Further studies
are suggested to test these findings more thoroughly. A research area
in which large-group collaboration is commonplace, such as the biomedical
field, should be a good choice for this purpose.
This study was
funded in part by the Social Sciences and Humanities Research Council (SSHRC)
of Canada.
Ahlgren, P., Jarneving,
B., & Rousseau, R. (2003). Requirements for a cocitation similarity
measure, with special reference to Pearson’s correlation coefficient. Journal of the American Society for
Information Science and Technology, 54, 550-560.
Batagelj, V., & Mrvar,
A. (2007). Pajek: Program for analysis
and visualization of large networks. Retrieved on
Eom, S. (2008). All author
cocitation analysis and first author cocitation analysis: A comparative
empirical investigation. Journal of
Informetrics, 2, 53-64.
Hair, J.F., Anderson, R.E.,
Tatham, R.L., & Black, W.C. (1998). Multivariate
Data Analysis (5th edition).
McCain, K. W. (1990).
Mapping authors in intellectual space: a technical overview. Journal of the American Society for
Information Science, 41, 433-443.
Persson, O. (2001). All
author citations versus first author citations. Scientometrics, 50(2), 339-344.
Rousseau, R., &
Zuccala, A. (2004). A classification of author co-citations: definitions and
search strategies. Journal of the
American Society for Information Science and Technology, 55(6), 513-529.
Schneider, J.W., Larsen,
B., & Ingwersen, P. (2007). Comparative study between first and all-author
co-citation analysis based on citation indexes generated from XML data. Proceedings of the 11th
International Conference of the International Society for Scientometrics and
Informetrics (pp. 696-707).
White, H.D. (2004).
Reward, persuasion, and the Sokal Hoax: A study in citation identities. Scientometrics, 60(1), 93-120.
White, H.D. (2003a).
Author cocitation analysis and Pearson's r. Journal
of the American Society for Information Science and Technology, 54,
1250-1259.
White, H. D. (2003b).
Pathfinder networks and author cocitation analysis: a remapping of paradigmatic
information scientists. Journal of the
American Society for Information Science and Technology, 54, 423-434.
White, H. D. &
Griffith, B.C. (1981). Author cocitation: A literature measure of intellectual
structure. Journal of the American
Society for Information Science, 32, 163-171.
White, H. D. & McCain,
K.W. (1998). Visualizing a discipline: An author co-citation analysis of
information science, 1972-1995. Journal
of the American Society for Information Science, 49, 327-355.
Zhao, D. (2003). A comparative citation analysis study of
Web-based and print journal-based scholarly communication in the XML research
field. Dissertation,
Zhao, D. (2006). Towards
all-author co-citation analysis. Information
Processing & Management, 42, 1578-1591.
Zhao, D., & Strotmann,
A. (2008). Information science in the first decade of the Web: An enriched
author co-citation analysis. Accepted for publication in Journal of the American Society for Information Science and Technology.
[Table of specialty labels (e.g., Information science, Library automation); the remainder of the table is not recoverable.]
* Taken from White & McCain (1998, p. 330) with only minor updates
Input co-citation matrix                            Factor model   Total variance   % |nonredundant       Communalities
                                                                   explained        residuals| > 0.05     range        < 0.7      < 0.8
first-author                                        14-factor      84%              0%                    0.60–0.94    6 (4%)     28 (17%)
exclusive all-author, meaningful diagonal values    13-factor      91%              0%                    0.68–0.99    1 (0.6%)   7 (4%)
inclusive all-author                                12-factor      84%              0%                    0.55–0.95    5 (3%)     38 (23%)
exclusive all-author, diagonal values deleted       11-factor      87%              0%                    0.58–0.95    3 (2%)     14 (8%)
*
Corresponding author. Tel. 1-780-4922824. Fax. 1-780-4922430
[1] Actually, that study limited itself to up to five authors, but
only a small fraction of cited references in that dataset had more than five
authors.
[2] (2.5 – 1.0) = 3 * (1.5 – 1.0): the excess of the average number of cited
authors per reference over the 1.0 credited by first-author counting is three
times larger in XML than in IS.