A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Sunday, December 13, 2015

Plotting Scopus article level citation data in R



The Royal Society has decided to publish journal citation distributions. This makes sense. The journal impact factor is a single number that tries to summarize a distribution, but it’s almost always better to plot your data. Some hope that visualizing such distributions will make clear how troublesome a statistic the journal impact factor is, and that other journals will also be open with their data.
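To see why it is "a single number trying to summarize a distribution", it helps to write out the usual definition in simplified form (this ignores details such as which document types count as citable). The impact factor of a journal for year Y is the number of citations received in Y by items the journal published in the two preceding years, divided by the number of citable items it published in those years. Here C_Y(t) denotes citations received in year Y by items published in year t, and N_t the number of citable items published in year t:

$$
\mathrm{JIF}_Y = \frac{C_Y(Y-1) + C_Y(Y-2)}{N_{Y-1} + N_{Y-2}}
$$

In other words, it is essentially a mean citation count over a two-year window, and a mean is exactly the statistic that a few extremely cited papers in a long right tail can pull upward.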

I want to point out that all this data is readily available to anyone who has access to Scopus, and at the bottom of this post I’ll share the R code to create these plots yourself.

Go to Scopus and search for any journal you like. Here, I’ll illustrate the process with a search for the journal Psychological Science, which has ISSN 0956-7976. You can search any range of years, but Scopus only allows you to export 2000 records at a time. I limited this search to issues from 2010-2015. For copyright reasons, I cannot share the Scopus data I downloaded.


Then, select all results, and export ‘all available information’ as a .csv file, as illustrated in the animation below.
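The post’s own plotting code is in R. For readers who would rather work in Python, here is a minimal sketch of reading the export. It assumes the .csv file has a "Year" column and a "Cited by" column, and that an empty "Cited by" cell means the paper has not been cited yet; check these assumptions against your own download. The function name `load_scopus_citations` is hypothetical.

```python
import csv
import io


def load_scopus_citations(csv_text):
    """Return (year, citations) pairs from the text of a Scopus CSV export.

    Assumes a "Year" column and a "Cited by" column; an empty "Cited by"
    cell is treated as zero citations (an assumption, not a Scopus guarantee).
    """
    # Drop a leading byte-order mark, which some exports start with.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    records = []
    for row in reader:
        year = int(row["Year"])
        cited = (row["Cited by"] or "").strip()
        records.append((year, int(cited) if cited else 0))
    return records
```

For a downloaded file, something like `load_scopus_citations(open("scopus.csv", encoding="utf-8-sig").read())` should work, where `scopus.csv` is a placeholder name for your export.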


Now that you have the data, plotting the citations is straightforward and can be done with the code below (the plots in this blog post look a bit different than the output of the code, but the data are the same). For example, here is the distribution of citations for Psychological Science for the years 2010-2015. The tail is so long that I cut off the x-axis at 200 citations. Three papers are cited more than 200 times, most notably Simonsohn, Nelson, & Simmons (2011), with 662 citations.
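Cutting the x-axis at 200 means a few papers silently disappear from the plot. A minimal Python sketch (not the post’s R code) that bins citation counts up to a cutoff and separately returns the papers beyond it could look like this; the function name and the 10-citation bin width are illustrative assumptions.

```python
from collections import Counter


def citation_histogram(citations, cutoff=200, bin_width=10):
    """Bin citation counts up to `cutoff` and list the counts beyond it.

    Returns (bins, beyond): bins maps the lower edge of each bin to the number
    of papers in it; beyond holds counts above the cutoff, largest first.
    """
    bins = Counter()
    beyond = []
    for c in citations:
        if c > cutoff:
            beyond.append(c)  # these papers fall off the truncated x-axis
        else:
            bins[(c // bin_width) * bin_width] += 1
    return dict(sorted(bins.items())), sorted(beyond, reverse=True)
```

Reporting `len(beyond)` next to the plot makes the truncation explicit, and the binned counts could be passed to any plotting library (for example matplotlib’s `plt.bar`).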



The data are clearly skewed, and, unsurprisingly, papers accumulate more citations as the years go by. The differences between publication years are obvious in both the mean and the median number of citations:

Year    Mean         Median
2010    34.551724    25
2011    25.329167    17
2012    18.460465    15
2013    12.055016    8
2014    6.176471     4
2015    1.814815     1

You would probably exclude extreme outliers when analyzing your own data, but journals obviously like to keep them in because they boost the impact factor, even though they are not representative of the typical article.
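A per-year summary like the one above can be sketched in Python with the standard `statistics` module. This assumes the data are available as (publication year, citation count) pairs; the optional `max_citations` argument is a hypothetical addition that drops papers above a threshold, so you can see how much a few outliers move the mean compared to the median.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_year(records, max_citations=None):
    """Map each publication year to (mean, median) citations.

    records: iterable of (year, citations) pairs.
    max_citations: if given, papers cited more often than this are excluded.
    """
    by_year = defaultdict(list)
    for year, cited in records:
        if max_citations is None or cited <= max_citations:
            by_year[year].append(cited)
    return {year: (mean(c), median(c)) for year, c in sorted(by_year.items())}
```

With a single extreme paper in a year, the mean jumps while the median barely changes, which is the pattern described above.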

Feel free to play around with the script, and link to your plots in the comments below, or tweet them to me at @Lakens.



