Copyright Cameron Neylon 2020-2021. This slide deck is licensed under a Creative Commons Attribution 4.0 International License. Code is available under an Apache v2.0 license at Github.
"Our goal is to change the stories that universities tell about themselves, placing open knowledge at the heart of that narrative"
Data is derived from the following sources
Data is integrated and processed via Observatory Platform, an open source workflow system developed within COKI to integrate data related to scholarly communications. The code is available on GitHub, including the template SQL query that generates the OA status we use from the Unpaywall data. More detail on the OA categories is also provided on the slide below.
For this analysis, publisher is defined by the text string in the Crossref metadata, affiliation is derived from the assignment by Microsoft Academic, and field is the Level Zero field "Chemistry" from Microsoft Academic as provided in the data dump used. Some comparative data for "Materials Science" and "Biology" is presented in additional slides.
The following is derived from the template SQL query used to define the OA categories in Observatory Platform. For the most up-to-date information, view the file on GitHub.
This template query contains the SQL that directly interprets Unpaywall data to determine OA categories at the output level. This is therefore the canonical location for the precise definitions used for OA categories. Ideally this file should contain both the queries themselves and a description clear enough for a non-expert in SQL to understand how each category is defined.
The current categories of Open Access described in this file are:
Levels of open access publishing are as provided by Unpaywall for all outputs with RSC or ACS as publisher in the Crossref metadata. In Observatory Platform terminology this is "gold" open access, which includes all articles published either in fully open access journals (as defined by indexing in DOAJ) or in mixed (i.e. "hybrid") journals.
As defined in the SQL used to generate these categories.
Gold Open Access:
Gold OA is defined as either the journal being in DOAJ, or the best_oa_location being hosted by the publisher with a license detected. This works because Unpaywall sets the publisher as the best OA location if it identifies an accessible publisher copy.
CASE
  WHEN journal_is_in_doaj
    OR (best_oa_location.host_type = "publisher"
        AND best_oa_location.license IS NOT NULL
        AND NOT journal_is_in_doaj)
  THEN TRUE ELSE FALSE
END AS gold,
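To make the gold definition easier to follow for readers less familiar with SQL, the same rule can be sketched in Python. This is an illustrative restatement only, not code from Observatory Platform: the `OALocation` class and `is_gold` function are hypothetical names, the field names mirror those in the SQL fragment, and a missing `best_oa_location` is assumed to mean "not gold" unless the journal is in DOAJ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OALocation:
    """Hypothetical stand-in for the best_oa_location record used in the SQL."""
    host_type: Optional[str] = None
    license: Optional[str] = None


def is_gold(journal_is_in_doaj: bool,
            best_oa_location: Optional[OALocation]) -> bool:
    """Return True if an output counts as gold OA under the rule above."""
    # Fully open access journal: indexed in DOAJ.
    if journal_is_in_doaj:
        return True
    # Assumption: no best OA location means no accessible publisher copy.
    if best_oa_location is None:
        return False
    # Hybrid case: publisher-hosted copy with a detected license.
    return (best_oa_location.host_type == "publisher"
            and best_oa_location.license is not None)
```

For example, `is_gold(False, OALocation("publisher", "cc-by"))` is true (the hybrid case), while a repository-hosted copy or a publisher copy without a detected license is not counted as gold.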
There are significant shifts in national patterns that can be associated with changes in funder policy and with the offerings of RSC and ACS.
RSC took a significant lead in early open access provision for chemistry, particularly in the UK, but has since fallen back.
National averages don’t tell the full story. Individual institutions show very different and quite specific patterns. There are differential policy effects.
Recent changes are strongly driven by read and publish agreements, with substantial shifts in publisher choice corresponding to the introduction of deals.
There is evidence of concentration of publishing in chemistry with two large publishers taking up an increasing percentage. Should we be concerned about diversity?
Centre for Culture and Technology
Curtin Institute for Computation
Publisher percentage of chemistry is calculated as a ratio. The numerator is the total count of publications in Crossref where the publisher name field corresponds to the most common variant of the publisher name (Royal Society of Chemistry (RSC) and American Chemical Society (ACS) respectively). The denominator is the total count of publications assigned in Microsoft Academic to the Level 0 field of "chemistry".
Strictly, these are not percentages: the numerator is not restricted to publications counted in the denominator, so the value could in theory exceed 100%. They are also sensitive in magnitude to changes in the Microsoft Academic field assignment. However, it is a like-for-like comparison across the two publishers.
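The calculation described above can be sketched as follows. This is a minimal illustration, not the query used for the slides: `publisher_share` is a hypothetical helper, and the counts in the usage note are invented purely to show the arithmetic.

```python
def publisher_share(publisher_count: int, field_count: int) -> float:
    """Publisher output as a percentage of the field's output.

    publisher_count: publications in Crossref under the publisher name.
    field_count: publications assigned to the field in Microsoft Academic.
    The two counts come from different sources, so the result is a ratio
    rather than a true percentage and is not capped at 100.
    """
    if field_count <= 0:
        raise ValueError("field_count must be positive")
    return 100.0 * publisher_count / field_count
```

With made-up counts, `publisher_share(25, 200)` gives 12.5, and `publisher_share(120, 100)` gives 120.0, which shows how the figure can exceed 100% when the publisher count is not a subset of the field count.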
An analysis by journal would also be interesting, but currently the metadata for journal identification is less reliable, with journal names and the choice of ISSN provided in Crossref metadata changing from year to year for both publishers. Future analysis using ISSN-L could address this.