This repository contains a Python-based analysis to assess journal coverage overlap between JSTOR collections and other library collections, based on subscription metadata.
The input data represents journal subscriptions over time, where each row corresponds to a journal × period × collection triple.
Using journal identifiers (print and online ISSNs), the analysis examines whether the same journal-period appears in other JSTOR collections or in non-JSTOR collections, with full, partial, or complementary temporal coverage.
Special attention is given to journals that would be lost if JSTOR access were reduced or cancelled.
A TSV file containing all journal subscriptions held by the library. Each row must represent a single subscription period in a specific collection.
Required columns:
| Column name | Description |
|---|---|
| oclc_collection_name | Name of the collection (used to identify JSTOR and sub-collections) |
| publication_title | Journal title (may vary slightly across collections) |
| print_identifier | Print ISSN (used for matching) |
| online_identifier | Online ISSN (used for matching) |
| date_first_issue_online | Start of coverage period |
| date_last_issue_online | End of coverage period (missing = assumed ongoing) |
Notes:
- Journals are matched using both identifiers, with fallback logic if identifiers are swapped across sources.
- Only records marked as "fulltext" are used for the analysis.
- Missing end dates are treated as coverage through 2026.
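The notes above can be illustrated with a minimal sketch. It assumes that dates begin with a four-digit year and that "swapped" identifiers means a print ISSN in one source may appear as the online ISSN in another. The helper names (`end_year`, `issn_set`, `same_journal`) are hypothetical and are not taken from the repository; the actual fallback logic may differ.

```python
ASSUMED_ONGOING_YEAR = 2026  # from the notes: missing end dates run through 2026


def end_year(date_last_issue_online):
    """Return the end year of a coverage period, treating blanks as ongoing."""
    value = (date_last_issue_online or "").strip()
    if not value:
        return ASSUMED_ONGOING_YEAR
    return int(value[:4])  # assumption: the date starts with a four-digit year


def issn_set(record):
    """Collect the non-empty ISSNs of a record, ignoring which column holds them."""
    ids = (record.get("print_identifier"), record.get("online_identifier"))
    return {i.strip().upper() for i in ids if i and i.strip()}


def same_journal(a, b):
    """Two records match if they share any ISSN, even if print/online are swapped."""
    return bool(issn_set(a) & issn_set(b))
```

Comparing ISSN sets rather than individual columns is one simple way to stay robust against swapped identifiers; it trades a little precision for not having to know which source made the swap.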
For each journal-period pair, matches are classified as:
- Full: the comparison period completely covers the source period
- Partial: periods overlap, but coverage is incomplete
- Complementary: the periods do not overlap (they are adjacent or separated by a gap)
Overlap percentages (share of years covered) are also calculated.
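The classification above could be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: periods are reduced to whole years with inclusive bounds, the percentage is measured against the source period, and complementary matches are given 0%. The function name `classify_overlap` is hypothetical.

```python
def classify_overlap(src_start, src_end, cmp_start, cmp_end):
    """Classify a comparison period against a source period (inclusive years).

    Returns a tuple (overlap_type, percent_of_source_years_covered).
    """
    src_years = src_end - src_start + 1
    shared_start = max(src_start, cmp_start)
    shared_end = min(src_end, cmp_end)
    shared_years = max(0, shared_end - shared_start + 1)

    if shared_years == 0:
        # No common year: the periods are adjacent or separated by a gap.
        return "Complementary", 0.0
    if cmp_start <= src_start and cmp_end >= src_end:
        # The comparison period spans the whole source period.
        return "Full", 100.0
    return "Partial", round(100.0 * shared_years / src_years, 1)
```

For example, a source period of 2000 to 2009 compared with a period of 2005 to 2026 shares five of ten years, so it would be classified as Partial with 50% coverage.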
For each JSTOR journal-period pair (excluding Books):
- Check for matches in all other collections
- Flag whether it has:
  - any match
  - at least one full match
  - at least one partial match
  - at least one complementary match
- Count total matches by overlap type
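As a rough sketch of the flagging step, the function below takes the overlap types of all matches found for one JSTOR journal-period and derives the flags and counts listed above. The function name and the output keys are hypothetical; the column names in the actual output may differ.

```python
from collections import Counter

OVERLAP_TYPES = ("Full", "Partial", "Complementary")


def summarize_matches(overlap_types):
    """Turn a list of overlap types into co-occurrence flags and counts."""
    counts = Counter(overlap_types)
    summary = {"any_match": len(overlap_types) > 0}
    for kind in OVERLAP_TYPES:
        key = kind.lower()
        summary[f"has_{key}"] = counts[kind] > 0  # at least one match of this type
        summary[f"n_{key}"] = counts[kind]        # total matches of this type
    return summary
```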
Analyzes overlap between JSTOR sub-collections themselves.
For each pair of JSTOR collections:
- Counts how many unique journal-periods co-occur
- Distinguishes full, partial, and complementary coverage
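The pairwise count can be sketched as below. This simplified version only counts co-occurrence; it assumes each collection has already been reduced to a set of journal keys (for example ISSNs) and does not distinguish overlap types. The function name `pairwise_cooccurrence` and the input shape are hypothetical.

```python
from itertools import combinations


def pairwise_cooccurrence(holdings):
    """Count shared journal keys for every unordered pair of collections.

    holdings: dict mapping collection name -> set of journal keys.
    Returns a dict mapping (collection_a, collection_b) -> number of shared keys.
    """
    result = {}
    for a, b in combinations(sorted(holdings), 2):
        result[(a, b)] = len(holdings[a] & holdings[b])
    return result
```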
For each JSTOR collection:
- Identifies journals that also appear in non-JSTOR collections
- Classifies overlap type
Creates a fully expanded match table intended for interactive use in Power BI.
Users can select a specific journal-period in a JSTOR collection and see:
- All other collections where it appears
- Periods covered
- Overlap type and percentage
Identifies two mutually exclusive categories:
- No matches: journals not present in any other collection
- JSTOR matches only: journals present in JSTOR collections but not outside JSTOR
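The two categories could be derived from the list of collections in which a journal was matched, as in this sketch. It assumes, purely for illustration, that JSTOR collections can be recognised by a collection name starting with "JSTOR"; the function names are hypothetical.

```python
def is_jstor(collection_name):
    """Assumption: JSTOR collection names start with 'JSTOR'."""
    return collection_name.strip().upper().startswith("JSTOR")


def at_risk_category(matching_collections):
    """Return the at-risk category for a JSTOR journal, or None if not at risk.

    matching_collections: names of the other collections where the journal appears.
    """
    if not matching_collections:
        return "No matches"
    if all(is_jstor(name) for name in matching_collections):
        return "JSTOR matches only"
    return None  # available outside JSTOR, so not at risk
```

Because the first branch handles the empty list before the `all(...)` check, a journal can never fall into both categories.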
For at-risk journals, the analysis can be enriched with:
- Number of VU publications (last 10 years) from PURE as a proxy for local usage
- SJR indicator as a proxy for journal importance/prestige
Matching is performed using multiple ISSN columns with priority rules.
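One way to read "priority rules" is to try the ISSN columns in a fixed order and take the first hit. The sketch below assumes the print ISSN is tried before the online ISSN and that the enrichment source (PURE counts or SJR values) has been loaded into a dictionary keyed by ISSN; both the order and the function name are assumptions, not the repository's documented behaviour.

```python
def lookup_by_priority(journal, table,
                       columns=("print_identifier", "online_identifier")):
    """Return the first enrichment value found, trying ISSN columns in order.

    journal: dict with ISSN columns; table: dict mapping ISSN -> enrichment value.
    """
    for column in columns:
        issn = (journal.get(column) or "").strip()
        if issn and issn in table:
            return table[issn]
    return None  # no ISSN of this journal is present in the enrichment source
```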
When run as a script, the analysis produces a single Excel workbook:
journal_overlap_analysis_results.xlsx
with the following sheets:
| Sheet name | Description |
|---|---|
| Overall | JSTOR journal-periods with co-occurrence flags and counts |
| JSTOR_InterCollection | Overlap between JSTOR sub-collections |
| JSTOR_vs_NonJSTOR | Overlap between JSTOR and non-JSTOR collections |
| Drilldown | Detailed journal-period match table |
| Unique_with_VU_Pubs | At-risk journals enriched with VU publications and SJR |
| (or) Unique | At-risk journals without enrichment |