Journal Overlap Analysis for JSTOR Collections

This repository contains a Python-based analysis to assess journal coverage overlap between JSTOR collections and other library collections, based on subscription metadata.

The input data represents journal subscriptions over time, where each row corresponds to a journal × period × collection triple.

Using journal identifiers (print and online ISSNs), the analysis examines whether the same journal-period appears in other JSTOR collections or in non-JSTOR collections, with full, partial, or complementary temporal coverage.

Special attention is given to journals that would be lost if JSTOR access were reduced or cancelled.

Input Data

A TSV file containing all journal subscriptions held by the library. Each row must represent a single subscription period in a specific collection.

Required columns:

| Column name | Description |
| --- | --- |
| `oclc_collection_name` | Name of the collection (used to identify JSTOR and sub-collections) |
| `publication_title` | Journal title (may vary slightly across collections) |
| `print_identifier` | Print ISSN (used for matching) |
| `online_identifier` | Online ISSN (used for matching) |
| `date_first_issue_online` | Start of coverage period |
| `date_last_issue_online` | End of coverage period (missing = assumed ongoing) |

Notes:

Journals are matched using both identifiers, with fallback logic if identifiers are swapped across sources.
Only records marked as "fulltext" are used for the analysis.
Missing end dates are treated as coverage through 2026.
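
The notes above can be illustrated with a minimal sketch using only the standard library. It is not the repository's actual loader: the function names, the date format (a string starting with a four-digit year), and the decision to skip rows without a start date are assumptions made for illustration. The "fulltext" filter is left out because the column that carries that flag is not named here. The collection names and ISSNs in the sample are made up.

```python
import csv
import io

ASSUMED_END_YEAR = 2026  # missing end dates are treated as coverage through 2026


def parse_year(value):
    """Return the leading four-digit year of a date string, or None if absent."""
    value = (value or "").strip()
    return int(value[:4]) if value[:4].isdigit() else None


def load_subscriptions(tsv_text):
    """Parse TSV text into one dict per subscription period."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        start = parse_year(row["date_first_issue_online"])
        if start is None:
            continue  # assumption: rows without a start date cannot be compared
        end = parse_year(row["date_last_issue_online"]) or ASSUMED_END_YEAR
        issns = {row["print_identifier"].strip(), row["online_identifier"].strip()}
        rows.append({
            "collection": row["oclc_collection_name"],
            "title": row["publication_title"],
            "issns": issns - {""},  # drop empty identifiers
            "start": start,
            "end": end,
        })
    return rows


def same_journal(a, b):
    """True if two records share any ISSN, even if print/online are swapped."""
    return bool(a["issns"] & b["issns"])


# Illustrative sample: the second source lists the online ISSN as print ISSN.
sample = (
    "oclc_collection_name\tpublication_title\tprint_identifier\t"
    "online_identifier\tdate_first_issue_online\tdate_last_issue_online\n"
    "JSTOR Collection A\tExample Journal\t1234-5678\t8765-4321\t1950-01-01\t\n"
    "Publisher X\tExample Jnl\t8765-4321\t\t1990\t2005-12-31\n"
)
records = load_subscriptions(sample)
print(records[0]["end"], same_journal(records[0], records[1]))
```

Comparing the two ISSNs as a set is one simple way to tolerate swapped identifiers; the actual fallback logic may be stricter.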

Methodology

Period overlap classification

For each journal-period pair, matches are classified as:

Full: the comparison period completely covers the source period
Partial: periods overlap, but coverage is incomplete
Complementary: the periods do not overlap (they are adjacent or otherwise disjoint)

Overlap percentages (share of years covered) are also calculated.
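
A minimal sketch of this classification is shown below. The function name is hypothetical, and it assumes periods are expressed as inclusive start and end years, so that the overlap percentage is the share of source years that the comparison period covers.

```python
def classify_overlap(src_start, src_end, cmp_start, cmp_end):
    """Classify how a comparison period covers a source period.

    Years are treated as inclusive (assumption).
    Returns (overlap_type, percent_of_source_years_covered).
    """
    overlap_start = max(src_start, cmp_start)
    overlap_end = min(src_end, cmp_end)
    if overlap_start > overlap_end:
        # No shared years: the periods are adjacent or disjoint.
        return "complementary", 0.0
    source_years = src_end - src_start + 1
    covered_years = overlap_end - overlap_start + 1
    pct = round(100.0 * covered_years / source_years, 1)
    kind = "full" if covered_years == source_years else "partial"
    return kind, pct


print(classify_overlap(1990, 1999, 1985, 2005))  # ('full', 100.0)
print(classify_overlap(1990, 1999, 1995, 2005))  # ('partial', 50.0)
print(classify_overlap(1990, 1999, 2000, 2010))  # ('complementary', 0.0)
```

Note that the classification is directional: a period that fully covers another is itself only partially covered by it.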

Analysis

1. Overall co-occurrences

For each JSTOR journal-period pair (excluding Books):

Check for matches in all other collections
Flag whether it has:
- any match
- at least one full match
- at least one partial match
- at least one complementary match
Count total matches by overlap type
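
The flags and counts for a single JSTOR journal-period could be derived roughly as follows. This is a sketch under stated assumptions: records are plain dicts with hypothetical keys (`collection`, `issns`, `start`, `end`), journals match when they share any ISSN, and the names in the example are illustrative.

```python
from collections import Counter


def overlap_type(src, other):
    """Overlap of `other` relative to `src`, using inclusive years."""
    if max(src["start"], other["start"]) > min(src["end"], other["end"]):
        return "complementary"
    if other["start"] <= src["start"] and other["end"] >= src["end"]:
        return "full"
    return "partial"


def summarize_matches(source, others):
    """Co-occurrence flags and counts for one journal-period."""
    counts = Counter(
        overlap_type(source, o)
        for o in others
        if o["collection"] != source["collection"] and source["issns"] & o["issns"]
    )
    return {
        "any_match": sum(counts.values()) > 0,
        "has_full": counts["full"] > 0,
        "has_partial": counts["partial"] > 0,
        "has_complementary": counts["complementary"] > 0,
        "n_full": counts["full"],
        "n_partial": counts["partial"],
        "n_complementary": counts["complementary"],
    }


source = {"collection": "JSTOR A", "issns": {"1111-1111"}, "start": 1990, "end": 1999}
others = [
    {"collection": "JSTOR B", "issns": {"1111-1111"}, "start": 1980, "end": 2005},
    {"collection": "Publisher X", "issns": {"1111-1111"}, "start": 1995, "end": 2026},
    {"collection": "Publisher Y", "issns": {"2222-2222"}, "start": 1990, "end": 1999},
]
summary = summarize_matches(source, others)
print(summary)
```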

2. JSTOR inter-collection overlap

Analyzes overlap between JSTOR sub-collections themselves.

For each pair of JSTOR collections:

Counts how many unique journal-periods co-occur
Distinguishes full, partial, and complementary coverage
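
One possible way to produce these pairwise counts is sketched below. It assumes JSTOR collections can be recognised by a "JSTOR" name prefix and that pairs are counted in both directions, since the overlap type depends on which collection is taken as the source. All names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def is_jstor(name):
    # Assumption: JSTOR collections are recognisable by a "JSTOR" name prefix.
    return name.upper().startswith("JSTOR")


def overlap_type(src, other):
    """Overlap of `other` relative to `src`, using inclusive years."""
    if max(src["start"], other["start"]) > min(src["end"], other["end"]):
        return "complementary"
    if other["start"] <= src["start"] and other["end"] >= src["end"]:
        return "full"
    return "partial"


def pairwise_jstor_overlap(records):
    """Count unique source journal-periods per (source, comparison, type)."""
    jstor = [r for r in records if is_jstor(r["collection"])]
    seen = defaultdict(set)
    for src, other in permutations(jstor, 2):
        if src["collection"] == other["collection"]:
            continue
        if not (src["issns"] & other["issns"]):
            continue
        key = (src["collection"], other["collection"], overlap_type(src, other))
        # A set keeps each source journal-period only once per key.
        seen[key].add((frozenset(src["issns"]), src["start"], src["end"]))
    return {key: len(periods) for key, periods in seen.items()}


records = [
    {"collection": "JSTOR A", "issns": {"1111-1111"}, "start": 1990, "end": 1999},
    {"collection": "JSTOR B", "issns": {"1111-1111"}, "start": 1980, "end": 2005},
    {"collection": "Publisher X", "issns": {"1111-1111"}, "start": 1990, "end": 1999},
]
pair_counts = pairwise_jstor_overlap(records)
print(pair_counts)
```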

3. JSTOR vs non-JSTOR overlap

For each JSTOR collection:

Identifies journals that also appear in non-JSTOR collections
Classifies overlap type

4. Detailed drill-down table

Creates a fully expanded match table intended for interactive use in Power BI.

Users can select a specific journal-period in a JSTOR collection and see:

All other collections where it appears
Periods covered
Overlap type and percentage

5. Unique and at-risk journals

Identifies two mutually exclusive categories:

No matches: journals not present in any other collection
JSTOR matches only: journals that also appear in other JSTOR collections but in no non-JSTOR collection
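
The two categories can be sketched as a small classification function. The sketch assumes that any ISSN match counts regardless of overlap type and that JSTOR collections are recognised by a "JSTOR" name prefix; both are assumptions, as are the function and key names.

```python
def is_jstor(name):
    # Assumption: JSTOR collections are recognisable by a "JSTOR" name prefix.
    return name.upper().startswith("JSTOR")


def at_risk_category(source, others):
    """Return 'no_matches', 'jstor_matches_only', or None if held outside JSTOR."""
    matches = [
        o for o in others
        if o["collection"] != source["collection"] and source["issns"] & o["issns"]
    ]
    if not matches:
        return "no_matches"
    if all(is_jstor(o["collection"]) for o in matches):
        return "jstor_matches_only"
    return None  # also available outside JSTOR, so not at risk


source = {"collection": "JSTOR A", "issns": {"1111-1111"}}
only_jstor = [{"collection": "JSTOR B", "issns": {"1111-1111"}}]
outside = only_jstor + [{"collection": "Publisher X", "issns": {"1111-1111"}}]

print(at_risk_category(source, []))          # no_matches
print(at_risk_category(source, only_jstor))  # jstor_matches_only
print(at_risk_category(source, outside))     # None
```

Because the first test returns before the second is evaluated, a journal can never fall into both categories.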

6. Enrichment: VU publications and SJR

For at-risk journals, the analysis can be enriched with:

Number of VU publications (last 10 years) from PURE as a proxy for local usage
SJR indicator as a proxy for journal importance/prestige

Matching is performed using multiple ISSN columns with priority rules.
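
As an illustration of priority-based ISSN matching, the sketch below looks up a journal in an external list by trying its print ISSN first and its online ISSN second. The priority order, the field names `issn_1`, `issn_2` and `sjr`, and the sample values are assumptions for the example, not the repository's actual rules or data.

```python
def build_issn_index(rows, issn_fields):
    """Map every non-empty ISSN in `rows` to the row that carries it."""
    index = {}
    for row in rows:
        for field in issn_fields:
            issn = (row.get(field) or "").strip()
            if issn:
                index.setdefault(issn, row)  # first row wins on duplicates
    return index


def enrich(journal, index, priority=("print_identifier", "online_identifier")):
    """Attach the SJR value found via the first identifier that matches."""
    for field in priority:
        hit = index.get((journal.get(field) or "").strip())
        if hit is not None:
            return {**journal, "sjr": hit["sjr"], "matched_on": field}
    return {**journal, "sjr": None, "matched_on": None}


# Made-up external row and journal, for illustration only.
sjr_rows = [{"issn_1": "8765-4321", "issn_2": "", "sjr": 1.25}]
index = build_issn_index(sjr_rows, ("issn_1", "issn_2"))
journal = {
    "publication_title": "Example Journal",
    "print_identifier": "1234-5678",
    "online_identifier": "8765-4321",
}
enriched = enrich(journal, index)
print(enriched["sjr"], enriched["matched_on"])
```

Recording which identifier produced the match makes it easier to audit journals that were matched only through the fallback column.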

Outputs

When run as a script, the analysis produces a single Excel workbook:

`journal_overlap_analysis_results.xlsx`

with the following sheets:

| Sheet name | Description |
| --- | --- |
| `Overall` | JSTOR journal-periods with co-occurrence flags and counts |
| `JSTOR_InterCollection` | Overlap between JSTOR sub-collections |
| `JSTOR_vs_NonJSTOR` | Overlap between JSTOR and non-JSTOR collections |
| `Drilldown` | Detailed journal-period match table |
| `Unique_with_VU_Pubs` | At-risk journals enriched with VU publications and SJR |
| `Unique` | Alternative to `Unique_with_VU_Pubs`: at-risk journals without enrichment |
