I stumbled upon an interesting case today with the latest contribution from TUHH (#93). As it turned out, the file had already been enriched with all relevant metadata (in the old schema, though).
Here are the first 3 entries from the original file:
| institution | period | euro | doi | is_hybrid | publisher | journal_full_title | issn | issn_print | issn_electronic | indexed_in_CrossRef | pmid | pmcid | DOAJ | license_ref |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hamburg TUHH | 2015 | 694,76 | 10.1155/2015/898651 | FALSE | Hindawi Publishing Corporation | Advances in Fuzzy Systems | 1687-711X | 1687-7101 | 1687-711X | TRUE | NA | NA | TRUE | http://creativecommons.org/licenses/by/3.0/ |
| Hamburg TUHH | 2015 | 1248,46 | 10.3390/ma8010285 | FALSE | MPID | Materials | 1996-1944 | NA | 1996-1944 | TRUE | NA | NA | TRUE | http://creativecommons.org/licenses/by/4.0/ |
| Hamburg TUHH | 2015 | 1874,25 | 10.1186/s13568-015-0122-7 | FALSE | Springer Open | AMB Express | 2191-0855 | NA | 2191-0855 | TRUE | 26054736 | PMC4460186 | TRUE | http://creativecommons.org/licenses/by/4.0/ |
And here is the result of the automated metadata enrichment. As you can see, there are several differences in the publisher and issn columns (journal_full_title also often differs, although not in these first 3). On the other hand, the original file contributes much more information in the license_ref column, where Crossref will often report "NA" values.
| "institution" | "period" | "euro" | "doi" | "is_hybrid" | "publisher" | "journal_full_title" | "issn" | "issn_print" | "issn_electronic" | "license_ref" | "indexed_in_crossref" | "pmid" | "pmcid" | "ut" | "url" | "doaj" |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| "Hamburg TUHH" | 2015 | 694.76 | "10.1155/2015/898651" | FALSE | "Hindawi Publishing Corporation" | "Advances in Fuzzy Systems" | "1687-7101" | "1687-7101" | "1687-711X" | "http://creativecommons.org/licenses/by/3.0/" | TRUE | NA | NA | NA | NA | TRUE |
| "Hamburg TUHH" | 2015 | 1248.46 | "10.3390/ma8010285" | FALSE | "MDPI AG" | "Materials" | "1996-1944" | NA | "1996-1944" | NA | TRUE | NA | NA | NA | NA | TRUE |
| "Hamburg TUHH" | 2015 | 1874.25 | "10.1186/s13568-015-0122-7 " | FALSE | "Springer Science + Business Media" | "AMB Express" | "2191-0855" | NA | "2191-0855" | NA | TRUE | "26054736" | "PMC4460186" | NA | NA | TRUE |
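To make the differences between the two versions easier to spot, a per-row comparison could look roughly like the sketch below. It is only an illustration: `diff_columns` is a hypothetical helper, not part of any existing tooling, and the two dictionaries are a hand-copied subset of the second row from the tables above. Column names are lower-cased before comparing because the two schemas differ in capitalisation (`DOAJ` vs. `doaj`), and values are stripped because the enriched output can carry stray whitespace.

```python
def normalise(record):
    """Lower-case column names and strip whitespace from values."""
    return {key.lower(): value.strip() for key, value in record.items()}


def diff_columns(original, enriched):
    """Return {column: (original_value, enriched_value)} for every
    column present in both records whose values differ."""
    a, b = normalise(original), normalise(enriched)
    return {col: (a[col], b[col]) for col in a if col in b and a[col] != b[col]}


# Subset of the second row, copied by hand from the two tables (all values as strings).
original_row = {
    "doi": "10.3390/ma8010285",
    "publisher": "MPID",
    "issn": "1996-1944",
    "DOAJ": "TRUE",
    "license_ref": "http://creativecommons.org/licenses/by/4.0/",
}
enriched_row = {
    "doi": "10.3390/ma8010285",
    "publisher": "MDPI AG",
    "issn": "1996-1944",
    "doaj": "TRUE",
    "license_ref": "NA",
}

differences = diff_columns(original_row, enriched_row)
for column, (old, new) in differences.items():
    print(f"{column}: {old!r} -> {new!r}")
```

For this row the comparison flags only publisher and license_ref, which matches what can be seen by eye in the tables.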
The question is: how should we proceed in such cases? Where should Crossref (or PubMed or DOAJ) imports overwrite existing data?
As for the current task, I decided to replace journal_full_title, publisher and the ISSNs with the Crossref imports but keep the license column.
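Expressed as code, the rule I applied here would be something like the following sketch. The names `CROSSREF_WINS` and `merge_record` are made up for illustration, and the exact set of overwritten columns is my reading of "journal title, publisher and ISSNs" (I am assuming it covers all three ISSN columns). Every column not listed, including license_ref, keeps the value from the submitted file.

```python
# Columns where the Crossref import replaces the submitted value.
# Assumption: "ISSNs" covers issn, issn_print and issn_electronic.
CROSSREF_WINS = (
    "publisher",
    "journal_full_title",
    "issn",
    "issn_print",
    "issn_electronic",
)


def merge_record(original, crossref):
    """Start from the submitted record and overwrite only the columns
    listed in CROSSREF_WINS; everything else (e.g. license_ref) is kept."""
    merged = dict(original)
    for column in CROSSREF_WINS:
        if column in crossref:
            merged[column] = crossref[column]
    return merged


# Subset of the third row, copied by hand from the two tables.
original_row = {
    "doi": "10.1186/s13568-015-0122-7",
    "publisher": "Springer Open",
    "journal_full_title": "AMB Express",
    "issn": "2191-0855",
    "license_ref": "http://creativecommons.org/licenses/by/4.0/",
}
crossref_row = {
    "doi": "10.1186/s13568-015-0122-7",
    "publisher": "Springer Science + Business Media",
    "journal_full_title": "AMB Express",
    "issn": "2191-0855",
    "license_ref": "NA",
}

merged = merge_record(original_row, crossref_row)
print(merged["publisher"])    # taken from Crossref
print(merged["license_ref"])  # kept from the submitted file
```

Keeping the rule as a plain list of column names would make it easy to adjust once we agree on which source should win for which column.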