
Missing XML files in 990 bucket #7

@elijahgoldberg

Description


Hi there, I am seeing a large number of missing XML files and am wondering whether the problem is on my end or in the S3 bucket. Of 300 objects I tested, 241 (80%) returned a 404 error:

In download.file(url, file_path, quiet = TRUE)
cannot open URL 'https://s3.amazonaws.com/irs-form-990/201723259349300000_public.xml': HTTP status was '404 Not Found'
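One way to tell a missing object apart from a client-side problem is to check each key's HTTP status directly. The following is a minimal Python sketch, not tooling that ships with the bucket. It assumes the object IDs come from the index and that every XML file follows the URL pattern in the failing request above. It also sends a HEAD request so the file body is never downloaded. `head_status` and `find_missing` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable

# URL pattern copied from the failing request above; assumed to hold for every object ID.
BASE_URL = "https://s3.amazonaws.com/irs-form-990/{object_id}_public.xml"


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to `url` (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; keep the status code instead of failing.
        return err.code


def find_missing(
    object_ids: Iterable[str],
    status_fn: Callable[[str], int] = head_status,
) -> tuple[list[str], float]:
    """Return the IDs whose XML URL answers 404, plus the fraction of IDs that are missing."""
    ids = list(object_ids)
    missing = [oid for oid in ids if status_fn(BASE_URL.format(object_id=oid)) == 404]
    rate = len(missing) / len(ids) if ids else 0.0
    return missing, rate
```

Because `status_fn` is a parameter, the network call can be replaced with a stub when testing the tally logic offline. A run over the same 300 IDs should reproduce the 241 misses if the objects are truly absent from the bucket.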

When I try to access one of these files through the browser, I see this message:

Code: NoSuchKey
Message: The specified key does not exist.
Key: 201700379349300224_public.xml
RequestId: CFAMJNK6P9V903JZ
HostId: TkCHSmCi+R7gHMceu2ZZp3jIrEGWT3IQlmAXn28iLU7S0pfltJ/Bvz+TZeGhhhBkEOT95EI3BQA=
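The browser message above is the body of S3's XML error response. Reading it programmatically makes it possible to separate a `NoSuchKey` (the object is absent) from other failures such as access being denied. The following is a minimal Python sketch that assumes the standard S3 `<Error>` layout with child elements `Code`, `Message`, `Key`, `RequestId`, and `HostId`. `parse_s3_error` is a hypothetical helper, and the sample body is rebuilt from the values in the message above.

```python
import xml.etree.ElementTree as ET


def parse_s3_error(body: str) -> dict[str, str]:
    """Map each child tag of an S3 <Error> document to its text (hypothetical helper)."""
    root = ET.fromstring(body)
    return {child.tag: (child.text or "") for child in root}


# Sample body rebuilt from the browser message above, assuming the usual S3 <Error> layout.
SAMPLE = (
    "<Error><Code>NoSuchKey</Code>"
    "<Message>The specified key does not exist.</Message>"
    "<Key>201700379349300224_public.xml</Key>"
    "<RequestId>CFAMJNK6P9V903JZ</RequestId>"
    "<HostId>TkCHSmCi+R7gHMceu2ZZp3jIrEGWT3IQlmAXn28iLU7S0pfltJ/Bvz+TZeGhhhBkEOT95EI3BQA=</HostId>"
    "</Error>"
)

if __name__ == "__main__":
    fields = parse_s3_error(SAMPLE)
    print(fields["Code"], fields["Key"])
```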

3 sample keys that are in the index and return XML

5 sample keys that are in the index but return 404

For context, I tried a similar exercise about 2 years ago and anecdotally remember the occasional missing XML file, but not close to 80%.

Any ideas about what is going on? Thanks!
