ScraReuters

A crawler for Reuters Archives.

Usage

Make sure scrapy and nltk are installed.

Run ./ScraReuters.py fetch to fetch the news. The articles are saved as JSON in the folder reuters in the current directory.
Run ./ScraReuters.py show to print some statistics on the sectors of the fetched news.
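
The fetched files can also be inspected directly. The following is a minimal sketch, not the code behind the show command. It assumes one JSON document per *.json file in the reuters folder, and the field name "sector" is a hypothetical placeholder, since the README does not document the JSON schema.

```python
import json
from collections import Counter
from pathlib import Path


def load_articles(folder="reuters"):
    """Load every *.json file in `folder` (assumption: one JSON document per file)."""
    articles = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            articles.append(json.load(handle))
    return articles


def count_by_key(articles, key="sector"):
    """Count articles per value of `key`; "sector" is a hypothetical field name."""
    return Counter(a.get(key, "unknown") for a in articles if isinstance(a, dict))


if __name__ == "__main__":
    for value, count in count_by_key(load_articles()).most_common():
        print(value, count)
```

If the saved files use a different layout or field name, adjust the glob pattern and the key argument accordingly.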

During fetching, an output line such as the law firm of levi & korsinsky , llp announces investigation into possible breaches of fiduciary duty by the board of nyse euronext , inc . in connection with the sale of the company to intercontinentalexchange , inc . ['nyx'] means that the headline was matched to the stock symbol 'nyx'. The match may be inaccurate.
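
If the fetch output is captured for later processing, each line can be split into the headline and its symbol list. This is a rough sketch that assumes every line ends with a Python-style list literal, as in the example above. The function name split_output_line is hypothetical and not part of ScraReuters.

```python
import ast


def split_output_line(line):
    """Split "headline ['sym', ...]" into (headline, symbols).

    Lines without a trailing list literal are returned with an empty list.
    """
    stripped = line.strip()
    start = stripped.rfind("[")
    if start == -1 or not stripped.endswith("]"):
        return stripped, []
    try:
        symbols = ast.literal_eval(stripped[start:])
    except (ValueError, SyntaxError):
        return stripped, []
    if not isinstance(symbols, list):
        return stripped, []
    return stripped[:start].strip(), [str(s) for s in symbols]


if __name__ == "__main__":
    headline, symbols = split_output_line("nyse euronext , inc . ['nyx']")
    print(headline, symbols)
```

ast.literal_eval is used instead of eval so that only literal values are parsed from the captured output.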

Issues

Since the Reuters website was updated, the original identification of stock symbols no longer works. A simple manual detection is used instead, which often makes mistakes. The rules are defined in ScraReuters/static/company.csv.
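
To illustrate how rule-based detection of this kind can work, here is a minimal sketch. It is not the project's actual implementation. The "symbol,keyword" row layout is an assumption, since the real format of company.csv is not described here, and the second sample rule is invented for illustration.

```python
import csv
import io

# Hypothetical rules, one "symbol,keyword" pair per row. The real layout of
# ScraReuters/static/company.csv may differ.
SAMPLE_RULES = """nyx,nyse euronext
xmpl,example corp
"""


def load_rules(text):
    """Parse 'symbol,keyword' rows into a list of lowercased (symbol, keyword) pairs."""
    rules = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2:
            rules.append((row[0].strip().lower(), row[1].strip().lower()))
    return rules


def detect_symbols(title, rules):
    """Return symbols whose keyword occurs as a substring of the lowercased title."""
    lowered = title.lower()
    found = []
    for symbol, keyword in rules:
        if keyword and keyword in lowered and symbol not in found:
            found.append(symbol)
    return found


if __name__ == "__main__":
    rules = load_rules(SAMPLE_RULES)
    print(detect_symbols("the board of nyse euronext , inc .", rules))
```

Plain substring matching like this produces false positives whenever a keyword appears inside an unrelated word or phrase, which is one plausible source of the mistakes mentioned above.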

