Web Scrape app using Node X-Ray and MongoDB.
Our test scrapes Yahoo News and imports the content into a MongoDB database.
Quick setup:
- Clone the repo
- cd into scrape-app
- npm install
- node app.js
Bam! Now you'll have Yahoo News scraped and imported into your MongoDB. Enjoy!
To crawl more pages, you can change the limit() on line 29 of app.js to your desired setting.
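For orientation, here is a minimal sketch of how a page limit is typically wired up with x-ray's paginate() and limit(). It is not a copy of app.js: the URL, the CSS selectors, and the scrapeArticles helper are placeholders made up for illustration, and the x-ray factory is passed in as an argument so the sketch does not depend on how app.js loads it.

```javascript
// Sketch only: the URL and selectors are placeholders, not the ones in app.js.
// `Xray` is the factory exported by the x-ray package, supplied by the caller.
function scrapeArticles(Xray, maxPages, done) {
  const x = Xray();

  // Collect a title and link from each list item on the page.
  const query = x('https://news.yahoo.com/', 'li', [
    { title: 'h3', link: 'a@href' }
  ])
    .paginate('a.next@href') // follow the "next page" link found on each page
    .limit(maxPages);        // stop after this many pages

  // Run the query; done receives (err, articles).
  query(done);
}
```

With x-ray installed, a call might look like scrapeArticles(require('x-ray'), 3, callback). Raising the number passed to limit() is what lets the crawl follow more pages.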
The app writes to MongoDB with insert(). To avoid importing duplicate content, you can make the title field unique by running the following command in the MongoDB shell.
db.getCollection("scraped-articles").createIndex( { "title": 1 }, { unique: true });
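Once the unique index exists, inserting an article whose title is already stored fails with MongoDB's duplicate key error (code 11000). The sketch below shows one way the app could treat that error as "already imported" rather than a failure. The insertIfNew helper is hypothetical and not part of app.js, and it assumes a collection object with a callback-style insert(doc, cb), as in older versions of the Node MongoDB driver.

```javascript
// MongoDB reports a unique-index violation with error code 11000.
const DUPLICATE_KEY = 11000;

// Hypothetical helper, not part of app.js.
// `collection` is assumed to expose a callback-style insert(doc, cb).
function insertIfNew(collection, article, done) {
  collection.insert(article, function (err) {
    if (err && err.code === DUPLICATE_KEY) {
      // The title is already stored: report a skip instead of an error.
      return done(null, { inserted: false, title: article.title });
    }
    if (err) {
      // Any other error is passed through unchanged.
      return done(err);
    }
    done(null, { inserted: true, title: article.title });
  });
}
```

Calling insertIfNew for each scraped article lets a re-run of the scraper continue past articles it has already imported.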