The purpose of this project is to recreate the Word Power paper by Jegadeesh and Wu. A documented Jupyter Notebook in `notebooks/Word Power.ipynb` can be run to generate results, but it cannot use multiprocessing, so it is slower than the main program. To take advantage of multiprocessing, run the main program via `$ python main.py`.
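As a rough illustration of the multiprocessing speed-up, 10-K filings can be analyzed in parallel with Python's `multiprocessing` module. The sketch below is hypothetical: `analyze_filing` is only a stand-in for the per-filing work that the actual program delegates to the `WordPower` class.

```python
# Hypothetical sketch of parallel 10-K processing; analyze_filing is only a
# stand-in for the real per-filing analysis, not the project's actual code.
import multiprocessing
from pathlib import Path

def analyze_filing(path):
    # Placeholder work: count the words in a filing.
    text = Path(path).read_text(errors="ignore")
    return path, len(text.split())

if __name__ == "__main__":
    filings = [str(p) for p in Path("SEC-Edgar-data").rglob("*.txt")]
    with multiprocessing.Pool() as pool:
        for path, word_count in pool.imap_unordered(analyze_filing, filings):
            print(path, word_count)
```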
Because Python and Redis need to load the data into system memory (RAM), the program can be very memory intensive. At least 16 GB of RAM is required to run the full program (years 1995-2008). Running the program also requires a working Redis installation. On Windows, Redis can be installed through Chocolatey via `C:\> choco install redis-64`. On macOS, you can install Redis via `$ brew install redis`. On Linux or another Unix, you should be able to install Redis through your package manager or compile it from source.
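Before starting a long run, it can be worth verifying that the Redis server is up and reachable from Python. A minimal check using the `redis` package might look like the following, assuming the default host, port, and database.

```python
# Quick sanity check that a local Redis server is reachable (assumes the
# default localhost:6379, db 0; adjust if your setup differs).
import redis

client = redis.StrictRedis(host="localhost", port=6379, db=0)
try:
    client.ping()
    print("Redis is running and reachable.")
except redis.ConnectionError as exc:
    print("Could not connect to Redis:", exc)
```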
The program requires the following software and packages to run. First, you need Python 3.5.2 installed. Next, install all Python dependencies by running `$ pip install -r requirements.txt`. To get the lxml package installed on Windows, it may be necessary to install the .whl file located in the project's `lib` directory via `C:\> pip install lib/lxml-3.6.4-cp35-cp35m-win_amd64.whl`. We had some issues with the third-party SECEdgar package and had to modify it to get it to work properly. Once the package is installed via pip, you can copy our version from the `lib/SECEdgar` folder over the pip-installed version if needed.
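An optional way to confirm the environment is set up correctly is to check the interpreter version and try importing the packages mentioned above; the snippet below is only a convenience check, not part of the project.

```python
# Optional environment check: confirm the Python version and that the key
# third-party packages import cleanly.
import importlib
import sys

if sys.version_info[:3] != (3, 5, 2):
    print("Warning: Python 3.5.2 is expected, found", sys.version.split()[0])

for package in ("lxml", "redis"):  # extend with other entries from requirements.txt
    try:
        importlib.import_module(package)
        print(package, "imported OK")
    except ImportError as exc:
        print(package, "is missing:", exc)
```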
The project is structured as follows:
- `data` - This folder contains the data needed to run the analysis. The merged CRSP and Compustat data file is too large to include in the project, so it is necessary to run the SAS program (`CRSP+Comp.sas`) to generate the `crsp_comp.sas7bdat` data file first (a short pandas sketch for reading this file follows the list).
- `data/_amended` - This folder holds 10-K files that are amended 10-Ks.
- `data/_error` - This folder holds 10-K files that contained errors that made them impossible to analyze.
- `data/_nostockdata` - This folder contains 10-K files for which we had no stock information for the company on the filing date.
- `data/_outofrange` - This folder contains 10-K files that fall outside of the date range we are examining.
- `data/SEC-Edgar-data` - This folder is created by the Jupyter Notebook program to hold the downloaded 10-K files.
- `lib` - This folder contains library files that may be needed or helpful.
- `notebooks` - This folder contains the Jupyter Notebook file used in development of the algorithm.
- `SEC-Edgar-data` - This folder contains the 10-K files downloaded by the `main.py` program.
- `CRSP+Comp.egp` - The SAS Enterprise Guide project file that can be used to generate the CRSP and Compustat data.
- `CRSP+Comp.sas` - The SAS code file that can be run to generate the CRSP and Compustat data.
- `main.py` - The main file that runs the program version of the algorithm.
- `requirements.txt` - This file lists all dependencies, which can be installed via `$ pip install -r requirements.txt`.
- `Word Power - A New Approach for Content Analysis.pdf` - The PDF version of the paper we are recreating.
- `WordPower.py` - The Python file containing the `WordPower` class that `main.py` uses.
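As a quick check that the generated `crsp_comp.sas7bdat` file is usable from Python, it can be read with pandas' SAS reader. This is only a sketch and assumes pandas is available; the actual columns depend on the output of `CRSP+Comp.sas`.

```python
# Sanity check for the SAS output: read data/crsp_comp.sas7bdat with pandas.
# Assumes pandas is installed; the exact columns depend on CRSP+Comp.sas.
import pandas as pd

crsp_comp = pd.read_sas("data/crsp_comp.sas7bdat", format="sas7bdat")
print(crsp_comp.shape)
print(crsp_comp.head())
```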
For any issues, or if you have trouble obtaining the correct data, please contact Andrew Jarrett at andrew.jarrett@gatech.edu.