- Download and install Python (don't use the Windows Store version -- it installs to a weird path)
- Add to path (I had to place it at the front of the path)
- Restart powershell
- This should now find it:
get-command python
- Install pyenv-win:
python -m pip install pyenv-win
- Run the above again to see where it was installed. Create an environment variable PYENV_HOME pointing to this. Add %PYENV_HOME%\bin to the path.
- Modify powershell execution policy to allow running scripts
- Create an environment variable POETRY_HOME. Initialize it to where you want this installed. Also, add %POETRY_HOME%\bin to the path.
- Install poetry:
(iwr https://install.python-poetry.org/ -UseBasicParsing).Content | python -
- No need to restart terminal:
poetry --version
- Enable powershell to run scripts. Check the current policy:
Get-ExecutionPolicy
- In an elevated powershell:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned
- cd into the directory with your project
- Create a pyproject.toml:
poetry init
poetry config virtualenvs.path "$(pwd)/.venv"
- Create the environment:
poetry env use $(get-command python)
- Download git for windows
- Download spark pre-built for apache hadoop 2.7
- Verify the download against the published SHA512 checksum:
certUtil -hashfile ~\Downloads\spark... SHA512
- Download 7-zip and use it to decompress and untar this into a folder.
- Create environment variable SPARK_HOME with the directory you untarred to.
- Clone winutils:
git clone https://github.com/steveloughran/winutils.git
- Create environment variable HADOOP_HOME with the full path to winutils/hadoop-2.7.1.
- You actually only need the files winutils.exe and hadoop.dll from the above.
- Add %HADOOP_HOME%\bin to the path.
- Add the packages to the project and install:
poetry add pyspark
poetry add pandas
poetry install
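
Not part of the original steps, but with this many environment variables and path entries it is easy to miss one. Below is a minimal sketch of a sanity check, assuming the four variables named above (PYENV_HOME, POETRY_HOME, SPARK_HOME, HADOOP_HOME) and the three bin folders these notes add to the path. The function name `check_setup` and the constant names are made up for illustration; adjust the lists if your layout differs.

```python
import os
from pathlib import Path

# Variables created in the steps above (names taken from these notes).
EXPECTED_VARS = ["PYENV_HOME", "POETRY_HOME", "SPARK_HOME", "HADOOP_HOME"]
# Variables whose bin folder the notes say to add to the path.
BIN_ON_PATH = ["PYENV_HOME", "POETRY_HOME", "HADOOP_HOME"]
# The only two files needed from the winutils checkout.
HADOOP_FILES = ["winutils.exe", "hadoop.dll"]


def _norm(path):
    """Normalize a path so comparisons ignore case (on Windows) and trailing slashes."""
    return os.path.normcase(os.path.normpath(path))


def check_setup(env=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    path_entries = {_norm(p) for p in env.get("PATH", "").split(os.pathsep) if p}

    # Every variable must be set and point to an existing folder.
    for name in EXPECTED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing folder: {value}")

    # The bin folder of selected variables must appear on the path.
    for name in BIN_ON_PATH:
        value = env.get(name)
        if value:
            bin_dir = os.path.join(value, "bin")
            if _norm(bin_dir) not in path_entries:
                problems.append(f"{bin_dir} is not on the path")

    # winutils.exe and hadoop.dll must sit in HADOOP_HOME's bin folder.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        for filename in HADOOP_FILES:
            if not (Path(hadoop_home) / "bin" / filename).is_file():
                problems.append(f"{filename} not found in HADOOP_HOME\\bin")

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("PROBLEM:", problem)
```

Save it to a file and run it with python from a fresh terminal (so the new variables are picked up). No output means every check passed. It only looks at variables and files; it does not start Spark.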