Word count example

This example project will count words in a given text and plot a bar chart of the 10 most common words.

Inspired by and derived from https://github.com/coderefinery/word-count which is distributed under Creative Commons Attribution license (CC-BY 4.0).

Install dependencies

Python virtual environment

Create a virtual environment

python3 -m venv venv

and activate it. On Unix (macOS and Linux) you do

. venv/bin/activate

and on Windows you do

.\venv\Scripts\activate

Next you can install the dependencies

python3 -m pip install -r requirements.txt

Conda environment

Install the conda environment using

conda env create -f environment.yml

and activate the environment

conda activate word-count

Exercises

Exercise 1

Create an automated workflow to count all words in the data folder and save the results to a new directory called results using the script count.py.
Also, use the script plot.py to create a figure for each dataset and save it in a folder called figures.

Note: Make sure to use appropriate names for the results and figures.
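
One way to structure such a workflow is a small Python driver that loops over the data files. The sketch below is only an illustration and rests on assumptions: that count.py and plot.py live in the code folder and each take an input path followed by an output path, that the datasets are .txt files, and that the .dat and .png extensions are suitable. The helper names build_commands and run_all, and the file name book.txt used in the comments, are hypothetical. Check the actual command-line interface of the scripts before reusing it.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(datafile, resultdir, figdir, codedir=Path("code")):
    """Return the count and plot commands for one dataset.

    Output files reuse the stem of the input file, so data/book.txt
    becomes results/book.dat and figures/book.png.
    """
    datafile = Path(datafile)
    result = Path(resultdir) / f"{datafile.stem}.dat"
    figure = Path(figdir) / f"{datafile.stem}.png"
    # ASSUMPTION: both scripts take "<input> <output>" as positional arguments.
    count_cmd = [sys.executable, str(codedir / "count.py"), str(datafile), str(result)]
    plot_cmd = [sys.executable, str(codedir / "plot.py"), str(result), str(figure)]
    return count_cmd, plot_cmd


def run_all(datadir="data", resultdir="results", figdir="figures"):
    """Count words and create a figure for every text file in datadir."""
    Path(resultdir).mkdir(parents=True, exist_ok=True)
    Path(figdir).mkdir(parents=True, exist_ok=True)
    for datafile in sorted(Path(datadir).glob("*.txt")):
        # Run count.py first, then plot.py on its output; stop on the first failure.
        for cmd in build_commands(datafile, resultdir, figdir):
            subprocess.run(cmd, check=True)
```

Calling run_all() from the root of the repo would then process every dataset in turn. Deriving the output names from the input stem keeps results and figures easy to match to their dataset.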

Solution: https://github.com/finsberg/word-count/tree/exercise-1

Exercise 2

Create a test that verifies the implementation of count_words in count.py. Add the test in a new folder called tests.

Run the test(s) with pytest.

Tip 1: You can add the code directory to your Python path in your test using the following snippet

from pathlib import Path
import sys

here = Path(__file__).parent
sys.path.append((here / ".." / "code").as_posix())

Tip 2: You can run pytest using

python3 -m pytest
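
A test for this exercise could look like the sketch below. It is written under assumptions: the real signature and return type of count_words in count.py are not shown here, so the example defines a hypothetical stand-in that takes a string and returns a dict mapping lowercase words to their counts. In your own test you would import count_words from count.py (using the snippet from Tip 1) instead of the stand-in, and adapt the assertions to what it actually returns.

```python
from collections import Counter
import re


def count_words(text):
    """HYPOTHETICAL stand-in for count_words in count.py (signature assumed)."""
    # Lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return dict(Counter(words))


def test_count_words_counts_repeated_words():
    counts = count_words("The cat and the hat. The end!")
    assert counts["the"] == 3
    assert counts["cat"] == 1
    assert "dog" not in counts


def test_count_words_empty_text():
    assert count_words("") == {}
```

Saved as, for example, tests/test_count.py, these functions are collected automatically by pytest because their names start with test_.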

Solution: https://github.com/finsberg/word-count/tree/exercise-2

Exercise 3

Create a GitHub action to run the test every time you push to the repo.

Create a folder called .github/workflows and add the file tests.yml to it with the following content.

# Simple workflow for running the tests on every push
name: Run tests

on: [push]

jobs:
  run:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: python3 -m pip install -r requirements.txt

      - name: Run tests
        run: python3 -m pytest tests

Solution: https://github.com/finsberg/word-count/tree/exercise-3

Exercise 4

Create a GitHub workflow for running the full analysis and uploading the results and figures as an artifact. Create a new file .github/workflows/reproduce_results.yml with the following content

# Simple workflow for running the full analysis and uploading the output as an artifact
name: Reproduce results

on: [push]

jobs:
  run:
    runs-on: ubuntu-22.04

    env:
      # Input data directory and output directories (everything under ./artifacts is uploaded)
      DATADIR: ./data
      FIGDIR: ./artifacts/figures
      RESULTDIR: ./artifacts/results


    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: python3 -m pip install -r requirements.txt

      - name: Run all experiments
        run: python3 run_all_experiments.py

      - name: Upload artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          path: ./artifacts
          if-no-files-found: error
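
The workflow above sets DATADIR, FIGDIR and RESULTDIR as environment variables, so run_all_experiments.py can pick them up at run time. How the actual script does this is not shown here; the following is a minimal sketch, in which the helper name get_directories and the local default values are assumptions.

```python
import os
from pathlib import Path


def get_directories(environ=os.environ):
    """Read the data, figure and result directories from the environment.

    The defaults are assumptions meant for running locally,
    where the variables are usually not set.
    """
    datadir = Path(environ.get("DATADIR", "data"))
    figdir = Path(environ.get("FIGDIR", "figures"))
    resultdir = Path(environ.get("RESULTDIR", "results"))
    # Create the output directories so later steps can write into them.
    figdir.mkdir(parents=True, exist_ok=True)
    resultdir.mkdir(parents=True, exist_ok=True)
    return datadir, figdir, resultdir
```

Reading the locations from the environment lets the same script run unchanged both locally and in the workflow, where the outputs end up under ./artifacts.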

Solution: https://github.com/finsberg/word-count/tree/exercise-4

Exercise 5

Create a release of your repository. Use the tag v1.0.

Exercise 6

Create a Dockerfile in the root of the repo that captures the environment. Try to build the Docker image locally, e.g. (from the root of the repo)

docker build -t word-count .

Try to run a container

docker run --rm -it word-count

And make sure all the code works inside the container.

Solution: https://github.com/finsberg/word-count/tree/exercise-6

Exercise 7

Create a GitHub workflow for building and pushing a docker image to the registry associated with your repository. Create a new file .github/workflows/docker-image.yml with the following content

name: Create and publish a Docker image

on:
  push:
    tags:
      - "v*"

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

Note 1: This will only run when you create a new tag (i.e. when you create a Release on GitHub).

Note 2: This will create two different images: one for Linux/AMD64 and one for Linux/ARM64. Linux/ARM64 is the type of image you will need if you are running Docker on a Mac with Apple Silicon.

Create a new release of the code, and make sure the workflow runs and creates an image (also called a package on GitHub) in your repository.
