merge codebase-hydra-restructure into main #90

Merged
cameronraysmith merged 8 commits into main from codebase-hydra-restructure
Mar 7, 2023

Conversation

@cameronraysmith
Collaborator

@cameronraysmith cameronraysmith commented Mar 5, 2023

To be followed by #92.

@cameronraysmith cameronraysmith changed the title merge codebase hydra restructure into codebase merge codebase-hydra-restructure into codebase Mar 5, 2023
@cameronraysmith cameronraysmith changed the base branch from codebase to main March 5, 2023 18:20
@cameronraysmith cameronraysmith changed the title merge codebase-hydra-restructure into codebase merge codebase-hydra-restructure into main Mar 5, 2023
@cameronraysmith cameronraysmith added the enhancement (New feature or request) and refactoring (Refactoring) labels Mar 5, 2023
@cameronraysmith cameronraysmith added this to the 0.0.1 milestone Mar 5, 2023
@cameronraysmith cameronraysmith linked an issue Mar 5, 2023 that may be closed by this pull request
@cameronraysmith cameronraysmith force-pushed the codebase-hydra-restructure branch from 6897a69 to bbe8da4 Compare March 7, 2023 16:24
@cameronraysmith cameronraysmith self-assigned this Mar 7, 2023
@cameronraysmith cameronraysmith marked this pull request as ready for review March 7, 2023 16:28
@cameronraysmith cameronraysmith merged commit e9794d3 into main Mar 7, 2023
cameronraysmith added a commit that referenced this pull request Mar 7, 2023
* wip: dataloader first draft

* Fixing train, val, and test path

* Added initial project structure

Added a set of directories with (mostly) empty or dummy .py files for now, so that everyone can see how the project will be structured. In addition to the directories already present, there will also be a datasets directory and a logs directory, the latter created dynamically at training or validation time.

* rename file, remove one-hot encode

* Revert "wip: dataloader first draft"

* Updating component loading section

* sequence dataloader baseline model

* fixing a couple typos

* Delete src/metrics directory

Deleting metrics directory as it was decided we'll have only one file with all metrics.

* Added refactored DDPM and UNet from notebook V2

Refactored Lucas's DDPM, UNet and units and added them as PL modules.

* Update diffusion.py

Added "instantiate_from_config" import.

* Update ddpm.py

Added nucleotides as a parameter with a default of 4 to the sample method.

* wip: separate train/val/test subclasses

* Delete codebase/src/data directory

* Updated PL dataloader

* placeholder test file

* Update unet_lucas.py

Added default function import.

* Added matching dummy test files

* complete: initial dataloader

* Added config template

Designed config template mainly for PL-related parameters. Keeping multiprocessing arguments for multi-GPU for the first test, which we'll change to multi-node. Diffusion and UNet parameters can easily vary.

* Delete dummy_config.yaml

* delete test_diffusion

* fix: fixed function naming convention

* feat: Add initial CI proposal

* feat: Add a simple pyproject config file

* wip: train.py + configs

* config folder structure update

* fix datapath param of datasets

* add additional sequence encoding schemes + separate transforms

* add tests for sequence dataloader

* add additional asserts for data batches

* check sequence lengths in datasets

* add more tests for invalid data

* style: run black

* feat: Refactor schedules and remove time_difference

* feat: Add type hints to schedule utility functions

* feat: Refactor noise schedule fn

* feat: refactor q_sample fn

* feat: add type hints to q_sample

* feat: drop bit_scale

* feat: run black and switch to torch.log

* feat: drop t_index

* feat: refactor p_sample fn

* feat: refactor p_sample_loop fn

* feat: refactor sample fn

* feat: refactor training_step fn

* feat(ci): Add `codebase` branch to CI

Based on discussion with @mateibejan1, running the tests on the `codebase` branch is also essential. It's the branch which is under heavy development and we should ensure all tests pass before we merge into `codebase` as well.

* reqs: add `pandas` to requirements.txt

* reqs: add `torch` to requirements.txt

* reqs: bump torch to `1.11.0` for compatibility

* fix(ci): run pytest as a module

* reqs: add torchvision to `0.12.0`

* reqs: add `pytorch-lightning`

* fix: failing CI tests for dataloader across platforms

* fix: failing CI tests for dataloader - wrap transforms

* fix: failing CI tests for dataloader - no multiprocessing for transforms

* Add Lucas' conditioned UNet

* Update EMA with Lucas' version

* Added mean_flat util from P2 paper

* Added P2 weighting skeleton. 

Need to figure out how to use P2 weighting on DNA data.

* misc: create a PR template

Fixes #51

* misc: add doc strings and type hints to the PR template

cc: @mateibejan1

* Add files via upload

* Add files via upload

* Add files via upload

Updated DDPM with Noah's refactored notebook version. Preemptively added p2_weighting; still need to figure out if/how it works on bit sequences.

* Add files via upload

* Add files via upload

* style: run black

* feat: add type hints to `utils/misc.py`

* feat: add type hints to utils/metrics

* feat: add type hints to utils/schedules

* feat: add type hints to unet_bitdiffusion

* feat: add type hints to unet_lucas

* feat: add type hints to ddim

* feat: add type hints to seq dataloader

* feat: add type hints to unet_lucas_cond

* Delete ddim.py

Deprecated.

* Delete unet_bitdiffusion.py

Deprecated.

* Update unet_conditional.yaml

Changed default number of timesteps from 1000 to 200.

* Update unet_conditional.yaml

Moved the unet_config params inside the diffusion model's params, so that the config mirrors the hierarchical relationship between the diffusion class and the UNet class.

* Update misc.py

Minor dict property name changes.

* Update diffusion.py

* Update diffusion.py

* Update default.yaml

* Update unet_lucas.py

* initial test lucas unet

* add test vq

* ddm

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* merge codebase-hydra-restructure into main (#90)

* WIP new folder structure

* ema parameter fix

* Base dataloader instantiation with full Hydra config successful, missing full params

* Update sequence_dataloader.py

* Remove outputs folder, update .gitignore

* Update network.py

* Update sequence_datamodule.py

* Update sequence_datamodule.py
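
The commits above mention an `instantiate_from_config` import, a `unet_config` nested inside the diffusion model's params, and a default of 200 timesteps. The following is only a minimal sketch of how such a nested config could be resolved, assuming a `target`/`params` layout. It is a hand-rolled, standard-library stand-in and not the repository's code; Hydra itself offers `hydra.utils.instantiate` with a `_target_` key for this purpose. `ToyDiffusion` and the use of `types.SimpleNamespace` in place of a real UNet are hypothetical placeholders.

```python
import importlib
from typing import Any, Mapping


def get_obj_from_str(path: str) -> Any:
    """Resolve a dotted path such as 'package.module.ClassName' to an object."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    return getattr(importlib.import_module(module_name), attr)


def instantiate_from_config(config: Mapping[str, Any]) -> Any:
    """Build the object named by config['target'] using config['params']."""
    if "target" not in config:
        raise KeyError("config needs a 'target' key")
    params = dict(config.get("params", {}))
    return get_obj_from_str(config["target"])(**params)


class ToyDiffusion:
    """Hypothetical stand-in for a diffusion module that owns its network."""

    def __init__(self, unet_config: Mapping[str, Any], timesteps: int = 200):
        self.timesteps = timesteps
        # The network is built from the nested config, so the config tree
        # mirrors the object hierarchy (diffusion model -> network).
        self.unet = instantiate_from_config(unet_config)


# Assumed config layout: the network config lives inside the model's params.
config = {
    "params": {
        "timesteps": 200,
        "unet_config": {
            # SimpleNamespace stands in for a real UNet class here.
            "target": "types.SimpleNamespace",
            "params": {"dim": 64, "channels": 1},
        },
    },
}

model = ToyDiffusion(**config["params"])
print(model.timesteps, model.unet.dim, model.unet.channels)
```

Keeping the network config under the model's params means a single top-level entry is enough to build the whole model, at the cost of one extra level of nesting in the YAML.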

---------

Co-authored-by: cmvcordova <cmvcordova@github.com>
Co-authored-by: cmvcordova <cmvcordova@pm.me>
Co-authored-by: Matei Bejan <24592776+mateibejan1@users.noreply.github.com>

---------

Co-authored-by: ssenan <simonsenan@gmail.com>
Co-authored-by: Matei Bejan <24592776+mateibejan1@users.noreply.github.com>
Co-authored-by: Bendidi Ihab <ihabnobendidi@gmail.com>
Co-authored-by: Saurav Maheshkar <sauravvmaheshkar@gmail.com>
Co-authored-by: Jan Sobotka <jsobotka1188@gmail.com>
Co-authored-by: ceziegler <cheyenneeziegler@gmail.com>
Co-authored-by: jamesthesnake <james.ryan.hennessy@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: cmvcordova <cmvcordova@github.com>
Co-authored-by: cmvcordova <cmvcordova@pm.me>
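
Several commits in the message above concern sequence encoding and validation (encoding schemes, checks on sequence lengths, tests for invalid data, and a `nucleotides` parameter defaulting to 4). As a rough illustration only, the sketch below one-hot encodes DNA strings and rejects malformed input. The function names, the `ACGT` ordering, and the position-by-base layout are assumptions made for this example and are not taken from the repository.

```python
from typing import List, Sequence

NUCLEOTIDES = "ACGT"  # assumed alphabet and ordering (4 nucleotides)


def one_hot_encode(sequence: str, alphabet: str = NUCLEOTIDES) -> List[List[int]]:
    """Return one row per position, with a single 1 in the column of its base."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for position, base in enumerate(sequence.upper()):
        if base not in index:
            raise ValueError(f"invalid base {base!r} at position {position}")
        row = [0] * len(alphabet)
        row[index[base]] = 1
        encoded.append(row)
    return encoded


def check_lengths(sequences: Sequence[str], expected_length: int) -> None:
    """Raise if any sequence deviates from the expected fixed length."""
    for i, seq in enumerate(sequences):
        if len(seq) != expected_length:
            raise ValueError(
                f"sequence {i} has length {len(seq)}, expected {expected_length}"
            )


sequences = ["ACGT", "ttga"]
check_lengths(sequences, expected_length=4)
batch = [one_hot_encode(seq) for seq in sequences]
print(batch[0])
```

Validating before encoding keeps the failure close to the offending record, which is the kind of behavior the dataloader tests for invalid data would exercise.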
@cameronraysmith cameronraysmith deleted the codebase-hydra-restructure branch March 10, 2023 04:18

Labels

enhancement (New feature or request), refactoring (Refactoring)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

merge "codebase" work into default branch

3 participants