Open
Labels
bug (Something isn't working)
Description
Describe the bug
This bug came up in the process of generating this notebook for our recent preprint. As part of the workflow, we fit a simple model with two design factors. One design factor has two levels; the other has seventy.
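To give a sense of the model size, here is a minimal sketch of how many columns such a design matrix would have. It assumes an intercept, treatment (dummy) coding, and no interaction terms; the helper name `design_columns` is hypothetical, and the matrix PyDESeq2 actually builds may differ.

```python
def design_columns(levels_per_factor):
    """Column count of a treatment-coded design matrix with an intercept.

    Assumption for illustration: one intercept column plus (levels - 1)
    indicator columns per categorical factor, with no interactions.
    """
    return 1 + sum(n - 1 for n in levels_per_factor)


# Two design factors: one with 2 levels, one with 70 levels.
n_cols = design_columns([2, 70])
print(n_cols)  # 71
```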
In 0.4.8, the procedure takes about four minutes, the vast majority of it spent on fitting dispersions. The mean/dispersion curve looks OK.
In 0.5.1, the dispersion fitting fails.
To Reproduce
Notebook here. Test data here.
New version:
anndata==0.11.4
matplotlib==3.10.3
numpy==2.2.6
pandas==2.2.3
pydeseq2==0.5.1
scanpy==1.11.1
scipy==1.15.3
tqdm==4.67.1
python==3.13.2
Old version:
tqdm==4.67.1
scipy==1.11.4
scanpy==1.9.6
pydeseq2==0.4.8
pandas==2.2.3
numpy==1.26.2
matplotlib==3.8.2
anndata==0.10.3
python==3.9.7
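For anyone reproducing this, a small sketch like the following could print an environment in the same `name==version` format as the lists above. It uses only the standard library; the package list is copied from this report, and the helper names `installed_versions` and `format_report` are hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Packages listed in this report.
PACKAGES = [
    "anndata", "matplotlib", "numpy", "pandas",
    "pydeseq2", "scanpy", "scipy", "tqdm",
]


def installed_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


def format_report(versions):
    """Render 'name==version' lines, followed by the Python version."""
    lines = [f"{name}=={ver}" for name, ver in sorted(versions.items())]
    lines.append(f"python=={platform.python_version()}")
    return "\n".join(lines)


print(format_report(installed_versions(PACKAGES)))
```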
Expected behavior
- Most immediately, the dispersion estimation should probably not fail, given that it works in an older version.
- If the dispersion fitting procedure changed at some point between 0.4.8 and 0.5.1, it would be good if this were documented. But the only entries I see relate to vst (unrelated) and refitting (also unrelated).
- I do not expect the dispersion computation to scale so unfavorably with the number of levels. The example shown here takes a few minutes for dds_f, which has 70 levels, but over an hour for dds_m, which has a little over 200 levels. This is about a 20x increase in runtime for a 2.8x increase in the size of the design matrix.
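As a rough check on the scaling in the last point, the two ratios quoted there can be turned into an implied exponent. This assumes, purely for illustration, that runtime follows a power law in the size of the design matrix; nothing here was measured beyond the two approximate ratios, and `implied_exponent` is a hypothetical helper.

```python
import math


def implied_exponent(runtime_ratio, size_ratio):
    """Exponent k such that runtime_ratio == size_ratio ** k.

    Assumes a power-law relationship between runtime and design size.
    """
    return math.log(runtime_ratio) / math.log(size_ratio)


# Approximate ratios quoted in the report: ~20x runtime for ~2.8x design size.
k = implied_exponent(20.0, 2.8)
print(f"{k:.1f}")  # 2.9
```

Under that assumption the fitting time would grow roughly cubically with the number of design-matrix columns, which may help narrow down which step dominates.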
Screenshots
See above.
Desktop (please complete the following information):
- OS: Linux-5.10.93-87.444.amzn2.x86_64-x86_64-with-glibc2.26
BorisMuzellec