[BUG] Dispersion estimation fails after 0.4.8 #391

@gennadyFauna

Describe the bug
This bug came up in the process of generating this notebook for our recent preprint. As part of the workflow, we fit a simple model with two design factors. One design factor has two levels; the other has seventy.
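To give a sense of the design size, here is a minimal sketch that counts design matrix columns. It assumes an additive model with an intercept and treatment (drop-first) coding. The helper name `design_columns` is hypothetical and not part of pydeseq2, and the coding pydeseq2 actually uses may differ.

```python
def design_columns(levels_per_factor):
    """Count columns of an additive, treatment-coded design matrix.

    Assumption: one intercept column, plus (levels - 1) indicator
    columns for each categorical factor.
    """
    return 1 + sum(n_levels - 1 for n_levels in levels_per_factor)


# Two factors: one with 2 levels, one with 70 levels.
n_cols = design_columns([2, 70])
print(n_cols)  # 71
```

Under these assumptions, each gene's fit involves 71 coefficients for the 70-level case.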

In 0.4.8, the procedure takes about four minutes, the vast majority of it spent on fitting dispersions. The mean/dispersion curve looks OK.

Image

In 0.5.1, the dispersion fitting fails.

Image

To Reproduce

Notebook here. Test data here.

New version:

anndata==0.11.4
matplotlib==3.10.3
numpy==2.2.6
pandas==2.2.3
pydeseq2==0.5.1
scanpy==1.11.1
scipy==1.15.3
tqdm==4.67.1
python==3.13.2

Old version:

tqdm==4.67.1
scipy==1.11.4
scanpy==1.9.6
pydeseq2==0.4.8
pandas==2.2.3
numpy==1.26.2
matplotlib==3.8.2
anndata==0.10.3
python==3.9.7
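
Version lists like the two above can be regenerated in either environment with the standard library alone. This is a minimal sketch. The helper name `collect_versions` is hypothetical, and the package list is simply copied from this report.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Packages listed in this report.
PACKAGES = [
    "anndata", "matplotlib", "numpy", "pandas",
    "pydeseq2", "scanpy", "scipy", "tqdm",
]


def collect_versions(packages):
    """Return 'name==version' lines for each package, then the Python version."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Report missing packages instead of raising.
            lines.append(f"{name}: not installed")
    lines.append(f"python=={platform.python_version()}")
    return lines


print("\n".join(collect_versions(PACKAGES)))
```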

Expected behavior

  1. Most immediately, the dispersion estimation should probably not fail, given that it works in an older version.
  2. If the dispersion fitting procedure changed at some point between 0.4.8 and 0.5.1, it would be good if this were documented. But the only entries I see relate to vst (unrelated) and refitting (also unrelated).
  3. I do not expect the dispersion computation to scale so unfavorably with the number of levels. The example shown here takes a few minutes for dds_f, which has 70 levels, but over an hour for dds_m, which has a little over 200 levels. This is about a 20x increase in runtime for a 2.8x increase in the size of the design matrix.
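
To put the scaling in item 3 in perspective, the two ratios quoted there can be turned into an apparent scaling exponent. This is a rough sketch that assumes runtime grows as a power of the design matrix size, which is an assumption and not something established in this report. The helper name `scaling_exponent` is hypothetical.

```python
import math


def scaling_exponent(runtime_ratio, size_ratio):
    """Exponent k such that runtime_ratio == size_ratio ** k.

    Assumes a power-law relationship: runtime ~ size ** k.
    """
    return math.log(runtime_ratio) / math.log(size_ratio)


# Ratios quoted in item 3: about 20x runtime for about 2.8x design size.
k = scaling_exponent(20.0, 2.8)
print(f"{k:.2f}")  # 2.91
```

Under that assumption, the observed slowdown corresponds to roughly cubic growth in the size of the design matrix. Both ratios are approximate, so the exponent is only indicative.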

Screenshots
See above.

Desktop:

  • OS: Linux-5.10.93-87.444.amzn2.x86_64-x86_64-with-glibc2.26
