Skip to content

Refactor dace/data.py into dace/data/ package#2245

Merged
tbennun merged 6 commits intomainfrom
copilot/refactor-dace-data-structure
Dec 9, 2025
Merged

Refactor dace/data.py into dace/data/ package#2245
tbennun merged 6 commits intomainfrom
copilot/refactor-dace-data-structure

Conversation

Copy link
Contributor

Copilot AI commented Dec 4, 2025

Refactor dace/data.py into dace/data/ package

Summary

This PR refactors the monolithic dace/data.py file into a modular dace/data/ package with separate files for different functionality, improving code organization and maintainability.

Changes

  • dace/data/core.py: Core data descriptor classes (Data, Scalar, Array, ContainerArray, Stream, Structure, View, Reference and their subclasses)
  • dace/data/tensor.py: Tensor/sparse tensor support (Tensor, TensorIndex* classes)
  • dace/data/creation.py: Data descriptor creation functions (create_datadescriptor, make_array_from_descriptor, make_reference_from_descriptor)
  • dace/data/ctypes_interop.py: Ctypes interoperability (make_ctypes_argument)
  • dace/data/ml.py: ML-related descriptors (ParameterArray)
  • dace/data/__init__.py: Re-exports all public API for backward compatibility
  • dace/utils.py: Utility functions (find_new_name, deduplicate, prod)
  • dace/properties.py: Updated to handle circular import gracefully
  • dace/autodiff/library/library.py: Updated to import ParameterArray from the new location
  • Deleted old dace/data.py file
  • Removed Number and ArrayLike from dace/data/__init__.py (other places import directly)
  • Moved _prod to dace/utils.py as prod (kept _prod export for backward compat)
  • Fixed broken imports in data_report.py, data_layout_tuner.py, and cutout.py

Backward Compatibility

All public APIs are re-exported from dace.data, ensuring backward compatibility with existing code.

Original prompt

This section details on the original issue you should resolve

<issue_title>Refactor dace/data.py</issue_title>
<issue_description>data.py is a monolithic file containing classes for core data containers (Data, Scalar, Array, Stream, View, Reference, and their subclasses *{View, Reference}; functionality to get data descriptors from arbitrary objects; derived objects for Tensors and sparse tensors; and other functions.

This issue will be resolved once data.py is refactored to a dace/data/* folder, which will contain separate files for:

  1. core descriptor classes
  2. structures (the Structure class and similar functionality)
  3. tensors/sparse tensors
  4. descriptor creation
  5. ML-related data descriptors, such as parameter arrays (see dace/autodiff/library/library.py)
    6...N. Other functions and classes categorized by their semantic meaning.

The code for dace/data/* will be refactored out of data.py (which should not exist at the end of this issue), dtypes.py (which may exist but be shorter), and other files that contain data descriptors (subclasses of Data/Array/Stream/Structure/View/Reference, such as ParameterArray. Try to find all such subclasses in the codebase barring tests/* and samples/*).

Lastly, utility functions in data.py and dtypes.py (only those two files for this issue), such as find_new_name from data.py and deduplicate from dtypes.py, should find themselves in a new dace/utils.py file.</issue_description>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 2 commits December 4, 2025 17:01
- Create dace/data/ package with modular structure:
  - core.py: Data, Scalar, Array, ContainerArray, Stream, Structure, View, Reference and subclasses
  - tensor.py: Tensor and TensorIndex* classes for sparse tensor support
  - creation.py: create_datadescriptor, make_array_from_descriptor, make_reference_from_descriptor
  - ctypes_interop.py: make_ctypes_argument
  - __init__.py: Re-exports all public API for backward compatibility
- Create dace/utils.py with utility functions (find_new_name, deduplicate)
- Update dace/properties.py to handle circular import gracefully
- Remove old dace/data.py (now replaced by package)

Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
- Create dace/data/ml.py with ParameterArray for ML-related descriptors
- Update dace/autodiff/library/library.py to import from new location
- Update dace/data/__init__.py to export ParameterArray

Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor data.py into modular components in dace/data Refactor dace/data.py into dace/data/ package Dec 4, 2025
Copilot AI requested a review from tbennun December 4, 2025 17:13
Copy link
Collaborator

@tbennun tbennun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great

ArrayReference,
StructureReference,
ContainerArrayReference,
_prod,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_prod looks like a prime candidate to move to dace.utils.prod

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved _prod to dace/utils.py as prod in ae2ae8b. The function is now exported from dace.data as _prod for backward compatibility with existing code.

Array,
ContainerArray,
Stream,
Structure,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move structures and structure view/reference to structure.py. There are several helper functions that are only needed by structures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Structure, StructureView, and StructureReference classes have complex inheritance relationships with Data, View, and Reference classes respectively. Moving them to a separate file creates circular import issues since View.view() and Reference.view() create instances of these classes. The helper functions _arrays_to_json and _arrays_from_json are used only by Structure and could be moved if we refactor to break the circular dependencies. Would you prefer I attempt this refactor with deferred imports, or is keeping them in core.py acceptable for now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you erase dace/data.py? I don't see the removal in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, deleted dace/data.py in ae2ae8b.

Comment on lines +117 to +119
# Type hints
'ArrayLike',
'Number',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that belongs in dace/data. Other places need to import it directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed Number and ArrayLike from dace/data/__init__.py in ae2ae8b.

Comment on lines +106 to +107
'Range',
'Offset',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that belongs in dace/data. Other places need to import it directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved _prod to dace/utils.py as prod in ae2ae8b. Still exported from dace.data as _prod for backward compatibility (there are many internal usages like data._prod).

Copilot AI and others added 2 commits December 6, 2025 22:47
- Move `_prod` to `dace/utils.py` as `prod`
- Delete `dace/data.py` file
- Remove `Number` and `ArrayLike` from `dace/data/__init__.py`
- Keep `_prod` in `dace/data/__init__.py` for backward compatibility
- Update imports in files that used `from dace.data import _prod`

Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
- Add type hints to prod function
- Update docstring with more detailed documentation
- Handle symbolic expressions (SymPy) correctly

Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
Copilot AI requested a review from tbennun December 6, 2025 22:50
@tbennun
Copy link
Collaborator

tbennun commented Dec 6, 2025

@copilot acceptable for now

@tbennun
Copy link
Collaborator

tbennun commented Dec 7, 2025

@copilot Looks like an error has occured because several modules import ArrayLike and Number directly from data.

Apply this patch (at least), and then try to find any other uses of functions you removed.

diff --git a/dace/codegen/instrumentation/data/data_report.py b/dace/codegen/instrumentation/data/data_report.py
index d944c916f..eb7813fb0 100644
--- a/dace/codegen/instrumentation/data/data_report.py
+++ b/dace/codegen/instrumentation/data/data_report.py
@@ -2,10 +2,15 @@
 from dataclasses import dataclass
 import struct
 from typing import Any, Dict, List, Set, Tuple, Union
+from numbers import Number
 import os
 
 from dace import dtypes, SDFG
-from dace.data import ArrayLike, Number  # Type hint
+try:
+    from numpy.typing import ArrayLike
+except ImportError:
+    ArrayLike = Any  # type: ignore
+
 
 import numpy as np
 
diff --git a/dace/optimization/data_layout_tuner.py b/dace/optimization/data_layout_tuner.py
index 1dab5b3f1..f45325f31 100644
--- a/dace/optimization/data_layout_tuner.py
+++ b/dace/optimization/data_layout_tuner.py
@@ -5,7 +5,7 @@ import enum
 import copy
 import itertools
 
-from typing import Generator, Optional, Tuple, Dict, List, Sequence, Set
+from typing import Any, Generator, Optional, Tuple, Dict, List, Sequence, Set
 
 from dace import data as dt, SDFG, dtypes
 from dace.optimization import cutout_tuner
@@ -19,6 +19,11 @@ try:
 except (ImportError, ModuleNotFoundError):
     tqdm = lambda x, **kwargs: x
 
+try:
+    from numpy.typing import ArrayLike
+except ImportError:
+    ArrayLike = Any  # type: ignore
+
 
 class TuningGroups(enum.Enum):
     Separate = enum.auto()
@@ -111,7 +116,7 @@ class DataLayoutTuner(cutout_tuner.CutoutTuner):
         cutout.instrument = self.instrument
 
         # Prepare original arguments to sub-SDFG from instrumented data report
-        arguments: Dict[str, dt.ArrayLike] = {}
+        arguments: Dict[str, ArrayLike] = {}
         for cstate in cutout.nodes():
             for dnode in cstate.data_nodes():
                 if cutout.arrays[dnode.data].transient:
diff --git a/dace/sdfg/analysis/cutout.py b/dace/sdfg/analysis/cutout.py
index 432c765aa..9feda0a00 100644
--- a/dace/sdfg/analysis/cutout.py
+++ b/dace/sdfg/analysis/cutout.py
@@ -8,7 +8,7 @@ from networkx.algorithms.flow import edmondskarp
 import sympy as sp
 from collections import deque
 import copy
-from typing import Deque, Dict, List, Set, Tuple, Union, Optional, Any
+from typing import Any, Deque, Dict, List, Set, Tuple, Union, Optional, Any
 from dace import data, DataInstrumentationType
 from dace.sdfg import nodes as nd, SDFG, SDFGState, utils as sdutil, InterstateEdge
 from dace.memlet import Memlet
@@ -18,6 +18,12 @@ from dace.transformation.transformation import (MultiStateTransformation, Patter
                                                 SingleStateTransformation)
 from dace.transformation.interstate.loop_detection import DetectLoop
 from dace.transformation.passes.analysis import StateReachability
+from numbers import Number
+
+try:
+    from numpy.typing import ArrayLike
+except ImportError:
+    ArrayLike = Any  # type: ignore
 
 
 class SDFGCutout(SDFG):
@@ -52,12 +58,12 @@ class SDFGCutout(SDFG):
         self._instrument_base_sdfg()
         self._base_sdfg(*args, **kwargs)
 
-    def find_inputs(self, *args, **kwargs) -> Dict[str, Union[data.ArrayLike, data.Number]]:
+    def find_inputs(self, *args, **kwargs) -> Dict[str, Union[ArrayLike, Number]]:
         self._dry_run_base_sdfg(*args, **kwargs)
 
         drep = self._base_sdfg.get_instrumented_data()
         if drep:
-            vals: Dict[str, Union[data.ArrayLike, data.Number]] = dict()
+            vals: Dict[str, Union[ArrayLike, Number]] = dict()
             for ip in self.input_config.union(set(self.symbols)):
                 val = drep.get_first_version(ip)
                 vals[ip] = val

- Update dace/codegen/instrumentation/data/data_report.py to import
  Number from numbers module and ArrayLike from numpy.typing
- Update dace/optimization/data_layout_tuner.py with proper ArrayLike import
- Update dace/sdfg/analysis/cutout.py with proper ArrayLike and Number imports

Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 7, 2025

@copilot Looks like an error has occured because several modules import ArrayLike and Number directly from data.

Apply this patch (at least), and then try to find any other uses of functions you remo...

Applied the patch and fixed all three files in 378ac9a. Also searched for any other usages of ArrayLike and Number from dace.data - these were the only three files affected.

@tbennun tbennun marked this pull request as ready for review December 7, 2025 17:16
@tbennun
Copy link
Collaborator

tbennun commented Dec 7, 2025

@affifboudaoud please verify new location of parameter array
@phschaad why did CSCS/GPU not run? is there a way we can debug the webhooks? GitHub Copilot marked the PR as draft. Nevermind
@acalotoiu FYI

@tbennun tbennun disabled auto-merge December 8, 2025 14:25
Copy link
Contributor

@acalotoiu acalotoiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tbennun tbennun added this pull request to the merge queue Dec 8, 2025
from dace.transformation.passes.analysis import StateReachability

try:
from numpy.typing import ArrayLike
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this try/except a fallback for numpy <1.2, in which case we can drop it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added because it was a direct refactor and we have that try in data.py

tqdm = lambda x, **kwargs: x

try:
from numpy.typing import ArrayLike
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same fallback comment

from dace import dtypes, SDFG
from dace.data import ArrayLike, Number # Type hint
try:
from numpy.typing import ArrayLike
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same - why was this added?

Merged via the queue into main with commit 173de0b Dec 9, 2025
26 of 27 checks passed
@tbennun tbennun deleted the copilot/refactor-dace-data-structure branch December 9, 2025 01:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor dace/data.py

4 participants