_single_size_image_list removing duplicates also removing the corresponding filename/label #801

@LIEeOoNn

Is your feature request related to a problem?

When duplicates are removed from an ImageList, its length no longer matches the length of the corresponding filenames/labels list, which causes problems when training a neural network.

Desired solution

The remove_duplicate_images() method should also accept the filenames/labels list (list[str]) as a parameter, so that both are updated together and stay aligned.
The following solution does this and is also much faster than the old one:

def remove_duplicate_images(self, filenames: list[str]) -> tuple[ImageList, list[str]]:
    # Drop duplicate images together with the filenames/labels that belong to them.
    image_list = self.to_images()
    if len(filenames) != len(image_list):
        raise ValueError("filenames must contain exactly one entry per image")
    image_list_without_dups: list[Image] = []
    filenames_new: list[str] = []
    seen_bytes: set[bytes] = set()
    for image, filename in zip(image_list, filenames):
        # The raw tensor bytes serve as a hashable key (all images share one size here).
        tensor_bytes = image._image_tensor.numpy().tobytes()
        if tensor_bytes not in seen_bytes:
            seen_bytes.add(tensor_bytes)
            image_list_without_dups.append(image)
            filenames_new.append(filename)
    return ImageList.from_images(image_list_without_dups), filenames_new
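
To illustrate the idea independently of the library, here is a minimal, self-contained sketch of the same technique: deduplicating one list by a hashable key while keeping a parallel list in sync. The helper name `dedupe_with_labels` and the toy byte buffers standing in for image tensors are hypothetical and not part of the existing API.

```python
from collections.abc import Hashable, Sequence
from typing import Callable, TypeVar

T = TypeVar("T")
L = TypeVar("L")


def dedupe_with_labels(
    items: Sequence[T],
    labels: Sequence[L],
    key: Callable[[T], Hashable],
) -> tuple[list[T], list[L]]:
    """Keep the first occurrence of each item and the label at the same index.

    Hypothetical helper for illustration only; `key` maps an item to a
    hashable value (for an image, e.g. its raw pixel bytes).
    """
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")

    seen: set[Hashable] = set()
    kept_items: list[T] = []
    kept_labels: list[L] = []
    for item, label in zip(items, labels):
        item_key = key(item)
        if item_key not in seen:  # average O(1) set lookup
            seen.add(item_key)
            kept_items.append(item)
            kept_labels.append(label)
    return kept_items, kept_labels


# Toy "images": raw pixel buffers standing in for image tensors.
pixels = [bytes([0, 0, 0]), bytes([255, 255, 255]), bytes([0, 0, 0])]
names = ["a.png", "b.png", "a_copy.png"]

unique_pixels, unique_names = dedupe_with_labels(pixels, names, key=lambda p: p)
```

Because each image is looked up in a set rather than compared against every image kept so far, the work grows roughly linearly with the number of images, and the two returned lists always have the same length.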

Possible alternatives (optional)

No response

Screenshots (optional)

No response

Additional Context (optional)

No response
