Description
Is your feature request related to a problem?
When duplicates are removed from an ImageList, the ImageList's length no longer matches the length of the filenames/labels list, which causes problems when training a neural network.
Desired solution
The remove_duplicate_images() method should also accept the filenames/labels as a list[str] parameter, so that both are updated consistently.
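To illustrate the idea independently of the library, here is a minimal sketch that works on plain NumPy arrays instead of an ImageList. The function name `remove_duplicates_with_labels` and the sample data are hypothetical and only serve the illustration. It assumes that "duplicate" means an exact, byte-for-byte match.

```python
import numpy as np


def remove_duplicates_with_labels(
    images: list[np.ndarray], labels: list[str]
) -> tuple[list[np.ndarray], list[str]]:
    """Drop exact duplicate arrays together with the labels at the same positions."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")

    seen: set[tuple[tuple[int, ...], str, bytes]] = set()
    unique_images: list[np.ndarray] = []
    unique_labels: list[str] = []
    for image, label in zip(images, labels):
        # Shape and dtype are part of the key so that arrays with identical
        # raw bytes but a different layout are not treated as duplicates.
        key = (image.shape, image.dtype.str, image.tobytes())
        if key not in seen:
            seen.add(key)
            unique_images.append(image)
            unique_labels.append(label)
    return unique_images, unique_labels


# Hypothetical sample data: the third image duplicates the first one.
images = [
    np.zeros((2, 2), dtype=np.uint8),
    np.ones((2, 2), dtype=np.uint8),
    np.zeros((2, 2), dtype=np.uint8),
]
labels = ["a.png", "b.png", "c.png"]
unique_images, unique_labels = remove_duplicates_with_labels(images, labels)
```

Because images and labels are filtered in the same loop, both results always have the same length. The first occurrence of each image is kept, so the original order is preserved.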
Proposed solution, which is also much faster than the current one:
def remove_duplicate_images(self, filenames: list[str]) -> tuple[ImageList, list[str]]:
    # Remove exact duplicates and keep the filenames/labels aligned with the remaining images.
    image_list = self.to_images()
    if len(filenames) != len(image_list):
        raise ValueError("filenames must have the same length as the ImageList")
    image_list_without_dups: list[Image] = []
    filenames_new: list[str] = []
    unique_bytes: set[bytes] = set()
    for image, filename in zip(image_list, filenames):
        # The raw tensor bytes act as a hashable key, so each duplicate check is a set lookup.
        tensor_bytes = image._image_tensor.numpy().tobytes()
        if tensor_bytes not in unique_bytes:
            unique_bytes.add(tensor_bytes)
            image_list_without_dups.append(image)
            filenames_new.append(filename)
    return ImageList.from_images(image_list_without_dups), filenames_new
Possible alternatives (optional)
No response
Screenshots (optional)
No response
Additional Context (optional)
No response