diff --git a/README.md b/README.md
index 06e2a5e..2eea3cf 100644
--- a/README.md
+++ b/README.md
@@ -1,54 +1,48 @@
-PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
-
-Runyu Ding1*, Jihan Yang1*, Chuhui Xue2, Wenqing Zhang2, Song Bai2†, Xiaojuan Qi1†
-
-1The University of Hong Kong  2ByteDance
-
-*equal contribution  †corresponding author
-
-**CVPR 2023**
-
-TL;DR: PLA leverages powerful VL foundation models to construct hierarchical 3D-text pairs for 3D open-world learning.
-
-[demo GIFs: working space | piano | vending machine]
-
-[project page](https://dingry.github.io/projects/PLA) | [arXiv](https://arxiv.org/abs/2211.16312)
+PLA & RegionPLC
+
+This repo contains the official implementation of PLA (CVPR 2023) and RegionPLC (CVPR 2024).
+
+PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
+
+Runyu Ding*, Jihan Yang*, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
+
+CVPR 2023
+
+[project page](https://dingry.github.io/projects/PLA) | [arXiv](https://arxiv.org/abs/2211.16312)
+
+RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
+
+Jihan Yang*, Runyu Ding*, Weipeng Deng, Zhe Wang, Xiaojuan Qi
+
+CVPR 2024
+
+project page | arXiv
 
-### TODO
-- [ ] Release caption processing code
+##### Highlights:
+- Official PLA implementation is contained in the `main` branch
+- Official RegionPLC implementation is contained in the `regionplc` branch
+
+### Release
+- [2024-05-05] Releasing the **RegionPLC** implementation. Please check out the `regionplc` branch to try it!
 
 ### Getting Started
 
@@ -74,5 +68,14 @@ If you find this project useful in your research, please consider citing:
 }
 ```
 
+```bibtex
+@inproceedings{yang2024regionplc,
+  title={RegionPLC: Regional point-language contrastive learning for open-world 3d scene understanding},
+  author={Yang, Jihan and Ding, Runyu and Deng, Weipeng and Wang, Zhe and Qi, Xiaojuan},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  year={2024}
+}
+```
+
 ### Acknowledgement
 Code is partly borrowed from [OpenPCDet](https://github.com/open-mmlab/OpenPCDet), [PointGroup](https://github.com/dvlab-research/PointGroup) and [SoftGroup](https://github.com/thangvubk/SoftGroup).
\ No newline at end of file
diff --git a/docs/DATASET.md b/docs/DATASET.md
index f081b57..57c4581 100644
--- a/docs/DATASET.md
+++ b/docs/DATASET.md
@@ -29,7 +29,15 @@ The dataset configs are located within [tools/cfgs/dataset_configs](../tools/cfgs/dataset_configs)
 python3 pcseg/datasets/s3dis/preprocess.py
 ```
 
-- Additionally, please download the caption data [here](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007346_connect_hku_hk/EoNAsU5f8YRGtQYV8ewhwvQB7QPbxT-uwKqTk8FPiyUTtQ?e=wq58H7). Download image data [here](https://github.com/alexsax/2D-3D-Semantics) if you want to generate captions on your own.
+- Additionally, please download the caption data [here](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007346_connect_hku_hk/EoNAsU5f8YRGtQYV8ewhwvQB7QPbxT-uwKqTk8FPiyUTtQ?e=wq58H7). If you want to generate captions on your own, please download the image data [here](https://github.com/alexsax/2D-3D-Semantics) and follow the scripts [generate_caption.py](../tools/process_tools/generate_caption.py) and [generate_caption_idx.py](../tools/process_tools/generate_caption_idx.py); a reference invocation is sketched below.
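+
+  For reference, `generate_caption_idx.py` documents an invocation like the following (these are the ScanNet paths from the script's own docstring; adapt the paths for S3DIS):
+
+  ```bash
+  python3 tools/process_tools/generate_caption_idx.py \
+      --view_caption_path ./data/scannetv2/text_embed/caption_view_scannet_vit-gpt2-image-captioning_25k.json \
+      --view_caption_corr_idx_path ./data/scannetv2/scannetv2_view_vit-gpt2_matching_idx.pickle
+  ```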
 
 - The directory organization should be as follows:
 
@@ -46,5 +46,3 @@ The dataset configs are located within [tools/cfgs/dataset_configs](../tools/cfgs/dataset_configs)
 ├── pcseg
 ├── tools
 ```
-
-The scripts that process S3DIS images to generate captions and corresponding point indices will be available soon.
diff --git a/docs/INFER.md b/docs/INFER.md
new file mode 100644
index 0000000..9bb2419
--- /dev/null
+++ b/docs/INFER.md
@@ -0,0 +1,22 @@
+If you wish to test on custom 3D scenes or categories, you can start from our example configs:
+`tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml` and `tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml`.
+
+The key parameters to consider are as follows:
+- `TEXT_ENCODER.CATEGORY_NAMES`
+
+  This parameter defines the category list for segmentation.
+
+- `TASK_HEAD.CORRECT_SEG_PRED_BINARY` and `INST_HEAD.CORRECT_SEG_PRED_BINARY`
+
+  These parameters control whether the binary head is used to rectify semantic scores.
+
+To save the results, add `--save_results semantic,instance` to your test command. Afterwards, you can use the visualization utilities in `tools/visual_utils/visualize_indoor.py` to visualize the predicted results.
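+
+For example, a minimal sketch of such a run (assuming the OpenPCDet-style `--cfg_file`/`--ckpt` flags of `tools/test.py`; the checkpoint path is a placeholder):
+
+```bash
+cd tools
+python3 test.py --cfg_file cfgs/scannet_models/spconv_clip_openvocab_test.yaml \
+    --ckpt /path/to/checkpoint.pth \
+    --save_results semantic,instance
+```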
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index cf4df85..b95a60f 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -7,13 +7,19 @@ All the codes are tested in the following environment:
 #### Install dependent libraries
 
 a. Clone this repository.
-```shell
+```bash
 git clone https://github.com/CVMI-Lab/PLA.git
 ```
 
 b. Install the dependent libraries as follows:
 
-* Install the dependent python libraries:
+* Install the dependent Python libraries (please note that you need to install versions of `torch` and `spconv` that match your CUDA version; see the sketch below):
 ```bash
 pip install -r requirements.txt
 ```
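+
+A sketch assuming CUDA 11.3 (the exact versions here are illustrative, not pinned by this repo):
+```bash
+pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
+pip install spconv-cu113
+```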
@@ -28,4 +28,4 @@ b. Install the dependent libraries as follows:
 * Install [pcseg](../pcseg)
   ```bash
   python3 setup.py develop
-  ```
\ No newline at end of file
+  ```
diff --git a/pcseg/datasets/dataset.py b/pcseg/datasets/dataset.py
index 57d00c3..1b0592e 100755
--- a/pcseg/datasets/dataset.py
+++ b/pcseg/datasets/dataset.py
@@ -65,7 +65,8 @@ def __init__(self, dataset_cfg=None, class_names=None, training=True, root_path=
             self.valid_class_idx, self.ignore_label, squeeze_label=self.training)
 
         # caption config
-        if 'CAPTION_INFO' in self.dataset_cfg:
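+        # captions are only used as training-time supervision, so skip loading them during evaluation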
+        if self.training and 'CAPTION_INFO' in self.dataset_cfg:
             self.caption_cfg = self.dataset_cfg.CAPTION_INFO
             self.caption_keys = self.dataset_cfg.CAPTION_INFO.KEY
             self.caption = self.get_caption_items(self.caption_cfg)
diff --git a/pcseg/datasets/s3dis/s3dis_dataset.py b/pcseg/datasets/s3dis/s3dis_dataset.py
index d56d235..b07e006 100644
--- a/pcseg/datasets/s3dis/s3dis_dataset.py
+++ b/pcseg/datasets/s3dis/s3dis_dataset.py
@@ -332,7 +332,10 @@ def __init__(self, dataset_cfg, class_names, training, root_path, logger=None):
         S3DISDataset.__init__(self, dataset_cfg, class_names, training, root_path, logger=logger)
         self.inst_class_idx = dataset_cfg.inst_class_idx
         self.inst_label_shift = dataset_cfg.inst_label_shift
-        if 'base_class_idx' in dataset_cfg:
+        if 'base_inst_class_idx' in dataset_cfg:
+            self.base_inst_class_idx = dataset_cfg.base_inst_class_idx
+            self.novel_inst_class_idx = dataset_cfg.novel_inst_class_idx
+        elif 'base_class_idx' in dataset_cfg:
             self.base_inst_class_idx = self.base_class_idx
             self.novel_inst_class_idx = self.novel_class_idx
         self.sem2ins_classes = dataset_cfg.sem2ins_classes
diff --git a/pcseg/datasets/scannet/scannet_dataset.py b/pcseg/datasets/scannet/scannet_dataset.py
index 60305ae..45421ec 100755
--- a/pcseg/datasets/scannet/scannet_dataset.py
+++ b/pcseg/datasets/scannet/scannet_dataset.py
@@ -309,7 +309,10 @@ def __init__(self, dataset_cfg, class_names, training, root_path, logger=None):
         ScanNetDataset.__init__(self, dataset_cfg, class_names, training, root_path, logger=logger)
         self.inst_class_idx = dataset_cfg.inst_class_idx
         self.inst_label_shift = dataset_cfg.inst_label_shift
-        if 'base_class_idx' in dataset_cfg:
+        if 'base_inst_class_idx' in dataset_cfg:
+            self.base_inst_class_idx = dataset_cfg.base_inst_class_idx
+            self.novel_inst_class_idx = dataset_cfg.novel_inst_class_idx
+        elif 'base_class_idx' in dataset_cfg:
             self.base_inst_class_idx = np.array(self.base_class_idx)[dataset_cfg.inst_label_shift:] - self.inst_label_shift
             self.novel_inst_class_idx = np.array(self.novel_class_idx) - self.inst_label_shift
         self.sem2ins_classes = dataset_cfg.sem2ins_classes
diff --git a/pcseg/models/head/inst_head.py b/pcseg/models/head/inst_head.py
index ef1f8ab..9f17121 100644
--- a/pcseg/models/head/inst_head.py
+++ b/pcseg/models/head/inst_head.py
@@ -78,6 +78,7 @@ def __init__(self, model_cfg, in_channel, inst_class_idx, sem2ins_classes,
         else:
             self.train_sem_classes = self.valid_class_idx
             self.test_sem_classes = self.valid_class_idx
+        self.correct_seg_pred_binary = model_cfg.get('CORRECT_SEG_PRED_BINARY', True)
 
         self.forward_ret_dict = {}
 
@@ -118,7 +119,7 @@ def forward_grouping(self, batch_size, semantic_scores, pt_offsets, batch_idxs,
         binary_scores_list = []
 
         _semantic_scores = semantic_scores.clone()
-        if not self.training and binary_scores is not None:
+        if not self.training and binary_scores is not None and self.correct_seg_pred_binary:
             base_semantic_scores = semantic_scores[..., self.base_class_idx].softmax(dim=-1)
             novel_semantic_scores = semantic_scores[..., self.novel_class_idx].softmax(dim=-1)
             semantic_scores = semantic_scores.clone()
@@ -244,7 +245,7 @@ def get_instances(self, scan_id, proposals_idx, semantic_scores, cls_scores, iou
         num_instances = cls_scores.size(0)
         num_points = semantic_scores.size(0)
 
-        if binary_scores is not None:
+        if self.correct_seg_pred_binary and binary_scores is not None:
             assert proposal_binary_scores is not None
             base_cls_scores = cls_scores[..., self.inst_base_class_idx].softmax(dim=-1)
             novel_cls_scores = cls_scores[..., self.inst_novel_class_idx].softmax(dim=-1)
@@ -292,7 +293,8 @@ def get_instances(self, scan_id, proposals_idx, semantic_scores, cls_scores, iou
             mask_pred = torch.zeros((num_instances, num_points), dtype=torch.int8, device='cuda')
             mask_inds = cur_mask_scores > self.test_cfg.MASK_SCORE_THR
-            cur_proposals_idx = proposals_idx[mask_inds].long()
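+            # proposals_idx is a CPU tensor, so the CUDA boolean mask is moved to the CPU before indexing it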
+            cur_proposals_idx = proposals_idx[mask_inds.cpu()].long()
             mask_pred[cur_proposals_idx[:, 0], cur_proposals_idx[:, 1]] = 1
 
             # filter low score instance
diff --git a/pcseg/models/head/text_seg_head.py b/pcseg/models/head/text_seg_head.py
index 19283f9..6d85439 100755
--- a/pcseg/models/head/text_seg_head.py
+++ b/pcseg/models/head/text_seg_head.py
@@ -11,7 +11,7 @@ class TextSegHead(nn.Module):
 
-    def __init__(self, model_cfg, in_channel, ignore_label, **kwargs):
+    def __init__(self, model_cfg, in_channel, ignore_label, valid_class_idx, **kwargs):
         super(TextSegHead, self).__init__()
         self.model_cfg = model_cfg
         self.in_channel = in_channel
@@ -36,14 +36,10 @@ def __init__(self, model_cfg, in_channel, ignore_label, **kwargs):
                 param.requires_grad = False
 
         # open vocab
-        self.valid_class_idx = [i for i in range(len(cfg.CLASS_NAMES))]
+        self.valid_class_idx = valid_class_idx
         if hasattr(cfg.DATA_CONFIG, 'base_class_idx'):
             self.base_class_idx = cfg.DATA_CONFIG.base_class_idx
             self.novel_class_idx = cfg.DATA_CONFIG.novel_class_idx
-        if hasattr(cfg.DATA_CONFIG, 'ignore_class_idx'):
-            self.ignore_class_idx = cfg.DATA_CONFIG.ignore_class_idx
-            for i in self.ignore_class_idx:
-                self.valid_class_idx.remove(i)
 
         # remap category name for ambigous categories
         self.need_class_mapping = self.model_cfg.get('CLASS_MAPPING', False)
diff --git a/pcseg/models/vision_networks/network_template.py b/pcseg/models/vision_networks/network_template.py
index 57583bb..bf1d884 100755
--- a/pcseg/models/vision_networks/network_template.py
+++ b/pcseg/models/vision_networks/network_template.py
@@ -80,7 +80,8 @@ def build_task_head(self, model_info_dict):
             model_cfg=self.model_cfg.TASK_HEAD,
             in_channel=in_channel,
             ignore_label=self.dataset.ignore_label,
-            num_class=self.num_class
+            num_class=self.num_class,
+            valid_class_idx=self.dataset.valid_class_idx
         )
         model_info_dict['module_list'].append(task_head_module)
         return task_head_module, model_info_dict
diff --git a/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml b/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
index 2228b6e..01e3d2a 100644
--- a/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
+++ b/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
@@ -61,3 +61,6 @@ MODEL:
       SCENE: 0.0
       VIEW: 0.08
       ENTITY: 0.02
+
+  INST_HEAD:
+    CORRECT_SEG_PRED_BINARY: True
\ No newline at end of file
diff --git a/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml b/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
index 16f7ae3..b49e308 100644
--- a/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
+++ b/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
@@ -62,3 +62,7 @@ MODEL:
       SCENE: 0.0
       VIEW: 0.05
       ENTITY: 0.05
+
+  INST_HEAD:
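+    # whether to use the binary head to rectify instance semantic scores (see docs/INFER.md)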
+    CORRECT_SEG_PRED_BINARY: True
diff --git a/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml b/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml
new file mode 100644
index 0000000..df502bd
--- /dev/null
+++ b/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml
@@ -0,0 +1,28 @@
+_BASE_CONFIG_: cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
+
+DATA_CONFIG:
+  # TODO: split the input categories into base/novel/ignore.
+  # Note that if you have ground-truth annotations for the test samples,
+  # you need to set these parameters carefully to evaluate the performance quantitatively.
+  # If you just want to evaluate qualitatively, you can put all the categories into base_class_idx.
+  base_class_idx: [ 0, 1, 2, 3, 4 ]
+  novel_class_idx: [ ]
+  ignore_class_idx: [ ]
+
+  # TODO: split the categories into inst_base/inst_novel
+  inst_class_idx: [ 2, 3 ]
+  base_inst_class_idx: [ 0, 1 ]  # base category indices for instance categories; this list should be no longer than inst_class_idx
+  novel_inst_class_idx: [ ]
+
+MODEL:
+  TASK_HEAD:
+    CORRECT_SEG_PRED_BINARY: True  # TODO: for out-of-domain data, setting this to False probably leads to better performance
+
+  INST_HEAD:
+    CORRECT_SEG_PRED_BINARY: True  # TODO: for out-of-domain data, setting this to False probably leads to better performance
+    CLUSTERING:
+      PREPARE_EPOCH: -1
+
+TEXT_ENCODER:
+  EXTRACT_EMBED: True
+  CATEGORY_NAMES: [door, window, desk, keyboard, others]  # TODO: input your custom categories
\ No newline at end of file
diff --git a/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml b/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml
new file mode 100644
index 0000000..05a02f3
--- /dev/null
+++ b/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml
@@ -0,0 +1,18 @@
+_BASE_CONFIG_: cfgs/scannet_models/spconv_clip_base15_caption_adamw.yaml
+
+DATA_CONFIG:
+  # TODO: split the input categories into base/novel/ignore.
+  # Note that if you have ground-truth annotations for the test samples,
+  # you need to set these parameters carefully to evaluate the performance quantitatively.
+  # If you just want to evaluate qualitatively, you can put all the categories into base_class_idx.
+  base_class_idx: [ 0, 1, 2, 3, 4 ]
+  novel_class_idx: [ ]
+  ignore_class_idx: [ ]
+
+MODEL:
+  TASK_HEAD:
+    CORRECT_SEG_PRED_BINARY: True  # TODO: for out-of-domain data, setting this to False probably leads to better performance
+
+TEXT_ENCODER:
+  EXTRACT_EMBED: True
+  CATEGORY_NAMES: [door, window, desk, keyboard, others]  # TODO: input your custom categories
\ No newline at end of file
diff --git a/tools/eval_utils/inst_eval/eval_utils.py b/tools/eval_utils/inst_eval/eval_utils.py
index 1dda7a7..a234c08 100644
--- a/tools/eval_utils/inst_eval/eval_utils.py
+++ b/tools/eval_utils/inst_eval/eval_utils.py
@@ -40,8 +40,8 @@ def evaluate_matches(self, matches):
             dist_confs = [self.distance_confs[0]]
 
         # results: class x iou
-        ap = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float)
-        rc = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float)
+        ap = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float32)
+        rc = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float32)
         for di, (min_region_size, distance_thresh, distance_conf) in enumerate(zip(min_region_sizes, dist_threshes, dist_confs)):
             for oi, iou_th in enumerate(ious):
@@ -74,7 +74,7 @@ def evaluate_matches(self, matches):
                 cur_true = np.ones(len(gt_instances))
                 cur_score = np.ones(len(gt_instances)) * (-float('inf'))
-                cur_match = np.zeros(len(gt_instances), dtype=np.bool)
+                cur_match = np.zeros(len(gt_instances), dtype=bool)
                 # collect matches
                 for (gti, gt) in enumerate(gt_instances):
                     found_match = False
diff --git a/tools/process_tools/generate_caption_idx.py b/tools/process_tools/generate_caption_idx.py
index 9ed20ef..497a60a 100644
--- a/tools/process_tools/generate_caption_idx.py
+++ b/tools/process_tools/generate_caption_idx.py
@@ -236,6 +236,6 @@ def get_entity_caption_corr_idx(self, view_entity_caption, view_caption_corr_idx
         --view_caption_path ./data/scannetv2/text_embed/caption_view_scannet_vit-gpt2-image-captioning_25k.json \
         --view_caption_corr_idx_path ./data/scannetv2/scannetv2_view_vit-gpt2_matching_idx.pickle
         """
-        processor.create_caption_idx(args.workers)
+        processor.create_entity_caption_idx(args.workers)
     else:
         raise NotImplementedError
diff --git a/tools/test.py b/tools/test.py
index 10bfa33..21c7de5 100755
--- a/tools/test.py
+++ b/tools/test.py
@@ -203,9 +203,13 @@ def main():
         common_utils.oss_data_client = common_utils.OSSClient()
         logger.info(f'Ceph client initialization with root path at {cfg.DATA_CONFIG.OSS_PATH}')
 
+    if cfg.get('TEXT_ENCODER', None) and cfg.TEXT_ENCODER.EXTRACT_EMBED:
+        class_names = cfg.TEXT_ENCODER.CATEGORY_NAMES
+    else:
+        class_names = cfg.CLASS_NAMES
     test_set, test_loader, sampler = build_dataloader(
         dataset_cfg=cfg.DATA_CONFIG,
-        class_names=cfg.CLASS_NAMES,
+        class_names=class_names,
         batch_size=args.batch_size, dist=dist_test,
         workers=args.workers, logger=logger, training=False
     )