diff --git a/README.md b/README.md
index 06e2a5e..2eea3cf 100644
--- a/README.md
+++ b/README.md
@@ -1,54 +1,48 @@
-
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
+
PLA & RegionPLC
+
This repo contains the official implementation of PLA (CVPR 2023) and RegionPLC (CVPR 2024).
-
+
-
- 1The University of Hong Kong
- 2ByteDance
-
+
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
-**CVPR 2023**
+
CVPR 2023
-TL;DR: PLA leverages powerful VL foundation models to construct hierarchical 3D-text pairs for 3D open-world learning.
+[project page](https://dingry.github.io/projects/PLA) | [arXiv](https://arxiv.org/abs/2211.16312)
-
-
-  |
-  |
-  |
-
-
- | working space |
- piano |
- vending machine |
-
-
+
+
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
-
+
+
CVPR 2024
-
-[project page](https://dingry.github.io/projects/PLA) | [arXiv](https://arxiv.org/abs/2211.16312)
+
project page | arXiv
-### TODO
-- [ ] Release caption processing code
+##### Highlights:
+- The official PLA implementation is contained in the `main` branch
+- The official RegionPLC implementation is contained in the `regionplc` branch
+
+### Release
+- [2024-05-05] Released the **RegionPLC** implementation. Please check out the `regionplc` branch to try it!
### Getting Started
@@ -74,5 +68,14 @@ If you find this project useful in your research, please consider cite:
}
```
+```bibtex
+@inproceedings{yang2024regionplc,
+ title={RegionPLC: Regional point-language contrastive learning for open-world 3d scene understanding},
+ author={Yang, Jihan and Ding, Runyu and Deng, Weipeng and Wang, Zhe and Qi, Xiaojuan},
+ booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+ year={2024}
+}
+```
+
### Acknowledgement
Code is partly borrowed from [OpenPCDet](https://github.com/open-mmlab/OpenPCDet), [PointGroup](https://github.com/dvlab-research/PointGroup) and [SoftGroup](https://github.com/thangvubk/SoftGroup).
\ No newline at end of file
diff --git a/docs/DATASET.md b/docs/DATASET.md
index f081b57..57c4581 100644
--- a/docs/DATASET.md
+++ b/docs/DATASET.md
@@ -29,7 +29,7 @@ The dataset configs are located within [tools/cfgs/dataset_configs](../tools/cfg
python3 pcseg/datasets/s3dis/preprocess.py
```
-- Additionally, please download the caption data [here](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007346_connect_hku_hk/EoNAsU5f8YRGtQYV8ewhwvQB7QPbxT-uwKqTk8FPiyUTtQ?e=wq58H7). Download image data [here](https://github.com/alexsax/2D-3D-Semantics) if you want to generate captions on your own.
+- Additionally, please download the caption data [here](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007346_connect_hku_hk/EoNAsU5f8YRGtQYV8ewhwvQB7QPbxT-uwKqTk8FPiyUTtQ?e=wq58H7). If you want to generate captions on your own, please download the image data [here](https://github.com/alexsax/2D-3D-Semantics) and follow the scripts [generate_caption.py](../tools/process_tools/generate_caption.py) and [generate_caption_idx.py](../tools/process_tools/generate_caption_idx.py); a rough usage sketch is given at the end of this section.
- The directory organization should be as follows:
@@ -46,5 +46,3 @@ The dataset configs are located within [tools/cfgs/dataset_configs](../tools/cfg
├── pcseg
├── tools
```
-
-The scripts that process S3DIS images to generate captions and corresponding point indices will be available soon.
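+
+As a rough sketch, caption generation is a two-step process (this assumes both scripts expose an argparse-style CLI; the exact arguments are S3DIS-specific, so inspect them with `--help` before running):
+```bash
+# 1) caption the 2D images with an off-the-shelf image-captioning model
+python3 tools/process_tools/generate_caption.py --help
+# 2) match each generated caption to the point indices it covers
+python3 tools/process_tools/generate_caption_idx.py --help
+```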
diff --git a/docs/INFER.md b/docs/INFER.md
new file mode 100644
index 0000000..9bb2419
--- /dev/null
+++ b/docs/INFER.md
@@ -0,0 +1,15 @@
+If you wish to test on custom 3D scenes or categories, you can start from our example configs:
+ `tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml` and `tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml`
+
+The key parameters to consider are as follows:
+- `TEXT_ENCODER.CATEGORY_NAMES`
+
+ This parameter allows you to define the category list for segmentation.
+
+- `TASK_HEAD.CORRECT_SEG_PRED_BINARY` and `INST_HEAD.CORRECT_SEG_PRED_BINARY`
+
+ These parameters let you choose whether the binary head is used to rectify the semantic scores.
+
+
+To save the results, add `--save_results semantic,instance` to the test command. Afterwards, you can use the visualization utilities in `tools/visual_utils/visualize_indoor.py` to inspect the predicted results.
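+
+A minimal test invocation might look like the following (a sketch: `--save_results` is described above, while `--cfg_file` and `--ckpt` are assumed OpenPCDet-style arguments; check `python3 test.py --help` for the exact flags and replace the checkpoint path with your own):
+```bash
+cd tools
+# --cfg_file / --ckpt are assumptions based on the OpenPCDet-style CLI this repo builds on
+python3 test.py --cfg_file cfgs/scannet_models/spconv_clip_openvocab_test.yaml \
+    --ckpt /path/to/your_checkpoint.pth \
+    --save_results semantic
+```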
+
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index cf4df85..b95a60f 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -7,13 +7,13 @@ All the codes are tested in the following environment:
#### Install dependent libraries
a. Clone this repository.
-```shell
+```bash
git clone https://github.com/CVMI-Lab/PLA.git
```
b. Install the dependent libraries as follows:
-* Install the dependent python libraries:
+* Install the dependent Python libraries (note that you need to install the versions of `torch` and `spconv` that match your CUDA version; see the example below):
```bash
pip install -r requirements.txt
```
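+
+For example, on a CUDA 11.3 machine one possible combination is the following (a sketch only; the versions are placeholders, so pick the `torch` and `spconv` builds that match your CUDA toolkit from their official release pages):
+```bash
+# placeholder versions for a CUDA 11.3 setup; adjust to your toolkit
+pip install torch==1.10.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
+pip install spconv-cu113
+```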
@@ -28,4 +28,4 @@ b. Install the dependent libraries as follows:
* Install [pcseg](../pcseg)
```bash
python3 setup.py develop
- ```
\ No newline at end of file
+ ```
diff --git a/pcseg/datasets/dataset.py b/pcseg/datasets/dataset.py
index 57d00c3..1b0592e 100755
--- a/pcseg/datasets/dataset.py
+++ b/pcseg/datasets/dataset.py
@@ -65,7 +65,7 @@ def __init__(self, dataset_cfg=None, class_names=None, training=True, root_path=
self.valid_class_idx, self.ignore_label, squeeze_label=self.training)
# caption config
- if 'CAPTION_INFO' in self.dataset_cfg:
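+ # caption data is only used as training-time supervision, so skip loading it during testing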
+ if self.training and 'CAPTION_INFO' in self.dataset_cfg:
self.caption_cfg = self.dataset_cfg.CAPTION_INFO
self.caption_keys = self.dataset_cfg.CAPTION_INFO.KEY
self.caption = self.get_caption_items(self.caption_cfg)
diff --git a/pcseg/datasets/s3dis/s3dis_dataset.py b/pcseg/datasets/s3dis/s3dis_dataset.py
index d56d235..b07e006 100644
--- a/pcseg/datasets/s3dis/s3dis_dataset.py
+++ b/pcseg/datasets/s3dis/s3dis_dataset.py
@@ -332,7 +332,10 @@ def __init__(self, dataset_cfg, class_names, training, root_path, logger=None):
S3DISDataset.__init__(self, dataset_cfg, class_names, training, root_path, logger=logger)
self.inst_class_idx = dataset_cfg.inst_class_idx
self.inst_label_shift = dataset_cfg.inst_label_shift
- if 'base_class_idx' in dataset_cfg:
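+ # prefer an explicit instance-level base/novel split when the config provides one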
+ if 'base_inst_class_idx' in dataset_cfg:
+ self.base_inst_class_idx = dataset_cfg.base_inst_class_idx
+ self.novel_inst_class_idx = dataset_cfg.novel_inst_class_idx
+ elif 'base_class_idx' in dataset_cfg:
self.base_inst_class_idx = self.base_class_idx
self.novel_inst_class_idx = self.novel_class_idx
self.sem2ins_classes = dataset_cfg.sem2ins_classes
diff --git a/pcseg/datasets/scannet/scannet_dataset.py b/pcseg/datasets/scannet/scannet_dataset.py
index 60305ae..45421ec 100755
--- a/pcseg/datasets/scannet/scannet_dataset.py
+++ b/pcseg/datasets/scannet/scannet_dataset.py
@@ -309,7 +309,10 @@ def __init__(self, dataset_cfg, class_names, training, root_path, logger=None):
ScanNetDataset.__init__(self, dataset_cfg, class_names, training, root_path, logger=logger)
self.inst_class_idx = dataset_cfg.inst_class_idx
self.inst_label_shift = dataset_cfg.inst_label_shift
- if 'base_class_idx' in dataset_cfg:
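+ # prefer an explicit instance-level base/novel split when the config provides one; otherwise derive it from the semantic split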
+ if 'base_inst_class_idx' in dataset_cfg:
+ self.base_inst_class_idx = dataset_cfg.base_inst_class_idx
+ self.novel_inst_class_idx = dataset_cfg.novel_inst_class_idx
+ elif 'base_class_idx' in dataset_cfg:
self.base_inst_class_idx = np.array(self.base_class_idx)[dataset_cfg.inst_label_shift:] - self.inst_label_shift
self.novel_inst_class_idx = np.array(self.novel_class_idx) - self.inst_label_shift
self.sem2ins_classes = dataset_cfg.sem2ins_classes
diff --git a/pcseg/models/head/inst_head.py b/pcseg/models/head/inst_head.py
index ef1f8ab..9f17121 100644
--- a/pcseg/models/head/inst_head.py
+++ b/pcseg/models/head/inst_head.py
@@ -78,6 +78,7 @@ def __init__(self, model_cfg, in_channel, inst_class_idx, sem2ins_classes,
else:
self.train_sem_classes = self.valid_class_idx
self.test_sem_classes = self.valid_class_idx
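+ # whether to use the binary (base vs. novel) head to rectify semantic and classification scores at test time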
+ self.correct_seg_pred_binary = model_cfg.get('CORRECT_SEG_PRED_BINARY', True)
self.forward_ret_dict = {}
@@ -118,7 +119,7 @@ def forward_grouping(self, batch_size, semantic_scores, pt_offsets, batch_idxs,
binary_scores_list = []
_semantic_scores = semantic_scores.clone()
- if not self.training and binary_scores is not None:
+ if not self.training and binary_scores is not None and self.correct_seg_pred_binary:
base_semantic_scores = semantic_scores[..., self.base_class_idx].softmax(dim=-1)
novel_semantic_scores = semantic_scores[..., self.novel_class_idx].softmax(dim=-1)
semantic_scores = semantic_scores.clone()
@@ -244,7 +245,7 @@ def get_instances(self, scan_id, proposals_idx, semantic_scores, cls_scores, iou
num_instances = cls_scores.size(0)
num_points = semantic_scores.size(0)
- if binary_scores is not None:
+ if self.correct_seg_pred_binary and binary_scores is not None:
assert proposal_binary_scores is not None
base_cls_scores = cls_scores[..., self.inst_base_class_idx].softmax(dim=-1)
novel_cls_scores = cls_scores[..., self.inst_novel_class_idx].softmax(dim=-1)
@@ -292,7 +293,7 @@ def get_instances(self, scan_id, proposals_idx, semantic_scores, cls_scores, iou
mask_pred = torch.zeros((num_instances, num_points), dtype=torch.int8, device='cuda')
mask_inds = cur_mask_scores > self.test_cfg.MASK_SCORE_THR
- cur_proposals_idx = proposals_idx[mask_inds].long()
+ cur_proposals_idx = proposals_idx[mask_inds.cpu()].long()
mask_pred[cur_proposals_idx[:, 0], cur_proposals_idx[:, 1]] = 1
# filter low score instance
diff --git a/pcseg/models/head/text_seg_head.py b/pcseg/models/head/text_seg_head.py
index 19283f9..6d85439 100755
--- a/pcseg/models/head/text_seg_head.py
+++ b/pcseg/models/head/text_seg_head.py
@@ -11,7 +11,7 @@
class TextSegHead(nn.Module):
- def __init__(self, model_cfg, in_channel, ignore_label, **kwargs):
+ def __init__(self, model_cfg, in_channel, ignore_label, valid_class_idx, **kwargs):
super(TextSegHead, self).__init__()
self.model_cfg = model_cfg
self.in_channel = in_channel
@@ -36,14 +36,10 @@ def __init__(self, model_cfg, in_channel, ignore_label, **kwargs):
param.requires_grad = False
# open vocab
- self.valid_class_idx = [i for i in range(len(cfg.CLASS_NAMES))]
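+ # valid class indices now come from the dataset instead of being re-derived from the config here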
+ self.valid_class_idx = valid_class_idx
if hasattr(cfg.DATA_CONFIG, 'base_class_idx'):
self.base_class_idx = cfg.DATA_CONFIG.base_class_idx
self.novel_class_idx = cfg.DATA_CONFIG.novel_class_idx
- if hasattr(cfg.DATA_CONFIG, 'ignore_class_idx'):
- self.ignore_class_idx = cfg.DATA_CONFIG.ignore_class_idx
- for i in self.ignore_class_idx:
- self.valid_class_idx.remove(i)
# remap category name for ambigous categories
self.need_class_mapping = self.model_cfg.get('CLASS_MAPPING', False)
diff --git a/pcseg/models/vision_networks/network_template.py b/pcseg/models/vision_networks/network_template.py
index 57583bb..bf1d884 100755
--- a/pcseg/models/vision_networks/network_template.py
+++ b/pcseg/models/vision_networks/network_template.py
@@ -80,7 +80,8 @@ def build_task_head(self, model_info_dict):
model_cfg=self.model_cfg.TASK_HEAD,
in_channel=in_channel,
ignore_label=self.dataset.ignore_label,
- num_class=self.num_class
+ num_class=self.num_class,
+ valid_class_idx=self.dataset.valid_class_idx
)
model_info_dict['module_list'].append(task_head_module)
return task_head_module, model_info_dict
diff --git a/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml b/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
index 2228b6e..01e3d2a 100644
--- a/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
+++ b/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
@@ -61,3 +61,6 @@ MODEL:
SCENE: 0.0
VIEW: 0.08
ENTITY: 0.02
+
+ INST_HEAD:
+ CORRECT_SEG_PRED_BINARY: True
\ No newline at end of file
diff --git a/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml b/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
index 16f7ae3..b49e308 100644
--- a/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
+++ b/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
@@ -62,3 +62,6 @@ MODEL:
SCENE: 0.0
VIEW: 0.05
ENTITY: 0.05
+
+ INST_HEAD:
+ CORRECT_SEG_PRED_BINARY: True
diff --git a/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml b/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml
new file mode 100644
index 0000000..df502bd
--- /dev/null
+++ b/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml
@@ -0,0 +1,28 @@
+_BASE_CONFIG_: cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
+
+DATA_CONFIG:
+ # TODO: split the input categories into base/novel/ignore.
+ # Note that if you have ground-truth annotations for the test samples,
+ # you need to set these parameters carefully to evaluate the performance quantitatively.
+ # If you only want a qualitative evaluation, you can simply put all the categories into base_class_idx.
+ base_class_idx: [ 0, 1, 2, 3, 4]
+ novel_class_idx: []
+ ignore_class_idx: [ ]
+
+ # TODO: split the categories into inst_base/inst_novel
+ inst_class_idx: [2, 3]
+ base_inst_class_idx: [0, 1] # the base category indices for instance categories. The length of this list should be the same as or smaller than the length of inst_class_idx
+ novel_inst_class_idx: []
+
+MODEL:
+ TASK_HEAD:
+ CORRECT_SEG_PRED_BINARY: True # TODO: for out-of-domain data, setting this to False probably leads to better performance
+
+ INST_HEAD:
+ CORRECT_SEG_PRED_BINARY: True # TODO: for out-of-domain data, setting this to False probably leads to better performance
+ CLUSTERING:
+ PREPARE_EPOCH: -1
+
+TEXT_ENCODER:
+ EXTRACT_EMBED: True
+ CATEGORY_NAMES: [door, window, desk, keyboard, others] # TODO: input your custom categories
\ No newline at end of file
diff --git a/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml b/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml
new file mode 100644
index 0000000..05a02f3
--- /dev/null
+++ b/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml
@@ -0,0 +1,18 @@
+_BASE_CONFIG_: cfgs/scannet_models/spconv_clip_base15_caption_adamw.yaml
+
+DATA_CONFIG:
+ # TODO: split the input categories into base/novel/ignore.
+ # Note that if you have ground-truth annotations for the test samples,
+ # you need to set these parameters carefully to evaluate the performance quantitatively.
+ # If you only want a qualitative evaluation, you can simply put all the categories into base_class_idx.
+ base_class_idx: [ 0, 1, 2, 3, 4]
+ novel_class_idx: []
+ ignore_class_idx: [ ]
+
+MODEL:
+ TASK_HEAD:
+ CORRECT_SEG_PRED_BINARY: True # TODO: for out-of-domain data, setting this to False probably leads to better performance
+
+TEXT_ENCODER:
+ EXTRACT_EMBED: True
+ CATEGORY_NAMES: [door, window, desk, keyboard, others] # TODO: input your custom categories
\ No newline at end of file
diff --git a/tools/eval_utils/inst_eval/eval_utils.py b/tools/eval_utils/inst_eval/eval_utils.py
index 1dda7a7..a234c08 100644
--- a/tools/eval_utils/inst_eval/eval_utils.py
+++ b/tools/eval_utils/inst_eval/eval_utils.py
@@ -40,8 +40,8 @@ def evaluate_matches(self, matches):
dist_confs = [self.distance_confs[0]]
# results: class x iou
- ap = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float)
- rc = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float)
+ ap = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float32)
+ rc = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float32)
for di, (min_region_size, distance_thresh,
distance_conf) in enumerate(zip(min_region_sizes, dist_threshes, dist_confs)):
for oi, iou_th in enumerate(ious):
@@ -74,7 +74,7 @@ def evaluate_matches(self, matches):
cur_true = np.ones(len(gt_instances))
cur_score = np.ones(len(gt_instances)) * (-float('inf'))
- cur_match = np.zeros(len(gt_instances), dtype=np.bool)
+ cur_match = np.zeros(len(gt_instances), dtype=bool)
# collect matches
for (gti, gt) in enumerate(gt_instances):
found_match = False
diff --git a/tools/process_tools/generate_caption_idx.py b/tools/process_tools/generate_caption_idx.py
index 9ed20ef..497a60a 100644
--- a/tools/process_tools/generate_caption_idx.py
+++ b/tools/process_tools/generate_caption_idx.py
@@ -236,6 +236,6 @@ def get_entity_caption_corr_idx(self, view_entity_caption, view_caption_corr_idx
--view_caption_path ./data/scannetv2/text_embed/caption_view_scannet_vit-gpt2-image-captioning_25k.json \
--view_caption_corr_idx_path ./data/scannetv2/scannetv2_view_vit-gpt2_matching_idx.pickle
"""
- processor.create_caption_idx(args.workers)
+ processor.create_entity_caption_idx(args.workers)
else:
raise NotImplementedError
diff --git a/tools/test.py b/tools/test.py
index 10bfa33..21c7de5 100755
--- a/tools/test.py
+++ b/tools/test.py
@@ -203,9 +203,13 @@ def main():
common_utils.oss_data_client = common_utils.OSSClient()
logger.info(f'Ceph client initialization with root path at {cfg.DATA_CONFIG.OSS_PATH}')
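+ # when text embeddings are extracted for custom categories, use those category names as the class list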
+ if cfg.get('TEXT_ENCODER', None) and cfg.TEXT_ENCODER.EXTRACT_EMBED:
+ class_names = cfg.TEXT_ENCODER.CATEGORY_NAMES
+ else:
+ class_names = cfg.CLASS_NAMES
test_set, test_loader, sampler = build_dataloader(
dataset_cfg=cfg.DATA_CONFIG,
- class_names=cfg.CLASS_NAMES,
+ class_names=class_names,
batch_size=args.batch_size,
dist=dist_test, workers=args.workers, logger=logger, training=False
)