diff --git a/README.md b/README.md
index 06e2a5e..2eea3cf 100644
--- a/README.md
+++ b/README.md
@@ -1,54 +1,48 @@
-PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
-
-Runyu Ding1*, Jihan Yang1*, Chuhui Xue2, Wenqing Zhang2, Song Bai2†, Xiaojuan Qi1†
-
-1The University of Hong Kong  2ByteDance
-
-*equal contribution  †corresponding author
-
-**CVPR 2023**
-
-TL;DR: PLA leverages powerful VL foundation models to construct hierarchical 3D-text pairs for 3D open-world learning.
-
-[demo GIFs: working space | piano | vending machine]
-
-[project page](https://dingry.github.io/projects/PLA) | [arXiv](https://arxiv.org/abs/2211.16312)
+PLA & RegionPLC
+
+This repo contains the official implementation of PLA (CVPR 2023) and RegionPLC (CVPR 2024).
+
+PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
+
+Runyu Ding*, Jihan Yang*, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
+
+CVPR 2023
+
+[project page](https://dingry.github.io/projects/PLA) | [arXiv](https://arxiv.org/abs/2211.16312)
+
+RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
+
+Jihan Yang*, Runyu Ding*, Weipeng Deng, Zhe Wang, Xiaojuan Qi
+
+CVPR 2024
+
+project page | arXiv
 
-### TODO
-- [ ] Release caption processing code
+##### Highlights:
+- Official PLA implementation is contained in the `main` branch
+- Official RegionPLC implementation is contained in the `regionplc` branch
+
+### Release
+- [2024-05-05] Releasing the **RegionPLC** implementation. Please check out the `regionplc` branch to try it!
 
 ### Getting Started
 
@@ -74,5 +68,14 @@ If you find this project useful in your research, please consider citing:
 }
 ```
 
+```bibtex
+@inproceedings{yang2024regionplc,
+  title={RegionPLC: Regional point-language contrastive learning for open-world 3d scene understanding},
+  author={Yang, Jihan and Ding, Runyu and Deng, Weipeng and Wang, Zhe and Qi, Xiaojuan},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  year={2024}
+}
+```
+
 ### Acknowledgement
 Code is partly borrowed from [OpenPCDet](https://github.com/open-mmlab/OpenPCDet), [PointGroup](https://github.com/dvlab-research/PointGroup) and [SoftGroup](https://github.com/thangvubk/SoftGroup).
\ No newline at end of file
diff --git a/docs/DATASET.md b/docs/DATASET.md
index f081b57..57c4581 100644
--- a/docs/DATASET.md
+++ b/docs/DATASET.md
@@ -29,7 +29,15 @@ The dataset configs are located within [tools/cfgs/dataset_configs](../tools/cfgs/dataset_configs)
 python3 pcseg/datasets/s3dis/preprocess.py
 ```
 
-- Additionally, please download the caption data [here](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007346_connect_hku_hk/EoNAsU5f8YRGtQYV8ewhwvQB7QPbxT-uwKqTk8FPiyUTtQ?e=wq58H7). Download image data [here](https://github.com/alexsax/2D-3D-Semantics) if you want to generate captions on your own.
+- Additionally, please download the caption data [here](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007346_connect_hku_hk/EoNAsU5f8YRGtQYV8ewhwvQB7QPbxT-uwKqTk8FPiyUTtQ?e=wq58H7). If you want to generate captions on your own, please download the image data [here](https://github.com/alexsax/2D-3D-Semantics) and follow the scripts [generate_caption.py](../tools/process_tools/generate_caption.py) and [generate_caption_idx.py](../tools/process_tools/generate_caption_idx.py); a reference invocation is sketched below.
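+
+  For reference, `generate_caption_idx.py` documents an invocation like the following (these are the ScanNet paths from the script's own docstring; adapt the paths for S3DIS):
+
+  ```bash
+  python3 tools/process_tools/generate_caption_idx.py \
+      --view_caption_path ./data/scannetv2/text_embed/caption_view_scannet_vit-gpt2-image-captioning_25k.json \
+      --view_caption_corr_idx_path ./data/scannetv2/scannetv2_view_vit-gpt2_matching_idx.pickle
+  ```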
 
 - The directory organization should be as follows:
 
@@ -46,5 +46,3 @@ The dataset configs are located within [tools/cfgs/dataset_configs](../tools/cfgs/dataset_configs)
 ├── pcseg
 ├── tools
 ```
-
-The scripts that process S3DIS images to generate captions and corresponding point indices will be available soon.
diff --git a/docs/INFER.md b/docs/INFER.md
new file mode 100644
index 0000000..9bb2419
--- /dev/null
+++ b/docs/INFER.md
@@ -0,0 +1,22 @@
+If you wish to test on custom 3D scenes or categories, you can start from our example configs:
+`tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml` and `tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml`.
+
+The key parameters to consider are as follows:
+- `TEXT_ENCODER.CATEGORY_NAMES`
+
+  This parameter defines the category list for segmentation.
+
+- `TASK_HEAD.CORRECT_SEG_PRED_BINARY` and `INST_HEAD.CORRECT_SEG_PRED_BINARY`
+
+  These parameters control whether the binary head is used to rectify semantic scores.
+
+To save the results, add `--save_results semantic,instance` to your test command. Afterwards, you can use the visualization utilities in `tools/visual_utils/visualize_indoor.py` to visualize the predicted results.
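+
+For example, a minimal sketch of such a run (assuming the OpenPCDet-style `--cfg_file`/`--ckpt` flags of `tools/test.py`; the checkpoint path is a placeholder):
+
+```bash
+cd tools
+python3 test.py --cfg_file cfgs/scannet_models/spconv_clip_openvocab_test.yaml \
+    --ckpt /path/to/checkpoint.pth \
+    --save_results semantic,instance
+```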
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index cf4df85..b95a60f 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -7,13 +7,19 @@ All the codes are tested in the following environment:
 #### Install dependent libraries
 
 a. Clone this repository.
-```shell
+```bash
 git clone https://github.com/CVMI-Lab/PLA.git
 ```
 
 b. Install the dependent libraries as follows:
 
-* Install the dependent python libraries:
+* Install the dependent Python libraries (please note that you need to install versions of `torch` and `spconv` that match your CUDA version; see the sketch below):
 ```bash
 pip install -r requirements.txt
 ```
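+
+A sketch assuming CUDA 11.3 (the exact versions here are illustrative, not pinned by this repo):
+```bash
+pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
+pip install spconv-cu113
+```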
@@ -28,4 +28,4 @@ b. Install the dependent libraries as follows:
 * Install [pcseg](../pcseg)
   ```bash
   python3 setup.py develop
-  ```
\ No newline at end of file
+  ```
diff --git a/pcseg/datasets/dataset.py b/pcseg/datasets/dataset.py
index 57d00c3..1b0592e 100755
--- a/pcseg/datasets/dataset.py
+++ b/pcseg/datasets/dataset.py
@@ -65,7 +65,8 @@ def __init__(self, dataset_cfg=None, class_names=None, training=True, root_path=
             self.valid_class_idx, self.ignore_label, squeeze_label=self.training)
 
         # caption config
-        if 'CAPTION_INFO' in self.dataset_cfg:
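+        # captions are only used as training-time supervision, so skip loading them during evaluation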
+        if self.training and 'CAPTION_INFO' in self.dataset_cfg:
             self.caption_cfg = self.dataset_cfg.CAPTION_INFO
             self.caption_keys = self.dataset_cfg.CAPTION_INFO.KEY
             self.caption = self.get_caption_items(self.caption_cfg)
diff --git a/pcseg/datasets/s3dis/s3dis_dataset.py b/pcseg/datasets/s3dis/s3dis_dataset.py
index d56d235..b07e006 100644
--- a/pcseg/datasets/s3dis/s3dis_dataset.py
+++ b/pcseg/datasets/s3dis/s3dis_dataset.py
@@ -332,7 +332,10 @@ def __init__(self, dataset_cfg, class_names, training, root_path, logger=None):
         S3DISDataset.__init__(self, dataset_cfg, class_names, training, root_path, logger=logger)
         self.inst_class_idx = dataset_cfg.inst_class_idx
         self.inst_label_shift = dataset_cfg.inst_label_shift
-        if 'base_class_idx' in dataset_cfg:
+        if 'base_inst_class_idx' in dataset_cfg:
+            self.base_inst_class_idx = dataset_cfg.base_inst_class_idx
+            self.novel_inst_class_idx = dataset_cfg.novel_inst_class_idx
+        elif 'base_class_idx' in dataset_cfg:
             self.base_inst_class_idx = self.base_class_idx
             self.novel_inst_class_idx = self.novel_class_idx
         self.sem2ins_classes = dataset_cfg.sem2ins_classes
diff --git a/pcseg/datasets/scannet/scannet_dataset.py b/pcseg/datasets/scannet/scannet_dataset.py
index 60305ae..45421ec 100755
--- a/pcseg/datasets/scannet/scannet_dataset.py
+++ b/pcseg/datasets/scannet/scannet_dataset.py
@@ -309,7 +309,10 @@ def __init__(self, dataset_cfg, class_names, training, root_path, logger=None):
         ScanNetDataset.__init__(self, dataset_cfg, class_names, training, root_path, logger=logger)
         self.inst_class_idx = dataset_cfg.inst_class_idx
         self.inst_label_shift = dataset_cfg.inst_label_shift
-        if 'base_class_idx' in dataset_cfg:
+        if 'base_inst_class_idx' in dataset_cfg:
+            self.base_inst_class_idx = dataset_cfg.base_inst_class_idx
+            self.novel_inst_class_idx = dataset_cfg.novel_inst_class_idx
+        elif 'base_class_idx' in dataset_cfg:
             self.base_inst_class_idx = np.array(self.base_class_idx)[dataset_cfg.inst_label_shift:] - self.inst_label_shift
             self.novel_inst_class_idx = np.array(self.novel_class_idx) - self.inst_label_shift
         self.sem2ins_classes = dataset_cfg.sem2ins_classes
diff --git a/pcseg/models/head/inst_head.py b/pcseg/models/head/inst_head.py
index ef1f8ab..9f17121 100644
--- a/pcseg/models/head/inst_head.py
+++ b/pcseg/models/head/inst_head.py
@@ -78,6 +78,7 @@ def __init__(self, model_cfg, in_channel, inst_class_idx, sem2ins_classes,
         else:
             self.train_sem_classes = self.valid_class_idx
             self.test_sem_classes = self.valid_class_idx
+        self.correct_seg_pred_binary = model_cfg.get('CORRECT_SEG_PRED_BINARY', True)
 
         self.forward_ret_dict = {}
 
@@ -118,7 +119,7 @@ def forward_grouping(self, batch_size, semantic_scores, pt_offsets, batch_idxs,
         binary_scores_list = []
 
         _semantic_scores = semantic_scores.clone()
-        if not self.training and binary_scores is not None:
+        if not self.training and binary_scores is not None and self.correct_seg_pred_binary:
             base_semantic_scores = semantic_scores[..., self.base_class_idx].softmax(dim=-1)
             novel_semantic_scores = semantic_scores[..., self.novel_class_idx].softmax(dim=-1)
             semantic_scores = semantic_scores.clone()
@@ -244,7 +245,7 @@ def get_instances(self, scan_id, proposals_idx, semantic_scores, cls_scores, iou
         num_instances = cls_scores.size(0)
         num_points = semantic_scores.size(0)
 
-        if binary_scores is not None:
+        if self.correct_seg_pred_binary and binary_scores is not None:
             assert proposal_binary_scores is not None
             base_cls_scores = cls_scores[..., self.inst_base_class_idx].softmax(dim=-1)
             novel_cls_scores = cls_scores[..., self.inst_novel_class_idx].softmax(dim=-1)
@@ -292,7 +293,8 @@ def get_instances(self, scan_id, proposals_idx, semantic_scores, cls_scores, iou
             mask_pred = torch.zeros((num_instances, num_points), dtype=torch.int8, device='cuda')
             mask_inds = cur_mask_scores > self.test_cfg.MASK_SCORE_THR
-            cur_proposals_idx = proposals_idx[mask_inds].long()
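+            # proposals_idx is a CPU tensor, so the CUDA boolean mask is moved to the CPU before indexing it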
+            cur_proposals_idx = proposals_idx[mask_inds.cpu()].long()
             mask_pred[cur_proposals_idx[:, 0], cur_proposals_idx[:, 1]] = 1
 
             # filter low score instance
diff --git a/pcseg/models/head/text_seg_head.py b/pcseg/models/head/text_seg_head.py
index 19283f9..6d85439 100755
--- a/pcseg/models/head/text_seg_head.py
+++ b/pcseg/models/head/text_seg_head.py
@@ -11,7 +11,7 @@ class TextSegHead(nn.Module):
 
-    def __init__(self, model_cfg, in_channel, ignore_label, **kwargs):
+    def __init__(self, model_cfg, in_channel, ignore_label, valid_class_idx, **kwargs):
         super(TextSegHead, self).__init__()
         self.model_cfg = model_cfg
         self.in_channel = in_channel
@@ -36,14 +36,10 @@ def __init__(self, model_cfg, in_channel, ignore_label, **kwargs):
                 param.requires_grad = False
 
         # open vocab
-        self.valid_class_idx = [i for i in range(len(cfg.CLASS_NAMES))]
+        self.valid_class_idx = valid_class_idx
         if hasattr(cfg.DATA_CONFIG, 'base_class_idx'):
             self.base_class_idx = cfg.DATA_CONFIG.base_class_idx
             self.novel_class_idx = cfg.DATA_CONFIG.novel_class_idx
-        if hasattr(cfg.DATA_CONFIG, 'ignore_class_idx'):
-            self.ignore_class_idx = cfg.DATA_CONFIG.ignore_class_idx
-            for i in self.ignore_class_idx:
-                self.valid_class_idx.remove(i)
 
         # remap category name for ambigous categories
         self.need_class_mapping = self.model_cfg.get('CLASS_MAPPING', False)
diff --git a/pcseg/models/vision_networks/network_template.py b/pcseg/models/vision_networks/network_template.py
index 57583bb..bf1d884 100755
--- a/pcseg/models/vision_networks/network_template.py
+++ b/pcseg/models/vision_networks/network_template.py
@@ -80,7 +80,8 @@ def build_task_head(self, model_info_dict):
             model_cfg=self.model_cfg.TASK_HEAD,
             in_channel=in_channel,
             ignore_label=self.dataset.ignore_label,
-            num_class=self.num_class
+            num_class=self.num_class,
+            valid_class_idx=self.dataset.valid_class_idx
         )
         model_info_dict['module_list'].append(task_head_module)
         return task_head_module, model_info_dict
diff --git a/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml b/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
index 2228b6e..01e3d2a 100644
--- a/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
+++ b/tools/cfgs/s3dis_models/inst/softgroup_clip_base8_caption_adamw.yaml
@@ -61,3 +61,6 @@ MODEL:
       SCENE: 0.0
       VIEW: 0.08
       ENTITY: 0.02
+
+  INST_HEAD:
+    CORRECT_SEG_PRED_BINARY: True
\ No newline at end of file
diff --git a/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml b/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
index 16f7ae3..b49e308 100644
--- a/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
+++ b/tools/cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
@@ -62,3 +62,7 @@ MODEL:
       SCENE: 0.0
       VIEW: 0.05
       ENTITY: 0.05
+
+  INST_HEAD:
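+    # whether to use the binary head to rectify instance semantic scores (see docs/INFER.md)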
+    CORRECT_SEG_PRED_BINARY: True
diff --git a/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml b/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml
new file mode 100644
index 0000000..df502bd
--- /dev/null
+++ b/tools/cfgs/scannet_models/inst/softgroup_clip_openvocab_test.yaml
@@ -0,0 +1,28 @@
+_BASE_CONFIG_: cfgs/scannet_models/inst/softgroup_clip_base13_caption_adamw.yaml
+
+DATA_CONFIG:
+  # TODO: split the input categories into base/novel/ignore.
+  # Note that if you have ground-truth annotations for the test samples,
+  # you need to set these parameters carefully to evaluate the performance quantitatively.
+  # If you just want to evaluate qualitatively, you can put all the categories into base_class_idx.
+  base_class_idx: [ 0, 1, 2, 3, 4 ]
+  novel_class_idx: [ ]
+  ignore_class_idx: [ ]
+
+  # TODO: split the categories into inst_base/inst_novel
+  inst_class_idx: [ 2, 3 ]
+  base_inst_class_idx: [ 0, 1 ]  # base category indices for instance categories; this list should be no longer than inst_class_idx
+  novel_inst_class_idx: [ ]
+
+MODEL:
+  TASK_HEAD:
+    CORRECT_SEG_PRED_BINARY: True  # TODO: for out-of-domain data, setting this to False probably leads to better performance
+
+  INST_HEAD:
+    CORRECT_SEG_PRED_BINARY: True  # TODO: for out-of-domain data, setting this to False probably leads to better performance
+    CLUSTERING:
+      PREPARE_EPOCH: -1
+
+TEXT_ENCODER:
+  EXTRACT_EMBED: True
+  CATEGORY_NAMES: [door, window, desk, keyboard, others]  # TODO: input your custom categories
\ No newline at end of file
diff --git a/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml b/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml
new file mode 100644
index 0000000..05a02f3
--- /dev/null
+++ b/tools/cfgs/scannet_models/spconv_clip_openvocab_test.yaml
@@ -0,0 +1,18 @@
+_BASE_CONFIG_: cfgs/scannet_models/spconv_clip_base15_caption_adamw.yaml
+
+DATA_CONFIG:
+  # TODO: split the input categories into base/novel/ignore.
+  # Note that if you have ground-truth annotations for the test samples,
+  # you need to set these parameters carefully to evaluate the performance quantitatively.
+  # If you just want to evaluate qualitatively, you can put all the categories into base_class_idx.
+  base_class_idx: [ 0, 1, 2, 3, 4 ]
+  novel_class_idx: [ ]
+  ignore_class_idx: [ ]
+
+MODEL:
+  TASK_HEAD:
+    CORRECT_SEG_PRED_BINARY: True  # TODO: for out-of-domain data, setting this to False probably leads to better performance
+
+TEXT_ENCODER:
+  EXTRACT_EMBED: True
+  CATEGORY_NAMES: [door, window, desk, keyboard, others]  # TODO: input your custom categories
\ No newline at end of file
diff --git a/tools/eval_utils/inst_eval/eval_utils.py b/tools/eval_utils/inst_eval/eval_utils.py
index 1dda7a7..a234c08 100644
--- a/tools/eval_utils/inst_eval/eval_utils.py
+++ b/tools/eval_utils/inst_eval/eval_utils.py
@@ -40,8 +40,8 @@ def evaluate_matches(self, matches):
             dist_confs = [self.distance_confs[0]]
 
         # results: class x iou
-        ap = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float)
-        rc = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float)
+        ap = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float32)
+        rc = np.zeros((len(dist_threshes), len(self.eval_class_labels), len(ious)), np.float32)
         for di, (min_region_size, distance_thresh, distance_conf) in enumerate(zip(min_region_sizes, dist_threshes, dist_confs)):
             for oi, iou_th in enumerate(ious):
@@ -74,7 +74,7 @@ def evaluate_matches(self, matches):
                 cur_true = np.ones(len(gt_instances))
                 cur_score = np.ones(len(gt_instances)) * (-float('inf'))
-                cur_match = np.zeros(len(gt_instances), dtype=np.bool)
+                cur_match = np.zeros(len(gt_instances), dtype=bool)
                 # collect matches
                 for (gti, gt) in enumerate(gt_instances):
                     found_match = False
diff --git a/tools/process_tools/generate_caption_idx.py b/tools/process_tools/generate_caption_idx.py
index 9ed20ef..497a60a 100644
--- a/tools/process_tools/generate_caption_idx.py
+++ b/tools/process_tools/generate_caption_idx.py
@@ -236,6 +236,6 @@ def get_entity_caption_corr_idx(self, view_entity_caption, view_caption_corr_idx
         --view_caption_path ./data/scannetv2/text_embed/caption_view_scannet_vit-gpt2-image-captioning_25k.json \
         --view_caption_corr_idx_path ./data/scannetv2/scannetv2_view_vit-gpt2_matching_idx.pickle
         """
-        processor.create_caption_idx(args.workers)
+        processor.create_entity_caption_idx(args.workers)
     else:
         raise NotImplementedError
diff --git a/tools/test.py b/tools/test.py
index 10bfa33..21c7de5 100755
--- a/tools/test.py
+++ b/tools/test.py
@@ -203,9 +203,13 @@ def main():
         common_utils.oss_data_client = common_utils.OSSClient()
         logger.info(f'Ceph client initialization with root path at {cfg.DATA_CONFIG.OSS_PATH}')
 
+    if cfg.get('TEXT_ENCODER', None) and cfg.TEXT_ENCODER.EXTRACT_EMBED:
+        class_names = cfg.TEXT_ENCODER.CATEGORY_NAMES
+    else:
+        class_names = cfg.CLASS_NAMES
     test_set, test_loader, sampler = build_dataloader(
         dataset_cfg=cfg.DATA_CONFIG,
-        class_names=cfg.CLASS_NAMES,
+        class_names=class_names,
         batch_size=args.batch_size, dist=dist_test,
         workers=args.workers, logger=logger, training=False
     )