Hi,
Just wondering if it is possible to run the examples in sigcomm_ae.md on a GPU with less memory. I tried changing the default for the `--max_gpu_memory` argument in run_cachegen.py:

```python
p.add_argument("--max_gpu_memory", type=int, default=12,
               help="Default max GPU memory in GiB on Titan XP")
```
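For context on what I expected the flag to do: I assumed it caps GPU memory at model-load time, roughly like transformers' `max_memory` option with `device_map="auto"`. That mapping is only my guess about how run_cachegen.py uses the flag; a minimal sketch of what I had in mind:

```python
# Sketch of the behavior I expected from --max_gpu_memory (my assumption; I
# have not checked that run_cachegen.py wires the flag through this path).
# With device_map="auto", accelerate caps GPU 0 and offloads the rest to CPU.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",     # placeholder for whatever 7b.sh loads
    device_map="auto",
    max_memory={0: "12GiB", "cpu": "30GiB"},  # hard cap of 12 GiB on GPU 0
)
```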
However, I still get the following error:
```
(cachegen) elijahye@janux02:~/CacheGen$ bash scripts/7b.sh longchat 0
Using /home/elijahye/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Emitting ninja build file /home/elijahye/.cache/torch_extensions/py310_cu121/torchac_backend/build.ninja...
Building extension module torchac_backend...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module torchac_backend...
/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 145047.98it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.92s/it]
Model and tokenizer loaded
Saving KV cache for doc: 0
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Saving KV cache for doc: 1
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
File "/home/elijahye/CacheGen/main.py", line 39, in <module>
generated = model.generate(input_ids, max_new_tokens = 1, return_dict_in_generate=True)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1914, in generate
result = self._sample(
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2651, in _sample
outputs = self(
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1200, in forward
outputs = self.model(
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 976, in forward
layer_outputs = decoder_layer(
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 732, in forward
hidden_states = self.mlp(hidden_states)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 171, in forward
return self.down_proj(self.act_fn(self.gate_proj(hidden_state)) * self.up_proj(hidden_state))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00 MiB. GPU
```
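So main.py gets through doc 0 and OOMs while prefilling doc 1, which makes me wonder whether memory accumulates across documents. A hypothetical tweak to main.py's per-document loop that I was considering (variable names are guessed from the traceback, not taken from CacheGen's code, and I'm assuming generate() returns the KV cache as legacy (key, value) tuples):

```python
# Hypothetical per-document cleanup for main.py: move the KV cache to CPU
# before saving and release CUDA memory between docs, so doc 0's tensors
# don't crowd out doc 1's prefill. Names here are my guesses, not CacheGen's.
import gc
import torch

generated = model.generate(input_ids, max_new_tokens=1, return_dict_in_generate=True)
kv_cpu = tuple((k.cpu(), v.cpu()) for k, v in generated.past_key_values)
torch.save(kv_cpu, f"{save_dir}/raw_kv_{doc_id}.pt")
del generated, kv_cpu
gc.collect()
torch.cuda.empty_cache()  # release unused cached memory held by PyTorch's allocator
```

After the OOM, 7b.sh continues on to run_cachegen.py, which then fails because the cache file for doc 1 was never written: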
```
Using /home/elijahye/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Emitting ninja build file /home/elijahye/.cache/torch_extensions/py310_cu121/torchac_backend/build.ninja...
Building extension module torchac_backend...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module torchac_backend...
Traceback (most recent call last):
File "/home/elijahye/CacheGen/run_cachegen.py", line 53, in <module>
key_value = torch.load(f"{args.save_dir}/raw_kv_{doc_id}.pt")
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 997, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 444, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 425, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './mistral7b_longchat_data/raw_kv_1.pt'
```
The FileNotFoundError at the end looks like a consequence of the OOM rather than a separate bug: main.py crashed before it could write raw_kv_1.pt, so run_cachegen.py has nothing to load. Do you know how to resolve the OOM itself on a smaller GPU?
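One side note: I think the attention-mask warnings in the first log can be silenced by passing the mask explicitly. A sketch, assuming main.py tokenizes with a standard Hugging Face tokenizer (`doc_text` is a made-up name):

```python
# Passing the tokenizer's attention_mask explicitly avoids the "attention
# mask is not set" warnings (pad and eos tokens share the same id here).
enc = tokenizer(doc_text, return_tensors="pt").to(model.device)
generated = model.generate(
    enc.input_ids,
    attention_mask=enc.attention_mask,
    max_new_tokens=1,
    return_dict_in_generate=True,
)
```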