Hi,
Just wondering if it is possible to run the examples in sigcomm_ae.md on a GPU with less memory. I tried changing the default for the `--max_gpu_memory` argument in run_cachegen.py:

```python
p.add_argument("--max_gpu_memory", type=int, default=12,
               help="Default max GPU memory in GiB on Titan XP")
```
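For context on what I expected the flag to do: I assumed it caps GPU memory at model-load time, roughly like transformers' `max_memory` option with `device_map="auto"`. That mapping is only my guess about how run_cachegen.py uses the flag; a minimal sketch of what I had in mind:

```python
# Sketch of the behavior I expected from --max_gpu_memory (my assumption; I
# have not checked that run_cachegen.py wires the flag through this path).
# With device_map="auto", accelerate caps GPU 0 and offloads the rest to CPU.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",     # placeholder for whatever 7b.sh loads
    device_map="auto",
    max_memory={0: "12GiB", "cpu": "30GiB"},  # hard cap of 12 GiB on GPU 0
)
```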
However, I still get the following error:
```
(cachegen) elijahye@janux02:~/CacheGen$ bash scripts/7b.sh longchat 0
Using /home/elijahye/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Emitting ninja build file /home/elijahye/.cache/torch_extensions/py310_cu121/torchac_backend/build.ninja...
Building extension module torchac_backend...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module torchac_backend...
/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 145047.98it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.92s/it]
Model and tokenizer loaded
Saving KV cache for doc: 0
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Saving KV cache for doc: 1
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
File "/home/elijahye/CacheGen/main.py", line 39, in <module>
generated = model.generate(input_ids, max_new_tokens = 1, return_dict_in_generate=True)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1914, in generate
result = self._sample(
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2651, in _sample
outputs = self(
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1200, in forward
outputs = self.model(
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 976, in forward
layer_outputs = decoder_layer(
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 732, in forward
hidden_states = self.mlp(hidden_states)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 171, in forward
return self.down_proj(self.act_fn(self.gate_proj(hidden_state)) * self.up_proj(hidden_state))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00 MiB. GPU
```
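So main.py gets through doc 0 and OOMs while prefilling doc 1, which makes me wonder whether memory accumulates across documents. A hypothetical tweak to main.py's per-document loop that I was considering (variable names are guessed from the traceback, not taken from CacheGen's code, and I'm assuming generate() returns the KV cache as legacy (key, value) tuples):

```python
# Hypothetical per-document cleanup for main.py: move the KV cache to CPU
# before saving and release CUDA memory between docs, so doc 0's tensors
# don't crowd out doc 1's prefill. Names here are my guesses, not CacheGen's.
import gc
import torch

generated = model.generate(input_ids, max_new_tokens=1, return_dict_in_generate=True)
kv_cpu = tuple((k.cpu(), v.cpu()) for k, v in generated.past_key_values)
torch.save(kv_cpu, f"{save_dir}/raw_kv_{doc_id}.pt")
del generated, kv_cpu
gc.collect()
torch.cuda.empty_cache()  # release unused cached memory held by PyTorch's allocator
```

After the OOM, 7b.sh continues on to run_cachegen.py, which then fails because the cache file for doc 1 was never written: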
```
Using /home/elijahye/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Emitting ninja build file /home/elijahye/.cache/torch_extensions/py310_cu121/torchac_backend/build.ninja...
Building extension module torchac_backend...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module torchac_backend...
Traceback (most recent call last):
File "/home/elijahye/CacheGen/run_cachegen.py", line 53, in <module>
key_value = torch.load(f"{args.save_dir}/raw_kv_{doc_id}.pt")
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 997, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 444, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 425, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './mistral7b_longchat_data/raw_kv_1.pt'
```
The FileNotFoundError at the end looks like a consequence of the OOM rather than a separate bug: main.py crashed before it could write raw_kv_1.pt, so run_cachegen.py has nothing to load. Do you know how to resolve the OOM itself on a smaller GPU?
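One side note: I think the attention-mask warnings in the first log can be silenced by passing the mask explicitly. A sketch, assuming main.py tokenizes with a standard Hugging Face tokenizer (`doc_text` is a made-up name):

```python
# Passing the tokenizer's attention_mask explicitly avoids the "attention
# mask is not set" warnings (pad and eos tokens share the same id here).
enc = tokenizer(doc_text, return_tensors="pt").to(model.device)
generated = model.generate(
    enc.input_ids,
    attention_mask=enc.attention_mask,
    max_new_tokens=1,
    return_dict_in_generate=True,
)
```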