
Is it possible to run the example with a smaller GPU? #5

@Elijah-Ye


Hi,

Just wondering if it is possible to run the examples in sigcomm_ae.md on a smaller GPU. I tried changing the corresponding argument in run_cachegen.py, i.e. `p.add_argument("--max_gpu_memory", type=int, default=12, help="Default max GPU memory in GiB on Titan XP")`.
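In case it matters, what I was hoping the flag would do is cap per-GPU memory at load time. Here is a minimal sketch of that idea using the stock Hugging Face transformers/accelerate API; the model id and memory caps are my placeholders, not CacheGen's actual loading code:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical sketch: cap GPU 0 at ~10 GiB and let accelerate offload
# any remaining layers to CPU RAM. The model id and both caps are
# placeholders, not values taken from CacheGen.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,           # fp16 halves the weight footprint
    device_map="auto",                   # place layers across GPU/CPU automatically
    max_memory={0: "10GiB", "cpu": "48GiB"},
)
```

(For what it's worth, a 7B model in fp16 needs roughly 14 GB for its weights alone, so it may not fit entirely on a 12 GiB card regardless of what the flag is set to.)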

However, I still get the following error:

```
(cachegen) elijahye@janux02:~/CacheGen$ bash scripts/7b.sh longchat 0
Using /home/elijahye/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Emitting ninja build file /home/elijahye/.cache/torch_extensions/py310_cu121/torchac_backend/build.ninja...
Building extension module torchac_backend...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module torchac_backend...
/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 145047.98it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.92s/it]
Model and tokenizer loaded
Saving KV cache for doc:  0
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Saving KV cache for doc:  1
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
  File "/home/elijahye/CacheGen/main.py", line 39, in <module>
    generated = model.generate(input_ids, max_new_tokens = 1, return_dict_in_generate=True)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2651, in _sample
    outputs = self(
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1200, in forward
    outputs = self.model(
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 976, in forward
    layer_outputs = decoder_layer(
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 732, in forward
    hidden_states = self.mlp(hidden_states)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 171, in forward
    return self.down_proj(self.act_fn(self.gate_proj(hidden_state)) * self.up_proj(hidden_state))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00 MiB. GPU 
Using /home/elijahye/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Emitting ninja build file /home/elijahye/.cache/torch_extensions/py310_cu121/torchac_backend/build.ninja...
Building extension module torchac_backend...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module torchac_backend...
Traceback (most recent call last):
  File "/home/elijahye/CacheGen/run_cachegen.py", line 53, in <module>
    key_value = torch.load(f"{args.save_dir}/raw_kv_{doc_id}.pt")
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 997, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 444, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/elijahye/miniconda3/envs/cachegen/lib/python3.10/site-packages/torch/serialization.py", line 425, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './mistral7b_longchat_data/raw_kv_1.pt'
```

The FileNotFoundError at the end looks like a downstream effect: run_cachegen.py cannot find ./mistral7b_longchat_data/raw_kv_1.pt because main.py hit the CUDA OOM before it could save the KV cache for doc 1. Do you know how to resolve this?
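As an aside, the attention-mask warnings earlier in the log can be silenced by passing the mask and pad token explicitly to generate(). A minimal sketch with the standard transformers API, assuming tokenizer, model, and prompt are already set up as in main.py (variable names are mine, not from the repo):

```python
# Hypothetical sketch: pass attention_mask and pad_token_id explicitly
# so generate() does not have to guess them. Mirrors the generate()
# call shown in the traceback above.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generated = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=1,
    return_dict_in_generate=True,
)
```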
