With latest Pytorch 2.7 Gramine SGX fails with can't allocate memory error #116

@jinengandhi-intel

Description

Issue #113, reported against PyTorch 2.6, was fixed by #115, which was merged a few days ago. With the fix in place, the PyTorch example now runs further but fails with a new error:

gramine-sgx ./pytorch ./pytorchexample.py
Gramine is starting. Parsing TOML manifest file, this may take some time...
-----------------------------------------------------------------------------------------------------------------------
Gramine detected the following insecure configurations:

  - sgx.debug = true                           (this is a debug enclave)
  - loader.insecure__use_cmdline_argv = true   (forwarding command-line args from untrusted host to the app)
  - sgx.allowed_files = [ ... ]                (some files are passed through from untrusted host without verification)

Gramine will continue application execution, but this configuration must not be used in production!
-----------------------------------------------------------------------------------------------------------------------

Traceback (most recent call last):
  File "//./pytorchexample.py", line 12, in <module>
    alexnet = torch.load("alexnet-pretrained.pt", weights_only=False)
  File "/home/intel/jenkins/workspace/local_ci_graphene_sgx_22.04_6.2/gramine/CI-Examples/pytorch/my_venv/lib/python3.10/site-packages/torch/serialization.py", line 1525, in load
    return _load(
  File "/home/intel/jenkins/workspace/local_ci_graphene_sgx_22.04_6.2/gramine/CI-Examples/pytorch/my_venv/lib/python3.10/site-packages/torch/serialization.py", line 2114, in _load
    result = unpickler.load()
  File "/home/intel/jenkins/workspace/local_ci_graphene_sgx_22.04_6.2/gramine/CI-Examples/pytorch/my_venv/lib/python3.10/site-packages/torch/serialization.py", line 2078, in persistent_load
    typed_storage = load_tensor(
  File "/home/intel/jenkins/workspace/local_ci_graphene_sgx_22.04_6.2/gramine/CI-Examples/pytorch/my_venv/lib/python3.10/site-packages/torch/serialization.py", line 2031, in load_tensor
    zip_file.get_storage_from_record(name, numel, torch.UntypedStorage)
RuntimeError: [enforce fail at alloc_cpu.cpp:119] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 150994944 bytes. Error code 12 (Cannot allocate memory)

This new issue appeared because there have been a couple of new PyTorch releases since then. The fix is simple: in the manifest.template file, increase sgx.enclave_size from 4G to 8G.
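As an illustration of the proposed change, here is a minimal Python sketch that rewrites the `sgx.enclave_size` entry in manifest template text. The helper name `set_enclave_size` and the sample manifest excerpt are hypothetical; only the `sgx.enclave_size` key and the 4G to 8G change come from this issue. Editing the file by hand works just as well.

```python
import re


def set_enclave_size(manifest_text: str, new_size: str = "8G") -> str:
    """Return manifest_text with the sgx.enclave_size value replaced by new_size.

    Hypothetical helper: it assumes the template contains a line of the form
    sgx.enclave_size = "<size>".
    """
    pattern = re.compile(
        r'^(\s*sgx\.enclave_size\s*=\s*)"[^"]*"', flags=re.MULTILINE
    )
    updated, count = pattern.subn(
        lambda m: f'{m.group(1)}"{new_size}"', manifest_text
    )
    if count == 0:
        raise ValueError("no sgx.enclave_size entry found in manifest text")
    return updated


# Hypothetical excerpt of a manifest template, for demonstration only.
sample = 'sgx.debug = true\nsgx.enclave_size = "4G"\n'
patched = set_enclave_size(sample)
print(patched)
```

After changing the template, the manifest has to be regenerated and re-signed before running `gramine-sgx` again, because the enclave size is part of the signed enclave configuration.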

Steps to reproduce:

Follow the steps in the README of the PyTorch workload.
