Discussed in #763
Originally posted by yarray July 17, 2023
Although llama.cpp now supports GPU inference via cuBLAS, exllama appears to run several times faster given a good enough GPU (a 3090, for example). Is there any plan to support exllama, or more generally, other loaders for LLMs?
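For context, here is roughly what loading and generating with exllama directly looks like — a sketch adapted from the `example_basic.py` in the turboderp/exllama repo; the class names and settings are that project's API as of mid-2023 and may have changed since:

```python
import os, glob

# These imports assume you are running from inside the exllama repo checkout.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Hypothetical local path to a 4-bit GPTQ LLaMA model directory.
model_directory = "/models/llama-13b-4bit-128g/"
tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

# Build the model, tokenizer, cache, and generator.
config = ExLlamaConfig(model_config_path)
config.model_path = model_path
model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Basic sampling settings, then generate.
generator.settings.temperature = 0.95
generator.settings.top_p = 0.65
print(generator.generate_simple("Once upon a time,", max_new_tokens=200))
```

Supporting it as an alternative loader would presumably mean wrapping this kind of setup behind the same interface currently used for llama.cpp.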