Add falcon-e support #268
Conversation
cc @tsong-ms let me know if there is anything else to do before this PR can be merged - thanks very much!
@younesbelkada 2 issues:
Not sure if that's all that is missing yet. Gonna patch the files and hopefully it works. Follow-up:
Thank you very much @RobertAgee for testing everything! I have a few questions
Hi @younesbelkada, thank you for the PR. I tested the model on an M2 Ultra and got good inference speed for a 3B-size model:
./build/bin/llama-bench -m ./models/Falcon-E-3B-Instruct/ggml-model-i2_s.gguf -ngl 0 -b 1 -t 4,12,8 -p 0 -n 128
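For anyone reproducing: the GGUF above came from the standard setup flow, roughly as follows (a sketch; the `setup_env.py` flags are assumed from the BitNet.cpp README, and the HF repo id is taken from the Falcon-E collection linked in this PR):

```bash
# Download the model and convert it to the i2_s GGUF benchmarked above.
# Flags assumed from the BitNet.cpp README; repo id from the Falcon-E collection.
python setup_env.py --hf-repo tiiuae/Falcon-E-3B-Instruct -q i2_s
```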
Meanwhile, I left a comment on the PR in the submodule. After the submodule's PR is merged, please update the submodule reference in this PR and make the necessary changes. Thank you!
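For reference, bumping the pinned submodule commit would be roughly the usual flow (a sketch assuming the submodule lives at `3rdparty/llama.cpp`; `<sha>` is a placeholder for the actual merge commit):

```bash
# Point the llama.cpp submodule at the commit containing the merged PR.
cd 3rdparty/llama.cpp
git fetch origin
git checkout <sha>          # placeholder: the merged upstream commit
cd ../..
git add 3rdparty/llama.cpp
git commit -m "bump llama.cpp submodule for Falcon-E support"
```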
Thank you very much @tsong-ms for running the benchmarks 🙏
@younesbelkada @tsong-ms, so unfortunately there's still a bit of work to be done to ensure Falcon support in BitNet. Mostly, it doesn't actually have good model-family support yet, and I would imagine most models are running at least somewhat unoptimized for their model sizes, especially the newer ones. And with the H family coming, it would behoove the team to get this right so we can truly show folks what 1.58-bit is all about 💪
I did notice what I assume is a pretty big mixup: ALL Falcon models (1, 3, E, GGUF, anything you might try to build) are getting the same tensor tiling dimensions and ModelShapeDict as the 1.58bit-ized Llama 100B->8B quantized models. For the newer E models, which were trained natively in 1.58-bit, this is way too big. If these aren't optimized for the model, they can hurt performance, so it's a pretty big deal if they're not accurately assigned.
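To illustrate: the TL1 kernel codegen step takes per-model tile sizes, so a Falcon-E-specific invocation might look something like this (a sketch; the script name and flags are from the BitNet.cpp README, and the BM/BK/bm values are placeholders I have not validated for Falcon-E):

```bash
# Generate TL1 kernels with tile sizes chosen for this model family,
# instead of reusing the Llama 100B->8B 1.58-bit shapes for every Falcon model.
# The tile values below are illustrative placeholders, NOT tuned numbers.
python utils/codegen_tl1.py \
    --model Falcon-E-3B-Instruct \
    --BM 160,320,320 \
    --BK 64,128,64 \
    --bm 32,64,32
```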
As far as your successful build goes, which model did you use, and how did you go about it? Because I did a direct pull from this PR and followed the steps in the BitNet instructions exactly, as well as those on the Falcon-E page (which has the same instructions 😬), and there were errors throughout setup and compilation. There are some upstream changes that llama.cpp needs as well (fairly minor but necessary), plus related files for handling Falcon-E models. I had to keep compiling until I found an error, delete the build, recompile, and repeat until I succeeded. The actual changes that need to be made are very simple and minor, but working through them all like this was challenging, as I had to make educated guesses about the correct settings. I can make a PR tomorrow for the BitNet.cpp fixes, but I'd like someone better than me to confirm the tiling and model shape sizing, as that's pretty critical. I can also submit a PR for the llama.cpp fixes. Let me know what you think.
Thank you very much @RobertAgee for your detailed analysis and work! Good point about the Base models; I added them in this PR so people can use them!




As per the title, this adds support for Falcon-E models: https://huggingface.co/collections/tiiuae/falcon-edge-series-6804fd13344d6d8a8fa71130
Needs Eddie-Wang1120/llama.cpp#8 to be merged first.
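Once both are merged, running a Falcon-E model should mirror the other supported models; a sketch (the `run_inference.py` flags are per the BitNet.cpp README, and the model path assumes the `setup_env.py` output layout):

```bash
# Chat with the converted Falcon-E model.
python run_inference.py \
    -m models/Falcon-E-3B-Instruct/ggml-model-i2_s.gguf \
    -p "You are a helpful assistant" \
    -cnv
```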