
Conversation

@yashuatla
Owner

This PR contains changes from a range of commits from the original repository.

Commit Range: 0e7dadb..6197e9f
Files Changed: 22 (13 programming files)
Programming Ratio: 59.1%

Commits included:

sd983527 and others added 13 commits April 15, 2025 17:07
Add two FAQs for Windows build questions.
…crosoft#204)

* Update CMakeLists.txt

I added a CMake option to compile the llama.cpp server. This update allows us to easily build and deploy the server using BitNet.

* Create run_inference_server.py

Same as run_inference, but for use with llama.cpp's built-in server, for extra convenience.

In particular:
- The build directory is determined based on whether the system is running on Windows or not.
- A list of command-line arguments (`--model`, `-m`, etc.) is created.
- The main argument list is parsed and passed to the `subprocess.run()` method to execute the system command.
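The three steps above can be sketched as follows (the directory layout, binary names, and helper name are illustrative assumptions, not the script's actual values):

```python
import os
import platform

def build_server_command(model_path, build_dir="build", extra_args=None):
    # On Windows, multi-config CMake generators place binaries under a
    # Release subdirectory; elsewhere they land directly in build/bin.
    if platform.system() == "Windows":
        server_path = os.path.join(build_dir, "bin", "Release", "llama-server.exe")
    else:
        server_path = os.path.join(build_dir, "bin", "llama-server")
    # Assemble the final argument list, ready for subprocess.run().
    return [server_path, "-m", model_path] + (extra_args or [])
```

The script would then hand the list to `subprocess.run(build_server_command(...))`.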
…d/aarch64 (microsoft#242)

GCC does not recognize Clang-specific warning flags like
-Wunreachable-code-break and -Wunreachable-code-return, which are passed
by upstream submodules (e.g., ggml). This patch forces CMake to use Clang
via command-line arguments, avoiding the need to patch nested submodules.

This resolves compiler errors without modifying submodule source code.
    def from_name(cls, name: str):
        if name in transformer_configs:
            return cls(**transformer_configs[name])
        config = [k for k in transformer_configs if k in name.upper() or k in name]

🐛 Correctness Issue

Ambiguous model name matching logic.

The current substring matching can select the wrong config if one config name is a substring of another, causing runtime errors or incorrect model conversion.

Current Code (Diff):

-        config = [k for k in transformer_configs if k in name.upper() or k in name]
+        config = [k for k in transformer_configs if k == name.upper() or k == name]
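To make the ambiguity concrete, here is a hedged sketch (the config keys and values are invented for illustration): when several keys match as substrings, preferring the longest match resolves the ambiguity without giving up fuzzy lookup.

```python
# Invented config table for illustration only.
transformer_configs = {
    "7B": {"n_layer": 32},
    "CodeLlama-7B": {"n_layer": 32, "rope_base": 1000000},
}

def from_name(name: str) -> dict:
    if name in transformer_configs:
        return transformer_configs[name]
    # Substring matching can hit several keys ("CodeLlama-7B-Instruct"
    # matches both "7B" and "CodeLlama-7B"); keep the most specific one.
    matches = [k for k in transformer_configs if k in name.upper() or k in name]
    if not matches:
        raise ValueError(f"Unknown model name: {name}")
    return transformer_configs[max(matches, key=len)]
```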

help="Path to input .safetensors file"
)
parser.add_argument(
"--output", type=str, default="./checkpoints/model_state.pt",

🐛 Correctness Issue

Default output path assumes directory exists.

The default output path assumes './checkpoints' directory exists, which will cause file write failures if the directory doesn't exist.

Current Code (Diff):

-        "--output", type=str, default="./checkpoints/model_state.pt",
+        "--output", type=str, default="./model_state.pt",
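An alternative to changing the default is to create the directory on demand; a minimal sketch (the helper name is hypothetical):

```python
import os

def save_state(data: bytes, output_path="./checkpoints/model_state.pt"):
    # Create any missing parent directories so the write cannot fail on a
    # fresh checkout.
    parent = os.path.dirname(output_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(output_path, "wb") as f:
        f.write(data)
```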


model_args_prefill = fast.ModelArgs(use_kernel=False)
model_args_decode = fast.ModelArgs(use_kernel=True)
tokenizer = Tokenizer("./tokenizer.model")

🐛 Correctness Issue

Hardcoded tokenizer path ignores parameter.

The code hardcodes the tokenizer path to './tokenizer.model' despite having an optional tokenizer_path parameter, which will cause failures if the tokenizer isn't in the expected location.

Current Code (Diff):

-        tokenizer = Tokenizer("./tokenizer.model")
+        tokenizer = Tokenizer(tokenizer_path if tokenizer_path is not None else "./tokenizer.model")
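A hedged sketch of honoring the parameter with a fail-fast existence check (the helper name and error message are illustrative):

```python
from pathlib import Path

def resolve_tokenizer_path(tokenizer_path=None):
    # Honor an explicit argument first and only fall back to the bundled
    # default; fail fast with a clear message instead of a late crash.
    path = Path(tokenizer_path) if tokenizer_path else Path("./tokenizer.model")
    if not path.is_file():
        raise FileNotFoundError(f"Tokenizer model not found at {path}")
    return path
```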


eos_id = self.tokenizer.eot_id
for niter in range(1, gen_length):
kv_seqlen.add_(kv_seqlen < max_seq_length)

🐛 Correctness Issue

Implicit boolean arithmetic in in-place tensor op.

The expression kv_seqlen.add_(kv_seqlen < max_seq_length) adds a boolean mask to an integer tensor in place, relying on implicit bool-to-int promotion; an explicit cast makes the intended saturating increment obvious and robust across PyTorch versions.

Current Code (Diff):

-            kv_seqlen.add_(kv_seqlen < max_seq_length)
+            kv_seqlen.add_((kv_seqlen < max_seq_length).to(torch.int))
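The intent of the original line is a saturating increment: each length grows by one per step until it hits the cap, after which the zero-valued mask entries freeze it. The same idea, shown in NumPy with the cast made explicit (values are illustrative):

```python
import numpy as np

# Entries below the cap advance by 1; entries at the cap stay put,
# because (kv_seqlen < max_seq_length) is 0 for them after the cast.
kv_seqlen = np.array([3, 7, 8], dtype=np.int64)
max_seq_length = 8
kv_seqlen += (kv_seqlen < max_seq_length).astype(np.int64)
print(kv_seqlen.tolist())
```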

Comment on lines +252 to +256
if use_sampling:
temp = 0.7
top_p = 0.95
probs = torch.softmax(logits / temp, dim=-1)
next_token = sample_utils.top_p(probs, top_p)

🐛 Correctness Issue

Hardcoded sampling parameters override GenArgs.

The code hardcodes temperature and top_p values instead of using the ones from GenArgs, making those parameters in GenArgs effectively unused.

Current Code (Diff):

-            temp = 0.7
-            top_p = 0.95
-            probs = torch.softmax(logits / temp, dim=-1)
-            next_token = sample_utils.top_p(probs, top_p)
+            temp = self.gen_args.temperature
+            top_p = self.gen_args.top_p
+            probs = torch.softmax(logits / temp, dim=-1)
+            next_token = sample_utils.top_p(probs, top_p)

Comment on lines +273 to +277
if use_sampling:
temp = 0.7
top_p = 0.95
probs = torch.softmax(logits / temp, dim=-1)
next_token = sample_utils.top_p(probs, top_p)

🐛 Correctness Issue

Duplicate hardcoded sampling parameters.

The code duplicates the hardcoded temperature and top_p values in the generation loop, ignoring the GenArgs parameters.

Current Code (Diff):

-                temp = 0.7
-                top_p = 0.95
-                probs = torch.softmax(logits / temp, dim=-1)
-                next_token = sample_utils.top_p(probs, top_p)
+                temp = self.gen_args.temperature
+                top_p = self.gen_args.top_p
+                probs = torch.softmax(logits / temp, dim=-1)
+                next_token = sample_utils.top_p(probs, top_p)
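A minimal sketch of the intended plumbing, assuming GenArgs exposes temperature and top_p fields as the suggested fix implies (the helper name is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class GenArgs:
    temperature: float = 0.7
    top_p: float = 0.95

def sampling_params(gen_args: GenArgs):
    # Read the knobs from GenArgs at the call site instead of
    # re-hardcoding them, so CLI or config overrides take effect.
    return gen_args.temperature, gen_args.top_p
```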

)

import ctypes
bitnet_lib = ctypes.CDLL('bitnet_kernels/libbitnet.so')

🐛 Correctness Issue

Unsafe Library Loading Without Error Handling.

Loading a shared library with a hardcoded path without error handling will crash the application if the library is missing or incompatible.

Current Code (Diff):

- bitnet_lib = ctypes.CDLL('bitnet_kernels/libbitnet.so')
+ try:
+     bitnet_lib = ctypes.CDLL('bitnet_kernels/libbitnet.so')
+ except OSError as e:
+     raise ImportError(f"Failed to load BitNet library: {e}. Please ensure 'bitnet_kernels/libbitnet.so' is available.")
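A hedged sketch of a loader with an explicit existence check and a clear failure mode (the helper name and message wording are assumptions):

```python
import ctypes
import os

def load_kernel_lib(lib_path):
    # Fail early with an actionable message instead of an opaque OSError
    # raised deep inside module import.
    if not os.path.exists(lib_path):
        raise ImportError(f"BitNet kernel library not found at '{lib_path}'; "
                          "build it first (see bitnet_kernels/).")
    try:
        return ctypes.CDLL(lib_path)
    except OSError as e:
        raise ImportError(f"Failed to load '{lib_path}': {e}") from e
```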

return int8_weight


def interleave_weight_int8(qweight, nbits=2):\

🐛 Correctness Issue

Syntax Error: Trailing backslash in function definition.

The function definition has a trailing backslash that will cause a Python syntax error when the code is executed, breaking any dependent code.

Current Code (Diff):

- def interleave_weight_int8(qweight, nbits=2):\
+ def interleave_weight_int8(qweight, nbits=2):

🔄 Dependencies Affected

gpu/convert_checkpoint.py

Function: convert_int8_to_int2

Issue: This function calls convert_weight_int8_to_int2 which uses interleave_weight_int8, and will fail due to the syntax error

Suggestion: No change needed in this file, fix the syntax error in the original function


gpu/test.py

Function: Unknown

Issue: Line 44 uses convert_weight_int8_to_int2 which calls interleave_weight_int8, and will fail due to the syntax error

Suggestion: No change needed in this file, fix the syntax error in the original function


"""
probs_sort, probs_idx = torch.sort(probs, dim=-1, descending=True)
probs_sum = torch.cumsum(probs_sort, dim=-1)
mask = probs_sum - probs_sort > p

🐛 Correctness Issue (disputed)

Mask logic in top-p sampling.

The flagged condition mask = probs_sum - probs_sort > p actually matches the reference Llama implementation: a token is masked only once the cumulative mass before it already exceeds p, so the kept set is the smallest prefix of sorted probabilities whose mass reaches p, and the top-ranked token always survives. The suggested replacement mask = probs_sum > p would also mask the boundary token and can zero out every probability when the top token alone exceeds p; the original expression should be kept.
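For reference, the nucleus-filtering step can be sketched in NumPy (values are illustrative); note how the probs_sum - probs_sort form keeps the token that crosses the threshold, so the top token always survives:

```python
import numpy as np

def top_p_filter(probs, p):
    # Sort probabilities descending and take the cumulative sum.
    order = np.argsort(probs)[::-1]
    probs_sort = probs[order]
    probs_sum = np.cumsum(probs_sort)
    # Drop a token only if the mass *before* it already exceeds p, so the
    # boundary token (and the top token) is always retained.
    mask = (probs_sum - probs_sort) > p
    probs_sort[mask] = 0.0
    probs_sort /= probs_sort.sum()
    # Scatter the renormalized values back to the original positions.
    kept = np.zeros_like(probs)
    kept[order] = probs_sort
    return kept

probs = np.array([0.6, 0.25, 0.1, 0.05])
print(top_p_filter(probs, p=0.5))  # only the 0.6 token survives
```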


time: float

def show(self) -> str:
tps = self.tokens / self.time

🐛 Correctness Issue

Division by Zero Risk.

The code divides by self.time without checking if it's zero, which would cause a runtime crash when displaying statistics.

Current Code (Diff):

-        tps = self.tokens / self.time
+        tps = self.tokens / self.time if self.time > 0 else 0

torch.manual_seed(42)
np.random.seed(42)

bitnet_lib = ctypes.CDLL('bitnet_kernels/libbitnet.so')

🐛 Correctness Issue

No error handling for C library loading.

Loading the C library without error handling will crash the application if the library file doesn't exist at the specified path.

Current Code (Diff):

- bitnet_lib = ctypes.CDLL('bitnet_kernels/libbitnet.so')
+ try:
+     bitnet_lib = ctypes.CDLL('bitnet_kernels/libbitnet.so')
+ except OSError as e:
+     raise RuntimeError(f"Failed to load BitNet library: {e}")

N = input1.shape[0]
K = input1.shape[1] * 4

bitnet_lib.bitlinear_int8xint2(*[ctypes.c_void_p(input0.data_ptr()), ctypes.c_void_p(input1.data_ptr()), ctypes.c_void_p(ret.data_ptr()), ctypes.c_void_p(s.data_ptr()), ctypes.c_void_p(ws.data_ptr()), ctypes.c_int(M), ctypes.c_int(N), ctypes.c_int(K), ctypes.c_void_p(stream.cuda_stream)])

🐛 Correctness Issue

No error handling for the C library function call.

The return value of the call is discarded, so a failure reported through a status code would pass silently. Note that checking a status code is only valid if bitlinear_int8xint2 actually returns one; ctypes assumes a c_int return type by default, and if the launcher returns void its restype should be set to None and errors surfaced through CUDA error checking instead.

Current Code (Diff):

-     bitnet_lib.bitlinear_int8xint2(*[ctypes.c_void_p(input0.data_ptr()), ctypes.c_void_p(input1.data_ptr()), ctypes.c_void_p(ret.data_ptr()), ctypes.c_void_p(s.data_ptr()), ctypes.c_void_p(ws.data_ptr()), ctypes.c_int(M), ctypes.c_int(N), ctypes.c_int(K), ctypes.c_void_p(stream.cuda_stream)])
+     result = bitnet_lib.bitlinear_int8xint2(*[ctypes.c_void_p(input0.data_ptr()), ctypes.c_void_p(input1.data_ptr()), ctypes.c_void_p(ret.data_ptr()), ctypes.c_void_p(s.data_ptr()), ctypes.c_void_p(ws.data_ptr()), ctypes.c_int(M), ctypes.c_int(N), ctypes.c_int(K), ctypes.c_void_p(stream.cuda_stream)])
+     if result != 0:  # Assuming 0 is success, adjust based on actual API
+         raise RuntimeError(f"BitNet library call failed with error code: {result}")
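Independent of status-code checking, declaring the FFI signature up front lets ctypes validate each call instead of silently coercing bad arguments. The signature below is inferred from the call site (five pointers, three ints, one stream pointer) and is an assumption:

```python
import ctypes

def bind_bitlinear(lib):
    # Declare the signature once; ctypes will then reject calls with the
    # wrong number or type of arguments. restype=None assumes a void
    # launcher; use a status type here only if the C API returns one.
    fn = lib.bitlinear_int8xint2
    fn.argtypes = [ctypes.c_void_p] * 5 + [ctypes.c_int] * 3 + [ctypes.c_void_p]
    fn.restype = None
    return fn
```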

# tokens.extend(self.tokenizer.encode("\n\n", bos=False, eos=False))
return tokens

def encode_message(self, message: Message, return_target=False) -> List[int]:

🐛 Correctness Issue

Incorrect return type annotation.

The function is annotated to return List[int] but actually returns a tuple (List[int], List[int] or None), which could cause type errors when used by other components.

Current Code (Diff):

-     def encode_message(self, message: Message, return_target=False) -> List[int]:
+     def encode_message(self, message: Message, return_target=False) -> Union[List[int], tuple[List[int], Union[List[int], None]]]:
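Rather than a broad Union, typing.overload can narrow the return type per call site. A self-contained sketch, with Message replaced by plain str and a stand-in tokenizer for illustration:

```python
from typing import List, Literal, Optional, Tuple, overload

class TokenizerSketch:
    @overload
    def encode_message(self, message: str,
                       return_target: Literal[False] = ...) -> List[int]: ...
    @overload
    def encode_message(self, message: str,
                       return_target: Literal[True]) -> Tuple[List[int], Optional[List[int]]]: ...

    def encode_message(self, message: str, return_target: bool = False):
        tokens = [ord(c) for c in message]  # stand-in for real tokenization
        if return_target:
            return tokens, None
        return tokens
```

Type checkers then see List[int] for plain calls and the tuple form only when return_target=True is passed.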


return tokens, None

def encode_dialog_prompt(self, dialog: Dialog, completion=False, return_target=False) -> List[int]:

🐛 Correctness Issue

Incorrect return type annotation.

The function is annotated to return List[int] but can also return a tuple (List[int], List[int]) when return_target=True, which could cause type errors.

Current Code (Diff):

-     def encode_dialog_prompt(self, dialog: Dialog, completion=False, return_target=False) -> List[int]:
+     def encode_dialog_prompt(self, dialog: Dialog, completion=False, return_target=False) -> Union[List[int], tuple[List[int], List[int]]]:

if not os.path.exists(server_path):
server_path = os.path.join(build_dir, "bin", "llama-server")
else:
server_path = os.path.join(build_dir, "bin", "llama-server")

🐛 Correctness Issue

Missing server executable validation.

The script doesn't verify if the server executable exists before attempting to run it, which will cause a runtime error if the file is missing.

Current Code (Diff):

-        server_path = os.path.join(build_dir, "bin", "llama-server")
+        server_path = os.path.join(build_dir, "bin", "llama-server")
+        
+        if not os.path.exists(server_path):
+            print(f"Error: Server executable not found at {server_path}")
+            sys.exit(1)

exit(0)
logging.info("Compiling the code using CMake.")
- run_command(["cmake", "-B", "build", *COMPILER_EXTRA_ARGS[arch], *OS_EXTRA_ARGS.get(platform.system(), [])], log_step="generate_build_files")
+ run_command(["cmake", "-B", "build", *COMPILER_EXTRA_ARGS[arch], *OS_EXTRA_ARGS.get(platform.system(), []), "-DCMAKE_C_COMPILER=clang", "-DCMAKE_CXX_COMPILER=clang++"], log_step="generate_build_files")

🐛 Correctness Issue

Hardcoded compiler breaks cross-platform compatibility.

Forcing clang/clang++ unconditionally will break builds on systems where Clang is not installed. Since the flags exist to work around GCC rejecting Clang-only warning flags passed by submodules, guarding them behind a Clang availability check (or documenting the Clang requirement) would preserve the workaround without breaking GCC-only systems.

Current Code (Diff):

- run_command(["cmake", "-B", "build", *COMPILER_EXTRA_ARGS[arch], *OS_EXTRA_ARGS.get(platform.system(), []), "-DCMAKE_C_COMPILER=clang", "-DCMAKE_CXX_COMPILER=clang++"], log_step="generate_build_files")
+ run_command(["cmake", "-B", "build", *COMPILER_EXTRA_ARGS[arch], *OS_EXTRA_ARGS.get(platform.system(), [])], log_step="generate_build_files")
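A middle ground between always forcing Clang and never doing so is to guard the flags behind an availability check; a minimal sketch (the helper name is hypothetical):

```python
import shutil

def clang_override_args():
    # Force Clang only when both compilers are actually installed;
    # otherwise fall back to CMake's default compiler detection so
    # GCC-only systems still configure.
    if shutil.which("clang") and shutil.which("clang++"):
        return ["-DCMAKE_C_COMPILER=clang", "-DCMAKE_CXX_COMPILER=clang++"]
    return []
```

The returned list would simply be spliced into the existing cmake invocation.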



6 participants