MMLM (Multi-Modality Learning Model)

Introduction

Data format (input/output):
instruction <audio> audio file path </audio> instruction <image> image file path </image> instruction
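
As an illustration of the format above, here is a minimal sketch that splits one sample into ordered text, audio, and image segments. The function name `parse_sample` and the segment representation are hypothetical and not part of this repository; the sketch only assumes that media paths are wrapped in matching `<audio>`/`<image>` tags, as shown in the template.

```python
import re
from typing import List, Tuple

# Matches <audio>path</audio> or <image>path</image>. The backreference \1
# requires the closing tag to agree with the opening one.
_MEDIA_TAG = re.compile(r"<(audio|image)>(.*?)</\1>", re.DOTALL)


def parse_sample(sample: str) -> List[Tuple[str, str]]:
    """Split a sample into ordered (kind, value) segments.

    kind is "text", "audio", or "image"; value is the instruction text
    or the file path found between the tags. (Hypothetical helper.)
    """
    segments: List[Tuple[str, str]] = []
    cursor = 0
    for match in _MEDIA_TAG.finditer(sample):
        # Any text between the previous tag and this one is an instruction.
        text = sample[cursor:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append((match.group(1), match.group(2).strip()))
        cursor = match.end()
    # Trailing instruction after the last media tag, if any.
    tail = sample[cursor:].strip()
    if tail:
        segments.append(("text", tail))
    return segments
```

For example, `parse_sample("Transcribe <audio> a.wav </audio> now")` yields a text segment, an audio segment holding `a.wav`, and a final text segment, in that order.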

model.py: Defines the MMLM module, including the weighted sum of audio/visual input features.
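
To make the "weighted sum" idea concrete, here is a framework-free sketch. It assumes the sum runs over several feature vectors (for example, one per encoder layer) and that the weights are normalized with a softmax. Both are assumptions for illustration; the description above does not say which features are combined or how the weights are obtained. The names `softmax` and `weighted_sum` are hypothetical, and the actual module in model.py presumably operates on tensors with trainable weights.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(features: Sequence[Sequence[float]],
                 logits: Sequence[float]) -> List[float]:
    """Combine several feature vectors into one (hypothetical helper).

    features: one vector per source (assumed here: one per encoder layer).
    logits:   one unnormalized weight per source; softmax turns them into
              positive weights that sum to 1.
    """
    if len(features) != len(logits):
        raise ValueError("need exactly one weight per feature vector")
    weights = softmax(logits)
    dim = len(features[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, features))
        for i in range(dim)
    ]
```

With equal logits the result is the plain average of the feature vectors; raising one logit shifts the output toward that source.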

patch_llama_model.py:
1. Adds audio and vision tokens to the tokenizer of Llama-3.2-3B-Instruct. See this file for details about the added tokens.
2. Pushes the Llama model with the new tokenizer to Hugging Face.
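
The two steps above could look roughly like the following sketch, written with the Hugging Face `transformers` API. The token list is an assumption taken from the tags in the data format; patch_llama_model.py may add different or additional tokens. The function name `patch_and_push` and the target repository name are hypothetical.

```python
from typing import List

# Assumed token list, derived from the tags in the data format. The real
# list lives in patch_llama_model.py and may differ.
NEW_TOKENS: List[str] = ["<audio>", "</audio>", "<image>", "</image>"]


def patch_and_push(base_model: str, target_repo: str) -> None:
    """Add NEW_TOKENS to the tokenizer and upload model + tokenizer.

    Hypothetical helper; requires the `transformers` package, access to
    the base model, and a logged-in Hugging Face account.
    """
    # Imported here so this file can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # Step 1: register the new tokens as special tokens.
    num_added = tokenizer.add_tokens(NEW_TOKENS, special_tokens=True)
    if num_added > 0:
        # The embedding matrix must grow to cover the new token ids.
        model.resize_token_embeddings(len(tokenizer))

    # Step 2: upload both so they stay in sync on the Hub.
    tokenizer.push_to_hub(target_repo)
    model.push_to_hub(target_repo)
```

A call might look like `patch_and_push("meta-llama/Llama-3.2-3B-Instruct", "your-username/your-repo")`, where the second argument is a placeholder for your own repository.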
