English to Hindi Neural Machine Translation Model

Overview

This repository contains a Neural Machine Translation (NMT) model designed to translate English sentences into Hindi. The model uses an encoder-decoder architecture with Long Short-Term Memory (LSTM) layers and is trained on the Hindi-English Truncated Corpus dataset.

Key Features:

- Data preprocessing for text normalization and tokenization.
- Custom batch generator for memory-efficient training.
- Encoder-Decoder architecture with embedding layers.
- Trained using the Keras deep learning framework.
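
The custom batch generator is not shown in this README, so the following is only a minimal sketch of how such a generator might look. It assumes whitespace-tokenized sentences, word-to-index dictionaries (`input_index`, `target_index`, both hypothetical names) with index 0 reserved for padding, and one-hot decoder targets shifted one step ahead of the decoder input. Building arrays one batch at a time avoids holding the full one-hot target tensor in memory.

```python
import numpy as np

def generate_batch(X, y, input_index, target_index,
                   max_len_src, max_len_tgt, num_decoder_tokens,
                   batch_size=128):
    """Yield ((encoder_input, decoder_input), decoder_target) batches forever.

    X and y are sequences of source and target sentences (plain strings).
    All parameter names here are illustrative assumptions.
    """
    while True:
        for start in range(0, len(X), batch_size):
            src_batch = X[start:start + batch_size]
            tgt_batch = y[start:start + batch_size]
            n = len(src_batch)
            encoder_input = np.zeros((n, max_len_src), dtype="float32")
            decoder_input = np.zeros((n, max_len_tgt), dtype="float32")
            decoder_target = np.zeros(
                (n, max_len_tgt, num_decoder_tokens), dtype="float32")
            for i, (src, tgt) in enumerate(zip(src_batch, tgt_batch)):
                for t, word in enumerate(src.split()):
                    encoder_input[i, t] = input_index[word]
                tgt_words = tgt.split()
                for t, word in enumerate(tgt_words):
                    if t < len(tgt_words) - 1:
                        # Decoder input: every token except the final _END.
                        decoder_input[i, t] = target_index[word]
                    if t > 0:
                        # Decoder target: same sequence shifted left by one,
                        # so it omits START_ and is one-hot encoded.
                        decoder_target[i, t - 1, target_index[word]] = 1.0
            yield (encoder_input, decoder_input), decoder_target
```

Only one batch of arrays exists at a time, which is what keeps memory use bounded for a large vocabulary.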

Dataset

The model is trained on the Hindi-English Truncated Corpus dataset, which contains English and Hindi sentence pairs sourced from TED talks. Only sentences of at most 20 words are used for training.
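
The 20-word limit can be illustrated with a small filter. This is a sketch under assumptions the README does not confirm: words are counted by splitting on whitespace, the limit applies to both the English and the Hindi side, and it is checked before any start or end tokens are added. The name `within_length` is hypothetical.

```python
MAX_WORDS = 20  # maximum sentence length stated above

def within_length(english, hindi, max_words=MAX_WORDS):
    """Return True if both sentences have at most max_words words."""
    return (len(english.split()) <= max_words
            and len(hindi.split()) <= max_words)

# Example: keep only the pairs that satisfy the limit.
pairs = [
    ("thank you", "धन्यवाद"),
    (" ".join(["word"] * 21), "बहुत लंबा वाक्य"),
]
kept = [pair for pair in pairs if within_length(*pair)]
```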

Preprocessing Steps:

- Convert all text to lowercase.
- Remove special characters, numbers, and extra spaces.
- Wrap each Hindi sentence in START_ and _END tokens so the decoder knows where a sentence begins and where to stop generating.
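
The steps above can be sketched as follows. This is an illustration rather than the notebook's exact code: it assumes "special characters" means ASCII punctuation plus the Devanagari danda, that both ASCII and Devanagari digits are removed, and that the tokens are added after cleaning so their underscores are not stripped. The function names are hypothetical.

```python
import re
import string

# ASCII punctuation plus the Devanagari danda (an assumed definition
# of "special characters").
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "।" + "]")
# ASCII digits and Devanagari digits.
_DIGITS = re.compile(r"[0-9०-९]")
_SPACES = re.compile(r"\s+")

def clean_text(text):
    """Lowercase, drop punctuation and digits, collapse extra spaces."""
    text = text.lower()  # no effect on Devanagari, which has no case
    text = _PUNCT.sub("", text)
    text = _DIGITS.sub("", text)
    return _SPACES.sub(" ", text).strip()

def prepare_target(hindi):
    """Clean a Hindi sentence, then wrap it in START_ and _END tokens."""
    return "START_ " + clean_text(hindi) + " _END"
```

For example, `clean_text("Hello,  World 123!")` gives `"hello world"`.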

Model Architecture

The model is built using the Encoder-Decoder architecture with the following components:

Encoder:
- Embedding layer for English sentences.
- LSTM layer to generate context vectors (hidden and cell states).
Decoder:
- Embedding layer for Hindi sentences.
- LSTM layer initialized with encoder states.
- Dense layer with a softmax activation for predicting the target words.
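
A minimal Keras sketch of these components is shown here. The layer sizes, the function name `build_seq2seq`, and its parameters are assumptions for illustration; the notebook may use different dimensions or extra options such as masking of padded positions.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

def build_seq2seq(num_encoder_tokens, num_decoder_tokens, latent_dim=256):
    """Build an LSTM encoder-decoder training model (illustrative sizes)."""
    # Encoder: embed English token ids, keep only the final LSTM states.
    encoder_inputs = Input(shape=(None,), name="encoder_inputs")
    enc_emb = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
    _, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)
    encoder_states = [state_h, state_c]

    # Decoder: embed Hindi token ids, start the LSTM from the encoder states.
    decoder_inputs = Input(shape=(None,), name="decoder_inputs")
    dec_emb = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)

    # Softmax over the Hindi vocabulary at every decoder time step.
    decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(
        decoder_outputs)
    return Model([encoder_inputs, decoder_inputs], decoder_outputs)
```

The hidden and cell states are the only information passed from encoder to decoder, which is why the encoder's per-step outputs are discarded.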

Setup

Prerequisites

Ensure you have the following installed:

- Python 3.7 or later
- TensorFlow/Keras
- NumPy
- Pandas
- Matplotlib
- Seaborn

Installation

Clone the repository:

git clone https://github.com/mbithesss/Language-Translation-with-Deep-Learning.git
cd Language-Translation-with-Deep-Learning

Install the required libraries:
```
pip install -r requirements.txt
```

Training

The model is trained using the following parameters:

- Optimizer: RMSprop
- Loss Function: Categorical Crossentropy
- Batch Size: 128
- Epochs: 100

Checkpoints are saved after every epoch to ensure progress is not lost.
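
The configuration above might be wired up roughly as follows. This is a sketch, not the notebook's code: the helper names are hypothetical, the accuracy metric is an added assumption, and the callback here saves the full model to `checkpoint.h5` (whether the original saves the full model or weights only is not stated).

```python
from tensorflow.keras.callbacks import ModelCheckpoint

BATCH_SIZE = 128
EPOCHS = 100

def compile_for_training(model):
    """Compile with the optimizer and loss listed above."""
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])  # metric is an assumption
    return model

def make_checkpoint_callback(path="checkpoint.h5"):
    """Create a callback that writes a checkpoint at the end of every epoch."""
    return ModelCheckpoint(filepath=path, save_freq="epoch")

# Typical use, with train_gen and num_train as hypothetical names:
# model.fit(train_gen,
#           steps_per_epoch=num_train // BATCH_SIZE,
#           epochs=EPOCHS,
#           callbacks=[make_checkpoint_callback()])
```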

Model Checkpoints

Checkpoints are stored in the checkpoint.h5 file in the repository root. To load a checkpoint, build the model first and then run the following from the repository root:

model.load_weights('checkpoint.h5')

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

Steps to Contribute

1. Fork the repository.
2. Create a new branch:
```
git checkout -b feature-branch
```
3. Commit your changes and push them to your fork:
```
git push origin feature-branch
```
4. Open a pull request.
