Hello!
Thank you so much for developing and releasing this model to the public. As a native Arabic speaker, I highly appreciate your efforts in enriching our beautiful language.
I have the following question related to the training process:
As I understand it, the first step is continued pretraining of Llama 2 on Arabic data in a self-supervised manner.
My question is: how large is the dataset used in this step?
Thanks in advance.