The document introduces BLOOMChat 176B, a multilingual chat-based LLM. The model was developed by combining synthetic conversation data with high-quality human-written examples: it was first trained on the synthetic conversations and then on the human-generated datasets Dolly 2.0 and OASST1. Its multilingual chat and cross-lingual task capabilities were evaluated both qualitatively and quantitatively. BLOOMChat achieved promising results in both evaluations, outperforming other BLOOM variants and state-of-the-art open-source chat models on translation tasks. The model also has limitations, including hallucination and weak performance when generating accurate code or solving complex mathematical problems. The document acknowledges the contributions of various researchers and open-source projects to the development of BLOOMChat. Finally, it provides links to the datasets used in the experiments and invites the community to discuss BLOOMChat or chat with the team.
