Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

MosaicML has released a new model series called MPT (MosaicML Pretrained Transformer) to provide a commercially usable, open-source model that matches, and on some benchmarks surpasses, LLaMA-7B. The MPT series is licensed for commercial use, trained on 1 trillion tokens of text and code drawn from sources including mC4, C4, RedPajama, and The Stack, prepared to handle extremely long inputs, optimized for fast training and inference, and equipped with highly efficient open-source training code. Quality was measured on an evaluation suite that includes Jeopardy, MMLU, TriviaQA, Winograd, and Winogrande.

Two finetuned variants accompany the base model: MPT-7B-StoryWriter-65k+, designed to read and write stories with context lengths of 65k tokens and beyond, and MPT-7B-Chat, a conversational version of MPT-7B.

MPT models are GPT-style decoder-only transformers with several improvements, most notably performance-optimized layer implementations such as FlashAttention, and ALiBi attention biases in place of positional embeddings, which let the models extrapolate to inputs far longer than anything seen during training (a sketch of the bias computation appears below).

Training data is served with MosaicML's StreamingDataset, which provides a number of advantages, including arbitrary mixing of data sources (see the mixing sketch below). Trained MPT models can be uploaded to the HuggingFace Hub and deployed directly on MosaicML's Inference service (a loading example follows the sketches). MosaicML will continue to produce foundation models of higher and higher quality.
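The long-context behavior comes from ALiBi, which replaces positional embeddings with a per-head linear penalty on attention distance. Below is a minimal sketch of that bias computation in PyTorch; the function name and shapes are illustrative, not MosaicML's actual implementation.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Build the (n_heads, seq_len, seq_len) additive ALiBi bias.

    Head h gets slope m_h = 2 ** (-8 * (h + 1) / n_heads) (for n_heads a
    power of two), and a query at position i attending to a key at
    position j <= i receives bias -m_h * (i - j): attention decays
    linearly with distance, so no positional embeddings are needed and
    the model can extrapolate to lengths unseen during training.
    """
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)  # (seq, seq), 0 above diagonal
    return -slopes[:, None, None] * distance               # (heads, seq, seq)

# The bias is simply added to the raw attention scores before the
# causal mask and softmax:
#   scores = q @ k.transpose(-2, -1) / d_head ** 0.5 + alibi_bias(h, t)
bias = alibi_bias(n_heads=8, seq_len=6)
print(bias.shape)  # torch.Size([8, 6, 6])
```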
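StreamingDataset's source mixing can be expressed as a list of weighted streams. The sketch below assumes the data has already been converted to the library's MDS shard format; the bucket paths, cache directories, and mixing proportions are hypothetical.

```python
from streaming import Stream, StreamingDataset
from torch.utils.data import DataLoader

# Each Stream points at one pre-converted data source; `proportion`
# sets how much of each epoch is drawn from that source. Paths and
# ratios here are made up for illustration.
streams = [
    Stream(remote='s3://my-bucket/c4-mds', local='/tmp/cache/c4', proportion=0.6),
    Stream(remote='s3://my-bucket/stack-mds', local='/tmp/cache/stack', proportion=0.4),
]

# StreamingDataset interleaves the sources according to the weights
# and streams shards from remote storage on demand.
dataset = StreamingDataset(streams=streams, shuffle=True, batch_size=8)
loader = DataLoader(dataset, batch_size=8)
```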
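Loading the released weights from the HuggingFace Hub follows the usual transformers pattern. One detail worth noting from the model cards: MPT ships its architecture as custom code on the Hub, so trust_remote_code=True is required, and the series reuses the EleutherAI/gpt-neox-20b tokenizer. The dtype and prompt below are illustrative.

```python
import torch
import transformers

# MPT's modeling code lives on the Hub rather than in the transformers
# library itself, hence trust_remote_code=True.
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    torch_dtype=torch.bfloat16,  # dtype choice is illustrative
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

inputs = tokenizer('MosaicML released MPT-7B because', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```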
