Zephyr is a series of language models that are trained to act as helpful assistants.

ollama run emsi/zephyr-orpo-141b-a35b-v0.1
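Besides the interactive CLI above, a running Ollama server can be called over HTTP. The sketch below assumes a default local install listening on http://localhost:11434 and uses Ollama's /api/generate endpoint with streaming turned off; the helper names build_request and generate are hypothetical, chosen for illustration only.

```python
import json
import urllib.request
from typing import Optional

# Assumption: default local Ollama server address (no custom OLLAMA_HOST).
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "emsi/zephyr-orpo-141b-a35b-v0.1"


def build_request(prompt: str, system: Optional[str] = None) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for this model (hypothetical helper)."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    if system is not None:
        # Fills the {{ .System }} slot of the model's prompt template.
        payload["system"] = system
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, system: Optional[str] = None) -> str:
    """Send the request and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_request(prompt, system)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, generate("Explain ORPO in one sentence.") returns the completion as a single string. Note that the 80GB Q4_0 weights need a machine with enough memory to load them.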

Details

a8c43c0fb6b5 · 80GB · llama · 141B · Q4_0
Parameters:
{ "stop": [ "<|system|>", "<|user|>", "<|assistant|>", "</s>" ] }

Template:
{{- if .System }} <|system|> {{ .System }} </s> {{- end }} <|user|> {{ .Prompt }} </s> <|assistant|>
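The template and stop parameters work together: the template wraps each turn in <|system|>, <|user|> and <|assistant|> tags closed by </s>, and the stop list cuts generation as soon as the model emits one of those markers. The sketch below mimics both steps in plain Python. It assumes each tag starts a new line, as in the Zephyr chat format, because the template as displayed here has its whitespace collapsed; render_prompt and apply_stop are hypothetical names, not part of Ollama.

```python
from typing import List, Optional

# Stop sequences copied from the "stop" parameter above.
STOP = ["<|system|>", "<|user|>", "<|assistant|>", "</s>"]


def render_prompt(prompt: str, system: Optional[str] = None) -> str:
    """Lay out one turn in the tag order used by the template.

    Assumption: newline after each tag and after each </s>.
    Like Go's {{ if .System }}, an empty system string is skipped.
    """
    parts = []
    if system:
        parts.append(f"<|system|>\n{system}</s>\n")
    parts.append(f"<|user|>\n{prompt}</s>\n")
    parts.append("<|assistant|>\n")  # the model continues from here
    return "".join(parts)


def apply_stop(text: str, stop: List[str] = STOP) -> str:
    """Truncate generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for s in stop:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]
```

Truncating at the earliest match, rather than at the first stop string in list order, keeps the model from running on into a hallucinated next turn.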

Readme

Model Card for Zephyr 141B-A35B

Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 141B-A35B is the latest model in the series. It is a fine-tuned version of mistral-community/Mixtral-8x22B-v0.1, trained with a novel alignment algorithm called Odds Ratio Preference Optimization (ORPO) on 7k instances for 1.3 hours on 4 nodes of 8 x H100 GPUs. ORPO does not require a separate SFT step to achieve high performance: it combines the supervised fine-tuning loss and a preference (odds-ratio) penalty in a single objective. This makes it much more computationally efficient than methods like DPO and PPO, which are normally applied on top of an already fine-tuned SFT model.

To train Zephyr-141B-A35B, we used the argilla/distilabel-capybara-dpo-7k-binarized preference dataset, which consists of synthetic, high-quality, multi-turn preferences that have been scored by LLMs.
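To make the single-stage idea concrete, the ORPO objective from the original ORPO paper can be written as L_ORPO = L_SFT + lambda * L_OR, with odds(y|x) = P(y|x) / (1 - P(y|x)) and L_OR = -log sigmoid(log odds(y_w|x) - log odds(y_l|x)), where y_w is the preferred and y_l the rejected response. The sketch below computes this loss for one preference pair from per-token log-probabilities, using the length-normalised likelihood described in the paper. It is an illustration of the formula, not the training code used for this model; the function names and the default lam = 0.1 are hypothetical.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalised log-likelihood: (1/m) * sum_t log P(y_t | x, y_<t)."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires p < 1."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def softplus(v: float) -> float:
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def orpo_loss(chosen: Sequence[float], rejected: Sequence[float], lam: float = 0.1) -> float:
    """ORPO loss for one pair of responses.

    chosen / rejected: per-token log-probs of the preferred and rejected
    responses under the model being trained. lam weights the odds-ratio term
    (hypothetical default, for illustration only).
    """
    lp_w = mean_logprob(chosen)
    lp_l = mean_logprob(rejected)
    l_sft = -lp_w  # SFT term: negative log-likelihood of the chosen response
    z = log_odds(lp_w) - log_odds(lp_l)  # log odds ratio
    l_or = softplus(-z)  # equals -log(sigmoid(z))
    return l_sft + lam * l_or
```

Because both terms come from the same forward passes over the chosen and rejected responses, no frozen reference model and no prior SFT run are needed, which is where the efficiency gain over DPO and PPO comes from.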

Converted from: https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1