An extension of Llama 2 that supports a context of up to 128k tokens.
Sizes: 7b, 13b
74.2K Pulls Updated 12 months ago
6f754622c8a4 · 5.1GB
model: arch llama · parameters 6.74B · quantization Q5_1 (5.1GB)
params: {"num_ctx":131072} (18B)
Readme
Yarn Llama 2 is a model based on Llama 2 that extends the context window to up to 128k tokens. It was developed by Nous Research, which used the YaRN method to further train the model to support larger context windows.
CLI
64k context size:
ollama run yarn-llama2
128k context size:
ollama run yarn-llama2:7b-128k
API
Example:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "yarn-llama2:7b-128k",
"prompt":"Here is a story about llamas eating grass"
}'
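The request above streams its reply by default. The following is a minimal sketch of a non-streaming variant that also sets the context size per request. It relies on the `stream` and `options.num_ctx` fields of the Ollama generate endpoint. The helper names `build_payload` and `generate` are hypothetical, not part of Ollama, and the sketch assumes the prompt contains no double quotes, backslashes, or newlines, since it does no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ollama): print a JSON body for /api/generate.
# Arguments: $1 = model name, $2 = prompt, $3 = context size in tokens.
# Assumption: the prompt has no double quotes, backslashes, or newlines,
# because this sketch does no JSON escaping.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_ctx":%d}}' \
    "$1" "$2" "$3"
}

# Hypothetical helper: send the body to a local Ollama server on the default port.
# Takes the same three arguments as build_payload.
generate() {
  curl -s -X POST http://localhost:11434/api/generate \
    -d "$(build_payload "$1" "$2" "$3")"
}
```

With a local server running, `generate yarn-llama2:7b-128k "Here is a story about llamas eating grass" 131072` should return a single JSON object rather than a stream of partial responses. The value 131072 matches the `num_ctx` listed in the model's params (128 × 1024 tokens); a smaller value can be passed if memory is limited.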
References
YaRN: Efficient Context Window Extension of Large Language Models