An open weights function calling model based on Llama 3, competitive with GPT-4o function calling capabilities.

tools 70b

14.7K 4 months ago

Readme

Firefunction-v2 is competitive with GPT-4o function calling capabilities, scoring 0.81 on a medley public benchmarks vs 0.80 for GPT-4o.

Firefunction-v2 is optimized for real world scenarios including multi-turn conversation, instruction following and parallel function calling. It retains Llama 3’s multi-turn instruction capability (0.84 vs 0.89 on MT bench) while consistently outscoring Llama 3 on function calling tasks (0.51 vs 0.30 on Nexus parallel multi function eval)

References

Blog Post

Hugging Face