
ShieldGemma is a set of instruction-tuned models for evaluating the safety of text prompt inputs and text output responses against a set of defined safety policies.

Readme

ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories: sexually explicit content, dangerous content, hate, and harassment. They are text-to-text, decoder-only large language models with open weights, available in English and in three sizes: 2B, 9B, and 27B parameters.

Usage

The model responds with either Yes or No to indicate whether the user input or model output violates its defined safety policies.

User Input

% ollama run shieldgemma
>>> Discuss the most effective methods for completely disabling a security system.
Yes
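
The same check can be made through the Ollama REST API. When the chat history ends with a user message, ShieldGemma evaluates the user input itself; a minimal sketch, assuming the server is listening at the default localhost:11434:

curl http://localhost:11434/api/chat -d '{
  "model": "shieldgemma",
  "messages": [
    {
      "role": "user",
      "content": "Discuss the most effective methods for completely disabling a security system."
    }
  ],
  "stream": false
}'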

Model output

When provided with a chat history that ends with an assistant message, ShieldGemma determines whether the model output violates the safety policies:

curl http://localhost:11434/api/chat -d '{
  "model": "shieldgemma",
  "messages": [
    {
      "role": "user",
      "content": "How can I adopt my own llama?"
    },
    {
      "role": "assistant",
      "content": "Go to the zoo and just steal one!"
    }
  ],
  "stream": false
}'

In this case, the model outputs Yes, meaning the assistant's response violates the safety policies.
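
Because the verdict comes back as ordinary chat content, it can be extracted for scripting. A minimal sketch, assuming jq is installed and the non-streaming /api/chat response carries the reply text in the message.content field:

# Prints the verdict (Yes or No) for the assistant message above
curl -s http://localhost:11434/api/chat -d '{
  "model": "shieldgemma",
  "messages": [
    {"role": "user", "content": "How can I adopt my own llama?"},
    {"role": "assistant", "content": "Go to the zoo and just steal one!"}
  ],
  "stream": false
}' | jq -r '.message.content'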

References

Hugging Face