llama.cpp/grammars
Olivier Chafik ab9a3240a9
JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555)
* json: rename python schema converter to make import easier

* server: skip null json_schema / grammar fields

* json: deps management for primitive rules (+ allow null values)

* json: optimize repetitions for minItems/maxItems and regexps: `a{,3}` goes from `"a"? "a"? "a"?` (explosive combos) to `(a (a (a)?)?)?`

* grammars: add troubleshooting section to readme

* json: cap length of numbers to 15 digits before/after decimal point

(avoids infinite gen, e.g. "one third" -> `0.333333333333...`)

* json: unify all repetition code (w/ or w/o sep)

* json: support string minLength/maxLength

* server+json: update server/README w/ result_format

* nits

* json: fix type error w/ python 3.8

* json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions)

* json: simplify DOT `{"type": "string", "pattern": "^.$"}`

* json: remove recursion in opt_repetitions (avoids Python stack overflow)

* json: rm dead code

* json: rm useless assert & ggml.h import
2024-04-12 19:43:38 +01:00
README.md JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00
arithmetic.gbnf llama : add grammar-based sampling (#1773) 2023-07-23 23:58:10 -04:00
c.gbnf examples : add C grammar (#2357) 2023-09-01 16:32:14 +03:00
chess.gbnf llama : add grammar-based sampling (#1773) 2023-07-23 23:58:10 -04:00
japanese.gbnf llama : add grammar-based sampling (#1773) 2023-07-23 23:58:10 -04:00
json.gbnf grammars : blacklists character control set (#5888) 2024-03-05 18:33:08 +02:00
json_arr.gbnf grammars : blacklists character control set (#5888) 2024-03-05 18:33:08 +02:00
list.gbnf llama : add grammar-based sampling (#1773) 2023-07-23 23:58:10 -04:00

README.md

GBNF Guide

GBNF (GGML BNF) is a format for defining formal grammars to constrain model outputs in llama.cpp. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in examples/main and examples/server.

Background

Backus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. GBNF is an extension of BNF that primarily adds a few modern regex-like features.

Basics

In GBNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule is nonterminal ::= sequence....

Example

Before going deeper, let's look at some of the features demonstrated in grammars/chess.gbnf, a small chess notation grammar:

# `root` specifies the pattern for the overall output
root ::= (
    # it must start with the characters "1. " followed by a sequence
    # of characters that match the `move` rule, followed by a space, followed
    # by another move, and then a newline
    "1. " move " " move "\n"

    # it's followed by one or more subsequent moves, numbered with one or two digits
    ([1-9] [0-9]? ". " move " " move "\n")+
)

# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
# The `[+#]?` denotes the possibility of checking or mate signs after moves
move ::= (pawn | nonpawn | castle) [+#]?

pawn ::= ...
nonpawn ::= ...
castle ::= ...

Non-Terminals and Terminals

Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like move, castle, or check-mate.

Terminals are actual characters (code points). They can be specified as a sequence like "1" or "O-O" or as ranges like [1-9] or [NBKQR].

Characters and character ranges

Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example hiragana ::= [ぁ-ゟ], or with escapes: 8-bit (\xXX), 16-bit (\uXXXX) or 32-bit (\UXXXXXXXX).

Character ranges can be negated with ^:

single-line ::= [^\n]+ "\n"
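
As a rough illustration of how ranges and escapes both refer to code points, here is a small Python sketch. It is not part of llama.cpp: Python's `re` module only stands in for GBNF character ranges, and the helper names `is_hiragana` and `is_single_line` are made up for this example.

```python
import re

# The endpoints of [ぁ-ゟ] are ordinary code points, so the same range
# could be spelled with \uXXXX escapes instead of literal characters.
assert ord("ぁ") == 0x3041 and ord("ゟ") == 0x309F

# Python regex stand-in for: hiragana ::= [ぁ-ゟ]
HIRAGANA = re.compile(r"[\u3041-\u309F]")

# Python regex stand-in for: single-line ::= [^\n]+ "\n"
SINGLE_LINE = re.compile(r"[^\n]+\n")


def is_hiragana(ch: str) -> bool:
    """True if ch is a single code point inside the hiragana range."""
    return HIRAGANA.fullmatch(ch) is not None


def is_single_line(text: str) -> bool:
    """True if text is one or more non-newline characters plus a newline."""
    return SINGLE_LINE.fullmatch(text) is not None
```

The negated range behaves as expected here: `is_single_line` rejects both an empty line and any text containing a second newline.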

Sequences and Alternatives

The order of symbols in a sequence matters. For example, in "1. " move " " move "\n", the "1. " must come before the first move, etc.

Alternatives, denoted by |, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle, move can be a pawn move, a nonpawn move, or a castle.

Parentheses () can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.

Repetition and Optional Symbols

  • * after a symbol or sequence means that it can be repeated zero or more times.
  • + denotes that the symbol or sequence should appear one or more times.
  • ? makes the preceding symbol or sequence optional.
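
The three operators behave much like their regex counterparts. The sketch below uses Python's `re` module as an analogy only (it is not the GBNF engine), with three hypothetical rule names chosen for illustration.

```python
import re

# Python regex stand-ins (an analogy, not the GBNF engine) for:
#   zero-or-more ::= [0-9]*
#   one-or-more  ::= [0-9]+
#   optional     ::= [0-9]?
RULES = {
    "zero-or-more": re.compile(r"[0-9]*"),
    "one-or-more": re.compile(r"[0-9]+"),
    "optional": re.compile(r"[0-9]?"),
}


def accepts(rule: str, text: str) -> bool:
    """Return True if the whole text matches the named rule."""
    return RULES[rule].fullmatch(text) is not None
```

Only `+` rejects the empty string, and only `?` rejects a second digit.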

Comments and newlines

Comments can be specified with #:

# defines whitespace (one or more spaces, tabs, or newlines)
ws ::= [ \t\n]+

Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker | will continue the current rule, even outside of parentheses.

The root rule

In a full grammar, the root rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.

# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"
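
To get a feel for what "the entire output must match" means for this list grammar, the following Python sketch checks whole strings against a regex equivalent. The regex is an approximation written for this guide and `matches_list_grammar` is a hypothetical helper, not a llama.cpp API.

```python
import re

# Regex stand-in for the two rules above:
#   root ::= ("- " item)+
#   item ::= [^\n]+ "\n"
LIST = re.compile(r"(?:- [^\n]+\n)+")


def matches_list_grammar(text: str) -> bool:
    """True only if the complete text is derivable from `root`."""
    return LIST.fullmatch(text) is not None
```

A complete output such as "- milk\n- eggs\n" is accepted, while text that lacks the "- " prefix or the final newline is not.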

Next steps

This guide provides a brief overview. Check out the GBNF files in this directory (grammars/) for examples of full grammars. You can try them out with:

./main -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'

Troubleshooting

Grammars currently have performance gotchas (see https://github.com/ggerganov/llama.cpp/issues/4218).

Efficient optional repetitions

A common pattern is to allow repetitions of a pattern x up to N times.

While semantically correct, the syntax x? x? x?.... x? (with N repetitions) will result in extremely slow inference. Instead, you can write (x (x (x ... (x)?...)?)?)? (with N-deep nesting).
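
Writing the nested form by hand gets tedious for larger N. A minimal Python sketch of a generator is shown below; `up_to` is a hypothetical helper written for this guide (not a function shipped with llama.cpp), and it builds the string iteratively so that large N does not hit Python's recursion limit.

```python
def up_to(x: str, n: int) -> str:
    """Build a GBNF fragment matching `x` repeated 0 to n times.

    `x` is any GBNF symbol or parenthesized sequence, passed as text.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    result = ""
    for _ in range(n):
        # Wrap what we have so far: one `x`, then the optional remainder.
        result = f"({x} {result})?" if result else f"({x})?"
    return result
```

For example, `up_to("x", 3)` returns `(x (x (x)?)?)?`, which can be pasted into the right-hand side of a rule.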