This document describes how to define a preference tuning dataset for a Gemini model.
About preference tuning datasets
A preference tuning dataset captures human preference signals, such as thumbs-up/thumbs-down ratings, pairwise comparisons, and scored feedback.
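As an illustration of how such signals reduce to a preference pair, the following minimal sketch (the rated responses are hypothetical sample data, not output from any API) takes the highest- and lowest-rated responses to the same prompt as the preferred and dispreferred completions:

# Hypothetical scored feedback for one prompt, for example 1-5 star ratings.
rated_responses = [
    ("Your favorite fruit is Apple.", 2),
    ("Apple! Apple! Apple!", 5),
    ("I don't know.", 1),
]

# Sort by rating; the best response becomes the preferred completion and the
# worst becomes the dispreferred completion. Only the relative ordering
# matters, because the dataset scores are later fixed at 0 and 1.
ranked = sorted(rated_responses, key=lambda pair: pair[1])
dispreferred_response = ranked[0][0]
preferred_response = ranked[-1][0]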
Prepare customized preference tuning data
The following is an example script that generates preference tuning training and validation datasets. In this example:
An example is composed of a contents field, a pair of completions, and an optional system_instruction. The sum of the maximum input and maximum completion token counts must not exceed 128K.
The contents field uses the same format as supervised fine-tuning. It supports multi-turn text data that must end with a user turn. It doesn't support multimodal data at the moment.
The completions field is composed of a pair of completions and their scores. The pair must contain exactly one preferred completion and one dispreferred completion.
A completion is a single model turn that represents the model response. The score field indicates whether the completion is preferred or dispreferred: 0 marks the dispreferred completion, and 1 marks the preferred completion.
The scores only identify which response is preferred and which is dispreferred. The score value itself doesn't change the tuning behavior.
Only the completions turn of each example is trained on.
import json

preference_examples = []
for _ in range(100):
    system_instruction = "You are a chat bot."
    prompt = "What is my favorite fruit?"
    preferred_response = "Apple! Apple! Apple!"
    dispreferred_response = "Your favorite fruit is Apple."
    pref_example = {
        # System instruction is optional.
        "system_instruction": {
            "parts": [
                {"text": system_instruction}
            ]
        },
        # Contents can span multiple turns but must end with a user turn.
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"text": prompt}
                ]
            },
        ],
        # Pair of preferred and dispreferred responses. The score must be 0 or 1:
        # 0 maps to the dispreferred response and 1 maps to the preferred response.
        "completions": [
            {
                "score": 1.0,
                "completion": {
                    "role": "model",
                    "parts": [
                        {"text": preferred_response}
                    ]
                }
            },
            {
                "score": 0.0,
                "completion": {
                    "role": "model",
                    "parts": [
                        {"text": dispreferred_response}
                    ]
                }
            }
        ]
    }
    preference_examples.append(json.dumps(pref_example))

# Print the first 2 examples in preference_examples for demonstration.
print(preference_examples[:2])
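The script above only builds the serialized examples in memory and prints two of them. A minimal sketch for persisting them follows; because each example is already a json.dumps() string, writing one example per line produces JSONL files. The 90/10 split ratio and the file names preference_train.jsonl and preference_validation.jsonl are illustrative choices, not values required by the service:

# Hypothetical 90/10 split between training and validation examples.
split_index = int(len(preference_examples) * 0.9)

# Write one JSON-serialized example per line (JSONL), reusing the
# strings accumulated by the script above.
with open("preference_train.jsonl", "w") as train_file:
    train_file.write("\n".join(preference_examples[:split_index]) + "\n")

with open("preference_validation.jsonl", "w") as validation_file:
    validation_file.write("\n".join(preference_examples[split_index:]) + "\n")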