 class EvalTask:
     """A class representing an EvalTask.

-    An Evaluation Tasks is defined to measure the model's ability to perform a
-    certain task in response to specific prompts or inputs. Evaluation tasks must
-    contain an evaluation dataset, and a list of metrics to evaluate. Evaluation
-    tasks help developers compare prompt templates, track experiments, compare
-    models and their settings, and assess the quality of the model's generated
-    text.
+    An evaluation task assesses the ability of a Gen AI model, agent, or
+    application to perform a specific task in response to prompts. Each
+    evaluation task includes an evaluation dataset, which can be a set of
+    test cases, and a set of metrics for assessment. These tasks provide the
+    framework for running evaluations in a standardized and repeatable way,
+    allowing for comparative assessment with varying run-specific parameters.
+

     Dataset Details:

@@ -82,6 +83,8 @@ class EvalTask:
         * reference_column_name: "reference"
         * response_column_name: "response"
         * baseline_model_response_column_name: "baseline_model_response"
+        * rubrics_column_name: "rubrics"
+

     Requirement for different use cases:
         * Bring-your-own-response (BYOR): You already have the data that you
@@ -94,14 +97,14 @@ class EvalTask:
           `baseline_model_response` column is present while the
           corresponding model is specified, an error will be raised.

-        * Perform model inference without a prompt template: You have a dataset
-          containing the input prompts to the model and want to perform model
+        * Perform model/agent inference without a prompt template: You have a dataset
+          containing the input prompts to the model/agent and want to perform
           inference before evaluation. A column named `prompt` is required
-          in the evaluation dataset and is used directly as input to the model.
+          in the evaluation dataset and is used directly as input to the model/agent.

-        * Perform model inference with a prompt template: You have a dataset
+        * Perform model/agent inference with a prompt template: You have a dataset
           containing the input variables to the prompt template and want to
-          assemble the prompts for model inference. Evaluation dataset
+          assemble the prompts for inference. The evaluation dataset
           must contain column names corresponding to the variable names in
           the prompt template. For example, if prompt template is
           "Instruction: {instruction}, context: {context}", the dataset must
@@ -111,9 +114,7 @@ class EvalTask:

     The supported metrics descriptions, rating rubrics, and the required
     input variables can be found on the Vertex AI public documentation page.
-    [Evaluation methods and metrics](
-    https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval
-    ).
+    [Evaluation methods and metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval).

     Usage Examples:

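The metric prompt templates referenced above expose named input variables that must line up with columns in the evaluation dataset. Here is a minimal, hypothetical sketch of that wiring using a custom pointwise metric; the metric name, template wording, and sample rows are illustrative and not part of this module, and on older SDK versions the import path may be `vertexai.preview.evaluation`.

```
import pandas as pd

from vertexai.evaluation import EvalTask, PointwiseMetric

# Hypothetical metric: the name, template text, and sample rows are
# illustrative. The {prompt} and {response} template variables must match
# columns in the evaluation dataset.
clarity_metric = PointwiseMetric(
    metric="clarity",
    metric_prompt_template=(
        "Rate the clarity of the response on a scale of 1 to 5 and briefly "
        "explain the rating.\n\n"
        "Prompt: {prompt}\n"
        "Response: {response}"
    ),
)

# Bring-your-own-response: the `response` column is already populated, so no
# `model` argument is needed when calling evaluate().
eval_dataset = pd.DataFrame({
    "prompt": ["Explain what an API rate limit is."],
    "response": ["A rate limit caps how many requests a client can send per time window."],
})

eval_task = EvalTask(dataset=eval_dataset, metrics=[clarity_metric])
# eval_result = eval_task.evaluate()  # requires vertexai.init(project=..., location=...)
```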
@@ -143,7 +144,7 @@ class EvalTask:
         ```

     2. To perform evaluation with Gemini model inference, specify the `model`
-       parameter with a GenerativeModel instance. The input column name to the
+       parameter with a `GenerativeModel` instance. The input column name to the
        model is `prompt` and must be present in the dataset.

         ```
@@ -209,8 +210,8 @@ def custom_model_fn(input: str) -> str:
         ```

     5. To perform pairwise metric evaluation with model inference step, specify
-       the `baseline_model` input to a PairwiseMetric instance and the candidate
-       `model` input to the EvalTask.evaluate() function. The input column name
+       the `baseline_model` input to a `PairwiseMetric` instance and the candidate
+       `model` input to the `EvalTask.evaluate()` function. The input column name
        to both models is `prompt` and must be present in the dataset.

         ```
@@ -221,7 +222,7 @@ def custom_model_fn(input: str) -> str:
             metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
                 "pairwise_groundedness"
             ),
-            baseline_model=baseline_model
+            baseline_model=baseline_model,
         )
         eval_dataset = pd.DataFrame({
             "prompt" : [...],
@@ -232,7 +233,7 @@ def custom_model_fn(input: str) -> str:
             experiment="my-pairwise-experiment",
         ).evaluate(
             model=candidate_model,
-            experiment_run_name="gemini-pairwise-eval-run"
+            experiment_run_name="gemini-pairwise-eval-run",
         )
         ```
     """
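For reference, the pairwise pieces above fit together roughly as follows. This is a hedged sketch rather than the module's verbatim example: the project, location, model names, and sample prompts are placeholders, the explicit `metric` name passed to `PairwiseMetric` is an assumption not shown in the excerpt, and older SDK versions may import from `vertexai.preview.evaluation` instead.

```
import pandas as pd

import vertexai
from vertexai.evaluation import (
    EvalTask,
    MetricPromptTemplateExamples,
    PairwiseMetric,
)
from vertexai.generative_models import GenerativeModel

# Placeholders: substitute a real project and location before running.
vertexai.init(project="my-project", location="us-central1")

# Illustrative model choices for the baseline and the candidate under test.
baseline_model = GenerativeModel("gemini-1.5-flash")
candidate_model = GenerativeModel("gemini-1.5-pro")

pairwise_groundedness = PairwiseMetric(
    metric="pairwise_groundedness",  # assumed explicit metric name
    metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
        "pairwise_groundedness"
    ),
    baseline_model=baseline_model,
)

# Prompts carry their own context so groundedness can be judged from the text.
eval_dataset = pd.DataFrame({
    "prompt": [
        "Context: The launch moved from May 3 to May 10. Question: When is the launch?",
        "Context: The library is closed on Sundays. Question: Is it open on Sunday?",
    ],
})

eval_result = EvalTask(
    dataset=eval_dataset,
    metrics=[pairwise_groundedness],
    experiment="my-pairwise-experiment",
).evaluate(
    model=candidate_model,
    experiment_run_name="gemini-pairwise-eval-run",
)
print(eval_result.summary_metrics)  # aggregate scores
print(eval_result.metrics_table)    # per-row results
```

Keeping the `experiment` name fixed while varying `experiment_run_name` per run lets repeated evaluations of different candidates land in the same Vertex AI experiment for side-by-side comparison.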