
Method: projects.locations.evaluateInstances

Evaluates instances based on a given metric.

HTTP request

POST https://{service-endpoint}/v1/{location}:evaluateInstances

Where {service-endpoint} is one of the supported service endpoints.

Path parameters

Parameters

location

string

Required. The resource name of the Location to evaluate the instances. Format: projects/{project}/locations/{location}
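As a minimal sketch of how the `location` path parameter fits into the request URL, the helper below assembles the endpoint from a project ID and a location ID. The function name, the sample project ID, and the default regional host pattern `{location}-aiplatform.googleapis.com` are assumptions made for illustration; consult the list of supported service endpoints for the host that applies to your region.

```python
from typing import Optional


def build_evaluate_url(project: str, location: str,
                       service_endpoint: Optional[str] = None) -> str:
    """Build the evaluateInstances URL for one project and location."""
    if service_endpoint is None:
        # Assumption: a regional host of this form; verify it against the
        # supported service endpoints before relying on it.
        service_endpoint = f"{location}-aiplatform.googleapis.com"
    # The {location} path parameter is the full resource name.
    resource = f"projects/{project}/locations/{location}"
    return f"https://{service_endpoint}/v1/{resource}:evaluateInstances"


# "my-project" is a placeholder project ID.
url = build_evaluate_url("my-project", "us-central1")
```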

Request body

The request body contains data with the following structure:

JSON representation

{

  // Union field metric_inputs can be only one of the following:
  "exactMatchInput": {
    object (ExactMatchInput)
  },
  "bleuInput": {
    object (BleuInput)
  },
  "rougeInput": {
    object (RougeInput)
  },
  "fluencyInput": {
    object (FluencyInput)
  },
  "coherenceInput": {
    object (CoherenceInput)
  },
  "safetyInput": {
    object (SafetyInput)
  },
  "groundednessInput": {
    object (GroundednessInput)
  },
  "fulfillmentInput": {
    object (FulfillmentInput)
  },
  "summarizationQualityInput": {
    object (SummarizationQualityInput)
  },
  "pairwiseSummarizationQualityInput": {
    object (PairwiseSummarizationQualityInput)
  },
  "summarizationHelpfulnessInput": {
    object (SummarizationHelpfulnessInput)
  },
  "summarizationVerbosityInput": {
    object (SummarizationVerbosityInput)
  },
  "questionAnsweringQualityInput": {
    object (QuestionAnsweringQualityInput)
  },
  "pairwiseQuestionAnsweringQualityInput": {
    object (PairwiseQuestionAnsweringQualityInput)
  },
  "questionAnsweringRelevanceInput": {
    object (QuestionAnsweringRelevanceInput)
  },
  "questionAnsweringHelpfulnessInput": {
    object (QuestionAnsweringHelpfulnessInput)
  },
  "questionAnsweringCorrectnessInput": {
    object (QuestionAnsweringCorrectnessInput)
  },
  "toolCallValidInput": {
    object (ToolCallValidInput)
  },
  "toolNameMatchInput": {
    object (ToolNameMatchInput)
  },
  "toolParameterKeyMatchInput": {
    object (ToolParameterKeyMatchInput)
  },
  "toolParameterKvMatchInput": {
    object (ToolParameterKVMatchInput)
  }
  // End of list of possible types for union field metric_inputs.
}

Fields
Union field `metric_inputs`. Instances and specs for evaluation. `metric_inputs` can be only one of the following:
`exactMatchInput`	`object (ExactMatchInput)` Auto metric instances. Instances and metric spec for exact match metric.
`bleuInput`	`object (BleuInput)` Instances and metric spec for bleu metric.
`rougeInput`	`object (RougeInput)` Instances and metric spec for rouge metric.
`fluencyInput`	`object (FluencyInput)` LLM-based metric instance. General text generation metrics, applicable to other categories. Input for fluency metric.
`coherenceInput`	`object (CoherenceInput)` Input for coherence metric.
`safetyInput`	`object (SafetyInput)` Input for safety metric.
`groundednessInput`	`object (GroundednessInput)` Input for groundedness metric.
`fulfillmentInput`	`object (FulfillmentInput)` Input for fulfillment metric.
`summarizationQualityInput`	`object (SummarizationQualityInput)` Input for summarization quality metric.
`pairwiseSummarizationQualityInput`	`object (PairwiseSummarizationQualityInput)` Input for pairwise summarization quality metric.
`summarizationHelpfulnessInput`	`object (SummarizationHelpfulnessInput)` Input for summarization helpfulness metric.
`summarizationVerbosityInput`	`object (SummarizationVerbosityInput)` Input for summarization verbosity metric.
`questionAnsweringQualityInput`	`object (QuestionAnsweringQualityInput)` Input for question answering quality metric.
`pairwiseQuestionAnsweringQualityInput`	`object (PairwiseQuestionAnsweringQualityInput)` Input for pairwise question answering quality metric.
`questionAnsweringRelevanceInput`	`object (QuestionAnsweringRelevanceInput)` Input for question answering relevance metric.
`questionAnsweringHelpfulnessInput`	`object (QuestionAnsweringHelpfulnessInput)` Input for question answering helpfulness metric.
`questionAnsweringCorrectnessInput`	`object (QuestionAnsweringCorrectnessInput)` Input for question answering correctness metric.
`toolCallValidInput`	`object (ToolCallValidInput)` Tool call metric instances. Input for tool call valid metric.
`toolNameMatchInput`	`object (ToolNameMatchInput)` Input for tool name match metric.
`toolParameterKeyMatchInput`	`object (ToolParameterKeyMatchInput)` Input for tool parameter key match metric.
`toolParameterKvMatchInput`	`object (ToolParameterKVMatchInput)` Input for tool parameter key value match metric.
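Because `metric_inputs` is a union field, a request body carries one metric input at a time. The sketch below is a client-side guard, not part of the API: `make_request_body` and `METRIC_INPUT_FIELDS` are hypothetical names, and the choice to reject an empty body is an assumption of this example.

```python
METRIC_INPUT_FIELDS = frozenset({
    "exactMatchInput", "bleuInput", "rougeInput",
    "fluencyInput", "coherenceInput", "safetyInput",
    "groundednessInput", "fulfillmentInput",
    "summarizationQualityInput", "pairwiseSummarizationQualityInput",
    "summarizationHelpfulnessInput", "summarizationVerbosityInput",
    "questionAnsweringQualityInput", "pairwiseQuestionAnsweringQualityInput",
    "questionAnsweringRelevanceInput", "questionAnsweringHelpfulnessInput",
    "questionAnsweringCorrectnessInput",
    "toolCallValidInput", "toolNameMatchInput",
    "toolParameterKeyMatchInput", "toolParameterKvMatchInput",
})


def make_request_body(metric_inputs: dict) -> dict:
    """Return a request body after checking the one-of constraint locally."""
    if len(metric_inputs) != 1:
        raise ValueError("set exactly one metric input per request")
    (name, value), = metric_inputs.items()
    if name not in METRIC_INPUT_FIELDS:
        raise ValueError(f"unknown metric input field: {name}")
    return {name: value}


body = make_request_body({
    "exactMatchInput": {
        "metricSpec": {},
        "instances": [{"prediction": "Paris", "reference": "Paris"}],
    }
})
```

To evaluate several metrics for the same prediction, send one request per metric.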

Response body

Response message for EvaluationService.EvaluateInstances.

If successful, the response body contains data with the following structure:

JSON representation

{

  // Union field evaluation_results can be only one of the following:
  "exactMatchResults": {
    object (ExactMatchResults)
  },
  "bleuResults": {
    object (BleuResults)
  },
  "rougeResults": {
    object (RougeResults)
  },
  "fluencyResult": {
    object (FluencyResult)
  },
  "coherenceResult": {
    object (CoherenceResult)
  },
  "safetyResult": {
    object (SafetyResult)
  },
  "groundednessResult": {
    object (GroundednessResult)
  },
  "fulfillmentResult": {
    object (FulfillmentResult)
  },
  "summarizationQualityResult": {
    object (SummarizationQualityResult)
  },
  "pairwiseSummarizationQualityResult": {
    object (PairwiseSummarizationQualityResult)
  },
  "summarizationHelpfulnessResult": {
    object (SummarizationHelpfulnessResult)
  },
  "summarizationVerbosityResult": {
    object (SummarizationVerbosityResult)
  },
  "questionAnsweringQualityResult": {
    object (QuestionAnsweringQualityResult)
  },
  "pairwiseQuestionAnsweringQualityResult": {
    object (PairwiseQuestionAnsweringQualityResult)
  },
  "questionAnsweringRelevanceResult": {
    object (QuestionAnsweringRelevanceResult)
  },
  "questionAnsweringHelpfulnessResult": {
    object (QuestionAnsweringHelpfulnessResult)
  },
  "questionAnsweringCorrectnessResult": {
    object (QuestionAnsweringCorrectnessResult)
  },
  "toolCallValidResults": {
    object (ToolCallValidResults)
  },
  "toolNameMatchResults": {
    object (ToolNameMatchResults)
  },
  "toolParameterKeyMatchResults": {
    object (ToolParameterKeyMatchResults)
  },
  "toolParameterKvMatchResults": {
    object (ToolParameterKVMatchResults)
  }
  // End of list of possible types for union field evaluation_results.
}

Fields
Union field `evaluation_results`. Evaluation results will be served in the same order as presented in EvaluationRequest.instances. `evaluation_results` can be only one of the following:
`exactMatchResults`	`object (ExactMatchResults)` Auto metric evaluation results. Results for exact match metric.
`bleuResults`	`object (BleuResults)` Results for bleu metric.
`rougeResults`	`object (RougeResults)` Results for rouge metric.
`fluencyResult`	`object (FluencyResult)` LLM-based metric evaluation result. General text generation metrics, applicable to other categories. Result for fluency metric.
`coherenceResult`	`object (CoherenceResult)` Result for coherence metric.
`safetyResult`	`object (SafetyResult)` Result for safety metric.
`groundednessResult`	`object (GroundednessResult)` Result for groundedness metric.
`fulfillmentResult`	`object (FulfillmentResult)` Result for fulfillment metric.
`summarizationQualityResult`	`object (SummarizationQualityResult)` Summarization only metrics. Result for summarization quality metric.
`pairwiseSummarizationQualityResult`	`object (PairwiseSummarizationQualityResult)` Result for pairwise summarization quality metric.
`summarizationHelpfulnessResult`	`object (SummarizationHelpfulnessResult)` Result for summarization helpfulness metric.
`summarizationVerbosityResult`	`object (SummarizationVerbosityResult)` Result for summarization verbosity metric.
`questionAnsweringQualityResult`	`object (QuestionAnsweringQualityResult)` Question answering only metrics. Result for question answering quality metric.
`pairwiseQuestionAnsweringQualityResult`	`object (PairwiseQuestionAnsweringQualityResult)` Result for pairwise question answering quality metric.
`questionAnsweringRelevanceResult`	`object (QuestionAnsweringRelevanceResult)` Result for question answering relevance metric.
`questionAnsweringHelpfulnessResult`	`object (QuestionAnsweringHelpfulnessResult)` Result for question answering helpfulness metric.
`questionAnsweringCorrectnessResult`	`object (QuestionAnsweringCorrectnessResult)` Result for question answering correctness metric.
`toolCallValidResults`	`object (ToolCallValidResults)` Tool call metrics. Results for tool call valid metric.
`toolNameMatchResults`	`object (ToolNameMatchResults)` Results for tool name match metric.
`toolParameterKeyMatchResults`	`object (ToolParameterKeyMatchResults)` Results for tool parameter key match metric.
`toolParameterKvMatchResults`	`object (ToolParameterKVMatchResults)` Results for tool parameter key value match metric.
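Since `evaluation_results` is also a union field, a client usually needs to find out which result field came back. The following sketch relies only on the field names listed above, all of which end in `Result` or `Results`; `extract_result` is a hypothetical helper, and the sample response deliberately leaves the inner payload empty because its structure is not described in this section.

```python
from typing import Tuple


def extract_result(response: dict) -> Tuple[str, dict]:
    """Return (field name, payload) for the single result field that is set."""
    present = [key for key in response
               if key.endswith(("Result", "Results"))]
    if len(present) != 1:
        raise ValueError(f"expected one result field, found {present}")
    return present[0], response[present[0]]


# Hypothetical response body; the inner payload is intentionally empty.
name, payload = extract_result({"fluencyResult": {}})
```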

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
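A rough sketch of attaching credentials to the call, using only the Python standard library: the request is built but not sent. It assumes you already hold an OAuth 2.0 access token minted with the `cloud-platform` scope (how you obtain it is out of scope here), and it passes the token as a standard bearer token. `build_request`, the example URL, and the token value are placeholders.

```python
import json
import urllib.request


def build_request(url: str, body: dict, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authorized POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # The token must carry the cloud-platform OAuth scope.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


request = build_request(
    "https://example.invalid/v1/projects/p/locations/l:evaluateInstances",
    {"fluencyInput": {"metricSpec": {}, "instance": {"prediction": "Hello."}}},
    "PLACEHOLDER_TOKEN",
)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```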

ExactMatchInput

Input for exact match metric.

JSON representation
{ "metricSpec": { object (`ExactMatchSpec`) }, "instances": [ { object (`ExactMatchInstance`) } ] }

Fields

metricSpec

object (ExactMatchSpec)

Required. Spec for exact match metric.

instances[]

object (ExactMatchInstance)

Required. Repeated exact match instances.

ExactMatchSpec

This type has no fields.

Spec for exact match metric. Returns 1 if the prediction and reference match exactly, otherwise 0.

ExactMatchInstance

Spec for exact match instance.

JSON representation
{ "prediction": string, "reference": string }

Fields

prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.
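To make the exact match definition concrete, here is a small sketch that builds an `exactMatchInput` payload and mirrors the stated scoring rule locally. The local `exact_match` function only restates the definition above (1 on an exact string match, otherwise 0); whether the service applies any normalization before comparing is not stated here, so treat it as an illustration rather than a reimplementation. Both function names are hypothetical.

```python
from typing import Iterable, Tuple


def exact_match(prediction: str, reference: str) -> int:
    """Local restatement of the rule: 1 on an exact match, otherwise 0."""
    return 1 if prediction == reference else 0


def exact_match_input(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Build a request body from (prediction, reference) pairs."""
    return {
        "exactMatchInput": {
            "metricSpec": {},  # ExactMatchSpec has no fields
            "instances": [
                {"prediction": p, "reference": r} for p, r in pairs
            ],
        }
    }


body = exact_match_input([("Paris", "Paris"), ("paris", "Paris")])
scores = [exact_match("Paris", "Paris"), exact_match("paris", "Paris")]
```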

BleuInput

Input for bleu metric.

JSON representation
{ "metricSpec": { object (`BleuSpec`) }, "instances": [ { object (`BleuInstance`) } ] }

Fields

metricSpec

object (BleuSpec)

Required. Spec for bleu score metric.

instances[]

object (BleuInstance)

Required. Repeated bleu instances.

BleuSpec

Spec for BLEU score metric. Calculates the precision of n-grams in the prediction compared to the reference and returns a score between 0 and 1.

JSON representation
{ "useEffectiveOrder": boolean }

Fields

useEffectiveOrder

boolean

Optional. Whether to use effective order when computing the BLEU score.

BleuInstance

Spec for bleu instance.

JSON representation
{ "prediction": string, "reference": string }

Fields

prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.
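The n-gram precision that the BLEU description refers to can be illustrated with a short sketch. It computes clipped n-gram precision for a single n over whitespace tokens, which is only one building block of a BLEU score; the tokenization and the function names are assumptions of this example, and the service's actual computation is not specified in this section.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(prediction: str, reference: str, n: int = 1) -> float:
    """Fraction of prediction n-grams that also occur in the reference.

    Counts are clipped, so a repeated n-gram is credited at most as many
    times as it appears in the reference.
    """
    pred = ngrams(prediction.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(pred.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in pred.items())
    return overlap / total


full = ngram_precision("the cat sat", "the cat sat on the mat")
clipped = ngram_precision("the the the", "the cat")
```

Here `full` is 1.0 because every unigram of the prediction appears in the reference, while `clipped` is one third because "the" occurs only once in the reference.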

RougeInput

Input for rouge metric.

JSON representation
{ "metricSpec": { object (`RougeSpec`) }, "instances": [ { object (`RougeInstance`) } ] }

Fields

metricSpec

object (RougeSpec)

Required. Spec for rouge score metric.

instances[]

object (RougeInstance)

Required. Repeated rouge instances.

RougeSpec

Spec for ROUGE score metric. Calculates the recall of n-grams in the prediction compared to the reference and returns a score between 0 and 1.

JSON representation
{ "rougeType": string, "useStemmer": boolean, "splitSummaries": boolean }

Fields

rougeType

string

Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.

useStemmer

boolean

Optional. Whether to use stemmer to compute rouge score.

splitSummaries

boolean

Optional. Whether to split summaries while using rougeLsum.

RougeInstance

Spec for rouge instance.

JSON representation
{ "prediction": string, "reference": string }

Fields

prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.
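A possible way to assemble a `rougeInput` body with all three spec options set explicitly is sketched below. The helper name and its defaults are choices made for this example; `rougeLsum` is used as the default only because it is one of the listed types.

```python
from typing import Iterable, Tuple


def rouge_input(pairs: Iterable[Tuple[str, str]],
                rouge_type: str = "rougeLsum",
                use_stemmer: bool = False,
                split_summaries: bool = False) -> dict:
    """Build a rougeInput request body from (prediction, reference) pairs."""
    return {
        "rougeInput": {
            "metricSpec": {
                "rougeType": rouge_type,
                "useStemmer": use_stemmer,
                "splitSummaries": split_summaries,
            },
            "instances": [
                {"prediction": p, "reference": r} for p, r in pairs
            ],
        }
    }


body = rouge_input(
    [("A cat sat on the mat.", "The cat sat on the mat.")],
    use_stemmer=True,
)
```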

FluencyInput

Input for fluency metric.

JSON representation
{ "metricSpec": { object (`FluencySpec`) }, "instance": { object (`FluencyInstance`) } }

Fields

metricSpec

object (FluencySpec)

Required. Spec for fluency score metric.

instance

object (FluencyInstance)

Required. Fluency instance.

FluencySpec

Spec for fluency score metric.

JSON representation
{ "version": integer }

Fields

version

integer

Optional. Which version to use for evaluation.

FluencyInstance

Spec for fluency instance.

JSON representation
{ "prediction": string }

Fields

prediction

string

Required. Output of the evaluated model.

CoherenceInput

Input for coherence metric.

JSON representation
{ "metricSpec": { object (`CoherenceSpec`) }, "instance": { object (`CoherenceInstance`) } }

Fields

metricSpec

object (CoherenceSpec)

Required. Spec for coherence score metric.

instance

object (CoherenceInstance)

Required. Coherence instance.

CoherenceSpec

Spec for coherence score metric.

JSON representation
{ "version": integer }

Fields

version

integer

Optional. Which version to use for evaluation.

CoherenceInstance

Spec for coherence instance.

JSON representation
{ "prediction": string }

Fields

prediction

string

Required. Output of the evaluated model.

SafetyInput

Input for safety metric.

JSON representation
{ "metricSpec": { object (`SafetySpec`) }, "instance": { object (`SafetyInstance`) } }

Fields

metricSpec

object (SafetySpec)

Required. Spec for safety metric.

instance

object (SafetyInstance)

Required. Safety instance.

SafetySpec

Spec for safety metric.

JSON representation
{ "version": integer }

Fields

version

integer

Optional. Which version to use for evaluation.

SafetyInstance

Spec for safety instance.

JSON representation
{ "prediction": string }

Fields

prediction

string

Required. Output of the evaluated model.
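The fluency, coherence, and safety inputs share one shape: a spec with an optional `version` and a single `instance` (not a repeated `instances[]` list) that holds only `prediction`. A minimal sketch of a shared builder follows; `pointwise_text_input` and `POINTWISE_TEXT_METRICS` are hypothetical names introduced for this example.

```python
from typing import Optional

POINTWISE_TEXT_METRICS = ("fluency", "coherence", "safety")


def pointwise_text_input(metric: str, prediction: str,
                         version: Optional[int] = None) -> dict:
    """Build a fluencyInput, coherenceInput, or safetyInput request body."""
    if metric not in POINTWISE_TEXT_METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    # version is optional, so it is left out unless the caller sets it.
    spec = {} if version is None else {"version": version}
    return {
        f"{metric}Input": {
            "metricSpec": spec,
            "instance": {"prediction": prediction},  # a single object
        }
    }


body = pointwise_text_input("coherence", "First we mix, then we bake.")
```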

GroundednessInput

Input for groundedness metric.

JSON representation
{ "metricSpec": { object (`GroundednessSpec`) }, "instance": { object (`GroundednessInstance`) } }

Fields

metricSpec

object (GroundednessSpec)

Required. Spec for groundedness metric.

instance

object (GroundednessInstance)

Required. Groundedness instance.

GroundednessSpec

Spec for groundedness metric.

JSON representation
{ "version": integer }

Fields

version

integer

Optional. Which version to use for evaluation.

GroundednessInstance

Spec for groundedness instance.

JSON representation
{ "prediction": string, "context": string }

Fields

prediction

string

Required. Output of the evaluated model.

context

string

Required. Background information, provided as context, against which the prediction is compared.

FulfillmentInput

Input for fulfillment metric.

JSON representation
{ "metricSpec": { object (`FulfillmentSpec`) }, "instance": { object (`FulfillmentInstance`) } }

Fields

metricSpec

object (FulfillmentSpec)

Required. Spec for fulfillment score metric.

instance

object (FulfillmentInstance)

Required. Fulfillment instance.

FulfillmentSpec

Spec for fulfillment metric.

JSON representation
{ "version": integer }

Fields

version

integer

Optional. Which version to use for evaluation.

FulfillmentInstance

Spec for fulfillment instance.

JSON representation
{ "prediction": string, "instruction": string }

Fields

prediction

string

Required. Output of the evaluated model.

instruction

string

Required. The instruction prompt used at inference, against which the prediction is compared.
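Groundedness and fulfillment each pair the prediction with one extra required field: `context` for groundedness and `instruction` for fulfillment. The sketch below shows both side by side so the difference is easy to see; the function names and sample strings are illustrative only.

```python
def groundedness_input(prediction: str, context: str) -> dict:
    """Body for checking a prediction against background context."""
    return {
        "groundednessInput": {
            "metricSpec": {},
            "instance": {"prediction": prediction, "context": context},
        }
    }


def fulfillment_input(prediction: str, instruction: str) -> dict:
    """Body for checking a prediction against the instruction it answers."""
    return {
        "fulfillmentInput": {
            "metricSpec": {},
            "instance": {"prediction": prediction, "instruction": instruction},
        }
    }


grounded = groundedness_input(
    "The bridge opened in 1937.",
    "The Golden Gate Bridge opened to traffic in 1937.",
)
fulfilled = fulfillment_input(
    "Dear team, the meeting moves to Friday.",
    "Write a one-sentence email announcing the new meeting day.",
)
```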

SummarizationQualityInput

Input for summarization quality metric.

JSON representation
{ "metricSpec": { object (`SummarizationQualitySpec`) }, "instance": { object (`SummarizationQualityInstance`) } }

Fields

metricSpec

object (SummarizationQualitySpec)

Required. Spec for summarization quality score metric.

instance

object (SummarizationQualityInstance)

Required. Summarization quality instance.

SummarizationQualitySpec

Spec for summarization quality score metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute summarization quality.

version

integer

Optional. Which version to use for evaluation.

SummarizationQualityInstance

Spec for summarization quality instance.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the evaluated model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Required. Text to be summarized.
`instruction`	`string` Required. Summarization prompt for LLM.
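The interaction between `useReference` in the spec and the optional `reference` in the instance can be sketched as follows. Rejecting `use_reference=True` without a reference is a defensive client-side check assumed for this example, not documented service behavior, and the helper name is hypothetical.

```python
from typing import Optional


def summarization_quality_input(prediction: str, context: str,
                                instruction: str,
                                reference: Optional[str] = None,
                                use_reference: bool = False) -> dict:
    """Build a summarizationQualityInput request body."""
    if use_reference and reference is None:
        # Assumption: a reference is needed whenever useReference is true.
        raise ValueError("use_reference=True requires a reference")
    instance = {
        "prediction": prediction,
        "context": context,          # text to be summarized
        "instruction": instruction,  # summarization prompt
    }
    if reference is not None:
        instance["reference"] = reference
    return {
        "summarizationQualityInput": {
            "metricSpec": {"useReference": use_reference},
            "instance": instance,
        }
    }


body = summarization_quality_input(
    prediction="Sales rose 10% in Q2.",
    context="In the second quarter, sales increased by ten percent...",
    instruction="Summarize the report in one sentence.",
)
```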

PairwiseSummarizationQualityInput

Input for pairwise summarization quality metric.

JSON representation
{ "metricSpec": { object (`PairwiseSummarizationQualitySpec`) }, "instance": { object (`PairwiseSummarizationQualityInstance`) } }

Fields

metricSpec

object (PairwiseSummarizationQualitySpec)

Required. Spec for pairwise summarization quality score metric.

instance

object (PairwiseSummarizationQualityInstance)

Required. Pairwise summarization quality instance.

PairwiseSummarizationQualitySpec

Spec for pairwise summarization quality score metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute pairwise summarization quality.

version

integer

Optional. Which version to use for evaluation.

PairwiseSummarizationQualityInstance

Spec for pairwise summarization quality instance.

JSON representation
{ "prediction": string, "baselinePrediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the candidate model.
`baselinePrediction`	`string` Required. Output of the baseline model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Required. Text to be summarized.
`instruction`	`string` Required. Summarization prompt for LLM.
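For the pairwise variant, the point to get right is which output goes where: the candidate model's output is `prediction` and the baseline model's output is `baselinePrediction`. A small sketch under that reading, with a hypothetical helper name and made-up sample text:

```python
from typing import Optional


def pairwise_summarization_input(candidate: str, baseline: str,
                                 context: str, instruction: str,
                                 reference: Optional[str] = None) -> dict:
    """Build a pairwiseSummarizationQualityInput request body."""
    instance = {
        "prediction": candidate,         # output of the candidate model
        "baselinePrediction": baseline,  # output of the baseline model
        "context": context,
        "instruction": instruction,
    }
    if reference is not None:
        instance["reference"] = reference
    return {
        "pairwiseSummarizationQualityInput": {
            "metricSpec": {},
            "instance": instance,
        }
    }


body = pairwise_summarization_input(
    candidate="Sales rose 10% in Q2.",
    baseline="The report talks about sales.",
    context="In the second quarter, sales increased by ten percent...",
    instruction="Summarize the report in one sentence.",
)
```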

SummarizationHelpfulnessInput

Input for summarization helpfulness metric.

JSON representation
{ "metricSpec": { object (`SummarizationHelpfulnessSpec`) }, "instance": { object (`SummarizationHelpfulnessInstance`) } }

Fields

metricSpec

object (SummarizationHelpfulnessSpec)

Required. Spec for summarization helpfulness score metric.

instance

object (SummarizationHelpfulnessInstance)

Required. Summarization helpfulness instance.

SummarizationHelpfulnessSpec

Spec for summarization helpfulness score metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute summarization helpfulness.

version

integer

Optional. Which version to use for evaluation.

SummarizationHelpfulnessInstance

Spec for summarization helpfulness instance.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the evaluated model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Required. Text to be summarized.
`instruction`	`string` Optional. Summarization prompt for LLM.

SummarizationVerbosityInput

Input for summarization verbosity metric.

JSON representation
{ "metricSpec": { object (`SummarizationVerbositySpec`) }, "instance": { object (`SummarizationVerbosityInstance`) } }

Fields

metricSpec

object (SummarizationVerbositySpec)

Required. Spec for summarization verbosity score metric.

instance

object (SummarizationVerbosityInstance)

Required. Summarization verbosity instance.

SummarizationVerbositySpec

Spec for summarization verbosity score metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute summarization verbosity.

version

integer

Optional. Which version to use for evaluation.

SummarizationVerbosityInstance

Spec for summarization verbosity instance.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the evaluated model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Required. Text to be summarized.
`instruction`	`string` Optional. Summarization prompt for LLM.

QuestionAnsweringQualityInput

Input for question answering quality metric.

JSON representation
{ "metricSpec": { object (`QuestionAnsweringQualitySpec`) }, "instance": { object (`QuestionAnsweringQualityInstance`) } }

Fields

metricSpec

object (QuestionAnsweringQualitySpec)

Required. Spec for question answering quality score metric.

instance

object (QuestionAnsweringQualityInstance)

Required. Question answering quality instance.

QuestionAnsweringQualitySpec

Spec for question answering quality score metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute question answering quality.

version

integer

Optional. Which version to use for evaluation.

QuestionAnsweringQualityInstance

Spec for question answering quality instance.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the evaluated model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Required. Text to answer the question.
`instruction`	`string` Required. Question Answering prompt for LLM.

PairwiseQuestionAnsweringQualityInput

Input for pairwise question answering quality metric.

JSON representation
{ "metricSpec": { object (`PairwiseQuestionAnsweringQualitySpec`) }, "instance": { object (`PairwiseQuestionAnsweringQualityInstance`) } }

Fields

metricSpec

object (PairwiseQuestionAnsweringQualitySpec)

Required. Spec for pairwise question answering quality score metric.

instance

object (PairwiseQuestionAnsweringQualityInstance)

Required. Pairwise question answering quality instance.

PairwiseQuestionAnsweringQualitySpec

Spec for pairwise question answering quality score metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute pairwise question answering quality.

version

integer

Optional. Which version to use for evaluation.

PairwiseQuestionAnsweringQualityInstance

Spec for pairwise question answering quality instance.

JSON representation
{ "prediction": string, "baselinePrediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the candidate model.
`baselinePrediction`	`string` Required. Output of the baseline model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Required. Text to answer the question.
`instruction`	`string` Required. Question Answering prompt for LLM.

QuestionAnsweringRelevanceInput

Input for question answering relevance metric.

JSON representation
{ "metricSpec": { object (`QuestionAnsweringRelevanceSpec`) }, "instance": { object (`QuestionAnsweringRelevanceInstance`) } }

Fields

metricSpec

object (QuestionAnsweringRelevanceSpec)

Required. Spec for question answering relevance score metric.

instance

object (QuestionAnsweringRelevanceInstance)

Required. Question answering relevance instance.

QuestionAnsweringRelevanceSpec

Spec for question answering relevance metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute question answering relevance.

version

integer

Optional. Which version to use for evaluation.

QuestionAnsweringRelevanceInstance

Spec for question answering relevance instance.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the evaluated model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Optional. Text provided as context to answer the question.
`instruction`	`string` Required. The question asked and other instruction in the inference prompt.
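One way to model an instance with optional fields is a small dataclass that drops unset values when serialized. The sketch below does this for the question answering relevance instance, where `reference` and `context` are optional; the Python class and its `to_json` method are illustrative and not part of any client library.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class QuestionAnsweringRelevanceInstance:
    prediction: str                  # required
    instruction: str                 # required: the question and other instructions
    reference: Optional[str] = None  # optional ground truth
    context: Optional[str] = None    # optional supporting text

    def to_json(self) -> dict:
        """Serialize, leaving out optional fields that were not set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


instance = QuestionAnsweringRelevanceInstance(
    prediction="Canberra.",
    instruction="What is the capital of Australia?",
)
body = {
    "questionAnsweringRelevanceInput": {
        "metricSpec": {},
        "instance": instance.to_json(),
    }
}
```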

QuestionAnsweringHelpfulnessInput

Input for question answering helpfulness metric.

JSON representation
{ "metricSpec": { object (`QuestionAnsweringHelpfulnessSpec`) }, "instance": { object (`QuestionAnsweringHelpfulnessInstance`) } }

Fields

metricSpec

object (QuestionAnsweringHelpfulnessSpec)

Required. Spec for question answering helpfulness score metric.

instance

object (QuestionAnsweringHelpfulnessInstance)

Required. Question answering helpfulness instance.

QuestionAnsweringHelpfulnessSpec

Spec for question answering helpfulness metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute question answering helpfulness.

version

integer

Optional. Which version to use for evaluation.

QuestionAnsweringHelpfulnessInstance

Spec for question answering helpfulness instance.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the evaluated model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Optional. Text provided as context to answer the question.
`instruction`	`string` Required. The question asked and other instruction in the inference prompt.

QuestionAnsweringCorrectnessInput

Input for question answering correctness metric.

JSON representation
{ "metricSpec": { object (`QuestionAnsweringCorrectnessSpec`) }, "instance": { object (`QuestionAnsweringCorrectnessInstance`) } }

Fields

metricSpec

object (QuestionAnsweringCorrectnessSpec)

Required. Spec for question answering correctness score metric.

instance

object (QuestionAnsweringCorrectnessInstance)

Required. Question answering correctness instance.

QuestionAnsweringCorrectnessSpec

Spec for question answering correctness metric.

JSON representation
{ "useReference": boolean, "version": integer }

Fields

useReference

boolean

Optional. Whether to use instance.reference to compute question answering correctness.

version

integer

Optional. Which version to use for evaluation.

QuestionAnsweringCorrectnessInstance

Spec for question answering correctness instance.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

Fields
`prediction`	`string` Required. Output of the evaluated model.
`reference`	`string` Optional. Ground truth used to compare against the prediction.
`context`	`string` Optional. Text provided as context to answer the question.
`instruction`	`string` Required. The question asked and other instruction in the inference prompt.
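Because `prediction` and `instruction` are required while `reference` and `context` are optional, a client may want to check a payload before sending it. The following is a minimal sketch of such a check. The function name `check_qa_correctness_input` is hypothetical, and the rule that `reference` should be present when `useReference` is true is an assumption inferred from the field description, not documented server behavior.

```python
REQUIRED_INSTANCE_FIELDS = ("prediction", "instruction")


def check_qa_correctness_input(payload):
    """Return a list of problems found in a QuestionAnsweringCorrectnessInput dict."""
    problems = []
    if "metricSpec" not in payload:
        problems.append("metricSpec is required")
    spec = payload.get("metricSpec", {})
    instance = payload.get("instance", {})
    for name in REQUIRED_INSTANCE_FIELDS:
        if not instance.get(name):
            problems.append(f"instance.{name} is required")
    # Assumption: a reference should be supplied when useReference is true.
    if spec.get("useReference") and not instance.get("reference"):
        problems.append("useReference is true but instance.reference is missing")
    return problems


# Made-up sample payloads, for illustration only.
complete = {
    "metricSpec": {"useReference": True},
    "instance": {
        "prediction": "4",
        "reference": "4",
        "instruction": "What is 2 + 2?",
    },
}
incomplete = {"metricSpec": {"useReference": True}, "instance": {"prediction": "4"}}

print(check_qa_correctness_input(complete))
print(check_qa_correctness_input(incomplete))
```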

ToolCallValidInput

Input for tool call valid metric.

JSON representation
{ "metricSpec": { object (`ToolCallValidSpec`) }, "instances": [ { object (`ToolCallValidInstance`) } ] }

Fields

metricSpec

object (ToolCallValidSpec)

Required. Spec for tool call valid metric.

instances[]

object (ToolCallValidInstance)

Required. Repeated tool call valid instances.

ToolCallValidSpec

Spec for tool call valid metric.

This type has no fields.

ToolCallValidInstance

Instance for tool call valid metric.

JSON representation
{ "prediction": string, "reference": string }

Fields

prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.
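Unlike the question answering inputs, the tool metrics take a repeated `instances` list. A hedged sketch of a `toolCallValidInput` body follows. This reference does not specify the internal format of the `prediction` and `reference` strings, so the JSON-serialized tool call used here, including the `name` and `arguments` keys and the `get_weather` tool, is purely an assumption for illustration.

```python
import json

# Hypothetical tool calls; the serialized format is an assumption, not part
# of this reference.
predicted_call = {"name": "get_weather", "arguments": {"city": "Paris"}}
expected_call = {"name": "get_weather", "arguments": {"city": "Paris"}}

tool_call_valid_input = {
    "metricSpec": {},  # ToolCallValidSpec has no fields
    "instances": [
        {
            # Both fields are strings, so the structured call is serialized.
            "prediction": json.dumps(predicted_call),
            "reference": json.dumps(expected_call),
        }
    ],
}

request_body = {"toolCallValidInput": tool_call_valid_input}
print(json.dumps(request_body, indent=2))
```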

ToolNameMatchInput

Input for tool name match metric.

JSON representation
{ "metricSpec": { object (`ToolNameMatchSpec`) }, "instances": [ { object (`ToolNameMatchInstance`) } ] }

Fields

metricSpec

object (ToolNameMatchSpec)

Required. Spec for tool name match metric.

instances[]

object (ToolNameMatchInstance)

Required. Repeated tool name match instances.

ToolNameMatchSpec

Spec for tool name match metric.

This type has no fields.

ToolNameMatchInstance

Instance for tool name match metric.

JSON representation
{ "prediction": string, "reference": string }

Fields

prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.

ToolParameterKeyMatchInput

Input for tool parameter key match metric.

JSON representation
{ "metricSpec": { object (`ToolParameterKeyMatchSpec`) }, "instances": [ { object (`ToolParameterKeyMatchInstance`) } ] }

Fields

metricSpec

object (ToolParameterKeyMatchSpec)

Required. Spec for tool parameter key match metric.

instances[]

object (ToolParameterKeyMatchInstance)

Required. Repeated tool parameter key match instances.

ToolParameterKeyMatchSpec

Spec for tool parameter key match metric.

This type has no fields.

ToolParameterKeyMatchInstance

Instance for tool parameter key match metric.

JSON representation
{ "prediction": string, "reference": string }

Fields

prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.

ToolParameterKVMatchInput

Input for tool parameter key value match metric.

JSON representation
{ "metricSpec": { object (`ToolParameterKVMatchSpec`) }, "instances": [ { object (`ToolParameterKVMatchInstance`) } ] }

Fields

metricSpec

object (ToolParameterKVMatchSpec)

Required. Spec for tool parameter key value match metric.

instances[]

object (ToolParameterKVMatchInstance)

Required. Repeated tool parameter key value match instances.

ToolParameterKVMatchSpec

Spec for tool parameter key value match metric.

JSON representation
{ "useStrictStringMatch": boolean }

Fields

useStrictStringMatch

boolean

Optional. Whether to use strict string match on parameter values.

ToolParameterKVMatchInstance

Instance for tool parameter key value match metric.

JSON representation
{ "prediction": string, "reference": string }

Fields

prediction

string

Required. Output of the evaluated model.

reference

string

Required. Ground truth used to compare against the prediction.
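`ToolParameterKVMatchSpec` is the only tool metric spec here that carries a field. One possible way to build the input from a list of prediction/reference pairs is sketched below. The helper name `build_tool_kv_match_input` and the sample strings are hypothetical, and the content of the strings is an assumption, since their format is not specified in this reference.

```python
def build_tool_kv_match_input(pairs, strict=False):
    """Build a ToolParameterKVMatchInput dict from (prediction, reference) pairs.

    Hypothetical helper; `strict` maps to metricSpec.useStrictStringMatch.
    """
    return {
        "metricSpec": {"useStrictStringMatch": strict},
        "instances": [
            {"prediction": prediction, "reference": reference}
            for prediction, reference in pairs
        ],
    }


# Made-up sample strings, for illustration only.
pairs = [
    ('{"city": "Paris"}', '{"city": "Paris"}'),
    ('{"city": "paris"}', '{"city": "Paris"}'),
]
kv_input = build_tool_kv_match_input(pairs, strict=True)
request_body = {"toolParameterKvMatchInput": kv_input}
```

Note that the union field in the request body is spelled `toolParameterKvMatchInput`, with a lowercase `v`, while the type name is `ToolParameterKVMatchInput`.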

ExactMatchResults

Results for exact match metric.

JSON representation
{ "exactMatchMetricValues": [ { object (`ExactMatchMetricValue`) } ] }

Fields

exactMatchMetricValues[]

object (ExactMatchMetricValue)

Output only. Exact match metric values.

ExactMatchMetricValue

Exact match metric value for an instance.

JSON representation
{ "score": number }

Fields

score

number

Output only. Exact match score.
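Since `ExactMatchResults` holds one `ExactMatchMetricValue` per instance, a caller will often want an aggregate. The sketch below averages the scores of a made-up results object. The function name `mean_exact_match` is hypothetical, and treating a missing `score` as 0.0 is an assumption about how default values may be serialized, not something this reference states.

```python
def mean_exact_match(results):
    """Average the per-instance scores in an ExactMatchResults dict.

    Returns None when there are no metric values.
    """
    values = results.get("exactMatchMetricValues", [])
    if not values:
        return None
    # Assumption: an entry without "score" is counted as 0.0.
    scores = [value.get("score", 0.0) for value in values]
    return sum(scores) / len(scores)


# Made-up sample response fragment, for illustration only.
sample_results = {
    "exactMatchMetricValues": [{"score": 1.0}, {"score": 0.0}, {"score": 1.0}]
}
average = mean_exact_match(sample_results)
print(average)
```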

BleuResults

Results for BLEU metric.

JSON representation
{ "bleuMetricValues": [ { object (`BleuMetricValue`) } ] }

Fields

bleuMetricValues[]

object (BleuMetricValue)

Output only. BLEU metric values.

BleuMetricValue

BLEU metric value for an instance.

JSON representation
{ "score": number }

Fields

score

number

Output only. BLEU score.

RougeResults

Results for ROUGE metric.

JSON representation
{ "rougeMetricValues": [ { object (`RougeMetricValue`) } ] }

Fields

rougeMetricValues[]

object (RougeMetricValue)

Output only. ROUGE metric values.

RougeMetricValue

ROUGE metric value for an instance.

JSON representation
{ "score": number }

Fields

score

number

Output only. ROUGE score.

FluencyResult

Result for fluency metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for fluency score.

score

number

Output only. Fluency score.

confidence

number

Output only. Confidence for fluency score.
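`FluencyResult` and the other result types in this reference that consist of `explanation`, `score`, and `confidence` share one shape, so a single container can hold any of them on the client side. A minimal sketch, assuming the result has already been decoded into a Python dict; the class name `PointwiseResult` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PointwiseResult:
    """Client-side view of a result with explanation, score, and confidence."""

    explanation: str
    score: Optional[float]
    confidence: Optional[float]

    @classmethod
    def from_json(cls, data):
        # Fields absent from the decoded JSON are left empty rather than guessed.
        return cls(
            explanation=data.get("explanation", ""),
            score=data.get("score"),
            confidence=data.get("confidence"),
        )


# Made-up sample values, for illustration only.
sample = {"explanation": "Reads naturally.", "score": 4.0, "confidence": 0.9}
result = PointwiseResult.from_json(sample)
print(result)
```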

CoherenceResult

Result for coherence metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for coherence score.

score

number

Output only. Coherence score.

confidence

number

Output only. Confidence for coherence score.

SafetyResult

Result for safety metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for safety score.

score

number

Output only. Safety score.

confidence

number

Output only. Confidence for safety score.

GroundednessResult

Result for groundedness metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for groundedness score.

score

number

Output only. Groundedness score.

confidence

number

Output only. Confidence for groundedness score.

FulfillmentResult

Result for fulfillment metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for fulfillment score.

score

number

Output only. Fulfillment score.

confidence

number

Output only. Confidence for fulfillment score.

SummarizationQualityResult

Result for summarization quality metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for summarization quality score.

score

number

Output only. Summarization Quality score.

confidence

number

Output only. Confidence for summarization quality score.

PairwiseSummarizationQualityResult

Result for pairwise summarization quality metric.

JSON representation
{ "pairwiseChoice": enum (`PairwiseChoice`), "explanation": string, "confidence": number }

Fields

pairwiseChoice

enum (PairwiseChoice)

Output only. Pairwise summarization prediction choice.

explanation

string

Output only. Explanation for summarization quality score.

confidence

number

Output only. Confidence for summarization quality score.

PairwiseChoice

Pairwise prediction autorater preference.

Enums
`PAIRWISE_CHOICE_UNSPECIFIED`	Unspecified prediction choice.
`BASELINE`	Baseline prediction wins.
`CANDIDATE`	Candidate prediction wins.
`TIE`	Winner cannot be determined.
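When many pairwise results are collected, the `pairwiseChoice` values can be tallied to compare baseline and candidate. The sketch below mirrors the enum values above in a Python `Enum`. The function name `tally_choices`, the sample results, and the decision to count a missing `pairwiseChoice` as `PAIRWISE_CHOICE_UNSPECIFIED` are assumptions for illustration.

```python
from collections import Counter
from enum import Enum


class PairwiseChoice(Enum):
    """Client-side mirror of the PairwiseChoice enum values."""

    PAIRWISE_CHOICE_UNSPECIFIED = "PAIRWISE_CHOICE_UNSPECIFIED"
    BASELINE = "BASELINE"
    CANDIDATE = "CANDIDATE"
    TIE = "TIE"


def tally_choices(results):
    """Count how often each PairwiseChoice appears in a list of result dicts."""
    counts = Counter()
    for result in results:
        # Assumption: a missing field is counted as the unspecified value.
        name = result.get("pairwiseChoice", "PAIRWISE_CHOICE_UNSPECIFIED")
        counts[PairwiseChoice(name)] += 1
    return counts


# Made-up sample results, for illustration only.
sample_results = [
    {"pairwiseChoice": "CANDIDATE", "explanation": "More complete."},
    {"pairwiseChoice": "BASELINE", "explanation": "More concise."},
    {"pairwiseChoice": "CANDIDATE", "explanation": "More accurate."},
    {"pairwiseChoice": "TIE", "explanation": "Equivalent."},
]
counts = tally_choices(sample_results)
print(counts[PairwiseChoice.CANDIDATE], counts[PairwiseChoice.BASELINE])
```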

SummarizationHelpfulnessResult

Result for summarization helpfulness metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for summarization helpfulness score.

score

number

Output only. Summarization Helpfulness score.

confidence

number

Output only. Confidence for summarization helpfulness score.

SummarizationVerbosityResult

Result for summarization verbosity metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for summarization verbosity score.

score

number

Output only. Summarization Verbosity score.

confidence

number

Output only. Confidence for summarization verbosity score.

QuestionAnsweringQualityResult

Result for question answering quality metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for question answering quality score.

score

number

Output only. Question Answering Quality score.

confidence

number

Output only. Confidence for question answering quality score.

PairwiseQuestionAnsweringQualityResult

Result for pairwise question answering quality metric.

JSON representation
{ "pairwiseChoice": enum (`PairwiseChoice`), "explanation": string, "confidence": number }

Fields

pairwiseChoice

enum (PairwiseChoice)

Output only. Pairwise question answering prediction choice.

explanation

string

Output only. Explanation for question answering quality score.

confidence

number

Output only. Confidence for question answering quality score.

QuestionAnsweringRelevanceResult

Result for question answering relevance metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for question answering relevance score.

score

number

Output only. Question Answering Relevance score.

confidence

number

Output only. Confidence for question answering relevance score.

QuestionAnsweringHelpfulnessResult

Result for question answering helpfulness metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for question answering helpfulness score.

score

number

Output only. Question Answering Helpfulness score.

confidence

number

Output only. Confidence for question answering helpfulness score.

QuestionAnsweringCorrectnessResult

Result for question answering correctness metric.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

Fields

explanation

string

Output only. Explanation for question answering correctness score.

score

number

Output only. Question Answering Correctness score.

confidence

number

Output only. Confidence for question answering correctness score.

ToolCallValidResults

Results for tool call valid metric.

JSON representation
{ "toolCallValidMetricValues": [ { object (`ToolCallValidMetricValue`) } ] }

Fields

toolCallValidMetricValues[]

object (ToolCallValidMetricValue)

Output only. Tool call valid metric values.

ToolCallValidMetricValue

Tool call valid metric value for an instance.

JSON representation
{ "score": number }

Fields

score

number

Output only. Tool call valid score.

ToolNameMatchResults

Results for tool name match metric.

JSON representation
{ "toolNameMatchMetricValues": [ { object (`ToolNameMatchMetricValue`) } ] }

Fields

toolNameMatchMetricValues[]

object (ToolNameMatchMetricValue)

Output only. Tool name match metric values.

ToolNameMatchMetricValue

Tool name match metric value for an instance.

JSON representation
{ "score": number }

Fields

score

number

Output only. Tool name match score.

ToolParameterKeyMatchResults

Results for tool parameter key match metric.

JSON representation
{ "toolParameterKeyMatchMetricValues": [ { object (`ToolParameterKeyMatchMetricValue`) } ] }

Fields

toolParameterKeyMatchMetricValues[]

object (ToolParameterKeyMatchMetricValue)

Output only. Tool parameter key match metric values.

ToolParameterKeyMatchMetricValue

Tool parameter key match metric value for an instance.

JSON representation
{ "score": number }

Fields

score

number

Output only. Tool parameter key match score.

ToolParameterKVMatchResults

Results for tool parameter key value match metric.

JSON representation
{ "toolParameterKvMatchMetricValues": [ { object (`ToolParameterKVMatchMetricValue`) } ] }

Fields

toolParameterKvMatchMetricValues[]

object (ToolParameterKVMatchMetricValue)

Output only. Tool parameter key value match metric values.

ToolParameterKVMatchMetricValue

Tool parameter key value match metric value for an instance.

JSON representation
{ "score": number }

Fields

score

number

Output only. Tool parameter key value match score.
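The four tool result types differ only in the name of their metric-values list, so one routine can summarize all of them. The following sketch computes a per-metric mean from a list of decoded results objects. The function name `summarize_tool_results`, the labels, and the sample data are hypothetical, and counting a missing `score` as 0.0 is an assumption rather than documented behavior.

```python
# Metric-values field names from this reference, mapped to display labels.
TOOL_VALUE_KEYS = {
    "toolCallValidMetricValues": "tool call valid",
    "toolNameMatchMetricValues": "tool name match",
    "toolParameterKeyMatchMetricValues": "tool parameter key match",
    "toolParameterKvMatchMetricValues": "tool parameter key value match",
}


def summarize_tool_results(results_objects):
    """Return {label: mean score} for each tool results dict that is present."""
    summary = {}
    for results in results_objects:
        for key, label in TOOL_VALUE_KEYS.items():
            if key not in results:
                continue
            # Assumption: an entry without "score" is counted as 0.0.
            scores = [value.get("score", 0.0) for value in results[key]]
            summary[label] = sum(scores) / len(scores) if scores else None
    return summary


# Made-up sample results, for illustration only.
samples = [
    {"toolCallValidMetricValues": [{"score": 1.0}, {"score": 1.0}]},
    {"toolNameMatchMetricValues": [{"score": 1.0}, {"score": 0.0}]},
]
summary = summarize_tool_results(samples)
print(summary)
```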