REST Resource: projects.locations.dataLabelingJobs

Resource: DataLabelingJob

DataLabelingJob is used to trigger a human labeling job on unlabeled data from the Dataset named in its datasets field.

Fields
name string

Output only. Resource name of the DataLabelingJob.

displayName string

Required. The user-defined display name of the DataLabelingJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.

datasets[] string

Required. Dataset resource names. Currently, labeling is supported from a single Dataset only. Format: projects/{project}/locations/{location}/datasets/{dataset}
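To illustrate the resource name format above, the following Python sketch builds and checks a Dataset resource name. The helper names and the sample IDs are hypothetical and are not part of the API.

```python
import re

# Pattern derived from the documented format:
# projects/{project}/locations/{location}/datasets/{dataset}
_DATASET_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/datasets/(?P<dataset>[^/]+)$"
)


def dataset_name(project: str, location: str, dataset: str) -> str:
    """Build a Dataset resource name from its three components."""
    return f"projects/{project}/locations/{location}/datasets/{dataset}"


def parse_dataset_name(name: str) -> dict:
    """Split a Dataset resource name into its components, or raise ValueError."""
    match = _DATASET_NAME.match(name)
    if match is None:
        raise ValueError(f"not a Dataset resource name: {name!r}")
    return match.groupdict()


def check_datasets(datasets: list) -> None:
    """Check the datasets[] field: exactly one well-formed resource name."""
    if len(datasets) != 1:
        raise ValueError("exactly one Dataset is supported per job")
    parse_dataset_name(datasets[0])


# Hypothetical IDs, for illustration only.
example = dataset_name("my-project", "us-central1", "1234567890")
check_datasets([example])
```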

annotationLabels map (key: string, value: string)

Labels to assign to annotations generated by this DataLabelingJob.

Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.
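The label rules above can be checked on the client before a request is sent. The following Python sketch is one possible reading of those rules, not the service's own validation: the function names are hypothetical, and treating a caller-supplied key with the reserved prefix as a problem is an assumption.

```python
RESERVED_PREFIX = "aiplatform.googleapis.com/"
MAX_LENGTH = 64  # Unicode code points; len() on a Python str counts code points


def _is_allowed(ch: str) -> bool:
    """Lowercase or caseless letters, digits, underscores and dashes."""
    if ch in "_-" or ch.isdigit():
        return True
    # International letters are allowed as long as they are not uppercase.
    return ch.isalpha() and ch == ch.lower()


def label_problems(labels: dict) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for key, value in labels.items():
        if key.startswith(RESERVED_PREFIX):
            # Assumption: system reserved keys should not be set by the caller.
            problems.append(f"{key!r}: system reserved key")
            continue
        for kind, text in (("key", key), ("value", value)):
            if len(text) > MAX_LENGTH:
                problems.append(f"{kind} {text!r}: longer than {MAX_LENGTH} characters")
            if not all(_is_allowed(ch) for ch in text):
                problems.append(f"{kind} {text!r}: contains a disallowed character")
    return problems
```

For example, `label_problems({"team": "vision_1"})` returns an empty list, while a key such as `"Team"` is reported because it contains an uppercase letter.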

labelerCount integer

Required. Number of labelers to work on each DataItem.

instructionUri string

Required. The Google Cloud Storage location of the instruction PDF. This PDF is shared with labelers and provides a detailed description of how to label DataItems in Datasets.

inputsSchemaUri string

Required. Points to a YAML file stored on Google Cloud Storage describing the config for a specific type of DataLabelingJob. The schema files that can be used here are found in the https://storage.googleapis.com/google-cloud-aiplatform bucket in the /schema/datalabelingjob/inputs/ folder.

inputs value (Value format)

Required. Input config parameters for the DataLabelingJob.

state enum (JobState)

Output only. The detailed state of the job.

labelingProgress integer

Output only. Current labeling job progress, expressed as a percentage in the interval [0, 100], indicating the percentage of DataItems that have been finished.

currentSpend object (Money)

Output only. Estimated cost (in US dollars) that the DataLabelingJob has incurred to date.

createTime string (Timestamp format)

Output only. Timestamp when this DataLabelingJob was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

updateTime string (Timestamp format)

Output only. Timestamp when this DataLabelingJob was most recently updated.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
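The timestamp format described above carries up to nine fractional digits, while Python's datetime stores only microseconds. The following sketch parses a createTime or updateTime value into a UTC datetime (truncated to microseconds) and also returns the full nanosecond part. The function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# RFC 3339 UTC "Zulu" timestamp with zero to nine fractional digits.
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$"
)


def parse_timestamp(value: str):
    """Return (datetime in UTC truncated to microseconds, nanoseconds)."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"not a Zulu timestamp: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    base = base.replace(tzinfo=timezone.utc)
    # Right-pad the fraction to nine digits so it can be read as nanoseconds.
    nanos = int((match.group(2) or "").ljust(9, "0"))
    return base.replace(microsecond=nanos // 1000), nanos


moment, nanos = parse_timestamp("2014-10-02T15:01:23.045123456Z")
# moment.microsecond is 45123 and nanos is 45123456
```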

error object (Status)

Output only. DataLabelingJob errors. It is populated only when the job's state is JOB_STATE_FAILED or JOB_STATE_CANCELLED.
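Because the error field is populated only in those two states, client code can guard on the state before reading it. The sketch below works on a job already decoded from JSON into a dict. The function name is hypothetical, and the code and message keys assume that Status follows the common google.rpc.Status shape, which this page does not spell out.

```python
ERROR_STATES = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def job_error_summary(job: dict):
    """Return a short error summary, or None if the job carries no error."""
    if job.get("state") not in ERROR_STATES:
        # In any other state the error field is not populated.
        return None
    error = job.get("error")
    if not error:
        return None
    # Assumption: Status exposes "code" and "message" keys.
    return f"{error.get('code')}: {error.get('message')}"


# Hypothetical job payloads, for illustration only.
failed = {
    "state": "JOB_STATE_FAILED",
    "error": {"code": 3, "message": "instruction file not found"},
}
running = {"state": "JOB_STATE_RUNNING"}
```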

labels map (key: string, value: string)

The labels with user-defined metadata to organize your DataLabelingJobs.

Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed.

See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable. The following system labels exist for each DataLabelingJob:

  • "aiplatform.googleapis.com/schema": output only, its value is the inputs_schema's title.

specialistPools[] string

The SpecialistPools' resource names associated with this job.

encryptionSpec object (EncryptionSpec)

Customer-managed encryption key spec for a DataLabelingJob. If set, this DataLabelingJob will be secured by this key.

Note: Annotations created in the DataLabelingJob are associated with the EncryptionSpec of the Dataset they are exported to.

activeLearningConfig object (ActiveLearningConfig)

Parameters that configure the active learning pipeline. Active learning will label the data incrementally via several iterations. For every iteration, it will select a batch of data based on the sampling strategy.

JSON representation
{
  "name": string,
  "displayName": string,
  "datasets": [
    string
  ],
  "annotationLabels": {
    string: string,
    ...
  },
  "labelerCount": integer,
  "instructionUri": string,
  "inputsSchemaUri": string,
  "inputs": value,
  "state": enum (JobState),
  "labelingProgress": integer,
  "currentSpend": {
    object (Money)
  },
  "createTime": string,
  "updateTime": string,
  "error": {
    object (Status)
  },
  "labels": {
    string: string,
    ...
  },
  "specialistPools": [
    string
  ],
  "encryptionSpec": {
    object (EncryptionSpec)
  },
  "activeLearningConfig": {
    object (ActiveLearningConfig)
  }
}
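As a rough illustration of the representation above, the following Python sketch assembles a request body and checks it against the Required and Output only markers in the field descriptions. The function name, all sample values, and the contents of inputs are hypothetical; the schema file name is left as a placeholder because it depends on the job type.

```python
import json

REQUIRED = (
    "displayName", "datasets", "labelerCount",
    "instructionUri", "inputsSchemaUri", "inputs",
)
OUTPUT_ONLY = (
    "name", "state", "labelingProgress", "currentSpend",
    "createTime", "updateTime", "error",
)


def check_job_body(body: dict) -> list:
    """Return a list of problems found in a DataLabelingJob body."""
    problems = [f"missing required field: {f}" for f in REQUIRED if f not in body]
    problems += [f"output-only field set by caller: {f}" for f in OUTPUT_ONLY if f in body]
    if len(body.get("displayName", "")) > 128:
        problems.append("displayName is longer than 128 characters")
    if len(body.get("datasets", [])) > 1:
        problems.append("only a single Dataset is supported")
    return problems


# Hypothetical values, for illustration only.
example_body = {
    "displayName": "label-product-photos",
    "datasets": ["projects/my-project/locations/us-central1/datasets/1234567890"],
    "labelerCount": 3,
    "instructionUri": "gs://my-bucket/instructions.pdf",
    "inputsSchemaUri": "gs://google-cloud-aiplatform/schema/datalabelingjob/inputs/<schema-file>.yaml",
    "inputs": {"placeholder": "fields depend on the chosen schema"},
    "labels": {"team": "vision"},
}

assert check_job_body(example_body) == []
print(json.dumps(example_body, indent=2))
```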

ActiveLearningConfig

Parameters that configure the active learning pipeline. Active learning will label the data incrementally via several iterations. For every iteration, it will select a batch of data based on the sampling strategy.

Fields
sampleConfig object (SampleConfig)

Active learning data sampling config. For every active learning labeling iteration, it will select a batch of data based on the sampling strategy.

trainingConfig object (TrainingConfig)

CMLE training config. For every active learning labeling iteration, the system trains a machine learning model on CMLE. The trained model is used by the data sampling algorithm to select DataItems.

Union field human_labeling_budget. Required. The maximum number of DataItems to be labeled by humans. The remaining DataItems are labeled by machine. human_labeling_budget can be only one of the following:
maxDataItemCount string (int64 format)

Max number of human labeled DataItems.

maxDataItemPercentage integer

Max percent of total DataItems for human labeling.

JSON representation
{
  "sampleConfig": {
    object (SampleConfig)
  },
  "trainingConfig": {
    object (TrainingConfig)
  },

  // Union field human_labeling_budget can be only one of the following:
  "maxDataItemCount": string,
  "maxDataItemPercentage": integer
  // End of list of possible types for union field human_labeling_budget.
}
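The union field above accepts exactly one of its two members, and maxDataItemCount is carried as a string because it is in int64 format. A minimal Python sketch of a helper that enforces both points follows; the function name is hypothetical.

```python
def human_labeling_budget(max_count=None, max_percentage=None) -> dict:
    """Return the single union member to merge into an ActiveLearningConfig."""
    # The union is required and its members are mutually exclusive.
    if (max_count is None) == (max_percentage is None):
        raise ValueError("set exactly one of max_count or max_percentage")
    if max_count is not None:
        # int64 values are represented as strings in JSON.
        return {"maxDataItemCount": str(int(max_count))}
    return {"maxDataItemPercentage": int(max_percentage)}


# Hypothetical budget: humans label at most 500 DataItems.
config = {"sampleConfig": {}, "trainingConfig": {}}
config.update(human_labeling_budget(max_count=500))
```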

SampleConfig

Active learning data sampling config. For every active learning labeling iteration, it will select a batch of data based on the sampling strategy.

Fields
sampleStrategy enum (SampleStrategy)

Field to choose sampling strategy. Sampling strategy will decide which data should be selected for human labeling in every batch.

Union field initial_batch_sample_size. Decides sample size for the initial batch. initial_batch_sample_percentage is used by default. initial_batch_sample_size can be only one of the following:
initialBatchSamplePercentage integer

The percentage of data needed to be labeled in the first batch.

Union field following_batch_sample_size. Decides sample size for the following batches. following_batch_sample_percentage is used by default. following_batch_sample_size can be only one of the following:
followingBatchSamplePercentage integer

The percentage of data needed to be labeled in each following batch (except the first batch).

JSON representation
{
  "sampleStrategy": enum (SampleStrategy),

  // Union field initial_batch_sample_size can be only one of the following:
  "initialBatchSamplePercentage": integer
  // End of list of possible types for union field initial_batch_sample_size.

  // Union field following_batch_sample_size can be only one of the following:
  "followingBatchSamplePercentage": integer
  // End of list of possible types for union field following_batch_sample_size.
}
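To make the two percentages concrete, the following Python sketch builds a SampleConfig and estimates the batch sizes for a given number of DataItems. The function names are hypothetical. The arithmetic assumes that each percentage is taken of the total DataItem count and rounded up; this page does not specify either point, so treat the numbers as an approximation.

```python
def sample_config(initial_pct: int, following_pct: int,
                  strategy: str = "UNCERTAINTY") -> dict:
    """Build a SampleConfig with both percentage members set."""
    return {
        "sampleStrategy": strategy,
        "initialBatchSamplePercentage": initial_pct,
        "followingBatchSamplePercentage": following_pct,
    }


def estimated_batch_sizes(total_items: int, config: dict) -> tuple:
    """Estimate (initial, following) batch sizes under the stated assumptions."""
    def pct_of_total(pct: int) -> int:
        # Integer ceiling of total_items * pct / 100 (assumed rounding).
        return (total_items * pct + 99) // 100

    return (
        pct_of_total(config["initialBatchSamplePercentage"]),
        pct_of_total(config["followingBatchSamplePercentage"]),
    )


# Hypothetical numbers: 1,000 DataItems, 10% first, then 5% per batch.
cfg = sample_config(initial_pct=10, following_pct=5)
sizes = estimated_batch_sizes(1000, cfg)
```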

SampleStrategy

Sample strategy decides which subset of DataItems should be selected for human labeling in every batch.

Enums
SAMPLE_STRATEGY_UNSPECIFIED Default will be treated as UNCERTAINTY.
UNCERTAINTY Sample the most uncertain data to label.

TrainingConfig

CMLE training config. For every active learning labeling iteration, the system trains a machine learning model on CMLE. The trained model is used by the data sampling algorithm to select DataItems.

Fields
timeoutTrainingMilliHours string (int64 format)

The timeout for the CMLE training job, expressed in milli-hours. For example, a value of 1,000 in this field means 1 hour.

JSON representation
{
  "timeoutTrainingMilliHours": string
}
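Since milli-hours are an unusual unit, a small conversion helper can reduce mistakes. In the Python sketch below, 1,000 milli-hours equal 1 hour, so one milli-hour is 3.6 seconds. The function names are hypothetical, and the value is emitted as a string because the field is in int64 format.

```python
from datetime import timedelta


def training_config(timeout: timedelta) -> dict:
    """Build a TrainingConfig from a timeout, rounded to whole milli-hours."""
    milli_hours = round(timeout.total_seconds() * 1000 / 3600)
    return {"timeoutTrainingMilliHours": str(milli_hours)}


def timeout_from_config(config: dict) -> timedelta:
    """Convert timeoutTrainingMilliHours back into a timedelta."""
    return timedelta(hours=int(config["timeoutTrainingMilliHours"]) / 1000)


one_hour = training_config(timedelta(hours=1))
# one_hour == {"timeoutTrainingMilliHours": "1000"}
```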

Methods

cancel

Cancels a DataLabelingJob.

create

Creates a DataLabelingJob.

delete

Deletes a DataLabelingJob.

get

Gets a DataLabelingJob.

list

Lists DataLabelingJobs in a Location.