Long-running requests
Number	151
Permalink	aip-dev.github.io/aip.dev/151
State	Reviewing
Created	2019-07-25
Updated	2019-07-25

AIP-151

Long-running requests

Occasionally, a service may need to expose an operation that takes a significant amount of time to complete. In these situations, it is often a poor user experience to simply block while the task runs; rather, it is better to return some kind of promise to the user, and allow the user to check back in later.

The long-running request pattern is roughly analogous to a Future in Python or Java, or a Node.js Promise. Essentially, the user is given a token that can be used to track progress and retrieve the result.

Guidance

Operations that might take a significant amount of time to complete should return a 202 Accepted response along with an identifier that can be used to track the status of the request and ultimately retrieve the result.

Any single operation defined in an API surface must either always return 202 Accepted along with a request identifier, or never do so. A service must not return a 200 OK response with the result if it is "fast enough", and 202 Accepted if it is not fast enough, because such behavior adds significant burdens for clients.

Note: User expectations can vary on what is considered "a significant amount of time" depending on what work is being done. A good rule of thumb is 10 seconds.

Status monitor representation

The response to a long-running request should be a "status monitor" having the following common format:

interface StatusMonitor {
  // The identifier for this status monitor.
  id: string;

  // Whether the request is done.
  done: boolean;

  // The result of the request.
  // Only populated if the request is done and was successful.
  response: any;

  // The error that arose from the request.
  // Only populated if the request is done and was unsuccessful.
  error: Error;

  // Metadata associated with the request.
  // Populated throughout the life of the request, including after
  // it completes.
  metadata: any;
}

If the done field is true, then one and exactly one of the response and error fields must be populated.
- If the done field is false, then the response and error fields must not be populated.
The response and metadata fields may be any type that the service determines to be appropriate, but must always be the same type for any particular operation.
- The response and metadata types should be defined in the same API surface as the operation itself.
- The response and metadata types that need no data should use a custom-defined empty struct rather than a common void or empty type, to permit future extensibility.

Querying a status monitor

The service must provide an endpoint to query the status of the operation, which must accept the operation identifier and should not include other parameters:

GET /v1/statusMonitors/{status_monitor} HTTP/2
Host: library.googleapis.com
Accept: application/json

The endpoint must return a StatusMonitor as described above.

Standard methods

APIs may return an StatusMonitor from the Create, Update, or Delete standard methods if appropriate. In this case, the response field must be the standard and expected response type for that standard method.

When creating or deleting a resource with a long-running request, the resource should be included in List and Get calls; however, the resource should indicate that it is not usable, generally with a state enum.

Parallel requests

A resource may accept multiple requests that will work on it in parallel, but is not obligated to do so:

Resources that accept multiple parallel requests may place them in a queue rather than work on the requests simultaneously.
Resource that does not permit multiple requests in parallel (denying any new request until the one that is in progress finishes) must return 409 Conflict if a user attempts a parallel request, and include an error message explaining the situation.

Expiration

APIs may allow their status monitor resources to expire after sufficient time has elapsed after the request completed.

Note: A good rule of thumb for status monitor expiry is 30 days.

Errors

Errors that prevent a long-running request from starting must return an error response (AIP-193), similar to any other method.

Errors that occur over the course of a request may be placed in the metadata message. The errors themselves must still be represented with a canonical error object.

Interface Definitions

Protocol buffers

When using protocol buffers, the well-known type google.longrunning.Operation is used.

Note: For historical reasons, Google uses the term Operation to represent what this document describes as a StatusMonitor.

// Write a book.
rpc WriteBook(WriteBookRequest) returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{parent=publishers/*}/books:write"
    body: "*"
  };
  option (google.longrunning.operation_info) = {
    response_type: "WriteBookResponse"
    metadata_type: "WriteBookMetadata"
  };
}

The response type must be google.longrunning.Operation. The Operation proto definition must not be copied into individual APIs.
- The response must not be a streaming response.
The method must include a google.longrunning.operation_info annotation, which must define both response and metadata types.
- The response and metadata types must be defined in the file where the RPC appears, or a file imported by that file.
- If the response and metadata types are defined in another package, the fully-qualified message name must be used.
- The response type should not be google.protobuf.Empty (except for Delete methods), unless it is certain that response data will never be needed. If response data might be added in the future, define an empty message for the RPC response and use that.
- The metadata type is used to provide information such as progress, partial failures, and similar information on each GetOperation call. The metadata type should not be google.protobuf.Empty, unless it is certain that metadata will never be needed. If metadata might be added in the future, define an empty message for the RPC metadata and use that.
APIs with messages that return Operation must implement the Operations service. Individual APIs must not define their own interfaces for long-running operations to avoid inconsistency.

OpenAPI 3.0

paths:
  /v1/resources:
    post:
      operationId: write_book
      description: Write a book.
      responses:
        202:
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/WriteBookStatus'

202 must be the only success status code defined.
The 202 response must define an application/json response body and no other response content types.
The response body schema must be an object with name, done, and result properties as described above for a StatusMonitor
The response body schema may contain an object property named metadata to hold service-specific metadata associated with the operation, for example progress information and common metadata such as create time. The service should define the contents of the metadata object in a separate schema, which should specify additionalProperties: true to allow for future extensibility.
The response property must be a schema that defines the success response for the operation. For an operation that typically gives a 204 No Content response, such as a Delete, response should be defined as an empty object schema. For a standard Get/Create/Update operation, response should be a representation of the resource.
If a service has any long running operations, the service must define an StatusMonitor resource with a list operation to retrieve a potentially filtered list of status monitors and a get operation to retrieve a specific status monitor by its name.