This document in the Google Cloud Architecture Framework provides best practices to deploy your system based on compute requirements. You learn how to choose a compute platform and a migration approach, design and scale workloads, and manage operations and VM migrations.
Computation is at the core of many workloads, whether it refers to the execution of custom business logic or the application of complex computational algorithms against datasets. Most solutions use compute resources in some form, and it's critical that you select the right compute resources for your application needs.
Google Cloud provides several options for using time on a CPU. These options vary by CPU type and performance, by how your code is scheduled to run, and by how usage is billed.
Google Cloud compute options include the following:
- Virtual machines (VMs) with cloud-specific benefits like live migration.
- Bin-packing of containers on cluster machines that can share CPUs.
- Functions and serverless approaches, where your use of CPU time can be metered to the work performed during a single HTTP request.
Choosing compute
This section provides best practices for choosing and migrating to a compute platform.
Choose a compute platform
When you choose a compute platform for your workload, consider the technical requirements of the workload, lifecycle automation processes, regionalization, and security.
Evaluate the nature of CPU usage by your app and the entire supporting system, including how your code is packaged and deployed, distributed, and invoked. While some scenarios might be compatible with multiple platform options, a portable workload should be capable and performant on a range of compute options.
The following table provides an overview of the recommended Google Cloud compute services for various use cases:
Compute platform | Use cases | Recommended products
---|---|---
Serverless | Deploy your first application. Focus on data and processing logic and on app development instead of managing infrastructure. | Cloud Run, Cloud Run functions
Kubernetes | Build complex microservice architectures that need additional services like Istio to manage service mesh control. | Google Kubernetes Engine (GKE)
Virtual machines (VMs) | Run general purpose and specialized workloads on VMs. | Compute Engine
For more information, see Hosting Applications on Google Cloud.
Choose a compute migration approach
If you're migrating your existing applications from another cloud or from on-premises, use one of the following Google Cloud products to help you optimize for performance, scale, cost, and security.
Migration goal | Use case | Recommended product
---|---|---
Lift and shift | Migrate or extend your VMware workloads to Google Cloud in minutes. | Google Cloud VMware Engine
Lift and shift | Move your VM-based applications to Compute Engine. | Migrate to Virtual Machines
Upgrade to containers | Modernize applications into built-in containers on Google Kubernetes Engine. | Migrate to Containers
To learn how to migrate your workloads while aligning internal teams, see VM Migration lifecycle and Building a Large Scale Migration Program with Google Cloud.
Designing workloads
This section provides best practices for designing workloads to support your system.
Evaluate serverless options for simple logic
Simple logic is a type of compute that doesn't require specialized hardware or machine types like CPU-optimized machines. Before you invest in Google Kubernetes Engine (GKE) or Compute Engine implementations to abstract operational overhead and optimize for cost and performance, evaluate serverless options for lightweight logic.
Decouple your applications to be stateless
Where possible, decouple your applications to be stateless to maximize use of serverless computing options. This approach lets you use managed compute offerings, scale applications based on demand, and optimize for cost and performance. For more information about decoupling your application to design for scale and high availability, see Design for scale and high availability.
Use caching logic when you decouple architectures
If your application is designed to be stateful, use caching logic to decouple and make your workload scalable. For more information, see Database best practices.
Use live migrations to facilitate upgrades
To facilitate Google maintenance upgrades, use live migration by setting instance availability policies. For more information, see Set VM host maintenance policy.
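As a minimal sketch (the VM name and zone are placeholders), the maintenance policy can be set when a VM is created, or changed later on an existing VM:

```shell
# Create a VM that live-migrates during Google maintenance events
# instead of being terminated. Names and zone are hypothetical.
gcloud compute instances create my-app-vm \
    --zone=us-central1-a \
    --maintenance-policy=MIGRATE

# Change the policy on an existing VM.
gcloud compute instances set-scheduling my-app-vm \
    --zone=us-central1-a \
    --maintenance-policy=MIGRATE
```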
Scaling workloads
This section provides best practices for scaling workloads to support your system.
Use startup and shutdown scripts
For stateful applications, use startup and shutdown scripts where possible to start and stop your application state gracefully. A graceful startup is when a computer is started by a software function and the operating system is allowed to safely start processes and open connections.
Graceful startups and shutdowns are important because stateful applications depend on immediate availability of the data that sits close to the compute, usually on local or persistent disks, or in RAM. To avoid reprocessing application data from the beginning at each startup, use a startup script to reload the last saved data and resume the process from where it stopped at shutdown. To save the application memory state and avoid losing progress on shutdown, use a shutdown script. For example, use a shutdown script when a VM is scheduled to shut down due to downscaling or a Google maintenance event.
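As a sketch of this pattern (the VM name, zone, and script paths are placeholders), both scripts can be attached to a VM as instance metadata:

```shell
# Attach startup and shutdown scripts via the standard startup-script
# and shutdown-script metadata keys. Paths and names are hypothetical;
# restore-state.sh would reload the last saved data, and save-state.sh
# would persist in-memory state before the VM stops.
gcloud compute instances create stateful-app-vm \
    --zone=us-central1-a \
    --metadata-from-file=startup-script=./restore-state.sh,shutdown-script=./save-state.sh
```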
Use MIGs to support VM management
When you use Compute Engine VMs, managed instance groups (MIGs) support features like autohealing, load balancing, autoscaling, auto updating, and stateful workloads. You can create zonal or regional MIGs based on your availability goals. You can use MIGs for stateless serving or batch workloads and for stateful applications that need to preserve each VM's unique state.
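As a minimal sketch (template, group, and region names are placeholders), a regional MIG with CPU-based autoscaling can be set up in three steps:

```shell
# 1. Create an instance template the MIG will stamp VMs from.
gcloud compute instance-templates create web-template \
    --machine-type=e2-medium \
    --image-family=debian-12 --image-project=debian-cloud

# 2. Create a regional MIG for higher availability across zones.
gcloud compute instance-groups managed create web-mig \
    --region=us-central1 --template=web-template --size=3

# 3. Autoscale between 3 and 10 VMs, targeting 60% CPU utilization.
gcloud compute instance-groups managed set-autoscaling web-mig \
    --region=us-central1 \
    --min-num-replicas=3 --max-num-replicas=10 \
    --target-cpu-utilization=0.6
```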
Use pod autoscalers to scale your GKE workloads
Use horizontal and vertical Pod autoscalers to scale your workloads, and use node auto-provisioning to scale underlying compute resources.
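As a sketch (the cluster and Deployment names are placeholders), horizontal Pod autoscaling and node auto-provisioning can be enabled as follows:

```shell
# Scale a Deployment between 2 and 10 replicas, targeting 60% CPU.
kubectl autoscale deployment web-deployment \
    --cpu-percent=60 --min=2 --max=10

# Let GKE auto-provision node pools for the underlying compute,
# within hypothetical cluster-wide CPU and memory limits.
gcloud container clusters update my-cluster \
    --enable-autoprovisioning --max-cpu=64 --max-memory=256
```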
Distribute application traffic
To scale your applications globally, use Cloud Load Balancing to distribute your application instances across more than one region or zone. Load balancers optimize packet routing from Google Cloud edge networks to the nearest zone, which increases serving traffic efficiency and minimizes serving costs. To optimize for end-user latency, use Cloud CDN to cache static content where possible.
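For the static-content caching step, as a sketch (the backend service name is a placeholder), Cloud CDN can be enabled on an existing global backend service:

```shell
# Enable Cloud CDN on a backend service behind a global external
# Application Load Balancer; cache static responses automatically.
gcloud compute backend-services update web-backend \
    --global --enable-cdn --cache-mode=CACHE_ALL_STATIC
```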
Automate compute creation and management
Minimize human-induced errors in your production environment by automating compute creation and management.
Managing operations
This section provides best practices for managing operations to support your system.
Use Google-supplied public images
Use public images supplied by Google Cloud. The Google Cloud public images are regularly updated. For more information, see List of public images available on Compute Engine.
You can also create your own images with specific configurations and settings. Where possible, automate and centralize image creation in a separate project that you can share with authorized users within your organization. Creating and curating a custom image in a separate project lets you update, patch, and create a VM using your own configurations. You can then share the curated VM image with relevant projects.
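As a sketch of the centralized image project approach (project, image, disk, and group names are placeholders), you create the image once and grant other teams read access to it:

```shell
# Create a curated image in a dedicated image project.
gcloud compute images create golden-image-v1 \
    --project=image-factory-project \
    --source-disk=build-disk --source-disk-zone=us-central1-a

# Share it with authorized users in the organization.
gcloud compute images add-iam-policy-binding golden-image-v1 \
    --project=image-factory-project \
    --member='group:dev-team@example.com' \
    --role='roles/compute.imageUser'
```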
Use snapshots for disk backups
Snapshots let you create backups of your disks. Snapshots are especially useful for stateful applications, which aren't flexible enough to maintain state or save progress when they experience abrupt shutdowns. If you frequently use snapshots to create new instances, you can optimize your backup process by creating a base image from that snapshot.
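As a minimal sketch (disk, snapshot, and image names are placeholders), the snapshot-then-base-image optimization looks like this:

```shell
# Back up a disk with a snapshot.
gcloud compute disks snapshot data-disk \
    --zone=us-central1-a --snapshot-names=data-disk-backup

# If you create instances from this snapshot frequently,
# promote it to a reusable base image.
gcloud compute images create data-base-image \
    --source-snapshot=data-disk-backup
```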
Use a machine image to enable VM instance creation
Although a snapshot only captures an image of the data inside a machine, a machine image captures machine configurations and settings, in addition to the data. Use a machine image to store all of the configurations, metadata, permissions, and data from one or more disks that are needed to create a VM instance.
When you create a machine from a snapshot, you must configure instance settings on the new VM instances, which requires additional work. Using machine images lets you copy those known settings to new machines, reducing overhead. For more information, see When to use a machine image.
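As a sketch (the instance, image, and zone names are placeholders), a machine image captures a VM and can then seed new VMs directly:

```shell
# Capture configuration, metadata, permissions, and disk data.
gcloud compute machine-images create app-machine-image \
    --source-instance=app-vm --source-instance-zone=us-central1-a

# Create a new VM with the same settings, no manual reconfiguration.
gcloud compute instances create app-vm-clone \
    --zone=us-central1-b --source-machine-image=app-machine-image
```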
Capacity, reservations, and isolation
This section provides best practices for managing capacity, reservations, and isolation to support your system.
Use committed use discounts to reduce costs
You can reduce your operational expenditure (OPEX) cost for workloads that are always on by using committed use discounts. For more information, see the Cost optimization category.
Choose machine types to support cost and performance
Google Cloud offers machine types that let you choose compute based on cost and performance parameters. You can choose a low-performance offering to optimize for cost or choose a high-performance compute option at higher cost. For more information, see the Cost optimization category.
Use sole-tenant nodes to support compliance needs
Sole-tenant nodes are physical Compute Engine servers that are dedicated to hosting only your project's VMs. Sole-tenant nodes can help you to meet compliance requirements for physical isolation, including the following:
- Keep your VMs physically separated from VMs in other projects.
- Group your VMs together on the same host hardware.
- Isolate payments processing workloads.
For more information, see Sole-tenant nodes.
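As a sketch of the provisioning flow (template, group, VM names, and the node type are placeholders), a sole-tenant node group is created from a node template, and VMs are then placed on it:

```shell
# Define the dedicated host hardware.
gcloud compute sole-tenancy node-templates create isolated-template \
    --node-type=n2-node-80-640 --region=us-central1

# Provision a node group from the template.
gcloud compute sole-tenancy node-groups create isolated-group \
    --node-template=isolated-template --target-size=1 \
    --zone=us-central1-a

# Place a workload VM on the dedicated hardware.
gcloud compute instances create payments-vm \
    --zone=us-central1-a --node-group=isolated-group
```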
Use reservations to ensure resource availability
Google Cloud lets you define reservations for your workloads to ensure those resources are always available. There is no additional charge to create reservations, but you pay for the reserved resources even if you don't use them. For more information, see Consuming and managing reservations.
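As a minimal sketch (the reservation name, zone, and machine type are placeholders), a reservation that matching VMs in the same project and zone consume automatically looks like this:

```shell
# Reserve capacity for 10 n2-standard-4 VMs in one zone.
gcloud compute reservations create web-capacity \
    --zone=us-central1-a --vm-count=10 --machine-type=n2-standard-4
```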
VM migration
This section provides best practices for migrating VMs to support your system.
Evaluate built-in migration tools
Evaluate built-in migration tools to move your workloads from another cloud or from on-premises. For more information, see Migration to Google Cloud. Google Cloud offers tools and services to help you migrate your workloads and optimize for cost and performance. To receive a free migration cost assessment based on your current IT landscape, see Google Cloud Rapid Assessment & Migration Program.
Use virtual disk import for customized operating systems
To import customized supported operating systems, see Importing virtual disks. Sole-tenant nodes can help you meet your hardware bring-your-own-license requirements for per-core or per-processor licenses. For more information, see Bringing your own licenses.
Recommendations
To apply the guidance in the Architecture Framework to your own environment, we recommend that you do the following:
Review Google Cloud Marketplace offerings to evaluate whether your application is listed under a supported vendor. Google Cloud supports running various open source systems and various third-party software.
Consider Migrate to Containers and GKE to extract and package your VM-based application as a containerized application running on GKE.
Use Compute Engine to run your applications on Google Cloud. If you have legacy dependencies running in a VM-based application, verify whether they meet your vendor requirements.
Evaluate using a Google Cloud internal passthrough Network Load Balancer to scale your decoupled architecture. For more information, see Internal passthrough Network Load Balancer overview.
Evaluate your options for switching from conventional on-premises use cases like HA-Proxy usage. For more information, see best practices for floating IP addresses.
Use VM Manager to manage operating systems for your large VM fleets running Windows or Linux on Compute Engine, and apply consistent configuration policies.
Consider using GKE Autopilot and let Google SRE fully manage your clusters.
Use Policy Controller and Config Sync for policy and configuration management across your GKE clusters.
Ensure availability and scalability of machines in specific regions and zones. Google Cloud can scale to support your compute needs. However, if you need a lot of specific machine types in a specific region or zone, work with your account teams to ensure availability. For more information, see Reservations for Compute Engine.
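For the GKE Autopilot recommendation above, as a sketch (the cluster name and region are placeholders), an Autopilot cluster with Google-managed nodes and scaling takes one command:

```shell
# Create a GKE Autopilot cluster; node provisioning, scaling, and
# upgrades are managed by Google.
gcloud container clusters create-auto my-autopilot-cluster \
    --region=us-central1
```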
What's next
Learn networking design principles, including the following:
Design workload VPC architectures.
Design inter-VPC connectivity.
Explore other categories in the Architecture Framework such as reliability, operational excellence, and security, privacy, and compliance.