Use GKE Data Cache To Improve Data Access Performance

GKE Data Cache

Google Cloud is announcing the general availability (GA) of GKE Data Cache, a managed solution for Google Kubernetes Engine that improves the performance of read-heavy stateful or stateless applications that use network-attached disks for persistent storage. Without complicated manual configuration, GKE Data Cache helps you achieve higher queries per second (QPS) and lower read latency by using fast Local SSDs as a cache layer in front of persistent disks.

Using Postgres and GKE Data Cache, we found:

Up to a 480% increase in transactions per second for PostgreSQL on GKE
Up to an 80% decrease in latency for PostgreSQL on GKE

What is GKE Data Cache?

GKE Data Cache is a managed block storage solution built specifically for Google Kubernetes Engine (GKE). Its main purpose is to speed up read-heavy stateful workloads running on GKE, though it can also help stateless applications that depend on persistent storage backed by network-attached disks. It is generally available (GA).

How it works

GKE Data Cache uses fast Local SSDs attached to your GKE nodes as a cache layer for your underlying persistent storage, such as Persistent Disks or Hyperdisks. When you enable GKE Data Cache and configure your workloads to use it, frequently accessed data is cached automatically on these low-latency Local SSDs.

Subsequent read requests for cached data are served directly from the Local SSDs, so the slower, network-attached persistent disk is accessed far less often. Because those reads complete at Local SSD latency, this approach may also let you rely on less in-memory cache (RAM) to serve requests quickly.

GKE Data Cache supports all read/write Persistent Disk and Hyperdisk types as backing storage. The Local SSDs used by the solution are encrypted at rest with standard Google Cloud encryption. Because the SSDs are local to the node, data hydration (the initial loading of data from persistent storage onto the Local SSD) and data rehydration (restoring data on Local SSDs after a node is recycled) are faster.

Deployment modes

GKE Data Cache offers two write-handling modes, which affect data consistency and performance:

Writethrough (recommended): Application writes go to both the cache (Local SSD) and the underlying persistent disk at the same time. This mode guards against data loss and is appropriate for the majority of production workloads.
Writeback: Writes go to the cache (Local SSD) first and are flushed asynchronously to the persistent disk in the background. This mode improves write performance at the cost of durability: if the node shuts down abruptly before the data is flushed, any unflushed cache data is lost. It suits workloads where speed is the top priority and some data loss is acceptable.

Advantages

GKE Data Cache can result in significant performance gains:

Increased Queries Per Second (QPS): It considerably raises the number of queries processed per second for databases, both vector databases and more traditional databases like Postgres or MySQL.
Better Read Performance: By reducing disk latency, it improves read performance for stateful applications.
Faster Data Hydration and Rehydration: Local SSDs speed up the data loading and restoration procedures.
Reduced Read Latency: Applications retrieve data more quickly, improving responsiveness and user experience.
Higher Throughput: Applications can manage higher loads and carry out more complex data operations when they can service more read requests concurrently.
Possible Cost Optimisation: Because reads are accelerated by the cache, you may be able to use smaller or lower-IOPS persistent disks for primary storage while still getting good performance. Because memory is more expensive than Local SSD capacity, you may also be able to reduce the memory the machine needs for its page cache by relying on the Local SSD cache instead.
Simplified Management: Because it is a managed feature, a high-performance caching layer is easier to set up and maintain.
Reported Results: OpenAI saw up to an 80% decrease in latency and a 480% increase in transactions per second using Postgres on GKE. Qdrant found vector search response times up to 10 times faster than with balanced disks and 2.5 times faster than with premium SSDs when running directly from disk without full in-memory caching.

Requirements and configuration

To use GKE Data Cache, you must meet the following prerequisites and follow the configuration steps below:

Your cluster must be a GKE Standard cluster.
The GKE cluster must run version 1.32.3-gke.1440000 or later.
Node pools must use machine types that support Local SSDs.
GKE Data Cache works only with new persistent disks provisioned through a PersistentVolumeClaim (PVC); it cannot be enabled on persistent disks that already exist.

Configuration includes:

Configuring GKE nodes for Data Cache: Each node that serves Data Cache needs its own dedicated Local SSD resources. An existing node pool cannot be updated, so GKE Data Cache must be enabled when you create a new GKE cluster or add a new node pool to an existing cluster. To do this, pass the --data-cache-count flag to the gcloud container clusters create or gcloud container node-pools create command.

This flag specifies how many Local SSD volumes on each node in the pool are reserved exclusively for Data Cache. Each dedicated Local SSD has a capacity of 375 GiB. For third-generation or later machine types, the data-cache-count value must be less than the total number of Local SSD disks available on the machine. You can optionally allocate additional Local SSDs for other ephemeral storage with the --ephemeral-storage-local-ssd count flag.
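As a rough illustration of the node pool step, the sketch below creates a new node pool with Local SSDs reserved for Data Cache. The cluster name, node pool name, location, machine type, and count are placeholder assumptions, not values from this article; check that the machine type you pick supports the number of Local SSDs you request.

```shell
# Placeholder values: every name below is an assumption for illustration.
CLUSTER_NAME="my-cluster"
NODE_POOL_NAME="data-cache-pool"
LOCATION="us-central1-a"
MACHINE_TYPE="n2-standard-8"   # must be a machine type that supports Local SSDs
DATA_CACHE_COUNT=2             # Local SSDs reserved for Data Cache on each node

# Each dedicated Local SSD is 375 GiB, so this is the raw cache capacity per node.
echo "Raw Data Cache capacity per node: $((DATA_CACHE_COUNT * 375)) GiB"

# Data Cache cannot be enabled on an existing node pool, so create a new one.
gcloud container node-pools create "${NODE_POOL_NAME}" \
  --cluster="${CLUSTER_NAME}" \
  --location="${LOCATION}" \
  --machine-type="${MACHINE_TYPE}" \
  --data-cache-count="${DATA_CACHE_COUNT}"
```

The same --data-cache-count flag applies to gcloud container clusters create when you enable Data Cache on a new cluster instead.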

Creating a Data Cache StorageClass: To tell GKE how to dynamically provision persistent volumes that use Data Cache, you need a Kubernetes StorageClass. The StorageClass manifest specifies attributes that apply to every PVC referencing it, such as the backing disk type, data-cache-mode (for example, writethrough), and data-cache-size, which determines how much Local SSD capacity is used as a read cache.
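A minimal sketch of such a StorageClass follows, written to a local file from the shell. The StorageClass name, file name, disk type, and cache size are illustrative assumptions; the data-cache-mode and data-cache-size parameters are the ones named above, and pd.csi.storage.gke.io is the GKE Persistent Disk CSI provisioner.

```shell
# Write a Data Cache StorageClass manifest to a local file.
# The name, disk type, and cache size are assumptions chosen for illustration.
cat > data-cache-sc.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-balanced-data-cache-sc     # hypothetical name
provisioner: pd.csi.storage.gke.io    # GKE Persistent Disk CSI driver
parameters:
  type: pd-balanced                   # backing disk type
  data-cache-mode: writethrough       # or writeback, trading durability for write speed
  data-cache-size: "100Gi"            # Local SSD capacity used as cache per volume
volumeBindingMode: WaitForFirstConsumer
EOF
```

You would then apply it with kubectl apply -f data-cache-sc.yaml. WaitForFirstConsumer delays provisioning until a Pod is scheduled, so the disk is created in the same zone as the node that holds the cache.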

Using a PersistentVolumeClaim (PVC) to request storage: Create a PVC that references the Data Cache StorageClass. This PVC requests a persistent volume with Data Cache enabled.
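A PVC of this kind might look like the sketch below. The claim name, file name, and requested size are assumptions, and storageClassName must match whatever you named your Data Cache StorageClass (pd-balanced-data-cache-sc is a hypothetical name here).

```shell
# Write a PVC manifest that references a Data Cache StorageClass.
# The claim name, StorageClass name, and size are assumptions for illustration.
cat > pvc-data-cache.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-data-cache                          # hypothetical name
spec:
  storageClassName: pd-balanced-data-cache-sc   # must match your StorageClass
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 300Gi                            # size of the backing persistent disk
EOF
```

Apply it with kubectl apply -f pvc-data-cache.yaml. Note that the requested storage is the size of the persistent disk, which is separate from the cache size set in the StorageClass.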

Creating a Deployment that uses the PVC: Reference the PVC in your application's Deployment manifest. To make sure the Pod is scheduled onto a node that has the Local SSD resources required for GKE Data Cache, use a node selector such as cloud.google.com/gke-data-cache-disk: "1".
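To make this step concrete, here is a hedged sketch of a single-replica PostgreSQL Deployment that mounts the claim and uses the node selector mentioned above. The Deployment name, labels, image tag, password, mount path, and claim name are all assumptions for illustration; do not use a plain-text password in production.

```shell
# Write a Deployment manifest that mounts a Data Cache-backed PVC.
# Names, image tag, and credentials are assumptions chosen for illustration.
cat > postgres-data-cache.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-data-cache                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-data-cache
  template:
    metadata:
      labels:
        app: postgres-data-cache
    spec:
      nodeSelector:
        cloud.google.com/gke-data-cache-disk: "1"   # schedule onto Data Cache nodes
      containers:
      - name: postgres
        image: postgres:16
        env:
        - name: POSTGRES_PASSWORD
          value: change-me                   # placeholder; use a Secret in practice
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata   # subdirectory of the mounted volume
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: pvc-data-cache          # must match your PVC name
EOF
```

Apply it with kubectl apply -f postgres-data-cache.yaml. The value of the node selector label would need to match the labels actually present on your Data Cache nodes.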

Pricing

You are charged for GKE Data Cache based on the total provisioned capacity of the Local SSDs used for data cache and of the attached persistent disks. Pricing is per GiB per month. Further details about disk pricing are available in the Compute Engine documentation.

Verification

After deployment, you can confirm that your PVC has been bound to a persistent volume. You can also verify that the Logical Volume Manager (LVM) volume group for Data Cache was created on the node by examining the logs of the pdcsi-node Pod in the kube-system namespace.
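These checks could be run roughly as follows. The PVC name and the pdcsi-node Pod name are placeholders, and the grep filter on "lvm" is only a guess at a useful search term, not a documented log format.

```shell
# Placeholder names: substitute your own PVC and the pdcsi-node Pod on your node.
PVC_NAME="pvc-data-cache"
PDCSI_POD="pdcsi-node-xxxxx"

# Prints the claim's phase; "Bound" means a persistent volume was provisioned.
# With WaitForFirstConsumer binding, the claim stays Pending until a Pod uses it.
kubectl get pvc "${PVC_NAME}" -o jsonpath='{.status.phase}'

# Find the pdcsi-node Pod running on the node that hosts your workload.
kubectl get pods -n kube-system -o wide | grep pdcsi-node

# Search that Pod's logs for LVM-related messages (the filter term is an assumption).
kubectl logs -n kube-system "${PDCSI_POD}" --all-containers | grep -i lvm
```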