Google Kubernetes Engine (GKE)
This guide provides an overview of deploying Corridor on Google Kubernetes Engine (GKE).
Background¶
Corridor can be deployed on GKE using Kubernetes to manage containerized services. The deployment leverages GCP's managed services for databases, storage, and networking.
Before Installation¶
Prerequisites¶
- GCP Project with appropriate permissions
- `gcloud` CLI installed and configured
- `kubectl` CLI installed (via `gcloud components`)
- Access to Corridor container images
- Sufficient GCP quota for required resources
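As a quick check, the CLI prerequisites can be installed and verified like so (the project ID is a placeholder):

```bash
# Authenticate and select the target project (replace the placeholder ID)
gcloud auth login
gcloud config set project your-gcp-project-id

# Install kubectl via gcloud components
gcloud components install kubectl

# Verify the tools are available
gcloud version
kubectl version --client
```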
Required GCP Services¶
- Google Kubernetes Engine (GKE): Managed Kubernetes cluster
    - Private cluster recommended
    - Node autoscaling configured
    - Cloud NAT for egress traffic
- Cloud SQL: PostgreSQL database for metadata
    - PostgreSQL 14+ recommended
    - High availability configuration
    - Automated backups enabled
- Google Filestore: NFS storage for persistent volumes
    - Standard tier: 1TB+ capacity
    - Accessible from GKE cluster VPC
Optional GCP Services¶
- Cloud DNS: For domain management
- Secret Manager: For storing sensitive configuration
- Cloud Monitoring: For logging and monitoring
- Cloud Armor: For DDoS protection and WAF
- Cloud CDN: For static asset caching
Architecture Overview¶
```
GKE Cluster
├── Application Namespace
│   ├── corridor-app (Web UI)
│   ├── corridor-worker (Celery workers)
│   ├── corridor-jupyter (JupyterHub)
│   └── redis (Message Queue)
└── Cluster Components
    ├── NGINX Ingress Controller
    ├── cert-manager (TLS)
    └── NFS Provisioner
```
Key Components:
- Persistent Volumes: NFS-backed storage via Google Filestore
- NGINX Ingress: HTTP routing and load balancing
- cert-manager: Automated Let's Encrypt TLS certificates (see the example issuer after this list)
- Redis: Container-based Redis service for message queuing
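To illustrate the cert-manager piece, a minimal Let's Encrypt `ClusterIssuer` might look like the following; the issuer name, contact email, and solver class are assumptions to adapt:

```yaml
# clusterissuer.yaml - Let's Encrypt production issuer for cert-manager
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod          # assumed name; referenced from Ingress annotations
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com      # placeholder contact email
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx          # matches the NGINX Ingress Controller
```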
Installation Overview¶
Step 1: Setup GKE Cluster¶
- Create GKE cluster
- Setup Cloud NAT for private cluster internet access
- Enable cluster autoscaling
- Configure authorized networks for cluster access
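A minimal `gcloud` sketch of this step; names, CIDRs, and machine types are illustrative, and the Terraform example later in this guide provisions the same resources declaratively:

```bash
# Custom VPC and subnet for the cluster (CIDRs are illustrative)
gcloud compute networks create corridor-vpc --subnet-mode custom
gcloud compute networks subnets create corridor-subnet \
    --network corridor-vpc --region us-central1 --range 10.0.0.0/24

# Private, VPC-native cluster with autoscaling and authorized networks
gcloud container clusters create corridor-gke \
    --zone us-central1-a \
    --network corridor-vpc --subnetwork corridor-subnet \
    --enable-ip-alias \
    --enable-private-nodes \
    --master-ipv4-cidr 172.16.0.0/28 \
    --enable-master-authorized-networks \
    --master-authorized-networks 203.0.113.0/24 \
    --num-nodes 3 --enable-autoscaling --min-nodes 3 --max-nodes 10 \
    --machine-type n2-standard-4

# Cloud NAT so private nodes have internet egress
gcloud compute routers create corridor-router \
    --network corridor-vpc --region us-central1
gcloud compute routers nats create corridor-nat \
    --router corridor-router --region us-central1 \
    --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges
```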
Step 2: Setup Supporting Infrastructure¶
- Create Cloud SQL PostgreSQL instance
- Create Google Filestore instance for NFS storage
- Configure VPC networking and firewall rules
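Assuming the VPC from Step 1, the supporting services can be created along these lines (instance names, tiers, and sizes are illustrative):

```bash
# Cloud SQL PostgreSQL instance (HA, automated backups)
gcloud sql instances create corridor-db \
    --database-version POSTGRES_14 \
    --tier db-custom-4-16384 \
    --region us-central1 \
    --availability-type REGIONAL \
    --backup-start-time 03:00

# Filestore instance for NFS-backed persistent volumes
gcloud filestore instances create corridor-nfs \
    --zone us-central1-a \
    --tier STANDARD \
    --file-share name=corridor,capacity=1TB \
    --network name=corridor-vpc
```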
Step 3: Install Cluster Components¶
Install one-time cluster-level components:
- NGINX Ingress Controller
- cert-manager for automated TLS certificates
- NFS subdir external provisioner
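A sketch of installing these one-time components from their public Helm charts; the release names and namespaces are assumptions, and `<filestore-ip>` comes from the Filestore instance created in Step 2:

```bash
# NGINX Ingress Controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
    --namespace ingress-nginx --create-namespace

# cert-manager (installs its CRDs as well)
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
    --namespace cert-manager --create-namespace \
    --set installCRDs=true

# NFS subdir external provisioner backed by Filestore
helm repo add nfs-subdir-external-provisioner \
    https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
    --set nfs.server=<filestore-ip> \
    --set nfs.path=/corridor
```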
Step 4: Deploy Corridor¶
- Create Kubernetes namespace
- Create database in Cloud SQL
- Configure Kustomize overlay
- Create container registry pull secrets
- Apply Kubernetes manifests using Kustomize
- Configure DNS and verify deployment
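The manifests and overlays themselves come from Corridor support (see Installation Steps below), but the shape of this step is roughly the following sketch; the namespace, secret name, registry values, and overlay path are placeholders:

```bash
# Create the namespace and a registry pull secret (values are placeholders)
kubectl create namespace corridor
kubectl create secret docker-registry corridor-registry \
    --namespace corridor \
    --docker-server=<registry-host> \
    --docker-username=<username> \
    --docker-password=<password>

# Apply the environment's Kustomize overlay (path is hypothetical)
kubectl apply -k overlays/production
```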
Installation Steps¶
Contact Corridor support for:
- Complete GKE cluster creation scripts
- Terraform/Infrastructure-as-Code templates
- Kubernetes manifests and Kustomize overlays
- Client deployment configuration examples
Post Installation¶
After deployment:
- Configure DNS: Point domain to Ingress external IP address
- Verify TLS: Ensure certificates are issued by cert-manager (check after 5-10 minutes)
- Create Admin User: Initialize first admin user account
- Test Connectivity: Verify all services are accessible
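As a sketch, the DNS and TLS checks from the list above might look like this (the domain is a placeholder):

```bash
# External IP assigned to the Ingress (use it for the DNS A record)
kubectl get ingress -n <namespace>

# Certificate should show READY=True once cert-manager has issued it
kubectl get certificate -n <namespace>
kubectl describe certificate -n <namespace>

# End-to-end check over HTTPS (replace with your domain)
curl -I https://corridor.example.com
```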
Monitoring and Operations¶
View Logs¶
```bash
# View application logs
kubectl logs -n <namespace> deploy/corridor-app

# View worker logs
kubectl logs -n <namespace> deploy/corridor-worker
```
Access Services¶
```bash
# Access pod shell
kubectl exec -it -n <namespace> deploy/corridor-app -- /bin/bash

# View pod status
kubectl get pods -n <namespace>

# View all resources in namespace
kubectl get all -n <namespace>
```
Update Deployment¶
```bash
# Restart deployment (pull latest image)
kubectl rollout restart deployment corridor-app -n <namespace>

# Check rollout status
kubectl rollout status deployment corridor-app -n <namespace>
```
Common Operations¶
```bash
# View ingress and external IP
kubectl get ingress -n <namespace>

# Check certificate status
kubectl get certificate -n <namespace>

# View persistent volume claims
kubectl get pvc -n <namespace>
```
GKE-Specific Features¶
- GKE Autopilot (optional): Fully managed Kubernetes with optimized configurations
- Node Auto-scaling: Automatic cluster scaling based on workload demands
- Cloud Monitoring Integration: Built-in logging and metrics collection
- Binary Authorization: Enforce deployment policies and image attestation
Cost Optimization¶
- Use Spot or Preemptible Nodes: For non-production workloads (up to 80% savings; see the snippet after this list)
- Enable Cluster Autoscaling: Scale down during off-hours
- Use Committed Use Discounts: For predictable production workloads
- Right-size Node Pools: Use appropriate machine types for workloads
- Use Cloud Storage lifecycle policies for backups
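As one illustration of the Spot VM point above, a separate non-production node pool could be added to the Terraform example later in this guide; this is an assumed sketch, not part of the shipped configuration:

```hcl
# Illustrative: dedicated Spot VM node pool for non-production workloads
resource "google_container_node_pool" "spot_nodes" {
  name     = "spot-node-pool"    # assumed name
  location = var.zone
  cluster  = google_container_cluster.primary.name

  autoscaling {
    min_node_count = 0           # scale to zero when idle
    max_node_count = 5
  }

  node_config {
    machine_type = "n2-standard-4"
    spot         = true          # use `preemptible = true` on older provider versions
  }
}
```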
Security Best Practices¶
- Deploy in private subnets with Cloud NAT
- Enable Binary Authorization for image verification
- Configure Network Policies for pod-to-pod traffic (see the example after this list)
- Use Private GKE Clusters with authorized networks only
- Enable Shielded GKE Nodes for secure boot
- Store secrets in Kubernetes secrets
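As a concrete starting point for the Network Policies item above, a default-deny policy that only admits traffic from the ingress controller might look like this; the namespace and labels are assumptions, and intra-namespace traffic (e.g. workers to Redis) would need its own explicit allow rule:

```yaml
# Deny all ingress by default; allow only traffic from the ingress-nginx namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller-only
  namespace: corridor              # assumed application namespace
spec:
  podSelector: {}                  # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
```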
Example Configurations¶
Example Dockerfile¶
Below is an example Dockerfile for containerizing Corridor applications:
```dockerfile
FROM redhat/ubi8

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

ENV CORRIDOR_HOME=/opt/corridor

# Disable buffering in python to enable faster logging
ENV PYTHONUNBUFFERED=1

# System dependencies
RUN yum update -y \
    && yum install java-1.8.0-openjdk procps-ng https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm -y \
    && yum clean all

# Setup JAVA_HOME so Spark can find the java-1.8.0
ENV JAVA_HOME=/etc/alternatives/jre

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# NOTE: Don't use UV_COMPILE_BYTECODE=1 here. It increases the image build time significantly
# Ref: https://docs.astral.sh/uv/guides/integration/docker/#caching
ENV UV_LINK_MODE=copy

WORKDIR $CORRIDOR_HOME
ENV PATH="/opt/corridor/venv/bin/:$PATH"

COPY uv.lock pyproject.toml $CORRIDOR_HOME/

# Install runtime dependencies
RUN --mount=type=cache,target=/root/.cache/uv \
    uv venv $CORRIDOR_HOME/venv --python 3.11 --seed \
    && source $CORRIDOR_HOME/venv/bin/activate \
    && uv sync --active --frozen --no-install-workspace --no-group dev --no-group test --extra pyspark

# Install corridor packages
RUN --mount=target=/corridor_wheels,type=bind,source=corridor_wheels \
    uv pip install --python $CORRIDOR_HOME/venv --no-cache-dir /corridor_wheels/*.whl

# Expose application ports
EXPOSE 5002 5003

# Health check (adjust for specific service)
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:5002/corr-api || exit 1
```
Building the Image:
```bash
# Build wheel distributions first
uv build --wheel

# Build Docker image with BuildKit
docker build -t corridor/ggx:latest .

# For multi-platform builds
docker buildx build --platform linux/amd64,linux/arm64 -t corridor/ggx:latest .
```
Using Different Services:
The same image can run different Corridor services by overriding the command:
```yaml
# corridor-app
command: ["/opt/corridor/venv/bin/corridor-app", "run"]

# corridor-worker
command: ["/opt/corridor/venv/bin/corridor-worker", "run"]

# corridor-jupyter
command: ["/opt/corridor/venv/bin/corridor-jupyter", "run"]
```
Example Terraform Configuration¶
Below is an example Terraform configuration for provisioning GCP infrastructure for Corridor on GKE:
```hcl
# terraform.tf - Main configuration
terraform {
  required_version = ">= 1.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

# Variables
variable "project_id" {
  description = "GCP Project ID"
  type        = string
}

variable "region" {
  description = "GCP Region"
  type        = string
  default     = "us-central1"
}

variable "zone" {
  description = "GCP Zone"
  type        = string
  default     = "us-central1-a"
}

variable "cluster_name" {
  description = "GKE Cluster Name"
  type        = string
  default     = "corridor-gke"
}

variable "db_password" {
  description = "Password for the Cloud SQL corridor user"
  type        = string
  sensitive   = true
}

# VPC Network
resource "google_compute_network" "vpc" {
  name                    = "corridor-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet" {
  name          = "corridor-subnet"
  ip_cidr_range = "10.0.0.0/24"
  region        = var.region
  network       = google_compute_network.vpc.id

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/16"
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.2.0.0/16"
  }
}

# Cloud NAT for private cluster
resource "google_compute_router" "router" {
  name    = "corridor-router"
  region  = var.region
  network = google_compute_network.vpc.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "corridor-nat"
  router                             = google_compute_router.router.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

# GKE Cluster
resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.zone

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Network configuration
  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Remove default node pool
  remove_default_node_pool = true
  initial_node_count       = 1

  # Maintenance window
  maintenance_policy {
    daily_maintenance_window {
      start_time = "03:00"
    }
  }

  # Add-ons
  addons_config {
    http_load_balancing {
      disabled = false
    }
    horizontal_pod_autoscaling {
      disabled = false
    }
  }
}

# Node Pool
resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-node-pool"
  location   = var.zone
  cluster    = google_container_cluster.primary.name
  node_count = 3

  autoscaling {
    min_node_count = 3
    max_node_count = 10
  }

  node_config {
    machine_type = "n2-standard-4"
    disk_size_gb = 100
    disk_type    = "pd-standard"

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    metadata = {
      disable-legacy-endpoints = "true"
    }
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
}

# Cloud SQL Instance
resource "google_sql_database_instance" "postgres" {
  name             = "corridor-db"
  database_version = "POSTGRES_14"
  region           = var.region

  settings {
    tier              = "db-custom-4-16384"
    availability_type = "REGIONAL"
    disk_size         = 100
    disk_type         = "PD_SSD"

    backup_configuration {
      enabled    = true
      start_time = "03:00"
    }

    ip_configuration {
      ipv4_enabled    = true
      private_network = google_compute_network.vpc.id
    }
  }

  deletion_protection = true
}

resource "google_sql_database" "database" {
  name     = "corridor_db"
  instance = google_sql_database_instance.postgres.name
}

resource "google_sql_user" "user" {
  name     = "corridor"
  instance = google_sql_database_instance.postgres.name
  password = var.db_password # Store in terraform.tfvars or use Secret Manager
}

# Filestore Instance
resource "google_filestore_instance" "nfs" {
  name     = "corridor-nfs"
  location = var.zone
  tier     = "STANDARD"

  file_shares {
    capacity_gb = 1024
    name        = "corridor"
  }

  networks {
    network = google_compute_network.vpc.name
    modes   = ["MODE_IPV4"]
  }
}

# Outputs
output "cluster_name" {
  value       = google_container_cluster.primary.name
  description = "GKE Cluster Name"
}

output "cluster_endpoint" {
  value       = google_container_cluster.primary.endpoint
  description = "GKE Cluster Endpoint"
  sensitive   = true
}

output "cloudsql_connection_name" {
  value       = google_sql_database_instance.postgres.connection_name
  description = "Cloud SQL Connection Name"
}

output "filestore_ip" {
  value       = google_filestore_instance.nfs.networks[0].ip_addresses[0]
  description = "Filestore NFS IP Address"
}
```
Usage:
```bash
# Initialize Terraform
terraform init

# Create terraform.tfvars file
cat > terraform.tfvars <<EOF
project_id  = "your-gcp-project-id"
region      = "us-central1"
zone        = "us-central1-a"
db_password = "secure-password-here"
EOF

# Plan the deployment
terraform plan

# Apply the configuration
terraform apply

# Get kubectl credentials
gcloud container clusters get-credentials corridor-gke \
    --zone us-central1-a \
    --project your-gcp-project-id
```
Notes:
- Store sensitive variables in `terraform.tfvars` (and add it to `.gitignore`)
- Consider using Terraform Cloud or a GCS backend for remote state (see the sketch below)
- Adjust machine types and disk sizes based on your requirements
- Review and customize network policies and firewall rules
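If you opt for the GCS backend mentioned above, the addition to the `terraform` block is small (the bucket name is a placeholder and the bucket must exist beforehand):

```hcl
# Remote state stored in a GCS bucket
terraform {
  backend "gcs" {
    bucket = "your-terraform-state-bucket"  # placeholder bucket name
    prefix = "corridor/gke"
  }
}
```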