Installation on AWS
This guide provides an overview of deploying Corridor on Amazon Web Services. Corridor supports three deployment approaches on AWS.
Use Cases¶
Corridor GenGuardX is a Responsible AI Governance & Testing Automation Platform designed to help organizations deploy and manage Generative AI applications securely and compliantly. Common use cases include:
- IVR Systems: Deploy and monitor interactive voice response systems powered by GenAI
- Agent Assist Tools: Build and govern AI-powered customer service agent assistance applications
- Chatbots: Create, test, and deploy conversational AI chatbots with comprehensive governance
- RAG Applications: Deploy Retrieval-Augmented Generation systems with end-to-end testing and monitoring
- Multi-Agent Workflows: Orchestrate complex agent-based workflows with approval and monitoring capabilities
- Regulatory Compliance: Ensure GenAI applications meet Model Risk Management (MRM), Fair Lending, and other regulatory requirements
Corridor enables organizations to move from experimentation to production deployment of high-ROI GenAI use cases with strong end-to-end pipeline testing, regulatory governance, and continual human-in-the-loop monitoring.
Technical Prerequisites and Requirements¶
Before beginning the deployment, ensure the following technical prerequisites and requirements are met:
Operating System Requirements¶
- EC2 Deployments: Amazon Linux 2, Ubuntu 20.04 LTS or later, or RHEL 8+ for EC2-based installations
- EKS/ECS Deployments: Container-based deployments use Linux-based container images (no OS installation required on host)
- Python: Python 3.11+ is required for EC2 deployments
- Java: Java 8+ is required for Spark worker functionality (EC2 deployments)
Database Requirements¶
- Database Type: PostgreSQL 11.7+ (PostgreSQL 14+ recommended for production)
- Database Storage: Minimum 5 GB for metadata storage (production deployments should allocate 100 GB+)
- Database Configuration:
- Multi-AZ configuration recommended for production environments
- Automated backups enabled
- Encryption at rest enabled
- Alternative Databases: Oracle 19+ or MSSQL 2016+ are supported but PostgreSQL is recommended for AWS deployments
Storage Requirements¶
- File Storage: Minimum 50 GB for file management (production deployments should allocate 500 GB+)
- Storage Types:
- EKS/Fargate: Amazon EFS (Elastic File System) for persistent storage
- EC2: EBS volumes (gp3 recommended) or Amazon S3 for object storage
- Database Storage: EBS-backed storage for RDS instances (gp3 or io1/io2 for high-performance workloads)
- Backup Storage: Amazon S3 for database backups and snapshots
Compute Requirements¶
- Web Application & API Worker: Minimum 4 GB RAM, 4 CPU cores
- Spark Worker: Minimum 16 GB RAM, 8 CPU cores (for data processing workloads)
- Jupyter Notebook: Minimum 4 GB RAM, 4 CPU cores (scales with concurrent users)
- Redis: Minimum 1 GB RAM, 1 CPU core
- Database (RDS): Minimum db.t3.medium (2 vCPU, 4 GB RAM) for development; db.t3.xlarge+ recommended for production
Network Requirements¶
- VPC: Virtual Private Cloud with public and private subnets
- Subnets: Minimum 2 availability zones for production deployments
- Internet Connectivity: NAT Gateway for outbound internet access from private subnets
- DNS: Domain name system (DNS) configuration via Route 53 or external DNS provider
- SSL/TLS: SSL certificates for HTTPS (Let's Encrypt via cert-manager or AWS Certificate Manager)
AWS Service Quotas¶
Ensure sufficient service quotas (limits) are available in your AWS account: - EC2 instances (if using EC2 deployment) - EKS clusters and nodes (if using EKS deployment) - ECS tasks and services (if using Fargate deployment) - RDS instances - EFS file systems - VPCs and subnets - NAT Gateways - Application Load Balancers
For detailed component-specific requirements, refer to the Minimum Requirements documentation.
Prerequisite Skills or Specialized Knowledge¶
Successful deployment of Corridor on AWS requires familiarity with the following skills and knowledge areas:
AWS Knowledge¶
- AWS Fundamentals: Understanding of AWS core services, regions, availability zones, and account management
- AWS Networking: Knowledge of VPC, subnets, security groups, route tables, NAT Gateways, and Internet Gateways
- AWS IAM: Understanding of IAM roles, policies, and service roles for resource access
- AWS Compute Services: Familiarity with at least one of the following:
- EKS: Kubernetes concepts, EKS cluster management, node groups, and Kubernetes resource management
- ECS: ECS clusters, task definitions, services, and Fargate launch types
- EC2: EC2 instance management, AMIs, security groups, and instance types
AWS Services Knowledge¶
Deployers should be familiar with the following AWS services used in Corridor deployments:
- Amazon RDS: Database instance creation, configuration, Multi-AZ setup, and backup management
- Amazon EFS: File system creation, mount targets, and access point configuration (for EKS/Fargate)
- Amazon S3: Bucket creation, IAM policies, and object storage management
- Application Load Balancer: ALB configuration, target groups, listeners, and SSL certificate management
- Route 53: DNS zone management, record creation, and health checks
- AWS Secrets Manager: Secret storage and retrieval for sensitive configuration
- CloudWatch: Logging, monitoring, and alerting configuration
- AWS WAF: Web application firewall rules and DDoS protection (optional)
Scripting and Programming Languages¶
- Bash/Shell Scripting: Required for automation scripts, installation procedures, and system administration tasks
- Python: Python 3.11+ knowledge is helpful for troubleshooting and custom configuration (EC2 deployments)
- Terraform or CloudFormation: Infrastructure-as-Code (IaC) knowledge is recommended for automated deployments
- YAML/JSON: Understanding of YAML and JSON syntax for configuration files (Kubernetes manifests, ECS task definitions, etc.)
Additional Skills (by Deployment Method)¶
For EKS Deployments: - Kubernetes: Strong understanding of Kubernetes concepts including pods, services, deployments, ConfigMaps, Secrets, and Ingress - kubectl: Proficiency with kubectl command-line tool - eksctl: Familiarity with eksctl for EKS cluster management (optional but recommended) - Kustomize or Helm: Knowledge of Kubernetes package management tools
For ECS Fargate Deployments: - Docker: Understanding of container concepts and Docker images - ECS CLI: Basic familiarity with ECS command-line tools
For EC2 Deployments: - Linux System Administration: SSH access, package management, service management (systemd), and file system operations - Network Configuration: Understanding of network interfaces, firewall rules, and routing
Recommended Experience Level¶
- Minimum: 1-2 years of AWS experience with hands-on deployment experience
- Recommended: 3+ years of AWS experience with production deployment experience
- For EKS: Additional 1+ years of Kubernetes experience
Environment Configuration¶
Before deploying Corridor, ensure the following environment configuration is in place:
AWS Account Requirements¶
- Active AWS Account: A valid AWS account with appropriate billing and payment methods configured
- IAM Permissions: AWS account or IAM user with sufficient permissions to create and manage:
- EC2 instances, EKS clusters, or ECS clusters (depending on deployment method)
- RDS database instances
- VPC, subnets, security groups, and networking resources
- EFS file systems or S3 buckets
- Application Load Balancers
- IAM roles and policies
- Route 53 hosted zones (if using Route 53)
- CloudWatch logs and metrics
- Service Quotas: Verify that AWS service quotas (limits) are sufficient for your deployment size
- Billing Alerts: Configure AWS billing alerts to monitor costs
Operating System Configuration¶
- EC2 Deployments:
- EC2 instances must be launched with a supported AMI (Amazon Linux 2, Ubuntu 20.04+, or RHEL 8+)
- SSH access configured with key pairs or AWS Systems Manager Session Manager
- System updates applied and security patches installed
- EKS/ECS Deployments:
- No OS configuration required on host systems (container-based)
- Container images must be based on Linux
Licensing Requirements¶
- Corridor Software: Valid Corridor license and access to Corridor installation bundle or container images
- Container Registry Access: Access credentials for Corridor container registry (for EKS/ECS deployments)
- Third-Party Software:
- PostgreSQL database (managed by AWS RDS - no separate license required)
- Redis (open-source, no license required)
- Python 3.11+ (open-source)
- Java 8+ (open-source OpenJDK)
- Spark 3.3+ (open-source, if using Spark workers)
DNS Configuration¶
- Domain Name: A registered domain name for accessing the Corridor instance (e.g.,
corridor.example.com) - DNS Provider: Either:
- AWS Route 53: Hosted zone configured in Route 53
- External DNS Provider: DNS provider (e.g., Cloudflare, GoDaddy) with ability to create A and CNAME records
- DNS Records: Ability to create DNS records pointing to the Application Load Balancer
- SSL Certificate: SSL/TLS certificate for HTTPS (can be obtained via Let's Encrypt or AWS Certificate Manager)
Network Configuration¶
- VPC Setup:
- VPC with CIDR block (e.g., 10.0.0.0/16)
- Public subnets for load balancers and NAT Gateways
- Private subnets for application components and databases
- Minimum 2 availability zones for production deployments
- Security Groups: Security group rules configured for:
- Application Load Balancer (inbound HTTPS/HTTP from internet)
- Application components (inbound from ALB, outbound to internet)
- Database (inbound from application components only)
- EFS mount targets (inbound from application components)
- NAT Gateway: NAT Gateway configured in public subnet for outbound internet access from private subnets
Access and Authentication¶
- AWS CLI: AWS CLI installed and configured with appropriate credentials
- kubectl (EKS only): kubectl CLI installed and configured to access EKS cluster
- eksctl (EKS only, optional): eksctl CLI installed for EKS cluster management
- SSH Access (EC2 only): SSH key pair created and configured for EC2 instance access
- Container Registry: Access credentials configured for pulling Corridor container images (EKS/ECS deployments)
Security Configuration¶
This section provides comprehensive security guidance for deploying Corridor on AWS, including access controls, encryption, network security, and data protection measures.
AWS Account Root User Warning¶
Do Not Use AWS Account Root User
CRITICAL SECURITY REQUIREMENT: The Corridor deployment and operation must not require the use of AWS account root user credentials. Using the root account for deployment or operations poses significant security risks.
Required Actions: - Create IAM users or roles with appropriate permissions for deployment and operations - Never use root account credentials for day-to-day operations - Enable MFA (Multi-Factor Authentication) on the root account and store credentials securely - Use the root account only for initial account setup and critical account-level operations
For more information on securing your root account, refer to the AWS Security Best Practices documentation.
Principle of Least Privilege¶
All IAM roles, policies, and access configurations must follow the principle of least privilege, granting only the minimum permissions necessary for each component to function.
Guidance for IAM Roles and Policies:
- EKS Cluster Service Role: Grants permissions for EKS to create and manage AWS resources on your behalf
- Required permissions:
eks:CreateCluster,eks:DescribeCluster,ec2:CreateNetworkInterface,ec2:DescribeNetworkInterfaces -
Should NOT include: Full EC2 access, S3 write access, or other unnecessary permissions
-
EKS Node Group Role: Allows EKS nodes to join the cluster and access required AWS services
- Required permissions:
ec2:DescribeInstances,ec2:DescribeInstanceTypes,ecr:GetAuthorizationToken,ecr:BatchCheckLayerAvailability,ecr:GetDownloadUrlForLayer,ecr:BatchGetImage -
Should NOT include: S3 write access (unless specifically required), RDS access, or other unrelated services
-
ECS Task Execution Role: Allows ECS tasks to pull container images and write logs
- Required permissions:
ecr:GetAuthorizationToken,ecr:BatchCheckLayerAvailability,ecr:GetDownloadUrlForLayer,ecr:BatchGetImage,logs:CreateLogStream,logs:PutLogEvents -
Should NOT include: S3 access, RDS access, or other unnecessary permissions
-
ECS Task Role: Grants permissions for application code running in containers
- Should be scoped to only the specific AWS services the application needs (e.g., S3 read/write for specific buckets, Secrets Manager read for specific secrets)
-
Should NOT include: Broad S3 access, RDS admin access, or other overly permissive policies
-
EC2 Instance Profile: For EC2 deployments, grants permissions for the instance to access AWS services
- Should be scoped to only required services (e.g., S3 read/write for specific buckets, Secrets Manager read)
- Should NOT include: Full EC2 access, RDS admin access, or other unnecessary permissions
Best Practices: - Review and audit IAM policies regularly - Use AWS IAM Policy Simulator to test permissions before deployment - Implement resource-level permissions (e.g., restrict S3 access to specific buckets) - Use condition keys to further restrict access (e.g., IP address restrictions, time-based access) - Separate read and write permissions where possible - Document the purpose of each permission granted
Public Resources Documentation¶
Corridor deployments on AWS are designed to minimize public exposure. The following resources may be configured as public:
Public Resources:
- Application Load Balancer (ALB): The ALB is deployed in public subnets and accepts inbound HTTPS/HTTP traffic from the internet. This is required for users to access the Corridor web interface.
- Security: ALB should be configured with security groups that only allow HTTPS (port 443) and optionally HTTP (port 80) for redirects
-
Recommendation: Use AWS WAF in front of the ALB for additional protection
-
Route 53 Public Hosted Zone (if using Route 53): Public DNS records are required for domain resolution
- Security: DNS records themselves are public by design; ensure DNS records point to the ALB, not directly to application instances
Private Resources (Not Publicly Accessible):
- EC2 Instances: Deployed in private subnets, not directly accessible from the internet
- EKS Nodes: Deployed in private subnets
- ECS Tasks: Deployed in private subnets
- RDS Database: Deployed in private subnets, accessible only from application components
- EFS File Systems: Mount targets in private subnets, accessible only from application components
- S3 Buckets: By default, S3 buckets are private. Bucket policies should NOT allow public access unless specifically required for a use case (which is not typical for Corridor deployments)
No Public S3 Buckets Required: Corridor deployments do not require any S3 buckets with public access policies. All S3 buckets used for file storage, backups, or other purposes should be configured as private with appropriate IAM policies for access control.
IAM Roles and Policies Purpose¶
The following IAM roles and policies are created during Corridor deployment. Each role serves a specific purpose:
EKS Deployments:
- EKS Cluster Service Role (
AmazonEKSClusterPolicy,AmazonEKSVPCResourceController) - Purpose: Allows the EKS service to create and manage AWS resources (VPC, ENIs, security groups) on your behalf
- Used by: EKS service when creating/managing the cluster
-
Scope: Cluster-level operations
-
EKS Node Group Role (
AmazonEKSWorkerNodePolicy,AmazonEKS_CNI_Policy,AmazonEC2ContainerRegistryReadOnly,AmazonEKS_EFS_CSI_Driver_Policy) - Purpose: Allows EKS worker nodes to join the cluster, pull container images from ECR, and access EFS for persistent storage
- Used by: EC2 instances in the EKS node group
-
Scope: Node-level operations
-
EKS Pod Service Account Role (Application-specific)
- Purpose: Grants permissions for application pods to access AWS services (S3, Secrets Manager, etc.)
- Used by: Corridor application containers running in pods
- Scope: Application-level operations, scoped to specific services and resources
ECS Fargate Deployments:
- ECS Task Execution Role (
AmazonECSTaskExecutionRolePolicy) - Purpose: Allows ECS tasks to pull container images from ECR and write logs to CloudWatch
- Used by: ECS service when starting tasks
-
Scope: Task lifecycle operations
-
ECS Task Role (Application-specific)
- Purpose: Grants permissions for application code running in containers to access AWS services
- Used by: Corridor application containers
- Scope: Application runtime operations, scoped to specific services
EC2 Deployments:
- EC2 Instance Profile Role (Application-specific)
- Purpose: Grants permissions for the EC2 instance to access AWS services (S3, Secrets Manager, CloudWatch)
- Used by: Corridor application components running on the EC2 instance
- Scope: Instance-level operations
Common Policies:
- Secrets Manager Read Policy: Allows reading specific secrets from AWS Secrets Manager
- Purpose: Retrieve database credentials and other sensitive configuration
-
Scope: Read-only access to specific secret ARNs
-
S3 Access Policy: Allows read/write access to specific S3 buckets
- Purpose: Store and retrieve files, backups, and artifacts
- Scope: Bucket-level or object-level permissions
Keys Purpose and Location¶
The following keys and credentials are created or used during Corridor deployment:
SSH Key Pairs (EC2 Deployments Only):
- Purpose: Secure access to EC2 instances for initial setup and troubleshooting
- Location:
- Public key: Stored in AWS EC2 Key Pairs service, associated with EC2 instances
- Private key: Stored securely on the deployer's local machine (never stored in AWS)
- Usage: Used for SSH access to EC2 instances
- Security: Private keys should be stored with appropriate file permissions (chmod 600) and never shared or committed to version control
Database Credentials:
- Purpose: Authentication to the RDS PostgreSQL database
- Location:
- Primary Storage: AWS Secrets Manager (recommended)
- Alternative: Environment variables or configuration files (less secure)
- Usage: Used by Corridor application components to connect to the database
- Security:
- Rotate credentials regularly (every 90 days recommended)
- Use Secrets Manager for automatic rotation if possible
- Never hardcode credentials in application code or configuration files
Container Registry Credentials (EKS/ECS Deployments):
- Purpose: Authentication to pull Corridor container images from the container registry
- Location:
- EKS: Stored as Kubernetes image pull secrets in the cluster
- ECS: Stored in ECS task definition or retrieved via IAM role (if using ECR)
- Usage: Used by Kubernetes/ECS to authenticate and pull container images
- Security:
- Use IAM roles for ECR access when possible (no credentials needed)
- If using external registries, store credentials in AWS Secrets Manager
SSL/TLS Certificates:
- Purpose: Encrypt HTTPS traffic between users and the application
- Location:
- EKS: Managed by cert-manager, stored as Kubernetes secrets
- ECS/EC2: AWS Certificate Manager (ACM) or Let's Encrypt certificates
- Usage: Used by Application Load Balancer for SSL/TLS termination
- Security: Certificates are automatically renewed by cert-manager or ACM
API Keys and Application Secrets:
- Purpose: Authentication for external integrations (LLM providers, data sources, etc.)
- Location: AWS Secrets Manager (recommended) or environment variables
- Usage: Used by Corridor application components for external API access
- Security: Rotate regularly and use Secrets Manager for centralized management
Secrets Management¶
Corridor deployments use AWS Secrets Manager for secure storage and management of sensitive credentials and configuration.
Secrets Stored in AWS Secrets Manager:
- Database Credentials: RDS PostgreSQL username and password
- Application Secrets: API keys, tokens, and other sensitive configuration
- External Integration Credentials: LLM provider API keys, data source credentials
Maintaining Stored Secrets:
-
Initial Secret Creation:
-
Secret Rotation:
- Manual Rotation: Update secrets via AWS Secrets Manager console or CLI
- Automatic Rotation: Configure RDS automatic password rotation (if supported)
-
Rotation Schedule: Rotate secrets every 90 days or as per organizational policy
-
Secret Access:
- Application components retrieve secrets at runtime via AWS SDK
- Never log or expose secret values in application logs
-
Use IAM policies to restrict secret access to specific roles/users
-
Secret Updates:
- Update secrets via AWS Secrets Manager console or CLI
- Application components should handle secret updates gracefully (may require restart)
-
Test secret updates in non-production environments first
-
Secret Deletion:
- Use recovery window (7-30 days) when deleting secrets
- Ensure applications are updated before permanently deleting secrets
- Document all secrets and their purposes
Best Practices: - Use separate secrets for different environments (dev, staging, production) - Implement secret versioning to track changes - Monitor secret access via CloudTrail - Use resource-based policies to restrict secret access - Regularly audit secret access permissions
Sensitive Customer Data Storage¶
Corridor stores customer data in the following locations. All storage locations implement encryption at rest and in transit:
Database Storage (Amazon RDS PostgreSQL):
- Location: RDS PostgreSQL database instance
- Data Stored:
- User account information (usernames, email addresses, hashed passwords)
- Application metadata (pipelines, prompts, models, RAG configurations)
- Evaluation results and reports
- Audit logs and activity history
- Encryption: Encryption at rest enabled (AWS KMS), encryption in transit (TLS)
- Access Control: Database accessible only from application components via security groups and IAM authentication
File Storage (Amazon EFS or EBS):
- Location:
- EKS/Fargate: Amazon EFS file system
- EC2: EBS volumes attached to EC2 instances
- Data Stored:
- Uploaded files and datasets
- Generated artifacts and outputs
- Temporary files and cache
- Encryption: Encryption at rest enabled (AWS KMS for EFS, EBS encryption for volumes), encryption in transit (TLS for EFS)
- Access Control: File systems accessible only from application components via security groups and mount targets
Object Storage (Amazon S3):
- Location: Amazon S3 buckets (optional, for backups and large files)
- Data Stored:
- Database backups and snapshots
- Large files and datasets
- Exported reports and artifacts
- Encryption: Server-side encryption enabled (SSE-S3 or SSE-KMS), encryption in transit (HTTPS)
- Access Control: Buckets are private; access controlled via IAM policies and bucket policies
Application Memory (Temporary):
- Location: Application server memory (Redis, application cache)
- Data Stored:
- Session data
- Temporary task data
- Cache entries
- Encryption: Data in memory is not encrypted (standard for in-memory storage)
- Access Control: Redis accessible only from application components via security groups
Log Storage (Amazon CloudWatch Logs):
- Location: CloudWatch Logs
- Data Stored:
- Application logs
- Access logs
- Error logs
- Encryption: Encryption at rest enabled (AWS KMS), encryption in transit (TLS)
- Access Control: Log groups accessible only via IAM policies
Data Residency: Customer data remains within the selected AWS region unless explicitly configured for cross-region replication (backups only).
Data Encryption Configurations¶
Corridor deployments implement encryption at rest and in transit for all data storage and communication:
Encryption at Rest:
- Amazon RDS PostgreSQL:
- Encryption Method: AWS KMS (Key Management Service) encryption
- Configuration: Encryption enabled during RDS instance creation
- Key Management: Customer-managed KMS keys (CMK) or AWS-managed keys
-
Scope: All database data, logs, backups, and snapshots
-
Amazon EFS:
- Encryption Method: AWS KMS encryption
- Configuration: Encryption enabled during EFS file system creation
- Key Management: Customer-managed KMS keys (CMK) or AWS-managed keys
-
Scope: All files stored in EFS
-
Amazon EBS Volumes (EC2 deployments):
- Encryption Method: EBS encryption using AWS KMS
- Configuration: Encryption enabled during volume creation or via instance-level encryption
- Key Management: Customer-managed KMS keys (CMK) or AWS-managed keys
-
Scope: All data on EBS volumes, including root volumes and data volumes
-
Amazon S3:
- Encryption Method: Server-Side Encryption (SSE-S3 or SSE-KMS)
- Configuration: Default encryption enabled on S3 buckets
- Key Management: AWS-managed keys (SSE-S3) or customer-managed KMS keys (SSE-KMS)
-
Scope: All objects stored in S3 buckets
-
Amazon CloudWatch Logs:
- Encryption Method: AWS KMS encryption
- Configuration: Encryption enabled on log groups
- Key Management: Customer-managed KMS keys (CMK) or AWS-managed keys
- Scope: All log data stored in CloudWatch Logs
Encryption in Transit:
- HTTPS/TLS: All communication between users and the application is encrypted using TLS 1.2 or higher
- SSL/TLS Termination: Application Load Balancer handles SSL/TLS termination
-
Certificate Management: Certificates managed via AWS Certificate Manager (ACM) or Let's Encrypt (via cert-manager)
-
Database Connections: All connections to RDS PostgreSQL are encrypted using TLS
- Configuration: TLS required in database connection strings
-
Certificate Validation: RDS SSL certificates validated
-
EFS Connections: All connections to EFS are encrypted using TLS
-
Configuration: TLS encryption enabled for EFS mount operations
-
S3 Connections: All connections to S3 use HTTPS
-
Configuration: S3 endpoints use HTTPS by default
-
Internal Communication: Application components communicate over encrypted channels within the VPC
Key Management:
- AWS KMS: Use AWS Key Management Service for managing encryption keys
- Key Rotation: Enable automatic key rotation for KMS keys (annual rotation)
- Key Access: Restrict KMS key access via IAM policies following least privilege principles
- Audit: Monitor key usage via CloudTrail
Network Configuration Details¶
Corridor deployments involve multiple network components working together to provide secure, isolated networking:
Virtual Private Cloud (VPC):
- Purpose: Provides isolated network environment for Corridor resources
- Configuration:
- CIDR block: Typically 10.0.0.0/16 (65,536 IP addresses)
- DNS resolution and DNS hostnames enabled
- Internet gateway attached for public subnet internet access
- Location: Single AWS region (multi-region deployments use separate VPCs per region)
Subnets:
- Public Subnets:
- Purpose: Host Application Load Balancer and NAT Gateways
- Configuration:
- CIDR blocks: e.g., 10.0.1.0/24, 10.0.2.0/24 (one per availability zone)
- Route table: Routes internet traffic (0.0.0.0/0) to Internet Gateway
- Auto-assign public IP: Enabled
-
Location: Minimum 2 availability zones for high availability
-
Private Subnets:
- Purpose: Host application components, databases, and internal resources
- Configuration:
- CIDR blocks: e.g., 10.0.10.0/24, 10.0.11.0/24 (one per availability zone)
- Route table: Routes internet traffic (0.0.0.0/0) to NAT Gateway
- Auto-assign public IP: Disabled
- Location: Minimum 2 availability zones for high availability
Security Groups:
- ALB Security Group:
- Inbound Rules:
- HTTPS (443) from 0.0.0.0/0 (internet)
- HTTP (80) from 0.0.0.0/0 (for redirects to HTTPS)
- Outbound Rules:
- All traffic to application security group (ports 80, 443)
-
Purpose: Controls access to the Application Load Balancer
-
Application Security Group (EKS/ECS/EC2):
- Inbound Rules:
- HTTP (80) from ALB security group
- HTTPS (443) from ALB security group
- Outbound Rules:
- HTTPS (443) to internet (0.0.0.0/0) for external API calls
- TCP (5432) to RDS security group for database access
- TCP (2049) to EFS security group for file system access
-
Purpose: Controls access to application components
-
RDS Security Group:
- Inbound Rules:
- PostgreSQL (5432) from application security group only
- Outbound Rules: None (database does not initiate connections)
-
Purpose: Restricts database access to application components only
-
EFS Security Group:
- Inbound Rules:
- NFS (2049) from application security group only
- Outbound Rules: None
- Purpose: Restricts file system access to application components only
Network ACLs (NACLs):
- Purpose: Additional network-level security controls (stateless firewall)
- Configuration:
- Default NACLs allow all traffic (can be customized for stricter controls)
- Custom NACLs can be created for subnet-level restrictions
- Best Practice: Use security groups for primary access control (stateful), NACLs for additional network-level restrictions if needed
Route Tables:
- Public Route Table:
- Routes:
- 10.0.0.0/16 → Local (VPC internal)
- 0.0.0.0/0 → Internet Gateway (public internet)
-
Associated With: Public subnets
-
Private Route Table:
- Routes:
- 10.0.0.0/16 → Local (VPC internal)
- 0.0.0.0/0 → NAT Gateway (outbound internet)
- Associated With: Private subnets
NAT Gateway:
- Purpose: Provides outbound internet access for resources in private subnets
- Configuration:
- Deployed in public subnet
- Elastic IP address associated
- Route table configured to route private subnet traffic to NAT Gateway
- High Availability: Deploy NAT Gateway in each availability zone for redundancy (recommended for production)
VPC Endpoints (Optional):
- Purpose: Private connectivity to AWS services without internet gateway or NAT Gateway
- Services: Can be configured for S3, Secrets Manager, CloudWatch Logs
- Benefits: Reduced data transfer costs, improved security, lower latency
Instance Metadata Service (IMDS) Configuration¶
Corridor deployments support disabling Instance Metadata Service Version 1 (IMDSv1) to enhance security.
IMDSv1 Disable Support:
- Requirement: All components hosted in the customer's AWS account must support disabling IMDSv1
- Compliance: Corridor uses the latest AWS SDK versions, which automatically use IMDSv2 (Token-based authentication) when available
Configuration:
- EKS Deployments:
- EKS node groups can be configured to require IMDSv2
- Set
metadata_options.http_tokenstorequiredin node group configuration -
Application pods use IMDSv2 automatically via AWS SDK
-
ECS Fargate Deployments:
- ECS tasks automatically use IMDSv2 when available
-
No additional configuration required
-
EC2 Deployments:
- EC2 instances can be configured to require IMDSv2
- Set
MetadataOptions.HttpTokenstorequiredduring instance launch - Application code uses IMDSv2 automatically via AWS SDK
Verification:
# Verify IMDSv1 is disabled (should return 401 Unauthorized)
curl http://169.254.169.254/latest/meta-data/
# Verify IMDSv2 works (should return metadata)
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/
AWS SDK Compatibility:
- Corridor uses AWS SDK versions that support IMDSv2 automatically
- SDKs automatically fall back to IMDSv2 when IMDSv1 is disabled
- No code changes required when disabling IMDSv1
Typical Customer Deployment Overview¶
A typical Corridor deployment on AWS includes the following resources and components:
Infrastructure Resources¶
- Virtual Private Cloud (VPC): Isolated network environment with public and private subnets
- Subnets: Multiple availability zones for high availability (minimum 2 AZs for production)
- NAT Gateway: For outbound internet access from private subnets
- Internet Gateway: For public subnet internet access
- Route Tables: Network routing configuration
- Security Groups: Firewall rules for network access control
- Network ACLs: Additional network-level security controls
Compute Resources¶
- EKS Cluster (Kubernetes option): Managed Kubernetes cluster with node groups
- ECS Cluster (Fargate option): Serverless container orchestration cluster
- EC2 Instances (VM option): Virtual machines running Corridor components
- Application Load Balancer (ALB): HTTP(S) load balancing and SSL termination
- Target Groups: Backend service routing configuration
Storage Resources¶
- Amazon RDS PostgreSQL: Managed database for metadata storage (Multi-AZ for production)
- Amazon EFS: Network file system for persistent storage (EKS/Fargate deployments)
- Amazon S3: Object storage for file management and backups
- EBS Volumes: Block storage for EC2 instances
Application Components¶
- corridor-app: Web application server providing the analytical UI
- corridor-api: API server for business logic
- corridor-worker-api: Celery worker for asynchronous API tasks
- corridor-worker-spark: Celery worker for asynchronous Spark tasks
- corridor-jupyter: Jupyter Notebook server for analytical use
- Redis: Message queue for task orchestration
Supporting Services¶
- Route 53: Domain name management and DNS configuration
- AWS Secrets Manager: Secure storage for sensitive configuration and credentials
- CloudWatch: Logging, monitoring, and alerting
- AWS WAF: Web application firewall for DDoS protection
- CloudFront: Content delivery network for static asset caching (optional)
- cert-manager: Automated TLS certificate management (EKS deployments)
Post-Deployment Resources¶
- IAM Roles and Policies: Service roles for AWS resource access
- Service Accounts: Kubernetes service accounts (EKS deployments)
- ConfigMaps and Secrets: Kubernetes configuration management (EKS deployments)
- Ingress Resources: HTTP routing configuration (EKS deployments)
Upon completion of deployment, customers have a fully functional Corridor instance accessible via HTTPS with all components running, database initialized, and monitoring configured.
Deployment Options¶
Corridor supports multiple deployment configurations on AWS to meet different availability and geographic requirements:
Single-Availability Zone (Single-AZ)¶
A single-AZ deployment runs all resources within a single AWS Availability Zone. This configuration is suitable for:
- Development and testing environments
- Non-production workloads
- Cost-optimized deployments
- Scenarios where downtime tolerance is acceptable
Characteristics: - Lower cost (no cross-AZ data transfer fees) - Simpler network configuration - Single point of failure for infrastructure components - RDS can be configured as single-AZ (not recommended for production)
Multi-Availability Zone (Multi-AZ)¶
A multi-AZ deployment distributes resources across multiple Availability Zones within a single AWS Region. This configuration provides:
- High availability and fault tolerance
- Automatic failover capabilities
- Production-ready resilience
- Data redundancy
Characteristics: - RDS Multi-AZ deployment with automatic failover - Application Load Balancer across multiple AZs - EFS mount targets in each AZ - EKS nodes distributed across AZs - Higher cost due to cross-AZ data transfer
Recommended for: Production environments requiring high availability.
Multi-Region¶
A multi-region deployment spans multiple AWS Regions for disaster recovery and geographic distribution. This configuration provides:
- Geographic redundancy and disaster recovery
- Reduced latency for global users
- Compliance with data residency requirements
- Cross-region failover capabilities
Characteristics: - Primary and secondary region deployments - Cross-region data replication - Route 53 health checks and failover routing - More complex networking and data synchronization - Highest cost due to cross-region data transfer
Recommended for: Enterprise deployments requiring disaster recovery and global presence.
All three deployment options (EKS, ECS Fargate, and EC2) support single-AZ, multi-AZ, and multi-region configurations. The specific implementation details vary by deployment method and are documented in the respective installation guides.
Expected Deployment Time¶
The expected time to complete a Corridor deployment on AWS is 30 minutes for a standard single-AZ or multi-AZ deployment. This includes:
- Infrastructure provisioning (VPC, subnets, security groups)
- Database setup (RDS PostgreSQL)
- Storage configuration (EFS or S3)
- Application component deployment
- Initial configuration and verification
Note: Multi-region deployments may take longer (typically 45-60 minutes) due to additional networking and replication setup requirements.
Actual deployment time may vary based on: - AWS service provisioning times - Network configuration complexity - Size of deployment (number of nodes/instances) - Automation tooling used (Terraform, CloudFormation, etc.)
Supported AWS Regions¶
Corridor can be deployed in all AWS Regions where the required services (EKS, ECS, EC2, RDS, EFS, S3) are available. Supported regions include:
US Regions¶
- US East (N. Virginia) -
us-east-1 - US East (Ohio) -
us-east-2 - US West (N. California) -
us-west-1 - US West (Oregon) -
us-west-2
Europe Regions¶
- Europe (Ireland) -
eu-west-1 - Europe (London) -
eu-west-2 - Europe (Frankfurt) -
eu-central-1 - Europe (Paris) -
eu-west-3 - Europe (Stockholm) -
eu-north-1 - Europe (Milan) -
eu-south-1
Asia Pacific Regions¶
- Asia Pacific (Tokyo) -
ap-northeast-1 - Asia Pacific (Seoul) -
ap-northeast-2 - Asia Pacific (Singapore) -
ap-southeast-1 - Asia Pacific (Sydney) -
ap-southeast-2 - Asia Pacific (Mumbai) -
ap-south-1 - Asia Pacific (Hong Kong) -
ap-east-1
Other Regions¶
- Canada (Central) -
ca-central-1 - South America (São Paulo) -
sa-east-1 - Middle East (Bahrain) -
me-south-1 - Middle East (UAE) -
me-central-1 - Africa (Cape Town) -
af-south-1
Note: Some AWS services may have limited availability in certain regions. Verify service availability in your target region before deployment. For the most current list of supported regions and service availability, refer to the AWS Regional Services List.
Cost and Licensing¶
This section provides information about AWS billable services, cost models, and licensing costs for Corridor deployments on AWS.
Billable AWS Services¶
The following AWS services are used in Corridor deployments. Each service is categorized as mandatory or optional:
Mandatory Services¶
Compute Services:
- Amazon EKS (Kubernetes deployments) - Mandatory
- Cost Model: Cluster management fee ($0.10/hour per cluster) + EC2 node costs
- Billing: Hourly charges for cluster management and node instances
-
Notes: Required for EKS-based deployments
-
Amazon ECS Fargate (Serverless deployments) - Mandatory
- Cost Model: Pay-per-use based on vCPU and memory resources consumed
- Billing: Charges for vCPU-hours and GB-hours of memory
-
Notes: Required for Fargate-based deployments
-
Amazon EC2 (VM deployments) - Mandatory
- Cost Model: On-demand or reserved instance pricing based on instance type
- Billing: Hourly charges for running instances
- Notes: Required for EC2-based deployments
Database Services:
- Amazon RDS PostgreSQL - Mandatory
- Cost Model: Instance pricing based on instance class, storage, and Multi-AZ configuration
- Billing: Hourly charges for database instances + storage costs
- Notes: Required for all deployment types. Multi-AZ adds ~2x cost for high availability
Storage Services:
- Amazon EFS (EKS/Fargate deployments) - Mandatory
- Cost Model: Pay-per-use based on storage capacity (GB-month) + data transfer costs
- Billing: Monthly charges for storage + data transfer fees
-
Notes: Required for EKS and Fargate deployments for persistent file storage
-
Amazon EBS (EC2 deployments) - Mandatory
- Cost Model: Volume pricing based on volume type and size (GB-month)
- Billing: Monthly charges for provisioned storage
-
Notes: Required for EC2 deployments for local storage
-
Amazon S3 - Mandatory
- Cost Model: Pay-per-use based on storage (GB-month), requests, and data transfer
- Billing: Monthly charges for storage, PUT/GET requests, and data transfer
- Notes: Required for backups and optional file storage
Networking Services:
- Application Load Balancer (ALB) - Mandatory
- Cost Model: Hourly charges + data processing charges (per GB)
- Billing: Hourly charges for load balancer + charges for data processed
-
Notes: Required for all deployments to provide HTTPS access
-
NAT Gateway - Mandatory
- Cost Model: Hourly charges + data processing charges (per GB)
- Billing: Hourly charges + charges for data processed through NAT Gateway
-
Notes: Required for outbound internet access from private subnets
-
VPC - Mandatory (No additional charge)
- Cost Model: No additional charge for VPC itself
- Billing: Included in AWS account (no separate charge)
-
Notes: Required for network isolation. Charges apply to resources within VPC
-
Data Transfer - Mandatory (Usage-based)
- Cost Model: Charges for data transfer out of AWS (per GB)
- Billing: Charges for internet data transfer, cross-AZ transfer, cross-region transfer
- Notes: Charges apply based on actual data transfer usage
Optional Services¶
DNS and Domain Services:
- Amazon Route 53 - Optional
- Cost Model: Hosted zone charges ($0.50/month) + query charges ($0.40 per million queries)
- Billing: Monthly charges for hosted zones + per-query charges
- Notes: Optional if using external DNS provider (e.g., Cloudflare, GoDaddy)
Security and Compliance Services:
- AWS Secrets Manager - Optional
- Cost Model: $0.40 per secret per month + $0.05 per 10,000 API calls
- Billing: Monthly charges per secret + API call charges
-
Notes: Optional but recommended for secure credential management. Can use environment variables or configuration files instead
-
AWS WAF - Optional
- Cost Model: $5 per web ACL per month + $1 per million requests
- Billing: Monthly charges for web ACLs + per-request charges
- Notes: Optional but recommended for production deployments for DDoS protection
Monitoring and Logging Services:
- Amazon CloudWatch - Optional
- Cost Model: Free tier includes 10 custom metrics, 5 GB log ingestion, 10 GB log storage. Beyond free tier: $0.30 per custom metric, $0.50 per GB log ingestion, $0.03 per GB log storage
- Billing: Charges for metrics, log ingestion, and log storage beyond free tier
-
Notes: Optional but recommended for production monitoring. Basic monitoring is included with EC2/EKS
-
CloudWatch Logs Insights - Optional
- Cost Model: $0.005 per GB of data scanned
- Billing: Charges for log data scanned during queries
- Notes: Optional for advanced log analysis
Content Delivery:
- Amazon CloudFront - Optional
- Cost Model: Pay-per-use based on data transfer (per GB) and requests (per 10,000 requests)
- Billing: Charges for data transfer out + request charges
- Notes: Optional for static asset caching and CDN functionality
Key Management:
- AWS KMS - Optional (but recommended for encryption)
- Cost Model: $1 per month per customer-managed key + $0.03 per 10,000 API requests
- Billing: Monthly charges for keys + API request charges
- Notes: Optional if using AWS-managed keys (free). Required for customer-managed encryption keys
Certificate Management:
- AWS Certificate Manager (ACM) - Optional (but recommended)
- Cost Model: Free for public SSL/TLS certificates
- Billing: No charge for public certificates
- Notes: Optional if using Let's Encrypt (via cert-manager). Recommended for simplified certificate management
Cost Model¶
Corridor deployments on AWS follow a pay-as-you-go cost model with the following cost components:
Infrastructure Costs:
- Compute: Costs vary by deployment method:
- EKS: Cluster management fee + EC2 node costs (on-demand or reserved instances)
- ECS Fargate: Pay-per-use based on vCPU and memory consumed
-
EC2: On-demand or reserved instance pricing based on instance type and size
-
Database: RDS PostgreSQL instance costs based on:
- Instance class (e.g., db.t3.medium, db.t3.xlarge)
- Storage type and size (gp3, io1, io2)
- Multi-AZ configuration (adds ~2x cost for high availability)
-
Backup storage (first 100% of provisioned storage is free)
-
Storage:
- EFS: Pay-per-use based on storage capacity (Standard tier: ~$0.30/GB-month)
- EBS: Provisioned storage costs (gp3: ~$0.08/GB-month)
- S3: Tiered pricing based on storage class (Standard: ~$0.023/GB-month)
Network Costs:
- Load Balancer: Hourly charges (~$0.0225/hour) + data processing charges (~$0.008/GB)
- NAT Gateway: Hourly charges (~$0.045/hour) + data processing charges (~$0.045/GB)
- Data Transfer:
- Internet data transfer out: First 100 GB/month free, then ~$0.09/GB
- Cross-AZ data transfer: ~$0.01/GB
- Cross-region data transfer: ~$0.02/GB
Operational Costs:
- Monitoring: CloudWatch charges for metrics, logs, and alarms (free tier available)
- Backup: RDS backup storage costs (first 100% of provisioned storage free)
- Secrets Management: Secrets Manager charges if used (~$0.40/secret/month)
Cost Optimization Recommendations:
- Use Reserved Instances: For predictable workloads, use EC2 or RDS Reserved Instances to save up to 75% compared to on-demand pricing
- Right-Size Resources: Start with minimum required resources and scale based on actual usage
- Use Spot Instances: For non-production environments, consider EC2 Spot Instances for EKS nodes (up to 90% savings)
- Optimize Storage: Use appropriate storage classes (S3 Intelligent-Tiering, EFS Infrequent Access) for cost optimization
- Monitor Costs: Set up AWS Cost Explorer and billing alerts to track and optimize spending
- Single-AZ for Development: Use single-AZ deployments for development/testing to reduce costs
- Data Transfer Optimization: Minimize cross-AZ and cross-region data transfer where possible
Estimated Monthly Costs (Example - Production Multi-AZ Deployment):
- Small Deployment (10-50 users):
- Compute: $200-400/month
- Database: $150-300/month
- Storage: $50-100/month
- Networking: $50-100/month
-
Total: ~$450-900/month
-
Medium Deployment (50-200 users):
- Compute: $400-800/month
- Database: $300-600/month
- Storage: $100-200/month
- Networking: $100-200/month
-
Total: ~$900-1,800/month
-
Large Deployment (200+ users):
- Compute: $800-2,000/month
- Database: $600-1,200/month
- Storage: $200-500/month
- Networking: $200-400/month
- Total: ~$1,800-4,100/month
Note: Actual costs vary significantly based on: - Deployment method (EKS vs ECS vs EC2) - Instance sizes and types - Data transfer volumes - Storage requirements - Multi-AZ vs single-AZ configuration - Region selected - Reserved Instance usage
Use the AWS Pricing Calculator for accurate cost estimates based on your specific requirements.
Licensing Costs¶
Corridor Software Licensing:
- License Model: Corridor GenGuardX is licensed on a subscription basis
- Pricing: Contact Corridor sales for current licensing pricing and terms
- License Components:
- Base platform license (required)
- Optional add-ons and premium features
- Support and maintenance (included or optional tier)
- License Scope: License covers the Corridor software platform and does not include AWS infrastructure costs
- Billing: Licensing costs are separate from AWS infrastructure costs and billed directly by Corridor
Third-Party Software Licensing:
- PostgreSQL: Open-source (no license cost when using Amazon RDS)
- Redis: Open-source (no license cost)
- Python: Open-source (no license cost)
- Java: Open-source OpenJDK (no license cost)
- Apache Spark: Open-source (no license cost)
- Kubernetes: Open-source (no license cost when using Amazon EKS)
AWS Service Licensing:
- All AWS services are billed on a pay-as-you-go basis with no upfront licensing fees
- AWS services do not require separate software licenses
- Charges are based on actual usage of AWS resources
Total Cost of Ownership (TCO):
The total cost of running Corridor on AWS includes: 1. Corridor Software License: Subscription fee (contact sales for pricing) 2. AWS Infrastructure Costs: Pay-as-you-go charges for AWS services (see cost estimates above) 3. Support Costs: Included in license or optional support tiers 4. Operational Costs: Staff time for deployment, maintenance, and operations
Note: For accurate licensing pricing and terms, please contact Corridor sales or your account representative.
Resource Sizing and Provisioning¶
This section provides guidance on selecting the appropriate type and size for AWS resources required for Corridor deployments. Automated provisioning scripts and Infrastructure-as-Code (IaC) templates are available in the detailed installation guides for each deployment method.
Resource Provisioning Options¶
Corridor provides two approaches for provisioning AWS resources:
-
Automated Provisioning Scripts: Terraform and CloudFormation templates are available in the detailed installation guides (EKS, ECS Fargate, EC2) that automate the creation of all required resources.
-
Manual Provisioning Guidance: Detailed guidance below for selecting resource types and sizes when provisioning manually or customizing automated scripts.
Compute Resource Sizing¶
EKS Deployments¶
EKS Cluster:
- Type: Amazon EKS (Managed Kubernetes Service)
- Sizing: Single cluster per environment (development, staging, production)
Node Groups:
- Development/Testing:
- Instance Type:
m5.largeorm5.xlarge - Node Count: 2-3 nodes
-
Auto-scaling: Min 2, Max 5 nodes
-
Production (Small - 10-50 users):
- Instance Type:
m5.xlarge(4 vCPU, 16 GB RAM) - Node Count: 3 nodes (minimum for high availability)
-
Auto-scaling: Min 3, Max 8 nodes
-
Production (Medium - 50-200 users):
- Instance Type:
m5.2xlarge(8 vCPU, 32 GB RAM) - Node Count: 3-5 nodes
-
Auto-scaling: Min 3, Max 12 nodes
-
Production (Large - 200+ users):
- Instance Type:
m5.4xlarge(16 vCPU, 64 GB RAM) orm5.8xlarge(32 vCPU, 128 GB RAM) - Node Count: 5+ nodes
- Auto-scaling: Min 5, Max 20 nodes
Selection Criteria:
- Start with m5.xlarge for production and scale based on actual usage
- Use m5 instance family for general-purpose workloads
- Consider c5 instances if CPU-intensive, r5 if memory-intensive
- Distribute nodes across multiple availability zones for high availability
ECS Fargate Deployments¶
ECS Cluster: - Type: Amazon ECS with Fargate launch type - Sizing: Single cluster per environment
Task Definitions (CPU and Memory): - Development/Testing: - corridor-app: 1 vCPU, 2 GB memory - corridor-worker: 1 vCPU, 2 GB memory - corridor-jupyter: 1 vCPU, 2 GB memory - redis: 0.5 vCPU, 1 GB memory - Production (Small - 10-50 users): - corridor-app: 2 vCPU, 4 GB memory - corridor-worker: 2 vCPU, 4 GB memory - corridor-jupyter: 2 vCPU, 4 GB memory - redis: 1 vCPU, 2 GB memory - Production (Medium - 50-200 users): - corridor-app: 4 vCPU, 8 GB memory - corridor-worker: 4 vCPU, 8 GB memory (scale horizontally) - corridor-jupyter: 4 vCPU, 8 GB memory - redis: 2 vCPU, 4 GB memory - Production (Large - 200+ users): - corridor-app: 4-8 vCPU, 8-16 GB memory - corridor-worker: 4-8 vCPU, 8-16 GB memory (scale horizontally) - corridor-jupyter: 4-8 vCPU, 8-16 GB memory - redis: 2-4 vCPU, 4-8 GB memory
Selection Criteria: - Start with smaller sizes and use auto-scaling to scale based on demand - Fargate automatically scales tasks based on CPU and memory utilization - Monitor CloudWatch metrics to optimize task sizes
EC2 Deployments¶
EC2 Instances:
- Development/Testing:
- Instance Type: t3.xlarge (4 vCPU, 16 GB RAM)
- Instance Count: 1 instance
- Production (Small - 10-50 users):
- Instance Type: t3.2xlarge (8 vCPU, 32 GB RAM) - Recommended minimum
- Instance Count: 1 instance (can add more for high availability)
- Production (Medium - 50-200 users):
- Instance Type: m5.2xlarge (8 vCPU, 32 GB RAM) or m5.4xlarge (16 vCPU, 64 GB RAM)
- Instance Count: 1-2 instances
- Production (Large - 200+ users):
- Instance Type: m5.4xlarge (16 vCPU, 64 GB RAM) or m5.8xlarge (32 vCPU, 128 GB RAM)
- Instance Count: 2+ instances (load balanced)
Selection Criteria:
- Use t3 instances for burstable workloads (development, small production)
- Use m5 instances for consistent performance requirements (medium/large production)
- Consider c5 instances for CPU-intensive workloads, r5 for memory-intensive workloads
- Ensure instance meets minimum requirements: 4 GB RAM, 4 CPU cores (see Technical Prerequisites and Requirements)
Database Resource Sizing¶
Amazon RDS PostgreSQL:
- Development/Testing:
- Instance Class:
db.t3.medium(2 vCPU, 4 GB RAM) - Storage: 20-50 GB (gp3)
- Multi-AZ: Disabled (single-AZ)
- Production (Small - 10-50 users):
- Instance Class:
db.t3.xlarge(4 vCPU, 16 GB RAM) - Recommended minimum - Storage: 100 GB (gp3)
- Multi-AZ: Enabled (recommended)
- Production (Medium - 50-200 users):
- Instance Class:
db.m5.xlarge(4 vCPU, 16 GB RAM) ordb.m5.2xlarge(8 vCPU, 32 GB RAM) - Storage: 200-500 GB (gp3)
- Multi-AZ: Enabled
- Production (Large - 200+ users):
- Instance Class:
db.m5.2xlarge(8 vCPU, 32 GB RAM) ordb.m5.4xlarge(16 vCPU, 64 GB RAM) - Storage: 500 GB - 1 TB (gp3 or io1/io2 for high IOPS)
- Multi-AZ: Enabled
Storage Type Selection: - gp3: General-purpose SSD (recommended for most workloads) - io1/io2: Provisioned IOPS SSD (for high-performance database workloads requiring consistent IOPS)
Selection Criteria:
- Start with db.t3.xlarge for production and scale based on database performance metrics
- Monitor CloudWatch metrics (CPUUtilization, DatabaseConnections, ReadLatency, WriteLatency)
- Enable Multi-AZ for production environments for high availability
- Provisioned storage can be increased without downtime (storage autoscaling recommended)
Storage Resource Sizing¶
Amazon EFS (EKS/Fargate Deployments)¶
- Development/Testing:
- Storage: 50-100 GB
- Performance Mode: General Purpose
- Throughput Mode: Bursting
- Production (Small - 10-50 users):
- Storage: 100-500 GB
- Performance Mode: General Purpose
- Throughput Mode: Bursting or Provisioned (if consistent performance needed)
- Production (Medium - 50-200 users):
- Storage: 500 GB - 2 TB
- Performance Mode: General Purpose
- Throughput Mode: Provisioned (for consistent performance)
- Production (Large - 200+ users):
- Storage: 2 TB+
- Performance Mode: General Purpose or Max I/O (for high concurrency)
- Throughput Mode: Provisioned
Selection Criteria: - EFS storage scales automatically (pay for what you use) - Use General Purpose performance mode for most workloads - Use Max I/O performance mode for applications with high levels of aggregate throughput and operations per second - Enable lifecycle management to move infrequently accessed files to Infrequent Access (IA) storage class
Amazon EBS (EC2 Deployments)¶
- Development/Testing:
- Volume Size: 50-100 GB
- Volume Type: gp3
- IOPS: 3,000 (gp3 baseline)
- Production (Small - 10-50 users):
- Volume Size: 100-200 GB
- Volume Type: gp3
- IOPS: 3,000-6,000
- Production (Medium - 50-200 users):
- Volume Size: 200-500 GB
- Volume Type: gp3 or io1/io2
- IOPS: 6,000-10,000
- Production (Large - 200+ users):
- Volume Size: 500 GB - 1 TB
- Volume Type: gp3 or io1/io2
- IOPS: 10,000+
Selection Criteria: - Use gp3 for most workloads (cost-effective, good performance) - Use io1/io2 for workloads requiring consistent IOPS (database workloads, high-transaction applications) - Enable encryption at rest for all EBS volumes - Monitor CloudWatch metrics (VolumeReadOps, VolumeWriteOps, VolumeQueueLength)
Amazon S3¶
- Development/Testing:
- Storage: 10-50 GB
- Storage Class: Standard
- Production:
- Storage: Varies based on backup retention and file storage needs
- Storage Class: Standard (frequently accessed), Intelligent-Tiering (unknown access patterns), Standard-IA (infrequently accessed)
Selection Criteria: - Use Standard storage class for frequently accessed data - Use Intelligent-Tiering for data with unknown or changing access patterns - Use Standard-IA or Glacier for backups and archival data - Enable lifecycle policies to automatically transition objects to appropriate storage classes
Network Resource Sizing¶
Application Load Balancer (ALB)¶
- Type: Application Load Balancer
- Sizing: Single ALB per environment (can handle high traffic volumes)
- Configuration:
- Scheme: Internet-facing
- IP Address Type: IPv4
- Subnets: Public subnets across multiple availability zones
- Security Groups: Allow HTTPS (443) and HTTP (80) from internet
Selection Criteria: - ALB automatically scales to handle traffic (no size selection needed) - Deploy ALB in at least 2 availability zones for high availability - Use AWS WAF in front of ALB for additional protection (optional)
NAT Gateway¶
- Type: NAT Gateway
- Sizing: One NAT Gateway per availability zone (for high availability)
- Configuration:
- Elastic IP: Required (one per NAT Gateway)
- Subnets: Public subnets
Selection Criteria: - Deploy one NAT Gateway per availability zone for high availability (recommended for production) - Single NAT Gateway is sufficient for development/testing environments - NAT Gateway automatically scales (no size selection needed)
VPC and Subnets¶
- VPC CIDR:
/16(e.g., 10.0.0.0/16) - provides 65,536 IP addresses - Public Subnets:
/24per availability zone (e.g., 10.0.1.0/24, 10.0.2.0/24) - 256 IP addresses each - Private Subnets:
/24per availability zone (e.g., 10.0.10.0/24, 10.0.11.0/24) - 256 IP addresses each
Selection Criteria:
- Use /16 CIDR for VPC to allow for future growth
- Allocate /24 subnets per availability zone (sufficient for most deployments)
- Ensure subnets don't overlap with existing network ranges
Resource Sizing Summary Table¶
| Resource | Development | Small Production | Medium Production | Large Production |
|---|---|---|---|---|
| EKS Nodes | m5.large (2-3 nodes) | m5.xlarge (3 nodes) | m5.2xlarge (3-5 nodes) | m5.4xlarge+ (5+ nodes) |
| ECS Tasks | 1 vCPU, 2 GB | 2 vCPU, 4 GB | 4 vCPU, 8 GB | 4-8 vCPU, 8-16 GB |
| EC2 Instance | t3.xlarge | t3.2xlarge | m5.2xlarge | m5.4xlarge+ |
| RDS Instance | db.t3.medium | db.t3.xlarge | db.m5.xlarge | db.m5.2xlarge+ |
| RDS Storage | 20-50 GB | 100 GB | 200-500 GB | 500 GB - 1 TB |
| EFS/EBS Storage | 50-100 GB | 100-500 GB | 500 GB - 2 TB | 2 TB+ |
Automated Provisioning Scripts¶
Terraform and CloudFormation templates are available in the detailed installation guides that automate the provisioning of all required resources with recommended sizing:
- EKS Deployment: See EKS Installation Guide for Terraform templates and Kubernetes manifests
- ECS Fargate Deployment: See Fargate Installation Guide for CloudFormation templates and ECS task definitions
- EC2 Deployment: See EC2 Installation Guide for Terraform templates and AWS CLI scripts
These scripts include: - Pre-configured resource sizes based on deployment type (development, production) - Security best practices (encryption, security groups, IAM roles) - High availability configurations (Multi-AZ, auto-scaling) - Cost optimization settings
Using Automated Scripts:
1. Review the Terraform/CloudFormation templates in the respective installation guides
2. Customize variables (instance types, storage sizes) based on your requirements
3. Deploy using terraform apply or CloudFormation console/CLI
4. Monitor resource utilization and adjust sizes as needed
Manual Provisioning: If provisioning manually, follow the sizing guidance above and refer to the Technical Prerequisites and Requirements section for minimum requirements.
Monitoring and Right-Sizing¶
After deployment, monitor resource utilization using AWS CloudWatch and adjust sizes as needed:
- CPU Utilization: Target 40-70% average utilization
- Memory Utilization: Target 60-80% average utilization
- Storage Utilization: Monitor and scale before reaching 80% capacity
- Network Throughput: Monitor data transfer and optimize if needed
Use AWS Cost Explorer and CloudWatch metrics to identify opportunities for right-sizing and cost optimization.
Step-by-Step Deployment Instructions¶
This section provides step-by-step instructions for deploying Corridor on AWS according to the typical deployment architecture described in the Typical Customer Deployment Overview section. The deployment follows a multi-AZ production architecture with high availability.
Step 1: Provision Network Infrastructure¶
1.1 Create VPC and Subnets
# Create VPC
aws ec2 create-vpc --cidr-block 10.0.0.0/16 --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=corridor-vpc}]'
# Create public subnets (one per availability zone)
aws ec2 create-subnet --vpc-id <vpc-id> --cidr-block 10.0.1.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id <vpc-id> --cidr-block 10.0.2.0/24 --availability-zone us-east-1b
# Create private subnets (one per availability zone)
aws ec2 create-subnet --vpc-id <vpc-id> --cidr-block 10.0.10.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id <vpc-id> --cidr-block 10.0.11.0/24 --availability-zone us-east-1b
1.2 Create Internet Gateway and NAT Gateway
# Create and attach Internet Gateway
aws ec2 create-internet-gateway --tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=corridor-igw}]'
aws ec2 attach-internet-gateway --internet-gateway-id <igw-id> --vpc-id <vpc-id>
# Allocate Elastic IP for NAT Gateway
aws ec2 allocate-address --domain vpc
# Create NAT Gateway in public subnet
aws ec2 create-nat-gateway --subnet-id <public-subnet-id> --allocation-id <eip-id>
1.3 Configure Route Tables
# Create route table for public subnets
aws ec2 create-route-table --vpc-id <vpc-id>
aws ec2 create-route --route-table-id <public-rt-id> --destination-cidr-block 0.0.0.0/0 --gateway-id <igw-id>
# Create route table for private subnets
aws ec2 create-route-table --vpc-id <vpc-id>
aws ec2 create-route --route-table-id <private-rt-id> --destination-cidr-block 0.0.0.0/0 --nat-gateway-id <nat-gw-id>
1.4 Create Security Groups
# Security group for Application Load Balancer
aws ec2 create-security-group --group-name corridor-alb-sg --description "Security group for ALB" --vpc-id <vpc-id>
aws ec2 authorize-security-group-ingress --group-id <alb-sg-id> --protocol tcp --port 443 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id <alb-sg-id> --protocol tcp --port 80 --cidr 0.0.0.0/0
# Security group for application components
aws ec2 create-security-group --group-name corridor-app-sg --description "Security group for application" --vpc-id <vpc-id>
aws ec2 authorize-security-group-ingress --group-id <app-sg-id> --protocol tcp --port 80 --source-group <alb-sg-id>
aws ec2 authorize-security-group-ingress --group-id <app-sg-id> --protocol tcp --port 443 --source-group <alb-sg-id>
# Security group for RDS database
aws ec2 create-security-group --group-name corridor-db-sg --description "Security group for RDS" --vpc-id <vpc-id>
aws ec2 authorize-security-group-ingress --group-id <db-sg-id> --protocol tcp --port 5432 --source-group <app-sg-id>
# Security group for EFS (if using EKS/Fargate)
aws ec2 create-security-group --group-name corridor-efs-sg --description "Security group for EFS" --vpc-id <vpc-id>
aws ec2 authorize-security-group-ingress --group-id <efs-sg-id> --protocol tcp --port 2049 --source-group <app-sg-id>
Step 2: Provision Database Infrastructure¶
2.1 Create DB Subnet Group
aws rds create-db-subnet-group \
--db-subnet-group-name corridor-db-subnet \
--db-subnet-group-description "Subnet group for Corridor RDS" \
--subnet-ids <private-subnet-1-id> <private-subnet-2-id>
2.2 Create RDS PostgreSQL Instance
aws rds create-db-instance \
--db-instance-identifier corridor-db \
--db-instance-class db.t3.xlarge \
--engine postgres \
--engine-version 14.9 \
--master-username corridor \
--master-user-password <secure-password> \
--allocated-storage 100 \
--storage-type gp3 \
--storage-encrypted \
--multi-az \
--db-name corridor \
--db-subnet-group-name corridor-db-subnet \
--vpc-security-group-ids <db-sg-id> \
--backup-retention-period 7 \
--enable-cloudwatch-logs-export postgresql
2.3 Store Database Credentials in Secrets Manager
aws secretsmanager create-secret \
--name corridor/database/credentials \
--secret-string '{"username":"corridor","password":"<secure-password>","host":"<rds-endpoint>","port":"5432","database":"corridor"}'
Step 3: Provision Storage Infrastructure¶
For EKS/Fargate Deployments:
3.1 Create EFS File System
# Create EFS file system
aws efs create-file-system \
--creation-token corridor-efs \
--performance-mode generalPurpose \
--throughput-mode bursting \
--encrypted
# Create mount targets in each private subnet
aws efs create-mount-target \
--file-system-id <efs-id> \
--subnet-id <private-subnet-1-id> \
--security-groups <efs-sg-id>
aws efs create-mount-target \
--file-system-id <efs-id> \
--subnet-id <private-subnet-2-id> \
--security-groups <efs-sg-id>
For EC2 Deployments:
3.2 Create EBS Volume (if needed)
aws ec2 create-volume \
--availability-zone us-east-1a \
--size 100 \
--volume-type gp3 \
--encrypted
3.3 Create S3 Bucket for Backups
aws s3 mb s3://corridor-backups-<unique-id> --region us-east-1
aws s3api put-bucket-encryption \
--bucket corridor-backups-<unique-id> \
--server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
Step 4: Provision Compute Infrastructure¶
For EKS Deployment:
4.1 Create EKS Cluster
# Create IAM role for EKS cluster
aws iam create-role --role-name corridor-eks-cluster-role \
--assume-role-policy-document file://eks-cluster-trust-policy.json
# Attach required policies
aws iam attach-role-policy --role-name corridor-eks-cluster-role \
--policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
# Create EKS cluster
aws eks create-cluster \
--name corridor-cluster \
--role-arn arn:aws:iam::<account-id>:role/corridor-eks-cluster-role \
--resources-vpc-config subnetIds=<private-subnet-1-id>,<private-subnet-2-id>,endpointPrivateAccess=true,endpointPublicAccess=false \
--version 1.28
# Create node group
aws eks create-nodegroup \
--cluster-name corridor-cluster \
--nodegroup-name corridor-nodes \
--node-role arn:aws:iam::<account-id>:role/corridor-eks-node-role \
--subnets <private-subnet-1-id> <private-subnet-2-id> \
--instance-types m5.xlarge \
--scaling-config minSize=3,maxSize=10,desiredSize=3
For ECS Fargate Deployment:
4.2 Create ECS Cluster
For EC2 Deployment:
4.3 Launch EC2 Instance
aws ec2 run-instances \
--image-id ami-<latest-amazon-linux-2> \
--instance-type t3.2xlarge \
--subnet-id <private-subnet-id> \
--security-group-ids <app-sg-id> \
--iam-instance-profile Name=corridor-instance-profile \
--user-data file://install-corridor.sh \
--block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":100,"VolumeType":"gp3","Encrypted":true}}]'
Step 5: Create Application Load Balancer¶
5.1 Create ALB
aws elbv2 create-load-balancer \
--name corridor-alb \
--subnets <public-subnet-1-id> <public-subnet-2-id> \
--security-groups <alb-sg-id> \
--scheme internet-facing \
--type application \
--ip-address-type ipv4
5.2 Create Target Group
aws elbv2 create-target-group \
--name corridor-targets \
--protocol HTTP \
--port 80 \
--vpc-id <vpc-id> \
--health-check-path /health \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3
5.3 Create Listener
# Create HTTPS listener (requires SSL certificate)
aws elbv2 create-listener \
--load-balancer-arn <alb-arn> \
--protocol HTTPS \
--port 443 \
--certificates CertificateArn=<acm-certificate-arn> \
--default-actions Type=forward,TargetGroupArn=<target-group-arn>
# Create HTTP listener (redirects to HTTPS)
aws elbv2 create-listener \
--load-balancer-arn <alb-arn> \
--protocol HTTP \
--port 80 \
--default-actions Type=redirect,RedirectConfig='{Protocol=HTTPS,Port=443,StatusCode=HTTP_301}'
Step 6: Deploy Corridor Application¶
For EKS Deployment:
6.1 Install Cluster Components
# Install AWS Load Balancer Controller
kubectl apply -f https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/download/v2.7.0/v2_7_0_full.yaml
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# Install EFS CSI Driver
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.7"
6.2 Deploy Corridor
# Create namespace
kubectl create namespace corridor
# Create secrets for database credentials
kubectl create secret generic corridor-db-credentials \
--from-literal=username=corridor \
--from-literal=password=<password> \
--from-literal=host=<rds-endpoint> \
--namespace corridor
# Apply Kubernetes manifests
kubectl apply -f corridor-manifests.yaml -n corridor
For ECS Fargate Deployment:
6.3 Create Task Definition and Service
# Register task definition
aws ecs register-task-definition --cli-input-json file://corridor-task-definition.json
# Create ECS service
aws ecs create-service \
--cluster corridor-cluster \
--service-name corridor-service \
--task-definition corridor-task \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[<private-subnet-1-id>,<private-subnet-2-id>],securityGroups=[<app-sg-id>],assignPublicIp=DISABLED}" \
--load-balancers targetGroupArn=<target-group-arn>,containerName=corridor-app,containerPort=80
For EC2 Deployment:
6.4 Install Corridor Components
# SSH into EC2 instance
ssh -i <key-pair>.pem ec2-user@<ec2-instance-ip>
# Install system dependencies
sudo yum update -y
sudo yum install -y python3.11 python3.11-devel java-1.8.0-openjdk redis nginx
# Extract and install Corridor bundle
cd /tmp
unzip corridor-bundle.zip
sudo ./corridor-bundle/install app -i /opt/corridor
sudo ./corridor-bundle/install api -i /opt/corridor
sudo ./corridor-bundle/install worker-api -i /opt/corridor
sudo ./corridor-bundle/install worker-spark -i /opt/corridor
sudo ./corridor-bundle/install jupyter -i /opt/corridor
# Configure database connection
sudo vi /opt/corridor/instances/default/config/api_config.py
# Update SQLALCHEMY_DATABASE_URI with RDS endpoint
# Initialize database
/opt/corridor/venv-api/bin/corridor-api db upgrade
# Start services
sudo systemctl start corridor-app corridor-api corridor-worker-api corridor-worker-spark corridor-jupyter
sudo systemctl enable corridor-app corridor-api corridor-worker-api corridor-worker-spark corridor-jupyter
Step 7: Configure DNS and SSL¶
7.1 Configure DNS
# If using Route 53
aws route53 change-resource-record-sets \
--hosted-zone-id <hosted-zone-id> \
--change-batch '{
"Changes": [{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "corridor.example.com",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "<alb-hosted-zone-id>",
"DNSName": "<alb-dns-name>",
"EvaluateTargetHealth": true
}
}
}]
}'
# If using external DNS provider, create A record pointing to ALB DNS name
7.2 Verify SSL Certificate
# For EKS with cert-manager, check certificate status
kubectl get certificate -n corridor
# For ACM certificates, verify in AWS Console or CLI
aws acm describe-certificate --certificate-arn <certificate-arn>
Step 8: Post-Deployment Configuration¶
8.1 Initialize Admin User
# Access Corridor web interface via HTTPS
# Navigate to https://corridor.example.com
# Follow initial setup wizard to create admin user
8.2 Configure Monitoring
# Enable CloudWatch Container Insights (for EKS/ECS)
aws eks update-cluster-config --name corridor-cluster --logging '{"enable":[{"types":["api","audit","authenticator","controllerManager","scheduler"]}]}'
# Create CloudWatch alarms for key metrics
aws cloudwatch put-metric-alarm \
--alarm-name corridor-high-cpu \
--alarm-description "Alert when CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/ECS \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold
8.3 Verify Deployment
# Check application health endpoint
curl https://corridor.example.com/health
# Verify all services are running
# For EKS: kubectl get pods -n corridor
# For ECS: aws ecs describe-services --cluster corridor-cluster --services corridor-service
# For EC2: sudo systemctl status corridor-app corridor-api
Testing and Troubleshooting¶
This section provides prescriptive guidance for testing your Corridor deployment and troubleshooting common issues.
Pre-Deployment Testing¶
Before deploying to production, test the deployment in a development or staging environment:
1. Infrastructure Testing
# Verify VPC connectivity
aws ec2 describe-vpcs --vpc-ids <vpc-id>
# Test security group rules
aws ec2 describe-security-groups --group-ids <sg-id>
# Verify NAT Gateway is working
# From an instance in private subnet, test internet connectivity
ping 8.8.8.8
2. Database Connectivity Testing
# Test RDS connectivity from application subnet
psql -h <rds-endpoint> -U corridor -d corridor
# Verify database credentials from Secrets Manager
aws secretsmanager get-secret-value --secret-id corridor/database/credentials
3. Storage Testing
# For EFS: Test mount from application instance
sudo mount -t efs <efs-id>:/ /mnt/efs
echo "test" > /mnt/efs/test.txt
cat /mnt/efs/test.txt
# For S3: Test bucket access
aws s3 ls s3://corridor-backups-<unique-id>/
aws s3 cp test.txt s3://corridor-backups-<unique-id>/
Post-Deployment Testing¶
1. Application Health Checks
# Test health endpoint
curl https://corridor.example.com/health
# Expected response: {"status": "healthy", "version": "x.x.x"}
# Test API endpoint
curl https://corridor.example.com/api/v1/status
# Test web interface
curl -I https://corridor.example.com
2. Component Testing
For EKS:
# Check pod status
kubectl get pods -n corridor
# Check pod logs
kubectl logs -n corridor deploy/corridor-app --tail=100
# Test pod connectivity
kubectl exec -it -n corridor deploy/corridor-app -- curl http://localhost:8000/health
For ECS:
# Check task status
aws ecs describe-tasks --cluster corridor-cluster --tasks <task-id>
# Check task logs
aws logs tail /ecs/corridor --follow
# Test task connectivity
aws ecs execute-command --cluster corridor-cluster --task <task-id> --container corridor-app --command "/bin/sh"
For EC2:
# Check service status
sudo systemctl status corridor-app corridor-api corridor-worker-api
# Check service logs
sudo journalctl -u corridor-app -n 100
sudo tail -f /opt/corridor/logs/app.log
# Test service connectivity
curl http://localhost:8000/health
3. Load Balancer Testing
# Check ALB target health
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
# Test ALB routing
curl -H "Host: corridor.example.com" http://<alb-dns-name>/health
# Verify SSL certificate
openssl s_client -connect corridor.example.com:443 -servername corridor.example.com
4. Database Testing
# Test database connectivity from application
# Connect to application pod/instance and test database connection
python3 -c "import psycopg2; conn = psycopg2.connect(host='<rds-endpoint>', user='corridor', password='<password>', dbname='corridor'); print('Connected successfully')"
# Check database tables
psql -h <rds-endpoint> -U corridor -d corridor -c "\dt"
Common Issues and Troubleshooting¶
Issue 1: Application Not Accessible
Symptoms: Cannot access application via HTTPS URL
Troubleshooting Steps:
-
Verify DNS Configuration
-
Check Load Balancer Status
-
Verify Security Groups
-
Check SSL Certificate
Issue 2: Database Connection Failures
Symptoms: Application cannot connect to RDS database
Troubleshooting Steps:
-
Verify Database Endpoint
-
Check Security Groups
-
Verify Credentials
-
Check Database Status
Issue 3: Pod/Container Failures (EKS/ECS)
Symptoms: Pods or containers are crashing or not starting
Troubleshooting Steps:
For EKS:
# Check pod status
kubectl get pods -n corridor
# Describe pod for events
kubectl describe pod <pod-name> -n corridor
# Check pod logs
kubectl logs <pod-name> -n corridor --previous
# Check resource limits
kubectl top pod <pod-name> -n corridor
For ECS:
# Check task status
aws ecs describe-tasks --cluster corridor-cluster --tasks <task-id>
# Check task logs
aws logs tail /ecs/corridor --follow
# Check task definition
aws ecs describe-task-definition --task-definition corridor-task
Issue 4: Storage Mount Failures
Symptoms: EFS or EBS volumes not mounting correctly
Troubleshooting Steps:
-
For EFS:
-
For EBS:
Issue 5: High CPU or Memory Usage
Symptoms: Application performance degradation, timeouts
Troubleshooting Steps:
-
Monitor Resource Usage
# For EKS kubectl top nodes kubectl top pods -n corridor # For ECS aws cloudwatch get-metric-statistics \ --namespace AWS/ECS \ --metric-name CPUUtilization \ --dimensions Name=ServiceName,Value=corridor-service \ --start-time <start-time> \ --end-time <end-time> \ --period 300 \ --statistics Average -
Check Application Logs
-
Scale Resources
Getting Additional Help¶
If you encounter issues not covered in this guide:
- Check Application Logs: Review application logs for error messages
- Review AWS CloudWatch: Check CloudWatch metrics and logs for infrastructure issues
- Contact Corridor Support: Reach out to Corridor support with:
- Deployment method (EKS/ECS/EC2)
- Error messages and logs
- Steps to reproduce the issue
- AWS region and resource IDs
- AWS Support: For AWS infrastructure issues, contact AWS Support
Health Monitoring and Assessment¶
This section provides step-by-step instructions for assessing and monitoring the health and proper function of the Corridor application deployed on AWS. Regular health monitoring ensures the application is operating correctly and helps identify issues before they impact users.
Step 1: Configure Application Health Checks¶
1.1 Application Health Endpoint
The Corridor application exposes a health endpoint at /health that returns the application status:
# Test health endpoint
curl https://corridor.example.com/health
# Expected response:
# {"status": "healthy", "version": "x.x.x", "timestamp": "2024-01-01T00:00:00Z"}
1.2 Configure Load Balancer Health Checks
For Application Load Balancer:
# Update target group health check settings
aws elbv2 modify-target-group \
--target-group-arn <target-group-arn> \
--health-check-path /health \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3 \
--health-check-protocol HTTP \
--health-check-port 80
# Verify health check configuration
aws elbv2 describe-target-groups --target-group-arns <target-group-arn>
For EKS Deployments:
# Add health check to Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: corridor-app
spec:
template:
spec:
containers:
- name: corridor-app
livenessProbe:
httpGet:
path: /health
port: 5002
initialDelaySeconds: 60
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health
port: 5002
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
For ECS Deployments:
{
"healthCheck": {
"command": ["CMD-SHELL", "curl -f http://localhost:5002/health || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 60
}
}
1.3 Verify Health Check Status
# Check target health status
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
# Expected output shows "healthy" status for all targets
Step 2: Set Up CloudWatch Metrics and Alarms¶
2.1 Enable CloudWatch Container Insights (EKS/ECS)
For EKS:
# Enable Container Insights
aws eks update-cluster-config \
--name corridor-cluster \
--logging '{
"enable": [
{"types": ["api", "audit", "authenticator", "controllerManager", "scheduler"]}
]
}'
# Install CloudWatch Container Insights
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml
For ECS:
# Container Insights is automatically enabled for ECS Fargate
# Verify in ECS console: Cluster → Monitoring → Container Insights
2.2 Create CloudWatch Alarms for Application Health
Alarm 1: Application Health Check Failures
aws cloudwatch put-metric-alarm \
--alarm-name corridor-health-check-failures \
--alarm-description "Alert when application health checks fail" \
--metric-name HealthyHostCount \
--namespace AWS/ApplicationELB \
--statistic Minimum \
--period 60 \
--threshold 1 \
--comparison-operator LessThanThreshold \
--evaluation-periods 2 \
--alarm-actions arn:aws:sns:us-east-1:<account-id>:corridor-alerts
Alarm 2: High CPU Utilization
aws cloudwatch put-metric-alarm \
--alarm-name corridor-high-cpu \
--alarm-description "Alert when CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/ECS \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--dimensions Name=ServiceName,Value=corridor-service Name=ClusterName,Value=corridor-cluster
Alarm 3: High Memory Utilization
aws cloudwatch put-metric-alarm \
--alarm-name corridor-high-memory \
--alarm-description "Alert when memory exceeds 85%" \
--metric-name MemoryUtilization \
--namespace AWS/ECS \
--statistic Average \
--period 300 \
--threshold 85 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--dimensions Name=ServiceName,Value=corridor-service Name=ClusterName,Value=corridor-cluster
Alarm 4: Database Connection Errors
aws cloudwatch put-metric-alarm \
--alarm-name corridor-db-connection-errors \
--alarm-description "Alert on database connection errors" \
--metric-name DatabaseConnections \
--namespace AWS/RDS \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--dimensions Name=DBInstanceIdentifier,Value=corridor-db
2.3 Create SNS Topic for Alarms
# Create SNS topic for alerts
aws sns create-topic --name corridor-alerts
# Subscribe email to topic
aws sns subscribe \
--topic-arn arn:aws:sns:us-east-1:<account-id>:corridor-alerts \
--protocol email \
--notification-endpoint admin@example.com
Step 3: Monitor Application Logs¶
3.1 Configure Log Aggregation
For EKS:
# Verify Fluentd/CloudWatch Logs integration
kubectl get daemonset -n amazon-cloudwatch
# Check log streams
aws logs describe-log-streams --log-group-name /aws/eks/corridor-cluster/cluster
For ECS:
# Logs are automatically sent to CloudWatch Logs
# Check log groups
aws logs describe-log-groups --log-group-name-prefix /ecs/corridor
For EC2:
# Install CloudWatch Agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm
# Configure CloudWatch Agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
3.2 Monitor Application Logs
# View recent application logs
aws logs tail /ecs/corridor --follow
# Filter for errors
aws logs filter-log-events \
--log-group-name /ecs/corridor \
--filter-pattern "ERROR" \
--start-time $(date -d '1 hour ago' +%s)000
# Search for specific patterns
aws logs filter-log-events \
--log-group-name /ecs/corridor \
--filter-pattern "database connection" \
--start-time $(date -d '24 hours ago' +%s)000
Step 4: Monitor Database Health¶
4.1 Check RDS Instance Status
# Check database instance status
aws rds describe-db-instances \
--db-instance-identifier corridor-db \
--query 'DBInstances[0].[DBInstanceStatus,DBInstanceClass,MultiAZ,BackupRetentionPeriod]'
# Expected output:
# - DBInstanceStatus: "available"
# - MultiAZ: true (for production)
# - BackupRetentionPeriod: 7 (or higher)
4.2 Monitor Database Metrics
# Check database CPU utilization
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=corridor-db \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average
# Check database connections
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name DatabaseConnections \
--dimensions Name=DBInstanceIdentifier,Value=corridor-db \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average,Maximum
4.3 Test Database Connectivity
# Test database connection from application
# For EKS:
kubectl exec -it -n corridor deploy/corridor-app -- \
python3 -c "import psycopg2; conn = psycopg2.connect(host='<rds-endpoint>', user='corridor', password='<password>', dbname='corridor'); print('Connected successfully')"
# For ECS:
aws ecs execute-command \
--cluster corridor-cluster \
--task <task-id> \
--container corridor-app \
--command "python3 -c \"import psycopg2; conn = psycopg2.connect(host='<rds-endpoint>', user='corridor', password='<password>', dbname='corridor'); print('Connected successfully')\""
# For EC2:
psql -h <rds-endpoint> -U corridor -d corridor -c "SELECT version();"
4.4 Check Database Backup Status
# List recent backups
aws rds describe-db-snapshots \
--db-instance-identifier corridor-db \
--query 'DBSnapshots[*].[DBSnapshotIdentifier,Status,SnapshotCreateTime]' \
--output table
# Verify automated backups are enabled
aws rds describe-db-instances \
--db-instance-identifier corridor-db \
--query 'DBInstances[0].[BackupRetentionPeriod,AutomatedBackups]'
Step 5: Monitor Storage Health¶
5.1 Monitor EFS File System (EKS/Fargate)
# Check EFS file system status
aws efs describe-file-systems --file-system-id <efs-id>
# Check mount target status
aws efs describe-mount-targets --file-system-id <efs-id>
# Monitor EFS metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EFS \
--metric-name StorageBytes \
--dimensions Name=FileSystemId,Value=<efs-id> \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average
5.2 Monitor EBS Volumes (EC2)
# Check EBS volume status
aws ec2 describe-volumes \
--volume-ids <volume-id> \
--query 'Volumes[0].[State,Size,VolumeType,Iops]'
# Monitor EBS metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EBS \
--metric-name VolumeReadOps \
--dimensions Name=VolumeId,Value=<volume-id> \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Sum
5.3 Monitor S3 Bucket Health
# Check S3 bucket metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/S3 \
--metric-name BucketSizeBytes \
--dimensions Name=BucketName,Value=corridor-backups Name=StorageType,Value=StandardStorage \
--start-time $(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Average
Step 6: Monitor Network Health¶
6.1 Monitor Load Balancer Health
# Check target health
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
# Monitor ALB metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/ApplicationELB \
--metric-name TargetResponseTime \
--dimensions Name=LoadBalancer,Value=<alb-arn-suffix> \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average
# Check HTTP error rates
aws cloudwatch get-metric-statistics \
--namespace AWS/ApplicationELB \
--metric-name HTTPCode_Target_5XX_Count \
--dimensions Name=LoadBalancer,Value=<alb-arn-suffix> \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Sum
6.2 Monitor NAT Gateway Health
# Check NAT Gateway status
aws ec2 describe-nat-gateways --nat-gateway-ids <nat-gateway-id>
# Monitor NAT Gateway metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/NATGateway \
--metric-name BytesOutToDestination \
--dimensions Name=NatGatewayId,Value=<nat-gateway-id> \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Sum
Step 7: Perform Regular Health Assessments¶
7.1 Daily Health Check Routine
Perform these checks daily:
# 1. Check application health endpoint
curl https://corridor.example.com/health
# 2. Verify all targets are healthy
aws elbv2 describe-target-health --target-group-arn <target-group-arn> | grep -c "healthy"
# 3. Check for recent errors in logs
aws logs filter-log-events \
--log-group-name /ecs/corridor \
--filter-pattern "ERROR" \
--start-time $(date -d '24 hours ago' +%s)000
# 4. Verify database is available
aws rds describe-db-instances --db-instance-identifier corridor-db --query 'DBInstances[0].DBInstanceStatus'
# 5. Check CloudWatch alarms
aws cloudwatch describe-alarms --alarm-names corridor-health-check-failures corridor-high-cpu corridor-high-memory
7.2 Weekly Health Assessment
Perform these checks weekly:
# 1. Review CloudWatch metrics trends
# Check CPU and memory trends over the past week
aws cloudwatch get-metric-statistics \
--namespace AWS/ECS \
--metric-name CPUUtilization \
--dimensions Name=ServiceName,Value=corridor-service \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Average,Maximum
# 2. Review database performance
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name ReadLatency \
--dimensions Name=DBInstanceIdentifier,Value=corridor-db \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Average,Maximum
# 3. Verify backup completion
aws rds describe-db-snapshots \
--db-instance-identifier corridor-db \
--snapshot-type automated \
--query 'DBSnapshots[0].[DBSnapshotIdentifier,SnapshotCreateTime]'
# 4. Review storage usage trends
aws cloudwatch get-metric-statistics \
--namespace AWS/EFS \
--metric-name StorageBytes \
--dimensions Name=FileSystemId,Value=<efs-id> \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Average,Maximum
7.3 Monthly Health Review
Perform these checks monthly:
# 1. Review cost trends
aws ce get-cost-and-usage \
--time-period Start=$(date -u -d '1 month ago' +%Y-%m-01),End=$(date -u +%Y-%m-01) \
--granularity MONTHLY \
--metrics BlendedCost
# 2. Review security group rules
aws ec2 describe-security-groups --group-ids <sg-id>
# 3. Review IAM access patterns
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRole \
--start-time $(date -u -d '1 month ago' +%Y-%m-%dT%H:%M:%S) \
--max-results 100
# 4. Review application performance trends
# Compare response times, error rates, and resource utilization
Step 8: Create Health Monitoring Dashboard¶
8.1 Create CloudWatch Dashboard
# Create dashboard JSON configuration
cat > dashboard.json <<EOF
{
"widgets": [
{
"type": "metric",
"properties": {
"metrics": [
["AWS/ApplicationELB", "HealthyHostCount", {"stat": "Average", "label": "Healthy Targets"}],
[".", "UnHealthyHostCount", {"stat": "Average", "label": "Unhealthy Targets"}]
],
"period": 300,
"stat": "Average",
"region": "us-east-1",
"title": "Load Balancer Health"
}
},
{
"type": "metric",
"properties": {
"metrics": [
["AWS/ECS", "CPUUtilization", {"stat": "Average", "label": "CPU"}],
[".", "MemoryUtilization", {"stat": "Average", "label": "Memory"}]
],
"period": 300,
"stat": "Average",
"region": "us-east-1",
"title": "Application Resources"
}
},
{
"type": "metric",
"properties": {
"metrics": [
["AWS/RDS", "CPUUtilization", {"stat": "Average", "label": "CPU"}],
[".", "DatabaseConnections", {"stat": "Average", "label": "Connections"}]
],
"period": 300,
"stat": "Average",
"region": "us-east-1",
"title": "Database Health"
}
}
]
}
EOF
# Create dashboard
aws cloudwatch put-dashboard \
--dashboard-name CorridorHealthDashboard \
--dashboard-body file://dashboard.json
8.2 Access Dashboard
# View dashboard URL
echo "https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:name=CorridorHealthDashboard"
Backup and Recovery¶
This section identifies the data stores and configurations that need to be backed up, and provides step-by-step instructions for backup and recovery procedures.
Data Stores and Configurations for Backup¶
The following data stores and configurations must be backed up to ensure complete recovery capability:
1. Metadata Database (Amazon RDS PostgreSQL)
- Data Type: Application metadata, user accounts, pipelines, prompts, models, RAG configurations, evaluation results, audit logs
- Backup Method: RDS automated backups and manual snapshots
- Backup Frequency:
- Automated backups: Daily (retention period: 7-35 days)
- Manual snapshots: Before major changes or weekly
- Storage Location: RDS backup storage (automated) or S3 (manual snapshots)
2. File Storage
- EKS/Fargate Deployments: Amazon EFS file system
- Data Type: Uploaded files, datasets, generated artifacts, temporary files
- Backup Method: EFS-to-EFS backup or EFS-to-S3 backup
- Backup Frequency: Daily incremental, weekly full
- EC2 Deployments: EBS volumes or local filesystem
- Data Type: Uploaded files, datasets, generated artifacts
- Backup Method: EBS snapshots or file system backups
- Backup Frequency: Daily snapshots
3. Application Configuration
- Data Type: Configuration files, environment variables, secrets
- Backup Method: Version control (Git) or S3 backups
- Backup Frequency: Before configuration changes
- Location:
- Kubernetes ConfigMaps/Secrets (EKS)
- ECS task definitions (ECS)
- Configuration files on EC2 instances
4. SSL/TLS Certificates
- Data Type: SSL certificates and private keys
- Backup Method: Export certificates from ACM or cert-manager
- Backup Frequency: Before certificate renewal
- Storage Location: Secure S3 bucket or secure file storage
5. IAM Roles and Policies
- Data Type: IAM role definitions, policies, trust relationships
- Backup Method: Infrastructure-as-Code (Terraform/CloudFormation) or AWS CLI export
- Backup Frequency: Before IAM changes
- Storage Location: Version control or S3
Step-by-Step Backup Procedures¶
Database Backup¶
Automated RDS Backups:
# Verify automated backups are enabled
aws rds describe-db-instances \
--db-instance-identifier corridor-db \
--query 'DBInstances[0].[BackupRetentionPeriod,AutomatedBackups]'
# List recent automated backups
aws rds describe-db-snapshots \
--db-instance-identifier corridor-db \
--snapshot-type automated \
--query 'DBSnapshots[*].[DBSnapshotIdentifier,SnapshotCreateTime,Status]' \
--output table
Manual Database Snapshot:
# Create manual snapshot
aws rds create-db-snapshot \
--db-snapshot-identifier corridor-db-manual-$(date +%Y%m%d-%H%M%S) \
--db-instance-identifier corridor-db
# Wait for snapshot completion
aws rds wait db-snapshot-completed \
--db-snapshot-identifier corridor-db-manual-$(date +%Y%m%d-%H%M%S)
# Copy snapshot to another region (for disaster recovery)
aws rds copy-db-snapshot \
--source-db-snapshot-identifier corridor-db-manual-$(date +%Y%m%d-%H%M%S) \
--target-db-snapshot-identifier corridor-db-backup-$(date +%Y%m%d-%H%M%S) \
--target-region us-west-2
Proprietary Database Export (Corridor API):
# For EKS deployments
kubectl exec -it -n corridor deploy/corridor-api -- \
/opt/corridor/venv-api/bin/corridor-api db export \
--output /tmp/corridor-backup-$(date +%Y%m%d).sqlite
# Copy export file to S3
kubectl cp corridor/<pod-name>:/tmp/corridor-backup-$(date +%Y%m%d).sqlite \
./corridor-backup-$(date +%Y%m%d).sqlite
aws s3 cp corridor-backup-$(date +%Y%m%d).sqlite \
s3://corridor-backups-<unique-id>/database/
# For EC2 deployments
/opt/corridor/venv-api/bin/corridor-api db export \
--output /opt/corridor/backups/corridor-backup-$(date +%Y%m%d).sqlite
aws s3 cp /opt/corridor/backups/corridor-backup-$(date +%Y%m%d).sqlite \
s3://corridor-backups-<unique-id>/database/
File Storage Backup¶
EFS Backup (EKS/Fargate):
# Create EFS backup using AWS Backup
aws backup create-backup-vault --backup-vault-name corridor-efs-backup-vault
aws backup start-backup-job \
--backup-vault-name corridor-efs-backup-vault \
--resource-arn arn:aws:elasticfilesystem:us-east-1:<account-id>:file-system/<efs-id> \
--iam-role-arn arn:aws:iam::<account-id>:role/service-role/AWSBackupDefaultServiceRole
# Or use DataSync to copy to S3
aws datasync create-location-efs \
--efs-filesystem-arn arn:aws:elasticfilesystem:us-east-1:<account-id>:file-system/<efs-id> \
--ec2-config SubnetArn=arn:aws:ec2:us-east-1:<account-id>:subnet/<subnet-id>,SecurityGroupArns=arn:aws:ec2:us-east-1:<account-id>:security-group/<sg-id>
aws datasync create-location-s3 \
--s3-bucket-arn arn:aws:s3:::corridor-backups-<unique-id> \
--s3-config BucketAccessRoleArn=arn:aws:iam::<account-id>:role/DataSyncS3Role
aws datasync create-task \
--source-location-arn <efs-location-arn> \
--destination-location-arn <s3-location-arn> \
--name corridor-efs-backup-task
EBS Snapshot Backup (EC2):
# Create EBS snapshot
aws ec2 create-snapshot \
--volume-id <volume-id> \
--description "Corridor EBS backup $(date +%Y%m%d-%H%M%S)" \
--tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=corridor-ebs-backup}]'
# Copy snapshot to another region
aws ec2 copy-snapshot \
--source-region us-east-1 \
--source-snapshot-id <snapshot-id> \
--region us-west-2 \
--description "Corridor EBS backup copy"
Configuration Backup¶
Kubernetes ConfigMaps and Secrets (EKS):
# Backup all ConfigMaps
kubectl get configmaps -n corridor -o yaml > corridor-configmaps-backup-$(date +%Y%m%d).yaml
# Backup all Secrets (without sensitive data in output)
kubectl get secrets -n corridor -o yaml > corridor-secrets-backup-$(date +%Y%m%d).yaml
# Upload to S3
aws s3 cp corridor-configmaps-backup-$(date +%Y%m%d).yaml \
s3://corridor-backups-<unique-id>/config/
aws s3 cp corridor-secrets-backup-$(date +%Y%m%d).yaml \
s3://corridor-backups-<unique-id>/config/
ECS Task Definitions:
# Export task definition
aws ecs describe-task-definition \
--task-definition corridor-task \
--query 'taskDefinition' > corridor-task-definition-backup-$(date +%Y%m%d).json
aws s3 cp corridor-task-definition-backup-$(date +%Y%m%d).json \
s3://corridor-backups-<unique-id>/config/
EC2 Configuration Files:
# Backup configuration directory
tar -czf corridor-config-backup-$(date +%Y%m%d).tar.gz \
/opt/corridor/instances/default/config
aws s3 cp corridor-config-backup-$(date +%Y%m%d).tar.gz \
s3://corridor-backups-<unique-id>/config/
Step-by-Step Recovery Procedures¶
Database Recovery¶
Restore from RDS Snapshot:
# List available snapshots
aws rds describe-db-snapshots \
--db-instance-identifier corridor-db \
--query 'DBSnapshots[*].[DBSnapshotIdentifier,SnapshotCreateTime]' \
--output table
# Restore database from snapshot
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier corridor-db-restored \
--db-snapshot-identifier corridor-db-manual-20240101-120000 \
--db-instance-class db.t3.xlarge
# Update application configuration to point to restored database
# For EKS: Update ConfigMap/Secret with new database endpoint
# For ECS: Update task definition with new database endpoint
# For EC2: Update configuration file with new database endpoint
Restore from Proprietary Export:
# Download export file from S3
aws s3 cp s3://corridor-backups-<unique-id>/database/corridor-backup-20240101.sqlite \
./corridor-backup-20240101.sqlite
# Import into database
# For EKS:
kubectl cp ./corridor-backup-20240101.sqlite \
corridor/<pod-name>:/tmp/corridor-backup-20240101.sqlite
kubectl exec -it -n corridor deploy/corridor-api -- \
/opt/corridor/venv-api/bin/corridor-api db import \
--input /tmp/corridor-backup-20240101.sqlite
# For EC2:
/opt/corridor/venv-api/bin/corridor-api db import \
--input ./corridor-backup-20240101.sqlite
File Storage Recovery¶
Restore EFS from Backup:
# Restore from AWS Backup
aws backup start-restore-job \
--recovery-point-arn <recovery-point-arn> \
--iam-role-arn arn:aws:iam::<account-id>:role/service-role/AWSBackupDefaultServiceRole \
--resource-type EFS \
--metadata file-system-id=<efs-id>,new-file-system=true
# Or restore from S3 using DataSync
aws datasync start-task-execution --task-arn <task-arn>
Restore EBS from Snapshot:
# Create volume from snapshot
aws ec2 create-volume \
--snapshot-id <snapshot-id> \
--availability-zone us-east-1a \
--volume-type gp3
# Attach volume to instance
aws ec2 attach-volume \
--volume-id <volume-id> \
--instance-id <instance-id> \
--device /dev/xvdf
# Mount volume on instance
sudo mount /dev/xvdf /mnt/restored-data
Configuration Recovery¶
Restore Kubernetes ConfigMaps/Secrets:
# Download from S3
aws s3 cp s3://corridor-backups-<unique-id>/config/corridor-configmaps-backup-20240101.yaml \
./corridor-configmaps-backup-20240101.yaml
# Apply restored ConfigMaps
kubectl apply -f corridor-configmaps-backup-20240101.yaml -n corridor
# Restart pods to pick up new configuration
kubectl rollout restart deployment -n corridor
Restore ECS Task Definition:
# Download from S3
aws s3 cp s3://corridor-backups-<unique-id>/config/corridor-task-definition-backup-20240101.json \
./corridor-task-definition-backup-20240101.json
# Register restored task definition
aws ecs register-task-definition \
--cli-input-json file://corridor-task-definition-backup-20240101.json
# Update service to use restored task definition
aws ecs update-service \
--cluster corridor-cluster \
--service corridor-service \
--task-definition corridor-task
Automated Backup Schedule¶
Recommended Backup Schedule:
- Database:
- Automated backups: Daily (retention: 7-35 days)
- Manual snapshots: Weekly or before major changes
- Cross-region copies: Monthly
- File Storage:
- EFS: Daily incremental, weekly full
- EBS: Daily snapshots
- Configuration:
- Before any configuration changes
- Weekly automated backups
Backup Verification:
# Verify backup completion
aws backup list-backup-jobs \
--by-state COMPLETED \
--max-results 10
# Test restore procedure quarterly
# Restore to test environment and verify data integrity
Credential and Key Rotation¶
This section provides step-by-step instructions for rotating programmatic system credentials and cryptographic keys used in Corridor deployments.
Rotating Database Credentials¶
Step 1: Create New Database User
# Connect to RDS database
psql -h <rds-endpoint> -U corridor -d corridor
# Create new user with temporary password
CREATE USER corridor_new WITH PASSWORD 'NewSecurePassword123!';
# Grant necessary permissions
GRANT ALL PRIVILEGES ON DATABASE corridor TO corridor_new;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO corridor_new;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO corridor_new;
Step 2: Update Secrets Manager
# Create new secret version with updated credentials
aws secretsmanager put-secret-value \
--secret-id corridor/database/credentials \
--secret-string '{"username":"corridor_new","password":"NewSecurePassword123!","host":"<rds-endpoint>","port":"5432","database":"corridor"}'
Step 3: Update Application Configuration
For EKS:
# Update Kubernetes secret
kubectl create secret generic corridor-db-credentials \
--from-literal=username=corridor_new \
--from-literal=password=NewSecurePassword123! \
--from-literal=host=<rds-endpoint> \
--namespace corridor \
--dry-run=client -o yaml | kubectl apply -f -
# Restart pods to pick up new credentials
kubectl rollout restart deployment -n corridor
For ECS:
# Update task definition with new secret ARN (if using Secrets Manager integration)
# Task definition automatically retrieves latest secret version
aws ecs update-service \
--cluster corridor-cluster \
--service corridor-service \
--force-new-deployment
For EC2:
# Update configuration file
sudo vi /opt/corridor/instances/default/config/api_config.py
# Update SQLALCHEMY_DATABASE_URI with new credentials
# Restart services
sudo systemctl restart corridor-app corridor-api
Step 4: Verify Application Connectivity
# Test database connection
curl https://corridor.example.com/health
# Check application logs for connection errors
# For EKS: kubectl logs -n corridor deploy/corridor-app --tail=100
# For ECS: aws logs tail /ecs/corridor --follow
# For EC2: sudo journalctl -u corridor-app -n 100
Step 5: Remove Old Database User
# After verifying new credentials work (wait 24-48 hours)
psql -h <rds-endpoint> -U corridor_new -d corridor
# Revoke permissions from old user
REVOKE ALL PRIVILEGES ON DATABASE corridor FROM corridor;
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM corridor;
# Drop old user
DROP USER corridor;
Rotating AWS Access Keys¶
Step 1: Create New Access Key
# Create new IAM user access key
aws iam create-access-key --user-name corridor-service-user
# Save new access key ID and secret access key securely
Step 2: Update Application Configuration
# Update AWS credentials in Secrets Manager
aws secretsmanager put-secret-value \
--secret-id corridor/aws/credentials \
--secret-string '{"access_key_id":"AKIAIOSFODNN7EXAMPLE","secret_access_key":"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"}'
Step 3: Deploy Updated Configuration
# Restart application to pick up new credentials
# For EKS: kubectl rollout restart deployment -n corridor
# For ECS: aws ecs update-service --cluster corridor-cluster --service corridor-service --force-new-deployment
# For EC2: sudo systemctl restart corridor-app
Step 4: Verify New Credentials Work
# Test AWS API calls with new credentials
aws s3 ls s3://corridor-backups-<unique-id>/ --profile new-credentials
Step 5: Deactivate Old Access Key
# Deactivate old access key
aws iam update-access-key \
--user-name corridor-service-user \
--access-key-id <old-access-key-id> \
--status Inactive
# After verification period (24-48 hours), delete old key
aws iam delete-access-key \
--user-name corridor-service-user \
--access-key-id <old-access-key-id>
Rotating SSL/TLS Certificates¶
For ACM Certificates:
# Request new certificate
aws acm request-certificate \
--domain-name corridor.example.com \
--validation-method DNS \
--region us-east-1
# Complete DNS validation
# Update ALB listener to use new certificate
aws elbv2 modify-listener \
--listener-arn <listener-arn> \
--certificates CertificateArn=<new-certificate-arn>
For cert-manager (EKS):
# Certificates are automatically renewed by cert-manager
# Verify certificate status
kubectl get certificate -n corridor
# Force renewal if needed
kubectl delete secret tls-corridor-cert -n corridor
# cert-manager will automatically create new certificate
Rotating Application API Keys¶
Step 1: Generate New API Key
# Generate new API key in Corridor application
# Navigate to: Settings → API Keys → Generate New Key
Step 2: Update External Integrations
# Update API key in external systems
# Update Secrets Manager if storing API keys there
aws secretsmanager put-secret-value \
--secret-id corridor/api-keys \
--secret-string '{"api_key":"new-api-key-value"}'
Step 3: Revoke Old API Key
Rotation Schedule Recommendations¶
- Database Credentials: Every 90 days
- AWS Access Keys: Every 90 days
- SSL/TLS Certificates: Automatically renewed (ACM/cert-manager)
- Application API Keys: Every 180 days or when compromised
- Container Registry Credentials: Every 90 days
Software Patches and Upgrades¶
This section provides prescriptive guidance for applying software patches and performing upgrades to Corridor deployments on AWS.
Patch Management Strategy¶
Patch Categories:
- Security Patches: Critical security vulnerabilities (apply immediately)
- Bug Fixes: Application bug fixes (apply within 30 days)
- Feature Updates: New features and enhancements (plan and schedule)
- Infrastructure Updates: AWS service updates, OS patches (apply monthly)
Pre-Patch Preparation¶
Step 1: Review Patch Notes
- Review Corridor release notes for changes and breaking changes
- Review AWS service updates and deprecations
- Identify dependencies and compatibility requirements
Step 2: Create Backup
# Create database snapshot
aws rds create-db-snapshot \
--db-snapshot-identifier pre-patch-backup-$(date +%Y%m%d) \
--db-instance-identifier corridor-db
# Backup configuration files
# (See Backup and Recovery section for detailed procedures)
Step 3: Test in Non-Production Environment
- Deploy patch to development/staging environment first
- Verify functionality and performance
- Document any issues or workarounds
Applying Application Updates¶
For EKS Deployments:
# Step 1: Update container images
# Pull new container image version
docker pull <registry>/corridor/ggx:<new-version>
# Step 2: Update Kubernetes deployment
kubectl set image deployment/corridor-app \
corridor-app=<registry>/corridor/ggx:<new-version> \
-n corridor
kubectl set image deployment/corridor-api \
corridor-api=<registry>/corridor/ggx:<new-version> \
-n corridor
# Step 3: Monitor rollout
kubectl rollout status deployment/corridor-app -n corridor
kubectl rollout status deployment/corridor-api -n corridor
# Step 4: Verify application health
curl https://corridor.example.com/health
# Step 5: Rollback if needed
kubectl rollout undo deployment/corridor-app -n corridor
For ECS Fargate Deployments:
# Step 1: Register new task definition with updated image
aws ecs register-task-definition \
--cli-input-json file://corridor-task-definition-v2.json
# Step 2: Update service with new task definition
aws ecs update-service \
--cluster corridor-cluster \
--service corridor-service \
--task-definition corridor-task:v2 \
--force-new-deployment
# Step 3: Monitor deployment
aws ecs describe-services \
--cluster corridor-cluster \
--services corridor-service \
--query 'services[0].deployments'
# Step 4: Verify application health
curl https://corridor.example.com/health
# Step 5: Rollback if needed
aws ecs update-service \
--cluster corridor-cluster \
--service corridor-service \
--task-definition corridor-task:v1
For EC2 Deployments:
# Step 1: Download new installation bundle
wget https://<download-url>/corridor-bundle-<new-version>.zip
# Step 2: Stop services
sudo systemctl stop corridor-app corridor-api corridor-worker-api
# Step 3: Backup current installation
sudo cp -r /opt/corridor /opt/corridor-backup-$(date +%Y%m%d)
# Step 4: Extract new bundle
cd /tmp
unzip corridor-bundle-<new-version>.zip
# Step 5: Run database migrations
/opt/corridor/venv-api/bin/corridor-api db upgrade
# Step 6: Restart services
sudo systemctl start corridor-app corridor-api corridor-worker-api
# Step 7: Verify application health
curl https://corridor.example.com/health
# Step 8: Rollback if needed
sudo systemctl stop corridor-app corridor-api
sudo rm -rf /opt/corridor
sudo cp -r /opt/corridor-backup-$(date +%Y%m%d) /opt/corridor
sudo systemctl start corridor-app corridor-api
Applying Infrastructure Patches¶
RDS Database Engine Updates:
# Step 1: Check available engine versions
aws rds describe-db-engine-versions \
--engine postgres \
--engine-version 14.9 \
--query 'DBEngineVersions[*].EngineVersion'
# Step 2: Apply minor version update during maintenance window
aws rds modify-db-instance \
--db-instance-identifier corridor-db \
--engine-version 14.10 \
--apply-immediately false
# Step 3: Monitor update progress
aws rds describe-db-instances \
--db-instance-identifier corridor-db \
--query 'DBInstances[0].[DBInstanceStatus,PendingModifiedValues]'
EKS Cluster Updates:
# Step 1: Update EKS cluster version
aws eks update-cluster-version \
--name corridor-cluster \
--version 1.28
# Step 2: Update node group AMI
aws eks update-nodegroup-version \
--cluster-name corridor-cluster \
--nodegroup-name corridor-nodes \
--release-version <new-release-version>
EC2 Instance Updates:
# Step 1: Install OS updates
sudo yum update -y # Amazon Linux
# or
sudo apt-get update && sudo apt-get upgrade -y # Ubuntu
# Step 2: Reboot if kernel updates were applied
sudo reboot
Post-Patch Verification¶
After applying patches, verify the following:
- Application health endpoint returns healthy status
- All services are running correctly
- Database connectivity is working
- File storage is accessible
- User authentication works
- Critical functionality is operational
- Performance metrics are within normal ranges
- No errors in application logs
Rollback Procedures¶
If patch causes issues:
- Stop the update process (if in progress)
- Restore from backup (database, configuration)
- Revert to previous version (application code)
- Verify rollback success
- Document issues for future reference
License Management¶
This section provides prescriptive guidance on managing Corridor software licenses and AWS service licenses.
Corridor Software License Management¶
License Types:
- Subscription License: Annual or multi-year subscription
- Perpetual License: One-time purchase with optional maintenance
- Trial License: Time-limited evaluation license
License Information Storage:
# License information is typically stored in:
# - Application configuration files
# - Environment variables
# - Secrets Manager (for license keys)
# View current license information
# For EKS:
kubectl get configmap corridor-config -n corridor -o yaml | grep -i license
# For EC2:
cat /opt/corridor/instances/default/config/app_config.py | grep -i license
License Renewal Process:
- Monitor License Expiration:
- Check license expiration date in application settings
- Set up alerts 30 days before expiration
-
Review license usage and compliance
-
Renew License:
- Contact Corridor sales or account representative
- Provide current license information
-
Receive new license key or activation code
-
Update License:
# Update license in application configuration # For EKS: Update ConfigMap or Secret kubectl create secret generic corridor-license \ --from-literal=license-key=<new-license-key> \ --namespace corridor \ --dry-run=client -o yaml | kubectl apply -f - # Restart application kubectl rollout restart deployment -n corridor # For EC2: Update configuration file sudo vi /opt/corridor/instances/default/config/app_config.py # Update LICENSE_KEY value sudo systemctl restart corridor-app -
Verify License Activation:
- Check application logs for license validation
- Verify license status in application UI
- Confirm all features are accessible
License Compliance:
- Track license usage against purchased licenses
- Monitor user counts and feature usage
- Generate license usage reports quarterly
- Ensure compliance with license terms
AWS Service License Management¶
AWS Services:
- AWS services use pay-as-you-go pricing (no separate licenses)
- Some services have usage limits (see AWS Service Limits section)
- Reserved Instances provide cost savings but are not licenses
Third-Party Software Licenses:
- PostgreSQL: Open-source (no license required)
- Redis: Open-source (no license required)
- Python: Open-source (no license required)
- Java: Open-source OpenJDK (no license required)
- Kubernetes: Open-source (no license required)
License Audit:
- Conduct quarterly license audits
- Document all software licenses and expiration dates
- Maintain license inventory spreadsheet
- Set up renewal reminders
AWS Service Limits Management¶
This section provides prescriptive guidance on managing AWS service limits (quotas) to ensure deployments can scale and operate effectively.
Understanding AWS Service Limits¶
Common Service Limits:
- EC2 Instances: Number of running instances per region
- EKS Clusters: Number of clusters per account
- RDS Instances: Number of DB instances per region
- VPCs: Number of VPCs per region
- Elastic IPs: Number of Elastic IPs per region
- NAT Gateways: Number of NAT Gateways per availability zone
Checking Current Service Limits¶
Step 1: List Service Quotas
# List all service quotas for a specific service
aws service-quotas list-service-quotas \
--service-code ec2 \
--query 'Quotas[*].[QuotaName,Value,Adjustable]' \
--output table
# Check specific quota
aws service-quotas get-service-quota \
--service-code ec2 \
--quota-code L-0263D0A3 # Running On-Demand EC2 instances
Step 2: Monitor Quota Usage
# Get current usage
aws service-quotas get-aws-default-service-quota \
--service-code ec2 \
--quota-code L-0263D0A3
# Set up CloudWatch alarms for quota usage
aws cloudwatch put-metric-alarm \
--alarm-name ec2-instance-quota-warning \
--alarm-description "Alert when EC2 instance quota exceeds 80%" \
--metric-name ResourceCount \
--namespace AWS/Usage \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold
Requesting Limit Increases¶
Step 1: Identify Required Increase
- Determine current usage and projected growth
- Calculate required limit increase
- Document business justification
Step 2: Submit Limit Increase Request
# Request limit increase via AWS CLI
aws service-quotas request-service-quota-increase \
--service-code ec2 \
--quota-code L-0263D0A3 \
--desired-value 100
# Or use AWS Console:
# Service Quotas → AWS services → Select service → Request quota increase
Step 3: Monitor Request Status
# Check request status
aws service-quotas list-requested-service-quota-change-history \
--service-code ec2 \
--query 'RequestedQuotas[*].[QuotaName,Status,DesiredValue]' \
--output table
Step 4: Implement Workarounds (If Needed)
- Use multiple AWS accounts for different environments
- Distribute resources across multiple regions
- Use Spot Instances for non-production workloads
- Optimize resource usage
Proactive Limit Management¶
Best Practices:
- Monitor Quota Usage Regularly:
- Set up CloudWatch alarms at 80% usage
- Review quota usage monthly
-
Plan for growth
-
Request Increases Early:
- Request increases 2-4 weeks before needed
- Provide business justification
-
Request slightly higher than immediate need
-
Document Limits:
- Maintain inventory of all service limits
- Track limit increase requests
-
Document approved limits
-
Optimize Resource Usage:
- Right-size instances and resources
- Use Reserved Instances for predictable workloads
- Clean up unused resources regularly
Fault Handling and Recovery¶
This section provides step-by-step instructions for handling fault conditions and recovering the Corridor software deployment.
Handling Fault Conditions¶
Fault Detection¶
Step 1: Identify Fault Type
Common fault conditions: - Application unresponsive or returning errors - Database connection failures - Storage access issues - Network connectivity problems - High resource utilization - Service crashes or restarts
Step 2: Check Application Health
# Check health endpoint
curl https://corridor.example.com/health
# Check load balancer target health
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
# Check application logs for errors
# For EKS:
kubectl logs -n corridor deploy/corridor-app --tail=100 | grep -i error
# For ECS:
aws logs tail /ecs/corridor --follow | grep -i error
# For EC2:
sudo journalctl -u corridor-app -n 100 | grep -i error
Step 3: Check Infrastructure Health
# Check database status
aws rds describe-db-instances \
--db-instance-identifier corridor-db \
--query 'DBInstances[0].[DBInstanceStatus,DBInstanceStatusInfos]'
# Check compute resources
# For EKS:
kubectl get nodes
kubectl get pods -n corridor
# For ECS:
aws ecs describe-tasks \
--cluster corridor-cluster \
--tasks $(aws ecs list-tasks --cluster corridor-cluster --query 'taskArns[0]' --output text)
# For EC2:
aws ec2 describe-instance-status --instance-ids <instance-id>
Common Fault Scenarios and Resolution¶
Scenario 1: Application Pod/Container Crashes
Symptoms: Pods/containers restarting frequently, application unavailable
Resolution Steps:
# For EKS:
# 1. Check pod status
kubectl get pods -n corridor
# 2. Describe pod for events
kubectl describe pod <pod-name> -n corridor
# 3. Check pod logs
kubectl logs <pod-name> -n corridor --previous
# 4. Check resource limits
kubectl top pod <pod-name> -n corridor
# 5. Scale up if resource constrained
kubectl scale deployment corridor-app --replicas=3 -n corridor
# For ECS:
# 1. Check task status
aws ecs describe-tasks --cluster corridor-cluster --tasks <task-id>
# 2. Check task logs
aws logs tail /ecs/corridor --follow
# 3. Update service desired count
aws ecs update-service --cluster corridor-cluster --service corridor-service --desired-count 3
# For EC2:
# 1. Check service status
sudo systemctl status corridor-app
# 2. Check service logs
sudo journalctl -u corridor-app -n 100
# 3. Restart service
sudo systemctl restart corridor-app
Scenario 2: Database Connection Failures
Symptoms: Application cannot connect to database, database errors in logs
Resolution Steps:
# 1. Verify database is running
aws rds describe-db-instances \
--db-instance-identifier corridor-db \
--query 'DBInstances[0].DBInstanceStatus'
# 2. Check database endpoint
aws rds describe-db-instances \
--db-instance-identifier corridor-db \
--query 'DBInstances[0].Endpoint.Address'
# 3. Test database connectivity
psql -h <rds-endpoint> -U corridor -d corridor -c "SELECT 1;"
# 4. Check security groups
aws ec2 describe-security-groups --group-ids <db-sg-id>
# 5. Verify credentials in Secrets Manager
aws secretsmanager get-secret-value --secret-id corridor/database/credentials
# 6. Restart application to refresh connections
# For EKS: kubectl rollout restart deployment -n corridor
# For ECS: aws ecs update-service --cluster corridor-cluster --service corridor-service --force-new-deployment
# For EC2: sudo systemctl restart corridor-app
Scenario 3: Storage Access Issues
Symptoms: File uploads/downloads failing, storage errors
Resolution Steps:
# For EFS:
# 1. Check EFS status
aws efs describe-file-systems --file-system-id <efs-id>
# 2. Check mount targets
aws efs describe-mount-targets --file-system-id <efs-id>
# 3. Test mount from application
# For EKS:
kubectl exec -it -n corridor deploy/corridor-app -- df -h | grep efs
# For ECS:
aws ecs execute-command --cluster corridor-cluster --task <task-id> --container corridor-app --command "df -h"
# For EBS (EC2):
# 1. Check volume status
aws ec2 describe-volumes --volume-ids <volume-id>
# 2. Check volume attachment
aws ec2 describe-instances --instance-ids <instance-id> --query 'Instances[0].BlockDeviceMappings'
# 3. Check filesystem
df -h
sudo mount /dev/xvdf /mnt/data
Scenario 4: High Resource Utilization
Symptoms: Slow response times, timeouts, high CPU/memory usage
Resolution Steps:
# 1. Check resource utilization
# For EKS:
kubectl top nodes
kubectl top pods -n corridor
# For ECS:
aws cloudwatch get-metric-statistics \
--namespace AWS/ECS \
--metric-name CPUUtilization \
--dimensions Name=ServiceName,Value=corridor-service \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average,Maximum
# 2. Scale resources
# For EKS: Scale node group or increase pod replicas
aws eks update-nodegroup-config \
--cluster-name corridor-cluster \
--nodegroup-name corridor-nodes \
--scaling-config minSize=5,maxSize=15
# For ECS: Increase task count
aws ecs update-service \
--cluster corridor-cluster \
--service corridor-service \
--desired-count 4
# For EC2: Upgrade instance type or add instances
aws ec2 modify-instance-attribute \
--instance-id <instance-id> \
--instance-type Value=t3.2xlarge
Software Recovery Procedures¶
Complete Deployment Recovery¶
Step 1: Assess Recovery Requirements
- Identify what needs to be recovered (database, files, configuration)
- Determine recovery point objective (RPO) and recovery time objective (RTO)
- Choose recovery method (full restore, partial restore, failover)
Step 2: Restore Infrastructure (If Needed)
# Restore from Infrastructure-as-Code
# Using Terraform:
terraform apply -var-file=production.tfvars
# Using CloudFormation:
aws cloudformation create-stack \
--stack-name corridor-recovery \
--template-body file://corridor-infrastructure.yaml
Step 3: Restore Database
# Restore from latest snapshot
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier corridor-db-recovered \
--db-snapshot-identifier corridor-db-snapshot-20240101
# Wait for restore completion
aws rds wait db-instance-available \
--db-instance-identifier corridor-db-recovered
# Update application configuration with new database endpoint
Step 4: Restore File Storage
# Restore EFS from backup
aws backup start-restore-job \
--recovery-point-arn <recovery-point-arn> \
--iam-role-arn arn:aws:iam::<account-id>:role/service-role/AWSBackupDefaultServiceRole \
--resource-type EFS \
--metadata file-system-id=<efs-id>
# Or restore EBS from snapshot
aws ec2 create-volume \
--snapshot-id <snapshot-id> \
--availability-zone us-east-1a
Step 5: Restore Application Configuration
# Restore Kubernetes ConfigMaps/Secrets
kubectl apply -f corridor-configmaps-backup-20240101.yaml -n corridor
# Restore ECS task definition
aws ecs register-task-definition \
--cli-input-json file://corridor-task-definition-backup-20240101.json
# Restore EC2 configuration files
tar -xzf corridor-config-backup-20240101.tar.gz -C /
Step 6: Deploy Application
# Deploy application using restored configuration
# For EKS:
kubectl apply -f corridor-manifests.yaml -n corridor
# For ECS:
aws ecs update-service \
--cluster corridor-cluster \
--service corridor-service \
--task-definition corridor-task
# For EC2:
sudo systemctl start corridor-app corridor-api
Step 7: Verify Recovery
# Verify application health
curl https://corridor.example.com/health
# Verify database connectivity
psql -h <rds-endpoint> -U corridor -d corridor -c "SELECT COUNT(*) FROM users;"
# Verify file storage
# Test file upload/download functionality
# Verify all services
# Check application logs for errors
Disaster Recovery to Secondary Region¶
Step 1: Prepare Secondary Region
# Copy database snapshot to secondary region
aws rds copy-db-snapshot \
--source-db-snapshot-identifier corridor-db-snapshot-20240101 \
--target-db-snapshot-identifier corridor-db-snapshot-dr \
--target-region us-west-2
# Copy EBS snapshots
aws ec2 copy-snapshot \
--source-region us-east-1 \
--source-snapshot-id <snapshot-id> \
--region us-west-2
Step 2: Deploy Infrastructure in Secondary Region
# Deploy infrastructure using Terraform/CloudFormation
# Update region parameter to secondary region
terraform apply -var="aws_region=us-west-2"
Step 3: Restore Data in Secondary Region
# Restore database from snapshot
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier corridor-db-dr \
--db-snapshot-identifier corridor-db-snapshot-dr \
--region us-west-2
# Restore file storage
# (Follow file storage restore procedures)
Step 4: Update DNS for Failover
# Update Route 53 health check and failover
aws route53 change-resource-record-sets \
--hosted-zone-id <hosted-zone-id> \
--change-batch file://failover-routing.json
Recovery Testing¶
Regular Recovery Testing:
- Test database restore procedures quarterly
- Test file storage restore procedures quarterly
- Test complete deployment recovery annually
- Document test results and update procedures
Support Information¶
This section provides details on how to receive support, technical support tiers, and Service Level Agreements (SLAs) for Corridor deployments on AWS.
How to Receive Support¶
Support Channels:
- Email Support:
- Primary support email: support@corridorplatforms.com
- Include deployment method (EKS/ECS/EC2), AWS region, error messages, and logs
-
Response time: Based on support tier (see below)
-
Support Portal:
- Access support portal at: https://support.corridorplatforms.com
- Submit tickets, track issues, access knowledge base
-
Available 24/7 for ticket submission
-
Emergency Support:
- For critical production issues
- Contact via emergency hotline (Premium/Enterprise tiers)
- Available 24/7 for critical issues
Information to Provide When Requesting Support:
- Deployment method (EKS, ECS Fargate, or EC2)
- AWS region and account ID
- Corridor version
- Detailed error messages and logs
- Steps to reproduce the issue
- Screenshots or screen recordings (if applicable)
- Impact assessment (number of users affected, business impact)
Technical Support Tiers¶
Tier 1: Standard Support
- Availability: Business hours (9 AM - 5 PM EST, Monday-Friday)
- Response Time:
- Critical: 8 business hours
- High: 1 business day
- Medium: 2 business days
- Low: 3 business days
- Channels: Email, Support Portal
- Included Services:
- Product documentation access
- Knowledge base access
- Bug fixes and security patches
- General product questions
- Exclusions:
- Custom development
- Architecture design consultation
- On-site support
Tier 2: Premium Support
- Availability: Extended business hours (8 AM - 8 PM EST, Monday-Friday)
- Response Time:
- Critical: 4 business hours
- High: 8 business hours
- Medium: 1 business day
- Low: 2 business days
- Channels: Email, Support Portal
- Included Services:
- All Standard Support services
- Priority ticket handling
- Architecture guidance
- Performance optimization assistance
- Additional Features:
- Monthly health check reviews
- Proactive monitoring recommendations
Tier 3: Enterprise Support
- Availability: 24/7 support
- Response Time:
- Critical: 1 hour (24/7)
- High: 4 hours (24/7)
- Medium: 8 business hours
- Low: 1 business day
- Channels: Email, Support Portal, Emergency Hotline
- Included Services:
- All Premium Support services
- 24/7 critical issue support
- Dedicated support engineer
- Custom deployment assistance
- Quarterly architecture reviews
- On-site support (as needed)
- Additional Features:
- Custom SLA definitions
- Direct access to engineering team
- Proactive issue identification
Support Tiers and Service Level Agreements (SLAs)¶
Severity Levels:
- Critical (P1):
- Production system down or severely degraded
- Data loss or corruption risk
- Security breach
-
SLA:
- Standard: 8 business hours
- Premium: 4 business hours
- Enterprise: 1 hour (24/7)
-
High (P2):
- Major feature unavailable
- Significant performance degradation
-
SLA:
- Standard: 1 business day
- Premium: 8 business hours
- Enterprise: 4 hours (24/7)
-
Medium (P3):
- Minor feature unavailable
- Workaround available
-
SLA:
- Standard: 2 business days
- Premium: 1 business day
- Enterprise: 8 business hours
-
Low (P4):
- General questions
- Feature requests
- SLA:
- Standard: 3 business days
- Premium: 2 business days
- Enterprise: 1 business day
SLA Guarantees:
- Response Time: Time to initial response from support team
- Resolution Time: Target time to resolve issue (varies by severity and complexity)
- Uptime Guarantee: 99.9% availability (Enterprise tier)
- Escalation Process: Automatic escalation if SLA not met
Support Coverage:
- Product Support: Corridor platform functionality, deployment, configuration
- Infrastructure Support: AWS infrastructure guidance (limited)
- Integration Support: Third-party integration assistance
- Exclusions:
- Custom development
- Training (available as separate service)
- AWS account management
Escalation Process:
- Level 1: Support Engineer (initial response)
- Level 2: Senior Support Engineer (complex issues)
- Level 3: Engineering Team (critical bugs, architecture issues)
- Level 4: Product Management (feature requests, strategic issues)
Support Request Process:
- Submit ticket via email or support portal
- Receive ticket confirmation and tracking number
- Initial response within SLA timeframe
- Issue investigation and resolution
- Ticket closure with resolution summary
Contact Information:
- Support Email: support@corridorplatforms.com
- Support Portal: https://support.corridorplatforms.com
- Emergency Hotline: Provided upon account activation (Enterprise)
Installation Methods¶
Corridor supports three different installation approaches on AWS, each suited to different organizational needs and infrastructure preferences:
Option 1: Kubernetes - EKS¶
Deploy Corridor on Amazon Elastic Kubernetes Service (EKS) for a cloud-native, containerized deployment.
Best for:
- Organizations with Kubernetes expertise
- Multi-tenant deployments with namespace isolation
- Auto-scaling and high availability requirements
- Modern cloud-native infrastructure
Option 2: Serverless - ECS Fargate¶
Deploy Corridor using AWS Fargate for a serverless container deployment.
Best for:
- Organizations wanting serverless infrastructure
- Minimal infrastructure management
- Variable workload patterns
View Fargate Installation Guide →
Option 3: Virtual Machines - EC2¶
Deploy Corridor on Amazon EC2 instances for a traditional VM-based deployment.
Best for:
- Organizations preferring VM-based infrastructure
- Direct control over the operating system
- Traditional IT infrastructure patterns
- Custom hardware requirements
Common AWS Services¶
All deployment options utilize these AWS managed services:
- RDS: PostgreSQL database for metadata storage
- S3: Object storage for file management (or EFS for NFS)
- Application Load Balancer: HTTP(S) load balancing
- Route 53: Domain name management
- VPC: Virtual private cloud networking
Choosing the Right Option¶
| Factor | EKS (Kubernetes) | ECS Fargate | EC2 (VMs) |
|---|---|---|---|
| Complexity | Higher | Medium | Lower |
| Scalability | Automatic | Automatic | Manual |
| Multi-tenancy | Native (namespaces) | Task-based | Separate VMs |
| Operational Overhead | Lower (managed K8s) | Lowest | Higher |
| Deployment Speed | Fast | Fastest | Moderate |
| Infrastructure Management | Minimal | None | Full |
| Cost Model | Node-based | Per task | Per instance |
| Kubernetes Skills Required | Yes | No | No |