devops-exercises/topics/gcp/README.md

Google Cloud Platform

Exercises

Account Setup

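| Name | Topic | Objective & Instructions | Solution | Comments |
|------|-------|--------------------------|----------|----------|
| Create a project | Organization | Exercise | Solution | |
| Assign roles | IAM | Exercise | Solution | |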

Compute Engine

| Name | Topic | Objective & Instructions | Solution | Comments |
|------|-------|--------------------------|----------|----------|
| Create an instance | Compute, Labels | Exercise | Solution | |

Questions

Global Infrastructure

Explain each of the following
  • Zone
  • Region

A GCP region is a specific geographical location in which Google operates data centers. Regions are spread across different locations worldwide.

Within each region, there are multiple isolated locations known as zones. Each zone consists of one or more data centers with redundant networking, connectivity and power supply. Spreading resources across multiple zones ensures high availability in case one of them goes down.
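The relationship between zones and regions is visible in their names: a zone name is the region name plus a short suffix (for example, `us-central1-a` lives in `us-central1`). The following is a minimal sketch that groups zone names by region based on that naming convention only. The helper names are made up for illustration and nothing here calls a GCP API.

```python
def region_of(zone):
    """Return the region part of a zone name such as 'us-central1-a'."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def zones_by_region(zones):
    """Group zone names under the region they belong to."""
    grouped = {}
    for zone in zones:
        grouped.setdefault(region_of(zone), []).append(zone)
    return grouped


print(zones_by_region(["us-central1-a", "us-central1-b", "europe-west1-b"]))
```

A region with more than one entry in the result is a candidate for a multi-zone deployment.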

True or False? Each GCP region is designed to be completely isolated from the other GCP regions

True.

What should you consider when choosing a GCP region for running a new application?
  • Service availability: not all services (and not all of their features) are available in every region
  • Latency: deploy the application in a region that is close to its customers
  • Compliance: some countries have stricter rules and requirements, such as making sure the data stays within the borders of the country or the region. In that case, only specific regions can be used for running the application
  • Pricing: prices are not consistent across regions, so the same service might cost more in one region than in another
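One way to apply these considerations is to treat service availability and compliance as hard filters, and latency and price as ranking criteria. The sketch below does exactly that. The region names, countries, latencies and prices are hypothetical placeholders, and ranking latency ahead of price is an assumption rather than a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionInfo:
    name: str
    services: frozenset   # services offered in the region
    country: str          # where the data physically resides
    latency_ms: float     # measured latency to the main user base
    hourly_price: float   # price of the resource we need


def pick_region(candidates, required_services, allowed_countries=None):
    """Return the best region name, or None if no region qualifies."""
    eligible = [
        r for r in candidates
        if required_services <= r.services
        and (allowed_countries is None or r.country in allowed_countries)
    ]
    if not eligible:
        return None
    # Prefer low latency and break ties on price.
    return min(eligible, key=lambda r: (r.latency_ms, r.hourly_price)).name


# Hypothetical numbers, for illustration only.
REGIONS = [
    RegionInfo("region-a", frozenset({"compute", "functions"}), "US", 120.0, 0.010),
    RegionInfo("region-b", frozenset({"compute"}), "DE", 25.0, 0.012),
    RegionInfo("region-c", frozenset({"compute", "functions"}), "DE", 30.0, 0.011),
]

print(pick_region(REGIONS, {"compute", "functions"}, allowed_countries={"DE"}))
```

Here `region-b` has the lowest latency but is skipped because it lacks one of the required services.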
True or False? All GCP services are available in all regions and zones

False. Google Cloud documents which products and services are available in each region.

gcloud

How to list all regions?

gcloud compute regions list

Resource Hierarchy

Explain the resource hierarchy in GCP

Organization -> Folder -> Project -> Resources

  • Organizations - Company
  • Folder - usually for departments, teams, products, etc.
  • Project - can be different projects or same project but different environments (dev, staging, production)
  • Resources - actual GCP services (Compute, App engine, Storage, etc.)
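The hierarchy can be pictured as a tree in which every node points to its parent. The sketch below is a toy model of that tree, not a client for the Resource Manager API. The class and the node names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                        # "organization", "folder", "project" or "resource"
    name: str
    parent: Optional["Node"] = None  # None marks the root of the hierarchy

    def ancestry(self):
        """Return the chain of nodes from the root down to this node."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))

    def path(self):
        return " -> ".join(f"{n.kind}:{n.name}" for n in self.ancestry())


org = Node("organization", "example.com")
team = Node("folder", "platform-team", parent=org)
project = Node("project", "web-dev", parent=team)
vm = Node("resource", "instance-1", parent=project)

print(vm.path())
```

Walking up from a resource always ends at a single organization, which is why the relationship cannot be reversed.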

True or False? In a project, you can have one or more organizations

False. It's quite the opposite. First there is an organization, and under the organization you can have one or more folders with one or more projects.

True or False? A resource has to be associated with at least one project

True. You can't have a resource that is not associated with a project.

True or False? Project name has to be globally unique

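False. It is the project ID that has to be globally unique. The project name is only a display name and does not have to be unique.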

IAM and Roles

Explain roles and permissions

A role is a named collection of permissions. For example, the "owner" role has more than 3000 permissions assigned to it, covering the different components and services of GCP.

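True or False? A permissive parent policy will always overrule a restrictive child policy

True. IAM allow policies are inherited down the resource hierarchy and are additive, so a child policy cannot take away access that a parent policy grants.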
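A small model helps to see why the parent wins: the effective policy on a node is the union of the bindings set on the node itself and on all of its ancestors. The sketch below assumes allow policies only (deny policies are deliberately left out). The hierarchy, members and helper names are hypothetical.

```python
# Parent of each node in a hypothetical hierarchy (None marks the root).
PARENT = {
    "org": None,
    "folder-a": "org",
    "project-x": "folder-a",
}

# Allow policies set directly on each node: role -> members.
POLICIES = {
    "org": {"roles/viewer": {"user:alice@example.com"}},
    "folder-a": {},
    "project-x": {"roles/editor": {"user:bob@example.com"}},
}


def effective_policy(node):
    """Union of the bindings on the node and on all of its ancestors."""
    merged = {}
    while node is not None:
        for role, members in POLICIES.get(node, {}).items():
            merged.setdefault(role, set()).update(members)
        node = PARENT[node]
    return merged


def has_role(member, role, node):
    return member in effective_policy(node).get(role, set())


# Alice was only granted viewer on the organization, yet the grant still
# applies on project-x because nothing lower down can subtract it.
print(has_role("user:alice@example.com", "roles/viewer", "project-x"))
# Bob's grant on project-x does not flow upwards to folder-a.
print(has_role("user:bob@example.com", "roles/editor", "folder-a"))
```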

Labels and Tags

What are labels?

You can think of labels in GCP as sticky notes that you attach to different GCP resources. They make it easier, for example, to search for specific resources (like applying a label called "web-app" and then searching for all the resources that are related to "web-app").

Can you provide some examples of label usage in GCP?
  • Location (cost center)
  • Project (or environment, folder, etc.)
  • Service type
  • Service owner
  • Application type
  • Application owner
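Labels are key-value pairs, so finding resources by label comes down to matching every requested pair. The sketch below runs that match over an in-memory inventory. The inventory and the function names are hypothetical and no GCP API is involved.

```python
def matches(labels, selector):
    """True when every key/value pair of the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def find_by_labels(resources, **selector):
    """Return the names of the resources that carry all the given labels."""
    return [name for name, labels in resources.items() if matches(labels, selector)]


# Hypothetical inventory: resource name -> labels.
RESOURCES = {
    "instance-1": {"app": "web", "env": "dev", "owner": "team-a"},
    "instance-2": {"app": "web", "env": "prod", "owner": "team-a"},
    "bucket-1": {"app": "db", "env": "prod", "owner": "team-b"},
}

print(find_by_labels(RESOURCES, app="web"))
print(find_by_labels(RESOURCES, env="prod", owner="team-b"))
```

The same idea is what makes labels useful for cost reports: group by `owner` or `env` instead of filtering.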
What are network tags and how are they different from labels?

Network tags are strings attached to VM instances (or instance templates). While labels don't affect the resources on which they are applied, network tags do: firewall rules and networking routes use them to decide which instances they apply to.

gcloud

List the labels of an instance called "instance-1"

gcloud compute instances describe instance-1 --format "yaml(labels)"

Update a label to "app=db" for the instance called "instance-1"

gcloud compute instances update instance-1 --update-labels app=db

Remove the label "env" from an instance called "instance-1"

gcloud compute instances update instance-1 --remove-labels env

Compute Engine

gcloud

Create an instance with the following properties:
  • name: instance-1
  • machine type: e2-micro
  • labels: app=web, env=dev

gcloud compute instances create instance-1 --labels app=web,env=dev --machine-type=e2-micro

Other

Tell me what you know about GCP networking

A Virtual Private Cloud (VPC) network is a virtual version of a physical network, implemented inside Google's internal network. A VPC network is a global resource in GCP. Subnetworks (subnets) are regional resources, i.e. each subnet is created within a single region.

VPC networks can be created in two modes:

  1. Auto mode VPC - One subnet in each region is created automatically by GCP when the VPC is created

  2. Custom mode VPC - No subnets are automatically created. This type of network gives users complete control over subnet creation.
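The difference between the two modes can be sketched with Python's `ipaddress` module. The model assumes that auto mode hands every region a /20 subnet from the 10.128.0.0/9 block. The region names are placeholders, and assigning ranges in list order is an illustration only, not the real predefined mapping.

```python
import ipaddress

# Assumed address block and subnet size for auto mode networks.
AUTO_MODE_BLOCK = ipaddress.ip_network("10.128.0.0/9")


def plan_subnets(regions, block=AUTO_MODE_BLOCK, prefix=20):
    """Assign one non-overlapping subnet of the given prefix length per region."""
    pool = block.subnets(new_prefix=prefix)
    return {region: next(pool) for region in regions}


def create_network(mode, regions):
    """Return the subnets a new network starts with, keyed by region."""
    if mode == "auto":
        return plan_subnets(regions)
    if mode == "custom":
        return {}  # nothing is created until the user adds subnets
    raise ValueError(f"unknown mode: {mode!r}")


for region, subnet in create_network("auto", ["region-a", "region-b", "region-c"]).items():
    print(region, subnet)
print(create_network("custom", ["region-a", "region-b", "region-c"]))
```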

Explain Cloud Functions

Google Cloud Functions is a serverless execution environment for building and connecting cloud services. With Cloud Functions you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services. Your function is triggered when an event being watched is fired.
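The event-driven model described above can be imitated in a few lines: functions are attached to an event type and run only when such an event is fired. This is a toy in-process dispatcher, not the Cloud Functions API. The decorator, the `emit` helper and the event name are all invented for illustration.

```python
from collections import defaultdict

_handlers = defaultdict(list)


def on(event_type):
    """Decorator that attaches a function to an event type."""
    def register(func):
        _handlers[event_type].append(func)
        return func
    return register


def emit(event_type, payload):
    """Fire an event and return the results of every function watching it."""
    return [func(payload) for func in _handlers[event_type]]


@on("storage.object.finalized")  # hypothetical event name
def make_thumbnail(event):
    return f"thumbnail for {event['name']}"


print(emit("storage.object.finalized", {"name": "cat.png"}))
print(emit("pubsub.message", {"data": "nobody is listening"}))
```

The function itself knows nothing about servers or scaling, which is the point of the serverless model: the platform decides when and where to run it.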

What is Cloud Datastore?

Cloud Datastore is a schemaless NoSQL datastore in Google's cloud. Applications can use Datastore to query your data with SQL-like queries that support filtering and sorting. Datastore replicates data across multiple datacenters, which provides a high level of read/write availability.

What are network tags used for?

Network tags allow you to apply firewall rules and routes to a specific instance or set of instances: You make a firewall rule applicable to specific instances by using target tags and source tags.
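The matching logic behind target tags is simple: a rule with target tags applies to an instance when they share at least one tag, and a rule without targets applies to every instance in the network. The sketch below models only that part (source tags, priorities and directions are left out). The rule names and tags are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    name: str
    port: int
    target_tags: frozenset = frozenset()  # empty means "all instances"


def rules_for(instance_tags, rules):
    """Return the names of the rules that apply to an instance with these tags."""
    tags = set(instance_tags)
    return [r.name for r in rules if not r.target_tags or r.target_tags & tags]


RULES = [
    FirewallRule("allow-http", 80, frozenset({"web"})),
    FirewallRule("allow-postgres", 5432, frozenset({"db"})),
    FirewallRule("allow-ssh", 22),
]

print(rules_for({"web"}, RULES))
print(rules_for(set(), RULES))
```

Adding the `web` tag to an instance is therefore enough to open port 80 on it, without touching the rule.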

What are flow logs? Where are they enabled?

VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as Google Kubernetes Engine nodes. These logs can be used for network monitoring, forensics, real-time security analysis, and expense optimization.

Enable Flow Logs

  1. Open VPC Network in GCP Console

  2. Click the name of the subnet

  3. Click EDIT button

  4. Set Flow Logs to On

  5. Click Save

How do you list buckets?
Two ways to do that:

$ gsutil ls

$ gcloud storage ls

What Compute metadata key allows you to run code at startup?

startup-script

What does the following command do? `gcloud deployment-manager deployments create`

It creates a new deployment in Deployment Manager.

What is Cloud Code?
It is a set of tools that helps developers write, run and debug Kubernetes-based applications on GCP. It provides built-in support for rapid iteration, debugging and running applications in development and production K8s environments.

Google Kubernetes Engine (GKE)

What is GKE?
  • It is the managed Kubernetes service on GCP for deploying, managing and scaling containerised applications using Google infrastructure.

Anthos

What is Anthos?
It is a managed application platform for organisations, such as enterprises, that require quick modernisation and a certain level of consistency for their legacy applications in a hybrid or multi-cloud world. The core ideas can be drawn from this definition:
  • Managed -> the customer does not need to worry about the underlying software integrations, they just enable the API.
  • Application platform -> it consists of open source tools like K8s, Knative, Istio and Tekton
  • Enterprises -> these are usually organisations with complex needs
  • Consistency -> the same policies are declared once and run securely anywhere, e.g. on-prem, GCP or other clouds (AWS or Azure)

Fun fact: Anthos means flower in Greek. Flowers grow in the ground (earth) but need rain from the clouds to flourish.

List the technical components that make up Anthos
  • Infrastructure management - Google Kubernetes Engine (GKE)
  • Cluster management - GKE, Ingress for Anthos
  • Service management - Anthos Service Mesh
  • Policy enforcement - Anthos Config Management, Anthos Enterprise Data Protection, Policy Controller
  • Application deployment - CI/CD tools like Cloud Build, GitLab
  • Application development - Cloud Code
What is the primary computing environment for Anthos to easily manage workload deployment?
  • Google Kubernetes Engine (GKE)
How does Anthos handle the control plane and node components for GKE?

On GCP, the Kubernetes API server is the only control plane component exposed to customers, whilst Compute Engine manages the node instances in the project.

Which load balancing options are available?
  • Network Load Balancing for L4 and HTTP(S) Load Balancing for L7, both of which are managed services that do not require additional configuration.
  • Ingress for Anthos which allows the ability to deploy a load balancer that serves an application across multiple clusters on GKE
Can you deploy Anthos on AWS?
  • Yes, Anthos on AWS is generally available (GA).
List and explain the enterprise security capabilities provided by Anthos
  • Control plane security - GCP manages and maintains the K8s control plane out of the box. The user can secure the api-server by using master authorized networks and private clusters. These allow the user to disable access on the public IP address by assigning a private IP address to the master.
  • Node security - By default, workloads are provisioned on Compute Engine instances that use Google's Container-Optimized OS. This operating system implements a locked-down firewall, limited user accounts with root disabled and a read-only filesystem. There is a further option to enable GKE Sandbox for stronger isolation in multi-tenant deployment scenarios.
  • Network security - Within a created cluster VPC, Anthos GKE leverages a powerful software-defined network that enables simple Pod-to-Pod communications. Network policies allow locking down ingress and egress connections in a given namespace. Filtering can also be applied to incoming load-balanced traffic for services that require external access, by supplying whitelisted CIDR IP ranges.
  • Workload security - Workloads run with limited privileges, and the default Docker AppArmor security policies are applied to all Kubernetes Pods. Workload Identity for Anthos GKE maps open source Kubernetes service accounts to GCP service account permissions.
  • Audit logging - Administrators are given a way to retain, query, process and alert on events of the deployed environments.
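The CIDR filtering mentioned under network security boils down to checking whether a client address falls inside one of the allowed ranges. Below is a minimal sketch of that check using the standard `ipaddress` module. The ranges come from address blocks reserved for documentation and the function name is made up.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidrs):
    """True when the client address falls inside one of the allowed ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Ranges reserved for documentation, standing in for real office networks.
ALLOWED = ["203.0.113.0/24", "198.51.100.16/28"]

print(is_allowed("203.0.113.7", ALLOWED))    # inside the /24
print(is_allowed("198.51.100.40", ALLOWED))  # outside the /28 (.16 to .31)
```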
How can workloads deployed on Anthos GKE on-prem clusters securely connect to Google Cloud services?
  • Google Cloud Virtual Private Network (Cloud VPN) - this is for secure networking
  • Google Cloud Key Management Service (Cloud KMS) - for key management
What is Island Mode configuration with regards to networking in Anthos GKE deployed on-prem?
  • This is when pods can directly talk to each other within a cluster, but cannot be reached from outside the cluster thus forming an "island" within the network that is not connected to the external network.
Explain Anthos Config Management

It is a core component of the Anthos stack which provides platform, service and security operators with a single, unified approach to multi-cluster management that spans both on-premises and cloud environments. It closely follows K8s best practices, favoring declarative approaches over imperative operations, and actively monitors cluster state and applies the desired state as defined in Git. It includes three key components as follows:

  1. An importer that reads from a central Git repository
  2. A component that synchronises stored configuration data into K8s objects
  3. A component that monitors drift between desired and actual cluster configurations and can reconcile them when the need arises.
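The drift detection and reconciliation described above can be illustrated with plain dictionaries: one holds the desired state read from Git, the other the state observed in the cluster. This is a conceptual sketch, not the Anthos Config Management implementation. The object names and the `diff`/`reconcile` helpers are hypothetical.

```python
def diff(desired, actual):
    """Return the changes needed to turn `actual` into `desired`."""
    changes = []
    for name, spec in desired.items():
        if name not in actual:
            changes.append(("create", name))
        elif actual[name] != spec:
            changes.append(("update", name))
    for name in actual:
        if name not in desired:
            changes.append(("delete", name))
    return changes


def reconcile(desired, actual):
    """Apply the changes to `actual` in place and return what was done."""
    changes = diff(desired, actual)
    for action, name in changes:
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return changes


# `desired` stands in for the objects read from Git, `actual` for the cluster.
desired = {"ns/team-a": {"quota": "10"}, "ns/team-b": {"quota": "5"}}
actual = {"ns/team-a": {"quota": "20"}, "ns/legacy": {"quota": "1"}}

print(reconcile(desired, actual))  # drift found and corrected
print(reconcile(desired, actual))  # second pass finds nothing to do
```

Running the loop twice shows the property that matters: once the cluster matches Git, reconciliation becomes a no-op.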
How does Anthos Config Management help?

It follows common modern software development practices, which makes cluster configuration, management and policy changes auditable, revertible and versionable, easily enforcing IT governance and unifying resource management in an organisation.

What is Anthos Service Mesh?
  • It is a suite of tools that assist in monitoring and managing deployed services on Anthos of all shapes and sizes whether running in cloud, hybrid or multi-cloud environments. It leverages the APIs and core components from Istio, a highly configurable and open-source service mesh platform.
Describe the two main components of Anthos Service Mesh
  1. Data plane - it consists of a set of distributed proxies that mediate all inbound and outbound network traffic between individual services which are configured using a centralised control plane and an open API
  2. Control plane - is a fully managed offering outside of Anthos GKE clusters to simplify management overhead and ensure highest possible availability.
What are the components of the managed control plane of Anthos Service Mesh?
  1. Traffic Director - it is GCP's fully managed service mesh traffic control plane, responsible for translating Istio API objects into configuration information for the distributed proxies, as well as directing service mesh ingress and egress traffic
  2. Managed CA - is a centralised certificate authority responsible for providing SSL certificates to each of the distributed proxies, authentication information and distributing secrets
  3. Operations tooling - formerly Stackdriver, provides a managed ingestion point for observability and telemetry, specifically monitoring, tracing and logging data generated by each of the proxies. This powers the observability dashboard for operators to visually inspect their services and service dependencies assisting in the implementation of SRE best practices for monitoring SLIs and establishing SLOs.
How does Anthos Service Mesh help?
The integration of tools and technologies that make up Anthos Service Mesh delivers significant operational benefits to Anthos environments with minimal additional overhead, such as:
  • Uniform observability - the data plane reports service to service communication back to the control plane generating a service dependency graph. Traffic inspection by the proxy inserts headers to facilitate distributed tracing, capturing and reporting service logs together with service-level metrics (e.g. latency, errors, availability).
  • Operational agility - fine-grained controls for managing the flow of inter-mesh (north-south) and intra-mesh (east-west) traffic are provided.
  • Policy-driven security - policies can be enforced consistently across diverse protocols and runtimes as service communications are secured by default.
List possible use cases of traffic controls that can be implemented within Anthos Service Mesh
  • Traffic splitting across differing service versions for canary or A/B testing
  • Circuit breaking to prevent cascading failures
  • Fault injection to help build resilient and fault-tolerant deployments
  • HTTP header-based traffic steering between individual services or versions
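Traffic splitting, the first use case in the list, can be sketched as mapping each request to a bucket from 0 to 99 and giving every version a slice of those buckets that matches its weight. This illustrates the idea only and is not how the mesh proxies implement it. The function name and the hashing scheme are assumptions made for the example.

```python
import hashlib


def pick_version(request_id, weights):
    """Map a request to a version so that each version gets its weight share.

    `weights` maps version name -> integer percentage and must sum to 100.
    Hashing the request id keeps the choice stable when a request is retried.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    upper = 0
    for version, weight in weights.items():
        upper += weight
        if bucket < upper:
            return version
    raise AssertionError("unreachable: bucket is always below 100")


weights = {"v1": 90, "v2": 10}  # send roughly 10% of traffic to the canary
counts = {"v1": 0, "v2": 0}
for i in range(10_000):
    counts[pick_version(f"req-{i}", weights)] += 1
print(counts)
```

Moving the canary forward is then a matter of changing the weights, for example to 50/50 and finally 0/100.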
What is Cloud Run for Anthos?

It is part of the Anthos stack that brings a serverless container experience to Anthos, offering a high-level platform experience on top of K8s clusters. It is built with Knative, an open-source operator for K8s that brings serverless application serving and eventing capabilities.

How does Cloud Run for Anthos simplify operations?

Platform teams in organisations that wish to offer developers additional tools to test, deploy and run applications can use Knative to enhance this experience on Anthos as Cloud Run. Below are some of the benefits:

  • Easy migration from K8s deployments - Without Cloud Run, platform engineers have to configure Deployment, Service and HorizontalPodAutoscaler (HPA) objects to get load balancing and autoscaling. If the application is already serving traffic, it becomes hard to change configurations or roll back efficiently. With Cloud Run all of this is managed: a Knative service manifest describes the application to be autoscaled and load balanced
  • Autoscaling - a sudden traffic spike may cause application containers in K8s to crash due to overload, so efficient automated autoscaling is performed to serve the high volume of traffic
  • Networking - it has built-in load balancing capabilities and policies for traffic splitting between multiple versions of an application.
  • Releases and rollouts - supports the notion of Knative API revisions, which describe new versions or different configurations of your application, and canary deployments by splitting traffic.
  • Monitoring - observing and recording metrics such as latency, error rate and requests per second.
List and explain three high-level out of the box autoscaling primitives offered by Cloud Run for Anthos that do not exist in K8s natively
  • Rapid, request-based autoscaling - default autoscalers monitor request metrics which allows Cloud Run for Anthos to handle spiky traffic patterns smoothly
  • Concurrency controls - limits such as max in-flight requests per container are enforced to ensure the container does not become overloaded and crash. More containers are added to handle the spiky traffic, buffering the requests.
  • Scale to zero - if an application is inactive for a while Cloud Run scales it down to zero to reduce its footprint. Alternatively one can turn off scale-to-zero to prevent cold starts.
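The three primitives fit into one small calculation: divide the number of in-flight requests by the per-container concurrency target, round up, and clamp the result between a minimum (zero when scale-to-zero is on) and a maximum. The sketch below is a simplified model under those assumptions, not the actual Knative autoscaler. The parameter names are chosen for the example.

```python
import math


def desired_replicas(in_flight, target_concurrency, min_scale=0, max_scale=10):
    """Containers needed so each one handles about `target_concurrency` requests."""
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    wanted = math.ceil(in_flight / target_concurrency)
    return max(min_scale, min(max_scale, wanted))


print(desired_replicas(0, 10))               # idle service, scale to zero allowed
print(desired_replicas(0, 10, min_scale=1))  # scale to zero turned off
print(desired_replicas(45, 10))              # spike: 45 in-flight requests
print(desired_replicas(500, 10))             # capped by max_scale
```

Setting `min_scale` to 1 trades a small idle footprint for the absence of cold starts.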
List some Cloud Run for Anthos use cases

As it does not support stateful applications or sticky sessions, it is suitable for running stateless applications such as:

  • Machine learning model predictions e.g. TensorFlow Serving containers
  • API gateways, API middleware, web front ends and Microservices
  • Event handlers, ETL