# Provisioning Compute Resources

Kubernetes requires a set of machines to host the Kubernetes control plane and the worker nodes where containers are ultimately run. In this lab you will provision the compute resources required for running a secure and highly available Kubernetes cluster, spread across the [Availability Domains and Fault Domains](https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm) of a single OCI [Region](https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm).

> Ensure a default compute zone and region have been set as described in the [Prerequisites](01-prerequisites.md#set-a-default-compute-region-and-zone) lab.
## Networking

The Kubernetes [networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model) assumes a flat network in which containers and nodes can communicate with each other. In cases where this is not desired, [network policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) can limit how groups of containers are allowed to communicate with each other and with external network endpoints.

> Setting up network policies is out of scope for this tutorial.
### Virtual Cloud Network

In this section a dedicated [Virtual Cloud Network](https://www.oracle.com/cloud/networking/virtual-cloud-network/) (VCN) will be set up to host the Kubernetes cluster.

Create the `kubernetes-the-hard-way` custom VCN:
```
VCN_ID=$(oci network vcn create --display-name kubernetes-the-hard-way --dns-label vcn --cidr-block \
  10.240.0.0/24 | jq -r .data.id)
```
A [subnet](https://docs.oracle.com/en-us/iaas/Content/Network/Tasks/managingVCNs_topic-Overview_of_VCNs_and_Subnets.htm#Overview) must be provisioned with an IP address range large enough to assign a private IP address to each node in the Kubernetes cluster.

Create the `kubernetes` subnet in the `kubernetes-the-hard-way` VCN, along with a Route Table and an Internet Gateway allowing traffic to the internet:
```
INTERNET_GATEWAY_ID=$(oci network internet-gateway create --display-name kubernetes-the-hard-way \
  --vcn-id $VCN_ID --is-enabled true | jq -r .data.id)
ROUTE_TABLE_ID=$(oci network route-table create --display-name kubernetes-the-hard-way --vcn-id $VCN_ID \
  --route-rules "[{\"cidrBlock\":\"0.0.0.0/0\",\"networkEntityId\":\"$INTERNET_GATEWAY_ID\"}]" \
  | jq -r .data.id)
SUBNET_ID=$(oci network subnet create --display-name kubernetes --vcn-id $VCN_ID --dns-label subnet \
  --cidr-block 10.240.0.0/24 --route-table-id $ROUTE_TABLE_ID | jq -r .data.id)
```
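The captured `$VCN_ID` and `$SUBNET_ID` are reused throughout the rest of this lab. If you come back in a fresh shell, a hedged sketch for re-deriving them by display name (assuming the usual `--display-name` filter is available on these `list` commands):

```
# Re-derive the IDs by display name (assumes --display-name filtering on these list commands).
VCN_ID=$(oci network vcn list --display-name kubernetes-the-hard-way | jq -r .data[0].id)
SUBNET_ID=$(oci network subnet list --vcn-id $VCN_ID --display-name kubernetes | jq -r .data[0].id)
```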
> The `10.240.0.0/24` IP address range can host up to 254 compute instances.

:warning: **Note**: For simplicity, and to stay close to the original kubernetes-the-hard-way, we will use a single subnet shared between the Kubernetes worker nodes, control plane nodes, and LoadBalancer. A production-caliber setup would consist of at least:

- A dedicated public subnet for the public LoadBalancer.
- A dedicated private subnet for control plane nodes.
- A dedicated private subnet for worker nodes. Note that this setup would not allow public NodePort access to services.
## Compute Instances

The compute instances in this lab will be provisioned using [Ubuntu Server](https://www.ubuntu.com/server) 20.04, which has good support for the [containerd container runtime](https://github.com/containerd/containerd). Each compute instance will be provisioned with a fixed private IP address to simplify the Kubernetes bootstrapping process.

:warning: **Note**: For simplicity in this tutorial, we will be accessing controller and worker nodes over SSH, using public addresses. A production-caliber setup would instead run controller and worker nodes in _private_ subnets, with any direct SSH access done via [Bastions](https://docs.oracle.com/en-us/iaas/Content/Resources/Assets/whitepapers/bastion-hosts.pdf) when required.

### Create SSH Keys

Generate an RSA key pair, which we'll use for SSH access to our compute nodes:
```
ssh-keygen -b 2048 -t rsa -f kubernetes_ssh_rsa
```

Enter a passphrase at the prompt to continue:

```
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
```

Results:

```
kubernetes_ssh_rsa
kubernetes_ssh_rsa.pub
```
### Kubernetes Controllers

Create three compute instances which will host the Kubernetes control plane:
```
IMAGE_ID=$(oci compute image list --operating-system "Canonical Ubuntu" --operating-system-version \
  "20.04" | jq -r .data[0].id)
NUM_ADS=$(oci iam availability-domain list | jq -r .data | jq length)
for i in 0 1 2; do
  # Rudimentary distribution of nodes across Availability Domains and Fault Domains
  AD_NAME=$(oci iam availability-domain list | jq -r .data[$((i % NUM_ADS))].name)
  NUM_FDS=$(oci iam fault-domain list --availability-domain $AD_NAME | jq -r .data | jq length)
  FD_NAME=$(oci iam fault-domain list --availability-domain $AD_NAME | jq -r .data[$((i % NUM_FDS))].name)

  oci compute instance launch --display-name controller-${i} --assign-public-ip true \
    --subnet-id $SUBNET_ID --shape VM.Standard.E3.Flex --availability-domain $AD_NAME \
    --fault-domain $FD_NAME --image-id $IMAGE_ID --shape-config '{"memoryInGBs": 8.0, "ocpus": 2.0}' \
    --private-ip 10.240.0.1${i} \
    --freeform-tags '{"project": "kubernetes-the-hard-way","role":"controller"}' \
    --metadata "{\"ssh_authorized_keys\":\"$(cat kubernetes_ssh_rsa.pub)\"}"
done
```
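If you'd like to confirm which image the `.data[0]` lookup resolved to, you can print its display name (an optional check that reuses the same list call as above):

```
# Show the display name of the image selected for the instances.
oci compute image list --operating-system "Canonical Ubuntu" --operating-system-version "20.04" \
  | jq -r '.data[0]."display-name"'
```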
### Kubernetes Workers

Each worker instance requires a pod subnet allocation from the Kubernetes cluster CIDR range. The pod subnet allocation will be used to configure container networking in a later exercise. The `pod-cidr` instance metadata will be used to expose pod subnet allocations to compute instances at runtime.

> The Kubernetes cluster CIDR range is defined by the Controller Manager's `--cluster-cidr` flag. In this tutorial the cluster CIDR range will be set to `10.200.0.0/16`, which supports 254 subnets.

Create three compute instances which will host the Kubernetes worker nodes:
```
IMAGE_ID=$(oci compute image list --operating-system "Canonical Ubuntu" --operating-system-version \
  "20.04" | jq -r .data[0].id)
NUM_ADS=$(oci iam availability-domain list | jq -r .data | jq length)
for i in 0 1 2; do
  # Rudimentary distribution of nodes across Availability Domains and Fault Domains
  AD_NAME=$(oci iam availability-domain list | jq -r .data[$((i % NUM_ADS))].name)
  NUM_FDS=$(oci iam fault-domain list --availability-domain $AD_NAME | jq -r .data | jq length)
  FD_NAME=$(oci iam fault-domain list --availability-domain $AD_NAME | jq -r .data[$((i % NUM_FDS))].name)

  oci compute instance launch --display-name worker-${i} --assign-public-ip true \
    --subnet-id $SUBNET_ID --shape VM.Standard.E3.Flex --availability-domain $AD_NAME \
    --fault-domain $FD_NAME --image-id $IMAGE_ID --shape-config '{"memoryInGBs": 8.0, "ocpus": 2.0}' \
    --private-ip 10.240.0.2${i} \
    --freeform-tags '{"project": "kubernetes-the-hard-way","role":"worker"}' \
    --metadata "{\"ssh_authorized_keys\":\"$(cat kubernetes_ssh_rsa.pub)\",\"pod-cidr\":\"10.200.${i}.0/24\"}" \
    --skip-source-dest-check true
done
```
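Once a worker is up, it can read its own `pod-cidr` allocation at runtime from the instance metadata service. This is a hedged sketch, assuming the OCI IMDS v2 endpoint at `169.254.169.254`, and is meant to be run on the worker itself:

```
# Run on a worker node; the endpoint and header follow the OCI IMDS v2 conventions.
curl -sH "Authorization: Bearer Oracle" http://169.254.169.254/opc/v2/instance/ \
  | jq -r '.metadata."pod-cidr"'
```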
### Verification

List the compute instances in our compartment:
```
oci compute instance list --sort-by DISPLAYNAME --lifecycle-state RUNNING --all | jq -r .data[] \
  | jq '{"display-name","lifecycle-state"}'
```

> output

```
{
  "display-name": "controller-0",
  "lifecycle-state": "RUNNING"
}
{
  "display-name": "controller-1",
  "lifecycle-state": "RUNNING"
}
{
  "display-name": "controller-2",
  "lifecycle-state": "RUNNING"
}
{
  "display-name": "worker-0",
  "lifecycle-state": "RUNNING"
}
{
  "display-name": "worker-1",
  "lifecycle-state": "RUNNING"
}
{
  "display-name": "worker-2",
  "lifecycle-state": "RUNNING"
}
```
Rerun the above command until all of the compute instances we created are listed with a `RUNNING` lifecycle state, before continuing on to the next section.

## Verifying SSH Access

Our subnet was created with a default Security List that allows public SSH access, so we can verify at this point that SSH is working:
```
oci-ssh controller-0
```
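`oci-ssh` here is a small convenience helper. If you don't have one defined, a hypothetical equivalent might look like the following sketch (an assumption; your actual helper may differ): look up the instance's public IP by display name, then SSH in as the default `ubuntu` user with the key generated earlier.

```
# Hypothetical helper (assumption, not the tutorial's canonical definition):
# resolve a public IP by instance display name, then ssh in with the previously
# generated key, forwarding any remote command given as extra arguments.
oci-ssh() {
  local instance_id public_ip
  instance_id=$(oci compute instance list --display-name "$1" --lifecycle-state RUNNING | jq -r .data[0].id)
  public_ip=$(oci compute instance list-vnics --instance-id "$instance_id" | jq -r '.data[0]."public-ip"')
  ssh -i kubernetes_ssh_rsa ubuntu@"$public_ip" "${@:2}"
}
```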
The first time you SSH into a node, you'll see something like the following, at which point enter "yes":

```
The authenticity of host 'XX.XX.XX.XXX (XX.XX.XX.XXX)' can't be established.
ECDSA key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
```

After accepting, you'll be connected to the instance:

```
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-1029-oracle x86_64)
...
```
Type `exit` at the prompt to exit the `controller-0` compute instance:

```
ubuntu@controller-0:~$ exit
```

> output

```
logout
Connection to XX.XX.XX.XXX closed
```
### Security Lists

For use in later steps of the tutorial, we'll create Security Lists to allow:

- Intra-VCN communication between worker and controller nodes.
- Public access to the NodePort range.
- Public access to the LoadBalancer port.
```
{
  INTRA_VCN_SECURITY_LIST_ID=$(oci network security-list create --display-name intra-vcn \
    --vcn-id $VCN_ID --ingress-security-rules '[
    {
      "icmp-options": null,
      "is-stateless": true,
      "protocol": "all",
      "source": "10.240.0.0/24",
      "source-type": "CIDR_BLOCK",
      "tcp-options": null,
      "udp-options": null
    }]' --egress-security-rules '[]' | jq -r .data.id)

  WORKER_SECURITY_LIST_ID=$(oci network security-list create --display-name worker \
    --vcn-id $VCN_ID --ingress-security-rules '[
    {
      "icmp-options": null,
      "is-stateless": false,
      "protocol": "6",
      "source": "0.0.0.0/0",
      "source-type": "CIDR_BLOCK",
      "tcp-options": {
        "destination-port-range": {
          "max": 32767,
          "min": 30000
        },
        "source-port-range": null
      },
      "udp-options": null
    }]' --egress-security-rules '[]' | jq -r .data.id)

  LB_SECURITY_LIST_ID=$(oci network security-list create --display-name load-balancer \
    --vcn-id $VCN_ID --ingress-security-rules '[
    {
      "icmp-options": null,
      "is-stateless": false,
      "protocol": "6",
      "source": "0.0.0.0/0",
      "source-type": "CIDR_BLOCK",
      "tcp-options": {
        "destination-port-range": {
          "max": 6443,
          "min": 6443
        },
        "source-port-range": null
      },
      "udp-options": null
    }]' --egress-security-rules '[]' | jq -r .data.id)
}
```
We'll add these Security Lists to our subnet:

```
{
  DEFAULT_SECURITY_LIST_ID=$(oci network security-list list --display-name \
    "Default Security List for kubernetes-the-hard-way" | jq -r .data[0].id)
  oci network subnet update --subnet-id $SUBNET_ID --force --security-list-ids \
    "[\"$DEFAULT_SECURITY_LIST_ID\",\"$INTRA_VCN_SECURITY_LIST_ID\",\"$WORKER_SECURITY_LIST_ID\",\"$LB_SECURITY_LIST_ID\"]"
}
```
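To confirm the subnet now carries all four Security Lists, you can inspect it (an optional check; the `security-list-ids` field follows the CLI's kebab-case output):

```
# List the Security List OCIDs currently attached to the subnet.
oci network subnet get --subnet-id $SUBNET_ID | jq -r '.data."security-list-ids"[]'
```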
### Firewall Rules

Similarly, we'll open up the host firewall on each controller and worker node to allow intra-VCN traffic. The `iptables -F` flushes the image's restrictive default iptables rules:
```
for instance in controller-0 controller-1 controller-2; do
  oci-ssh ${instance} "sudo ufw allow from 10.240.0.0/24; sudo iptables -A INPUT -i ens3 -s 10.240.0.0/24 -j ACCEPT; sudo iptables -F"
done
for instance in worker-0 worker-1 worker-2; do
  oci-ssh ${instance} "sudo ufw allow from 10.240.0.0/24; sudo iptables -A INPUT -i ens3 -s 10.240.0.0/24 -j ACCEPT; sudo iptables -F"
done
```
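As a quick optional smoke test, a worker should now be able to reach a controller on its private address (here controller-0's `10.240.0.10`, as assigned above):

```
# Ping controller-0's fixed private IP from worker-0.
oci-ssh worker-0 "ping -c 3 10.240.0.10"
```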
### Provision a Load Balancer

> An [OCI Load Balancer](https://docs.oracle.com/en-us/iaas/Content/Balance/Concepts/balanceoverview.htm) with a TCP listener will be used to expose the Kubernetes API servers to remote clients.

Create the Load Balancer:
```
LOADBALANCER_ID=$(oci lb load-balancer create --display-name kubernetes-the-hard-way \
  --shape-name 100Mbps --wait-for-state SUCCEEDED --subnet-ids "[\"$SUBNET_ID\"]" | jq -r .data.id)
```
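Later labs will need the Load Balancer's public IP address. A hedged sketch for looking it up by display name (the `--display-name` filter and the `ip-addresses` field names are assumptions based on the CLI's usual conventions, and the variable name is just illustrative):

```
# Capture the Load Balancer's public IP address for later use.
KUBERNETES_PUBLIC_ADDRESS=$(oci lb load-balancer list --display-name kubernetes-the-hard-way \
  | jq -r '.data[0]."ip-addresses"[0]."ip-address"')
echo $KUBERNETES_PUBLIC_ADDRESS
```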
Create a Backend Set, with Backends for our 3 controller nodes:
```
{
  cat > backends.json <<EOF
[
  {
    "ipAddress": "10.240.0.10",
    "port": 6443,
    "weight": 1
  },
  {
    "ipAddress": "10.240.0.11",
    "port": 6443,
    "weight": 1
  },
  {
    "ipAddress": "10.240.0.12",
    "port": 6443,
    "weight": 1
  }
]
EOF
  oci lb backend-set create --name controller-backend-set --load-balancer-id $LOADBALANCER_ID --backends file://backends.json \
    --health-checker-interval-in-ms 10000 --health-checker-port 8888 --health-checker-protocol HTTP \
    --health-checker-retries 3 --health-checker-return-code 200 --health-checker-timeout-in-ms 3000 \
    --health-checker-url-path "/healthz" --policy "ROUND_ROBIN" --wait-for-state SUCCEEDED

  oci lb listener create --name controller-listener --default-backend-set-name controller-backend-set \
    --port 6443 --protocol TCP --load-balancer-id $LOADBALANCER_ID --wait-for-state SUCCEEDED
}
```
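You can also query the Load Balancer's overall health from the CLI rather than the console. This is a hedged sketch and assumes the `oci lb load-balancer-health get` command is available in your CLI version:

```
# Print the overall health status (OK, WARNING, CRITICAL, or UNKNOWN).
oci lb load-balancer-health get --load-balancer-id $LOADBALANCER_ID | jq -r .data.status
```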
At this point, the Load Balancer will be shown as being in a "Critical" state - that's OK. It will remain so until we configure the API server on the controller nodes in subsequent steps.

Next: [Provisioning a CA and Generating TLS Certificates](04-certificate-authority.md)