# Provisioning Compute Resources
Kubernetes requires a set of machines to host the Kubernetes control plane and the worker nodes where containers are ultimately run. In this lab you will provision the compute resources required for running a secure and highly available Kubernetes cluster, spread across the [Availability Domains and Fault Domains](https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm) of a single OCI [Region](https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm).
> Ensure a default compute zone and region have been set as described in the [Prerequisites](01-prerequisites.md#set-a-default-compute-region-and-zone) lab.
## Networking
The Kubernetes [networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model) assumes a flat network in which containers and nodes can communicate with each other. In cases where this is not desired [network policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) can limit how groups of containers are allowed to communicate with each other and external network endpoints.
> Setting up network policies is out of scope for this tutorial.
### Virtual Cloud Network
In this section a dedicated [Virtual Cloud Network](https://www.oracle.com/cloud/networking/virtual-cloud-network/) (VCN) will be set up to host the Kubernetes cluster.

Create the `kubernetes-the-hard-way` custom VCN:
```
VCN_ID=$(oci network vcn create --display-name kubernetes-the-hard-way --dns-label vcn --cidr-block \
10.240.0.0/24 | jq -r .data.id)
```
A [subnet](https://docs.oracle.com/en-us/iaas/Content/Network/Tasks/managingVCNs_topic-Overview_of_VCNs_and_Subnets.htm#Overview) must be provisioned with an IP address range large enough to assign a private IP address to each node in the Kubernetes cluster.

Create the `kubernetes` subnet in the `kubernetes-the-hard-way` VCN, along with a Route Table and an Internet Gateway that allow traffic to the internet:
```
INTERNET_GATEWAY_ID=$(oci network internet-gateway create --display-name kubernetes-the-hard-way \
--vcn-id $VCN_ID --is-enabled true | jq -r .data.id)
ROUTE_TABLE_ID=$(oci network route-table create --display-name kubernetes-the-hard-way --vcn-id $VCN_ID \
--route-rules "[{\"cidrBlock\":\"0.0.0.0/0\",\"networkEntityId\":\"$INTERNET_GATEWAY_ID\"}]" \
| jq -r .data.id)
SUBNET_ID=$(oci network subnet create --display-name kubernetes --vcn-id $VCN_ID --dns-label subnet \
--cidr-block 10.240.0.0/24 --route-table-id $ROUTE_TABLE_ID | jq -r .data.id)
```
> The `10.240.0.0/24` IP address range can host up to 254 compute instances.
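
The arithmetic behind that figure, if you want to check it:
```
# A /24 leaves 8 host bits; subtract the network and broadcast addresses
echo $(( 2**(32-24) - 2 ))    # 254
```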
:warning: **Note**: For simplicity, and to stay close to the original kubernetes-the-hard-way, we will be using a single subnet shared between the Kubernetes worker nodes, control plane nodes, and LoadBalancer. A production-caliber setup would consist of at least the following (see the sketch after this list):

- A dedicated public subnet for the public LoadBalancer.
- A dedicated private subnet for control plane nodes.
- A dedicated private subnet for worker nodes. This setup would not allow NodePort access to services.
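
Purely for illustration, such a split might look something like the sketch below. The CIDR blocks, display names, and use of `--prohibit-public-ip-on-vnic` here are assumptions for the sketch only and are not used anywhere else in this tutorial; the private subnets would also need their own route tables (e.g. via a NAT Gateway) for outbound access.
```
# Illustrative only -- splits the 10.240.0.0/24 VCN into three /26 subnets
LB_SUBNET_ID=$(oci network subnet create --display-name lb-public --vcn-id $VCN_ID \
  --cidr-block 10.240.0.0/26 --route-table-id $ROUTE_TABLE_ID | jq -r .data.id)
CONTROLLER_SUBNET_ID=$(oci network subnet create --display-name controllers-private --vcn-id $VCN_ID \
  --cidr-block 10.240.0.64/26 --prohibit-public-ip-on-vnic true | jq -r .data.id)
WORKER_SUBNET_ID=$(oci network subnet create --display-name workers-private --vcn-id $VCN_ID \
  --cidr-block 10.240.0.128/26 --prohibit-public-ip-on-vnic true | jq -r .data.id)
```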
## Compute Instances
The compute instances in this lab will be provisioned using [Ubuntu Server](https://www.ubuntu.com/server) 20.04, which has good support for the [containerd container runtime](https://github.com/containerd/containerd). Each compute instance will be provisioned with a fixed private IP address to simplify the Kubernetes bootstrapping process.

:warning: **Note**: For simplicity in this tutorial, we will be accessing controller and worker nodes over SSH, using public addresses. A production-caliber setup would instead run controller and worker nodes in _private_ subnets, with any direct SSH access done via [Bastions](https://docs.oracle.com/en-us/iaas/Content/Resources/Assets/whitepapers/bastion-hosts.pdf) when required.
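
For illustration, SSH through a bastion typically relies on OpenSSH's `ProxyJump`; the bastion address below is a placeholder, not something provisioned in this tutorial:
```
# Hypothetical: reach a controller on its private IP via a bastion host
ssh -i kubernetes_ssh_rsa -o ProxyJump=ubuntu@bastion.example.com ubuntu@10.240.0.10
```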
### Create SSH Keys
Generate an RSA key pair, which we'll use for SSH access to our compute nodes:
```
ssh-keygen -b 2048 -t rsa -f kubernetes_ssh_rsa
```
Enter a passphrase at the prompt to continue:
```
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
```
Results:
```
kubernetes_ssh_rsa
kubernetes_ssh_rsa.pub
```
### Kubernetes Controllers
Create three compute instances which will host the Kubernetes control plane:
```
IMAGE_ID=$(oci compute image list --operating-system "Canonical Ubuntu" --operating-system-version \
"20.04" | jq -r .data[0].id)
NUM_ADS=$(oci iam availability-domain list | jq -r .data | jq length)
for i in 0 1 2; do
  # Rudimentary distribution of nodes across Availability Domains and Fault Domains
AD_NAME=$(oci iam availability-domain list | jq -r .data[$((i % NUM_ADS))].name)
NUM_FDS=$(oci iam fault-domain list --availability-domain $AD_NAME | jq -r .data | jq length)
FD_NAME=$(oci iam fault-domain list --availability-domain $AD_NAME | jq -r .data[$((i % NUM_FDS))].name)
oci compute instance launch --display-name controller-${i} --assign-public-ip true \
--subnet-id $SUBNET_ID --shape VM.Standard.E3.Flex --availability-domain $AD_NAME \
--fault-domain $FD_NAME --image-id $IMAGE_ID --shape-config '{"memoryInGBs": 8.0, "ocpus": 2.0}' \
--private-ip 10.240.0.1${i} \
--freeform-tags '{"project": "kubernetes-the-hard-way","role":"controller"}' \
--metadata "{\"ssh_authorized_keys\":\"$(cat kubernetes_ssh_rsa.pub)\"}"
done
```
### Kubernetes Workers
Each worker instance requires a pod subnet allocation from the Kubernetes cluster CIDR range. The pod subnet allocation will be used to configure container networking in a later exercise. The `pod-cidr` instance metadata will be used to expose pod subnet allocations to compute instances at runtime.
> The Kubernetes cluster CIDR range is defined by the Controller Manager's `--cluster-cidr` flag. In this tutorial the cluster CIDR range will be set to `10.200.0.0/16`, which supports 254 subnets.
Create three compute instances which will host the Kubernetes worker nodes:
```
IMAGE_ID=$(oci compute image list --operating-system "Canonical Ubuntu" --operating-system-version \
"20.04" | jq -r .data[0].id)
NUM_ADS=$(oci iam availability-domain list | jq -r .data | jq length)
for i in 0 1 2; do
  # Rudimentary distribution of nodes across Availability Domains and Fault Domains
AD_NAME=$(oci iam availability-domain list | jq -r .data[$((i % NUM_ADS))].name)
NUM_FDS=$(oci iam fault-domain list --availability-domain $AD_NAME | jq -r .data | jq length)
FD_NAME=$(oci iam fault-domain list --availability-domain $AD_NAME | jq -r .data[$((i % NUM_FDS))].name)
oci compute instance launch --display-name worker-${i} --assign-public-ip true \
--subnet-id $SUBNET_ID --shape VM.Standard.E3.Flex --availability-domain $AD_NAME \
--fault-domain $FD_NAME --image-id $IMAGE_ID --shape-config '{"memoryInGBs": 8.0, "ocpus": 2.0}' \
--private-ip 10.240.0.2${i} \
--freeform-tags '{"project": "kubernetes-the-hard-way","role":"worker"}' \
--metadata "{\"ssh_authorized_keys\":\"$(cat kubernetes_ssh_rsa.pub)\",\"pod-cidr\":\"10.200.${i}.0/24\"}" \
--skip-source-dest-check true
done
```
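
The `pod-cidr` value set above can later be read back from within a worker instance via the OCI instance metadata service; a sketch, assuming the IMDS v2 endpoint and that it is run on the worker itself:
```
# Run on a worker node: read the pod-cidr key from instance metadata (IMDS v2)
curl -s -H "Authorization: Bearer Oracle" \
  http://169.254.169.254/opc/v2/instance/metadata/pod-cidr
```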
### Verification
List the compute instances in our compartment:
```
oci compute instance list --sort-by DISPLAYNAME --lifecycle-state RUNNING --all | jq -r .data[] \
| jq '{"display-name","lifecycle-state"}'
```
> output
```
{
"display-name": "controller-0",
"lifecycle-state": "RUNNING"
}
{
"display-name": "controller-1",
"lifecycle-state": "RUNNING"
}
{
"display-name": "controller-2",
"lifecycle-state": "RUNNING"
}
{
"display-name": "worker-0",
"lifecycle-state": "RUNNING"
}
{
"display-name": "worker-1",
"lifecycle-state": "RUNNING"
}
{
"display-name": "worker-2",
"lifecycle-state": "RUNNING"
}
```
Rerun the above command until all of the compute instances we created are listed as "RUNNING" before continuing to the next section.
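
If you'd rather not rerun it by hand, a small polling loop (a sketch, assuming the six instance names created above) can do the waiting:
```
# Poll until all six instances report RUNNING
until [ "$(oci compute instance list --lifecycle-state RUNNING --all \
  | jq '[.data[]? | select(."display-name" | test("^(controller|worker)-[0-2]$"))] | length')" = "6" ]; do
  echo "Waiting for instances to reach RUNNING..."
  sleep 15
done
```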
## Verifying SSH Access
Our subnet was created with a default Security List that allows public SSH access, so we can verify at this point that SSH is working:
```
oci-ssh controller-0
```
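
`oci-ssh` refers to whatever SSH helper you set up earlier in the tutorial. If you don't have one handy, a minimal sketch that resolves an instance's public IP by display name (assuming the `kubernetes_ssh_rsa` key and the default `ubuntu` user) could look like:
```
# Minimal oci-ssh sketch: resolve display name -> public IP, then SSH in
oci-ssh() {
  local instance_id public_ip
  instance_id=$(oci compute instance list --lifecycle-state RUNNING --all \
    | jq -r ".data[] | select(.\"display-name\" == \"$1\") | .id")
  public_ip=$(oci compute instance list-vnics --instance-id "$instance_id" \
    | jq -r '.data[0]."public-ip"')
  ssh -i kubernetes_ssh_rsa ubuntu@"$public_ip" "${@:2}"
}
```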
The first time you SSH into a node, you'll see something like the following; enter "yes" to continue:
```
The authenticity of host 'XX.XX.XX.XXX (XX.XX.XX.XXX)' can't be established.
ECDSA key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
```
After accepting, you'll be logged in to the instance:
```
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-1029-oracle x86_64)
...
```
Type `exit` at the prompt to exit the `controller-0` compute instance:
```
ubuntu@controller-0:~$ exit
```
> output
```
logout
Connection to XX.XX.XX.XXX closed
```
### Security Lists
For use in later steps of the tutorial, we'll create Security Lists to allow:
- Intra-VCN communication between worker and controller nodes.
- Public access to the NodePort range.
- Public access to the LoadBalancer port.
```
{
INTRA_VCN_SECURITY_LIST_ID=$(oci network security-list create --display-name intra-vcn \
--vcn-id $VCN_ID --ingress-security-rules '[
{
"icmp-options": null,
"is-stateless": true,
"protocol": "all",
"source": "10.240.0.0/24",
"source-type": "CIDR_BLOCK",
"tcp-options": null,
"udp-options": null
}]' --egress-security-rules '[]' | jq -r .data.id)
WORKER_SECURITY_LIST_ID=$(oci network security-list create --display-name worker \
--vcn-id $VCN_ID --ingress-security-rules '[
{
"icmp-options": null,
"is-stateless": false,
"protocol": "6",
"source": "0.0.0.0/0",
"source-type": "CIDR_BLOCK",
"tcp-options": {
"destination-port-range": {
"max": 32767,
"min": 30000
},
"source-port-range": null
},
"udp-options": null
}]' --egress-security-rules '[]' | jq -r .data.id)
LB_SECURITY_LIST_ID=$(oci network security-list create --display-name load-balancer \
--vcn-id $VCN_ID --ingress-security-rules '[
{
"icmp-options": null,
"is-stateless": false,
"protocol": "6",
"source": "0.0.0.0/0",
"source-type": "CIDR_BLOCK",
"tcp-options": {
"destination-port-range": {
"max": 6443,
"min": 6443
},
"source-port-range": null
},
"udp-options": null
}]' --egress-security-rules '[]' | jq -r .data.id)
}
```
We'll add these Security Lists to our subnet:
```
{
DEFAULT_SECURITY_LIST_ID=$(oci network security-list list --display-name \
"Default Security List for kubernetes-the-hard-way" | jq -r .data[0].id)
oci network subnet update --subnet-id $SUBNET_ID --force --security-list-ids \
"[\"$DEFAULT_SECURITY_LIST_ID\",\"$INTRA_VCN_SECURITY_LIST_ID\",\"$WORKER_SECURITY_LIST_ID\",\"$LB_SECURITY_LIST_ID\"]"
}
```
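
To confirm the association, list the security list OCIDs now attached to the subnet; four entries should come back:
```
oci network subnet get --subnet-id $SUBNET_ID | jq -r '.data."security-list-ids"[]'
```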
### Firewall Rules
Similarly, we'll open up the host firewall on the worker and controller nodes to allow intra-VCN traffic:
```
for instance in controller-0 controller-1 controller-2; do
oci-ssh ${instance} "sudo ufw allow from 10.240.0.0/24;sudo iptables -A INPUT -i ens3 -s 10.240.0.0/24 -j ACCEPT;sudo iptables -F"
done
for instance in worker-0 worker-1 worker-2; do
oci-ssh ${instance} "sudo ufw allow from 10.240.0.0/24;sudo iptables -A INPUT -i ens3 -s 10.240.0.0/24 -j ACCEPT;sudo iptables -F"
done
```
### Provision a Network Load Balancer
> An [OCI Load Balancer](https://docs.oracle.com/en-us/iaas/Content/Balance/Concepts/balanceoverview.htm) will be used to expose the Kubernetes API Servers to remote clients.
Create the Load Balancer:
```
LOADBALANCER_ID=$(oci lb load-balancer create --display-name kubernetes-the-hard-way \
--shape-name 100Mbps --wait-for-state SUCCEEDED --subnet-ids "[\"$SUBNET_ID\"]" | jq -r .data.id)
```
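
You'll likely want the Load Balancer's public IP address in later labs; one way to look it up (the variable name is just a convenient placeholder):
```
KUBERNETES_PUBLIC_ADDRESS=$(oci lb load-balancer get --load-balancer-id $LOADBALANCER_ID \
  | jq -r '.data."ip-addresses"[0]."ip-address"')
echo $KUBERNETES_PUBLIC_ADDRESS
```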
Create a Backend Set, with Backends for our three controller nodes:
```
{
cat > backends.json <<EOF
[
{
"ipAddress": "10.240.0.10",
"port": 6443,
"weight": 1
},
{
"ipAddress": "10.240.0.11",
"port": 6443,
"weight": 1
},
{
"ipAddress": "10.240.0.12",
"port": 6443,
"weight": 1
}
]
EOF
oci lb backend-set create --name controller-backend-set --load-balancer-id $LOADBALANCER_ID --backends file://backends.json \
--health-checker-interval-in-ms 10000 --health-checker-port 8888 --health-checker-protocol HTTP \
--health-checker-retries 3 --health-checker-return-code 200 --health-checker-timeout-in-ms 3000 \
--health-checker-url-path "/healthz" --policy "ROUND_ROBIN" --wait-for-state SUCCEEDED
oci lb listener create --name controller-listener --default-backend-set-name controller-backend-set \
--port 6443 --protocol TCP --load-balancer-id $LOADBALANCER_ID --wait-for-state SUCCEEDED
}
```
At this point, the Load Balancer will be shown in a "Critical" state. That's OK: it will remain the case until we configure the API server on the controller nodes in subsequent steps.
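
The backend health can also be checked from the CLI; a sketch, assuming the `oci lb load-balancer-health get` command:
```
# Expect CRITICAL until the API servers are configured in later labs
oci lb load-balancer-health get --load-balancer-id $LOADBALANCER_ID | jq -r .data.status
```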
Next: [Provisioning a CA and Generating TLS Certificates](04-certificate-authority.md)