JupyterHub with Kubernetes On Single Bare Metal Instance Tutorial
Preamble
There are already guides on how to deploy JupyterHub on Kubernetes using native cloud services. But what if you just wanted to use a single cloud instance? Or even an on-premise solution?
This guide will cover how to host your own JupyterHub on a single bare metal instance.
Bit of a background…
Jupyter Notebook is an extremely popular web-based IDE amongst software engineers and data scientists. JupyterHub can be understood as the environment that serves Jupyter Notebook to each individual. For more information visit https://jupyter.org/hub
In April 2018, my team and I competed in the Alibaba NAB Cross Border Hackathon and claimed the first prize. We were invited to attend the 2018 Alibaba Cloud Conference in Hangzhou and were awarded $10,000 in Alibaba Cloud credits.
However, I have been sitting on this immense amount of cloud credit, struggling to utilise it, and wanted to give it away to people who could put it to full use.
So this gave me the idea: why not create an environment that provides free computation power for education purposes?
I have always loved coding in Python and used Jupyter Notebook extensively. In my research I came across the amazing concept of running JupyterHub with Kubernetes on the cloud. Unfortunately that guide relies on a lot of native cloud services, and I wanted to create a simple JupyterHub environment with as few moving parts as possible. So I wrote a new guide to run JupyterHub with Kubernetes on a single bare metal instance.
I present to you: freenotebooks.io
In this guide I will show you how I built freenotebooks.io.
For this tutorial I will be using a single Alibaba Cloud ECS instance. You may choose the cloud provider of your choice while following this guide, or even just have it on-premise with public internet access.
Note: Most of this guide is based on the original Zero-To-JupyterHub guide found here: https://zero-to-jupyterhub.readthedocs.io/en/latest/index.html — full acknowledgement to the team who made this possible!
If you haven’t used Alibaba Cloud before, here’s a little short guide I wrote on how to create an ECS instance in Alibaba Cloud.
While I have tried to make this guide as simple as possible, it assumes that you know a little about cloud infrastructure, Docker and Kubernetes. I am not an expert myself, so feel free to provide feedback!
Let’s get started.
First, log in to your cloud instance. I am using Ubuntu 18.04 64-bit.
Install Docker, kubeadm and kubectl by running the commands below.
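The command listing was embedded separately in the original post. As a sketch, the typical installation on Ubuntu 18.04 at the time of writing looked like the following (note that the packages.cloud.google.com apt repository has since been deprecated in favour of pkgs.k8s.io, so verify against the current Kubernetes install docs):

```bash
# Install Docker from the Ubuntu repositories
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker

# Add the Kubernetes apt repository and install kubelet, kubeadm and kubectl
sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | \
    sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
```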
Initialising Kubernetes Cluster
Always remember to disable swap before running Kubernetes
sudo swapoff -a
To initialise the cluster, give it a pod network CIDR of 10.244.0.0/16 (the range expected by flannel).
This may take a few minutes to complete; you should see a message confirming a successful initialisation.
kubeadm init --pod-network-cidr=10.244.0.0/16
Set the env variable
Note that logging out clears KUBECONFIG, so you will need to export it again each session.
export KUBECONFIG=/etc/kubernetes/admin.conf
Enable the iptables bridge setting required by Kubernetes networking
sysctl net.bridge.bridge-nf-call-iptables=1
Use flannel to create layer 3 in Kubernetes
Read more about flannel here: https://github.com/coreos/flannel
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Check that the pods are all running
kubectl --namespace=kube-system get pods
Great! Now you have a fully operational, but blank, Kubernetes cluster. Moving forward we will be using Helm for some of our deployments.
What is Helm?
Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources. Think of it like apt/yum/homebrew for Kubernetes. For more information, visit https://github.com/helm/helm
Setting Up Helm
Untaint the master node — on a single-node cluster, pods (including Tiller) must be able to schedule on the master
kubectl taint nodes --all node-role.kubernetes.io/master-
Install Helm
You will be installing helm onto the default kubectl namespace kube-system
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --wait
You should receive the message “Happy Helming!” — indicating a successful helm installation
Patch the Tiller deployment so it listens on localhost only (a recommended security measure)
kubectl patch deployment tiller-deploy --namespace=kube-system --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'
Setting up Postgres DB in NFS server
We will create a Postgres database in the Kubernetes cluster, backed by an NFS server. I found that running JupyterHub on a single local volume causes the default SQLite database to lock up badly; deploying Postgres on an NFS-backed volume prevents this issue.
Install NFS server
helm install stable/nfs-server-provisioner --namespace nfsprovisioner --set=storageClass.defaultClass=true
Add Postgres helm repo
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
Install Postgres with parameters
Feel free to change the database name and password if you wish, just note them down since we will be using them later.
helm upgrade --install pgdatabase --namespace pgdatabase bitnami/postgresql \
--set postgresqlPassword=change_this_password \
--set postgresqlDatabase=jhubdb
You will need to wait at least two minutes for it to spin up properly. The pod must show 1/1 Running; proceeding before then will cause the Postgres deployment to fail. Run this command to check:
kubectl --namespace pgdatabase get pods
Installing Jupyterhub
JupyterHub requires a config.yaml file. It contains the Helm values that will be passed to the JupyterHub Helm chart.
Generate Proxy.secretToken
openssl rand -hex 32
Generate hub.cookieSecret
openssl rand -hex 32
Write down both values — we will use them in config.yaml.
Create config.yaml and copy/paste the configuration below
vi config.yaml
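The configuration embedded in the original post is not reproduced here; below is a minimal sketch matching the chart 0.8.2 values schema. The Postgres connection URL assumes the Bitnami release name and namespace used earlier (`pgdatabase`), following the Bitnami chart's `<release>-postgresql.<namespace>` service naming convention — adjust the password and database name to whatever you chose:

```yaml
proxy:
  secretToken: "<paste the first openssl rand -hex 32 output here>"

hub:
  cookieSecret: "<paste the second openssl rand -hex 32 output here>"
  db:
    type: postgres
    # Assumed service DNS name for the Bitnami release "pgdatabase"
    # in the "pgdatabase" namespace; password/db match the helm install above
    url: postgresql://postgres:change_this_password@pgdatabase-postgresql.pgdatabase:5432/jhubdb
```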
You may play around with the setting configuration, please refer to https://zero-to-jupyterhub.readthedocs.io/en/latest/reference.html#helm-chart-configuration-reference
Install Jupyterhub helm repo
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
Installing JupyterHub
RELEASE=jhub
NAMESPACE=jhub
helm upgrade --install $RELEASE jupyterhub/jupyterhub \
--namespace $NAMESPACE \
--version=0.8.2 \
--values config.yaml
JupyterHub looks like it has been installed. Let's do some post-installation checks.
Check jhub pods
kubectl --namespace jhub get pods
The pods look stable and running
Check the internal ip address
curl -i $(kubectl --namespace jhub get service | grep proxy-public | awk '{print $3}')
It appears that the service can be seen locally on the server! But what happens when we try to reach the ip address through the browser?
It appears that we cannot access Jupyterhub from the outside world. Let’s dive in deeper.
Let’s check jhub services
kubectl --namespace jhub get service
The proxy-public is looking for an external ip. Recall that Kubernetes leverages a lot of cloud native services. This proxy-public typically points to an actual cloud load balancer.
This tutorial focuses on using a single bare-metal instance to build the JupyterHub from the ground up, so we won’t be creating any expensive cloud load balancer.
What we can do is create an internal software load balancer and have the Kubernetes cluster point to it!
Configuring Metal Load Balancer
To do this we will use MetalLB, a load-balancer implementation for bare metal Kubernetes clusters.
Read more here: https://metallb.universe.tf/
Create metal_config.yaml with the following content
Replace PUBLIC_IP_ADDRESS with the actual public IP address of your instance
vi metal_config.yaml
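The file itself was embedded separately in the original post. For MetalLB v0.7.x, configuration lives in a ConfigMap named `config` in the `metallb-system` namespace; a sketch using layer 2 mode with an address pool containing just your instance's public IP:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      # A /32 pool: MetalLB may only hand out the instance's public IP
      - PUBLIC_IP_ADDRESS/32
```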
Apply the default metallb.yaml configuration first
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml
Apply custom metal load balancer configuration — metal_config.yaml
kubectl apply -f metal_config.yaml
Check the metal load balancer status
kubectl --namespace=metallb-system get pods
kubectl logs -l component=speaker -n metallb-system
Most important of all, check that the jhub picks up the external ip address
kubectl --namespace jhub get service
Well done! The proxy-public service picked up the external IP address, with port 80 mapped to node port 31375 and port 443 mapped to node port 31208
Let’s try accessing from the browser again
Oops — looks like we still can't!
We can tell that proxy-public has been exposed to the world, just not on port 80 — in this case it is on port 31375. In Kubernetes this is called a node port.
Kubernetes assigns node ports randomly from the default range of 30000–32767.
To test this theory, you can open up the node port in your security group and access the IP address on that port. This will work.
But let’s get it working on port 80
Install NGINX reverse proxy
We will now use NGINX as a reverse proxy to forward requests on port 80 to the internal proxy-public endpoint
Create an NGINX file
vi default
Copy and paste the values from below
Modify the IP_ADDRESS_HERE to the actual public IP address
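The original file is not included here; below is a sketch of a typical reverse-proxy site config. `PROXY_PUBLIC_IP` is a placeholder I have introduced for the CLUSTER-IP of the proxy-public service (shown by `kubectl --namespace jhub get service`); replace `IP_ADDRESS_HERE` with your public IP. The WebSocket upgrade headers are needed for Jupyter's kernel connections:

```nginx
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 80;
    server_name IP_ADDRESS_HERE;

    access_log /root/logs/access.log;
    error_log  /root/logs/error.log;

    location / {
        # Placeholder: the CLUSTER-IP of the proxy-public service
        proxy_pass http://PROXY_PUBLIC_IP;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # WebSocket support for Jupyter kernel connections
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }
}
```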
Install and configure NGINX
sudo apt-get update
sudo apt-get -y install nginx
sudo systemctl restart nginx
mkdir /root/logs
cp default /etc/nginx/sites-available/default
sudo nginx -t
sudo systemctl reload nginx
Now you can visit the IP address in your browser on port 80
You have now successfully configured JupyterHub with Kubernetes on a single bare metal instance!
Tearing It All Down
helm delete jhub --purge
kubectl delete namespace jhub
helm delete pgdatabase --purge
kubectl delete namespace pgdatabase
kubectl drain $(hostname) --delete-local-data --force --ignore-daemonsets
kubectl delete node $(hostname)
kubeadm reset -f
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
You can also find some of the template files in this repository: https://github.com/gpaw/jupyterhub_with_kubernetes_on_single_bare_metal_instance_tutorial