JupyterHub with Kubernetes On Single Bare Metal Instance Tutorial
Preamble
There are already guides on how to deploy JupyterHub on Kubernetes using native cloud services. But what if you just wanted to use a single cloud instance? Or even an on-premise solution?
This guide will cover how to host your own JupyterHub on a single bare metal instance.
Bit of a background…
Jupyter Notebook is an extremely popular web-based IDE amongst software engineers and data scientists. JupyterHub can be understood as the environment that serves Jupyter Notebook to each individual. For more information visit https://jupyter.org/hub
In April 2018, my team and I competed in the Alibaba NAB Cross Border Hackathon and claimed the first prize. We were invited to attend the 2018 Alibaba Cloud Conference in Hangzhou and were awarded $10,000 in Alibaba Cloud credits.
However, I have been sitting on this immense amount of cloud credit, struggling to utilise it, and wanted to give it away to people who could put it to full use.
So this gave me the idea: why not create an environment that provides free computation power for education purposes?
I have always loved coding in Python and used Jupyter Notebook extensively. In my research I came across the amazing concept of running JupyterHub with Kubernetes on the cloud. Unfortunately that guide relies on a lot of native cloud services, and I wanted to create a simple JupyterHub environment with as few moving parts as possible. So I wrote a new guide to run JupyterHub with Kubernetes on a single bare metal instance.
I present to you: freenotebooks.io
In this guide I will show you how I built freenotebooks.io.
For this tutorial I will be using a single Alibaba Cloud ECS instance. You may choose the cloud provider of your choice while following this guide, or even just have it on-premise with public internet access.
Note: Most of this guide is based on the original Zero-To-JupyterHub guide found here: https://zero-to-jupyterhub.readthedocs.io/en/latest/index.html — full acknowledgement to the team who made this possible!
If you haven’t used Alibaba Cloud before, here’s a little short guide I wrote on how to create an ECS instance in Alibaba Cloud.
While I have tried to make this guide as simple as possible, it assumes that you know a little about cloud infrastructure, Docker and Kubernetes. I am not an expert myself, so feel free to provide feedback!
Let’s get started.
First, log in to your cloud instance. I am using Ubuntu 18.04 64-bit.
Install Docker, kubeadm and kubectl by running the commands below.
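The command listing was embedded separately in the original post. As a sketch, the typical installation on Ubuntu 18.04 at the time of writing looked like the following (note that the packages.cloud.google.com apt repository has since been deprecated in favour of pkgs.k8s.io, so verify against the current Kubernetes install docs):

```bash
# Install Docker from the Ubuntu repositories
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker

# Add the Kubernetes apt repository and install kubelet, kubeadm and kubectl
sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | \
    sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
```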
Initialising Kubernetes Cluster
Always remember to disable swap before running Kubernetes
sudo swapoff -a
To initialise the cluster, give it a pod network CIDR of 10.244.0.0/16 (the range expected by flannel).
This may take a few minutes to complete; you should see a message confirming a successful initialisation.
kubeadm init --pod-network-cidr=10.244.0.0/16
Set the env variable
Note that logging out clears KUBECONFIG, so you will need to export it again each session.
export KUBECONFIG=/etc/kubernetes/admin.conf
Enable the iptables bridge setting required by Kubernetes networking
sysctl net.bridge.bridge-nf-call-iptables=1
Use flannel to create layer 3 in Kubernetes
Read more about flannel here: https://github.com/coreos/flannel
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Check that the pods are all running
kubectl --namespace=kube-system get pods
Great! Now you have a fully operational, but blank, Kubernetes cluster. Moving forward we will be using Helm for some of our deployments.
What is Helm?
Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources. Think of it like apt/yum/homebrew for Kubernetes. For more information, visit https://github.com/helm/helm
Setting Up Helm
Untaint the master node — on a single-node cluster, pods (including Tiller) must be able to schedule on the master
kubectl taint nodes --all node-role.kubernetes.io/master-
Install Helm
You will be installing helm onto the default kubectl namespace kube-system
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --wait
You should receive the message “Happy Helming!” — indicating a successful helm installation
Patch the Tiller deployment so it listens on localhost only (a recommended security measure)
kubectl patch deployment tiller-deploy --namespace=kube-system --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'
Setting up Postgres DB in NFS server
We will create a Postgres database in the Kubernetes cluster, backed by an NFS server. I found that running JupyterHub on a single local volume causes the default SQLite database to lock up badly; deploying Postgres on an NFS-backed volume prevents this issue.
Install NFS server
helm install stable/nfs-server-provisioner --namespace nfsprovisioner --set=storageClass.defaultClass=true
Add Postgres helm repo
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
Install Postgres with parameters
Feel free to change the database name and password if you wish, just note them down since we will be using them later.
helm upgrade --install pgdatabase --namespace pgdatabase bitnami/postgresql \
--set postgresqlPassword=change_this_password \
--set postgresqlDatabase=jhubdb
You will need to wait at least two minutes for it to spin up properly. The pod must show 1/1 Running; proceeding before then will cause the Postgres deployment to fail. Run this command to check:
kubectl --namespace pgdatabase get pods
Installing Jupyterhub
JupyterHub requires a config.yaml file. It contains the Helm values that will be passed to the JupyterHub Helm chart.
Generate Proxy.secretToken
openssl rand -hex 32
Generate hub.cookieSecret
openssl rand -hex 32
Write down both values — we will use them in config.yaml.
Create config.yaml and copy/paste the configuration below
vi config.yaml
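The configuration embedded in the original post is not reproduced here; below is a minimal sketch matching the chart 0.8.2 values schema. The Postgres connection URL assumes the Bitnami release name and namespace used earlier (`pgdatabase`), following the Bitnami chart's `<release>-postgresql.<namespace>` service naming convention — adjust the password and database name to whatever you chose:

```yaml
proxy:
  secretToken: "<paste the first openssl rand -hex 32 output here>"

hub:
  cookieSecret: "<paste the second openssl rand -hex 32 output here>"
  db:
    type: postgres
    # Assumed service DNS name for the Bitnami release "pgdatabase"
    # in the "pgdatabase" namespace; password/db match the helm install above
    url: postgresql://postgres:change_this_password@pgdatabase-postgresql.pgdatabase:5432/jhubdb
```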
You may play around with the setting configuration, please refer to https://zero-to-jupyterhub.readthedocs.io/en/latest/reference.html#helm-chart-configuration-reference
Install Jupyterhub helm repo
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
Installing JupyterHub
RELEASE=jhub
NAMESPACE=jhub
helm upgrade --install $RELEASE jupyterhub/jupyterhub \
--namespace $NAMESPACE \
--version=0.8.2 \
--values config.yaml
JupyterHub looks like it has been installed. Let's do some post-installation checks.
Check jhub pods
kubectl --namespace jhub get pods
The pods look stable and running
Check the internal ip address
curl -i $(kubectl --namespace jhub get service | grep proxy-public | awk '{print $3}')
It appears that the service can be seen locally on the server! But what happens when we try to reach the ip address through the browser?
It appears that we cannot access Jupyterhub from the outside world. Let’s dive in deeper.
Let’s check jhub services
kubectl --namespace jhub get service
The proxy-public is looking for an external ip. Recall that Kubernetes leverages a lot of cloud native services. This proxy-public typically points to an actual cloud load balancer.
This tutorial focuses on using a single bare-metal instance to build the JupyterHub from the ground up, so we won’t be creating any expensive cloud load balancer.
What we can do is create an internal software load balancer and have the Kubernetes cluster point to it!
Configuring Metal Load Balancer
To do this we will use MetalLB, a load-balancer implementation for bare metal Kubernetes clusters.
Read more here: https://metallb.universe.tf/
Create metal_config.yaml with the following content
Replace PUBLIC_IP_ADDRESS with the actual public IP address of your instance
vi metal_config.yaml
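The file itself was embedded separately in the original post. For MetalLB v0.7.x, configuration lives in a ConfigMap named `config` in the `metallb-system` namespace; a sketch using layer 2 mode with an address pool containing just your instance's public IP:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      # A /32 pool: MetalLB may only hand out the instance's public IP
      - PUBLIC_IP_ADDRESS/32
```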
Apply the default metallb.yaml configuration first
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml
Apply custom metal load balancer configuration — metal_config.yaml
kubectl apply -f metal_config.yaml
Check the metal load balancer status
kubectl --namespace=metallb-system get pods
kubectl logs -l component=speaker -n metallb-system
Most important of all, check that the jhub picks up the external ip address
kubectl --namespace jhub get service
Well done! The proxy-public service picked up the external IP address, with port 80 mapped to node port 31375 and port 443 mapped to node port 31208
Let’s try accessing from the browser again
Oops — looks like we still can't!
We can tell that proxy-public has been exposed to the world, just not on port 80 — in this case it is on port 31375. In Kubernetes this is called a node port.
Kubernetes assigns node ports randomly from the default range of 30000–32767.
To test this theory, you can open up the node port in your security group and access the IP address on that port. This will work.
But let’s get it working on port 80
Install NGINX reverse proxy
We will now use NGINX as a reverse proxy to forward requests on port 80 to the internal proxy-public endpoint
Create an NGINX file
vi default
Copy and paste the values from below
Modify the IP_ADDRESS_HERE to the actual public IP address
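The original file is not included here; below is a sketch of a typical reverse-proxy site config. `PROXY_PUBLIC_IP` is a placeholder I have introduced for the CLUSTER-IP of the proxy-public service (shown by `kubectl --namespace jhub get service`); replace `IP_ADDRESS_HERE` with your public IP. The WebSocket upgrade headers are needed for Jupyter's kernel connections:

```nginx
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 80;
    server_name IP_ADDRESS_HERE;

    access_log /root/logs/access.log;
    error_log  /root/logs/error.log;

    location / {
        # Placeholder: the CLUSTER-IP of the proxy-public service
        proxy_pass http://PROXY_PUBLIC_IP;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # WebSocket support for Jupyter kernel connections
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }
}
```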
Install and configure NGINX
sudo apt-get update
sudo apt-get -y install nginx
sudo systemctl restart nginx
mkdir /root/logs
cp default /etc/nginx/sites-available/default
sudo nginx -t
sudo systemctl reload nginx
Now you can visit the IP address in your browser on port 80
You have now successfully configured JupyterHub with Kubernetes on a single bare metal instance!
Tearing It All Down
helm delete jhub --purge
kubectl delete namespace jhub
helm delete pgdatabase --purge
kubectl delete namespace pgdatabase
kubectl drain $(hostname) --delete-local-data --force --ignore-daemonsets
kubectl delete node $(hostname)
kubeadm reset -f
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
You can also find some of the template files in this repository: https://github.com/gpaw/jupyterhub_with_kubernetes_on_single_bare_metal_instance_tutorial