Installing Pulsar on Kubernetes using Helm

by Thomas Memenga on 2019-09-28

In the first part of our blog post series “getting started with pulsar on kubernetes” we go through the steps of deploying the core components of Apache Pulsar on Kubernetes.

You can find the full list of all blog posts here.

Prerequisites

The following prerequisites should be met:

  1. You have a Kubernetes cluster installed and configured locally (kubectl is working).
  2. The cluster is already set up with helm and your local helm client is configured accordingly (see the quick check after this list).
  3. An Ingress controller (nginx) and endpoints are available.
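
You can quickly verify the first two prerequisites from your shell:

$ kubectl version
$ helm version

Both commands should report a client and a server version without errors.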

First steps

To deploy Pulsar on your cluster we are going to use the helm chart from the official Pulsar repository. When cloning the repository, check out the release branch v2.4.1 directly:

$ git clone \
--depth 1 \
--single-branch \
--branch v2.4.1 \
https://github.com/apache/pulsar.git

$ cd pulsar
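
The helm chart we are going to use lives inside the cloned repository; you can confirm that the path exists:

$ ls deployment/kubernetes/helm/pulsar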

Deployment on Kubernetes

We want to deploy Pulsar in a separate namespace called pulsar-demo. To create the namespace do:

$ echo '{ "kind": "Namespace", "apiVersion": "v1", "metadata": { "name": "pulsar-demo", "labels": { "name": "pulsar-demo" } } }' | kubectl create -f -

Going forward we will use pulsar-demo as the namespace; feel free to change it. To deploy the helm chart you need to provide a values.yaml defining all components of the deployment. Navigate to deployment/kubernetes/helm/pulsar in the repository, where you will find an example values.yaml and a file called values-mini.yaml. You can use the latter if you just want to test a Pulsar build or if your cluster is not very powerful; it is optimized for a minikube cluster. The full values.yaml requires considerably more resources, so we modified it slightly to cut down the installation's CPU and memory footprint.

You can find our file here. Just download and apply it directly using:

$ curl https://www.syscrest.com/pulsar/reduced-footprint-values.yaml > reduced-footprint-values.yaml
$ helm install deployment/kubernetes/helm/pulsar --name pulsar -f reduced-footprint-values.yaml

We use the --name option with our helm install command so the release gets a predictable name, which is useful in case you have to redeploy the cluster. Otherwise, helm would just assign a random release name.
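
You can verify that the release was created under that name with:

$ helm status pulsar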

So our reduced-footprint values file looks like this:

## Namespace to deploy pulsar
namespace: pulsar-demo
namespaceCreate: no

persistence: yes

zookeeper:
  resources:
    requests:
      ## default was: 15GB
      memory: 4Gi
      ## default was: 4 
      cpu: 1
  configData:
    ## adjusted memory settings
    PULSAR_MEM: "\"-Xms3g -Xmx3g -Dcom.sun.management.jmxremote -Djute.maxbuffer=10485760 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:+DisableExplicitGC -XX:+PerfDisableSharedMem -Dzookeeper.forceSync=no\""
    PULSAR_GC: "\"-XX:+UseG1GC -XX:MaxGCPauseMillis=10\""

bookkeeper:
  replicaCount: 4
  resources:
    requests:
      ## default was: 15GB
      memory: 4Gi
      ## default was: 4
      cpu: 1
  configData:
    ## adjusted memory settings
    PULSAR_MEM: "\"-Xms3g -Xmx3g -XX:MaxDirectMemorySize=3g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest\""
    dbStorage_writeCacheMaxSizeMb: "512"
    dbStorage_readAheadCacheMaxSizeMb: "512"
    dbStorage_rocksDB_blockCacheSize: "268435456"
    journalMaxSizeMB: "512"

broker:
  component: broker
  replicaCount: 3
  resources:
    requests:
      ## default was: 15GB
      memory: 4Gi
      ## default was: 4
      cpu: 1
  configData:
    ## adjusted memory settings
    PULSAR_MEM: "\"-Xms3g -Xmx3g -XX:MaxDirectMemorySize=3g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem\""
    PULSAR_GC: "\"-XX:+UseG1GC -XX:MaxGCPauseMillis=10\""
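
If you later want to tweak a single value on top of this file, helm can override it with the --set flag; for example (bookkeeper.replicaCount is one of the values shown above):

$ helm upgrade pulsar deployment/kubernetes/helm/pulsar \
  -f reduced-footprint-values.yaml \
  --set bookkeeper.replicaCount=3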

After executing the helm install, you will have to wait until all pods are running and ready. Check the pods' states with the following command:

$ kubectl -n pulsar-demo get pods

You can either repeatedly enter the above command until all pods are ready, or watch the list update live by appending the -w flag. The output should finally look something like this:

NAME                                  READY   STATUS      RESTARTS   AGE
pulsar-autorecovery-d57f6bb5d-p42cl   1/1     Running     0          56s
pulsar-bastion-b7b865cf9-ldfvw        1/1     Running     0          56s
pulsar-bookkeeper-0                   1/1     Running     0          56s
pulsar-bookkeeper-1                   1/1     Running     0          23s
pulsar-bookkeeper-2                   1/1     Running     0          14s
pulsar-bookkeeper-3                   1/1     Running     0          14s
pulsar-broker-57b7f87f78-m28fn        1/1     Running     0          56s
pulsar-broker-57b7f87f78-n8skw        1/1     Running     0          56s
pulsar-broker-57b7f87f78-we394        1/1     Running     0          55s
pulsar-dashboard-658fdff9bb-g8vkd     1/1     Running     0          56s
pulsar-grafana-559456d659-6jm8r       1/1     Running     0          56s
pulsar-prometheus-777f7c8868-ktcjj    1/1     Running     0          56s
pulsar-proxy-57cb48dd9b-hb4p6         1/1     Running     0          56s
pulsar-zookeeper-0                    1/1     Running     0          56s
pulsar-zookeeper-1                    1/1     Running     0          41s
pulsar-zookeeper-2                    1/1     Running     0          34s
pulsar-zookeeper-metadata-hsnd7       0/1     Completed   0          56s

You see a lot of pods there, and each plays a part. Let's briefly go over the important ones:

  1. First and foremost helm initializes a bastion pod (pulsar-bastion-b7b865cf9-ldfvw in the list above). This pod is used for all the administrative work on the cluster, so all your commands (including pulsar-admin) will be executed in this container's shell.
  2. The bookkeepers are responsible for durable message storage in your Pulsar cluster (each has a 50 GB persistent volume attached; see the listing after this overview).
  3. The brokers consist of two components:
    1. An HTTP server exposing a REST interface for administration and topic lookup.
    2. A dispatcher that handles all Pulsar message transfers.
  4. The dashboard enables users to monitor current stats for all topics via a web application. It utilizes Prometheus for monitoring and Grafana for visualization.
  5. The proxy is an optional gateway that you can run in front of the brokers in a Pulsar cluster.
  6. And finally zookeeper, which is responsible for a wide variety of configuration- and coordination-related tasks.
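
You can list the persistent volume claims mentioned above with:

$ kubectl -n pulsar-demo get pvc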

Setting up the environment

To execute commands in Pulsar you need to access the shell of the bastion pod. You can do that with:

$ kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bash
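
Once inside the pod, the Pulsar command line tools are available in the bin directory (the aliases below make the same assumption about the working directory). For example, to list the clusters known to this installation:

$ bin/pulsar-admin clusters list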

You can also set up aliases for commonly used commands so you don't have to access the shell every time you want to do something:

$ alias pulsar-admin='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar-admin'
$ alias pulsar='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar'
$ alias pulsar-client='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar-client'

To reuse your aliases in new bash sessions, you can save the above alias definitions to a file (e.g., .pulsar-aliases) and append the following lines to your .bashrc:

if [ -f ~/.pulsar-aliases ]; then
  . ~/.pulsar-aliases
fi

This assumes you are using bash and saved the file in your $HOME folder.
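
To activate and verify the aliases in your current shell, reload your .bashrc and inspect one of the aliases:

$ source ~/.bashrc
$ type pulsar-admin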

Then check if the Pulsar cluster already has the namespace public/default set up:

$ pulsar-admin namespaces list public

The output should look like this:

public/default
public/functions

If you can’t see it, create it with:

$ pulsar-admin namespaces create public/default
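
To double-check the namespace's configuration (replication clusters, retention, and so on), pulsar-admin can print its policies:

$ pulsar-admin namespaces policies public/default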

Testing the installation

To check that your Pulsar cluster is operational, we can use pulsar-client to write a test message to a topic:

$ pulsar-client produce my-test-topic --messages "hello-pulsar"

If the message has been successfully published to the topic, you should see a confirmation like this:

10:40:12.282 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 1 messages successfully produced
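
To complete the round trip, you can also consume from the topic. A new subscription starts at the end of the topic, so start the consumer first (the subscription name is arbitrary; it will wait for one message) and then produce another message from a second shell:

$ pulsar-client consume my-test-topic -s test-subscription -n 1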

Now you have an operational Apache Pulsar cluster running on Kubernetes. In the next part of the series we will talk about how to access the deployed monitoring (Prometheus) and dashboards (Grafana).