In the first part of our blog post series "Getting Started with Pulsar on Kubernetes" we go through the steps of deploying the core components of Apache Pulsar on Kubernetes.
You can find the full list of all blog posts here.
Prerequisites
The following prerequisites should be met:
- You have a Kubernetes cluster installed and configured locally (`kubectl` is working).
- The cluster is already set up with Helm and your local Helm client is configured accordingly.
- An Ingress controller (nginx) and endpoints are available.
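A quick sanity check of these prerequisites might look like this (a sketch; the Ingress controller's namespace and labels depend on how it was installed):

```shell
# Confirm kubectl can reach the cluster
kubectl cluster-info

# Confirm the Helm client and server-side Tiller are set up (Helm 2)
helm version

# Confirm the nginx Ingress controller is running
# (the label selector and namespace may differ in your installation)
kubectl get pods --all-namespaces -l app.kubernetes.io/name=ingress-nginx
```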
First steps
To deploy Pulsar on your cluster we are going to use the Helm chart from the official Pulsar repository. After cloning the repository, check out the release branch `v2.4.1`:
$ git clone \
--depth 1 \
--single-branch \
--branch v2.4.1 \
https://github.com/apache/pulsar.git
$ cd pulsar
Deployment on Kubernetes
We want to deploy Pulsar in a separate namespace called `pulsar-demo`. To create the namespace do:
$ echo '{ "kind": "Namespace", "apiVersion": "v1", "metadata": { "name": "pulsar-demo", "labels": { "name": "pulsar-demo" } } }' | kubectl create -f -
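If you prefer not to pipe JSON, the same result can be achieved with two plain `kubectl` commands:

```shell
# Create the namespace directly...
kubectl create namespace pulsar-demo

# ...and attach the same label the JSON manifest above sets
kubectl label namespace pulsar-demo name=pulsar-demo
```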
Going forward we will use `pulsar-demo` as the namespace; feel free to change it. To deploy the Helm chart you will need to provide a `values.yaml` defining all components for the deployment. Navigate to `deployment/kubernetes/helm/pulsar` in the repository. This is where you find an example `values.yaml` and a file called `values-mini.yaml`. You can use the latter if you just want to test a Pulsar build or if your cluster is not very powerful; that file is optimized for a `minikube` cluster. Using `values.yaml` requires considerably more resources, so we modified it slightly to cut down the footprint of the installation in terms of CPU and memory.
You can find our file here. Just download and apply it directly using:
$ curl https://www.syscrest.com/pulsar/reduced-footprint-values.yaml > reduced-footprint-values.yaml
$ helm install deployment/kubernetes/helm/pulsar --name pulsar -f reduced-footprint-values.yaml
We use the `--name` option with our `helm install` command so the release gets a predictable name, which is useful in case you have to redeploy the cluster. Otherwise, Helm would just assign a random release name.
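Once installed, you can confirm that Helm recorded the release under the expected name (Helm 2 syntax, matching the chart version used here):

```shell
# List installed releases; you should see one named "pulsar"
helm ls

# Show the status of the release, including its deployed resources
helm status pulsar
```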
So our deployment file looks like this:
## Namespace to deploy pulsar
namespace: pulsar-demo
namespaceCreate: no
persistence: yes

zookeeper:
  resources:
    requests:
      ## default was: 15GB
      memory: 4Gi
      ## default was: 4
      cpu: 1
  configData:
    ## adjusted memory settings
    PULSAR_MEM: "\"-Xms3g -Xmx3g -Dcom.sun.management.jmxremote -Djute.maxbuffer=10485760 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:+DisableExplicitGC -XX:+PerfDisableSharedMem -Dzookeeper.forceSync=no\""
    PULSAR_GC: "\"-XX:+UseG1GC -XX:MaxGCPauseMillis=10\""

bookkeeper:
  replicaCount: 4
  resources:
    requests:
      ## default was: 15GB
      memory: 4Gi
      ## default was: 4
      cpu: 1
  configData:
    ## adjusted memory settings
    PULSAR_MEM: "\"-Xms3g -Xmx3g -XX:MaxDirectMemorySize=3g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest\""
    dbStorage_writeCacheMaxSizeMb: "512"
    dbStorage_readAheadCacheMaxSizeMb: "512"
    dbStorage_rocksDB_blockCacheSize: "268435456"
    journalMaxSizeMB: "512"

broker:
  component: broker
  replicaCount: 3
  resources:
    requests:
      ## default was: 15GB
      memory: 4Gi
      ## default was: 4
      cpu: 1
  configData:
    ## adjusted memory settings
    PULSAR_MEM: "\"-Xms3g -Xmx3g -XX:MaxDirectMemorySize=3g -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem\""
    PULSAR_GC: "\"-XX:+UseG1GC -XX:MaxGCPauseMillis=10\""
After executing the `helm install`, you will have to wait until all pods are running and ready. Check the pods' states with the following command:
$ kubectl -n pulsar-demo get pods
You can either repeatedly run the above command until all pods are ready or just `watch` it. The output should finally look something like this:
NAME READY STATUS RESTARTS AGE
pulsar-autorecovery-d57f6bb5d-p42cl 1/1 Running 0 56s
pulsar-bastion-b7b865cf9-ldfvw 1/1 Running 0 56s
pulsar-bookkeeper-0 1/1 Running 0 56s
pulsar-bookkeeper-1 1/1 Running 0 23s
pulsar-bookkeeper-2 1/1 Running 0 14s
pulsar-bookkeeper-3 1/1 Running 0 14s
pulsar-broker-57b7f87f78-m28fn 1/1 Running 0 56s
pulsar-broker-57b7f87f78-n8skw 1/1 Running 0 56s
pulsar-broker-57b7f87f78-we394 1/1 Running 0 55s
pulsar-dashboard-658fdff9bb-g8vkd 1/1 Running 0 56s
pulsar-grafana-559456d659-6jm8r 1/1 Running 0 56s
pulsar-prometheus-777f7c8868-ktcjj 1/1 Running 0 56s
pulsar-proxy-57cb48dd9b-hb4p6 1/1 Running 0 56s
pulsar-zookeeper-0 1/1 Running 0 56s
pulsar-zookeeper-1 1/1 Running 0 41s
pulsar-zookeeper-2 1/1 Running 0 34s
pulsar-zookeeper-metadata-hsnd7 0/1 Completed 0 56s
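Instead of re-running `kubectl get pods` by hand, you can let it stream updates, or wrap it in `watch`:

```shell
# -w keeps the connection open and prints pod state changes as they happen
kubectl -n pulsar-demo get pods -w

# alternatively, refresh the full listing every two seconds
watch kubectl -n pulsar-demo get pods
```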
You see a lot of pods there and they each play a part. Let's briefly go over the important ones:
- First and foremost Helm initializes a bastion pod (`pulsar-bastion-b7b865cf9-ldfvw` in the list above). This pod is used for all the administrative work on the cluster, so all your commands (including `pulsar-admin`) will be executed in this container's shell.
- The bookkeepers are responsible for durable message storage in your Pulsar cluster (each has a 50 GB persistent volume attached).
- The brokers consist of two components:
  - An HTTP server exposing a REST interface for administration and topic lookup.
  - A dispatcher that handles all Pulsar message transfers.
- The dashboard enables users to monitor current stats for all topics via a web application. It utilizes Prometheus for monitoring and Grafana for visualization.
- The proxy is an optional gateway that you can run in front of the brokers in a Pulsar cluster.
- And finally ZooKeeper, which is responsible for a wide variety of configuration- and coordination-related tasks.
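To see how these components are exposed inside the cluster, you can also list the services the chart created (the exact set of service names depends on the chart version):

```shell
# Services backing the components above, e.g. broker, proxy, dashboard
kubectl -n pulsar-demo get svc
```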
Setting up the environment
To execute commands in Pulsar you need to access the shell of the bastion pod. You can do that with:
$ kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bash
You can also set up `aliases` for commonly used commands so you don't have to enter the shell every time you want to do something:
$ alias pulsar-admin='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar-admin'
$ alias pulsar='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar'
$ alias pulsar-client='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar-client'
To reuse your aliases in new bash sessions, you can pipe the above commands to a file (e.g., `.pulsar-aliases`) and append the following lines to your `.bashrc`:
if [ -f ~/.pulsar-aliases ]; then
. ~/.pulsar-aliases
fi
This assumes you are using bash and saved the file in your `$HOME` folder.
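Writing the alias definitions to the file can be done in one step with a heredoc (a sketch; quoting the delimiter `'EOF'` prevents the `$(kubectl ...)` substitutions from being expanded now, so they run when the alias is used):

```shell
# Write the alias definitions to ~/.pulsar-aliases in one go
cat > ~/.pulsar-aliases <<'EOF'
alias pulsar-admin='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar-admin'
alias pulsar='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar'
alias pulsar-client='kubectl -n pulsar-demo exec $(kubectl get pods --namespace pulsar-demo -l "app=pulsar,component=bastion" -o jsonpath="{.items[0].metadata.name}") -it -- bin/pulsar-client'
EOF
```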
Then check if the Pulsar cluster already has the namespace `public/default` set up:
$ pulsar-admin namespaces list public
The output should look like this:
public/default
public/functions
If you can't see it, you need to create it with:
$ pulsar-admin namespaces create public/default
Testing the installation
To check that your Pulsar cluster is operational, you can use `pulsar-client` to write a text message to a topic:
$ pulsar-client produce my-test-topic --messages "hello-pulsar"
If the message has been successfully published to the topic, you should see a confirmation like this:
10:40:12.282 [main] INFO org.apache.pulsar.client.cli.PulsarClientTool - 1 messages successfully produced
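To complete the round trip, you can also read from the topic with `pulsar-client consume` (`-s` names the subscription, `-n 1` exits after one message):

```shell
# Subscribe and wait for a single message, then exit
pulsar-client consume my-test-topic -s test-subscription -n 1
```

Note that a fresh subscription only receives messages published after it was created, so run the producer again in another shell while the consumer is waiting.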
Now you have an operational Apache Pulsar cluster up and running on Kubernetes. In the next part of the series we will talk about how to access the deployed monitoring (Prometheus) and dashboards (Grafana).