datahub: user and group management with keycloak

by Thomas Memenga on 2022-11-14

About Datahub

The Datahub metadata platform enables data discovery, data observability, and federated governance that helps tame the complexity of your data ecosystem.

Goals

The official Datahub documentation describes multiple options for using an OIDC identity provider with Datahub, and the official Datahub helm chart even has explicit support for Okta and Google. Configuring it to use keycloak, however, requires some workarounds and a custom setup of the keycloak client if you also want to manage group membership via keycloak.

Note: This tutorial aims to show a minimal working setup. The settings you see in the screenshots (realm/client/user) should not be considered secure or production-ready.

In this example, we will use helm and the official Datahub helm charts provided by Acryl Data.

keycloak setup

This tutorial does not cover setting up a vanilla keycloak instance from scratch. It requires a running instance with a dedicated realm in which we can create a keycloak client (datahub), add some client roles, and add a custom mapper.

Let’s start by creating a new client named “datahub” in our realm:

create keycloak client

Click on edit:

edit keycloak client

Select the “Roles” tab and click on “Add Role”:

add client roles

Add two roles, datahub-admins and data-office:

add client roles

Add two demo users, dh-admin and dh-john-doe, and make sure that “User Enabled” and “Email Verified” are set to ON:

add user

Make sure to set a password for each user afterwards:

set password

Navigate to the “Role Mappings” tab of each user and assign the client role datahub-admins to dh-admin and data-office to dh-john-doe:

assign role to user

These role assignments are not yet visible to Datahub. We will now add a custom mapper that adds the client roles as a list to the JWT under the claim name datahub-groups.

Switch to the “Mappers” tab on the datahub client configuration page:

add custom mapper

Create a new mapper (use “create”, not “add builtin”) with these values:

Mapper type: User client role

Client ID: datahub

Token Claim Name: datahub-groups

Claim Json Type: String

add custom mapper
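If you prefer scripting over clicking through the admin console, the mapper above can also be written down as a JSON representation. The sketch below builds such a payload in Python. The protocol mapper ID and the config keys are what we believe keycloak stores for a “User client role” mapper, and `multivalued` is assumed to be ON because the token shown below carries an array; `client_role_mapper` is a hypothetical helper, so compare its output with the mapper keycloak actually creates in your version.

```python
import json


def client_role_mapper(client_id: str, claim_name: str) -> dict:
    """Describe the "User client role" mapper configured above as a dict.

    Assumption: the keys mirror what the keycloak admin console stores for
    this mapper type; verify them against the mapper keycloak creates for you.
    """
    return {
        "name": claim_name,
        "protocol": "openid-connect",
        "protocolMapper": "oidc-usermodel-client-role-mapper",
        "config": {
            # only roles of this client are put into the claim
            "usermodel.clientRoleMapping.clientId": client_id,
            # the claim Datahub reads via AUTH_OIDC_GROUPS_CLAIM
            "claim.name": claim_name,
            "jsonType.label": "String",
            # assumed ON, so the roles are emitted as a JSON array
            "multivalued": "true",
            "id.token.claim": "true",
            "access.token.claim": "true",
            "userinfo.token.claim": "true",
        },
    }


if __name__ == "__main__":
    print(json.dumps(client_role_mapper("datahub", "datahub-groups"), indent=2))
```

The resulting JSON could then be sent to the protocol-mappers endpoint of keycloak's admin REST API, which addresses the client by its internal ID rather than by “datahub”; check the Admin REST API documentation for your keycloak version before relying on this.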

With this mapper in place, the JWT now includes a new claim (datahub-groups) that Datahub can pick up to perform the user -> group assignment:

{
  "exp": 1668521254,
  "iat": 1668520954,
  ...
  "scope": "openid email profile",
  ...
  ...
  "datahub-groups": [
    "datahub-admins"
  ],
  "preferred_username": "dh-admin"
}
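To check that the mapper works before touching Datahub, you can decode a token issued by keycloak and look at its claims. Below is a minimal sketch using only the Python standard library. It deliberately skips signature verification, so it is for debugging only, and the helper names are our own, not part of keycloak or Datahub.

```python
import base64
import json
from typing import List, Tuple


def decode_jwt_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def datahub_identity(claims: dict,
                     user_claim: str = "preferred_username",
                     groups_claim: str = "datahub-groups") -> Tuple[str, List[str]]:
    """Pick out the username and group names the Datahub settings refer to."""
    groups = claims.get(groups_claim, [])
    if isinstance(groups, str):  # tolerate a single role emitted as a plain string
        groups = [groups]
    return claims[user_claim], list(groups)


def fake_unsigned_token(claims: dict) -> str:
    """Build an unsigned demo token so the helpers can be tried without keycloak."""
    def b64url(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    header = b64url({"alg": "none"})
    body = b64url(claims)
    return f"{header}.{body}."


if __name__ == "__main__":
    token = fake_unsigned_token(
        {"preferred_username": "dh-admin", "datahub-groups": ["datahub-admins"]}
    )
    print(datahub_identity(decode_jwt_claims(token)))  # ('dh-admin', ['datahub-admins'])
```

With a real token, pass the raw string (for example an ID token obtained from keycloak) to `decode_jwt_claims` instead of the fake one.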

Now everything is set up in keycloak, and we have two demo users ready to log into Datahub.

Datahub configuration

The official Datahub OIDC documentation describes all configuration properties in detail. We will use the minimum set of properties needed to connect our Datahub instance to keycloak and automate as much as possible:

AUTH_OIDC_ENABLED

Delegates authentication to an OIDC identity provider if set to true.

AUTH_OIDC_CLIENT_ID

The client ID (name) we used in keycloak; datahub in this example setup.

AUTH_OIDC_CLIENT_SECRET

Unique client secret issued by the identity provider. Our minimal example does not use a client secret, so it is set to an arbitrary value.

AUTH_OIDC_DISCOVERY_URI

Location of the identity provider's OIDC discovery API. It normally ends with .well-known/openid-configuration.

AUTH_OIDC_JIT_PROVISIONING_ENABLED

Whether Datahub users and groups should be provisioned on login if they do not exist yet. Defaults to true. We rely on this behaviour in our tutorial, as it reduces complexity.

AUTH_OIDC_PRE_PROVISIONING_REQUIRED

Whether the user must already exist in Datahub when logging in; if not, the login fails. This is appropriate for situations where users and groups are batch-ingested and tightly controlled inside your environment. Defaults to false.

AUTH_OIDC_USER_NAME_CLAIM

The attribute that will contain the username used on the DataHub platform. We will use preferred_username (also the default).

AUTH_OIDC_EXTRACT_GROUPS_ENABLED

Only applies if AUTH_OIDC_JIT_PROVISIONING_ENABLED is set to true. Determines whether Datahub should attempt to extract a list of group names from a particular claim in the OIDC attributes. Note that if this is enabled, each login re-syncs group membership with the groups in your identity provider, clearing any group membership assigned through the Datahub UI.

AUTH_OIDC_GROUPS_CLAIM

Determines which OIDC claim contains the list of group names (as strings). In our example, this has to match the claim name used in the custom mapper we created in keycloak (datahub-groups).
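To make the interplay of these switches easier to follow, here is a simplified model of what a single login does with the settings used in this tutorial. It is a hypothetical illustration written for this article, not Datahub's actual implementation: `OidcSettings`, `Directory` and `on_login` are made-up names, and edge cases (for example JIT provisioning disabled without pre-provisioning) are glossed over.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class OidcSettings:
    """Hypothetical mirror of the AUTH_OIDC_* properties (tutorial values as defaults)."""
    jit_provisioning_enabled: bool = True
    pre_provisioning_required: bool = False
    extract_groups_enabled: bool = True
    user_name_claim: str = "preferred_username"
    groups_claim: str = "datahub-groups"


@dataclass
class Directory:
    """Stand-in for Datahub's user and group store (not a real Datahub API)."""
    users: Set[str] = field(default_factory=set)
    memberships: Dict[str, Set[str]] = field(default_factory=dict)


def on_login(claims: dict, settings: OidcSettings, directory: Directory) -> str:
    """Simplified model of one OIDC login; returns the Datahub username."""
    user = claims[settings.user_name_claim]
    if user not in directory.users:
        if settings.pre_provisioning_required:
            raise PermissionError(f"{user} has not been provisioned, login fails")
        if settings.jit_provisioning_enabled:
            directory.users.add(user)  # just-in-time provisioning
    if settings.jit_provisioning_enabled and settings.extract_groups_enabled:
        groups = claims.get(settings.groups_claim, [])
        if isinstance(groups, str):
            groups = [groups]
        # re-sync: replaces ALL memberships, including ones set in the Datahub UI
        directory.memberships[user] = set(groups)
    return user


if __name__ == "__main__":
    store = Directory()
    settings = OidcSettings()
    claims = {"preferred_username": "dh-admin", "datahub-groups": ["datahub-admins"]}
    on_login(claims, settings, store)
    store.memberships["dh-admin"].add("assigned-in-ui")
    on_login(claims, settings, store)
    print(store.memberships["dh-admin"])  # {'datahub-admins'}
```

The second login in the usage example shows the effect described for AUTH_OIDC_EXTRACT_GROUPS_ENABLED: the membership added "in the UI" is gone after the user logs in again.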

configuring the Datahub helm chart

Let’s put everything in place: as the helm chart does not support a keycloak-based OIDC setup, we need to keep oidcAuthentication.enabled set to false and use extraEnvs to inject the relevant configuration properties:

datahub-frontend:
  oidcAuthentication:
    # the datahub helm chart only supports google and okta out of the box,
    # we will use extraEnvs to set up the integration with our keycloak instance
    enabled: false

  extraEnvs:
    - name: "AUTH_OIDC_ENABLED"
      value: "true"
    - name: "AUTH_OIDC_CLIENT_ID"
      value: "datahub"
    - name: "AUTH_OIDC_CLIENT_SECRET"
      value: "notused"
    - name: "AUTH_OIDC_JIT_PROVISIONING_ENABLED"
      value: "true"
    - name: "AUTH_OIDC_PRE_PROVISIONING_REQUIRED"
      value: "false"
    - name: "AUTH_OIDC_DISCOVERY_URI"
      value: "https://keycloak.yourdomainhere.com/auth/realms/yourrealmnamehere/.well-known/openid-configuration"
    # public URL of the Datahub frontend, used to build the OIDC callback URL
    - name: "AUTH_OIDC_BASE_URL"
      value: "https://datahub.yourdomainhere.com"
    - name: "AUTH_OIDC_USER_NAME_CLAIM"
      value: "preferred_username"
    - name: "AUTH_OIDC_EXTRACT_GROUPS_ENABLED"
      value: "true"
    # must match the "Token Claim Name" of the keycloak mapper
    - name: "AUTH_OIDC_GROUPS_CLAIM"
      value: "datahub-groups"
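Indentation mistakes in extraEnvs are easy to make and only surface when helm renders the chart. A small, hypothetical helper that generates the values snippet from a dict avoids that and also builds the discovery URI. It assumes the /auth path prefix used by older (WildFly-based) keycloak releases, as in the URI above; newer Quarkus-based releases serve realms without it by default, so pass legacy_auth_path=False there.

```python
import json
from typing import Dict


def keycloak_discovery_uri(keycloak_url: str, realm: str, legacy_auth_path: bool = True) -> str:
    """Build a realm's OIDC discovery URI (legacy_auth_path adds the /auth prefix)."""
    prefix = "/auth" if legacy_auth_path else ""
    return f"{keycloak_url.rstrip('/')}{prefix}/realms/{realm}/.well-known/openid-configuration"


def datahub_oidc_env(datahub_url: str, keycloak_url: str, realm: str,
                     client_id: str = "datahub",
                     groups_claim: str = "datahub-groups",
                     legacy_auth_path: bool = True) -> Dict[str, str]:
    """Collect the AUTH_OIDC_* settings used in this tutorial."""
    return {
        "AUTH_OIDC_ENABLED": "true",
        "AUTH_OIDC_CLIENT_ID": client_id,
        "AUTH_OIDC_CLIENT_SECRET": "notused",
        "AUTH_OIDC_JIT_PROVISIONING_ENABLED": "true",
        "AUTH_OIDC_PRE_PROVISIONING_REQUIRED": "false",
        "AUTH_OIDC_DISCOVERY_URI": keycloak_discovery_uri(keycloak_url, realm, legacy_auth_path),
        "AUTH_OIDC_BASE_URL": datahub_url.rstrip("/"),
        "AUTH_OIDC_USER_NAME_CLAIM": "preferred_username",
        "AUTH_OIDC_EXTRACT_GROUPS_ENABLED": "true",
        # must match the "Token Claim Name" of the keycloak mapper
        "AUTH_OIDC_GROUPS_CLAIM": groups_claim,
    }


def render_values(env: Dict[str, str]) -> str:
    """Render the env vars as datahub-frontend.extraEnvs with fixed indentation.

    json.dumps yields double-quoted strings, which YAML accepts as-is.
    """
    lines = ["datahub-frontend:", "  extraEnvs:"]
    for name, value in env.items():
        lines.append(f"    - name: {json.dumps(name)}")
        lines.append(f"      value: {json.dumps(value)}")
    return "\n".join(lines)


if __name__ == "__main__":
    env = datahub_oidc_env("https://datahub.yourdomainhere.com",
                           "https://keycloak.yourdomainhere.com", "yourrealmnamehere")
    print(render_values(env))
```

Running it prints a values snippet equivalent to the extraEnvs part above, without the oidcAuthentication block and the comments; merge it into your values file as needed.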

testing the integration

With this configuration in place, your Datahub instance should redirect you to keycloak, where you can log in with either of our two demo users:

add custom mapper

Your profile page should show a group assignment:

dh-admin:

user dh-admin is in group datahub-admins

dh-john-doe:

user dh-john-doe is in group data-office

drawbacks

Reassigning these client roles (datahub-admins and data-office in this example) in keycloak is not reflected in Datahub immediately. The assignments are only updated in Datahub the next time the affected user logs in. From the user's perspective, changes are therefore applied just in time, but when you look at user <-> group assignments in Datahub in general, be aware that they might be outdated.
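If you need to know which Datahub memberships are currently stale, one option is to compare the client role assignments in keycloak with the group memberships Datahub shows. The sketch below only does the comparison. It assumes you have already exported both sides as user -> set-of-names mappings (how you export them is up to you and not shown), and `stale_memberships` is a hypothetical helper.

```python
from typing import Dict, Iterable, Set


def stale_memberships(keycloak_roles: Dict[str, Set[str]],
                      datahub_groups: Dict[str, Set[str]],
                      synced_names: Iterable[str] = ("datahub-admins", "data-office"),
                      ) -> Dict[str, Set[str]]:
    """Return, per user, the names whose keycloak and Datahub assignments disagree.

    Both inputs map usernames to sets of names; only names in synced_names
    (the client roles mapped into datahub-groups) are compared.
    """
    synced = set(synced_names)
    drift: Dict[str, Set[str]] = {}
    for user in set(keycloak_roles) | set(datahub_groups):
        expected = keycloak_roles.get(user, set()) & synced
        current = datahub_groups.get(user, set()) & synced
        if expected != current:
            # symmetric difference: groups to gain plus groups to lose on next login
            drift[user] = expected ^ current
    return drift


if __name__ == "__main__":
    # dh-john-doe was moved from data-office to datahub-admins in keycloak
    # but has not logged in since
    keycloak = {"dh-admin": {"datahub-admins"}, "dh-john-doe": {"datahub-admins"}}
    datahub = {"dh-admin": {"datahub-admins"}, "dh-john-doe": {"data-office"}}
    print(stale_memberships(keycloak, datahub))
```

Every user listed in the result will have their Datahub groups corrected on their next login; nothing in this sketch changes either system.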

Be aware that the “standard” login form is still active; the keycloak setup is just an additional way to log in. When you go to https://datahub.yourdomainhere.com/login, you can still use Datahub-local accounts (e.g., the default “datahub” admin account).