Azure Kubernetes Service (AKS): Deploying Elasticsearch, Logstash and Kibana (ELK) and consume messages from Azure Event Hub

This article is part of a series of blog posts on Azure Kubernetes Service (AKS). In it, I share my experience of setting up an Elasticsearch, Logstash and Kibana (ELK) cluster in AKS and consuming messages from Azure Event Hub. By the end of this article, we will have a fully functional ELK stack with Azure Event Hub integration.

A sample client app (e.g. an IoT device) publishes messages to Event Hub, and these messages are ingested into Elasticsearch using the Azure Event Hubs plugin of Logstash. This article relies on X-Pack features of Elasticsearch, so I also show the steps needed to activate a trial license.

The second part of this series goes through the steps needed to enable Azure AD SAML-based single sign-on to secure Elasticsearch and Kibana hosted in AKS. The third part goes through the steps needed to ingest Azure Redis Cache messages into Elasticsearch using Logstash's Redis plugin.

The components were developed with Visual Studio for Mac/Visual Studio 2017. The AKS Dashboard and kubectl commands are used to create and manage Kubernetes resources in AKS.

Azure Kubernetes Service (AKS) Cluster

If you don't have an AKS cluster up and running, please go through this article to Create AKS Cluster. Once the AKS cluster is up and running, you can go through the rest of the article. The code snippets in this article are mostly YAML and are included for reference only; because formatting may get distorted, please refer to the GitHub repository for the formatted resources.

Event Hub

Event Hub messages will be ingested into Elasticsearch through a Logstash pipeline, so the first resource I am going to create is an Event Hub. Please follow the steps listed in this article to create an Event Hub. The main pointers are

  • The Event Hub name I have selected for this sample solution is 'logstash'. If you select a different name, update the event hub name in the source code.
  • Keep a note of the Event Hub connection string, as it needs to be updated in the Logstash pipeline and in the Event Hub message publisher client.
  • The source code uses the '$Default' consumer group. Update this value if you create a dedicated consumer group.

Azure Blob Storage

The next resource you will need to create is Azure Blob Storage. Please follow the steps listed in this article to create a storage account. Once the storage account is created, create a blob container under Blob Service -> Blobs. The main pointers are

  • The blob container name I have specified in the source code is 'logstash'. If you select a different name, update the storage container name in the source code.
  • Keep a note of the Storage connection string, as it needs to be updated in the Logstash pipeline.

Client App to send messages to Event Hub

The AzureEventHubSample project is the client app that sends messages to Event Hub. You will need to update the connectionString variable with your Event Hub connection string and the name of the hub. You can download the source code of this publisher client from GitHub.
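
To make the "connection string plus hub name" requirement concrete, here is a minimal Python sketch (not part of the original C# sample) that appends or replaces the EntityPath segment of an Event Hub connection string. The function name with_entity_path and the example namespace and key are hypothetical; the ;EntityPath=logstash suffix is the one used later in the Logstash pipeline.

```python
def with_entity_path(connection_string: str, hub_name: str) -> str:
    """Return the connection string with EntityPath set to hub_name.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for part in connection_string.strip().split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: SharedAccessKey values are base64
        # and may themselves end in '='.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {part!r}")
        if key.lower() != "entitypath":  # drop any existing EntityPath
            pairs.append((key, value))
    pairs.append(("EntityPath", hub_name))
    return ";".join(f"{key}={value}" for key, value in pairs)


# Example values are made up; substitute your own namespace and key.
example = (
    "Endpoint=sb://example.servicebus.windows.net/;"
    "SharedAccessKeyName=RootManageSharedAccessKey;"
    "SharedAccessKey=abc123=="
)
print(with_entity_path(example, "logstash"))
```

If you named your hub something other than logstash, pass that name instead so the publisher and the Logstash pipeline point at the same hub.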

Deploy Elasticsearch to Azure Kubernetes Service

Elasticsearch is a near real-time search platform. The steps needed to deploy Elasticsearch to the AKS cluster are listed below.

Create a Persistent Volume

A persistent volume claim is needed to store Elasticsearch data. The YAML snippet to create a 5 GiB (5Gi) storage claim is displayed below. The StatefulSet resource is going to mount files to this storage claim. You can read more about Persistent Volumes.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sample-elasticsearch-data-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Create a Kubernetes ConfigMap

ConfigMaps allow you to decouple configuration from the container image. A few pointers about the YAML snippet displayed below are

  • The elasticsearch.yml and role_mapping.yml files will be mounted from the ConfigMap
  • X-Pack security is enabled
  • Update node.max_local_storage_nodes based on the number of Pods
  • role_mapping.yml is intentionally left blank, as it will be discussed in detail in later posts of this series

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-elasticsearch-configmap
  namespace: default
data:
  elasticsearch.yml: |
    cluster.name: "sample-elasticsearch-cluster"
    network.host: 0.0.0.0
    discovery.zen.minimum_master_nodes: 1
    #Update max_local_storage_nodes value based on number of nodes
    node.max_local_storage_nodes: 1
    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true
  role_mapping.yml: |

You can read more about ConfigMap.

Create a Kubernetes Service

The next step is to create a Kubernetes Service for Elasticsearch. As you can see in the YAML snippet below, ports 9200 and 9300 are defined and the type is ClusterIP, i.e. this service doesn't have external endpoints. Kubernetes will use the selector 'service: sample-elasticsearch' to map the service to the Pods of the StatefulSet, as you are going to see next. You can read more about Services.

apiVersion: v1
kind: Service
metadata:
  name: sample-elasticsearch
  labels:
    service: sample-elasticsearch
spec:
  type: ClusterIP
  selector:
    service: sample-elasticsearch
  ports:
  - name: http
    port: 9200
    targetPort: 9200
    protocol: TCP
  - name: transport
    port: 9300
    targetPort: 9300
    protocol: TCP
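
The way a Service selector picks Pods can be sketched in a few lines of Python. This is only an illustration of equality-based label matching, not how Kubernetes is implemented; selector_matches is a hypothetical name, and the labels are the ones used in this article.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True when every key/value pair of the selector is present in the Pod labels.

    Illustrative only: models equality-based selectors such as
    'service: sample-elasticsearch'.
    """
    return all(pod_labels.get(key) == value for key, value in selector.items())


service_selector = {"service": "sample-elasticsearch"}

# Labels carried by the Elasticsearch Pods (extra labels do not prevent a match).
elasticsearch_pod = {"service": "sample-elasticsearch", "tier": "data"}
# Labels carried by the Kibana Pods later in this article.
kibana_pod = {"component": "sample-kibana"}

print(selector_matches(service_selector, elasticsearch_pod))
print(selector_matches(service_selector, kibana_pod))
```

A mismatch between the selector and the Pod labels is a common reason for a Service that exists but routes traffic nowhere, so it is worth checking both spellings when you rename resources.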

Create a Kubernetes StatefulSet

A Kubernetes StatefulSet is the workload API object used to manage stateful applications. You can read more about StatefulSets. The YAML snippet to create a StatefulSet is displayed below and a few pointers are

  • service: sample-elasticsearch matches the selector defined in the service.
  • I have specified replicas: 1, which means that only one Pod instance will be created by Kubernetes. You can update this value as needed; however, you will also need to update the ConfigMap value node.max_local_storage_nodes: 1.
  • The docker image being used to create this resource is image: docker.elastic.co/elasticsearch/elasticsearch:6.4.1.
  • By default, Elasticsearch runs inside the container as user elasticsearch using uid:gid 1000:1000. If you are bind-mounting a local directory or file, ensure it is readable by this user, while the data and log dirs additionally require write access. This is the reason I have used the environment variable - name: "TAKE_FILE_OWNERSHIP" value: "1". The alternative is to grant write access by adding an initContainer step, e.g. command: - sh - -c - chown -R 1000:1000 /usr/share/elasticsearch/data. You can read more at Elasticsearch Docker.
  • The mmap count has been increased to vm.max_map_count=262144 by adding an initContainer step. You can read more about Elasticsearch virtual memory.
  • The ConfigMap defined in the previous step is used to mount the elasticsearch.yml and role_mapping.yml files, i.e. configMap: name: sample-elasticsearch-configmap.
  • Lastly, the persistent volume claim created above is used for storage, i.e. persistentVolumeClaim: claimName: sample-elasticsearch-data-claim.
  • Assign CPU resources as needed by updating the section resources: limits:. You can read more about Assign CPU Resources to Containers and Pods.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sample-elasticsearch
  labels:
    service: sample-elasticsearch
spec:
  serviceName: sample-elasticsearch
  # Number of PODS
  replicas: 1
  selector:
    matchLabels:
      service: sample-elasticsearch
  template:
    metadata:
      labels:
        service: sample-elasticsearch
    spec:
      terminationGracePeriodSeconds: 15
      initContainers:
      # https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
      - name: increase-the-vm-max-map-count
        image: busybox
        command:
        - sysctl
        - -w
        - vm.max_map_count=262144
        securityContext:
          privileged: true
      containers:
      - name: sample-elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:6.4.1
        resources:
          limits:
            cpu: 2000m
            memory: 2Gi
          requests:
            cpu: 100m
            memory: 1Gi
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: tcp
        env:
          - name: cluster.name
            value: "sample-elasticsearch-cluster"
          - name: "xpack.security.enabled"
            value: "true"
          - name: "TAKE_FILE_OWNERSHIP"
            value: "1"
        volumeMounts:
        - name: sample-elasticsearch-data-claim
          mountPath: /usr/share/elasticsearch/data
        - name: sample-elasticsearch-configmap
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: elasticsearch.yml
        - name: sample-elasticsearch-configmap
          mountPath: /usr/share/elasticsearch/config/role_mapping.yml
          subPath: role_mapping.yml
      volumes:
      - name: sample-elasticsearch-data-claim
        persistentVolumeClaim:
          claimName: sample-elasticsearch-data-claim
      - name: sample-elasticsearch-configmap
        configMap:
          name: sample-elasticsearch-configmap
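
If you want to confirm that the init container's sysctl change took effect, one option is to compare the kernel value against the 262144 minimum. The Python sketch below is an assumption-laden illustration: it must run somewhere Python is available and /proc is readable (for example on a Linux node), and the helper names are hypothetical.

```python
from pathlib import Path

# Value written by the init container in the StatefulSet of this article.
REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(raw_value: str, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True when the kernel setting is at least the required value."""
    return int(raw_value.strip()) >= required


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> str:
    """Read the current setting as text; only works on Linux."""
    return Path(path).read_text()


# Typical use on a Linux host:
#   print(max_map_count_ok(read_max_map_count()))
print(max_map_count_ok("65530"))    # a common Linux default, too low for Elasticsearch
print(max_map_count_ok("262144\n"))
```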

By default, Elasticsearch will be deployed with a basic license. After Elasticsearch is deployed, the next step is to activate a trial license in order to use the X-Pack features of Elasticsearch.

Enable Trial Version of Elasticsearch

The steps needed to activate the trial license are

  • Run the command kubectl port-forward sample-elasticsearch-0 9200:9200; you can now access the Elasticsearch endpoint at https://localhost:9200. Send a POST https://localhost:9200/_xpack/license/start_trial?acknowledge=true request from any REST client. This activates the trial license.
  • You can verify that the trial license is activated by sending a GET https://localhost:9200/_xpack/license request from any REST client.
  • Once the trial license is activated, you can close the terminal, as port forwarding isn't needed anymore.
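
If you prefer a script over a REST client, the two requests above can be sketched with Python's standard library. This is a minimal sketch under assumptions: the helper names are hypothetical, the base URL is the port-forwarded endpoint from this article, and the response shape checked by is_trial_active is the "license" object with "type" and "status" fields that I expect the license API to return.

```python
import json
import urllib.request


def start_trial_request(base_url: str) -> urllib.request.Request:
    """Build the POST request that starts the trial license."""
    url = base_url.rstrip("/") + "/_xpack/license/start_trial?acknowledge=true"
    return urllib.request.Request(url, data=b"", method="POST")


def license_request(base_url: str) -> urllib.request.Request:
    """Build the GET request that returns the current license."""
    return urllib.request.Request(base_url.rstrip("/") + "/_xpack/license", method="GET")


def is_trial_active(response_body: str) -> bool:
    """Check a license response body for an active trial license."""
    info = json.loads(response_body).get("license", {})
    return info.get("type") == "trial" and info.get("status") == "active"


# With the port-forward running, the requests could be sent like this:
#   with urllib.request.urlopen(start_trial_request("https://localhost:9200")) as resp:
#       print(resp.read().decode())
request = start_trial_request("https://localhost:9200")
print(request.get_method(), request.full_url)
```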

Set up passwords for built-in user accounts of Elasticsearch

The X-Pack security feature of Elasticsearch is used to secure access, so we now need to set up passwords for the built-in user accounts. The steps are

  • Connect to the Elasticsearch Pod by running the command kubectl exec -ti sample-elasticsearch-0 bash
  • Run the command bin/elasticsearch-setup-passwords interactive to set up the built-in user passwords interactively. For this sample I have specified the same password, Password1$, for all accounts; if you choose a different password, you will need to change it in a few places in the source code.

Deploy Kibana to Azure Kubernetes Service

Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack. The steps needed to deploy Kibana to AKS are listed below

Create a Kubernetes ConfigMap

The yaml snippet to create this resource is displayed below and a few pointers are

  • kibana.yml file will be mounted from ConfigMap
  • Kibana points to Elasticsearch based on elasticsearch.url: https://sample-elasticsearch:9200
  • Update elasticsearch.password: Password1$ based on the password you specified for the kibana (built-in) user

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-kibana-configmap
  namespace: default
data:
  kibana.yml: |
    server.name: sample-kibana
    server.host: "0"
    elasticsearch.url: https://sample-elasticsearch:9200
    xpack.monitoring.ui.container.elasticsearch.enabled: true
    elasticsearch.username: kibana
    elasticsearch.password: Password1$

Create a Kubernetes Service

The next step is to create a Kubernetes Service for Kibana. As you can see in the YAML snippet below, port 80 is defined and the type is LoadBalancer, i.e. this service has an external endpoint. Kubernetes will use the selector 'component: sample-kibana' to map the service to the deployment, as you are going to see next. The creation of this service takes a while; once it is done, you can get the external endpoint of the service either by opening the AKS Dashboard or by running the kubectl command kubectl describe services sample-kibana.

apiVersion: v1
kind: Service
metadata:
  name: sample-kibana
  labels:
    component: sample-kibana
spec:
  type: LoadBalancer
  selector:
    component: sample-kibana
  ports:
  - name: http
    port: 80
    targetPort: http
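
Because the external endpoint is assigned asynchronously, it can help to read it from the service's JSON rather than from the describe output. The Python sketch below assumes you have captured the output of kubectl get service sample-kibana -o json as a string; external_endpoints is a hypothetical helper and the IP address is a documentation placeholder.

```python
import json


def external_endpoints(service_json: str) -> list:
    """Return the external IPs or hostnames of a LoadBalancer service.

    An empty list means the load balancer is still being provisioned.
    """
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    endpoints = []
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            endpoints.append(address)
    return endpoints


# Trimmed-down examples of the JSON before and after provisioning.
pending = '{"status": {"loadBalancer": {}}}'
ready = '{"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}'

print(external_endpoints(pending))
print(external_endpoints(ready))
```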

Create a Kubernetes Deployment

The next step is to create a Kubernetes Deployment for Kibana. The yaml snippet is displayed below and a few pointers are

  • The docker image being used to create this resource is image: docker.elastic.co/kibana/kibana:6.4.1
  • You can change the number of pods by updating replicas: 1
  • The label component: sample-kibana has to match the selector defined in the service
  • ConfigMap defined in previous step is used to mount kibana.yml file i.e. configMap: name: sample-kibana-configmap

# The original sample used apps/v1beta1, which was removed in Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-kibana
  labels:
    component: sample-kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      component: sample-kibana
  template:
    metadata:
      labels:
        component: sample-kibana
    spec:
      containers:
      - name: sample-kibana
        image: docker.elastic.co/kibana/kibana:6.4.1
        # The original snippet repeated "resources: {}" further down; the duplicate key is dropped here
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 5601
          name: http
        volumeMounts:
        - name: sample-kibana-configmap
          mountPath: /usr/share/kibana/config/kibana.yml
          subPath: kibana.yml
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
        imagePullPolicy: Always
        securityContext:
          privileged: false
      volumes:
      - name: sample-kibana-configmap
        configMap:
          name: sample-kibana-configmap
      restartPolicy: Always
      terminationGracePeriodSeconds: 5
      dnsPolicy: ClusterFirst
      securityContext: {}
      schedulerName: default-scheduler

Open the external endpoint of Kibana service once deployment is completed. Since x-pack security is enabled, Kibana will prompt for credentials.

Deploy Logstash to Azure Kubernetes Service

Logstash is a data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to Elasticsearch. Logstash will use the Azure Event Hubs plugin to ingest data into Elasticsearch. The steps needed to deploy Logstash to AKS are listed below.

Create a Kubernetes ConfigMap

The yaml snippet to create ConfigMap is displayed below and a few pointers are

  • logstash.yml file will be mounted from ConfigMap
  • pipelines.yml file will be mounted from ConfigMap. Multiple pipelines can be defined in this file e.g. you can see that AzureEventHubs pipeline is defined.
  • azureeventhubs.cfg file will be mounted from ConfigMap. The Logstash event processing pipeline has three stages: inputs → filters → outputs. This file defines the Logstash pipeline for Azure Event Hub.
    • Update {AZURE_EVENT_HUB_CONNECTION_STRING} and {STORAGE_ACCOUNT_CONNECTION_STRING} values based on your Event Hub and Storage Account values.
    • Update EntityPath in case your event hub is named differently than logstash.
    • Update storage_container in case your storage container is named differently than logstash.
    • Update consumer_group in case your consumer group is different than $Default. Specifying $Default isn't an ideal option.
    • Index name is defined as index => "azureeventhub-%{+YYYY.MM.dd}"
  • logstash.conf file will be mounted from ConfigMap
  • Elasticsearch URL is defined in xpack.monitoring.elasticsearch.url: https://sample-elasticsearch:9200
  • Update xpack.monitoring.elasticsearch.password: Password1$ based on the password you specified for the logstash_system (built-in) user.
  • Elasticsearch endpoint is defined in hosts => [ "sample-elasticsearch:9200" ]
  • Update password => "Password1$" to the password you specified for the elastic (built-in) user

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-logstash-configmap
  namespace: default
data:
  logstash.yml: |
    xpack.monitoring.elasticsearch.url: https://sample-elasticsearch:9200
    dead_letter_queue.enable: true
    xpack.monitoring.enabled: true
    xpack.monitoring.elasticsearch.username: logstash_system
    xpack.monitoring.elasticsearch.password: Password1$
  pipelines.yml: |
    - pipeline.id: azureeventhubs
      path.config: "/usr/share/logstash/azureeventhubs.cfg"
  azureeventhubs.cfg: |
    input {
      azure_event_hubs {
        event_hub_connections => ["{AZURE_EVENT_HUB_CONNECTION_STRING};EntityPath=logstash"]
        threads => 2
        decorate_events => true
        consumer_group => "$Default"
        storage_connection => "{STORAGE_ACCOUNT_CONNECTION_STRING}"
        storage_container => "logstash"
      }
    }
    filter {
    }
    output {
      elasticsearch {
        hosts => [ "sample-elasticsearch:9200" ]
        user => "elastic"
        password => "Password1$"
        index => "azureeventhub-%{+YYYY.MM.dd}"
      }
    }
  logstash.conf: |
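
The index setting index => "azureeventhub-%{+YYYY.MM.dd}" produces one index per day, based on the event's @timestamp in UTC. The Python sketch below mimics that naming to show which index a given event lands in and why a pattern such as 'azureeventhub*' covers all of them. It is an approximation for illustration; index_for and matches_pattern are hypothetical helpers, not Logstash APIs.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


def index_for(event_time: datetime, prefix: str = "azureeventhub-") -> str:
    """Approximate the daily index name derived from an event timestamp."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    # The date part is computed in UTC, not in the sender's local time zone.
    return prefix + event_time.astimezone(timezone.utc).strftime("%Y.%m.%d")


def matches_pattern(index_name: str, pattern: str = "azureeventhub*") -> bool:
    """Check an index name against a wildcard pattern."""
    return fnmatchcase(index_name, pattern)


noon_utc = datetime(2018, 10, 5, 12, 0, tzinfo=timezone.utc)
# 23:30 at UTC-5 is already the next day in UTC.
late_evening = datetime(2018, 10, 5, 23, 30, tzinfo=timezone(timedelta(hours=-5)))

for moment in (noon_utc, late_evening):
    name = index_for(moment)
    print(name, matches_pattern(name))
```

The second example is the case that tends to surprise people: an event sent late in the evening local time can appear in the following day's index.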

Create a Kubernetes Service

The next step is to create a Kubernetes Service for Logstash. As you can see in the YAML snippet below, port 80 is defined and the type is ClusterIP, i.e. this service has no external endpoints. Kubernetes will use the selector 'component: sample-logstash' to map the service to the deployment, as you are going to see next.

apiVersion: v1
kind: Service
metadata:
  name: sample-logstash
  labels:
    component: sample-logstash
spec:
  type: ClusterIP
  selector:
    component: sample-logstash
  ports:
  - name: http
    port: 80
    targetPort: http

Create a Kubernetes Deployment

The next step is to create a Kubernetes Deployment for Logstash. The yaml snippet is displayed below and a few pointers are

  • The docker image being used to create this resource is docker.elastic.co/logstash/logstash:6.4.1
  • You can change the number of pods by updating replicas: 1
  • The label component: sample-logstash has to match the selector defined in the service
  • ConfigMap defined in previous step is used to mount logstash.yml, logstash.conf, pipelines.yml and azureeventhubs.cfg files i.e. configMap: name: sample-logstash-configmap

# The original sample used apps/v1beta1, which was removed in Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-logstash
  labels:
    component: sample-logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      component: sample-logstash
  template:
    metadata:
      labels:
        component: sample-logstash
    spec:
      containers:
      - name: sample-logstash
        image: docker.elastic.co/logstash/logstash:6.4.1
        volumeMounts:
        - name: sample-logstash-configmap
          mountPath: /usr/share/logstash/config/logstash.yml
          subPath: logstash.yml
        - name: sample-logstash-configmap
          mountPath: /usr/share/logstash/pipeline/logstash.conf
          subPath: logstash.conf
        - name: sample-logstash-configmap
          mountPath: /usr/share/logstash/azureeventhubs.cfg
          subPath: azureeventhubs.cfg
        - name: sample-logstash-configmap
          mountPath: /usr/share/logstash/config/pipelines.yml
          subPath: pipelines.yml
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        # Port value kept from the original sample; note that Logstash's own HTTP API defaults to 9600
        - containerPort: 5601
          name: http
      volumes:
        - name: sample-logstash-configmap
          configMap:
            name: sample-logstash-configmap

As mentioned earlier, Logstash will use the azure_event_hubs plugin to ingest data into Elasticsearch. You can get the list of installed plugins by following the steps listed below

  • Run the command kubectl exec -ti {Logstash_Pod_Name} bash to connect to the Logstash Pod.
  • Run the command bin/logstash-plugin list to see the installed plugins
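
To check the listing without scanning it by eye, you could paste the command output into a small script like the Python sketch below. It assumes the plugin appears under the package name logstash-input-azure_event_hubs, which is my understanding of how the azure_event_hubs input is packaged; the helper name and the sample output are made up for illustration.

```python
def installed_plugins(listing: str) -> set:
    """Parse 'bin/logstash-plugin list' output into a set of plugin names.

    Takes the first token of each non-empty line, so it also tolerates
    listings that carry a version after the name.
    """
    return {line.split()[0] for line in listing.splitlines() if line.strip()}


# Assumed package name of the azure_event_hubs input plugin.
EVENT_HUBS_PLUGIN = "logstash-input-azure_event_hubs"

# Shortened, made-up sample of the command output.
sample_listing = """
logstash-codec-json
logstash-input-azure_event_hubs
logstash-output-elasticsearch
"""

print(EVENT_HUBS_PLUGIN in installed_plugins(sample_listing))
```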

Demo

After all resources are deployed to AKS, run the client app to send messages to Event Hub. Open Kibana and you will see ELK stack statistics in the Monitoring section. The number of messages sent by the client app to Event Hub will be displayed in Logstash's "Events Received" and "Events Emitted" statistics.
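
As a cross-check against the Monitoring numbers, you could also ask Elasticsearch how many documents the daily indices hold. The Python sketch below builds a request to the _count endpoint with basic authentication. It assumes a port-forward to Elasticsearch is running again at the base URL used earlier and that the elastic user has the sample password; count_request and parse_count are hypothetical helpers.

```python
import base64
import json
import urllib.request


def count_request(base_url: str, index_pattern: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the document count of the matching indices."""
    url = f"{base_url.rstrip('/')}/{index_pattern}/_count"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def parse_count(response_body: str) -> int:
    """Extract the 'count' field from a _count response body."""
    return int(json.loads(response_body)["count"])


# Sending the request requires a reachable Elasticsearch endpoint:
#   req = count_request("https://localhost:9200", "azureeventhub-*", "elastic", "Password1$")
#   with urllib.request.urlopen(req) as resp:
#       print(parse_count(resp.read().decode()))
print(parse_count('{"count": 3, "_shards": {"total": 5, "successful": 5, "failed": 0}}'))
```

If the count stays at zero while the client app reports sent messages, the Logstash pipeline settings (connection strings, EntityPath, consumer group) are the first place to look.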

The Discover tab of Kibana will display the events ingested into Elasticsearch once you create an index pattern. For example, I have selected 'azureeventhub*' as the index pattern; this index naming convention was defined in the azureeventhubs.cfg pipeline as index => "azureeventhub-%{+YYYY.MM.dd}".

You can download the source code for this article from the GitHub repository.

The second part of this series is Azure Kubernetes Service (AKS): Azure AD SAML based Single Sign on to secure Elasticsearch and Kibana and securing communications in ELK