Centralizing Log Data with Elasticsearch using Filebeat and Logstash

by Thomas Memenga on 2021-03-19

In the world of system administration and development, efficiently managing log files is crucial. This blog post delves into the process of collecting log files from various hosts and sending them to Elasticsearch, using Filebeat and Logstash. We’ll explore different methods, configurations, and best practices to streamline your log management process.

Introduction to Filebeat and Logstash

Filebeat is a lightweight, open-source tool that specializes in forwarding and centralizing log data. It’s part of the Elastic Stack (formerly known as the ELK Stack), alongside Elasticsearch, Logstash, and Kibana. Filebeat is installed on each host whose logs you want to collect, and forwards them either directly to Elasticsearch or to Logstash for further processing.

Logstash is another integral component of the Elastic Stack, used for processing and transforming logs before they are sent to Elasticsearch. It can aggregate data from multiple sources, transform it, and then send it to a “stash” like Elasticsearch.

Setting Up Filebeat

Installation

Filebeat can be installed on various platforms. Here’s a quick guide for Debian-based systems:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update && sudo apt-get install filebeat

Configuration

After installation, configure Filebeat by editing the filebeat.yml file, typically located at /etc/filebeat/filebeat.yml. Here’s an example configuration:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log

output.logstash:
  hosts: ["localhost:5044"]

This configuration tells Filebeat to collect all files matching /var/log/*.log and forward the events to a Logstash instance listening on port 5044 on the same host.
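The input configuration can be extended with a few commonly useful options. The sketch below (the regex and the env label are illustrative, not part of the original setup) drops debug lines on the host and tags every event with custom metadata:

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
  # Skip lines we never want to ship (pattern is illustrative)
  exclude_lines: ['^DEBUG']
  # Attach custom metadata to every event from this host
  fields:
    env: staging          # hypothetical environment label
  fields_under_root: true # place "env" at the event's top level

output.logstash:
  hosts: ["localhost:5044"]
```

Filtering at the edge like this reduces network traffic and downstream processing before the data ever leaves the host.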

Setting Up Logstash

Installation

Logstash is available from the same Elastic APT repository added above. On Debian-based systems it can be installed as follows:

sudo apt-get install logstash

Configuration

Logstash configurations are stored in /etc/logstash/conf.d/. Create a configuration file, for example logstash.conf, with the following content:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

This configuration sets up Logstash to listen for incoming Beats connections on port 5044. It applies a grok filter with the COMBINEDAPACHELOG pattern to parse Apache-style access logs into structured fields, then sends the resulting events to Elasticsearch, deriving the index name from the Filebeat metadata.
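Grok parsing fails on lines that don’t match the pattern, and the grok filter tags such events with _grokparsefailure. A common defensive addition (a sketch, not part of the original pipeline; the failure index name is illustrative) is to route unparsed events to a separate index for inspection:

```
output {
  if "_grokparsefailure" in [tags] {
    # Unparsable lines land in their own index for later inspection
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "parse-failures-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    }
  }
}
```

This keeps the main index clean while preserving problem lines instead of silently indexing them unparsed.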

Filebeat vs. Filebeat + Logstash

When deciding whether to use just Filebeat or both Filebeat and Logstash in your log management pipeline with Elasticsearch, it’s important to understand the capabilities and limitations of each tool. Here’s a breakdown to help differentiate when to use each setup:

Using Just Filebeat

Filebeat is a lightweight, efficient log shipper that can directly send your log data to Elasticsearch. This setup is ideal in scenarios where:

Simple Log Forwarding: If your logs don’t require complex processing or transformation and are already in a format that Elasticsearch can easily index, using just Filebeat is sufficient.

Resource Constraints: In environments where resources are limited, such as on edge devices or in situations with minimal available computing power, Filebeat’s lightweight nature makes it a better choice.

Minimal Processing Needs: If your log data only needs basic processing, such as splitting or filtering, Filebeat can handle this with its built-in processors.

Ease of Setup and Maintenance: For smaller or simpler environments, using just Filebeat reduces complexity in setup and maintenance.
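To make the “minimal processing” case concrete, Filebeat’s built-in processors can drop noisy events and strip unneeded fields without involving Logstash at all. The following is a sketch; the match condition and field names are examples, not recommendations:

```yaml
processors:
  # Discard health-check noise before it leaves the host
  - drop_event:
      when:
        contains:
          message: "GET /healthz"   # hypothetical noise pattern
  # Remove fields that are never queried downstream
  - drop_fields:
      fields: ["agent.ephemeral_id"]
```

This block goes at the top level of filebeat.yml and applies to all events the inputs produce.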

Using Filebeat with Logstash

Incorporating Logstash into your pipeline is beneficial when:

Complex Log Processing: Logstash offers a wide range of input, filter, and output plugins. If your logs require complex processing, such as enriching, mutating, or reformatting data, Logstash is the tool for the job.

Multiple Data Sources: If you’re aggregating logs from various sources and these logs are in different formats, Logstash can normalize this data before it’s indexed in Elasticsearch.

Advanced Data Filtering: Logstash’s filtering capabilities are more advanced than Filebeat’s. It can perform deep inspection and transformation of log data, which is crucial for complex log analysis.

High-Volume Data: For environments generating a large volume of logs, Logstash can efficiently handle and process this data before sending it to Elasticsearch, thereby reducing the load on Elasticsearch nodes.

Data Buffering: Logstash can buffer log data (optionally on disk, via its persistent queue feature), which is beneficial in scenarios where Elasticsearch might be temporarily unavailable or overwhelmed. This helps prevent log data loss.

Integration with Other Systems: If you need to integrate with various external systems for data enrichment or forwarding logs to multiple destinations, Logstash provides the flexibility to do so.
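As an illustration of the last point, a Logstash output section can fan events out to several destinations at once (a sketch; the archive path is illustrative):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  # Keep a local, date-stamped archive alongside Elasticsearch
  file {
    path => "/var/log/logstash/archive-%{+YYYY-MM-dd}.log"
  }
}
```

Each output plugin receives every event that passes the filter stage, so adding a destination requires no changes to the shippers.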