Real-Time Log Analysis with Filebeat on CentOS: A Step-by-Step Implementation Guide
Filebeat is a lightweight, efficient log shipper designed to collect, parse, and forward log data from local or remote servers to centralized systems like Elasticsearch for storage and analysis. Its real-time capabilities stem from its ability to monitor log files for changes (e.g., new lines appended due to application activity) and immediately forward those updates. Below is a structured guide to setting up Filebeat on CentOS for real-time log analysis, covering installation, configuration, verification, and optimization.
Before installing Filebeat, ensure your CentOS system meets the following requirements:
- Elasticsearch and Kibana are installed and running. Check both services with:
sudo systemctl status elasticsearch
sudo systemctl status kibana
- You can run commands with sudo or as a user with root privileges.

Filebeat can be installed via the official Elastic YUM repository to ensure access to the latest versions. Follow these steps:
1. Import the Elastic GPG key:
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
2. Create the repository file /etc/yum.repos.d/elasticsearch.repo:
echo "[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md" | sudo tee -a /etc/yum.repos.d/elasticsearch.repo
3. Use yum to install the latest version of Filebeat:
sudo yum install filebeat -y
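Because tee -a appends, running the repository step twice leaves duplicate sections in the file. As a minimal sketch of an alternative, the function below overwrites the target file with the same 7.x repository definition using a here-document. The name write_elastic_repo is hypothetical and not part of any Elastic tooling; the target path is passed as an argument, and writing under /etc/yum.repos.d requires root.

```shell
# Sketch: write the Elastic 7.x repository definition to the given path,
# replacing any previous content so repeated runs do not duplicate it.
# write_elastic_repo is a hypothetical helper name.
write_elastic_repo() {
    local target="$1"
    cat > "$target" <<'EOF'
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
}

# Example (run as root):
#   write_elastic_repo /etc/yum.repos.d/elasticsearch.repo
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the repository definition.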
The core of Filebeat’s real-time functionality lies in its configuration file (/etc/filebeat/filebeat.yml). Below are key settings to enable and optimize real-time log collection:
Specify the log files or directories to monitor. For example, to monitor all .log files in /var/log/:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
    # Optional: ignore files not modified in the last 72 hours to reduce processing load
    ignore_older: 72h
You can monitor multiple directories or specific files by adding more entries to the paths list (e.g., - /opt/myapp/logs/*.log).
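Since ignore_older is evaluated against each file's modification time, it can help to preview which files a path and age limit would cover before editing filebeat.yml. The sketch below approximates that with GNU find. The helper name preview_inputs is hypothetical, it handles only a single directory with the *.log pattern, and it does not reproduce Filebeat's own glob handling exactly.

```shell
# Sketch: list *.log files directly inside a directory that were modified
# within the last N hours (an approximation of paths + ignore_older).
# preview_inputs is a hypothetical helper name.
preview_inputs() {
    local dir="$1" hours="$2"
    # -mmin -N matches files modified less than N minutes ago.
    find "$dir" -maxdepth 1 -type f -name '*.log' -mmin "-$((hours * 60))" | sort
}

# Example:
#   preview_inputs /var/log 72
```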
Adjust the following parameters in the filebeat.inputs section to enhance real-time performance:
- scan_frequency: Controls how often Filebeat scans for new or updated files (default: 10s). Reduce this to 5s for faster detection (e.g., scan_frequency: 5s).
- close_inactive: Closes the file handle if no new data has been read for the specified duration (default: 5m). A shorter interval (e.g., 1m) frees file handles sooner, but once a file is closed, new lines are only picked up at the next scan, so keep scan_frequency low when shortening it.
- tail_files: If set to true, Filebeat starts reading new files from the end (useful for avoiding old log entries). Default is false.

Example configuration with optimized real-time settings:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
    scan_frequency: 5s
    close_inactive: 1m
    tail_files: true
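Send collected logs to Elasticsearch for storage and indexing. Replace localhost:9200 with your Elasticsearch server’s address if it’s remote:
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "filebeat-%{+yyyy.MM.dd}"  # Daily indices for better manageability
Note that when index is customized, Filebeat also requires setup.template.name and setup.template.pattern to be set, and the custom name is ignored while index lifecycle management is enabled (setup.ilm.enabled).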
Processors modify log data before sending it to Elasticsearch. For example, the add_fields processor adds a custom field to categorize logs:
processors:
  - add_fields:
      target: ""  # Add fields to the root of the event
      fields:
        environment: "production"
        application: "myapp"
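Before starting the service, the edited configuration can be checked with Filebeat's own test subcommands. The sketch below wraps filebeat test config and filebeat test output in a hypothetical helper, check_filebeat. It assumes the filebeat binary is on PATH, that the caller may read the configuration file (typically root), and that the file lives at /etc/filebeat/filebeat.yml unless another path is given.

```shell
# Sketch: validate the Filebeat configuration, then check connectivity
# to the configured output. check_filebeat is a hypothetical helper name.
check_filebeat() {
    local config="${1:-/etc/filebeat/filebeat.yml}"
    # Validate the configuration file's structure and settings.
    filebeat test config -c "$config" || return 1
    # Try to connect to the configured output (Elasticsearch in this guide).
    filebeat test output -c "$config" || return 1
    echo "filebeat configuration and output look OK"
}

# Example (run as root):
#   check_filebeat
```

Running the config check first means a YAML indentation mistake is reported before any connection attempt is made.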
After configuring Filebeat, start the service and configure it to launch at boot:
sudo systemctl start filebeat
sudo systemctl enable filebeat
Verify Filebeat’s status to ensure it’s running without errors:
sudo systemctl status filebeat
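systemctl status prints a human-readable report; in a script it is often easier to poll systemctl is-active, which exits with status 0 only when the unit is active. A minimal sketch follows, with wait_for_active as a hypothetical helper name and the default of 10 checks at one-second intervals chosen arbitrarily.

```shell
# Sketch: poll a systemd unit until it reports active, or give up.
# wait_for_active is a hypothetical helper name.
wait_for_active() {
    local unit="$1" tries="${2:-10}" i=0
    while [ "$i" -lt "$tries" ]; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit is active"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "$unit did not become active after $tries checks" >&2
    return 1
}

# Example:
#   wait_for_active filebeat 15
```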
Check that Filebeat is successfully sending logs to Elasticsearch:
1. List the indices and look for a Filebeat index (e.g., filebeat-2025.09.20):
curl -X GET "localhost:9200/_cat/indices?v"
2. Use the _search API to retrieve the latest logs. For example, to get logs from the last 5 minutes:
curl -X GET "localhost:9200/filebeat-*/_search" -H 'Content-Type: application/json' -d '
{
  "query": { "range": { "@timestamp": { "gte": "now-5m/m", "lte": "now/m" } } },
  "size": 10
}'
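When the time window changes often, the query body can be generated instead of edited by hand. The sketch below builds the same range query for a configurable number of minutes and result size. The helper name recent_logs_query is hypothetical, and the descending sort on @timestamp is an addition that is not in the query above; it is there so the newest events come first.

```shell
# Sketch: print a _search request body covering the last N minutes.
# recent_logs_query is a hypothetical helper name.
recent_logs_query() {
    local minutes="${1:-5}" size="${2:-10}"
    cat <<EOF
{
  "query": {
    "range": {
      "@timestamp": { "gte": "now-${minutes}m/m", "lte": "now/m" }
    }
  },
  "sort": [ { "@timestamp": "desc" } ],
  "size": ${size}
}
EOF
}

# Example:
#   curl -s -X GET "localhost:9200/filebeat-*/_search" \
#        -H 'Content-Type: application/json' \
#        -d "$(recent_logs_query 15 20)"
```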
Kibana provides a user-friendly interface for real-time log analysis. Follow these steps to set it up:
1. Open Kibana in a browser (http://<server-ip>:5601) and navigate to Stack Management > Index Patterns. Click “Create index pattern”, enter filebeat-*, and select @timestamp as the time field.
2. Open Discover, select the filebeat-* index pattern, and you’ll see real-time logs streaming in. Use filters (e.g., level: ERROR) to narrow down results.

For production environments, consider these advanced configurations to improve reliability and performance:
- Log rotation: Filebeat continues to follow rotated files (e.g., myapp.log.1), but you can configure close_removed (close files when deleted) and close_renamed (close files when renamed) to release their handles sooner. Lines not yet read when a file is closed this way can be lost.
- Batch size: Adjust the bulk_max_size parameter (default: 50) in the output.elasticsearch section to control how many events are sent in each batch (higher values improve throughput but increase memory usage).

By following these steps, you can configure Filebeat on CentOS to achieve real-time log analysis, enabling you to quickly identify and respond to issues in your applications and infrastructure.