Kafka Monitoring and Kafka Topic Configuration. Presented By: Neelam, Software Consultant, Knoldus Inc.
About Knoldus Knoldus is a technology consulting firm with a focus on modernizing digital systems at the pace your business demands. Functional. Reactive. Cloud Native. DevOps.
Our Agenda
01 Introduction to Kafka Monitoring
02 Why to Monitor Kafka
03 Important Metrics to Focus on First
04 Kafka Topic Introduction
05 Default Kafka Topic Configuration
06 Modify Topic Configuration with a Demo
Introduction to Kafka Monitoring Apache Kafka deals with transferring large amounts of real-time data (we can call it data in motion). Monitoring assures end-to-end visibility of the stream: that every message is delivered from producer to consumer, and how long messages take to be delivered, which also helps locate the source of issues in your cluster. We can monitor Kafka with the help of metrics. While monitoring Kafka, it's important to also monitor ZooKeeper, as Kafka depends on it.
Why to Monitor Kafka Kafka monitoring is important to ensure timeliness of data delivery and overall application performance, to know when to scale up, to spot connectivity issues, and to ensure data is not lost while dealing with streaming data. The volume of data is large, and several components are involved in a Kafka cluster: producers, consumers, and brokers. Monitoring ensures every component is working fine.
Important Metrics to Focus On
01 Network Request Rate: Since the goal of Kafka brokers is to gather and move data for processing, they can also be sources of high network traffic. Monitor and compare the network throughput per server, if possible by tracking the number of network requests per second. kafka.network: type=RequestMetrics, name=RequestsPerSec
02 Network Error Rate: Cross-referencing network throughput with related network error rates can help diagnose the reasons for latency. Error conditions include dropped network packets, error rates in responses per request type, and the types of error(s) occurring. kafka.network: type=RequestMetrics, name=ErrorsPerSec
03 Under-Replicated Partitions: To ensure data durability and that brokers are always available to deliver data, you can set a replication factor per topic as applicable. This metric alerts you to cases where partitions have fewer in-sync replicas than the configured replication factor. kafka.server: type=ReplicaManager, name=UnderReplicatedPartitions
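These broker metrics are exposed over JMX. As a hedged sketch (it assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999, and uses the JmxTool class bundled with Kafka), the under-replicated partitions count can be sampled from the command line:
# assumes JMX_PORT=9999 was set when starting the broker
./kafka-run-class.sh kafka.tools.JmxTool --object-name 'kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions' --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi --reporting-interval 5000
A steady value of 0 is the healthy state; any non-zero value is worth alerting on.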
Important Metrics to Focus On
04 Total Broker Partitions: Simply knowing how many partitions a broker is managing can help you avoid errors and know when it's time to scale out. The goal should be to keep the count balanced across brokers. kafka.server: type=ReplicaManager, name=PartitionCount, the number of partitions on the broker.
05 Log Flush Latency: Kafka stores data by appending to existing log files. Cache-based writes are flushed to physical storage. Your monitoring strategy should include a combination of data replication and latency in the asynchronous disk log flush time. kafka.log: type=LogFlushStats, name=LogFlushRateAndTimeMs
06 Consumer Message Rate: Set baselines for expected consumer message throughput and measure fluctuations in the rate to detect latency and the need to scale the number of consumers up or down accordingly. kafka.consumer: type=ConsumerTopicMetrics, name=MessagesPerSec, clientId=([-.\w]+), messages consumed per second.
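As a rough, hedged way to eyeball partition balance without a JMX client (this only counts partition leadership per broker, not total replicas, and the describe output format can vary slightly between Kafka versions), you can aggregate the output of kafka-topics.sh:
# counts how many partition leaders each broker id currently holds
./kafka-topics.sh --zookeeper localhost:2181 --describe | grep -o 'Leader: [0-9]*' | sort | uniq -c
Strongly skewed counts suggest it is time to rebalance leadership or spread partitions more evenly across brokers.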
Important Metrics to Focus On
07 Consumer Max Lag: Even with consumers fetching messages at a high rate, producers can still outpace them. This metric works at the level of consumer and partition, meaning each partition in each topic has its own lag for a given consumer. kafka.consumer: type=ConsumerFetcherManager, name=MaxLag, clientId=([-.\w]+), the number of messages by which the consumer lags behind the producer.
08 Fetcher Lag: This metric indicates the lag in the number of messages per follower replica, a sign that replication has potentially stopped or been interrupted. The replica.lag.time.max.ms configuration parameter sets the maximum time a replica may go without fetching new data from the leader before it is considered out of sync. kafka.server: type=FetcherLagMetrics, name=ConsumerLag, clientId=([-.\w]+), partition=([0-9]+)
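Besides JMX, consumer lag can also be checked per group and partition with the consumer-groups tool shipped with Kafka. A hedged example (the group name invitation-consumers is hypothetical, and the broker is assumed to be listening on localhost:9092):
# the LAG column shows LOG-END-OFFSET minus CURRENT-OFFSET for every partition
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group invitation-consumers
A lag that keeps growing over time means the consumers cannot keep up and may need to be scaled out.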
Important Metrics to Focus On
09 Offline Partition Count: Offline partitions represent data stores unavailable to your application due to a server failure or restart. In a Kafka cluster, one of the brokers acts as the controller, managing the states of partitions and replicas and reassigning partitions when needed. kafka.controller: type=KafkaController, name=OfflinePartitionCount, the number of partitions without an active leader.
10 Free Memory and Swap Space Usage: Kafka performance is best when swapping is kept to a minimum. To do this, set the JVM max heap size large enough to avoid frequent garbage collection activity, but small enough to leave room for filesystem caching. Additionally, if you have swap enabled, watch for increases in server swapping activity, as this can lead to Kafka operations timing out. In many cases it's best to turn off swap entirely; if you do, adjust your monitoring accordingly.
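To see exactly which partitions are affected when OfflinePartitionCount is non-zero, a hedged option (assuming a ZooKeeper-based deployment reachable at localhost:2181, matching the commands used later in this deck) is the describe filter of kafka-topics.sh:
# lists only the partitions that currently have no available leader
./kafka-topics.sh --zookeeper localhost:2181 --describe --unavailable-partitions
The related --under-replicated-partitions flag narrows the output to partitions that are missing in-sync replicas.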
Introduction to Kafka Topic We can say that a Kafka topic is the same concept as a table in a database. But it's definitely not a table, and Kafka isn't a database. A topic is where data (messages) gets published by the producer and pulled from by a consumer.
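As a small, hedged illustration (the topic name sendInvitation is the one used on the later configuration slides; the ZooKeeper address is assumed to be the local default), a topic can be created from the command line like this:
# creates a topic with the defaults discussed on the next slides: 1 partition, replication factor 1
./kafka-topics.sh --zookeeper localhost:2181 --create --topic sendInvitation --partitions 1 --replication-factor 1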
Kafka Topic
Kafka Topic Configuration
By default, a Kafka topic is created with a replication factor of 1 and 1 partition.
Modify Kafka topic configuration at runtime:
1) For changing the number of partitions of a topic, use --alter:
./kafka-topics.sh --zookeeper localhost:2181 --alter --topic sendInvitation --partitions 3
2) For changing the replication factor of a topic, add a JSON file with the content provided in the next slide...
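To confirm that the change from step 1 took effect, you can describe the topic (note that Kafka only allows increasing the partition count, never decreasing it):
# should now report PartitionCount: 3 for the topic
./kafka-topics.sh --zookeeper localhost:2181 --describe --topic sendInvitation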
Kafka Topic Configuration
2) For changing the replication factor of a topic, add a JSON file with the content provided below. Assume the file name is increase-replication-factor.json:
{"version":1,
 "partitions":[
  {"topic":"sendInvitation","partition":0,"replicas":[0,1,2]},
  {"topic":"sendInvitation","partition":1,"replicas":[0,1,2]},
  {"topic":"sendInvitation","partition":2,"replicas":[0,1,2]},
  {"topic":"xyz","partition":0,"replicas":[0,1,2]},
  {"topic":"xyz","partition":1,"replicas":[0,1,2]}
 ]}
Then execute the following command to apply the reassignment:
./kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --execute
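The same tool can report when the reassignment has finished. A brief hedged follow-up (same ZooKeeper address and JSON file as above):
# prints whether the reassignment of each partition is still in progress or has completed
./kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --verify
Once it reports completion, kafka-topics.sh --describe should show three replicas for every listed partition.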
References: ● https://kafka.apache.org/documentation/ ● https://sematext.com/blog/kafka-metrics-to-monitor
Contact me at: neelam@knoldus.com Thank You!
