Events #
Flink exposes an event reporting system that allows gathering events and exposing them to external systems.
Reporting events #
You can access the event system from any user function that extends RichFunction by calling getRuntimeContext().getMetricGroup(). This method returns a MetricGroup object through which you can report individual events.
Reporting a single Event #
An Event represents something that happened in Flink at a certain point in time and that will be reported to an EventReporter. To report an Event, use the MetricGroup#addEvent(EventBuilder) method.
```java
public class MyClass {
    void doSomething() {
        // (...)
        metricGroup.addEvent(
                Event.builder(MyClass.class, "SomeEvent")
                        .setObservedTsMillis(observedTs) // Optional
                        .setAttribute("foo", "bar")); // Optional
    }
}
```
Currently, reporting Events from Python is not supported.
Reporter #
For information on how to set up Flink’s event reporters please take a look at the event reporters documentation.
System events #
Flink reports the events listed below.
The table below has 5 columns:
- The “Scope” column describes the scope under which the event is reported.
- The “Name” column gives the name of the reported event.
- The “Severity” column gives the severity level of the reported event.
- The “Attributes” column lists the names of all attributes that are reported with the given event.
- The “Description” column explains what a given attribute is reporting.
| Scope | Name | Severity | Attributes | Description |
|---|---|---|---|---|
| org.apache.flink.runtime.checkpoint.CheckpointStatsTracker | CheckpointEvent | INFO | observedTs | Timestamp when the checkpoint has finished. |
| | | | checkpointId | Id of the checkpoint. |
| | | | checkpointedSize | Size in bytes of the state checkpointed during this checkpoint. Might be smaller than fullSize if incremental checkpoints are used. |
| | | | fullSize | Full size in bytes of the state referenced by this checkpoint. Might be larger than checkpointedSize if incremental checkpoints are used. |
| | | | checkpointStatus | Final state of this checkpoint: FAILED or COMPLETED. |
| | | | checkpointType | Type of the checkpoint. For example: "Checkpoint", "Full Checkpoint" or "Terminate Savepoint" ... |
| | | | isUnaligned | Whether the checkpoint was aligned or unaligned. |
| org.apache.flink.runtime.jobmaster.JobMaster | JobStatusChangeEvent | INFO | observedTs | Timestamp when the job's status has changed. |
| | | | newJobStatus | New job status that is being reported by this event. |
| org.apache.flink.runtime.scheduler.adaptive.JobFailureMetricReporter | JobFailureEvent | INFO | observedTs | Timestamp when the job has failed. |
| | | | canRestart | (optional) Whether the job can restart after this failure, i.e. whether the failure is not terminal. |
| | | | isGlobalFailure | (optional) Whether the failure is global. Global failover requires all tasks to fail over. |
| | | | failureLabel.KEY | (optional) For every failure label attached to this failure with a given KEY, the value of that label is attached as an attribute value. |
| org.apache.flink.runtime.scheduler.metrics.AllSubTasksRunningOrFinishedStateTimeMetrics | AllSubtasksStatusChangeEvent (streaming jobs only) | INFO | observedTs | Timestamp when all subtasks reached the given status. |
| | | | status | ALL_RUNNING_OR_FINISHED means all subtasks are RUNNING or have already FINISHED. NOT_ALL_RUNNING_OR_FINISHED means at least one subtask has switched away from RUNNING or FINISHED after ALL_RUNNING_OR_FINISHED was previously reported. |
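
As a rough illustration of how the attributes in the table above might be consumed downstream, the sketch below works on a plain `Map<String, String>` of attribute names to values. It assumes, hypothetically, that a reporter hands the attributes over in that form, with numeric values encoded as decimal strings. The class `EventAttributeExamples` and its methods are made up for this example and are not part of Flink.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Flink. It only assumes that event
// attributes are available as a name -> value map of strings.
public class EventAttributeExamples {

    /**
     * Returns checkpointedSize / fullSize for a CheckpointEvent.
     * Values below 1.0 indicate that only part of the referenced state was
     * written, as can happen with incremental checkpoints.
     * Returns 0.0 when fullSize is 0 to avoid dividing by zero.
     */
    public static double persistedFraction(Map<String, String> checkpointAttrs) {
        long checkpointed = Long.parseLong(checkpointAttrs.get("checkpointedSize"));
        long full = Long.parseLong(checkpointAttrs.get("fullSize"));
        return full == 0 ? 0.0 : (double) checkpointed / full;
    }

    /**
     * Collects all "failureLabel.KEY" attributes of a JobFailureEvent
     * into a sorted KEY -> value map, dropping the common prefix.
     */
    public static Map<String, String> failureLabels(Map<String, String> failureAttrs) {
        String prefix = "failureLabel.";
        Map<String, String> labels = new TreeMap<>();
        for (Map.Entry<String, String> entry : failureAttrs.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                labels.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Example attribute values are invented for illustration.
        Map<String, String> checkpoint = Map.of(
                "checkpointId", "42",
                "checkpointedSize", "256",
                "fullSize", "1024",
                "checkpointStatus", "COMPLETED");
        System.out.println(persistedFraction(checkpoint)); // prints 0.25

        Map<String, String> failure = Map.of(
                "canRestart", "true",
                "failureLabel.type", "USER");
        System.out.println(failureLabels(failure)); // prints {type=USER}
    }
}
```

Working on a plain map keeps the sketch independent of any particular reporter implementation; a real reporter would read the same attribute names from the events it receives.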