
Conversation

@kush-elastic
Contributor

@kush-elastic kush-elastic commented Jul 8, 2024

Description

  • PostgreSQL supports several methods for logging server messages, including stderr, csvlog, jsonlog, and syslog. On Windows, eventlog is also supported. Set this parameter to a list of desired log destinations separated by commas. The default is to log to stderr only. This parameter can only be set in the postgresql.conf file or on the server command line.
    cf. https://www.postgresql.org/docs/current/runtime-config-logging.html

  • Raw message example:

[3192491]: [50743-1] db=,user=,app=,client= WARNING: archiving write-ahead log file "00000001000018920000000D" failed too many times, will try again later 
2024-08-11 00:00:00 +03 [4124669]: [2-1] user=ejar_root,db=ejar3_entities_prod,app=[unknown],client=127.0.0.1 LOG: connection authorized: user=ejar_root database=ejar3_entities_prod SSL enabled (protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256, compression=off)
2024-08-11 00:00:00 +03 [4113464]: [3-1] user=ejar_root,db=ejar3_entities_prod,app=/usr/local/bundle/bin/puma,client=127.0.0.1 LOG: disconnection: session time: 0:13:57.268 user=ejar_root database=ejar3_entities_prod host=127.0.0.1 port=55684
2024-08-11 00:00:00 +03 [4116062]: [3-1] user=ejar_root,db=ejar3_notifications_prod,app=/usr/local/bundle/bin/puma,client=127.0.0.1 LOG: disconnection: session time: 0:10:49.700 user=ejar_root database=ejar3_notifications_prod host=127.0.0.1 port=54728
2024-08-11 00:00:00 +03 [4124673]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=127.0.0.1 LOG: connection received: host=127.0.0.1 port=25012
  • Added system tests for the log data stream. Found that the input.type, log.flags, and log.offset fields were not defined in the agent.yml file, so added them to agent.yml.
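For reference, the log destinations described in the first bullet are set in postgresql.conf roughly as follows. This is an illustrative fragment, not taken from this PR; the log_line_prefix shown is only an approximation of the prefix that would produce lines like the raw sample above.

```ini
# postgresql.conf (illustrative values)
log_destination = 'stderr,csvlog'   # comma-separated list; default is 'stderr'
logging_collector = on              # required to capture csvlog/jsonlog output
# A prefix along these lines yields "db=...,user=...,app=...,client=..." log lines:
log_line_prefix = '%m [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
```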

Related Issues

@kush-elastic kush-elastic requested a review from a team as a code owner July 8, 2024 13:15
@kush-elastic kush-elastic changed the title fix Grok pattern Failure [PostgreSQL] Grok pattern Failure Jul 8, 2024
@kush-elastic kush-elastic self-assigned this Jul 8, 2024
@kush-elastic kush-elastic added Integration:postgresql PostgreSQL Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] labels Jul 8, 2024
@elasticmachine

elasticmachine commented Jul 8, 2024

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@harnish-crest-data harnish-crest-data self-requested a review July 9, 2024 05:41
Contributor

@harnish-crest-data harnish-crest-data left a comment


LGTM! 🚀

Please resolve merge conflicts!

@shmsr shmsr added the enhancement New feature or request label Jul 24, 2024
@kush-elastic
Contributor Author

/test

@kush-elastic
Contributor Author

/test

@shmsr shmsr changed the title [PostgreSQL] Grok pattern Failure [PostgreSQL] Enhance grok pattern Jul 29, 2024
ignore_missing: true
patterns:
- '^(\[%{NUMBER:process.pid:long}(-%{BASE16FLOAT:postgresql.log.session_line_number:long})?\] ((\[%{USERNAME:user.name}\]@\[%{POSTGRESQL_DB_NAME:postgresql.log.database}\]|%{USERNAME:user.name}@%{POSTGRESQL_DB_NAME:postgresql.log.database}) )?)?%{WORD:log.level}: (?:%{POSTGRESQL_ERROR:postgresql.log.sql_state_code}|%{SPACE})(duration: %{NUMBER:temp.duration:float} ms %{POSTGRESQL_QUERY_STEP}: %{GREEDYDATA:postgresql.log.query}|: %{GREEDYDATA:message}|%{GREEDYDATA:message})'
- '^(\[%{NUMBER:process.pid:long}(-%{BASE16FLOAT:postgresql.log.session_line_number:long})?\]:? ?)?(\[%{NUMBER:postgresql.log.session_line_number:long}(-%{BASE16FLOAT:postgresql.log.sequence_number:long})?\] )?((\[%{USERNAME:user.name}\]@\[%{POSTGRESQL_DB_NAME:postgresql.log.database}\]|%{USERNAME:user.name}@%{POSTGRESQL_DB_NAME:postgresql.log.database} )?)?((db=((\[%{DATA:postgresql.log.database}\])|%{DATA:postgresql.log.database})?,user=((\[%{DATA:user.name}\])|%{DATA:user.name})?,app=((\[%{DATA:postgresql.log.application_name}\])|%{DATA:postgresql.log.application_name})?,client=((\[%{DATA:postgresql.log.client_addr}\])|%{DATA:postgresql.log.client_addr})?)? ?)?%{WORD:log.level}: (?:%{POSTGRESQL_ERROR:postgresql.log.sql_state_code}|%{SPACE})(duration: %{NUMBER:temp.duration:float} ms %{POSTGRESQL_QUERY_STEP}: %{GREEDYDATA:postgresql.log.query}|: %{GREEDYDATA:message}|%{GREEDYDATA:message})'
Member


Curious why we are not considering the timestamp? Any idea? As simple as this: ^%{TIMESTAMP_ISO8601:timestamp} %{TZ:timezone}

Contributor Author


It is already extracted in the default pipeline; here we extract more details from the remaining raw message.

Co-authored-by: subham sarkar <sarkar.subhams2@gmail.com>
@andrewkroh andrewkroh added bugfix Pull request that fixes a bug issue and removed enhancement New feature or request labels Aug 20, 2024
@shmsr
Member

shmsr commented Sep 3, 2024

AFAIK, we do not pack support for custom log formats in our integration packages, right?

# newer versions go on top
- version: "1.24.0"
changes:
- description: Fix log datastream grok pattern to support standard logs.
Member


Along with standard, aren't you also handling the custom formats like the one mentioned in the SDH?

Contributor


Yes we are!

harnish-crest-data and others added 4 commits September 3, 2024 12:12
Co-authored-by: subham sarkar <sarkar.subhams2@gmail.com>
Co-authored-by: subham sarkar <sarkar.subhams2@gmail.com>
Co-authored-by: subham sarkar <sarkar.subhams2@gmail.com>
Co-authored-by: subham sarkar <sarkar.subhams2@gmail.com>
@harnish-crest-data
Contributor

harnish-crest-data commented Sep 3, 2024

AFAIK, we do not pack support for custom log formats in our integration packages, right?

Yes, we do not. But there are 3 new fields we have seen in the client logs, appearing in two different SDHs, that seem crucial. We also saw new timezones beyond the defaults like UTC/IST, and we have added support for those (CEST, EST, +02:00 and +02). Refer. The changes we have made do not affect existing behavior; you can refer to the generated pipeline tests.

@shmsr
Member

shmsr commented Sep 3, 2024

AFAIK, we do not pack support for custom log formats in our integration packages, right?

Yes, we do not. But there are 3 new fields we have seen in the client logs, appearing in two different SDHs, that seem crucial. We also saw new timezones beyond the defaults like UTC/IST, and we have added support for those (CEST, EST, +02:00 and +02). Refer. The changes we have made do not affect existing behavior; you can refer to the generated pipeline tests.

For defaults, UTC is fine. But IST or any other shouldn't be considered a default. Also, if I see correctly there are more such time zones.

See how the "date" processor is considering the timezone: https://github.com/elastic/elasticsearch/blob/2871b94603655ff3b6120788728bfbf7c009d27f/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/DateProcessor.java#L96
Also, see this doc.

So, I was playing with the script and made some changes, and noticed that some results in the pipeline tests are incorrect. I'll share the patch with you here; please confirm:

```diff
diff --git a/packages/postgresql/data_stream/log/elasticsearch/ingest_pipeline/default.yml b/packages/postgresql/data_stream/log/elasticsearch/ingest_pipeline/default.yml
index 523134dfe9..23447adaad 100644
--- a/packages/postgresql/data_stream/log/elasticsearch/ingest_pipeline/default.yml
+++ b/packages/postgresql/data_stream/log/elasticsearch/ingest_pipeline/default.yml
@@ -33,43 +33,53 @@ processors:
         UTC as before.
       if: ctx._temp_?.timestamp != null
       source: |-
-        String get_timezone(def ctx) { 
+        String get_timezone(def ctx) {
           if (ctx.event?.timezone != null) {
- 
-            // if timezone is in format of CEST,EST etc.
-            if (ctx.event.timezone.length() <= 4) {
-              // all time zone abbreviations need to be uppercase
-              return ctx.event.timezone.toUpperCase();
+            String tz = ctx.event.timezone.trim().toUpperCase();
+
+            // Handle abbreviations (e.g., CEST, EST)
+            if (tz.length() <= 4) {
+              return tz;
             }
-            // if timezone is in format of +00:00, -00:00, +00 & -00
-            if(ctx.event?.timezone.contains('+') || ctx.event?.timezone.contains('-')) {
-              // if timezone is in format of +00 or -00
-              if (ctx.event.timezone.length() == 3) {
-                return "UTC"+ctx.event.timezone+":00";
-              }
-              return "UTC"+ctx.event.timezone;
+            // Handle offset formats (+00:00, -00:00, +00, -00)
+            if (tz.startsWith('+') || tz.startsWith('-')) {
+              return tz.length() == 3 ? "UTC" + tz + ":00" : "UTC" + tz;
             }
- 
-            return ctx.event.timezone;
+
+            return tz;
           }
- 
+
           ctx.event.timezone = 'UTC';
           return 'UTC';
         }
- 
+
         def event_timezone = get_timezone(ctx);
-        if (!(event_timezone.contains('+')) && !(event_timezone.contains('-')) && !(event_timezone.length() > 4)) {
-          // timezone abbreviation e.g. CEST need to be put inside the timestamp
-          SimpleDateFormat sdf = new SimpleDateFormat("z");
-          sdf.parse(event_timezone);
-          ctx._temp_.date_timezone = ZoneId.of(sdf.getTimeZone().getID(), ZoneId.SHORT_IDS).getId();
-          ctx?._temp_.timestamp = ctx?._temp_.timestamp + " " + event_timezone;
+        boolean isAbbreviation = !(event_timezone.contains('+') || event_timezone.contains('-') || event_timezone.length() > 4);
+
+        if (isAbbreviation) {
+          Map CUSTOM_ZONE_IDS = new HashMap(ZoneId.SHORT_IDS);
+          CUSTOM_ZONE_IDS.put("CEST", "Europe/Paris");
+          CUSTOM_ZONE_IDS.put("CET", "Europe/Paris");
+          CUSTOM_ZONE_IDS.put("BST", "Europe/London");
+          CUSTOM_ZONE_IDS.put("IST", "Asia/Kolkata");
+          CUSTOM_ZONE_IDS.put("EEST", "Europe/Athens");
+          CUSTOM_ZONE_IDS.put("EET", "Europe/Athens");
+          CUSTOM_ZONE_IDS.put("AEST", "Australia/Sydney");
+          CUSTOM_ZONE_IDS.put("AEDT", "Australia/Sydney");
+          CUSTOM_ZONE_IDS.put("ACST", "Australia/Adelaide");
+          CUSTOM_ZONE_IDS.put("ACDT", "Australia/Adelaide");
+          CUSTOM_ZONE_IDS.put("AWST", "Australia/Perth");
+
+          // Timezone abbreviation (e.g., CEST) needs to be put inside the timestamp
+          ZoneId zoneId = ZoneId.of(event_timezone, CUSTOM_ZONE_IDS);
+          ctx._temp_.date_timezone = zoneId.getId();
+          ctx?._temp_.timestamp += " " + event_timezone;
         } else {
-          // timezone is either abbreviation+-offset e.g. UTC+1 or long representation
-          // e.g. Europe/Athens needs to be put as a ZoneId and *not* inside the timestamp
+          // Timezone is either abbreviation+-offset (e.g., UTC+1) or long representation (e.g., Europe/Athens)
           ctx._temp_.date_timezone = event_timezone;
-        }  
+        }
+
   - pipeline:
       name: '{{ IngestPipeline "pipeline-log" }}'
       if: ctx.separator != ',' && ctx.separator != ':'
@@ -86,8 +96,8 @@ processors:
       formats:
         - yyyy-MM-dd HH:mm:ss.SSS zz
         - yyyy-MM-dd HH:mm:ss zz
-        - yyyy-MM-dd HH:mm:ss.SSS 
-        - yyyy-MM-dd HH:mm:ss 
+        - yyyy-MM-dd HH:mm:ss.SSS
+        - yyyy-MM-dd HH:mm:ss
       on_failure:
         - set:
             field: error.message
diff --git a/packages/postgresql/data_stream/log/elasticsearch/ingest_pipeline/pipeline-log.yml b/packages/postgresql/data_stream/log/elasticsearch/ingest_pipeline/pipeline-log.yml
index 7e697a59cc..51693ff1e3 100644
--- a/packages/postgresql/data_stream/log/elasticsearch/ingest_pipeline/pipeline-log.yml
+++ b/packages/postgresql/data_stream/log/elasticsearch/ingest_pipeline/pipeline-log.yml
@@ -16,11 +16,11 @@ processors:
   - kv:
       tag: kv_log_database_info
       field: _temp_.database_connection_str
-      target_field: _temp_.database_connection_obj 
+      target_field: _temp_.database_connection_obj
       field_split: ','
       value_split: '='
       ignore_missing: true
-      strip_brackets: true 
+      strip_brackets: true
      description: Splits PostgreSQL log info string into key-value pairs.
      ignore_failure: true
      on_failure:
@@ -32,16 +32,16 @@ processors:
   - rename:
       field: _temp_.database_connection_obj.app
       target_field: postgresql.log.application_name
-      ignore_missing: true 
+      ignore_missing: true
   - rename:
       field: _temp_.database_connection_obj.client
       target_field: postgresql.log.client_addr
-      ignore_missing: true 
+      ignore_missing: true
   - rename:
       field: _temp_.database_connection_obj.db
       target_field: postgresql.log.database
-      ignore_missing: true 
+      ignore_missing: true
   - rename:
       field: _temp_.database_connection_obj.user
       target_field: user.name
-      ignore_missing: true 
\ No newline at end of file
+      ignore_missing: true
\ No newline at end of file
```

This is a diff patch. You can directly apply this using git commands.

Example failures:

Search for @timestamp": "2017-07-31T13:36:43.000-04:00 in postgresql/log test-postgresql-9-6-new-timestamp.log
and if you notice the timezone is "EST" for which TZ offset is -05:00 but in the timestamp it is -04:00.

Also, had to introduce a custom map because https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#SHORT_IDS only supports a few; not all.
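The SHORT_IDS limitation is easy to reproduce in plain Java. This is an illustrative sketch, not code from the PR; the "CEST" -> "Europe/Paris" entry mirrors the CUSTOM_ZONE_IDS map in the patch above:

```java
import java.time.DateTimeException;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

public class ZoneAliasDemo {
    public static void main(String[] args) {
        // ZoneId.SHORT_IDS knows only a handful of legacy abbreviations,
        // so "CEST" cannot be resolved with it alone
        try {
            ZoneId.of("CEST", ZoneId.SHORT_IDS);
        } catch (DateTimeException e) {
            System.out.println("CEST is not in SHORT_IDS: " + e.getMessage());
        }

        // Extending the alias map, as the patch does, resolves it
        Map<String, String> custom = new HashMap<>(ZoneId.SHORT_IDS);
        custom.put("CEST", "Europe/Paris");
        System.out.println(ZoneId.of("CEST", custom)); // Europe/Paris
    }
}
```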

@shmsr
Member

shmsr commented Sep 3, 2024

Except for my previous comment, the rest of the changes look good.

@harnish-crest-data
Contributor

@shmsr Thanks for the suggestion. Based on your provided changes I have implemented the pipeline tests; here are the results of the current pipeline behavior vs. the suggested behavior, and please observe the @timestamp field. Technically both give the same output, and the pipeline tests were correct under the current behavior. Let me do one round of live data collection with the suggested changes and I will push them after that!

Current behaviour

{
  "@timestamp": "2017-07-31T13:36:43.557-04:00",
  "_temp_": {
    "date_timezone": "America/New_York",
    "timestamp": "2017-07-31 13:36:43.557 EST"
  }
}

Suggested behavior

{
  "@timestamp": "2017-07-31T12:36:43.557-05:00",
  "_temp_": {
    "date_timezone": "-05:00",
    "timestamp": "2017-07-31 13:36:43.557 EST"
  }
}
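The offset difference between the two behaviors comes from how "EST" resolves: SHORT_IDS pins it to a fixed -05:00 offset with no DST, while resolving it to America/New_York applies daylight saving (-04:00) for a July date. A minimal Java sketch, illustrative only and not part of the PR:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class EstOffsetDemo {
    public static void main(String[] args) {
        LocalDateTime july = LocalDateTime.of(2017, 7, 31, 13, 36, 43);

        // Suggested behavior: SHORT_IDS maps "EST" to the fixed offset -05:00 (never DST)
        ZoneId est = ZoneId.of("EST", ZoneId.SHORT_IDS);
        System.out.println(est.getRules().getOffset(july)); // -05:00

        // Current behavior: resolving "EST" to America/New_York applies DST in July
        ZoneId ny = ZoneId.of("America/New_York");
        System.out.println(ny.getRules().getOffset(july)); // -04:00
    }
}
```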
ignore_missing: true
- rename:
field: _temp_.database_connection_obj.client
target_field: postgresql.log.client_addr
Contributor


Could this be converted to ip type? Also would it make sense here to remove/cleanup _temp_ after you have finished renaming these fields?

Contributor


Could this be converted to ip type?

There are values like local for this field, as well as IP values. I have updated the pipeline so that IP values are set to the related.ip field and keyword values to the related.host field.

Also would it make sense here to remove/cleanup _temp_ after you have finished renaming these fields?

Yes, after renaming the fields, some fields still remained in _temp_ that needed cleanup. Hence removed _temp_.

@elasticmachine

💚 Build Succeeded

History

cc @kush-elastic

@shmsr
Member

shmsr commented Sep 16, 2024

#10412 (comment)

here are the results of the current pipeline behavior vs. the suggested behavior, and please observe the @timestamp field. Technically both give the same output, and the pipeline tests were correct under the current behavior.

Didn't understand this.

Example failures:

Search for @timestamp": "2017-07-31T13:36:43.000-04:00 in postgresql/log test-postgresql-9-6-new-timestamp.log
and if you notice the timezone is "EST" for which TZ offset is -05:00 but in the timestamp it is -04:00.

Did you check this ^^? Previously, EST's TZ offset was shown as -05:00 but if you search EST is actually -04:00

Also, let me take a look at the changes again. I have lost some context over 2 weeks. And sorry for missing out the re-review; didn't get the notification.

@harnish-crest-data
Contributor

#10412 (comment)

here are the results of the current pipeline behavior vs. the suggested behavior, and please observe the @timestamp field. Technically both give the same output, and the pipeline tests were correct under the current behavior.

Didn't understand this.

Example failures:

Search for @timestamp": "2017-07-31T13:36:43.000-04:00 in postgresql/log test-postgresql-9-6-new-timestamp.log
and if you notice the timezone is "EST" for which TZ offset is -05:00 but in the timestamp it is -04:00.

Did you check this ^^? Previously, EST's TZ offset was shown as -05:00 but if you search EST is actually -04:00

Yes @shmsr, I have checked your previous comment, and as shown in the above comment, both timestamps are logically the same if you notice the hour: in the -04:00 one the hour is 13:36, and in the -05:00 one it is 12:36.

I have already made changes as per your suggestions which you can see in the updated PR.

@shmsr
Member

shmsr commented Sep 17, 2024

#10412 (comment)

here are the results of the current pipeline behavior vs. the suggested behavior, and please observe the @timestamp field. Technically both give the same output, and the pipeline tests were correct under the current behavior.

Didn't understand this.

Example failures:

Search for @timestamp": "2017-07-31T13:36:43.000-04:00 in postgresql/log test-postgresql-9-6-new-timestamp.log
and if you notice the timezone is "EST" for which TZ offset is -05:00 but in the timestamp it is -04:00.

Did you check this ^^? Previously, EST's TZ offset was shown as -05:00 but if you search EST is actually -04:00

Yes @shmsr, I have checked your previous comment, and as shown in the above comment, both timestamps are logically the same if you notice the hour: in the -04:00 one the hour is 13:36, and in the -05:00 one it is 12:36.

I have already made changes as per your suggestions which you can see in the updated PR.

Thanks Harnish. But I am thinking of this: https://stackoverflow.com/a/68821105/5821408

For 3-letter timezones it is tricky; that's why 4-letter timezones exist to clear it up. For example:

For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such as "PST", "CTT", "AST") are also supported. However, their use is deprecated because the same abbreviation is often used for multiple time zones (for example, "CST" could be U.S. "Central Standard Time" and "China Standard Time"), and the Java platform can then only recognize one of them.

Same for IST; it could be Indian Standard Time or even Israel Standard Time. And depending on what we choose to map it to, we will get the offset accordingly.
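This ambiguity is visible in java.time itself. An illustrative sketch (not from the PR) showing which meaning the legacy alias table picks for each abbreviation:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class AbbreviationDemo {
    public static void main(String[] args) {
        LocalDateTime when = LocalDateTime.of(2024, 1, 15, 12, 0);

        // SHORT_IDS resolves "IST" to India, not Israel or Ireland
        ZoneId ist = ZoneId.of("IST", ZoneId.SHORT_IDS);
        System.out.println(ist + " " + ist.getRules().getOffset(when)); // Asia/Kolkata +05:30

        // It also resolves "BST" to Bangladesh Standard Time, which is why
        // the patch overrides "BST" to Europe/London in CUSTOM_ZONE_IDS
        System.out.println(ZoneId.of("BST", ZoneId.SHORT_IDS)); // Asia/Dhaka
    }
}
```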

The change looks good. I don't think we can do much here. In case an SDH/clarification is ever required, we can point to the assumed defaults. Approving this PR.

@harnish-crest-data harnish-crest-data self-assigned this Sep 17, 2024
@harnish-crest-data harnish-crest-data merged commit 9797724 into elastic:main Sep 18, 2024
@shaunbugler

How does one go about adding to the list of CUSTOM_ZONE_IDS? Since this update we have been receiving:
Unknown time-zone ID: SAST
the example postgresql.log.timestamp is 2024-11-22 11:26:25.482 SAST

If I update the logs-postgresql.log-1.25.0 ingest pipeline and add:
CUSTOM_ZONE_IDS.put(\"SAST\", \"Africa/Johannesburg\");
then my problems are solved, until the pipeline is updated, so is there a better way to handle this? Thanks!

harnish-crest-data added a commit to chavdaharnish/integrations that referenced this pull request Feb 4, 2025
* fix Grok pattern Failure Co-authored-by: harnish-elastic <harnish.chavda@elastic.co> Co-authored-by: Harnish Chavda <118714680+harnish-elastic@users.noreply.github.com>
harnish-crest-data added a commit to chavdaharnish/integrations that referenced this pull request Feb 5, 2025
* fix Grok pattern Failure Co-authored-by: harnish-elastic <harnish.chavda@elastic.co> Co-authored-by: Harnish Chavda <118714680+harnish-elastic@users.noreply.github.com>


8 participants