
Conversation

@mohitjha-elastic (Collaborator) commented Jun 24, 2025

Proposed commit message

elastic_security: Initial release of the package, with an alert data stream, associated dashboards, and ingest pipelines. This package facilitates transferring security alert data from one Elasticsearch instance to another. The API integration was implemented per the official documentation [1], and test samples were created from sanitized live data.

[1] https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices.

How to test this PR locally

  • Clone the integrations repo.
  • Install elastic-package locally.
  • Start the Elastic Stack using elastic-package.
  • Move to the integrations/packages/elastic_security directory.
  • Run the following command to run the tests:

```shell
elastic-package test -v
```

Screenshots

[Screenshot: Logs Elastic Security Alert dashboard, 06-30-2025]
[Screenshot: event-id-1]
[Screenshot: event-id-2]
[Screenshot: config-params]

Related Issue

Add the initial release of the elastic_security package with a single data stream named alert. It also contains dashboards, ingest pipelines, tests, and a readme.
@mohitjha-elastic mohitjha-elastic self-assigned this Jun 24, 2025
@mohitjha-elastic mohitjha-elastic requested a review from a team as a code owner June 24, 2025 09:35
@mohitjha-elastic added the dashboard, New Integration, Team:Security-Service Integrations, and Team:Sit-Crest labels Jun 24, 2025
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@kcreddy kcreddy requested a review from a team June 24, 2025 09:46
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report, comment with /test benchmark fullreport.

@narph narph requested review from jamesspi and peluja1012 June 25, 2025 10:07
Event kind has been updated to alert from signal.
```yaml
fields:
  - name: ancestry
    type: keyword
  - name: args
```
Contributor

Aren't some of these fields already part of ECS? Can you identify them and remove?
Same for other top level fields that are already in ECS.

Collaborator Author

Some fields are being received as string values instead of the expected array format, so we need to explicitly define them in fields.yml.
This behavior has been observed consistently across events.

Member @andrewkroh commented Jul 14, 2025

> Some fields are being received as string values instead of the expected array format

I don't really understand this justification. What do you mean by "expected array" format? Is that specifically for process.args? Many of these fields can simply change to using external: ecs without affecting the field type.

Fields that should use external: ecs:

| field | ECS type |
| --- | --- |
| data_stream.type | constant_keyword |
| data_stream.dataset | constant_keyword |
| data_stream.namespace | constant_keyword |
| @timestamp | date |
| process.args_count | long |
| process.entity_id | keyword |
| process.entry_leader.args | keyword |
| process.entry_leader.args_count | long |
| process.entry_leader.entity_id | keyword |
| process.entry_leader.entry_meta.type | keyword |
| process.entry_leader.executable | keyword |
| process.entry_leader.group.name | keyword |
| process.entry_leader.interactive | boolean |
| process.entry_leader.name | keyword |
| process.entry_leader.parent.entity_id | keyword |
| process.entry_leader.parent.pid | long |
| process.entry_leader.pid | long |
| process.entry_leader.real_group.name | keyword |
| process.entry_leader.real_user.name | keyword |
| process.entry_leader.same_as_process | boolean |
| process.entry_leader.user.name | keyword |
| process.entry_leader.working_directory | keyword |
| process.executable | keyword |
| process.group.name | keyword |
| process.group_leader.args | keyword |
| process.group_leader.args_count | long |
| process.group_leader.entity_id | keyword |
| process.group_leader.executable | keyword |
| process.group_leader.group.name | keyword |
| process.group_leader.interactive | boolean |
| process.group_leader.name | keyword |
| process.group_leader.pid | long |
| process.group_leader.real_group.name | keyword |
| process.group_leader.real_user.name | keyword |
| process.group_leader.same_as_process | boolean |
| process.group_leader.supplemental_groups.name | keyword |
| process.group_leader.user.name | keyword |
| process.group_leader.working_directory | keyword |
| process.hash.md5 | keyword |
| process.hash.sha1 | keyword |
| process.hash.sha256 | keyword |
| process.interactive | boolean |
| process.name | keyword |
| process.parent.args | keyword |
| process.parent.args_count | long |
| process.parent.entity_id | keyword |
| process.parent.executable | keyword |
| process.parent.group.name | keyword |
| process.parent.interactive | boolean |
| process.parent.name | keyword |
| process.parent.pid | long |
| process.parent.real_group.name | keyword |
| process.parent.real_user.name | keyword |
| process.parent.supplemental_groups.name | keyword |
| process.parent.user.name | keyword |
| process.parent.working_directory | keyword |
| process.pid | long |
| process.previous.args | keyword |
| process.previous.args_count | long |
| process.previous.executable | keyword |
| process.real_group.name | keyword |
| process.real_user.name | keyword |
| process.session_leader.args | keyword |
| process.session_leader.args_count | long |
| process.session_leader.entity_id | keyword |
| process.session_leader.executable | keyword |
| process.session_leader.group.name | keyword |
| process.session_leader.interactive | boolean |
| process.session_leader.name | keyword |
| process.session_leader.pid | long |
| process.session_leader.real_group.name | keyword |
| process.session_leader.real_user.name | keyword |
| process.session_leader.same_as_process | boolean |
| process.session_leader.supplemental_groups.name | keyword |
| process.session_leader.user.name | keyword |
| process.session_leader.working_directory | keyword |
| process.supplemental_groups.name | keyword |
| process.user.name | keyword |
| process.working_directory | keyword |
| threat.tactic.id | keyword |
| threat.tactic.reference | keyword |
| threat.tactic.name | keyword |
| threat.technique.id | keyword |
| threat.technique.name | keyword |
| threat.technique.reference | keyword |
| threat.technique.subtechnique.id | keyword |
| threat.technique.subtechnique.name | keyword |
| threat.technique.subtechnique.reference | keyword |

This can be fixed by running:

```shell
go run github.com/andrewkroh/fydler@main -a useecs -fix packages/elastic_security/**/fields/*.yml
```
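For context, external: ecs tells elastic-package to resolve a field's type and description from the ECS reference instead of declaring them inline. A minimal before/after sketch for one field from the table above (the layout is illustrative, not the package's actual file):

```yaml
# Before: type declared inline in fields.yml
- name: process.entity_id
  type: keyword

# After: type and description are resolved from ECS by elastic-package
- name: process.entity_id
  external: ecs
```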

However, the more concerning part is the fields that are declared in conflict with ECS. I think these need to be fixed to use external: ecs, which will change their type. Most of these changes widen the data type, except for the keyword-to-date changes. Here's a summary:

| field | type | ECS type |
| --- | --- | --- |
| process.command_line | keyword | wildcard |
| process.entry_leader.group.id | long | keyword |
| process.entry_leader.parent.start | keyword | date |
| process.entry_leader.real_group.id | long | keyword |
| process.entry_leader.real_user.id | long | keyword |
| process.entry_leader.start | keyword | date |
| process.entry_leader.user.id | long | keyword |
| process.group.id | long | keyword |
| process.group_leader.group.id | long | keyword |
| process.group_leader.real_group.id | long | keyword |
| process.group_leader.real_user.id | long | keyword |
| process.group_leader.start | keyword | date |
| process.group_leader.supplemental_groups.id | long | keyword |
| process.group_leader.user.id | long | keyword |
| process.parent.command_line | keyword | wildcard |
| process.parent.group.id | long | keyword |
| process.parent.real_group.id | long | keyword |
| process.parent.real_user.id | long | keyword |
| process.parent.start | keyword | date |
| process.parent.supplemental_groups.id | long | keyword |
| process.parent.user.id | long | keyword |
| process.real_group.id | long | keyword |
| process.real_user.id | long | keyword |
| process.session_leader.group.id | long | keyword |
| process.session_leader.real_group.id | long | keyword |
| process.session_leader.real_user.id | long | keyword |
| process.session_leader.start | keyword | date |
| process.session_leader.supplemental_groups.id | long | keyword |
| process.session_leader.user.id | long | keyword |
| process.start | keyword | date |
| process.supplemental_groups.id | long | keyword |
| process.user.id | long | keyword |
Collaborator Author

@andrewkroh
We're seeing two types of issues or behaviors here:

  1. Data type mismatches: Some ECS fields are appearing with a different data type than expected. For example, process.group.id is coming in as an integer in the raw logs, whereas ECS expects it to be a string.

  2. Structure mismatches: Some fields, like threat.techniques, are coming through as arrays of objects in the raw logs, while ECS expects them to be explicitly defined as group or nested types.

Example:

```json
{
  "threat": [
    {
      "framework": "MITRE ATT&CK",
      "technique": [
        {
          "reference": "https://attack.mitre.org/techniques/T1059/",
          "name": "Command and Scripting Interpreter",
          "subtechnique": [
            {
              "reference": "https://attack.mitre.org/techniques/T1059/004/",
              "name": "Unix Shell",
              "id": "T1059.004"
            },
            {
              "reference": "https://attack.mitre.org/techniques/T1059/006/",
              "name": "Python",
              "id": "T1059.006"
            }
          ],
          "id": "T1059"
        }
      ],
      "tactic": {
        "reference": "https://attack.mitre.org/tactics/TA0002/",
        "name": "Execution",
        "id": "TA0002"
      }
    },
    {
      "framework": "MITRE ATT&CK",
      "technique": [
        {
          "reference": "https://attack.mitre.org/techniques/T1132/",
          "name": "Data Encoding",
          "subtechnique": [
            {
              "reference": "https://attack.mitre.org/techniques/T1132/001/",
              "name": "Standard Encoding",
              "id": "T1132.001"
            }
          ],
          "id": "T1132"
        }
      ]
    }
  ]
}
```

For the first issue, even if we reference the external: ecs definition, we'll still encounter data type mismatch errors. To resolve this, we can add a script to the ingest pipeline that converts such fields (e.g., integers to strings) and then remove these ECS definitions from fields.yml, since they will be handled by the dynamic ECS imports.

For the second issue, the structure itself is incompatible. One possible solution is to handle these cases in the pipeline as well, likely by flattening or reformatting the data to match the ECS schema.

Let me know your thoughts on this.

Member

> Data type mismatches: Some ECS fields are appearing with a different data type than expected. For example, process.group.id is coming in as an integer in the raw logs, whereas ECS expects it to be a string.

The Elasticsearch data type is the primary concern. The integration reads the event structure from _source, which might not reflect how the data was indexed in the source cluster.

For instance, it's common for fields to be numbers in the JSON _source and stored as keyword types. However, in elastic-package, this causes a validation issue because we prefer _source to match the ES data type (e.g., JSON string for a keyword). To resolve this, you can either apply a convert(type: string) processor or configure elastic-package to ignore numeric fields using numeric_keyword_fields [1].
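As a hedged sketch of both options, using the process.group.id field from the example above (the file paths and field choice are assumptions):

```yaml
# Option 1: coerce the value in the ingest pipeline so the stored value
# matches the keyword mapping.
processors:
  - convert:
      field: process.group.id
      type: string
      ignore_missing: true
```

and, in the pipeline test configuration:

```yaml
# Option 2: tell elastic-package's pipeline tests to accept numeric JSON
# values in fields mapped as keyword.
numeric_keyword_fields:
  - process.group.id
```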

> then remove these ECS definitions from the fields.yml as definitions will be handled by the dynamic ECS imports.

Do not remove the static ECS definitions. Keep them. They are generally stronger (because they don't rely on match_mapping_type) and provide documentation.


> Structure mismatches: Some fields, like threat.techniques, are coming through as arrays of objects in the raw logs, while ECS expects them to be explicitly defined as group or nested types.

Again, I believe this relates to how the _source is structured versus how the data was actually indexed and how it will be indexed in the target cluster.

If you send the data as-is to an index that contains mappings for the ECS threat fields, it should be mapped correctly. This is because Elasticsearch automatically flattens [2] arrays of objects. Did you encounter any errors while trying to use the ECS definitions with this JSON structure?
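To make the flattening concrete, a hedged sketch of how the threat example above would be indexed (the values come from that JSON; the flattened view is illustrative):

```yaml
# Leaf values become flat arrays; the grouping of each object is lost in the
# indexed fields, though _source keeps the original structure.
threat.framework: ["MITRE ATT&CK", "MITRE ATT&CK"]
threat.technique.id: ["T1059", "T1132"]
threat.technique.name: ["Command and Scripting Interpreter", "Data Encoding"]
threat.technique.subtechnique.id: ["T1059.004", "T1059.006", "T1132.001"]
```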

Footnotes

  1. https://github.com/elastic/elastic-package/blob/f8f2f15a04bcc25eca00887fb147bd7f8a0f32b3/docs/howto/pipeline_testing.md#test-configuration

  2. https://www.elastic.co/guide/en/elasticsearch/reference/8.18/nested.html#nested-arrays-flattening-objects

Collaborator Author

Thanks for the suggestion, @andrewkroh.
We'll ignore the numeric fields using numeric_keyword_fields and use external: ecs to resolve the data type mismatch issue.

Regarding the second issue on structure mismatch, using the ECS definitions with the provided JSON structure results in an error from elastic-package, as shown in the attached message. ECS expects the threat.technique fields to be either group or nested types, but they're coming through as an array of objects.
[Screenshot: elastic-package validation error]

Member

Thank you for sharing the error. That elastic-package validation seems counterproductive, given that it forces the pipeline to implement something that Elasticsearch does automatically. I don't see any way to disable the check, so I think a comment in the file explaining why the declaration for the threat.* fields exists, and why they are not using external: ecs, is in order.
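A hedged sketch of what such an annotated declaration in fields.yml might look like (the comment wording and the subset of fields shown are assumptions):

```yaml
# threat.* is declared explicitly instead of with external: ecs because the
# source documents carry `threat` as an array of objects, which
# elastic-package's validation rejects against the ECS definitions, even
# though Elasticsearch flattens such arrays correctly at index time.
- name: threat
  type: group
  fields:
    - name: tactic.id
      type: keyword
      description: The ID of the tactic, per MITRE ATT&CK.
    - name: technique.id
      type: keyword
      description: The ID of the technique, per MITRE ATT&CK.
```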

Contributor

> If you send the data as-is to an index that contains mappings for the ECS threat fields, it should be mapped correctly. This is because Elasticsearch automatically flattens [2] arrays of objects.

@andrewkroh, since this automatic flattening results in losing relationships/associations between fields, is it better to make them nested instead and keep the associations for threat fields?
@mohitjha-elastic, if you don't add external: ecs for any ECS fields, please manually add each field's description:.

Member

> since this automatic flattening results in losing relationships/associations between fields, is it better to make them nested instead and keep the associations for threat fields?

I would defer to ECS on this. ECS does not call for nested on threat, only on threat.enrichments. So it must be that the associations are not essential (if they are, then we need to change ECS).

Collaborator Author

Thanks @andrewkroh and @kcreddy!!
Here is the Issue and the PR.

1. Update descriptions of config parameters.
2. Add saved search in dashboard.
3. Update query parameter in data collection, moving it to the request body.
4. Preserve event.original value from message field.
5. Update readme.
@mohitjha-elastic mohitjha-elastic requested a review from kcreddy June 30, 2025 12:12
@@ -0,0 +1,1122 @@
{
"@timestamp": "2060-06-09T13:56:03.205Z",
Contributor

The data is fetched starting from that interval, and pagination happens based on the @timestamp of the incoming data. Therefore, the starting time should be in the future rather than the past.

If you set @timestamp inside the config.yml to reasonable past dates (2022, 2023, etc.), you will still achieve the same result, because your cursor.last_timestamp is based on those values:

```
optional.of(body.hits.hits.map(e, timestamp(e._source['@timestamp'])).max())
```
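For context, a hedged sketch of how such an expression typically feeds the cursor in a CEL input program; this is illustrative, not the package's actual program:

```yaml
# Hedged sketch: the CEL program returns a state map, and the cursor keeps
# the newest @timestamp seen so the next run resumes from that checkpoint
# rather than from the configured start time.
program: |
  {
    "events": body.hits.hits.map(e, e._source),
    "cursor": {
      "last_timestamp": optional.of(
        body.hits.hits.map(e, timestamp(e._source['@timestamp'])).max()
      ).orValue(state.?cursor.last_timestamp.orValue(now))
    }
  }
```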

@andrewkroh added the documentation label Jul 1, 2025
1. Update readme.
2. Shorten system test data for documentation.
3. Add some safety checks in the CEL code.
4. Replace pipeline script with a remove processor (see the sketch below).
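For item 4, a hedged sketch of the kind of substitution meant here; the field name and options are illustrative, not the package's actual pipeline:

```yaml
# A declarative remove processor can replace a small script that deleted
# fields imperatively; ignore_missing avoids failures when the field is
# absent from a document.
processors:
  - remove:
      field: message
      ignore_missing: true
```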
@mohitjha-elastic mohitjha-elastic requested review from efd6 and kcreddy July 7, 2025 10:30
@elasticmachine

💚 Build Succeeded

History

cc @mohitjha-elastic

Contributor @efd6 left a comment

LGTM but please wait for @kcreddy.

Contributor @kcreddy left a comment

LGTM for the review comments.

@mohitjha-elastic, can you also confirm the issues discussed via DMs, namely mismatched event.severity, incorrect @timestamp, and duplicate event.id are all fixed now?

@mohitjha-elastic (Collaborator Author)

> LGTM for the review comments.
>
> @mohitjha-elastic, can you also confirm the issues discussed via DMs, namely mismatched event.severity, incorrect @timestamp, and duplicate event.id are all fixed now?

@kcreddy Sorry I didn't close the loop on that issue earlier.

I investigated and confirmed that the logs at both the source and destination Elasticsearch instances are identical — same timestamp, severity, etc.
It turns out there are two entries for the same event ID in the source instance: one in the original index (e.g., .ds-logs-endpoint.events.process*) and another in the .internal.alerts-security.alerts* index. Since we're ingesting data only from the alert index, the ingested data at the destination matches what's present in that alert index.

Source instance (from which we are ingesting the data):
[Screenshot: client-instance]

Destination instance (into which we are ingesting the data):
[Screenshot: our-instance]

Contributor @kcreddy left a comment

Thanks!

@mohitjha-elastic mohitjha-elastic merged commit 55805d6 into elastic:main Jul 9, 2025
9 checks passed
@mohitjha-elastic mohitjha-elastic deleted the elastic_security-0.1.0 branch July 9, 2025 18:12
@elastic-vault-github-plugin-prod

Package elastic_security - 0.1.0 containing this change is available at https://epr.elastic.co/package/elastic_security/0.1.0/


Labels

  • dashboard: Relates to a Kibana dashboard bug, enhancement, or modification.
  • documentation: Improvements or additions to documentation. Applied to PRs that modify *.md files.
  • Integration:elastic_security: Elastic Security
  • New Integration: Issue or pull request for creating a new integration package.
  • Team:Security-Service Integrations: Security Service Integrations team [elastic/security-service-integrations]
  • Team:Sit-Crest: Crest developers on the Security Integrations team [elastic/sit-crest-contractors]

5 participants