Building the High Speed Cyber Security Data Pipeline Using Apache NiFi Praveen Kanumarlapudi
Cyber Security
60% of Small Businesses Fold Within 6 Months of a Cyber Attack.
How to make it success ?
Global Security Key Stake Holders Security Operations Center Data Scientists Data Analysts Executives An information security operations center ("ISOC" or "SOC") is a facility where enterprise information systems (websites, applications, databases, data centers and servers, networks, desktops and other endpoints) are monitored, assessed, and defended. Technology : SIEM Security data scientists have the skills to understand complex algorithms and build advanced models for threat and anomaly detection and applying these concepts to real security data sets in single or clustered environments. Technology : Python, R, Big Data, Spark/Scala or MATLAB… Map and trace the data from system to system for solving a given business or incident problem. Design and create data reports using various reporting tools that help business executive to make better decisions. Implements new metrics for business (KPIs) Technology : SQL, SIEM, Big Data, Reporting tools CSO’s, CISO’s
Cyber Security ‘BIG data’ challenges • Speed , Volume and Variety  Data Ingestion  Cleansing  Transformation • data reliance  Executives – KPI Metrics  Data scientists  SOC  Data Analysts • Real-Time context
A couple of years Ago ! Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Syslog servers SIEM APP Sqoop PySpark SIEM Tool Data Source Ingestion Integration Delivery Flume UBA Tools SOCDataScienceKPI/Reporting
Challenges • Complexity of Architecture • Debugging • Data Source Dependencies • Lack of Centralized logging • Multiple Data Copies • Stress on Network • Transformations with respect to destination
Solution Framework  Single Data entry point – avoids network traffic and duplicate data flowing around  Transformations according destination – reduces the reliance on source  Should be capable of handling different formats and different sources Ingest Clean/Route Transform for 1 Transform for 2 Route to 1 Route to 2 Archive
Deployment Models Data Sources
Challenges  Good architectural understanding of all systems  Good amount of coding effort  Long development hours  Maintenance overheads  Maintain the sync between the systems  Provenance
• Guaranteed delivery • Processors that supports multiple formats • Ease to develop the flows and deploy in minutes • Open Source and rich community
The Data Gateway Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Data Source Data Gateway Delivery SOCDataScienceKPI/Reporting SOC
Sample Flow
To Azure
To Splunk
Grafana dashboard – Last 7 days
Yesterday
In the middle of the day
Metrics  100+ production flows  ~ 20 Billion events  1000+ Transformations
Next ?  MiNiFi  Stateless NiFi  Registry  SAM  Real-Time Model training  CI/CD, NiFi API’s
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

  • 1.
    Building the HighSpeed Cyber Security Data Pipeline Using Apache NiFi Praveen Kanumarlapudi
  • 2.
  • 3.
    60% of Small BusinessesFold Within 6 Months of a Cyber Attack.
  • 4.
    How to makeit success ?
  • 5.
    Global Security KeyStake Holders Security Operations Center Data Scientists Data Analysts Executives An information security operations center ("ISOC" or "SOC") is a facility where enterprise information systems (websites, applications, databases, data centers and servers, networks, desktops and other endpoints) are monitored, assessed, and defended. Technology : SIEM Security data scientists have the skills to understand complex algorithms and build advanced models for threat and anomaly detection and applying these concepts to real security data sets in single or clustered environments. Technology : Python, R, Big Data, Spark/Scala or MATLAB… Map and trace the data from system to system for solving a given business or incident problem. Design and create data reports using various reporting tools that help business executive to make better decisions. Implements new metrics for business (KPIs) Technology : SQL, SIEM, Big Data, Reporting tools CSO’s, CISO’s
  • 6.
    Cyber Security ‘BIGdata’ challenges • Speed , Volume and Variety  Data Ingestion  Cleansing  Transformation • data reliance  Executives – KPI Metrics  Data scientists  SOC  Data Analysts • Real-Time context
  • 7.
    A couple ofyears Ago ! Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Syslog servers SIEM APP Sqoop PySpark SIEM Tool Data Source Ingestion Integration Delivery Flume UBA Tools SOCDataScienceKPI/Reporting
  • 8.
    Challenges • Complexity ofArchitecture • Debugging • Data Source Dependencies • Lack of Centralized logging • Multiple Data Copies • Stress on Network • Transformations with respect to destination
  • 9.
    Solution Framework  SingleData entry point – avoids network traffic and duplicate data flowing around  Transformations according destination – reduces the reliance on source  Should be capable of handling different formats and different sources Ingest Clean/Route Transform for 1 Transform for 2 Route to 1 Route to 2 Archive
  • 10.
  • 11.
    Challenges  Good architecturalunderstanding of all systems  Good amount of coding effort  Long development hours  Maintenance overheads  Maintain the sync between the systems  Provenance
  • 12.
    • Guaranteed delivery •Processors that supports multiple formats • Ease to develop the flows and deploy in minutes • Open Source and rich community
  • 13.
    The Data Gateway Networklogs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Data Source Data Gateway Delivery SOCDataScienceKPI/Reporting SOC
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
    In the middleof the day
  • 20.
    Metrics  100+ productionflows  ~ 20 Billion events  1000+ Transformations
  • 21.
    Next ?  MiNiFi Stateless NiFi  Registry  SAM  Real-Time Model training  CI/CD, NiFi API’s

Editor's Notes

  • #4 60 Percent of Small Businesses went out of business in just 6 months after cyber attacks