Scientific Journal Impact Factor (SJIF): 1.711
International Journal of Modern Trends in Engineering and Research
www.ijmter.com
e-ISSN: 2349-9745, p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved

Detecting Unknown Attacks Using Big Data Analysis

Bhagyashree S Jawariya
Computer Department, SRES College of Engineering, Kopargaon, bhagyashreejawariya12@gmail.com

Abstract: The threat of previously unknown cyber-attacks is increasing because existing security systems are not able to detect them. In the past, the most common cyber-attacks leaked personal information by attacking a PC or destroyed the system. The goal of recent hacking attacks, however, has shifted from leaking information and disrupting services to attacking large-scale systems such as critical infrastructures and state agencies. Existing defence technologies to counter these attacks are based on pattern matching methods, which are very limited. As a result, when new and previously unknown attacks occur, the detection rate becomes very low and false negatives increase. To defend against these unknown attacks, which cannot be detected with existing technology, a new model is proposed that is based on big data analysis techniques and can extract information from a variety of sources to detect future attacks. The model is expected to support future Advanced Persistent Threat (APT) detection and prevention.

Keywords: Alarm systems, Computer crime, Intrusion detection, Pattern matching, Data mining

I. INTRODUCTION

Hacking in the past leaked personal information or was done just for fame, but recent hacking targets companies and government agencies. This kind of attack is commonly called an APT (Advanced Persistent Threat). An APT targets a specific system and analyses the vulnerabilities of that system over a long time. It is therefore harder to prevent and detect than traditional attacks and can result in massive damage.

Until now, the detection and protection systems used to defend against cyber-attacks have been firewalls, intrusion detection systems, intrusion prevention systems, anti-virus solutions, database encryption, DRM solutions and so on. In addition, integrated monitoring technologies for managing system logs have been used. These security solutions are built on signatures and blacklists. However, according to various reports, intrusion detection systems and intrusion prevention systems are not capable of protecting systems against APT attacks because no signatures exist for them. To overcome this issue, security communities are beginning to apply heuristic and data mining technologies to detect previously unknown attacks. In this paper, a new model based on big data analysis technology to prevent and detect previously unknown APT attacks is proposed.

APT Attacks. An APT attack penetrates the target system and persistently collects valuable information by using social engineering, zero-day vulnerabilities and other techniques. It can damage national agencies or enterprises [5]. APT attacks are also used as cyber weapons. Instead of targeting ordinary desktops or servers, such attacks target industrial control systems. An APT attack is usually carried out in four steps: Intrusion, Searching, Collection and Attack.

International Journal of Modern Trends in Engineering and Research (IJMTER), Volume 01, Issue 06, [December - 2014]

Figure 1. The Sequence of APT Attacks

Intrusion. In the intrusion step of an APT attack, the hacker probes for information about the target system and prepares the attack. To gain access to the system, the attacker searches for users with high access privileges, such as administrators, and uses attack techniques such as SQL injection, phishing, pharming and social engineering to hijack their accounts.

Searching. Searching is done after the hacker has gained access to the system. The hacker analyses system data such as system logs for valuable information and looks for security vulnerabilities that can be exploited for further malicious behaviour.

Collection. In this step, after the hacker has located valuable information in the system, such as confidential documents, he installs malware such as rootkits and backdoors to collect system data and maintain access to the system for the future.

Attack. In this final step, the hacker leaks data and destroys the target system using the acquired privileges. Leaked information can be used to develop additional exploits for security vulnerabilities. Because APT exploits use zero-day vulnerabilities and obfuscation methods, it is difficult for anti-virus programs, IDS and IPS to detect them.

Examples of recent APT attacks are Stuxnet, the RSA SecurID hack and Night Dragon. Stuxnet was a highly sophisticated piece of malware developed to attack Iran's nuclear facilities and make them malfunction.

II. SYSTEM OVERVIEW

The system mainly focuses on the following areas:
1. Data Collection and Creation of Network
2. Analysis of Data
3. Detection of Unknown Attacks and generation of an alert

Figure 2. System Overview

Figure 2 shows the Big Data Analysis System Model for Detecting Unknown Attacks. As illustrated in the design, data is collected from various sources. The extracted data is taken as input and passed to the system for pre-processing. After pre-processing, the data is analysed. The analysis is based on behaviour matching, for which a genetic algorithm is used. If any unknown behaviour is found, the system generates an alert. Snort is used for detection.

III. DATA ANALYSIS AND DETECTION OF UNKNOWN ATTACKS

3.1. Data Collection and Creation of Network

The data collection step collects event data. Event data is collected from firewalls, logs, servers and applications, together with behaviour and status information (date, time, inbound/outbound packets, daemon logs, user behaviour, process information, etc.) from anti-virus software, databases, network devices and systems. A data appliance is used to store the collected data. The network is created by a client-server application, through which the data is sent.

3.2. Analysis of Data

Clone detection is defined as a mechanism for a wireless sensor network (WSN) to detect the existence of inappropriate, incorrect, or anomalous moving attackers. In this step, the path of each packet is checked to determine whether it is authorized or unauthorized. If the path is authorized, the packet is sent to its valid destination; otherwise the packet is deleted.

Constructing Inter-Domain Packet Filters. If a packet is received on a port other than the expected port number, it is filtered and discarded. This filter removes only the unauthorized packets; authorized packets are sent on to their destination.

Behaviour Matching using Genetic Algorithm. Here a genetic algorithm is used for behaviour matching. The behaviour of the received packet is matched against the already known behaviours. If the behaviour does not match, it is considered unknown.
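
The behaviour-matching idea above can be sketched in a few lines of Python. The paper does not specify the record encoding, the fitness function or the genetic operators, so everything below is an assumption made for illustration: behaviours are hypothetical (protocol, destination port, TCP flag) tuples, a rule is a tuple in which None acts as a wildcard, and the benign records used as negative examples are invented. The genetic algorithm evolves one general rule that covers as many known behaviours as possible; a record that no rule covers is treated as unknown.

```python
import random

WILDCARD = None  # a gene that accepts any value in that position

# Hypothetical behaviour records: (protocol, destination port, TCP flag).
KNOWN_ATTACKS = [("tcp", 80, "S"), ("tcp", 443, "S"), ("tcp", 22, "S"), ("udp", 53, "-")]
BENIGN = [("tcp", 80, "A"), ("udp", 123, "-")]  # assumed negative examples


def matches(rule, record):
    """A rule matches when every non-wildcard gene equals the record's field."""
    return all(g is WILDCARD or g == v for g, v in zip(rule, record))


def fitness(rule, known, benign):
    """Reward covering known behaviours, penalise covering benign traffic."""
    hits = sum(matches(rule, r) for r in known)
    false_hits = sum(matches(rule, r) for r in benign)
    return hits - 2 * false_hits


def crossover(a, b, rng):
    """Single-point crossover of two parent rules."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(rule, values, rng):
    """Replace one gene with a wildcard or with a value seen in that column."""
    i = rng.randrange(len(rule))
    gene = WILDCARD if rng.random() < 0.5 else rng.choice(values[i])
    return rule[:i] + (gene,) + rule[i + 1:]


def evolve(known, benign, generations=30, pop_size=20, seed=0):
    """Evolve one rule; the initial population is drawn from the known records."""
    rng = random.Random(seed)
    width = len(known[0])
    values = [sorted({r[i] for r in known + benign}) for i in range(width)]
    population = [rng.choice(known) for _ in range(pop_size)]

    def score(rule):
        return fitness(rule, known, benign)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), values, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


def classify(record, rules):
    """A record covered by no rule is reported as unknown."""
    return "known" if any(matches(rule, record) for rule in rules) else "unknown"


best = evolve(KNOWN_ATTACKS, BENIGN)
print(best, fitness(best, KNOWN_ATTACKS, BENIGN))
print(classify(("icmp", 0, "-"), [("tcp", None, "S")]))
```

Because the best half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A practical system would evolve a set of rules rather than a single one and would need a much richer feature encoding.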

Figure 3. Steps in Genetic Algorithm

Initially, the set of all known attacks is created. When an attack arrives, it is first checked against the known-attack set to determine whether it is known or unknown. If a match is found, the attack is prevented, since solutions for known attacks already exist. If no match is found, the Detection Engine generates an alert and reports it to the administrator. The data that was sent is then discarded.

Association Analysis. Association rule learning is a method for discovering interesting correlations between variables in large databases. It is used here to help monitor system logs in order to detect intruders and malicious activity.

Update Database. The database is updated after an unknown attack is detected.

3.3. Detection of Unknown Attacks and Generating an Alert

Generation of an Alert Message. An alert message is generated if any unknown attack is found.
1. An alert is an indication that an attack has been detected.
2. An alert is generated when a known or unknown attack is found.
3. An attack message is displayed on the system if an attack is found.

Snort is used for detection.

Components of Snort
1. Packet Decoder
2. Preprocessors
3. Detection Engine
4. Logging and Alerting System
5. Output Modules

Figure 4. Components of Snort

Packet Decoder. The packet decoder takes packets from different types of network interfaces and prepares them to be preprocessed or to be sent to the detection engine. The interfaces may be Ethernet, SLIP, PPP and so on.

Preprocessors. Preprocessors are components or plug-ins that can be used with Snort to arrange or modify data packets before the detection engine checks whether the packet is being used by an intruder. Some preprocessors also perform detection themselves by finding anomalies in packet headers and generating alerts. Preprocessors are very important for any IDS because they prepare data packets to be analysed against the rules in the detection engine. Hackers use different techniques to fool an IDS. For example, we may have created a rule to find the signature scripts/iisadmin in HTTP packets. If we match this string exactly, we can easily be fooled by a hacker who makes slight modifications to it. Preprocessors are used to safeguard against such attacks. Preprocessors in Snort can defragment packets, decode HTTP URIs, re-assemble TCP streams and so on. These functions are a very important part of the intrusion detection system.

The Detection Engine. The detection engine is the most important part of Snort. Its responsibility is to detect whether any intrusion activity exists in a packet. The detection engine employs Snort rules for this purpose. The rules are read into internal data structures or chains, where they are matched against all packets. If a packet matches any rule, appropriate action is taken; otherwise the packet is dropped. Appropriate actions may be logging the packet or generating alerts.
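
The interplay between preprocessing and rule matching described above can be illustrated with a minimal Python sketch. It is not Snort's implementation and does not use Snort rule syntax: the rule list, the normalize_uri helper and the detect function are hypothetical names chosen for this example. The sketch shows why an exact match on scripts/iisadmin is easy to evade and how normalising the URI first restores the match.

```python
import posixpath
from urllib.parse import unquote

# Hypothetical rules: (signature substring, action). Not Snort rule syntax.
RULES = [
    ("scripts/iisadmin", "alert"),
    ("cmd.exe", "log"),
]


def normalize_uri(uri):
    """Undo common evasions: percent-encoding, backslashes, './', '//', '..' and case."""
    decoded = unquote(uri).replace("\\", "/")
    return posixpath.normpath(decoded).lower()


def detect(uri, preprocess=True):
    """Return the action of the first matching rule, or None if no rule matches."""
    candidate = normalize_uri(uri) if preprocess else uri
    for signature, action in RULES:
        if signature in candidate:
            return action
    return None


evasive = "/scripts/./%69isadmin"
print(detect(evasive, preprocess=False))  # exact matching misses the modified string
print(detect(evasive))                    # the normalised URI matches the signature
```

Keeping the normalisation separate from the matching loop mirrors the division of labour in the text: the rules stay simple, and evasion handling lives in one place.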

The detection engine is the time-critical part of Snort. Depending on how powerful the machine is and how many rules have been defined, it may take different amounts of time to respond to different packets. If traffic on the network is too high when Snort is working in NIDS mode, some packets may be dropped and a true real-time response may not be achieved. The load on the detection engine depends on the following factors:
1. Number of rules
2. Power of the machine on which Snort is running
3. Speed of the internal bus used in the Snort machine
4. Load on the network

Logging and Alerting System. Depending on what the detection engine finds inside a packet, the packet may be used to log the activity or generate an alert. Logs are kept in simple text files, tcpdump-style files or some other form. All of the log files are stored under the /var/log/snort folder by default. The -l command line option can be used to change the location where logs and alerts are generated. Many other command line options modify the type and detail of the information that is logged by the logging and alerting system.

Output Modules. Output modules or plug-ins can perform different operations depending on how we want to save the output generated by the logging and alerting system of Snort. Basically, these modules control the type of output generated by the logging and alerting system.

IV. CONCLUSION

In this paper, a Big Data System Model for reacting to previously unknown cyber threats is proposed. Recent unknown attacks easily bypass existing security solutions by using encryption and obfuscation. New detection methods for reacting to such attacks therefore need to be developed. The proposed model is intended to defend against these unknown attacks, which cannot be detected with existing technology.

REFERENCES

[1] Tai-Myoung Chung, Sung-Hwan Ahn, Nam-Uk Kim. "Big data analysis system concept for detecting unknown attacks". Technical report, February 2014.
[2] Tianyi Xing, Jeongkeun Lee, Chun-Jen Chung, Pankaj Khatkar and Dijiang Huang. "Nice: Network intrusion detection and countermeasure selection in virtual network systems". Technical report, IEEE Transactions on Dependable and Secure Computing, Vol 10, No 4, August 2013.
[3] Liping Zhang, Dajiang Lei and Lisheng Zhang. "Cloud model based outlier detection algorithm for categorical data". Technical report, International Journal of Database Theory and Applications, Vol 6, No 4, August 2013.
[4] Christopher J.C. Burges. "A Tutorial on Support Vector Machines for Pattern Recognition". Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
[5] Command Five Pty Ltd. "Advanced persistent threats: A decade in review". Technical report, June 2011.
[6] Dr. Kiran Jyoti, Bhawna Gupta. "Big data analytics with hadoop to analyze targeted attacks on enterprise data". Technical report, International Journal of Computer Science and Information Technologies (IJCSIT), Vol 5(3), 2014.
[7] R. Magoulas and B. Lorica. Introduction to Big Data, Release 2.0. Sebastopol: O'Reilly Media, February 2009.