Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Datasets

Safa Adi #1, Mohammed Aldasht *2
# Computer and IT Department, Palestine Polytechnic University, Palestine
1 safa_adi@ppu.edu
* Computer Engineering Department, Palestine Polytechnic University, Palestine
2 mohammed@ppu.edu

Abstract—Feature selection in high-dimensional datasets is considered to be a complex and time-consuming problem. To enhance the accuracy of classification and reduce the execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper, we review the most recent works that address the use of PEAs for feature selection in large datasets. We classify the algorithms in these papers into four main classes: Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scatter Search (SS), and Ant Colony Optimization (ACO). Accuracy is adopted as the measure to compare the efficiency of these PEAs. It is noticeable that Parallel Genetic Algorithms (PGAs) are the most suitable algorithms for feature selection in large datasets, since they achieve the highest accuracy. On the other hand, we found that parallel ACO is time-consuming and less accurate compared with the other PEAs.

Index Terms: Evolutionary algorithms, parallel computing, classification, feature selection, high dimensional dataset.

I. INTRODUCTION

Nowadays many disciplines have to deal with high dimensional datasets which involve a huge number of features. We therefore need data preprocessing methods and data reduction models in order to simplify the input data. There are two main types of data reduction models [1]. The first is instance selection and instance generation, which work at the instance level (i.e., they select a representative portion of the data that can fulfill a data mining task as if the whole dataset were used) [14]. The second is feature selection and feature extraction, which work at the level of characteristics. These models attempt to reduce a dataset by removing noisy, irrelevant, or redundant features.

Feature selection is a necessary preprocessing step in analyzing big datasets. It often leads to smaller data that make classifier training better and faster [3]. Feature selection becomes a hard problem with big datasets: in order to make classification faster and more accurate, we need to select the subset of features that is most discriminative. Evolutionary algorithms such as genetic algorithms, swarm intelligence optimization, and ant colony optimization can be effective for this problem, but they require a huge amount of computation (long execution time) as well as memory. In order to overcome these weaknesses, parallel computing can be used.

In this survey, we review a set of papers about parallel evolutionary algorithms that are used for feature selection in large datasets. Furthermore, we compare the performance of the different algorithms and environments. The rest of the paper is organized as follows: Section 2 gives background about feature selection approaches and parallel architectures in general. Section 3 describes parallel evolutionary algorithms. Section 4 discusses and reviews papers that address the feature selection problem using parallel computing. Section 5 contains the summary of the survey, and the last section presents the conclusion and future work.

II. BACKGROUND

In general, there are three classes of feature selection: filter-based, wrapper, and embedded.
The filter approach analyzes the features statistically and ignores the classifier [18]. Most filter-based methods perform two operations, ranking and subset selection. In some cases these two operations are performed sequentially (first the ranking, then the selection); in other cases only the selection is carried out. These methods are effective in terms of execution time. However, filter methods sometimes select redundant variables, since they do not consider the relationships between variables. Therefore, they are mainly used as a pre-processing step. In the wrapper model [15], the feature selection process depends on the performance of a specific classifier; its disadvantages are long execution time and overfitting.
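To make the distinction concrete, the following minimal sketch (our illustration, not taken from any of the surveyed papers) contrasts a filter step, which ranks features statistically and ignores the classifier, with a simple wrapper step, which greedily adds features guided by the cross-validated accuracy of a specific classifier. The dataset, the classifier, and the subset size k are arbitrary placeholders.

    # Illustrative filter vs. wrapper feature selection (assumed dataset and classifier).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
    k = 5

    # Filter: rank features by an ANOVA F-score and keep the top k, ignoring the classifier.
    filter_selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    filter_subset = np.flatnonzero(filter_selector.get_support())

    # Wrapper: greedily add the feature that most improves cross-validated accuracy
    # of a specific classifier (slower, classifier-dependent, risk of overfitting).
    clf = LogisticRegression(max_iter=1000)
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        scores = [(cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean(), f) for f in remaining]
        best_score, best_f = max(scores)
        selected.append(best_f)
        remaining.remove(best_f)

    print("filter subset :", sorted(filter_subset))
    print("wrapper subset:", sorted(selected))

Even in this small example the wrapper loop retrains the classifier hundreds of times, which illustrates why wrappers are the slower but classifier-aware option.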
The last method for feature selection is the embedded one. In this method, the feature selection process and the learning algorithm (tuning of the parameters) are combined with each other [6, 15].

The selection of an optimal feature subset is an optimization problem that has been proven to be NP-hard, complex, and time-consuming [13]. Two major approaches are traditionally used to tackle NP-hard problems, as seen in Figure 1: exact methods and metaheuristics. Exact methods allow an exact solution to be found, but this approach is impractical since it is extremely time consuming for real-world problems. Metaheuristics, on the other hand, are used for solving complex, real-world problems because they provide suboptimal (sometimes optimal) solutions in reasonable time [2, 11, 13]. As seen in Figure 1, metaheuristics are divided into two categories [13]:
• Trajectory-based (exploitation-oriented) methods: the well-known metaheuristics families based on the manipulation of a single solution. They include Simulated Annealing (SA), Tabu Search (TS), Iterated Local Search (ILS), Variable Neighborhood Search (VNS), and Greedy Randomized Adaptive Search Procedures (GRASP).
• Population-based (exploration-oriented) methods: the well-known metaheuristics families based on the manipulation of a population of solutions. They include PSO, ACO, SS, Evolutionary Algorithms (EAs), Differential Evolution (DE), Evolution Strategies (ES), and Estimation of Distribution Algorithms (EDA).

Fig. 1. Approaches for handling NP-hard problems

Metaheuristic algorithms have proved to be suitable tools for solving feature selection accurately and efficiently for the large dimensions of big datasets [2]. The main problems when dealing with big datasets are the following. The first is execution time, because the complexity of metaheuristic methods for feature selection is at least O(n²D), where n is the number of instances and D is the number of features. The second is memory consumption, since most methods for feature selection need to store the whole dataset in memory. Therefore, researchers try to parallelize the sequential metaheuristics to improve their efficiency for feature selection on large datasets. There are many programming models and paradigms, such as MapReduce (Hadoop, Spark), MPI, OpenMP, and CUDA [1, 6, 13]. Parallel computing can be organized around process interaction (shared memory, message passing) or problem decomposition (task or data parallelization) [6]. Parallel computing is a good solution for these problems since many calculations are carried out simultaneously on tasks and/or data [6]. Population-based metaheuristics are naturally prone to parallelization, since most of their variation operators can easily be undertaken in parallel [2, 13]. Parallel implementations of metaheuristics are an effective alternative to speed up sequential metaheuristics by reducing the search time for solutions of optimization problems. Furthermore, they can lead to more precise randomized algorithms and improve the quality of solutions [11]. As seen in Figure 2, the implementation of parallel metaheuristics is divided into two categories [13].

Fig. 2. Parallel implementation of metaheuristics
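As a concrete illustration of why population-based metaheuristics parallelize so naturally, the sketch below (our own example, not from the surveyed papers) evaluates every individual of a population in parallel worker processes; the fitness function is a placeholder for the classifier training that a wrapper-style feature selector would perform, and the population and pool sizes are arbitrary.

    # Minimal sketch: independent fitness evaluations of a population run in parallel.
    import random
    from multiprocessing import Pool

    def fitness(individual):
        # Placeholder cost: in feature selection this would train/score a classifier
        # on the features flagged by the binary mask 'individual'.
        return sum(individual) + random.random()

    if __name__ == "__main__":
        population = [[random.randint(0, 1) for _ in range(100)] for _ in range(64)]
        with Pool(processes=8) as pool:
            fitness_values = pool.map(fitness, population)   # evaluations are independent
        best = max(range(len(population)), key=lambda i: fitness_values[i])
        print("best fitness:", fitness_values[best])

Because each evaluation touches only its own individual, the variation and evaluation steps can be distributed across cores or nodes with little synchronization, which is exactly the property the parallel schemes in the following sections exploit.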
Parallel evolutionary algorithms are also used in many applications other than feature selection, such as inferring phylogenies and traffic prediction. In [9] Santander-Jimenez et al. used MPI/OpenMP with a hybrid multiobjective evolutionary algorithm (a fast non-dominated sorting genetic algorithm combined with the firefly algorithm) for phylogenetic reconstruction (inferring evolutionary trees). In [10] Petrlik et al. used a parallel multiobjective GA with OpenMP in order to make traffic prediction more accurate. A master-slave scheme of the GA was implemented on a multi-core parallel architecture. They reduced the computational time, and the approach was successful for short-term traffic prediction.

III. OVERVIEW OF PARALLEL EVOLUTIONARY ALGORITHMS FOR FEATURE SELECTION

Feature selection algorithms are used to find an optimal subset of relevant features in the data. In this section we describe parallel evolutionary algorithms that are used for the feature selection problem in large datasets. We illustrate the steps of six algorithms (PGA, PCHC, PPSO, PGPSO, PSS, and PACO).
A. Parallel Genetic Algorithm (PGA)

In order to increase the efficiency and reduce the execution time of the genetic algorithm (GA), researchers have used parallel GAs. Algorithm 1 presents the parallel GA methodology, with the master-slave model of the parallel GA.

Algorithm 1 Parallel genetic algorithm [10]
  Create initial population
  Evaluate initial population
  Create slaves
  while not done do
    Start slave
    Wait for slave to finish
    Run mutation operator
  end while
  for i = 1 to slave iterations do
    Select individuals
    Run crossover operator
    Evaluate offspring
    if solution found then
      set done = True
    end if
  end for

B. Parallel CHC Algorithm (PCHC)

CHC is a non-traditional GA which combines a conservative selection strategy (one that always preserves the best individuals found so far) with a crossover that produces offspring at the maximum Hamming distance from their parents. The main processes of the CHC algorithm are [1]:
• Half-Uniform Crossover (HUX): produces two offspring which are maximally different from their two parents.
• Elitist selection: keeps the best solutions in each generation.
• Incest prevention: prevents two individuals from mating if the similarity between them is greater than a threshold.
• Restarting process: if the population stagnates, a new population is generated by choosing the best individuals.

C. Particle Swarm Optimization (PSO)

This subsection covers geometric particle swarm optimization (GPSO) and shows the algorithm used to parallelize PSO and GPSO.

1) Geometric Particle Swarm Optimization (GPSO): GPSO is a recent version of PSO. The key issue in GPSO is the use of a multi-parental recombination of solutions (particles). In the first phase, particles are randomly initialized. Then the algorithm evaluates these particles to update the historical and social positions. Finally, the three-parent mask-based crossover (3PMBCX) moves the particles, as shown in Algorithm 2.

Algorithm 2 GPSO algorithm [2]
  S := SwarmInitialization()
  while not stop condition do
    for each particle i of the swarm S do
      evaluate(solution(xi))
      update(velocity equation (hi))
      update(global best solution (gi))
    end for
    for each particle i of the swarm S do
      xi := 3PMBCX((xi, wa), (gi, wb), (hi, wc))
      mutate(xi)
    end for
  end while
  Output: best solution found

2) Parallel Multi-Swarm Optimization (PMSO): Parallel multi-swarm optimization is presented in [2]. It is defined, in analogy with parallel GAs, as a pair (S, M), where S is a collection of swarms and M is a migration policy. Algorithm 3 depicts the parallel PSO methodology.

Algorithm 3 Multi-swarm optimization [2]
  DO IN PARALLEL for each i = 1, ..., m
    initialize(Si)
    while not stop condition do
      iterate Si for n steps   /* PSO evolution */
      for each neighboring swarm Sj of Si do
        send particles selected from Si to Sj
      end for
      for each Sj such that Si is a neighbor of Sj do
        receive particles from Sj
        replace particles in Si according to the policy r
      end for
    end while
  Output: best solution ever found in the multi-swarm

D. Parallel Scatter Search (PSS)

Scatter search is an evolutionary method that has been successfully applied to hard optimization problems. It uses strategies for search diversification and intensification that have proved effective in a variety of optimization problems; see Algorithm 4.

E. Parallel Ant Colony Optimization (PACO)

When dealing with a huge search space, parallel computing techniques are usually applied to improve the efficiency. Parallel ACO algorithms can achieve high-quality solutions in reasonable execution times compared with sequential ACO [18]. The methodology of PACO is presented in Algorithm 5.
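To make the CHC operators of subsection B concrete, the following sketch (our illustration, assuming individuals are binary feature masks) shows half-uniform crossover and the incest-prevention test based on Hamming distance; the threshold value is an arbitrary placeholder.

    # Illustrative sketch of two CHC operators on binary feature masks (subsection B).
    import random

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def hux(parent1, parent2):
        # Half-Uniform Crossover: swap exactly half of the non-matching bits,
        # so each offspring is maximally different from its parents.
        diff = [i for i, (x, y) in enumerate(zip(parent1, parent2)) if x != y]
        to_swap = set(random.sample(diff, len(diff) // 2))
        child1, child2 = parent1[:], parent2[:]
        for i in to_swap:
            child1[i], child2[i] = parent2[i], parent1[i]
        return child1, child2

    def mating_allowed(parent1, parent2, threshold):
        # Incest prevention: mate only if the parents are sufficiently different,
        # i.e. their Hamming distance exceeds the threshold.
        return hamming(parent1, parent2) > threshold

    p1 = [random.randint(0, 1) for _ in range(20)]
    p2 = [random.randint(0, 1) for _ in range(20)]
    if mating_allowed(p1, p2, threshold=5):
        c1, c2 = hux(p1, p2)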
Algorithm 4 Parallel scatter search methodology [11]
  Create Population (Pop, PopSize)
  Generate ReferenceSet (RefSet, RefSetSize)
  while Stopping Criterion 1 do
    while Stopping Criterion 2 do
      Select Subset (Subset, SubsetSize)
      for each processor r = 1 to n do in parallel
        Combine Solutions (Subset, CurSol)
        Improve Solution (CurSol, ImpSol)
      end for
    end while
    Update ReferenceSet (RefSet)
  end while

Algorithm 5 Parallel ant colony optimization methodology [18]
  Generate ants
  Initialize N processors
  Multicast to all slave processors N and the task ids of all slaves
  for each slave do
    Send a number between 0 and N that identifies the task inside the program
  end for
  while not all slaves have sent back a solution do
    Wait for a solution
    if a slave returns a solution that is better than any solution received then
      Multicast this solution to all slaves
    end if
  end while
  Return the best solution

IV. PARALLEL EVOLUTIONARY ALGORITHMS FOR FEATURE SELECTION

We reviewed a set of research papers which deal with the feature selection problem for high dimensional datasets in a parallel environment using parallel evolutionary algorithms. We discuss these studies in the following subsections.

A. Parallel GA

Liu et al. [5] used a parallel GA for selecting informative genes (features) in tissue classification, using the wrapper approach. The main purpose was to find the subset of features with fewer elements and higher accuracy. The parallelization of the GA was performed by dividing the population into sub-populations and then running the GA on each sub-population. Therefore, the search for the optimal subset of genes can run on several CPUs/computers at the same time. For evaluation, the Golub classifier was used. This classifier, as introduced by the authors, depends on the sign of the result for classification: if the sign is positive, the sample x belongs to class 1; if it is negative, the sample x belongs to class 2. This classifier can be used only if the dataset has two classes. The accuracy of the classifier was tested using the LOOCV (leave-one-out cross-validation) method. The results showed that using the parallel GA increased the accuracy and reduced the number of genes used for classification.

In [8] Zheng et al. theoretically analyzed the execution speed and solution quality of several parallel GA schemes. Furthermore, they pointed to the best scheme of parallel GA to use on a multi-core architecture. This paper considered the relationship between speed and parallel architecture along with solution quality. They analyzed the Master-Slave, Synchronous Island, Asynchronous Island, Cellular, and hybrid (Master-Slave and Island) schemes of parallel GA, with the Pthreads library on a multi-core parallel architecture. To validate their theoretical analysis, experiments were performed. The hybrid scheme (Master-Slave and Asynchronous Island) gave the best performance on a multi-core architecture. The Island scheme has the best execution time but the worst solution quality; to improve the solution quality when using the Island model, it is better to decrease the number of islands. The Asynchronous Island is faster than the Synchronous one. The Master-Slave scheme has the best solution quality and the worst execution time.
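As an illustration of the master-slave scheme discussed above (and sketched in Algorithm 1), the following simplified example distributes only the fitness evaluations to worker processes while the master applies selection, crossover, and mutation. It is a generic sketch under our own assumptions (binary feature masks, a placeholder fitness), not the implementation used in [8] or [15].

    # Simplified master-slave parallel GA: the master evolves the population,
    # the workers evaluate fitness in parallel.
    import random
    from multiprocessing import Pool

    GENES, POP, GENERATIONS = 50, 40, 20

    def fitness(mask):
        # Placeholder fitness: prefer masks with about 10 selected features.
        # A real wrapper would return a classifier's accuracy on the selected features.
        return -abs(sum(mask) - 10)

    def crossover(a, b):
        point = random.randrange(1, GENES)
        return a[:point] + b[point:]

    def mutate(mask, rate=0.02):
        return [1 - g if random.random() < rate else g for g in mask]

    if __name__ == "__main__":
        population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
        with Pool(processes=4) as workers:
            for _ in range(GENERATIONS):
                scores = workers.map(fitness, population)          # slaves: parallel evaluation
                ranked = [m for _, m in sorted(zip(scores, population), reverse=True)]
                parents = ranked[:POP // 2]                        # master: selection
                population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                              for _ in range(POP)]
        print("best fitness:", max(map(fitness, population)))

The same skeleton becomes an island model if several such loops run independently and occasionally exchange their best individuals, which is the trade-off between speed and solution quality analyzed in [8].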
Soufan et al. [15] developed a web-based tool called DWFS, which is used for feature selection in different problems. This tool follows a hybrid approach of wrapper and filter. First, the filter is used as a preprocessing step to select the top-ranked features based on a tunable, predefined threshold. In the next step, a parallel GA based on the wrapper approach is applied to the selected features to search for feature subsets that increase the classifier accuracy. The scheme of the parallel GA was Master-Slave: the master node creates the initial population and performs the GA steps, while the slave (worker) nodes are used for the fitness evaluation of each chromosome. This implementation was run on 64 cores. For evaluation, they used three different classifiers (a Bayesian classifier, K-nearest neighbor, and a combination of them). The results of the experiments show that the DWFS tool provides many options to enhance feature selection in different biological and biomedical problems.

In [7] Pinho et al. presented a framework called ParJEColi (a Java-based library) for parallel evolutionary computation in bioinformatics applications. The aim of this platform is to make the parallel environment (multi-core, cluster, and grid) easy and transparent to the users. The library adapts itself to the problem and the target parallel architecture. The user can easily configure the parallel model and the target architecture, since ParJEColi encapsulates the parallelization concerns as features.
The explicit steps are implemented through a simple GUI. The experiments to validate this framework were done on two biological datasets and several bioinformatics scenarios. The results indicate that the proposed framework improves the computational performance (decreases the execution time) as well as the solution quality.

B. Parallel CHC

In [1] Peralta et al. presented a parallel version of the CHC evolutionary algorithm, using the MapReduce paradigm, for selecting features in high dimensional datasets in order to improve classification. The parallelization of the CHC algorithm is done using the MapReduce procedure (Hadoop implementation). A cluster of 20 computing nodes was used, and each dataset was split into 512 map tasks. For evaluating their work, three classifiers were used: SVM (support vector machine), logistic regression, and a Bayesian classifier. The results showed that the run time for classification increased as the number of features decreased, except for the Bayesian classifier. They explained this result as follows: if the number of blocks is less than the number of computing machines, some machines remain idle; and if the number of blocks is greater than the number of computing machines, the blocks may not be distributed in an efficient way. They compared parallel CHC with the serial version and concluded that the accuracy of classification increased by using parallel CHC. Furthermore, the parallel version of CHC reduced the run time when the dataset is high dimensional.

C. Parallel PSO

PSO is an efficient optimization technique that has been used to solve the feature selection problem in high dimensional datasets. In [4] Chen et al. used a parallel PSO algorithm to solve two problems at the same time, by creating an objective function that takes into account three variables simultaneously (the selected features, the number of support vectors, and the average accuracy of the SVM), in order to maximize the generalization capability of the SVM classifier. The proposed method, called PTVPSO-SVM (parallel time-variant particle swarm optimization support vector machine), has two phases: 1) the parameter setting of the SVM and the feature selection are performed together; 2) the accuracy of the SVM is evaluated using the set of features and the optimal parameters from the first phase. They used a parallel virtual machine (PVM) with 8 machines and 10-fold cross-validation. The results showed that they could achieve the following aims: increasing the classification accuracy, reducing the execution time compared with sequential PSO, producing an appropriate parameter model, and selecting the most discriminative subset of features.

Feature selection can also be carried out based on rough set theory with a search algorithm, as in [3, 6]. In [6] Qian et al. proposed three parallel attribute reduction (feature selection) algorithms based on MapReduce on Hadoop. The first algorithm was built by constructing the proper (key, value) pairs using rough set theory and implementing the MapReduce functions. The second algorithm realizes the parallel computation of equivalence classes and attribute significances. The last parallel algorithm was designed to acquire the core attributes and a reduct using both data and task parallelism. The experiments were performed on a cluster of computers (17 computing nodes). They considered the performance of the parallel algorithms but did not focus on the classification accuracy, since the sequential and parallel algorithms give the same results. The results showed that the proposed parallel attribute reduction algorithms can deal with high dimensional datasets efficiently and perform better than the sequential algorithms.
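To illustrate the key/value idea behind this kind of MapReduce attribute reduction, the sketch below (our simplification, not the exact algorithms of [6]) groups records by their values on a candidate attribute subset to form equivalence classes and measures how consistently each class maps to a single decision label. The map and reduce phases are simulated here with ordinary Python dictionaries; in Hadoop they would be separate distributed jobs.

    # Simplified illustration of rough-set style equivalence classes via map/reduce-like grouping.
    from collections import defaultdict

    # Toy decision table: each record is (attribute values, decision label).
    records = [((1, 0, 1), "yes"), ((1, 0, 0), "yes"), ((0, 1, 1), "no"),
               ((1, 0, 1), "yes"), ((0, 1, 0), "no"), ((1, 1, 1), "no")]

    def map_phase(records, attribute_subset):
        # Map: key = the record's values on the candidate attributes, value = its decision label.
        for values, label in records:
            key = tuple(values[i] for i in attribute_subset)
            yield key, label

    def reduce_phase(pairs):
        # Reduce: group labels by key; each key corresponds to one equivalence class.
        classes = defaultdict(list)
        for key, label in pairs:
            classes[key].append(label)
        return classes

    def dependency(records, attribute_subset):
        # Fraction of records lying in consistent (single-label) equivalence classes;
        # a simple attribute-significance measure in rough set terms.
        classes = reduce_phase(map_phase(records, attribute_subset))
        consistent = sum(len(v) for v in classes.values() if len(set(v)) == 1)
        return consistent / len(records)

    print(dependency(records, (0,)))      # attribute 0 alone
    print(dependency(records, (0, 1)))    # attributes 0 and 1 together

Because the grouping is driven entirely by keys, the equivalence classes for different data blocks can be computed independently and merged, which is what makes the reduction amenable to MapReduce.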
In [3] Adamczyk used rough set theory for attribute reduction; to increase the efficiency, he implemented a parallel asynchronous PSO for this problem. The parallelization was done by assigning the complex function computations to slave cores, while the main core updates the particles and checks the convergence of the algorithm. From the experiments it was noticeable that the efficiency and speedup of the parallel PSO algorithm grew as the size of the dataset increased. The achieved accuracy was not astonishing, but it was better than that of the classical algorithms.

D. Parallel GPSO

In [2] Garcia-Nieto et al. parallelized a version of PSO called GPSO, which is suitable for the feature selection problem in high dimensional datasets. The proposed method, called PMSO (parallel multi-swarm optimizer), runs a set of parallel sub-PSO algorithms that form an island model. A migration operation exchanges solutions between islands at a certain frequency. The aim of the fitness function is to increase the classification accuracy and reduce the number of selected genes (features). They used the SVM classifier (Support Vector Machine) to assess the accuracy of the selected subset of features. In their experiments, they used a cluster of computers as the parallel architecture.
They found that 8-swarm PMSO was the best choice for parallelization. The results pointed out that this algorithm was better than the sequential version and other methods in terms of performance and accuracy, while it selected few genes for each subset.

E. Parallel SS

In [11] Lopez et al. presented a parallel SS metaheuristic for solving the feature selection problem in classification. They proposed two methods for combining solutions in SS. The first method is called GC (greedy combination): in this strategy, the common features of the combined solutions are added first, then at each iteration one of the remaining features is added to the new solution. The second strategy is called RGC (reduced greedy combination); it starts like GC, but in the next step it considers only the features that appear in solutions of good quality. The parallelization of SS is then obtained by running these two methods (GC, RGC) at the same time on two processors, using different combination methods and parameter settings on each processor. They compared the proposed parallel SS with sequential SS and GA. The results show that the quality of the solutions obtained by the parallel SS is better than that of the solutions obtained by the sequential SS and GA. Also, the parallel SS uses a smaller set of features for classification. The run time is the same for the parallel and sequential SS.

F. Parallel ACO

This subsection shows how parallel ACO is used to solve the feature selection problem for classification in high dimensional datasets. In [17] Meena et al. implemented a parallel ACO to solve the feature selection problem for long documents. The parallelization was done using the MapReduce programming model (Hadoop), which automatically parallelizes the code and data and runs them on a cluster of computing nodes. The wrapper approach, with a Bayesian classifier, is used as the evaluation criterion. Furthermore, the classifier was evaluated using these metrics: precision, recall, accuracy, and F-measure. The enhanced algorithm (parallel ACO) was compared with ACO, enhanced ACO, and two feature selection methods, CHI (a statistical technique) and IG (Information Gain). They used the Bayesian classifier in the evaluation process. The results showed that, for a given fixed quality of the solutions, the proposed algorithm could reduce the execution time. On the other hand, the accuracy of the classifier increased using parallel ACO compared with sequential ACO and the feature selection methods.

TABLE I
SUMMARY OF ALGORITHMS AND PROGRAMMING MODELS

Paper                    | Evolutionary algorithm | Parallel programming model
Peralta et al. [1]       | CHC (type of GA)       | MapReduce
Garcia-Nieto et al. [2]  | GPSO                   | MALLBA
Adamczyk [3]             | PSO                    | Unknown
Chen et al. [4]          | PSO                    | PVM
Liu et al. [5]           | GA                     | Unknown
Lopez et al. [11]        | SS                     | Unknown
Soufan et al. [15]       | GA                     | MPI
Meena et al. [17]        | ACO                    | MapReduce

In [12] Cano et al. parallelized an existing multi-objective ant programming model that is used as the classifier. This algorithm was used for rule mining in high dimensional datasets. The parallelization was done over the data, and each ant encodes a rule; this was achieved by letting each processor perform the same task on a different subset of the data at the same time. In the implementation, they used GPUs, which are many-core parallel processor architectures, following the CUDA model. For evaluation they used these metrics: true positives, false positives, true negatives, false negatives, sensitivity, and specificity. The results indicate that the efficiency of this model increased as the size of the datasets increased.
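As a final illustration, the GC (greedy combination) strategy of the parallel scatter search described in subsection E can be sketched as follows. This is our own simplification: feature subsets are represented as Python sets, and evaluate() is a hypothetical stand-in for the cross-validated classifier score that [11] would actually use.

    # Sketch of the greedy combination (GC) strategy from subsection E (parallel SS).
    def evaluate(subset):
        # Hypothetical placeholder score: prefer smaller subsets.
        return -len(subset)

    def greedy_combination(solution_a, solution_b, subset_size):
        combined = set(solution_a) & set(solution_b)           # start from the common features
        candidates = (set(solution_a) | set(solution_b)) - combined
        while len(combined) < subset_size and candidates:
            # At each iteration add the remaining feature whose addition scores best.
            best = max(candidates, key=lambda f: evaluate(combined | {f}))
            combined.add(best)
            candidates.remove(best)
        return combined

    print(greedy_combination({1, 3, 5, 7}, {1, 2, 3, 8}, subset_size=5))

Running GC and a restricted variant such as RGC concurrently on different processors, each with its own parameters, is what yields the parallel scheme reported in [11].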
V. SUMMARY AND DISCUSSION

A summary of the papers that implement parallel EAs for solving the classification problem in high dimensional datasets is reported in Table 1 and Table 2. Many research papers [2, 3, 7, 8, 9, 10, 12] stated that we can reduce the execution time and achieve acceptable speedups when applying parallel evolutionary algorithms on multiple processors, and we noticed that they achieved a reasonable speedup in many cases.

In Table 2, when comparing the accuracy of the parallel EAs, it is important to notice how many classifiers were used to measure the accuracy. Furthermore, we should consider the metrics that were used to evaluate the classifiers. For example, the parallel PSO and its variants have the highest reported accuracy, but they used only one metric, the success rate; this means that the parallel PSO is not necessarily the most accurate parallel EA based on Table 2. On the other hand, the parallel GA and its variants have the lowest reported accuracy, but they used from two to five metrics for evaluation purposes. Based on these metrics, we can say that the parallel GA is the best parallel EA for feature selection in high dimensional datasets.
TABLE II
SUMMARY OF DATASETS, CLASSIFIERS, AND ACCURACY RESULTS

Paper                   | Dataset      | Classifier          | Metric                 | Accuracy
Peralta et al. [1]      | Epsilon      | Bayesian            | AUC = (TPR+TNR)/2      | 0.71
                        |              | SVM                 |                        | 0.68
                        |              | Logistic Regression |                        | 0.70
                        | ECBDL14-ROS  | Bayesian            |                        | 0.67
                        |              | SVM                 |                        | 0.63
                        |              | Logistic Regression |                        | 0.63
Garcia-Nieto et al. [2] | Colon        | SVM                 | Success rate           | 0.85
                        | Lymp         |                     |                        | 0.97
                        | Leuk         |                     |                        | 0.98
                        | Lung         |                     |                        | 0.97
Adamczyk [3]            | 15 datasets  | —                   | Success rate           | 0.70 (avg)
Chen et al. [4]         | 30 datasets  | SVM                 | Success rate           | 0.87 (avg)
Liu et al. [5]          | Leukemia     | Golub               | Success rate           | 0.88
                        | Colon        |                     |                        | N/A
Lopez et al. [11]       | 12 datasets  | Nearest Neighbor    | Success rate           | 0.86 (avg)
                        |              | Bayesian            |                        | 0.87 (avg)
                        |              | Decision Tree       |                        | 0.86 (avg)
Soufan et al. [15]      | 9 datasets   | K-Nearest Neighbor  | F1, PPV, GMean, ...    | 0.81 (avg GMean)
                        |              | Bayesian            |                        | 0.79 (avg GMean)
Meena et al. [17]       | 2 datasets   | Bayesian            | F-measure, recall, ... | 0.64 (avg)

VI. CONCLUSION

After reviewing the different parallel EAs that are used to solve the feature selection problem in high dimensional datasets, we adopted accuracy as the measure to compare the algorithms' performance. The following points summarize our conclusions about the performance of the algorithms discussed in this paper for feature selection:
• GA and its variants: based on the papers we reviewed, the parallel GA has the highest accuracy.
• PSO and its variants: the parallel PSO has the same accuracy as the sequential PSO.
• SS: the parallel SS gives better results in terms of accuracy than GA and sequential SS.
• ACO: the parallel ACO gives less accurate results than the other parallel EAs.
It is noticeable that PGAs are the most suitable algorithms for feature selection in large datasets, since they achieved the highest accuracy. On the other hand, PACO is time-consuming and less accurate compared with the other PEAs.

References

[1] D. Peralta, S. del Río, S. Ramírez-Gallego, I. Triguero, J. M. Benítez, F. Herrera. "Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach". Mathematical Problems in Engineering, Volume 2015, Article ID 246139, Hindawi Publishing Corporation, (2015).
[2] J. García-Nieto, E. Alba. "Parallel multi-swarm optimizer for gene selection in DNA microarrays". Applied Intelligence 37:255-266, DOI 10.1007/s10489-011-0325-9, (2012).
[3] M. Adamczyk. "Parallel Feature Selection Algorithm based on Rough Sets and Particle Swarm Optimization". ACSIS, Vol. 2, IEEE, DOI: 10.15439/2014F389, (2014).
[4] H.-L. Chen, B. Yang, S. Wang, G. Wang, D. Liu, H.-Z. Li, W. Liu. "Towards an optimal support vector machine classifier using a parallel particle swarm optimization strategy". Applied Mathematics and Computation 239, 180-197, Elsevier, (2014).
[5] J. Liu, H. Iba. "Selecting Informative Genes with Parallel Genetic Algorithms in Tissue Classification". Genome Informatics 12: 14-23, (2001).
[6] J. Qian, D. Miao, Z. Zhang, X. Yue. "Parallel attribute reduction algorithms using MapReduce". Information Sciences, Elsevier, (2014).
[7] J. Pinho, L. Sobral, M. Rocha. "Parallel evolutionary computation in bioinformatics applications". Computer Methods and Programs in Biomedicine 110, 183-191, Elsevier, (2013).
[8] L. Zheng, Y. Lu, M. Ding, Y. Shen, M. Guo, S. Guo. "Architecture-based performance evaluation of genetic algorithms on multi/many-core systems". The 14th IEEE International Conference on Computational Science and Engineering, IEEE, DOI 10.1109/CSE.2011.65, (2011).
[9] S. Santander-Jimenez, M. Vega-Rodríguez.
"Parallel Multiobjective Metaheuristics for Inferring Phylogenies on Multicore Clusters". DOI 10.1109/TPDS.2014.2325828, IEEE, (2014).
[10] J. Petrlik, L. Sekanina. "Towards Robust and Accurate Traffic Prediction Using Parallel Multiobjective Genetic Algorithms and Support Vector Regression". 2015 IEEE 18th International Conference on Intelligent Transportation Systems, (2015).
[11] F. Lopez, M. Torres, B. Batista, J. Perez, J. Vega. "Solving feature selection problem by a parallel Scatter Search". European Journal of Operational Research 169, 477-489, Elsevier, (2006).
[12] A. Cano, J. Olmo, S. Ventura. "Parallel Multi-Objective Ant Programming for Classification Using GPUs". Journal of Parallel and Distributed Computing, November (2012).
[13] E. Alba, G. Luque, S. Nesmachnow. "Parallel metaheuristics: recent advances and new trends". International Transactions in Operational Research 20 (2013) 1-48, DOI: 10.1111/j.1475-3995.2012.00862.x.
[14] H. Liu, H. Motoda (Eds.). "Instance Selection and Construction for Data Mining". Springer, (2001).
[15] O. Soufan, D. Kleftogiannis, P. Kalnis, V. Bajic. "DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm". PLOS ONE, DOI: 10.1371/journal.pone.0117988, February (2015).
[16] A. Gottlieb, G. S. Almasi. "Highly Parallel Computing". Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, (1989).
[17] M. Meena, K. R. Chandran, A. Karthik, A. Samuel. "A parallel ACO algorithm to select terms to categorise longer documents". Int. J. Computational Science and Engineering, Vol. 6, No. 4, (2011).
[18] A. Sameh, A. Ayman, N. Hasan. "Parallel Ant Colony Optimization". International Journal of Research and Reviews in Computer Science (IJRRCS), Vol. 1, No. 2, June (2010).

Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Datasets

  • 1.
    Parallel Evolutionary Algorithmsfor Feature Selection in High Dimensional Datasets Safa Adi #1 , Mohammed Aldasht ∗2 # Computer and IT Department, Palestine Polytechnic University Palestine 1 safa_adi@ppu.edu ∗ Computer Engineering Department, Palestine Polytechnic University Palestine 2 mohammed@ppu.edu Abstract—Feature selection in high-dimensional datasets is considered to be a complex and time-consuming problem. To enhance the accuracy of classification and reduce the execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper, we make a review for the most recent works which handle the use of PEAs for feature selection in large datasets. We have classified the algorithms in these papers into four main classes (Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scattered Search (SS), and Ant Colony Optimization (ACO)). The accuracy is adopted as a measure to compare the efficiency of these PEAs. It is noticeable that the Parallel Genetic Algorithms (PGAs) are the most suitable algorithms for feature selection in large datasets; since they achieve the highest accuracy. On the other hand, we found that the Parallel ACO is time- consuming and less accurate comparing with other PEA. Index Terms: Evolutionary algorithms, parallel com- puting, classification, feature selection, high dimensional dataset. I. INTRODUCTION Nowadays many disciplines have to deal with high dimensional datasets which involve a huge number of features. So we need data preprocessing methods and data reduction models in order to simplify input data. There are two main types of data reduction models [1]. The first is: instance selection and instance generation processes are focused on the instance level. (i.e. select a representative portion of data that can fulfill a data mining task as if the whole data is used) [14]. The second is: feature selection and feature extraction models which work at the level of characteristics. These models attempt to reduce a dataset by removing noisy, irrelevant, or redundant features. Feature selection is a necessary preprocessing step in analyzing big datasets. It often leads to smaller data that will make the classifier training better and faster [3]. Feature selection is a problem with big datasets. In order to make classification faster and more accurate, we need to select the subset of features that are discriminative. Evolutionary algorithms like Genetic algorithms, Swarm intelligence optimization, Ant colony optimization, etc. These methods can be effective for this problem, but they require a huge amount of computation (long execution time), also memory consumption. In order to overcome these weaknesses, parallel computing can be used. In this survey, we will review a set of papers about parallel evolutionary algorithms that used for feature selection in large datasets. Furthermore, we will compare the performance of different algorithms and environment. The rest of the paper is organized as follow: Section2 is Background about feature selection approaches and parallel architecture in general. Section3 talk about parallel evolutionary algorithms. Section 4 will discuss and review many papers which talk about the feature selection problem by using parallel computing. Section5 contains the summary of the survey, the last section is the conclusion and future work. II. BACKGROUND In general, there are three classes of feature selection: filter-based, wrapper, and embedded. 
The filter approach analyzes the features statistically and ignores the classifier [18]. Most of filter-based methods perform two operations, ranking and subset selection. In some cases, these two operations are performed sequentially, first the ranking, then the selection, in other cases only the selection is carried out. These methods are effective in terms of execution time. However, filter methods sometimes select redundant variables; since they don’t consider the relationships between variables. Therefore, they are mainly used as a pre-processing method. In the wrapper model [15], the process of feature selection is depending on the performance of a specific classifier. But its disadvantages are time-consuming and over fitting. The last International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018 181 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 2.
    method for featureselection is the embedded. In this method, the feature selection process and the learning algorithm (tuning the parameters) are combined to each other[6, 15]. The selection of optimal feature subset is an optimization problem that proved to be NP-hard, complex, and time- consuming problem [13]. Two major approaches are traditionally used to tackle NP-hard problems, as seen in Figure1: exact methods and metaheuristics. Exact methods allow exact solution to be found, but this approach is impractical since it is extremely time consuming for real world problems. On the other hand, metaheuristics are used for solving complex and real world problems. Because metaheuristics provide suboptimal (sometimes optimal) solution in reasonable time [2, 11, 13]. As seen in Figure1, Metaheuristics are divided into two categories [13]: • Trajectory-based (exploitation-oriented methods): the well-known metaheuristics families based on the manip- ulation of a single solution. Include Simulated Annealing (SA), Tabu Search (TS), Iterated Local Search (ILS), Variable Local Search (VNS), and Greedy Randomized Adaptive Search Procedures (GRASP). • Population-based (exploration-oriented methods): the well-known metaheuristics families based on the manipulation of a population of solutions. Include PSO, ACO, SS, Evolutionary Algorithms (EAs), Differential Evolution (DE), Evolutionary Strategies (ES), and Estimation Distribution Algorithms (EDA). Fig. 1. Approaches for handling NP-hard problems Metaheuristics algorithms have proved to be suitable tools for solving the feature selection accurately and efficiently for large dimensions in big datasets [2]. The main problems when dealing with big datasets are: The first is execution time because the complexity of the metaheuristics methods for feature selection is at least O(n2 D), where n is the number of instances and D is the number of features. The second is memory consumption since most methods for feature selection need to store the whole dataset in memory. Therefore, the researchers try to parallelize the sequential metaheuristics to improve their efficiency for feature selection on large datasets. There are many programming models and paradigms, such as MapReduce (Hadoop, spark), MPI, OpenMP, CUDA [1, 6, 13]. Parallel computing can be process interaction (shared memory, message passing) or problem decomposition (task or data parallelization) [6]. Parallel computing is a good solution for these problems since many calculations are carried out simultaneously in the task and/or data [6]. Population-based metaheuristics are naturally prone to parallelize since most of their variation operators can be easily undertaken in parallel [2, 13]. Parallel implementations of metaheuristics are an effective alternative to speed up sequential metaheuristics; by reducing the search time for solutions of optimization problems. Furthermore, they lead to the more precise random algorithm and improve the quality of solutions [11]. As seen in Figure2, the implementation of parallel metaheuristics is divided into two categories [13]. Fig. 2. Parallel implementation of metaheuristics Parallel evolutionary algorithms are used in many works rather than feature selection, such as inferring phylogenies, traffic prediction. In [9] Santander et al., used MPI/OpenMP with a hybrid multiobjective evolutionary algorithm (fast non- dominated sorting genetic algorithms and firefly algorithm); for phylogenetic reconstruction (Inferring evolutionary trees). 
In [10] Jiri at al., used parallel multiobjective GA with OpenMP. In order to make traffic prediction more accurate. Master-Slave scheme of GA was implemented on multi-core parallel architecture. They reduced the computational time, but it was successful for short-term traffic prediction. III. OVERVIEW OF PARALLEL EVOLUTIONARY ALGORITHMS FOR FEATURE SELECTION Feature selection algorithms are used to find an optimal subset of relevant features in the data. In this section we will talk about parallel evolutionary algorithms that are used for feature selection problem in large datasets. We will illustrate the steps of six algorithms (PGA, PCHC, PPSO, PGPSO, PSS, and PACO). International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018 182 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 3.
    A. Parallel Geneticalgorithm (PGA) In order to increase the efficiency and reduce the execution time of the genetic algorithm (GA); the researchers used par- allel GA. Algorithm 1 presents the parallel GA methodology, with the master-slave model of parallel GA. Algorithm 1 Parallel genetic algorithm [10] Create initial population Evaluate initial population Create slaves while not done do Start slave Wait for slave to finish Run mutation operator end while for i=1 to slave iterations do Select individuals Run crossover operator Evaluate offsprings if solution found then set done=True end if end for B. Parallel CHC algorithm (PCHC) A CHC is a non-traditional GA, which combines a con- servative selection strategy (that always preserves the best individuals found so far), that produces offsprings that are at the maximum hamming distance from their parent. The main processes of CHC algorithm are [1]: • Half-Uniform Crossover (HUX): This will produce two offsprings, which are maximally different from their two parents. • Elitist selection: this will keep the best solutions in each generation. • Incest prevention: this step prevents two individuals to mate if the similarity between them greater than a thresh- old. • The Restarting process: if the specified population stagnated, then this step generated a new population by choosing the best individuals. C. Particle Swarm Optimization (PSO) This subsection handles the geometric particle swarm optimization (GPSO) and shows the algorithm that used to parallelize PSO or GPSO. 1) Geometric Particle Swarm Optimization (GPSO): GPSO is a recent version of PSO. The key issue in GPSO is the using a multi-parental recombination of solutions (particles). In the first phase, a random initialization of particles created. Then the algorithm evaluates these particles to update the historical and social positions. Finally, the three parents (3PMBCX) move the particles, as shown in Algorithm 2: Algorithm 2 GPSO algorithm [2] S:SwarmInitialization() while not stop condition do do for each particle i of the swarm S do do evaluate(solution(xi)) update(velocity equation (hi)) update(global best solution (gi)) end for for each particle i of the swarm S do do xi:3PMBCX ((xi, wa), (gi, wb), (hi, wc)) mutate(xi) end for end while Output: best solution found 2) Parallel Multi Swarm Optimization (PMSO): Parallel multi swarm optimization presented in [2], it was defined in analogy with parallel GA as a pair of (S, M), where S is a collection swarm, and M is a migration policy. Algorithm 3 depicts the parallel PSO methodology. Algorithm 3 Multi swarm optimization [2] DO IN PARALLEL for each i 1,...,m initialize(Si) while not stop condition do do iterate Si for n steps /* PSO evolution */ for each Sj (Si) do do send particles in s(Si) to Sj end for for each Sj such that Si (Sj ) do do receive particles from Sj replace particles in Si according to r end for end while Output: best solution ever found in the multi-swarm D. Parallel Scatter Search (PSS) Scatter search is an evolutionary method that was success- fully applied to hard optimization problems. It uses strategies for search diversification and intensification that have proved effective in a variety of optimization problems, see Algorithm 4. E. Parallel Ant Colony Optimization (PACO) When dealing with huge search space, parallel computing techniques usually applied to improve the efficiency. Parallel ACO algorithms can achieve high-quality solutions in reasonable execution times comparing with sequential ACO [18]. 
In Algorithm 5, the methodology of PACO is presented. International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018 183 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 4.
    Algorithm 4 Parallelscatter search methodology [11] Create Population (Pop, PopSize) Generate ReferenceSet (RefSet, RefSetSize) while Stopping Criterion1 do while Stopping Criterion2 do Select Subset (Subset, SubsetSize) for each processor r=1 to n do in parallel do Combine Solutions (SubSet, CurSol) Improve Solution (CurSol, ImpSol) end for end while Update ReferenceSet (RefSet) end while Algorithm 5 Parallel ant colony optimization methodology [18] Generate Ants Initialize N processors Multicast to all slaves processors N and the task ids of all slaves for each slave do do Send a number between 0 and N that identifies the task inside the program end for while not all slaves have sent back solution do Wait for solution if a slave returns a solution that is better than any solution received then Multicast this solution to all slaves end if end while Return the best solution IV. PARALLEL EVOLUTIONARY ALGORITHMS FOR FEATURE SELECTION We reviewed a set of research papers, which were dealing with feature selection problem for high dimensional datasets in a parallel environment and using parallel evolutionary algorithms. Let us discuss these studies in the following subsections. A. Parallel GA Liu et al., [5] used parallel GA for selecting informative genes (features) in tissue classification, using wrapper approach. The main purpose was to find the subset of features with fewer elements and higher accuracy. The parallelization of GA performed by dividing the population into sub- populations, and then the GA run on each sub-population. Therefore, the searching for the optimal subset of genes can be on several CPUs/computers at the same time. For evaluation, the Golub classifier was used. This classifier introduced by the authors and it depend on the sign of the results for classification; if the sign is positive the sample x belongs to class 1, else if it negative the sample x belongs to class 2. This classifier used only if the datasets have two classes. The accuracy of the classifier tested by using the LOOCV (leave one out cross validation) method. The results showed that using the parallel GA increased the accuracy, and reduced the number of genes that used for classification. In [8] Zheng et al., analyzed the execution speed and solution quality of many parallel GA schemes theoretically. Furthermore, they pointed to the best scheme of parallel GA that used on multi-core architecture. This paper considered the relationship between speed and parallel architecture along with solution quality. They analyzed (Master-Slave, Synchronous Island, Asynchronous Island, Cellular, and hybrid scheme of Master- Slave and Island) schemes of parallel GA, with Pthread library on multi-core parallel architecture. To validate their theoretical analyzing an experiments performed. The hybrid scheme of (Master-Slave and Asynchronous Island) was the best scheme in performance using multi-core architecture. The Island scheme has the best execution time, but the worst solution quality. To improve the solution quality when using Island model it is better to decrease the number of islands. The Asynchronous Island is faster than the Synchronous. The Master-Slave scheme has the best solution quality and the worst execution time. Soufan at el., [15] developed a web-based tool called DWFS, which used for feature selection for different problems. This tool followed a hybrid approach of wrapper and filter. 
First, the filter used as preprocessing and select the top ranked features based on tunable and a predefined threshold. In the next step, parallel GA based on wrapper approach applied to the selected features to search for subset features that increase the classifier accuracy. The scheme of parallel GA was Master-Slave; the master node used to create initial population and GA steps. While the slave (worker) nodes used for fitness evaluation of each chromosome, this implementation is performed on 64 core. For evaluation, they used three different classifiers (Bayesian classifier, K-nearest neighbor, and a combination of them). The results of the experiments show that DWFS tool provided many options to enhance the feature selection problem in different biological and biomedical problems. In [7] Pinho et al., presented a framework called ParJEColi (java-based library) for a parallel evolutionary algorithm in bioinformatics applications. The aim of this platform was to make the parallel environment (multi-core, cluster, and grid) easy and transparent to the users. This library adapted itself to the problem and the target parallel architecture. The user can easily configure the parallel model and the target architecture; since, ParJEColi encapsulated the parallelization International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018 184 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 5.
    concerns as features.The explicit steps implemented by a simple GUI. The experiments for validation this framework were done on 2 biological dataset and many bioinformatics scenarios. The results indicate that the proposed framework improves the computational performance (decreases execution time) also the solution quality. B. Parallel CHC In [1] Peralta et al., presented a parallel evolutionary algorithm called CHC algorithm by using the MapReduce paradigm for selecting features in high dimensional datasets to improve the classification. The parallelization of CHC algorithm is done by using MapReduce procedure (Hadoop implementation). A cluster of computers of 20 computing nodes were used. Each dataset split into 512-map task. For evaluating their work, three classifiers where used SVM (support vector machine), logistic regression, and Bayesian classifier. The results showed that the run time for classification increased as the number of features decreased, except for Bayesian classifier. They explained this result as follow: if the number of blocks less than the number of computing machines; this leads to have some machines remain idle. In addition, if the number of blocks greater than the number of computing machines, the blocks maybe will not distributed in efficient way. They compared parallel CHC with the serial version, and they concluded that the accuracy of classification increased by using parallel CHC. Furthermore, the parallel version of CHC reduced the run time when the datasets is high dimensional. C. Parallel PSO PSO is an efficient optimization technique, it used to solve the problem of feature selection in high dimensional datasets. In [4] Chen et al., used the parallel PSO algorithm for solving two problems at the same time. By creating an objective function that takes into account three variables at the same time (the selected features, the number of support vectors, and average accuracy of SVM). In order to maximize the capability of SVM classifier in generalization. The proposed method called PTVPSO-SVM (parallel time variant particle swarm optimization support vector machine), it had two phase: 1) the parameter settings of SVM and feature selection work together. 2) the accuracy of SVM evaluated using the set of features and the optimal parameters from the first phase. They used parallel virtual machine (PVM) with 8 machines; and 10-fold cross validation. The results showed that they could achieve the following aims: increasing the accuracy classification, reducing the execution time comparing with sequential PSO, producing an appropriate model of parameters, and selecting the most discriminative subset of features. Feature selection can be carried out based on rough set theory with searching algorithm as in [3, 6]. In [6] Qian et al., proposed three parallel attribute reduction (feature selection) algorithms based on MapReduce on Hadoop. The first algorithm was built by constructing the proper (key, value) by rough set theory and implementing MapReduce functions. The second algorithms were done by realizing the parallel computation of equivalence classes and attribute significances. The last parallel algorithm was designed to acquire the core attributes and a reduce in both data and parallel task. The experiments are performed on a cluster of computers (17 computing node). They considered the performance of the parallel algorithms, but they did not focus on the classification accuracy; since the sequential and parallel algorithms gave the same results. 
The results showed that the proposed parallel attribute reduction algorithms could deal with high dimensional datasets in an efficient way and better than the sequential algorithms. In [3] Adamczyk, use rough set theory for attribute reduction, to increase the efficiency he implemented parallel Asynchronous PSO for this problem. The parallelization was done by assigning the complex function computations in slave cores and the main core make the updating particle and checking the convergence of the algorithm. From their experiments it was noticeable that the efficiency and speedup of parallel PSO algorithm were raising as the size of dataset increased. The achievable accuracy was not astonishing, but it was better than the classical algorithms. D. Parallel GPSO In [2] Garcia-Nieto et al., parallelized a version of PSO called GPSO which is suitable for feature selection problem in high dimensional datasets. The proposed method was called PMOS (Parallel multi-swarm optimizer). Which was done by running a set of parallel sub PSOs algorithms, which forming an island model. Migration operation exchanged solutions between islands based on a certain frequency The aim of the fitness function increasing the classification accuracy and reduce the number of selected genes (features). They used the SVM classifier (Support Vector Machine) to prove the accuracy of the selected subset of features. In their experiments, they used a cluster of computers as a International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018 185 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 6.
    parallel architecture. Theyfound that 8-swarm PMSO was the best choice for parallelization. The results pointed out that this algorithm was better than the sequential version and other methods in term of performance and accuracy while it selected few genes for each subset. E. Parallel SS In [11] Lopez et al., present a parallel SS metaheuristics for solving feature selection problem in classification. They proposed two methods for combining solutions in SS. The first method is called GC (greedy combination): in this strategy, the common features of the combined solutions are added, then at each iteration one of the remaining features is added to any new solution. The second strategy is called RGC (reduced greedy combination), it has the same start as GC, but in the next step, it considers only the features that appear in solutions with good quality. Then the parallelization of SS is obtained by running these two methods (GC, RGC) at the same time on two processors. Using different combination methods and parameters settings at each processor. They compared the proposed parallel SS with sequential SS and GA. The results show that the quality of solution in parallel SS is better than solutions which was obtained from the sequential SS and GA. Also, the parallel SS use a smaller set of features for classification. The run time is the same for parallel and sequential SS. F. Parallel ACO This subsection shows how the parallel ACO is used to solve feature selection problem for classification in high dimensional datasets. In [17] Meena et al., implemented a parallel ACO to solve the feature selection problem for long documents. The parallelization was done using MapReduce programming model (Hadoop) that automatically parallelize the code and data then run them on a cluster of computing nodes. The wrapper approachis used as evaluation criteria that used Bayesian classifier. Furthermore, the accuracy of the classifier was based on these metrics: precision, recall, accuracy and F-measure. The enhanced algorithm (parallel ACO) was compared with ACO, enhanced ACO, and two feature selection methods, CHI (Statistical technique) and IG (Information Gain). They used Bayesian classifier in evaluation process. The results showed that for a given fixed quality of the solutions the proposed algorithm could reduce the execution time but without considered the solution quality. On the other hand, the accuracy of the classifier was increased using parallel TABLE I SUMMARY OF ALGORITHMS AND PROGRAMMING MODELS Paper Used evolutionary algorithm Parallel Programming model Peralta et al. [1] CHC (Type of GA) MapReduce Garcia-Nieto et al. [2] GPSO MALLBA Adamczyk [3] PSO Unknown Chen et al. [4] PSO PVM Liu et al. [5] GA Unknown Lopez et al. [11] SS Unknown Soufan et al. [15] GA MPI Meena et al. [17] ACO MapReduce ACO comparing with sequential ACO and feature selection methods. In [12] Cano et al., parallelized an existing multi-objective ant programming model that used as the classifier. This algorithm was used for rule mining in high dimensional datasets. The parallelization was done on data and each ant encoded a rule. This was achieved by let each processor perform the same task on a different subset of the data at the same time. In the implementation, they used GPUs, which are multi-core and parallel processor units architecture. This parallel model Followed CUDA method. For evaluation they used these metrics: true positive, false positive, true negative, false negative, sensitivity, and specificity. 
E. Parallel SS
In [11], Lopez et al. present a parallel SS metaheuristic for solving the feature selection problem in classification. They proposed two methods for combining solutions in SS. The first, GC (greedy combination), first adds the features common to the combined solutions and then, at each iteration, adds one of the remaining features to the new solution. The second, RGC (reduced greedy combination), starts like GC but in the next step considers only the features that appear in solutions of good quality. The parallelization of SS is obtained by running these two methods (GC and RGC) at the same time on two processors, with a different combination method and parameter setting on each processor. They compared the proposed parallel SS with the sequential SS and with a GA. The results show that the solution quality of the parallel SS is better than that of the sequential SS and the GA, and that the parallel SS uses a smaller set of features for classification. The run time is the same for the parallel and the sequential SS.
F. Parallel ACO
This subsection shows how parallel ACO is used to solve the feature selection problem for classification in high-dimensional datasets. In [17], Meena et al. implemented a parallel ACO to solve the feature selection problem for long documents. The parallelization was done with the MapReduce programming model (Hadoop), which automatically distributes the code and data and runs them on a cluster of computing nodes. The wrapper approach, based on a Bayesian classifier, is used as the evaluation criterion, and the classifier is assessed with these metrics: precision, recall, accuracy, and F-measure. The enhanced algorithm (parallel ACO) was compared with ACO, enhanced ACO, and two feature selection methods, CHI (a statistical technique) and IG (Information Gain), using the Bayesian classifier in the evaluation process. The results showed that, for a given fixed quality of the solutions, the proposed algorithm reduces the execution time; moreover, the classifier accuracy was higher with the parallel ACO than with the sequential ACO and the other feature selection methods.
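The map/reduce decomposition behind this kind of wrapper evaluation can be illustrated with the plain-Python sketch below. It is not Hadoop code and not the implementation of [17]: the mappers score one candidate term subset on disjoint partitions of the documents, the reducer sums the partial confusion counts and derives precision, recall, and F-measure, and a trivial rule classifier stands in for the Bayesian classifier.

from collections import Counter
from multiprocessing import Pool
import numpy as np

def map_partition(args):
    # Mapper: score one candidate term subset on one partition of the documents.
    X_part, y_part, subset = args
    pred = (X_part[:, subset].sum(axis=1) > 0).astype(int)
    return Counter(tp=int(((pred == 1) & (y_part == 1)).sum()),
                   fp=int(((pred == 1) & (y_part == 0)).sum()),
                   fn=int(((pred == 0) & (y_part == 1)).sum()))

def reduce_counts(partials):
    # Reducer: aggregate the partial confusion counts and derive the metrics.
    total = sum(partials, Counter())
    precision = total["tp"] / max(total["tp"] + total["fp"], 1)
    recall = total["tp"] / max(total["tp"] + total["fn"], 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f_measure

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = (rng.random((1200, 300)) < 0.05).astype(int)      # documents x terms
    y = (X[:, :3].sum(axis=1) > 0).astype(int)            # labels driven by three terms
    subset = [0, 1, 2, 7, 9]                               # terms chosen by one ant
    partitions = [(X[i::4], y[i::4], subset) for i in range(4)]
    with Pool(4) as mappers:                               # mappers run in parallel
        print(reduce_counts(mappers.map(map_partition, partitions)))

In an actual Hadoop job the partitions would be HDFS splits and the reducer would run on the cluster; the sketch only shows how the evaluation of one candidate subset decomposes into independent map tasks followed by a single aggregation step.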
In [12], Cano et al. parallelized an existing multi-objective ant programming model used as a classifier for rule mining in high-dimensional datasets. The parallelization is applied to the data: each ant encodes a rule, and every processor performs the same task on a different subset of the data at the same time. The implementation runs on GPUs, which are massively parallel multi-core processors, and follows the CUDA programming model. For evaluation they used these metrics: true positives, false positives, true negatives, false negatives, sensitivity, and specificity. The results indicate that the efficiency of this model increased as the size of the datasets increased.
V. SUMMARY AND DISCUSSION
The papers that apply parallel EAs to the classification problem in high-dimensional datasets are summarized in Table I and Table II.

TABLE I
SUMMARY OF ALGORITHMS AND PROGRAMMING MODELS

Paper                    Evolutionary algorithm    Parallel programming model
Peralta et al. [1]       CHC (a type of GA)        MapReduce
Garcia-Nieto et al. [2]  GPSO                      MALLBA
Adamczyk [3]             PSO                       Unknown
Chen et al. [4]          PSO                       PVM
Liu et al. [5]           GA                        Unknown
Lopez et al. [11]        SS                        Unknown
Soufan et al. [15]       GA                        MPI
Meena et al. [17]        ACO                       MapReduce

Many research papers [2, 3, 7, 8, 9, 10, 12] state that applying parallel evolutionary algorithms on multiple processors can reduce the execution time and achieve acceptable speedups, and a reasonable speedup was indeed reported in many cases.
When comparing the accuracy of the parallel EAs in Table II, it is important to notice how many classifiers were used to measure the accuracy, and which metrics were used to evaluate them. For example, the parallel PSO and its variants show the highest accuracy values, but they used only one metric, the success rate; therefore, based on Table II, the parallel PSO is not necessarily the most accurate parallel EA. On the other hand, the parallel GA and its variants show the lowest raw accuracy, but they were evaluated with two to five metrics. Based on these metrics, we can say that the parallel GA is the best parallel EA for feature selection in high-dimensional datasets.

TABLE II
SUMMARY OF DATASETS, CLASSIFIERS, AND ACCURACY RESULTS

Paper                    Dataset        Classifier            Metric                   Accuracy
Peralta et al. [1]       Epsilon        Bayesian              AUC = (TPR + TNR) / 2    0.71
                                        SVM                                            0.68
                                        Logistic Regression                            0.70
                         ECBDL14-ROS    Bayesian                                       0.67
                                        SVM                                            0.63
                                        Logistic Regression                            0.63
Garcia-Nieto et al. [2]  Colon          SVM                   Success rate             0.85
                         Lymp                                                          0.97
                         Leuk                                                          0.98
                         Lung                                                          0.97
Adamczyk [3]             15 datasets    —                     Success rate             0.70 (avg)
Chen et al. [4]          30 datasets    SVM                   Success rate             0.87 (avg)
Liu et al. [5]           Leukemia       Golub                 Success rate             0.88
                         Colon                                                         N/A
Lopez et al. [11]        12 datasets    Nearest Neighbor      Success rate             0.86 (avg)
                                        Bayesian                                       0.87 (avg)
                                        Decision Tree                                  0.86 (avg)
Soufan et al. [15]       9 datasets     K-Nearest Neighbor    F1, PPV, GMean, ...      0.81 (avg GMean)
                                        Bayesian                                       0.79 (avg GMean)
Meena et al. [17]        2 datasets     Bayesian              F-measure, recall, ...   0.64 (avg)

VI. CONCLUSION
We have reviewed different parallel EAs used to solve the feature selection problem in high-dimensional datasets, adopting the accuracy as the measure to compare the performance of the algorithms. The following points summarize our conclusions about the performance of the reviewed algorithms for feature selection:
• GA and its variants: based on the papers we reviewed, the parallel GA has the highest accuracy.
• PSO and its variants: the parallel PSO has the same accuracy as the sequential PSO.
• SS: the parallel SS gives better results, in terms of accuracy, than the GA and the sequential SS.
• ACO: the parallel ACO gives less accurate results than the other parallel EAs.
It is noticeable that PGAs are the most suitable algorithms for feature selection in large datasets, since they achieved the highest accuracy. On the other hand, the parallel ACO is time-consuming and less accurate compared with the other parallel EAs.
References
[1] D. Peralta, S. del Río, S. Ramírez-Gallego, I. Triguero, J. Benítez, F. Herrera. "Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach". Mathematical Problems in Engineering, Article ID 246139, Hindawi Publishing Corporation, 2015.
[2] J. García-Nieto, E. Alba. "Parallel multi-swarm optimizer for gene selection in DNA microarrays". Applied Intelligence 37:255-266, 2012. DOI 10.1007/s10489-011-0325-9.
[3] M. Adamczyk. "Parallel Feature Selection Algorithm based on Rough Sets and Particle Swarm Optimization". ACSIS, vol. 2, IEEE, 2014. DOI 10.15439/2014F389.
[4] H.-L. Chen, B. Yang, S. Wang, G. Wang, D. Liu, H.-Z. Li, W. Liu. "Towards an optimal support vector machine classifier using a parallel particle swarm optimization strategy". Applied Mathematics and Computation 239:180-197, Elsevier, 2014.
[5] J. Liu, H. Iba. "Selecting Informative Genes with Parallel Genetic Algorithms in Tissue Classification". Genome Informatics 12:14-23, 2001.
[6] J. Qian, D. Miao, Z. Zhang, X. Yue. "Parallel attribute reduction algorithms using MapReduce". Information Sciences, Elsevier, 2014.
[7] J. Pinho, L. Sobral, M. Rocha. "Parallel evolutionary computation in bioinformatics applications". Computer Methods and Programs in Biomedicine 110:183-191, Elsevier, 2013.
[8] L. Zheng, Y. Lu, M. Ding, Y. Shen, M. Guo, S. Guo. "Architecture-based performance evaluation of genetic algorithms on multi/many-core systems". 14th IEEE International Conference on Computational Science and Engineering, IEEE, 2011. DOI 10.1109/CSE.2011.65.
[9] S. Santander-Jiménez, M. Vega-Rodríguez.
"Parallel Multiobjective Metaheuristics for Inferring Phylogenies on Multicore Clusters". IEEE, 2014. DOI 10.1109/TPDS.2014.2325828.
[10] J. Petrlik, L. Sekanina. "Towards Robust and Accurate Traffic Prediction Using Parallel Multiobjective Genetic Algorithms and Support Vector Regression". IEEE 18th International Conference on Intelligent Transportation Systems, 2015.
[11] F. Lopez, M. Torres, B. Batista, J. Perez, J. Vega. "Solving feature selection problem by a parallel Scatter Search". European Journal of Operational Research 169:477-489, Elsevier, 2006.
[12] A. Cano, J. Olmo, S. Ventura. "Parallel Multi-Objective Ant Programming for Classification Using GPUs". Journal of Parallel and Distributed Computing, November 2012.
[13] E. Alba, G. Luque, S. Nesmachnow. "Parallel metaheuristics: recent advances and new trends". International Transactions in Operational Research 20:1-48, 2013. DOI 10.1111/j.1475-3995.2012.00862.x.
[14] H. Liu, H. Motoda (Eds.). "Instance Selection and Construction for Data Mining". Springer, 2001.
[15] O. Soufan, D. Kleftogiannis, P. Kalnis, V. Bajic. "DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm". PLOS ONE, February 2015. DOI 10.1371/journal.pone.0117988.
[16] A. Gottlieb, G. S. Almasi. "Highly Parallel Computing". Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, 1989.
[17] M. Meena, K. R. Chandran, A. Karthik, A. Samuel. "A parallel ACO algorithm to select terms to categorise longer documents". International Journal of Computational Science and Engineering, vol. 6, no. 4, 2001.
[18] A. Sameh, A. Ayman, N. Hasan. "Parallel Ant Colony Optimization". International Journal of Research and Reviews in Computer Science (IJRRCS), vol. 1, no. 2, June 2010.