Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance

1. Introduction

An algorithm is a sequence of computational steps that accomplishes a specific task. An algorithm works on a given input and terminates in a well-defined state. The basic properties of an algorithm are input, output, definiteness, effectiveness, and finiteness. The purpose of developing an algorithm is to solve a general, well-specified problem.

A concern while designing an algorithm is the kind of computer on which the algorithm will be executed. The two forms of computer architecture are the sequential computer and the parallel computer; accordingly, we have sequential as well as parallel algorithms. Algorithms executed on sequential computers simply perform a sequence of steps to solve the given problem; such algorithms are known as sequential algorithms. However, a problem can also be solved by dividing it into sub-problems that are executed in parallel; the results of these sub-problems are then combined to produce the final solution. In such situations more than one processor is required, and the processors must communicate with each other to produce the final output. This environment is the parallel computer, and the special kind of algorithms designed for it are called parallel algorithms. Parallel algorithms depend on the kind of parallel computer they are designed for; hence, for a given problem, different parallel algorithms may need to be designed for different parallel architectures.

A parallel computer is a set of processors that are able to work cooperatively to solve a computational problem. This definition is broad enough to include parallel supercomputers with hundreds or thousands of processors, networks of workstations, multi-processor workstations, and embedded systems. Parallel computers can be represented by various models, such as the random access machine (RAM), the parallel random access machine (PRAM), interconnection networks, etc.
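To make the divide, solve-in-parallel, and combine pattern concrete, here is a minimal sketch in Python; the function names and the choice of a summation task are illustrative, not from the source:

```python
# Minimal sketch of the parallel pattern described above: divide the input
# into sub-problems, solve them concurrently, and combine the results.
from concurrent.futures import ProcessPoolExecutor

def solve_subproblem(chunk):
    """Solve one sub-problem; here, simply sum a chunk of numbers."""
    return sum(chunk)

def parallel_sum(data, num_workers=4):
    # Divide: split the data into one chunk per worker.
    size = (len(data) + num_workers - 1) // num_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Solve in parallel: each worker processes its own chunk.
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(solve_subproblem, chunks))
    # Combine: merge the partial results into the final answer.
    return sum(partial_results)

if __name__ == "__main__":
    print(parallel_sum(list(range(1_000_000))))  # 499999500000
```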
While designing a parallel algorithm, the computational power of the various models can be analyzed and compared, and parallelism can be introduced for a given problem on a specific model once the characteristics of that model are understood. Analyzing a parallel algorithm on different models helps determine the best model for the problem, based on the resulting time and space complexity.

2. Analysis of Parallel Algorithms

A generic algorithm is mainly analyzed on the basis of two parameters: time complexity (execution time) and space complexity (amount of space required). Usually we give much more importance to time complexity than to space complexity. The subsequent sections highlight the criteria for analyzing the complexity of parallel algorithms. The fundamental parameters required for the analysis of parallel algorithms are as follows:

• Time complexity
• The total number of processors required
• The cost involved

2.1 Time Complexity

Most people who implement algorithms want to know how much of a particular resource (such as time or storage) a given algorithm requires. Parallel architectures are designed to improve computation power, so the major concern when evaluating a parallel algorithm is determining the amount of time it takes to execute. Usually, time complexity is calculated from the total number of steps executed to produce the desired output. In parallel algorithms, the resources consumed are both the processor cycles on each processor and the communication overhead between processors: in a computation step, a processor performs a local arithmetic or logic operation, and thereafter the processors communicate with each other to exchange messages and/or data. Hence, the time complexity is calculated from both the computation cost and the communication cost involved.

The time complexity of an algorithm varies with the instance of the input for a given problem. For example, the already sorted list (10, 17, 19, 21, 22, 33) will take less time to sort than the same list in reverse order (33, 22, 21, 19, 17, 10).
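As a quick illustration of this input dependence, the sketch below counts the steps of a plain insertion sort on the two lists above; insertion sort is my choice of example here, as the source does not name a specific algorithm:

```python
# Illustration: the same algorithm does very different amounts of work on
# different instances of the input. Insertion sort is used only as an example.
def insertion_sort_steps(items):
    a = list(items)
    steps = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:  # each shift counts as one step
            a[j + 1] = a[j]
            j -= 1
            steps += 1
        a[j + 1] = key
    return steps

print(insertion_sort_steps([10, 17, 19, 21, 22, 33]))  # best case: 0 steps
print(insertion_sort_steps([33, 22, 21, 19, 17, 10]))  # worst case: 15 steps
```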
The time complexity of an algorithm is categorized into three forms: i) best case complexity, ii) average case complexity, and iii) worst case complexity. The best case complexity is the least amount of time required by the algorithm for a given input size; the average case complexity is the average running time over inputs of that size; and the worst case complexity is the maximum amount of time required for any input of that size. The main factors in analyzing time complexity are the:

• Algorithm
• Parallel computer model
• Specific set of inputs

Mostly, the time complexity of an algorithm is a function of the size of the input.

2.2 Number of Processors

Another factor in the analysis of parallel algorithms is the total number of processors required to solve a given problem. For a given input of size n, the number of processors required by a parallel algorithm is a function of n, usually denoted by P(n).

2.3 Overall Cost

Finally, the total cost of a parallel algorithm is the product of its time complexity and the total number of processors required for the computation:

Cost = Time Complexity * Total Number of Processors

Equivalently, the cost is the total number of steps executed collectively by all processors, i.e., the summation of steps. Another term related to the analysis of parallel algorithms is the efficiency of the algorithm. It is defined as the ratio of the worst case running time of the best sequential algorithm to the cost of the parallel algorithm:

Efficiency = Worst case running time of best sequential algorithm / Cost of parallel algorithm

The efficiency is mostly less than or equal to 1; a value greater than 1 would mean the parallel algorithm performs less total work than the best sequential algorithm, contradicting the assumption that the sequential algorithm is the best.
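As a worked sketch of these two formulas (all figures hypothetical), the helpers below compute cost and efficiency from a parallel running time, a processor count, and the best sequential running time:

```python
# Hypothetical worked example of the cost and efficiency formulas above.
def cost(parallel_time_steps, num_processors):
    # Cost = time complexity * total number of processors
    return parallel_time_steps * num_processors

def efficiency(best_sequential_steps, parallel_time_steps, num_processors):
    # Efficiency = sequential worst-case time / cost of the parallel algorithm
    return best_sequential_steps / cost(parallel_time_steps, num_processors)

# Example: suppose the best sequential algorithm takes 1000 steps, while a
# parallel algorithm takes 150 steps on 8 processors (made-up figures).
print(cost(150, 8))              # 1200 total steps of collective work
print(efficiency(1000, 150, 8))  # ~0.83, i.e. less than 1 as expected
```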
3. Merge Sort Algorithm

First, divide the given sequence of n numbers into two parts, each consisting of n/2 numbers. Thereafter, recursively split each part in two until each number forms an independent sequence. These one-element sequences are then recursively merged until a single sorted sequence of n numbers is achieved. To perform this task, two kinds of circuits are used: one for sorting and another for merging the sorted lists of numbers.

Odd-Even Merging Circuit

Let us first illustrate the concept of merging two sorted sequences using an odd-even merging circuit, which works as follows (a code sketch of it follows this list):

1) Let there be two sorted sequences A = (a1, a2, a3, ..., am) and B = (b1, b2, b3, ..., bm) which are to be merged.
2) Using an (m/2, m/2) merging circuit, merge the odd-indexed numbers of the two sequences, i.e., (a1, a3, a5, ..., am-1) and (b1, b3, b5, ..., bm-1), resulting in the sorted sequence (c1, c2, c3, ..., cm).
3) Thereafter, using another (m/2, m/2) merging circuit, merge the even-indexed numbers of the two sequences, i.e., (a2, a4, a6, ..., am) and (b2, b4, b6, ..., bm), resulting in the sorted sequence (d1, d2, d3, ..., dm).
4) The final output sequence O = (o1, o2, o3, ..., o2m) is obtained from the two merged sequences: o1 = c1 and o2m = dm, and in general

o2i = min(ci+1, di) and o2i+1 = max(ci+1, di) for i = 1, 2, 3, ..., m-1.
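Here is a minimal recursive sketch of this odd-even merge in Python, assuming (as the circuit does) that both inputs are sorted and their common length is a power of two; the function name is mine:

```python
# Recursive odd-even merge of two sorted sequences a and b (Batcher's scheme).
# Assumes len(a) == len(b) and the common length is a power of two.
def odd_even_merge(a, b):
    m = len(a)
    if m == 1:
        # Base case: a single comparator orders the two elements.
        return [min(a[0], b[0]), max(a[0], b[0])]
    # Merge the odd-indexed elements (positions 1, 3, 5, ... -> indices 0, 2, ...).
    c = odd_even_merge(a[0::2], b[0::2])
    # Merge the even-indexed elements (positions 2, 4, 6, ... -> indices 1, 3, ...).
    d = odd_even_merge(a[1::2], b[1::2])
    # Interleave: o1 = c1, o2m = dm, with a comparator on each (c[i+1], d[i]) pair.
    out = [c[0]]
    for i in range(m - 1):
        out.append(min(c[i + 1], d[i]))
        out.append(max(c[i + 1], d[i]))
    out.append(d[m - 1])
    return out

print(odd_even_merge([4, 6, 9, 10], [2, 7, 8, 12]))
# [2, 4, 6, 7, 8, 9, 10, 12]
```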
Now let us take an example of merging two sorted sequences of length 4, A = (a1, a2, a3, a4) and B = (b1, b2, b3, b4). Suppose the sequences are A = (4, 6, 9, 10) and B = (2, 7, 8, 12). The circuit merging these two sequences is illustrated in Figure 7.

Sorting Circuit along with Odd-Even Merging Circuit

As noted above, the merge sort algorithm requires two circuits: one for merging and another for sorting the sequences. The sorting circuit is derived from the merging circuit discussed above. The basic steps followed by the circuit are highlighted below, and sketched in code after the list:

i) The given input sequence of length n is divided into two sub-sequences of length n/2 each.
ii) The two sub-sequences are recursively sorted.
iii) The two sorted sub-sequences are merged using an (n/2, n/2) merging circuit to finally obtain the sorted sequence of length n.
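Building on the odd_even_merge sketch above (which it assumes is already defined), steps i)–iii) translate directly into a recursive sort, again for power-of-two input lengths:

```python
# Sorting circuit built from the odd-even merging circuit: split, recursively
# sort each half, then merge the two sorted halves. Assumes len(seq) is a
# power of two; reuses odd_even_merge from the previous sketch.
def odd_even_merge_sort(seq):
    n = len(seq)
    if n == 1:
        return list(seq)  # a single number is already a sorted sequence
    left = odd_even_merge_sort(seq[:n // 2])    # steps i) and ii)
    right = odd_even_merge_sort(seq[n // 2:])   # steps i) and ii)
    return odd_even_merge(left, right)          # step iii)

print(odd_even_merge_sort([4, 2, 10, 12, 8, 7, 6, 9]))
# [2, 4, 6, 7, 8, 9, 10, 12]
```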
Now let us take an example of sorting n = 8 numbers, say 4, 2, 10, 12, 8, 7, 6, 9. The combined sorting and merging circuit for this sequence is illustrated in Figure 8.

Analysis of Merge Sort

i) The width of the sorting-plus-merging circuit equals the maximum number of devices required in any stage, which is O(n/2). In the figure above, the maximum number of devices in a stage is 4, i.e., 8/2 = n/2.
ii) The circuit contains two sorting circuits for sorting sequences of length n/2, followed by one merging circuit for merging the two sorted sub-sequences (see stage 4 in the figure above).
Let Ts(n) and Tm(n) denote the time complexities (depths) of the sorting and merging circuits, respectively. Since the merging circuit for two sequences of length n/2 has depth Tm(n/2) = log(n/2), Ts can be calculated as follows:

Ts(n) = Ts(n/2) + Tm(n/2)
Ts(n) = Ts(n/2) + log(n/2)

Expanding the recurrence gives Ts(n) = log(n/2) + log(n/4) + ... + log 2 + Ts(1), a sum of O(log n) terms each at most log n. Therefore, Ts(n) = O(log² n).
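A quick numeric check of this bound (a throwaway helper of my own, not from the source) evaluates the recurrence directly and compares it against log² n:

```python
import math

# Depth recurrence for the sorting circuit: Ts(n) = Ts(n/2) + log2(n/2),
# with Ts(1) = 0. Evaluated numerically to check the O(log^2 n) bound.
def sorting_depth(n):
    if n == 1:
        return 0
    return sorting_depth(n // 2) + math.log2(n // 2)

for n in [8, 64, 1024]:
    print(n, sorting_depth(n), math.log2(n) ** 2)
# 8 -> depth 3, 64 -> depth 15, 1024 -> depth 45: the depth grows roughly
# like (log2 n)^2 / 2, which is O(log^2 n).
```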
4. Image Processing

An image is a two-dimensional distribution of small image points called pixels. It can be considered as a function of two real variables, for example f(x, y), with f the amplitude (e.g., brightness) of the image at position (x, y). Image processing is the process of enhancing an image and extracting meaningful information from it.

Image processing with parallel computing is an alternative way to solve image processing problems that require long processing times or that handle large amounts of information within an "acceptable time" (according to each criterion). The main idea of parallel image processing is to divide the problem into simple tasks and solve them concurrently, such that, in the best case, the total time is divided by the number of tasks. Parallel image processing cannot be applied to all problems; in other words, not every problem can be coded in parallel form. A parallel program must have certain features for correct and efficient operation, otherwise its runtime or behavior may not deliver the expected performance. These features include the following:

• Granularity: the size and number of the basic units of work into which the problem is decomposed, classified as:
  • Coarse-grained: a few tasks, each involving intense computation.
  • Fine-grained: a large number of small parts, each involving less intense computation.
• Synchronization: prevents the overlap of two or more processes.
• Latency: the time for information to travel from request to receipt.
• Scalability: the ability of an algorithm to maintain its efficiency when the number of processors and the size of the problem increase in the same proportion.

5. Proposed Algorithms for Parallel Image Processing Tasks

We will go through the following sequential image processing algorithms and develop parallel versions of them.

Load Distribution between Processors: Suppose we take image (1) as an example. The main step of these algorithms is to determine the number of tiles to be generated; the number of tiles corresponds to the number of threads. If only one thread exists, the computation is simply sequential. If there are two or more threads, the image is divided into distinct areas, as shown in Figures 2, 3 and 4. Each thread is responsible for processing the pixels in its own tile and for executing its tasks while maintaining synchronization with all the other processors; otherwise a deadlock situation can arise between processors.
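A minimal sketch of this tiling step follows; the row-wise strip layout and the per-pixel operation are my assumptions, since the source does not fix a tile shape:

```python
# Divide an image into one tile per thread (here: horizontal strips) and
# process the tiles concurrently. The image is a list of rows of pixels.
from concurrent.futures import ThreadPoolExecutor

def process_tile(tile):
    """Placeholder per-tile work: e.g., brighten each pixel by 10."""
    return [[min(p + 10, 255) for p in row] for row in tile]

def parallel_process(image, num_threads):
    if num_threads <= 1:
        return process_tile(image)  # one thread: plain sequential computation
    rows_per_tile = (len(image) + num_threads - 1) // num_threads
    tiles = [image[i:i + rows_per_tile]
             for i in range(0, len(image), rows_per_tile)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        processed = list(pool.map(process_tile, tiles))
    # Reassemble the processed tiles into the full image.
    return [row for tile in processed for row in tile]

image = [[0, 50, 100, 150] for _ in range(8)]  # tiny 8x4 grayscale image
print(parallel_process(image, num_threads=4))
```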
Parallel Segmentation by Region Growing and Calculation of Features of the Segmented Regions: Region growing is one of the best-known segmentation techniques. Its main drawback is that it is time consuming; the algorithm below is designed to reduce its running time (a sketch follows the list). The steps involved are:

1. Take the input image on the client processor.
2. Select the seed pixels for the different regions on the client processor (based on some user criterion, such as pixels in a certain gray-level range or pixels evenly spaced on a grid).
3. Count the number of seed points and activate the same number of worker kernels.
4. Send a copy of the image and the respective seed pixels to each kernel processor.
5. On each kernel processor, grow a region from its seed pixels into adjacent pixels based on pixel intensity, gray-level texture, color, etc.; the image information matters here because region growing is driven by a membership criterion.
6. Calculate the features of each region of interest (ROI) on its individual kernel.
7. Send the segmented regions and the ROI information back to the client processor and deactivate the worker kernels.
8. Reconstruct the image on the client processor and display all the information.
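The sketch below follows these steps in miniature; the threshold-based membership criterion, 4-connectivity, and process-based workers are all my assumptions rather than choices fixed by the source:

```python
# Miniature version of the parallel region-growing scheme: one worker per
# seed, each growing its region over its own copy of a read-only image.
from concurrent.futures import ProcessPoolExecutor

IMAGE = [  # tiny grayscale image; rows of pixel intensities
    [10, 12, 11, 90, 92],
    [11, 13, 95, 91, 90],
    [12, 14, 96, 94, 93],
]
THRESHOLD = 10  # membership criterion: intensity within 10 of the seed

def grow_region(seed):
    """Grow one region from a seed pixel (steps 4-6 on a worker kernel)."""
    sy, sx = seed
    base = IMAGE[sy][sx]
    region, stack = set(), [seed]
    while stack:
        y, x = stack.pop()
        if (y, x) in region or not (0 <= y < len(IMAGE) and 0 <= x < len(IMAGE[0])):
            continue
        if abs(IMAGE[y][x] - base) > THRESHOLD:
            continue  # pixel fails the membership criterion
        region.add((y, x))
        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    # Feature calculation for the ROI: here just its size and mean intensity.
    mean = sum(IMAGE[y][x] for y, x in region) / len(region)
    return region, mean

if __name__ == "__main__":
    seeds = [(0, 0), (0, 3)]  # step 2: one user-chosen seed per region
    with ProcessPoolExecutor(max_workers=len(seeds)) as pool:  # step 3
        for region, mean in pool.map(grow_region, seeds):      # steps 4-7
            print(len(region), round(mean, 1))                 # step 8 (display)
```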
6. Fault Tolerance

To achieve the needed reliability and availability, we need fault-tolerant computers. They have the ability to tolerate faults by detecting failures and isolating defective modules so that the rest of the system can operate correctly.

Reliability techniques have also become of increasing interest for general-purpose computer systems. Parallel computing can be applied to the design of fault-tolerant computer systems, particularly via lockstep systems performing the same operation in parallel. This provides redundancy in case one component fails, and also allows automatic error detection and error correction if the results differ. These methods can help prevent single-event upsets caused by transient errors. Although additional measures may be required in embedded or specialized systems, this method can provide a cost-effective approach to achieving n-modular redundancy in commercial off-the-shelf systems.

7. Process of System Recovery after a Fault

Faults in different parts of a parallel system have different importance. Consider a fault in a processor, a line, or a switch: the most serious is a fault in a processor. In that case the processes allocated to the failed processor have to be moved to another processor, recovered, and initialized once more. We can usually assume that the processor's memory content is lost, or at least inaccessible, after the fault appears. All communication lines going through the failed processor must be removed or redirected. Every process in the parallel system acquires a new attribute from the moment the fault appears until the end of recovery (fig. 1). When a processor element PE fails (see the classification sketch after this list):

• Every process allocated on the processor element PE, both main and copy, is called a locked process.
• Every other process that communicates with a locked process is called a fault-influenced process.
• Every other process that does not communicate with a locked process is called a free process.
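A small sketch of this three-way classification; the process/PE data model here is my own minimal stand-in:

```python
# Classify processes after processor element `failed_pe` fails, using the
# three attributes defined above. The data model is a minimal stand-in.
def classify(processes, channels, failed_pe):
    """processes: {name: pe}; channels: set of (name, name) pairs."""
    locked = {p for p, pe in processes.items() if pe == failed_pe}
    influenced = {p for p in processes if p not in locked and
                  any((p, q) in channels or (q, p) in channels for q in locked)}
    free = set(processes) - locked - influenced
    return locked, influenced, free

processes = {"A": 0, "B": 0, "C": 1, "D": 2}   # process -> processor element
channels = {("A", "C"), ("B", "D")}            # who communicates with whom
print(classify(processes, channels, failed_pe=0))
# locked: {A, B}; fault-influenced: {C, D}; free: none
```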
The process of system recovery itself is known, but there remains the question of how, and by whom, the recovery of a processor's kernel is controlled. Control can be either centralized or decentralized. In the case of decentralized control, it must rely on the fact that all kernels hold the same data, from which they determine the final (target) processors. Every kernel determines the final processors for those locked processes whose copies are allocated on its own processor. If a copy of a process is located on more than one processor, the corresponding processor transmits a system-recovery message to the other processors where the other copies are located. The content of the message is the final processor for that copy of the process and a time mark of the beginning of recovery. After receiving all system-recovery messages, a kernel compares each time mark with its own recovery time; a kernel whose own mark is later does not perform any reallocation of the relevant processes. If the time marks are equal, another criterion can decide, for example the identification number of the processor.

There is also the question of how many copies of a process are enough for sufficient resistance against faults. For active and passive processes, this depends on the required level of security. One passive copy of a process is sufficient if we assume that a fault does not occur on two processors hosting the same process at the same time, or during recovery of the system.
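The arbitration rule described here, earliest time mark wins with the processor identification number as tie-break, can be sketched as follows; the message fields are my assumption:

```python
# Decentralized recovery arbitration: among competing recovery messages for
# the same locked process, the earliest time mark wins; equal time marks are
# broken by the lower processor identification number. Field names assumed.
from dataclasses import dataclass

@dataclass
class RecoveryMessage:
    process: str          # the locked process being recovered
    final_processor: int  # proposed target processor for the copy
    time_mark: float      # time mark of the beginning of recovery
    sender_pe: int        # identification number of the proposing processor

def winning_proposal(messages):
    # min() applies the tie-break automatically via the (time, id) key.
    return min(messages, key=lambda m: (m.time_mark, m.sender_pe))

msgs = [
    RecoveryMessage("P1", final_processor=3, time_mark=12.0, sender_pe=2),
    RecoveryMessage("P1", final_processor=4, time_mark=12.0, sender_pe=1),
]
print(winning_proposal(msgs))  # sender_pe=1 wins the tie on equal time marks
```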
