
MIMD

MIMD, or Multiple Instruction, Multiple Data, is a parallel computing architecture where multiple processors execute different instructions on different pieces of data simultaneously. This allows for independent processing of multiple tasks, making it suitable for complex and diverse computational workloads. MIMD systems can be either shared memory or distributed memory architectures.

Written by Perlego with AI assistance

5 Key excerpts on "MIMD"

  • Essentials of Computer Architecture
    Graphics Processors. SIMD architectures are also popular for use with graphics. To understand why, it is important to know that typical graphics hardware uses sequential bytes in memory to store values for pixels on a screen. For example, consider a video game in which foreground figures move while a background scene stays in place. Game software must copy the bytes that correspond to the foreground figure from one location in memory to another. A sequential architecture requires a programmer to specify a loop that copies one byte at a time. On an SIMD architecture, however, a programmer can specify a vector size and then issue a single copy command. The underlying SIMD hardware then copies multiple bytes simultaneously.
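    As a loose software-level illustration (not from the book), the short Python/NumPy sketch below contrasts the byte-at-a-time copy loop with a single vectorized copy; NumPy's slice assignment stands in for the hardware's vector copy, and the buffer sizes and names are made up for the example.

        import numpy as np

        # Frame buffer with one byte per pixel (sizes chosen only for illustration).
        frame = np.zeros(640 * 480, dtype=np.uint8)
        sprite = np.arange(64, dtype=np.uint8)   # bytes of a foreground figure
        dst = 1000                               # destination offset in the frame

        # Sequential style: an explicit loop copies one byte at a time.
        for i in range(sprite.size):
            frame[dst + i] = sprite[i]

        # SIMD style: specify the vector size and issue one copy command;
        # the slice assignment moves the whole block in a single operation.
        frame[dst:dst + sprite.size] = sprite
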

    18.13 Multiple Instructions Multiple Data (MIMD)

    The phrase Multiple Instructions Multiple Data streams (MIMD) is used to describe a parallel architecture in which each of the processors performs independent computations at the same time. Although many computers contain multiple internal processing units, the MIMD designation is reserved for computers in which the processors are visible to a programmer. That is, an MIMD computer can run multiple, independent programs at the same time.
    Symmetric Multiprocessor (SMP). The most well-known example of an MIMD architecture consists of a computer known as a Symmetric Multiprocessor (SMP). An SMP contains a set of N processors (or N cores) that can each be used to run programs. In a typical SMP design, the processors are identical: they each have the same instruction set, operate at the same clock rate, have access to the same memory modules, and have access to the same external devices. Thus, any processor can perform exactly the same computation as any other processor.
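    As a rough software analogy (my sketch, not from the book), the following Python program uses the multiprocessing module to run two different functions on different data at the same time; on an SMP the operating system can schedule each process on its own processor or core.

        from multiprocessing import Process

        # Two unrelated tasks: different instructions operating on different data.
        def sum_of_squares(n):
            print("sum of squares:", sum(i * i for i in range(n)))

        def count_vowels(text):
            print("vowel count:", sum(text.count(v) for v in "aeiou"))

        if __name__ == "__main__":
            jobs = [
                Process(target=sum_of_squares, args=(10_000,)),
                Process(target=count_vowels, args=("multiple instruction multiple data",)),
            ]
            for job in jobs:      # each process may run on a separate core
                job.start()
            for job in jobs:
                job.join()
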
  • Computational Physics
    Problem Solving with Python

    • Rubin H. Landau, Manuel J. Páez, Cristian C. Bordeianu (Authors)
    • 2015 (Publication Date)
    • Wiley-VCH (Publisher)
    Single instruction, multiple data (SIMD): Here instructions are processed from a single stream, but the instructions act concurrently on multiple data elements. Generally, the nodes are simple and relatively slow but are large in number.
    Multiple instructions, multiple data (MIMD): In this category, each processor runs independently of the others with independent instructions and data. These are the types of machines that utilize message-passing packages, such as MPI, to communicate among processors. They may be a collection of PCs linked via a network, or more integrated machines with thousands of processors on internal boards, such as the Blue Gene computer described in Section 10.15. These computers, which do not have a shared memory space, are also called multicomputers. Although these types of computers are some of the most difficult to program, their low cost and effectiveness for certain classes of problems have led to their being the dominant type of parallel computer at present.
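    To give a concrete feel for the message-passing style mentioned above, here is a minimal mpi4py sketch (assuming MPI and mpi4py are installed; the script name and data are made up, and it would be launched with something like mpiexec -n 4 python partial_sums.py). Every rank runs the same script but works on its own block of data, and the partial results are combined by a reduction.

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()    # identity of this process
        size = comm.Get_size()    # total number of processes

        # Each rank builds and sums a different block of numbers (independent data).
        local = np.arange(rank * 1000, (rank + 1) * 1000, dtype=np.float64)
        partial = local.sum()

        # Message passing combines the independent partial sums on rank 0.
        total = comm.reduce(partial, op=MPI.SUM, root=0)
        if rank == 0:
            print("total over", size, "ranks:", total)
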
    The running of independent programs on a parallel computer is similar to the multitasking feature used by Unix and PCs. In multitasking (Figure 10.4a ), several independent programs reside in the computer’s memory simultaneously and share the processing time in a round robin or priority order. On a SISD computer, only one program runs at a single time, but if other programs are in memory, then it does not take long to switch to them. In multiprocessing (Figure 10.4b ), these jobs may all run at the same time, either in different parts of memory or in the memory of different computers. Clearly, multiprocessing becomes complicated if separate processors are operating on different parts of the same program because then synchronization and load balance (keeping all the processors equally busy) are concerns.
    In addition to instructions and data streams, another way of categorizing parallel computation is by granularity.
  • Special Computer Architectures for Pattern Processing
    • King-Sun Fu (Author)
    • 2018 (Publication Date)
    • CRC Press (Publisher)
    Multiple Instruction, Multiple Data Streams (MIMD) operations are the most generalized parallel programs. Each individual instruction stream must have a sequence of scalar operations. These parallel processes may be interdependent. System deadlock would be a major problem to be solved for MIMD operations. Vector instructions may not appear in strict MIMD mode, but may appear in the mixed mode to be described in Section VI.D.
    Example 3:
    Parbegin
        Subprocess 1, Subprocess 2, ..., Subprocess n
    Parend
    D. Distributive Mixed Mode
    In this mode, SIMD vector instructions and parallel MIMD processes are simultaneously executed, as declared by the following statements:
    Example 4:
    Parbegin
        A ← B + C       }
        X ← Y * Z       }  SIMD mode
        subprocess 1    }
        ...
        subprocess n    }  MIMD mode
    Parend
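    A loose software analogy of this distributive mixed mode (my sketch, not from the text) is to submit both vectorized array statements and ordinary scalar subprocesses to a pool of worker processes: the whole-array expressions play the role of the SIMD vector instructions, while the independent workers play the role of the MIMD processes between Parbegin and Parend.

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def vector_add(b, c):        # "SIMD mode": one statement over whole arrays
            return b + c

        def vector_mul(y, z):
            return y * z

        def scalar_subprocess(k):    # "MIMD mode": independent scalar computation
            return sum(i % k for i in range(100_000))

        if __name__ == "__main__":
            B = np.ones(1_000_000)
            C = np.full(1_000_000, 2.0)
            Y = np.arange(1_000_000, dtype=np.float64)
            Z = np.full(1_000_000, 3.0)
            with ProcessPoolExecutor() as pool:              # Parbegin
                tasks = [pool.submit(vector_add, B, C),
                         pool.submit(vector_mul, Y, Z),
                         pool.submit(scalar_subprocess, 7),
                         pool.submit(scalar_subprocess, 11)]
                results = [t.result() for t in tasks]        # Parend
            print("array results:", results[0][:3], results[1][:3])
            print("scalar results:", results[2], results[3])
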
    The above operation modes are only the fundamental ones to be implemented. There are many combinations of the above modes. Only after we implement the basic modes can we challenge the implementation of more sophisticated operation modes to upgrade the system throughput and enhance its flexibility.
    Special system control instructions must be developed to make the above operations possible. Listed below are several typical system command instructions that may be implemented in the system.
    1. INITIALIZE — Set the program counters of allocated processors to specific values.
    2. SYNCHRONIZE — Put the allocated PMUs in the WAIT or FETCH state.
    3. Vector issue, mask, routing, etc.
    4. Memory management, interrupts and I/O commands, etc.
    For special parallel-processing applications, such as multiple-frame image processing or pattern classification, special programming or query languages must be developed to handle very large-scale databases. There is always a trade-off between the complexity of the user programming language and the capabilities of the operating system.
    VII. OPERATING SYSTEM REQUIREMENTS
    The operating strategy for the PM4 system has to be decided from the following choices: (1) Multiprogramming vs. uniprogramming on vector control and processor-memory units, and (2) Distributed vs. dedicated
  • A Survey of Computational Physics

    Introductory Computational Science

    This [B] = [A][B] multiplication is an example of data dependency, in which the data elements used in the computation depend on the order in which they are used. In contrast, the matrix multiplication [C] = [A][B] is a data parallel operation in which the data can be used in any order. So already we see the importance of communication, synchronization, and understanding of the mathematics behind an algorithm for parallel computation. The processors in a parallel computer are placed at the nodes of a communication network. Each node may contain one CPU or a small number of CPUs, and the communication network may be internal to or external to the computer. One way of categorizing parallel computers is by the approach they employ in handling instructions and data. From this viewpoint there are three types of machines:
    • Single-instruction, single-data (SISD): These are the classic (von Neumann) serial computers executing a single instruction on a single data stream before the next instruction and next data stream are encountered.
    • Single-instruction, multiple-data (SIMD): Here instructions are processed from a single stream, but the instructions act concurrently on multiple data elements. Generally the nodes are simple and relatively slow but are large in number.
    • Multiple-instruction, multiple-data (MIMD): In this category each processor runs independently of the others with independent instructions and data. These are the types of machines that employ message-passing packages, such as MPI, to communicate among processors. They may be a collection of workstations linked via a network, or more integrated machines with thousands of processors on internal boards, such as the Blue Gene computer described in §14.13. These computers, which do not have a shared memory space, are also called multicomputers.
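    To make the opening distinction concrete, the small NumPy sketch below (the matrices are random examples of my own) contrasts the data-parallel product [C] = [A][B], whose element computations can be done in any order, with the in-place update [B] = [A][B], where overwriting [B] element by element would feed already-updated values into later computations unless a temporary copy is used.

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.random((4, 4))
        B = rng.random((4, 4))

        # Data parallel: every element of C depends only on the original A and B,
        # so the element computations are order-independent and can run concurrently.
        C = A @ B

        # Data dependency: [B] = [A][B] must not overwrite B while its old values
        # are still needed; compute into a temporary first, then copy back.
        B_new = A @ B
        B[:] = B_new
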
  • Advanced Computer Architectures
    • Sajjan G. Shiva (Author)
    • 2018 (Publication Date)
    • CRC Press (Publisher)
    The definitions of Section 5.4 for the speedup, the efficiency, and the cost of parallel computer architectures also apply to MIMD systems, as illustrated by the following examples.
    Example 6.9
    Consider again the problem of accumulating N numbers. The execution time on an SISD machine is O(N). On an MIMD with N processors and a ring interconnection network between the processors, the execution requires (N − 1) time units for communication and (N − 1) time units for addition. Thus the total time required is 2(N − 1), and hence the speedup is N/(2(N − 1)), which approaches 0.5 for large N.
    If the processors in the MIMD are interconnected by a hypercube network, this computation requires log2 N communication steps and log2 N additions, resulting in a total run time of 2 log2 N. Hence,
    Speedup S = N/(2 log2 N), or O(N/log2 N)
    Efficiency E = 1/(2 log2 N), or O(1/log2 N), and
    Cost = N × 2 log2 N, or O(N log2 N)
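    As a quick numeric check of these expressions (my own illustration, using N = 8), the snippet below evaluates the ring and hypercube run times, the corresponding speedups, the efficiency, and the cost.

        from math import log2

        N = 8
        t_serial = N                 # O(N) additions on an SISD machine
        t_ring = 2 * (N - 1)         # (N-1) communications + (N-1) additions
        t_cube = 2 * log2(N)         # log2 N communications + log2 N additions

        print("ring speedup:        ", t_serial / t_ring)   # 8/14, approx 0.57
        print("hypercube speedup:   ", t_serial / t_cube)   # 8/6,  approx 1.33
        print("hypercube efficiency:", 1 / t_cube)          # 1/6,  approx 0.17
        print("hypercube cost:      ", N * t_cube)          # 8 * 6 = 48
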
    Example 6.10
    The problem of accumulating N numbers can be solved by two methods on an MIMD with p processors and a hypercube interconnection network. Here, p < N and we assume that N/p is less than or equal to p. In the first method, each block of p numbers is accumulated in (2 log2 p). Since there are N/p such blocks, the execution time is (2N/p log2 p). The resulting N/p partial sums are then accumulated in (2 log2 p). Thus, the total run time is (2N/p log2 p + 2 log2 p). In the second method, each of the p blocks of N/p numbers is allocated to a processor. The run time for computing the partial sums is then O(N/p). These partial sums are accumulated using the perfect shuffle network in (2 log2 p). Thus the total run time is (N/p + 2 log2 p).
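    The two run times are easy to compare numerically; the snippet below (values chosen only for illustration, respecting the assumption N/p ≤ p) evaluates both expressions.

        from math import log2

        N, p = 256, 16                            # N numbers, p processors, N/p = 16 <= p

        # Method 1: N/p blocks of p numbers, each reduced in 2*log2(p) steps,
        # followed by a reduction of the N/p partial sums in 2*log2(p) steps.
        t1 = (2 * N / p) * log2(p) + 2 * log2(p)  # 32*4 + 8 = 136

        # Method 2: each processor sums its own block of N/p numbers,
        # then the p partial sums are combined in 2*log2(p) steps.
        t2 = N / p + 2 * log2(p)                  # 16 + 8 = 24

        print("method 1 run time:", t1)
        print("method 2 run time:", t2)
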
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.