Computer Science

Cache Memory

Cache memory is a type of high-speed memory that stores frequently used data for quick access. It is located closer to the CPU than main memory, allowing the processor to access data more quickly. The use of cache memory can significantly improve the performance of a computer system.

Written by Perlego with AI-assistance

8 Key excerpts on "Cache Memory"

  • Modern Computer Architecture and Organization
    • Jim Ledin, Dave Farley (Authors)
    • 2022 (Publication Date)
    • Packt Publishing (Publisher)
    Cache Memory is a high-speed memory region (compared to the speed of main memory) that temporarily stores program instructions or data for future use. Usually, these instructions or data items have been retrieved from main memory recently and are likely to be needed again shortly.
    The primary purpose of Cache Memory is to increase the speed of repeatedly accessing the same memory location and nearby memory locations. To be effective, accessing the cached items must be significantly faster than accessing the original source of the instructions or data, referred to as the backing store.
    When caching is in use, each attempt to access a memory location begins with a search of the cache. If the requested item is present, the processor retrieves and uses it immediately. This is called a cache hit. If the cache search is unsuccessful (a cache miss), the instruction or data item must be retrieved from the backing store. In the process of retrieving the requested item, a copy is added to the cache for anticipated future use.
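    The hit/miss flow just described can be sketched in a few lines of C. This is purely an illustrative model of a direct-mapped cache, not code from the book; the line size, line count, and the backing_store array are assumptions chosen for the example.

      #include <stdint.h>
      #include <stdbool.h>
      #include <string.h>

      /* Illustrative direct-mapped cache: 64 lines of 64 bytes each. */
      #define LINE_SIZE 64
      #define NUM_LINES 64

      typedef struct {
          bool     valid;
          uint32_t tag;
          uint8_t  data[LINE_SIZE];
      } cache_line_t;

      static cache_line_t cache[NUM_LINES];
      extern uint8_t backing_store[];   /* stands in for main memory */

      uint8_t read_byte(uint32_t addr)
      {
          uint32_t offset = addr % LINE_SIZE;
          uint32_t index  = (addr / LINE_SIZE) % NUM_LINES;
          uint32_t tag    = addr / (LINE_SIZE * NUM_LINES);
          cache_line_t *line = &cache[index];

          if (line->valid && line->tag == tag) {
              return line->data[offset];        /* cache hit: use the cached copy */
          }
          /* Cache miss: fetch the whole line from the backing store, keep a
             copy for anticipated future accesses, then satisfy the read. */
          memcpy(line->data, &backing_store[addr - offset], LINE_SIZE);
          line->valid = true;
          line->tag   = tag;
          return line->data[offset];
      }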
    Cache Memory is used for a variety of purposes in computer systems. Some examples of Cache Memory applications are:
    • Translation lookaside buffer (TLB): The TLB, as we saw in Chapter 7, Processor and Memory Architectures, is a form of Cache Memory used in processors supporting paged virtual memory. The TLB contains a collection of virtual-to-physical address translations that speed up access to page frames in physical memory. As instructions execute, each main memory access requires a virtual-to-physical translation. Successful searches of the TLB result in much faster instruction execution compared to the page table lookup process that follows a TLB miss (a minimal lookup sketch follows this list). The TLB is part of the MMU and is not directly related to the varieties of processor Cache Memory discussed later in this section.
    • Disk drive caches: Reading and writing the magnetized platters of rotating disk drives is orders of magnitude slower than accessing dynamic RAM (DRAM) devices. Disk drives generally implement Cache Memory to store the output of read operations and to temporarily hold data in preparation for writing. Drive controllers often store more data than the quantity originally requested in internal Cache Memory, with the expectation that future reads will request data adjacent to the initial request. If this turns out to be a correct assumption, which it often is, the drive can satisfy the second request immediately from cache without the delay associated with accessing the disk platters.
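    As a rough illustration of the TLB idea from the list above, the sketch below models a tiny, fully associative TLB in C. It is not code from the book; the page size, number of entries, replacement policy, and the page_table_walk() helper are all hypothetical.

      #include <stdint.h>
      #include <stdbool.h>

      #define PAGE_SHIFT  12         /* assumed 4 KB pages */
      #define TLB_ENTRIES 16

      typedef struct {
          bool     valid;
          uint64_t vpn;              /* virtual page number   */
          uint64_t pfn;              /* physical frame number */
      } tlb_entry_t;

      static tlb_entry_t tlb[TLB_ENTRIES];
      static unsigned next_victim;   /* trivial round-robin replacement */

      /* Hypothetical fallback: walks the page table in main memory (slow). */
      extern uint64_t page_table_walk(uint64_t vpn);

      uint64_t translate(uint64_t vaddr)
      {
          uint64_t vpn    = vaddr >> PAGE_SHIFT;
          uint64_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

          for (unsigned i = 0; i < TLB_ENTRIES; i++) {
              if (tlb[i].valid && tlb[i].vpn == vpn) {
                  return (tlb[i].pfn << PAGE_SHIFT) | offset;   /* TLB hit */
              }
          }
          /* TLB miss: pay for the page table lookup, then cache the result. */
          uint64_t pfn = page_table_walk(vpn);
          tlb[next_victim] = (tlb_entry_t){ .valid = true, .vpn = vpn, .pfn = pfn };
          next_victim = (next_victim + 1) % TLB_ENTRIES;
          return (pfn << PAGE_SHIFT) | offset;
      }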
  • Embedded Systems
    A Contemporary Design Tool

    • James K. Peckol (Author)
    • 2019 (Publication Date)
    • Wiley (Publisher)
    These are known as secondary memory and are shown in the diagram by the block on the left. At the bottom are the smallest, fastest memories, called Cache Memory; these are typically higher speed SRAMs. These devices also tend to be the most expensive. In the middle of the hierarchy is main or primary memory, built from either lower speed SRAM devices or, more commonly, DRAM memories. CPU registers are sometimes included in the ranking as memory that is higher speed than cache. The motivation for building a memory system as a hierarchical collection of different kinds of memories is that we would prefer an application program to execute as quickly as possible. Accessing memory takes time; each access contributes to the time required to execute an instruction, which can have a significant negative impact on real-time performance in an embedded application. We will not consider secondary storage, since typical embedded applications do not use it. The discussion here will focus on main memory and cache, the last two blocks on the right. These can be implemented using (variations on) the designs presented in the previous sections.

    4.15 Basic Concepts of Caching: icache, dcache

    Cache is a small, fast memory that temporarily holds copies of blocks of data and program instructions from the main memory. The increased speed of Cache Memory over that of main memory components offers the prospect for programs to execute much more rapidly if the instructions and data can be held in cache. Many of today's higher performance microprocessors, implemented around the Harvard architecture, internally support both an icache (instruction cache) and a dcache (data cache). We will now examine the concept of caching in greater detail, looking first at the ideas behind caching, what cache is, why it works, and some of the potential difficulties encountered in embedded applications.
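    One way to see why caching works is to compare access patterns with good and poor spatial locality. The following C sketch is illustrative only and is not taken from the excerpt; the array size and stride are arbitrary choices.

      #include <stddef.h>

      #define N (1 << 20)
      static int data[N];

      /* Good spatial locality: consecutive elements share cache lines, so
         after the miss that loads each line, the following reads are hits. */
      long sum_sequential(void)
      {
          long total = 0;
          for (size_t i = 0; i < N; i++)
              total += data[i];
          return total;
      }

      /* Poor spatial locality: a large stride (>= 1) touches a new cache line
         on almost every access, so far more of the reads miss in the dcache. */
      long sum_strided(size_t stride)
      {
          long total = 0;
          for (size_t start = 0; start < stride; start++)
              for (size_t i = start; i < N; i += stride)
                  total += data[i];
          return total;
      }

    Both functions add the same N values; on typical hardware the strided version runs noticeably slower once the stride exceeds the cache line size.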
  • Modern Computer Architecture and Organization
    Learn x86, ARM, and RISC-V architectures and the design of smartphones, PCs, and cloud servers

    Chapter 8: Performance-Enhancing Techniques
    The fundamental aspects of processor and memory architectures discussed in previous chapters enable the design of a complete and functional computer system. However, the performance of such a system would be poor compared to most modern processors without the addition of features to increase the speed of instruction execution.
    Several performance-enhancing techniques are employed routinely in processor and system designs to achieve peak execution speed in real-world computer systems. These techniques do not alter what the processor does in terms of program execution and data processing; they just help get it done faster.
    After completing this chapter, you will understand the value of multilevel Cache Memory in computer architectures and the benefits and challenges associated with instruction pipelining. You'll also understand the performance improvement resulting from simultaneous multithreading and the purpose and applications of single instruction, multiple data processing.
    The following topics will be covered in this chapter:
    • Cache Memory
    • Instruction pipelining
    • Simultaneous multithreading
    • SIMD processing

    Cache Memory

    A Cache Memory is a memory region that stores program instructions or data, usually instructions or data that have been accessed recently, for future use. The primary purpose of Cache Memory is to increase the speed of repeatedly accessing the same memory location or nearby memory locations. To be effective, accessing the cached data must be significantly faster than accessing the original source of the data, referred to as the backing store.
    When caching is in use, each attempt to access a memory location begins with a search of the cache. If the data is present, the processor retrieves and uses it immediately. This is called a cache hit. If the cache search is unsuccessful (a cache miss
  • Computer Systems Architecture
    6 Cache Memory
    This chapter focuses on Cache Memory. Using the general architecture figure, we can relate Cache Memory to its contribution to system performance (Figure 6.1).
    As stated in the previous chapter, Cache Memory is an important layer in the memory hierarchy, and its main contribution is in improving execution speed. The memory hierarchy is depicted once again in Figure 6.2, but this time the emphasis is on the sizes of the various levels of the hierarchy. The slowest and largest level (as far as capacity is concerned) is the disks. Currently, the standard disks used in personal computers (PCs) have a capacity that starts at several hundred gigabytes and goes up to several terabytes. Furthermore, with cloud computing, in which the system’s resources reside on remote servers, the available disk capacity increases significantly. The main memory (random access memory [RAM]), which represents the second level, has a standard capacity ranging from several gigabytes up to hundreds of gigabytes. The Cache Memory, which is the next level, is usually divided into several components, each with a different purpose and a different size. The last level of the memory hierarchy is the registers, which usually have a very limited capacity.
    The RAM described in the previous chapter is used for storing programs and data. There is another memory component called read-only memory (ROM), which is used by the operating system and the hardware and is intended for components (programs and/or data) that do not change frequently. Despite its name, some of the currently available ROMs can be changed; sometimes, a special recording device is required. Even so, their main use remains for special operating systems or hardware functions. As such, ROM is not available for standard computer programs.
    One of the important attributes of ROM is the fact that it is a nonvolatile memory, which means it retains its content even if the power is switched off. For that reason, ROM is used, for example, by the boot programs that are responsible for bringing the system up. Other components stored in the ROM are programs or data required for managing some input and output devices. Usually, these types of data will not be modified during the life span of the device. In modern computers, some of the ROM is replaced by flash memory, which is a nonvolatile device that can be rewritten if the need arises.
  • Computer Architecture and Security
    Fundamentals of Designing Secure Computer Systems

    • Shuangbao Paul Wang, Robert S. Ledley (Authors)
    • 2012 (Publication Date)
    • Wiley (Publisher)
    Modern computers usually use random access memory (RAM) as the main memory. Dynamic RAM (DRAM) is most often seen in personal computers, as it is cheaper and can be highly integrated due to its lower power consumption. DRAM needs to be refreshed periodically to avoid data loss. Static RAM (SRAM) is faster than DRAM but has a lower integration density. It is also more expensive, so it is commonly seen in servers or special, fast computers.

    3.3 Cache Memory

    We know that registers are a special type of memory that offers the fastest speed but is available only in very limited numbers. On the other hand, memory (RAM) is much cheaper than registers and can be integrated in large quantities with easy access, but its speed is slower. To fill the gap, there is another type of memory called Cache Memory. A memory hierarchy of computer systems is shown in Figure 3.6.
    Figure 3.6 A memory hierarchy of computer systems
    The cache is a small amount of fast memory that sits between the processor and memory to bridge the speed gap between the CPU and main memory (Hwang, 1993). Cache is much smaller than the main memory. The working mechanism of Cache Memory is to prefetch data from the main memory and have it at hand when the processor needs it. If the prediction is accurate, the processor can get the data directly from the fast Cache Memory without accessing the main memory.
    It is not surprising that people ask why Cache Memory works and how the data needed can be predicted before the program executes. Let us look at the “block” idea. Suppose we want to add two matrices with M rows and N columns; we need to perform M × N additions. If the data are all in cache, we call it a read hit, and fetching the data from the cache saves time compared with sending the address to the address buffer (MAR) and waiting for data from the data buffer (MDR) for every add operation.
    If the data that the processor is requesting are not in the cache, we call it a read miss
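    Continuing the matrix-addition example, the short C sketch below shows the access pattern that makes read hits likely. It is an illustration only; the matrix dimensions and the row-major layout are assumptions, not details from the excerpt.

      #define M 256
      #define N 256

      /* Row-major traversal: the first access to each cache line is a read
         miss, but the neighbouring elements of a, b, and c brought in with
         that line make the following accesses read hits. */
      void matrix_add(const double a[M][N], const double b[M][N], double c[M][N])
      {
          for (int i = 0; i < M; i++)
              for (int j = 0; j < N; j++)
                  c[i][j] = a[i][j] + b[i][j];
      }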
  • Computer Principles and Design in Verilog HDL
    • Yamin Li (Author)
    • 2015 (Publication Date)
    • Wiley (Publisher)
    Chapter 11 Memory Hierarchy and Virtual Memory Management
    Memory is a temporary place for storing programs (instructions and data). It is commonly implemented with dynamic random access memory (DRAM). Because DRAM is slower than the CPU (central processing unit), an instruction cache and a data cache are fabricated inside the CPU. Not only the caches but also TLBs (translation lookaside buffers) are fabricated for fast translation from a virtual address to a physical memory address.
    This chapter describes the memory structures, cache organizations, virtual memory management, and TLB organizations. The mechanism of the TLB-based MIPS (microprocessor without interlocked pipeline stages) virtual memory management is also introduced.

    11.1 Memory

    A computer consists of a CPU, the memory, and I/O interfaces. Memory is used to store programs that are being executed by the CPU. There are many types of memory, but we discuss only the following four types of memory in this book.
    1. SRAM (static random access memory), which is fast and expensive, is used to design caches and TLBs. Some high-performance computers also use it as the main memory.
    2. DRAM, which is large and inexpensive, is mainly used as the computer's main memory.
    3. ROM (read-only memory), which is nonvolatile and cheap, is typically used to store the computer's initial start-up program or firmware in embedded systems.
    4. CAM (content addressable memory), which is a very special memory, is mainly used to design a fully associative cache or TLB.
    Except for ROM, all memories are volatile. This means that when the power supply is off, the contents of the memory will be lost, and the contents of such memories are not usable when the power supply is first turned on. Therefore, there must be a ROM in a computer or embedded system.
    “Random access” means that any location of the memory can be accessed directly by providing the address of that location. There are some other types of memory that cannot be accessed randomly, such as FIFO (first-in first-out) memory.
  • Computer Architecture
    Fundamentals and Principles of Computer Design, Second Edition

    • Joseph D. Dumas II (Author)
    • 2016 (Publication Date)
    • CRC Press (Publisher)
    Notice that the upper levels of the hierarchy are the fastest (most closely matched to the speed of the computational hardware) but the smallest in terms of storage capacity. This is often due at least somewhat to space limitations, but it is mainly because the fastest memory technologies, such as SRAM, are the most expensive. As we move down the hierarchy, lower levels are composed of slower but cheaper and higher density components, so they have larger storage capacities. This varying capacity of each level is symbolized by drawing the diagram in the shape of a triangle.
    Figure 2.2 Memory hierarchy (conceptual).
    Figure 2.3 Memory hierarchy (typical of modern computer systems).
    Because the higher levels of the memory hierarchy have smaller capacities, it is impossible to keep all the information (program code and data) we need in these levels at one time. In practice, each higher level of the hierarchy contains only a subset of the information from the levels below it. The fundamental idea underlying the hierarchical memory concept is that we want to make as many accesses as we can (as a percentage of the total) to the upper levels of the hierarchy while only rarely having to access the lower levels. The resulting overall memory system (taking into account all devices) then approaches the speed of the highest levels while maintaining a capacity and cost per gigabyte approximating those of the lowest levels (the secondary storage devices). This requires a complex and well-thought-out design whose details, for best acceptance, should be hidden from the end user. As much as possible, only the system designers should have to deal with the details of managing the memory system for optimal performance. However, if one is responsible for specifying computer systems whose performance is important, or for developing code to run in such an environment, it is worthwhile to study the techniques used to optimize memory systems.
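    The idea of approaching the speed of the highest levels can be made concrete with a standard effective-access-time estimate. The C sketch below is not from the book; the latencies and hit ratio are illustrative assumptions only.

      #include <stdio.h>

      /* Effective (average) access time for a two-level memory:
         t_effective = hit_ratio * t_upper + (1 - hit_ratio) * t_lower
         All numbers here are illustrative assumptions. */
      int main(void)
      {
          double t_upper   = 1.0;    /* ns, upper level (e.g., cache)       */
          double t_lower   = 100.0;  /* ns, lower level (e.g., main memory) */
          double hit_ratio = 0.97;   /* fraction of accesses served above   */

          double t_effective = hit_ratio * t_upper
                             + (1.0 - hit_ratio) * t_lower;
          printf("Effective access time: %.2f ns\n", t_effective);  /* 3.97 ns */
          return 0;
      }

    With a 97% hit ratio, the average access time is already close to the cache latency rather than the main memory latency, which is exactly the behaviour the hierarchy is designed to achieve.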
  • Fundamentals of Parallel Multicore Architecture
    For decades, the increase in CPU speed has been much faster than the decrease in the access latency of the main memory. Up until roughly 2001-2005, CPU speed as measured in its clock frequency grew at the rate of 55% annually, while the memory speed grew at the rate of only 7% annually [24]. This speed gap produced an interesting implication. While in the past, a load instruction could get the needed datum from main memory in one CPU clock cycle, in recent systems it requires hundreds of processor clock cycles to get a datum from the main memory. Dependences between a load instruction (producer) and instructions that use the loaded value (consumers) dictate that the consumer instructions must wait until the load obtains its datum before they can execute. With the latency of loading datum from the main memory in the order of hundreds of cycles, the CPU may stall for much of that time because it runs out of instructions that are not dependent on the load. Hence, it is critical to performance that most data accesses are supplied to the CPU with low latencies. Caches provide such support.
    A cache is a relatively small memory for keeping data that is likely needed by the requestor. The concept of a cache is universal since it can be used as a software construct or a hardware component. In this chapter, we focus on hardware caches that exist between the processor and the main memory.
    An example of a memory hierarchy is shown in Figure 5.1. It shows a configuration in which there are twelve processor cores on a chip. Each core has a private Level 1 (L1) data cache and a Level 1 instruction cache. Each core has a Level 2 (L2) cache that holds both instructions and data (referred to as unified). There is a Level 3 (L3) cache that is shared by all cores. Due to its size, the L3 cache may be banked, and each bank may be local to each core, but remote to other cores, meaning that it is accessible by all cores but at differing latencies. The typical range of access latencies in 2013 (in terms of CPU clock cycles) and capacity of each cache is shown in the figure. After the L3 cache, there may be an off-die L4 cache and the main memory.
    Figure 5.1: A memory hierarchy configuration in a multicore system in 2013.
    The example in the figure is similar to the memory hierarchy of the IBM Power8 processor. In the Power8, each core has 4-way simultaneous multithreading (SMT ), which means that it can execute four threads simultaneously by fetching from two different program counters. Most of the processor core resources such as register files and functional units are shared by the four threads. A Power8 die also has twelve cores, so there are a total of 48 threads that can run simultaneously. Each of the cores has a 32KB L1 instruction cache and a 64KB L1 data cache. Each core also has a private 512KB L2 cache, so in total the L2 caches have 6MB of capacity. Both the L1 and L2 caches use SRAM cells. The L3 cache is 12-way banked, and each bank has an 8MB capacity, for a total of 96MB over all banks. The L4 cache is located off the die on the memory buffer controller, which is connected to the main memory. The L3 and L4 caches are implemented on DRAM on logic process, a technology referred to as embedded DRAM (eDRAM
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.