How does caching work?

The cache keeps copies of instructions and data that the CPU fetches repeatedly from main memory. On subsequent accesses, the CPU reads the faster cache instead of going back to the slow main memory for the same instructions and data.

Cache, sometimes called CPU memory, is usually built from high-performance SRAM. The CPU accesses the faster cache to speed up performance-sensitive operations. Cache memory is typically integrated directly on the CPU die (in older systems it sat on separate chips on the motherboard) and is connected to the CPU core through a dedicated bus.
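The copy-and-reuse behavior described above can be sketched as a toy direct-mapped cache: each address maps to a fixed slot, and an access hits only when that slot already holds the same address. The slot count and the `access` function are illustrative, not real hardware.

```python
CACHE_SLOTS = 4  # hypothetical, tiny for demonstration

cache = [None] * CACHE_SLOTS  # each slot stores the address it holds as a tag

def access(address):
    """Return 'hit' or 'miss' for one memory access, updating the cache."""
    slot = address % CACHE_SLOTS
    if cache[slot] == address:
        return "hit"             # data already copied into the cache
    cache[slot] = address        # copy the block from main memory
    return "miss"

# Repeated accesses to the same address hit after the first miss;
# two addresses that map to the same slot keep evicting each other.
print([access(a) for a in [8, 8, 8, 12, 8, 12]])
# → ['miss', 'hit', 'hit', 'miss', 'miss', 'miss']
```

The last three misses show a conflict: 8 and 12 both map to slot 0, so they evict each other, which is why real caches add associativity.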

Additional information

Cache technology depends on the principle of locality of reference in program execution and data access, which appears in two forms:

Temporal locality: once an instruction in a program has been executed, it is likely to be executed again in the near future; likewise, data that has just been accessed is likely to be accessed again soon.

Spatial locality: once a program accesses a storage location, nearby locations are likely to be accessed soon. In other words, the addresses a program accesses over a short period tend to be concentrated in a small range, because instructions and data are usually stored sequentially.

Temporal locality is exploited by keeping recently used instructions and data in the cache. Spatial locality is usually exploited by using larger cache lines and by integrating a prefetch mechanism into the cache control logic.
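Both forms of locality can be demonstrated with a toy fully-associative, LRU-managed cache that loads whole lines of adjacent addresses, so one miss services the next several sequential accesses. The line size and capacity below are illustrative values, not real hardware parameters.

```python
LINE_SIZE = 8    # addresses per cache line (hypothetical)
NUM_LINES = 16   # cache capacity in lines (hypothetical)

def hit_rate(addresses):
    """Fraction of accesses served by the toy cache."""
    lines, hits = [], 0          # ordered least -> most recently used
    for addr in addresses:
        line = addr // LINE_SIZE # one fetch covers LINE_SIZE neighbours
        if line in lines:
            hits += 1
            lines.remove(line)   # refresh its LRU position
        elif len(lines) == NUM_LINES:
            lines.pop(0)         # evict the least recently used line
        lines.append(line)
    return hits / len(addresses)

sequential = list(range(1024))              # strong spatial locality
scattered = list(range(0, 1024 * 97, 97))   # stride larger than a line

print(hit_rate(sequential))  # → 0.875 (only 1 miss per 8-address line)
print(hit_rate(scattered))   # → 0.0   (every access touches a new line)
```

The sequential walk misses once per line and hits on the seven neighbouring addresses, while the large-stride walk never reuses a line, which is exactly the contrast the locality principle predicts.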

Baidu Encyclopedia-Cache Memory