How does the central processing unit work?

Analysis of the working principle of the central processing unit

A complete microcomputer system consists of a hardware system and a software system. Computer hardware refers to the physical devices that make up a computer; these devices are the material basis on which the computer works. The most important part of the hardware system is the central processing unit (CPU).

(1) The basic concept and composition of the CPU

The CPU (Central Processing Unit) is the core of a computer system and mainly comprises an arithmetic unit and a controller. If the computer is compared to a person, the CPU is the heart, which shows how important its role is. The internal structure of the CPU can be divided into three parts: a control unit, a logic unit and a storage unit. Working in coordination, these three parts analyze, judge and calculate, and control the coordinated operation of all parts of the computer.

All actions of the computer are controlled by the CPU. The arithmetic unit carries out arithmetic operations (such as addition, subtraction, multiplication and division) and logical operations (such as logical addition (OR), logical multiplication (AND) and logical inversion (NOT)). The controller performs no arithmetic; it reads instructions, analyzes them and issues the corresponding control signals. The CPU also usually contains several registers, which take part directly in operations and store intermediate results.
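The arithmetic unit's operations can be illustrated with a minimal sketch. The function name `alu`, the operation names and the 8-bit default width are assumptions made for illustration; they do not describe any particular CPU.

```python
def alu(op, a, b=0, width=8):
    """Apply one arithmetic or logical operation to width-bit unsigned operands.

    Hypothetical model: real arithmetic units also produce flags (carry, zero...).
    """
    mask = (1 << width) - 1           # keeps the result within the register width
    operations = {
        "add": lambda: a + b,
        "sub": lambda: a - b,
        "or": lambda: a | b,          # logical addition
        "and": lambda: a & b,         # logical multiplication
        "not": lambda: ~a,            # logical inversion (b is ignored)
    }
    return operations[op]() & mask


print(alu("add", 200, 100))           # the sum wraps around at 8 bits
print(bin(alu("or", 0b1100, 0b1010)))
print(bin(alu("not", 0b00001111)))
```

The mask stands in for the fixed register width: a result that does not fit simply loses its upper bits, which is why adding 200 and 100 in 8 bits does not give 300.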

The CPUs most often discussed are the X86 series and compatible CPUs. The X86 instruction set was developed by Intel Corporation of the United States for its first 16-bit CPU, the i8086, and the i8088 (a simplified version of the i8086) used in the world's first PC, introduced by IBM in 1981, also used X86 instructions. The X87 series of math coprocessors, added to computers to improve floating-point processing, uses the X87 instruction set; the X86 and X87 instruction sets are collectively referred to as the X86 instruction set. Although Intel has since developed the newer i80386 and i80486, up to today's Pentium III series, all CPUs produced by Intel continue to use the X86 instruction set, so that computers can keep running applications developed in the past and the rich body of existing software is preserved. Besides Intel, AMD, Cyrix and other manufacturers also produce CPUs that use the X86 instruction set. Because these CPUs can run the software developed for Intel CPUs, the computer industry classes them as Intel-compatible products. Together, the Intel X86 series and these compatible CPUs make up today's large X86 lineup.

(2) Main technical parameters of the central processing unit

The quality of the CPU directly determines the grade of a computer system, and the main technical characteristics of a CPU reflect its general performance.

The number of bits of binary data that a CPU can process at one time is one of its most important quality indicators. When people speak of a 16-bit or 32-bit computer, they mean that the CPU in that microcomputer can process 16 or 32 bits of binary data at a time. The early IBM PC/XT, IBM PC/AT and 286 computers were 16-bit computers, 386 and 486 computers were 32-bit computers, and 586 computers, whose CPUs have a 64-bit external data bus, were regarded as 64-bit high-end microcomputers.

According to the word length of the data it processes, a CPU can be classified as an 8-bit, 16-bit, 32-bit or 64-bit microprocessor.

Bit: digital circuits and computers use binary coding, whose only symbols are "0" and "1". Each "0" or "1" is one bit in the CPU.

Bytes and word length: the number of binary bits that a CPU can process at one time is called its word length. A CPU that processes data with a word length of 8 bits is therefore called an 8-bit CPU, and a 32-bit CPU processes 32 bits of binary data at a time. Because commonly used English characters can be represented in 8 bits, 8 bits are called a byte. The length of a byte is fixed, but word length differs from one CPU to another: an 8-bit CPU can process only one byte at a time, a 32-bit CPU can process four bytes, and a 64-bit CPU can process eight bytes.
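The relation between word length and bytes handled per operation is simple arithmetic. The sketch below assumes an 8-bit byte, as the text does; the function name is hypothetical.

```python
BITS_PER_BYTE = 8  # assumption stated in the text: 8 bits make one byte


def bytes_per_operation(word_length_bits):
    """Return how many bytes a CPU of the given word length handles at a time."""
    if word_length_bits <= 0 or word_length_bits % BITS_PER_BYTE:
        raise ValueError("word length must be a positive multiple of 8 bits")
    return word_length_bits // BITS_PER_BYTE


for bits in (8, 16, 32, 64):
    print(bits, "bit CPU:", bytes_per_operation(bits), "byte(s) at a time")
```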

2. CPU external frequency

The external frequency of the CPU, listed as the CPU bus frequency in tables of common characteristics, is the reference clock that the motherboard supplies to the CPU; the main (working) frequency of the CPU is the external frequency multiplied by the frequency multiplication coefficient. In the Pentium era the external frequency was generally 60/66MHz; starting with the Pentium II 350 it was raised to 100MHz. Because the CPU bus frequency is normally the same as the memory bus frequency, raising the external frequency also speeds up data exchange with memory, which does much to improve the overall running speed of the computer.

3. FSB frequency

The front-side bus (FSB), also called the CPU bus, provides the working clock for data exchange between the CPU and memory (and, on Socket 7 motherboards only, the L2 cache); on current motherboards the FSB frequency is the same as the memory bus frequency. The maximum data transfer bandwidth depends on the data width and the transfer frequency: data bandwidth = (bus frequency × data width) / 8. For example, Intel's P Ⅱ 333 uses a 66MHz front-side bus, so its data exchange bandwidth with memory is (66×64)/8 = 528MB/s, while the P Ⅱ 350 uses a 100MHz front-side bus, so its peak data exchange bandwidth is (100×64)/8 = 800MB/s. The front-side bus speed therefore affects the speed of data exchange between the CPU and memory (and L2 cache) while the computer is running, and with it the overall running speed of the computer. For this reason Intel is now moving the P Ⅲ front-side bus frequency from 100MHz to 133MHz. AMD's new K7 uses a 200MHz front-side bus, but some sources indicate that the data exchange clock between the K7 core and memory is still 100MHz, and that the main frequency is also obtained by multiplying 100MHz.
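The bandwidth formula can be checked with a short sketch. The function name and the 64-bit default data width are assumptions taken from the two worked figures in the text (66MHz and 100MHz buses, 64 data bits).

```python
def peak_bandwidth_mb_per_s(bus_mhz, data_width_bits=64):
    """Peak bandwidth in MB/s = (bus frequency in MHz * data width in bits) / 8."""
    return bus_mhz * data_width_bits / 8


for bus_mhz in (66, 100, 133):
    print(f"{bus_mhz}MHz bus: {peak_bandwidth_mb_per_s(bus_mhz):.0f}MB/s peak")
```

These are theoretical peaks: the formula assumes one transfer of the full data width on every bus clock, which real memory traffic does not sustain.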

4. CPU main frequency

The main frequency of the CPU, also called the working frequency, is the actual operating frequency of the CPU core circuits (the integer and floating-point arithmetic units). Before the 486DX2, the main frequency of a CPU was equal to its external frequency. Starting with the 486DX2, essentially every CPU's main frequency equals the external frequency multiplied by the frequency multiplication factor. The main frequency is the clock frequency at which the CPU core runs, and it directly affects the running speed of the CPU.

The Pentium can execute two instructions in one clock cycle. A Pentium with a main frequency of 100MHz can therefore execute up to 200 million instructions per second, and one with a main frequency of 200MHz up to 400 million instructions per second, so the higher the CPU frequency, the faster the computer runs.
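The two relations in this section (main frequency from external frequency and multiplier, and peak instruction rate from main frequency) can be sketched as follows. The function names are hypothetical, and the 100MHz × 3.5 combination is used only as an illustrative setting.

```python
def main_frequency_mhz(external_mhz, multiplier):
    """Main frequency = external frequency * frequency multiplication factor."""
    return external_mhz * multiplier


def peak_instructions_per_second(main_mhz, instructions_per_cycle):
    """Upper bound only: assumes every clock cycle completes the full number
    of instructions, which real programs rarely achieve."""
    return int(main_mhz * 1_000_000 * instructions_per_cycle)


print(main_frequency_mhz(100, 3.5))            # 100MHz external clock, 3.5x multiplier
print(peak_instructions_per_second(100, 2))    # two instructions per cycle at 100MHz
print(peak_instructions_per_second(200, 2))    # two instructions per cycle at 200MHz
```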

Note that Cyrix rates the main frequency of its CPUs with a PR (performance rating) parameter, which indicates that the CPU's performance is equivalent to that of an Intel CPU with that main frequency. The actual clock frequency of a PR-rated CPU differs from its nominal main frequency. For example, the M Ⅱ-300 actually runs at 233MHz (66×3.5), but its PR rating is 300MHz, meaning that the M Ⅱ-300 is rated as equivalent to Intel's P Ⅱ-300. In practice, however, only the Business Winstone score (integer performance) of the M Ⅱ-300 is comparable to that of the P Ⅱ-300.

5. L1 and L2 cache capacity and speed

The capacity and working speed of the L1 and L2 caches play a key role in computer speed; the L2 cache in particular does much to speed up commercial software that performs a lot of 2D graphics processing.

The L2 cache was introduced in the 486 era to make up for the limited L1 cache and to minimize the delay that main memory imposes on CPU operation.

The L2 cache of a CPU may be internal or external. An L2 cache inside the CPU chip runs at the main frequency, whereas an L2 cache mounted outside the CPU chip, as in the P Ⅱ, generally runs at half the main frequency and is therefore less efficient. This is an important reason why the Celeron, with only 128KB of on-chip cache, comes close to exceeding the performance of a P Ⅱ of the same main frequency (which has a 512KB off-chip L2 cache running at half the main frequency).

(3) Analysis of main technical terms of CPU

1. Pipeline technology

Pipelining was first used by Intel in the 486 chip. A pipeline works like an assembly line in industrial production. In the CPU, an instruction-processing pipeline consists of 5~6 circuit units with different functions; an X86 instruction is divided into 5~6 steps, each executed by one of these units, so that one instruction can be completed in every CPU clock cycle, which raises the running speed of the CPU. The 486 CPU has a single pipeline, and instructions divided into five steps (instruction fetch, decoding, address generation, instruction execution and data write-back) pass through its five circuit units, with different instructions occupying different units at the same time. In this way the designers of the 486 reached the goal of completing one instruction per clock cycle (in my opinion, the CPU actually reaches the rate of one instruction per cycle only from the fifth clock cycle onward, once the pipeline is full). In the Pentium era, designers placed two pipelines with independent circuit units in the CPU, so that it can execute two instructions at the same time and, in theory, complete two instructions per clock cycle.
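The timing of a single five-stage pipeline can be sketched as below. This is an idealized model that assumes every stage takes exactly one clock cycle and the pipeline never stalls; the stage names follow the text, and the function names are hypothetical.

```python
STAGES = ["fetch", "decode", "address generation", "execute", "write-back"]


def completion_cycles(num_instructions, num_stages=len(STAGES)):
    """Clock cycle (1-based) in which each instruction leaves the last stage.

    Instruction i (0-based) enters the pipeline in cycle i + 1 and needs
    num_stages cycles to pass through, so it completes in cycle i + num_stages.
    """
    return [i + num_stages for i in range(num_instructions)]


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to finish all instructions: fill time plus one per instruction."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


print(completion_cycles(4))   # first result in cycle 5, then one per cycle
print(total_cycles(100))      # 104 cycles instead of 500 without pipelining
```

The first instruction completes only in the fifth cycle; after that the pipeline delivers one completed instruction in every cycle, which matches the remark about the fifth clock cycle.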

2. Superpipeline and superscalar technology

Superpipelining means that the pipeline inside a CPU has more than the usual 5~6 stages; the pipeline of the Pentium Pro, for example, is as long as 14 stages. The more stages a pipeline has, the less work each stage does and the faster each step completes, so the design can support a higher working frequency. Superscalar means that the CPU has more than one pipeline and can complete more than one instruction in each clock cycle.

3. Out-of-order execution technology

Out-of-order execution is a technique in which the CPU allows multiple instructions to be dispatched to the corresponding circuit units for processing in an order other than the one specified by the program. Suppose, for example, that a section of a program has seven instructions. The CPU analyzes which circuit units are idle and which instructions can be executed early, and immediately sends those instructions to the appropriate units. Naturally, once the units have executed instructions out of the specified order, the corresponding circuitry must rearrange the results into the order specified by the original program before they are returned to the program. The purpose of out-of-order execution is to keep the CPU's internal circuits running at full load and so increase the speed at which the CPU runs programs.
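The idea can be sketched with a toy scheduler. Everything here is an assumption for illustration: two execution units, one-cycle instructions, and a hypothetical dependency list for a seven-instruction program. Real CPUs use much more elaborate hardware.

```python
def out_of_order_schedule(deps, units=2):
    """Issue up to `units` ready instructions per cycle, ignoring program order.

    deps[i] is the set of earlier instruction indices whose results
    instruction i needs. Returns one list of issued indices per cycle.
    """
    done = set()
    pending = list(range(len(deps)))
    cycles = []
    while pending:
        # An instruction is ready once everything it depends on has executed.
        ready = [i for i in pending if deps[i] <= done][:units]
        if not ready:
            raise ValueError("circular dependency between instructions")
        cycles.append(ready)
        done.update(ready)
        pending = [i for i in pending if i not in ready]
    return cycles


def retire_in_program_order(cycles):
    """Reorder the executed instructions back into original program order."""
    return sorted(i for issued in cycles for i in issued)


# Hypothetical seven-instruction program: 1 needs 0, 2 needs 1, 4 needs 3, 6 needs 2 and 4.
program_deps = [set(), {0}, {1}, set(), {3}, set(), {2, 4}]
schedule = out_of_order_schedule(program_deps)
print(schedule)                           # [[0, 3], [1, 4], [2, 5], [6]]
print(retire_in_program_order(schedule))  # [0, 1, 2, 3, 4, 5, 6]
```

Instruction 3 runs in the first cycle, ahead of instructions 1 and 2, because it does not depend on them and a unit is free; the results are then put back into program order.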

4. Branch prediction and speculative execution technology

Branch prediction and speculative execution are the main components of the CPU's dynamic execution technology, and dynamic execution is one of the principal advanced technologies in current CPUs. The main purpose of branch prediction and speculative execution is to increase the running speed of the CPU. Speculative execution depends on branch prediction: the instructions processed along the predicted path of a program branch are executed speculatively.

5. Special instruction extension technology

Even in the simplest computers, an instruction sequence fetches operands and performs calculations. On most computers each such instruction performs only one calculation at a time; operations that could be done in parallel must be carried out one after another. Such a computer uses an SISD (Single Instruction, Single Data) processor. When CPU performance is described, "extended instructions" or "special extensions" refers to whether the CPU has instruction extensions to the X86 instruction set. The first such extension was Intel's own "MMX", followed by AMD's "3DNow!" and, most recently, "SSE" in the Pentium III.

MMX and SSE: MMX is a multimedia instruction set (the name is commonly expanded as "MultiMedia eXtensions"). It has 57 instructions in total and was the first time that Intel extended the X86 instruction set, which had been fixed since 1985. MMX is mainly used to enhance the CPU's processing of multimedia information and to improve its handling of 3D graphics, video and audio. However, it optimizes only integer operations and does not strengthen floating-point operations. As 3D graphics become more popular and 3D web pages more common on the Internet, MMX is therefore no longer adequate. MMX instructions perform SIMD operations on integers, such as -40, 0, 1, 469 or 32766. SSE instructions add SIMD operations on floating-point numbers, such as -40.2337, 1.4355 or 874638+02. With MMX and SSE, one instruction can perform calculations on two or more data streams. In the previous example, it is no longer necessary to execute 529,200 instructions per second, only 264,600, because the same instruction can act on the left and right channels at the same time. For display, 23,592,960 instructions per second suffice instead of 70,778,880, because the red, green and blue channels can be handled by the same instruction.
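The saving from SIMD can be sketched in a few lines. `packed_add` only models the idea of one instruction acting on several data lanes; it is not how MMX or SSE are actually invoked. The scalar audio count is taken as 529,200, that is, twice the 264,600 figure given in the text.

```python
def packed_add(xs, ys):
    """Model of ONE SIMD instruction: add every pair of lanes in a single step."""
    if len(xs) != len(ys):
        raise ValueError("both operands must have the same number of lanes")
    return [x + y for x, y in zip(xs, ys)]


def instructions_needed(scalar_operations, lanes):
    """Instructions required when each instruction handles `lanes` values."""
    return -(-scalar_operations // lanes)   # ceiling division


print(packed_add([-40, 0, 1, 469], [1, 1, 1, 1]))   # four integer additions at once
print(instructions_needed(529_200, 2))              # left and right audio channels
print(instructions_needed(70_778_880, 3))           # red, green and blue channels
```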

SSE: SSE is the abbreviation of "Internet Streaming SIMD Extensions". It first appeared in Intel's Pentium III. It was originally rumored under the name MMX2 and was later called KNI (Katmai New Instructions); Katmai is in fact today's Pentium III. SSE has 70 instructions in total. It covers the functions not only of the original MMX but also of the current 3DNow! instruction set, and it particularly strengthens SIMD floating-point processing. In addition, with the growth of the Internet in mind, it strengthens the CPU's ability to process 3D web pages and other audio and video information. Once a CPU has a special extended instruction set, applications must support it before it has any effect. Therefore, when the most advanced Pentium III 450 and a Pentium II 450 run the same application without support for the extended instructions, there is little difference in speed between them.

SSE not only keeps the original MMX instructions but also adds 70 new instructions, which speed up floating-point operations and also make memory use more efficient, so that memory appears faster. The improvement in game performance is significant. According to Intel, SSE has a particularly clear effect in the following fields: 3D geometry and animation processing; graphics processing (such as Photoshop); video editing, compression and decompression (such as MPEG and DVD); speech recognition; and sound compression and synthesis.

3DNow!: the multimedia extended instruction set developed by AMD, with 21 instructions. Addressing the weakness that the MMX instruction set does not strengthen floating-point processing, it focuses on improving the 3D graphics processing of AMD K6 series CPUs. However, because the number of instructions is limited, the instruction set is used mainly for 3D games and is not supported by other commercial graphics applications.

(4) CPU production process and product architecture

1. CPU production process

The parameters that describe a CPU often include a "process technology" figure such as "0.35um" or "0.25um". Generally, the smaller this figure, the more advanced the production technology. CPUs are currently produced mainly with the CMOS process; CMOS is the abbreviation of "complementary metal oxide semiconductor". In this process, the circuits and components are formed by lithography (the "light knife"): aluminum is deposited on the silicon material and then etched into the wires that connect the components. Lithographic precision is currently expressed in microns (um). The higher the precision, the more advanced the production process, because more components can be produced on the same area of silicon, and the finer the interconnections, the higher the working frequency the CPU can reach. This is why the first-generation Pentium, made when only a 0.65 micron process was available, ran at just 60/66MHz, whereas the later 0.35 micron and 0.25 micron processes made possible the Pentium MMX with a working frequency of 266MHz and the Pentium II with a main frequency of 500MHz. Limited by current technology, CPU production processes reach only 0.25 micron, so Intel, AMD, Cyrix and other companies are working toward 0.18um and copper interconnects (copper, rather than aluminum, deposited on the silicon material). It is estimated that once the production process reaches 0.18um, CPUs with a main frequency of 1000MHz will become common.
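As a rough illustration of why a finer process fits more components on the same silicon, the sketch below assumes ideal scaling, in which every component shrinks in both dimensions in proportion to the process figure. This is a simplifying assumption, not a description of any manufacturer's actual results.

```python
def relative_density(old_process_um, new_process_um):
    """Ideal-scaling estimate: components per unit area grow with the square
    of the linear shrink (assumption; real layouts scale less cleanly)."""
    return (old_process_um / new_process_um) ** 2


print(round(relative_density(0.35, 0.25), 2))   # about 1.96 times as many components
print(round(relative_density(0.25, 0.18), 2))   # about 1.93 times as many components
```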

To continue competing with Intel in microprocessor development into the next century, AMD and Motorola have reached a seven-year technical cooperation agreement, under which Motorola will license its latest copper interconnect technology to AMD. AMD plans to manufacture K7 microprocessors with main frequencies as high as 1000MHz (1GHz) in 2000. CPUs will move toward faster 64-bit architectures, and manufacturing processes will become finer, moving from the current 0.25 micron to 0.18 micron. By 2000 most CPU manufacturers will have adopted the 0.18 micron process, and after 2001 many will turn to 0.13 micron copper manufacturing technology. Better manufacturing technology means smaller size and higher integration. The advantages of copper are clear. Copper conducts better than the aluminum widely used today, and its lower resistance and heat generation keep the processor reliable over a wider range of conditions. Copper combined with processes of 0.13 micron and below will effectively raise the working frequency of the chip. The size of existing dies can also be reduced. Compared with the traditional aluminum process, the copper process will increase chip speed and reduce chip area, and in the long run copper technology will replace aluminum technology.

Every CPU produced by the various manufacturers has a name (brand name), a code name (development code) and a logo (special pattern). Intel's early products were all named in the i80x86 pattern: the 286, 386, 486 and so on. When Intel developed its fifth-generation product, the 586, trademark registration difficulties led it to adopt the name Pentium instead, and it also registered a Chinese trademark name for it. The later Pentium Pro, Pentium II, Pentium III and Celeron followed. At present the name alone does not distinguish the specifications of CPUs of the same type; this is expected to improve once Intel officially introduces the P III with a 133MHz front-side bus, after which the name alone should indicate the general technical characteristics of a CPU.

In addition, manufacturers give each CPU a development code name, which also distinguishes products that share a name but differ in technical specifications. For example, the P Ⅱ CPUs produced by Intel with the 0.35 and 0.25 micron processes are code-named Klamath and Deschutes respectively. Each named Intel CPU also has its own trademark pattern as a logo. AMD and Cyrix are similar to Intel: each of their CPUs also has a name, a code name and a logo, but no official Chinese name.

2. Internal structure of CPU

At present, the internal structure of the CPUs in use can be divided into two types: single bus and dual bus. Because the internal structure of a CPU determines its packaging form and installation specification, a brief introduction follows.

Before Intel developed the Pentium Pro, CPUs of the 486 class and above, such as the classic Pentium, consisted of a main processor, a math coprocessor, a controller, various registers and an L1 cache. Many CPUs are still produced with this internal structure, such as AMD's K6-2, Cyrix's M Ⅱ and the IDT-C6. Starting with the P6 (the development code name of the Pentium Pro), Intel integrated the cache control circuit and the L2 (secondary) cache, which had previously been located on the motherboard, into the CPU in order to speed up data exchange between the CPU and the L2 cache. Data exchange between the CPU core and the cache then no longer passes through the external bus but goes directly over a cache bus inside the CPU. Because the data paths between the CPU core and memory and between the CPU core and cache are separate, this formed the first P6 dual-bus architecture (see figure 1). Judging from the results achieved by the Pentium Pro in practice, this measure was very successful and a major advance in CPU development. Because of the advantages of the P6 dual-bus structure, all CPUs with an L2 cache and cache controller have changed from the traditional single-bus mode to the dual-bus mode, such as Intel's P Ⅱ, new Celeron and P Ⅲ, and AMD's K6-Ⅲ and K7.

3. Architecture and packaging of CPU

The CPU architecture is determined by the type and specification of the socket in which the CPU is installed. At present, commonly used CPUs fall into two architectures according to their socket specifications: Socket x and Slot x.

Socket x CPUs are divided into Socket 7 and Socket 370 types, installed in the 321-pin Socket 7 and the 370-pin Socket 370 respectively. The two sockets are very similar in appearance and size, but Socket 370 has more pin holes than Socket 7. Slot x CPUs are divided into three types, Slot 1, Slot 2 and Slot A, each installed in a slot of the corresponding specification. Slot 1 and Slot A are both 242-contact slots, but their mechanical and electrical standards differ, so they are incompatible with each other. Slot 2 is a larger slot used specifically for the Xeon CPUs of the P Ⅱ and P Ⅲ series; Xeon is a CPU dedicated to workgroup servers.

Packaging is the last step of CPU production: the CPU chip or CPU module is sealed in specific materials to protect it from damage. A CPU can generally be delivered to users only after it has been packaged.

The packaging of a CPU depends on how it is installed and on the integrated design of the device. In general, CPUs installed in a Socket x socket are packaged as PGA (pin grid array), while CPUs installed in a Slot x slot are all packaged as SEC (single edge contact) cartridges. CPUs in PGA packages currently include Intel's Celeron, AMD's K6-2 and K6-Ⅲ and Cyrix's M Ⅱ. The Celeron used to be packaged as SEC but is now packaged as PGA (see Figure 4). CPUs packaged as SEC include Intel's P Ⅱ and P Ⅲ and AMD's K7. Intel's slot-architecture CPUs in fact use three kinds of single-edge cartridge packaging: SEPP, SECC and SECC2.

Although the Celeron and K6-Ⅲ integrate a 128KB and a 256KB L2 cache respectively, together with the cache controller, into the CPU, the CPU core, L2 cache and cache controller are all fabricated on the same piece of silicon, so these CPUs are small and can be packaged as PGA. The main reason the Celeron adopts PGA packaging, however, is to reduce production cost, while the main reason the K6-Ⅲ adopts PGA packaging is that Intel has applied for patents on Slot 1, Slot 2 and Socket 370, so AMD can only use the Socket 7 architecture and PGA packaging to produce the K6-Ⅲ.

There are currently two ways of manufacturing slot-architecture CPUs. In the first, separately manufactured CPU core, cache controller and L2 cache chips are mounted on a PCB (circuit board), and a single-edge cartridge and a fan are then fitted to complete the CPU. CPUs made this way include Intel's P Ⅱ and P Ⅲ and AMD's K7. In the second, a complete CPU chip (containing the CPU core, cache controller and L2 cache) is mounted on the circuit board, which then serves only to provide the slot interface; a single-edge cartridge and a fan are fitted to form the complete CPU. Only some of Intel's Celerons are made this way.