## **Computer Organization and Architecture** ## **High Level Languages** #### **High Vs. Low Level Languages** ## **Number System** ## **Real Number System** ## **Number System Conversion** ## **Number Systems Conversion Chart** #### Binary | Place | 211 | 210 | 2 <sup>9</sup> | 28 | 27 | 26 | 25 | 24 | 2 <sup>3</sup> | 2 <sup>2</sup> | 21 | 20 | 2-1 | 2-2 | 2-3 | 2-4 | |--------|------|------|----------------|-----|-----|----|----|----|----------------|----------------|----|----|-----|------|-------|--------| | Weight | 2048 | 1024 | 512 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | 0.5 | 0.25 | 0.125 | 0.0625 | #### Binary, Hex, and Octal Conversions | Binary | Octal | Hexadecimal | Decimal | |--------|-------|-------------|---------| | 0000 | 0 | 0 | 0 | | 0001 | 1 | 1 | 1 | | 0010 | 2 | 2 | 2 | | 0011 | 3 | 3 | 3 | | 0100 | 4 | 4 | 4 | | 0101 | 5 | 5 | 5 | | 0110 | 6 | 6 | 6 | | 0111 | 7 | 7 | 7 | | 1000 | 10 | 8 | 8 | | 1001 | 11 | 9 | 9 | | 1010 | 12 | А | 10 | | 1011 | 13 | В | 11 | | 1100 | 14 | С | 12 | | 1101 | 15 | D | 13 | | 1110 | 16 | E | 14 | | 1111 | 17 | F | 15 | #### Methodology - Convert from decimal to binary Divide the decimal by the largest binary weight it is divisible by and place a "1" in that column. Then select the next largest weight, if it is divisible put a "1" in that column otherwise place a "0" in the column. Continue until all the columns have either a "1" or "0" resulting in a binary expression. - Convert from decimal to hex. Convert to binary first, then group the binary in groups of 4 beginning on the right working to the left. For each group determine the hex value based on the table to the left. - Convert to octal Convert to binary first, then group the binary in groups of 3 beginning on the right working to the left. For each group determine the octal value based on the table to the left. ## **Binary Number System** ## **Number System** #### Bits and Bytes - A single unit of data is called a bit, having a value of 1 or 0. - Computers work with collections of bits, grouping them to represent larger pieces of data, such as letters of the alphabet. - Eight bits make up one byte. A byte is the amount of memory needed to store one alphanumeric character. - With one byte, the computer can represent one of 256 different symbols or characters. 8 bits = 1 byte 1 1 1 1 1 1 1 1 ### Complement ## **Complements of numbers** ### (r-1)'s Complement - •Given a number N in base r having n digits, - •the (r- 1)'s complement of N is defined as $$(r^n - 1) - N$$ •For decimal numbers the base or r = 10 and r - 1 = 9, •so the 9's complement of N is $(10^n-1)-N$ •99999..... - N | 9 | 9 | 9 | 9 | 9 | |-------|-------|-------|-------|-------| | Digit | Digit | Next | Next | First | | n | n-1 | digit | digit | digit | #### Complements - ♣ There are two types of complements for each base-*r* system: the radix complement and diminished radix complement. - the r's complement and the second as the (r-1)'s complement. Diminished Radix Complement Given a number N in case r having n digits, the (r-1) 's complement of N is defined as $(r^n - 1) - N$ . For decimal numbers, r = 10 and r - 1 = 9, so the 9's complement of N is $(10^n - 1) - N$ . #### Example: The 9's complement of 546700 is 999999 - 546700 = 453299. The 9's complement of 012398 is 999999 - 012398 = 987601. ♣ For binary numbers, r = 2 and r - 1 = 1, so the 1's complement of N is $(2^n - 1) - N$ . #### Example: The 1's complement of 1011000 is 0100111 The 1's complement of 0101101 is 1010010 ### **Precision floating-point** # Floating point numbers—the IEEE standard - IEEE Standard 754 - Most (but not all) computer manufactures use IEEE-754 format - Number represented: (-1)S \* (1.M)\*2(E - Bias) 2 main formats: single and double ## Floating-Point Numbers Examples of floating-point numbers in base 10 ... $\diamond$ 5.341×10<sup>3</sup>, 0.05341×10<sup>5</sup>, -2.013×10<sup>-1</sup>, -201.3×10<sup>-3</sup> ❖ Examples of floating-point numbers in base 2 ... ↑ 1.00101×2<sup>23</sup> , 0.0100101×2<sup>25</sup> , -1.101101×2<sup>-3</sup> , -1101.101×2<sup>-6</sup> ♦ T.00101×2<sup>23</sup>, 0.0100101×2<sup>23</sup>, -1.101101×2<sup>3</sup>, -1101.10 ♦ Exponents are kept in decimal for clarity $\Rightarrow$ The binary number $(1101.101)_2 = 2^3 + 2^2 + 2^0 + 2^{-1} + 2^{-3} = 13.625$ - Floating-point numbers should be normalized - ♦ Exactly one non-zero digit should appear before the point - In a decimal number, this digit can be from 1 to 9 - In a binary number, this digit should be 1 - ♦ Normalized FP Numbers: 5.341×10³ and -1.101101×2-3 - ♦ NOT Normalized: 0.05341×10<sup>5</sup> and −1101.101×2<sup>-6</sup> Floating Point ICS 233 – KFUPM © Muhamed Mudawar slide 4 ## Normalized Floating Point Numbers ❖ For a normalized floating point number (S, E, F) - Significand is equal to $(1.F)_2 = (1.f_1f_2f_3f_4...)_2$ - ♦ IEEE 754 assumes hidden 1. (not stored) for normalized numbers - ♦ Significand is 1 bit longer than fraction - Value of a Normalized Floating Point Number is $$(-1)^{S} \times (1.F)_{2} \times 2^{\text{val}(E)}$$ $$(-1)^{S} \times (1.f_{1}f_{2}f_{3}f_{4}...)_{2} \times 2^{\text{val}(E)}$$ $$(-1)^{S} \times (1 + f_{1} \times 2^{-1} + f_{2} \times 2^{-2} + f_{3} \times 2^{-3} + f_{4} \times 2^{-4}...)_{2} \times 2^{\text{val}(E)}$$ $(-1)^S$ is 1 when S is 0 (positive), and -1 when S is 1 (negative) Floating Point ICS 233 – KFUPM © Muhamed Mudawar slide 8 ## Floating-Point Representation - A floating-point number is represented by the triple - S is the Sign bit (0 is positive and 1 is negative) - Representation is called sign and magnitude - ♦ E is the Exponent field (signed) - Very large numbers have large positive exponents - Very small close-to-zero numbers have negative exponents - More bits in exponent field increases range of values - F is the Fraction field (fraction after binary point) - More bits in fraction field improves the precision of FP numbers Fraction Value of a floating-point number = $(-1)^{S} \times val(F) \times 2^{val(E)}$ ## Biased Exponent - Cont'd - ❖ For double precision, exponent field is 11 bits - ♦ E can be in the range 0 to 2047 - $\Rightarrow$ E = 0 and E = 2047 are reserved for special use - $\Leftrightarrow$ E = 1 to 2046 are used for normalized floating point numbers - $\Rightarrow$ Bias = 1023 (half of 2046), val(E) = E 1023 - $\Rightarrow$ val(E=1) = -1022, val(E=1023) = 0, val(E=2046) = 1023 - Value of a Normalized Floating Point Number is $$(-1)^{S} \times (1.F)_{2} \times 2^{E-Bias}$$ $(-1)^{S} \times (1.f_{1}f_{2}f_{3}f_{4}...)_{2} \times 2^{E-Bias}$ $(-1)^{S} \times (1 + f_{1} \times 2^{-1} + f_{2} \times 2^{-2} + f_{3} \times 2^{-3} + f_{4} \times 2^{-4}...)_{2} \times 2^{E-Bias}$ **ASCII** ## **Bitwise Operation** ## **Bitwise Operators** int a = 10, b = 2 for all examples below | Operator | Meaning | Example | Result | |----------|----------------------------------|---------|--------| | ~ | Bitwise unary NOT | ~a | -11 | | & | Bitwise AND | a&b | 2 | | 1 | Bitwise OR | a b | 10 | | ٨ | Bitwise Ex-OR | a^b | 8 | | >> | Shift right | a>>1 | 5 | | >>> | Shift right zero fill | a>>>1 | 5 | | << | Shift left | a<<1 | 20 | | &= | Bitwise AND assignment | a &= b | 2 | | = | Bitwise OR assignment | a = b | 10 | | ^= | Bitwise Ex-OR assignment | a ^= b | 8 | | >>= | Shift right assignment | a >>= 1 | 5 | | >>>= | Shift right zero fill assignment | a>>>=1 | 5 | | <<= | Shift left assignment | a <<= 1 | 20 | Startertutorials.com ## **Machine Language** ### Machine Language - Instructions, like registers and words of data, are also 32 bits long - Example: add \$t0, \$s1, \$s2 - registers have numbers, \$t0=8, \$s1=17, \$s2=18 - · Instruction Format: | [ | 000000 | 10001 | 10010 | 01000 | 00000 | 100000 | |---|--------|-------|-------|-------|-------|--------| | | op | rs | rt | rd | shamt | funct | ### Can you guess what the field names stand for? op = Basic operation of the instruction: opcode rs = The first register source operand rt = The second register source operand rd = The register destination operand shamt = shift amount funct = function code 25 ## **Assembly Language** ### **Assembly Language** - Tied to the specifics of the underlying machine - Commands and names to make the code readable and writeable by humans - Hand-coded assembly code may be more efficient - E.g., IA32 from Intel ``` $0, %ecx .loop: cmpl $1, %edx .endloop jle addl $1, %ecx %edx, %eax movl $1, %eax andl jе .else %edx, %eax movl %eax, %edx addl %eax, %edx addl addl $1, %edx jmp .endif .else: sarl $1, %edx .endif: jmp .loop .endloop: ``` ## Reading IA32 Assembly Language - Assembler directives: starting with a period (".") - E.g., ".section .text" to start the text section of memory - E.g., ".loop" for the address of an instruction - Referring to a register: percent size ("%") E.g., "%ecx" or "%eip" - Referring to a constant: dollar sign ("\$") E.g., "\$1" for the number 1 - · Storing result: typically in the second argument - E.g. "addl \$1, %ecx" increments register ECX - E.g., "movl %edx, %eax" moves EDX to EAX - Comment: pound sign ("#") - E.g., "# Purpose: Convert lower to upper case" 28 - As stated earlier, a program written in any programming language is a set of logically related instructions. These instructions have two parts, as shown in the following figure: - The two parts of a programming language instruction are: - ,<u>Operation code (opcode</u>): This part instructs a computer about the operation to be performed. - ,<u>Operand:</u> This part instructs the computer about the location of the data on which the operation specified by the opcode is to be performed. For example, in the instruction Add A and B, Add is the opcode and A and B are operands NIIT Hai Phong Software Park 11 ## **Compiler and Assember** ### COMPILER VS LINKER VS LOADER | | COMPILER | LINKER | LOADER | |-----------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------| | tra<br>co | A software that insforms computer ode written in one programming guage into another programming language | A computer utility program that takes one or more object files generated by a compiler and combines them into a single executable file | A part of an operating system that is responsible for loading programs to memory | | | Transforms the source code into object code | Combines multiple object code and links them with libraries | Prepares the executable file for running | ## **Logic Design** #### • Logic Gates #### AND Gate OR Gate #### NOT Gate XOR Gate ### **XOR GATE Truth Table** ProjectIoT123.com ... is equivalent to ... $$A \oplus B = A\overline{B} + \overline{A}B$$ #### NAND Gate [ teek Symbol] 0 • Half-Adder vs Full-Adder #### Half-Adder Figure-1 : a)Half Adder b)XOR implementation using NAND gates #### • Full Adder #### • 4 Bit Full Adder #### • Substract ## **Circuits as Memory** • SR Filp Folp Memo COA 1/29/23, 5:22 PM ## **Computer Orginization and Archetiture** Instruction Set Architecture - Interface of S/W and H/w b) Truth table #### Instruction Set Architecture: Critical Interface #### Properties of a good abstraction - Lasts through many generations (portability) - Used in many different ways (generality) - Provides convenient functionality to higher levels - Permits an efficient implementation at lower levels Computer Architecture- 8 ## The Instruction Set Architecture ISA #### **Instruction Set Architecture** - The computer ISA defines all the <u>programmer-visible</u> components and operations of the computer - Memory organization - address space -- how may locations can be addressed? - addressibility -- how many bits per location? - Register set - how many? what size? how are they used? - Instruction set - opcodes - data types - addressing modes - ISA provides all information needed for someone that wants to write a program in machine language (or translate from a high-level language to machine language). BYU CS 224 ISA 5 ## Instruction Set Architecture (ISA) - ISA is the interface provided by the hardware to the software - Defines the available: - instructions - registers - · addressing modes - · memory architecture - · interrupt and exception handling - · external I/O - Syntax defined by assembly language - Symbolic representation of the machine instructions - Examples: x86, ARM, HCS12 Mike Holenderski, m.holenderski@tue.nl #### **CICS vs RICS** ### CISC vs. RISC ## Complex Instruction Set Computer - Many instructions - e.g., 75-100 - Many instructions are macro-like - Simplifies programming - Most microcontrollers are based on CISC concept - e.g., PDP-11, VAX, Motorola 68k - PIC is an exception ## Reduced Instruction Set Computers - Few instructions - e.g., 30-40 - Smaller chip, smaller pin count, & very low-power consumption - Simple but fast instructions - Harvard architecture, instruction pipelining - Industry trend for microprocessor design - e.g., Intel Pentium, PIC From Computer Desktop Encyclopedia 3 1998 The Computer Language Co. Inc. ## **Memory Architecture** ## Classification of ISA (or) Types of ISA # ISA - MIPS Instrucstion Foramt and Addressing Model ## MIPS Design Paradigms - Simplicity favors regularity - all instructions single size - three register operands in arithmetic instr. - keep register fields in the same place - Smaller is faster - 32 registers - Make good compromises - large addresses and constants versus unique instruction length - Make the common case fast - PC-relative addressing for conditional branches #### **MIPS Organization So Far** ## The MIPS ISA - Register File - When writing assembly, these registers can be referenced by their address (number) or name - · General purpose and special purpose registers | # | Name | Purpose | # | Name | Purpose | |------|--------|--------------------------|------|------|--------------------------| | \$0 | \$zero | Constant zero | \$16 | \$s0 | Temporary - Callee-saved | | \$1 | \$at | Reserved for assembler | \$17 | \$s1 | | | \$2 | \$v0 | Function return value | \$18 | \$s2 | | | \$3 | \$v1 | | \$19 | \$s3 | | | \$4 | \$a0 | Function parameter | \$20 | \$s4 | | | \$5 | \$a1 | | \$21 | \$s5 | | | \$6 | \$a2 | | \$22 | \$s6 | | | \$7 | \$a3 | | \$23 | \$s7 | | | \$8 | \$t0 | Temporary – Caller-saved | \$24 | \$t8 | Temporary – Caller-saved | | \$9 | \$t1 | | \$25 | \$t9 | | | \$10 | \$t2 | | \$26 | \$k0 | Reserved for OS | | \$11 | \$t3 | | \$27 | \$k1 | | | \$12 | \$t4 | | \$28 | \$gp | Global pointer | | \$13 | \$t5 | | \$29 | \$sp | Stack pointer | | \$14 | \$t6 | | \$30 | \$fp | Frame pointer | | \$15 | \$t7 | | \$31 | \$ra | Function return address | #### **Instruction Format** op: basic operation of the instruction (opcode) rs: first source operand register rt: second source operand register rd: destination operand register shamt: shift amount funct: selects the specific variant of the opcode (function code) address: offset for load/store instructions (+/-215) immediate: constants for immediate instructions ### **Instruction Formats** - ❖ All instructions are 32-bit wide. Three instruction formats: - ❖ Register (R-Type) - ♦ Register-to-register instructions - ♦ Op: operation code specifies the format of the instruction - Immediate (I-Type) - ♦ 16-bit immediate constant is part in the instruction | Op <sup>6</sup> | Rs⁵ | Rt <sup>5</sup> | immediate <sup>16</sup> | |-----------------|-----|-----------------|-------------------------| |-----------------|-----|-----------------|-------------------------| - Jump (J-Type) - Used by jump instructions | | Op <sup>6</sup> | immediate <sup>26</sup> | | |---------------|-----------------|---------------------------------------------------------------|-------------------| | Instruction S | et Architecture | ICS 233 - Computer Architecture and Assembly Language - KFUPM | © Muhamed Mudawar | ### **Addressing Model** slide 9 #### **Example: MIPS Instruction Formats and Addressing Modes** · All instructions 32 bits wide ECE 361 3-45 ## 5.6 Addressing Modes Figure 5.11 Schematic representation of addressing modes in MiniMIPS. Computer Architecture, Instruction-Set Architecture Slide 22 ### **Instruction Categories** **ISA** - Performance ### Compiler Variations, MIPS, Performance: An Example (Continued) MIPS = Clock rate / (CPI x 10<sup>6</sup>) = 100 MHz / (CPI x 10<sup>6</sup>) CPI = CPU execution cycles / Instructions count $CPU \ clock \ cycles = \sum_{i=1}^{n} (CPI_i \times C_i)$ **CPU** time = Instruction count x **CPI** / Clock rate - · For compiler 1: - $CPI_1 = (5 \times 1 + 1 \times 2 + 1 \times 3) / (5 + 1 + 1) = 10 / 7 = 1.43$ - $\mathbf{MIP}_1 = 100 / (1.428 \times 10^6) = 70.0$ - CPU time<sub>1</sub> = $((5+1+1) \times 10^6 \times 1.43) / (100 \times 10^6) = 0.10$ seconds - · For compiler 2: - $CPI_2 = (10 \times 1 + 1 \times 2 + 1 \times 3) / (10 + 1 + 1) = 15 / 12 = 1.25$ - $MIP<sub>2</sub> = 100 / (1.25 \times 10^6) = 80.0$ - CPU time<sub>2</sub> = $((10+1+1) \times 10^6 \times 1.25) / (100 \times 10^6) = 0.15$ seconds EECC551 - Shaaban ### **Speed Up Equation for Pipelining** CPI<sub>pipelined</sub> = Ideal CPI + Average Stall cycles per Inst $$Speedup = \frac{Ideal\ \textit{CPI} \times Pipeline\ depth}{Ideal\ \textit{CPI} + Pipeline\ stall\ \textit{CPI}} \times \frac{\textit{Cycle}\ Time_{unpipelined}}{\textit{Cycle}\ Time_{pipelined}}$$ For simple RISC pipeline, CPI = 1: Speedup = $$\frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$ CS211 41 ### **Processor - Von Neumann Architecture** # **Processor - Pipeline** # Pipelining Lessons - Pipelining doesn't help <u>latency</u> (execution time) of single task, it helps <u>throughput</u> of entire workload - <u>Multiple</u> tasks operating simultaneously using different resources - □ No to n tia tspeedup = Number of pipe stages - Time to "<u>fill</u>" pipeline and time to "<u>drain</u>" it reduces speedup - □ Pipeline rate limited by slowest pipeline stage - Unbalanced lengths of pipe stages also reduces speedup # The Fetch-Execute Cycle Control unit (2) Decode instruction (3) Get data Registers EXECUTION CYCLE (4) Execute the instruction Figure 5.3 The Fetch-Execute Cycle 17 © 2011 Jones and Bartlett Publishers, LLC (www.jbpub.com) # **Processer - Datapath** ## Review: MIPS data transfer instructions - For all cases, calculate effective address first - MIPS doesn't use segmented memory model like x86 - □ Flat memory model → EA = address being accessed - lb, lh, lw - Get data from addressed memory location - Sign extend if lb or lh, load into rt - Ibu, Ihu, Iwu - Get data from addressed memory location - Zero extend if lb or lh, load into rt - sb, sh, sw - Store data from rt (partial if sb or sh) into addressed location 10/30/2014 Computer Architecture Lecture 2 #### **MIPS Data Transfer Instructions** | Instruction | Comment | |----------------|---------------------------------------------------| | SW R3, 500(R4) | Store word | | SH R3, 502(R2) | Store half | | SB R2, 41(R3) | Store byte | | LW R1, 30(R2) | Load word | | LH R1, 40(R3) | Load half word | | LHU R1, 40(R3) | Load half word unsigned | | LB R1, 40(R3) | Load byte | | LBU R1, 40(R3) | Load byte unsigned | | LUI R1, 40 | Load Upper Immediate (16 bits shifted left by 16) | #### Why do we need LUI? #### **Pipeline Hazard** # Pipeline Hazards (1) - *Pipeline Hazards* are situations that prevent the next instruction in the instruction stream from executing in its designated clock cycle - Hazards reduce the performance from the ideal speedup gained by pipelining - Three types of hazards - Structural hazards - Arise from resource conflicts when the hardware can't support all possible combinations of overlapping instructions - Data hazards - Arise when an instruction depends on the results of a previous instruction in a way that is exposed by overlapping of instruction in pipeline - Control hazards - Arise from the pipelining of branches and other instructions that change the PC (Program Counter) # Pipeline Hazards (2) - Hazards in pipeline can make the pipeline to stall - Eliminating a hazard often requires that some instructions in the pipeline to be allowed to proceed while others are delayed - When an instruction is stalled, instructions issued *latter* than the stalled instruction are stopped, while the ones issued *earlier* must continue - No new instructions are fetched during the stall #### **Solution for Pipeline Hazards** # Structural Hazards (3) • Stall cycle added (commonly called pipeline *bubble*) #### Structural Hazard - Different instructions using the same resource at the same time - Register File: - \* Accessed in 2 stages: - Read during stage 2 (ID) - Write during stage 5 (WB) - \* Solution: 2 read ports, 1 write port - Memory XOR R10, R3, R11 - \* Accessed in 2 stages: - Instruction Fetch during stage 1 (IF) - Data read/write during stage 4 (MEM) - Solution: separate instruction cache and data cache - Each functional unit can only be used once per instruction - Each functional unit must be used at the same stage for all instructions IF ID R3 MEM RW #### **Data Hazard Solution:** #### • "Forward" result from one stage to another cs 152 L1 3.13 DAP Fa97, © U.CB # **Data Hazard Solution: Forwarding** · Key idea: connect data internally before it's stored Assumption: CSCE430/830 The register file forwards values that are read and written during the same cycle. Pipeline Hazards # Control Hazard - · Also known as branch hazard - · Pipeline makes wrong decision on branch prediction - Brings instructions into pipeline that must subsequently be discarded - · Dealing with Branches - Multiple Streams - Prefetch Branch Target - Loop buffer - Branch prediction - Delayed branching #### **Control Hazard on Branches** 2 Computer Structure 2015 – Pipeline #### **Summary - Control Hazard Solutions** - Stall stop fetching instr. until result is available - Significant performance penalty - Hardware required to stall - Predict assume an outcome and continue fetching (undo if prediction is wrong) - Performance penalty only when guess wrong - Hardware required to "squash" instructions - Delayed branch specify in architecture that following instruction is always executed - Compiler re-orders instructions into delay slot - Insert "NOP" (no-op) operations when can't use (~50%) - This is how original MIPS worked CSCE430/830 Pipeline Hazards #### **Control Hazard Review** #### The nub of the problem: - In what pipeline stage does the processor fetch the next instruction? - If that instruction is a conditional branch, when does the processor know whether the conditional branch is taken (execute code at the target address) or not taken (execute the sequential code)? - · What is the difference in cycles between them? #### The cost of stalling until you know whether to branch number of cycles in between \* branch frequency = the contribution to CPI due to branches Predict the branch outcome to avoid stalling Spring 2003 CSE P548 # **Mermory Hierarchy** # The Memory Hierarchy http://cse1.net/recaps/4-memory.html \_ #### **A Typical Memory Hierarchy** - By taking advantage of the principle of locality - Can present the user with as much memory as is available in the cheapest technology - at the speed offered by the fastest technology # Many Levels in Memory Hierarchy #### **Characteristics of the Memory Hierarchy** CPE432 Chapter 5A.7 Dr. W. Abu-Sufah, UJ # Block Size / Hit Rate / Miss Rate #### **MIPS Direct Mapped Cache Example** □ One word/block, cache size = 1K words What kind of locality are we taking advantage of? CEG3420 L13.34 Qiang Xu CUHK, Spring 2011 ## Miss Rate vs. Block Size | Block size 1K | | Cache size | | 256K | |---------------|------------------------------------------------------|--------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | | 4K | 16K | 64K | | | 15.05% | 8.57% | 3.94% | 2.04% | 1.09% | | | | 2.87% | 1.35% | 0.70% | | | | 2.64% | 1.06% | 0.51% | | | | 2.77% | 1.02% | 0.49% | | | | 3.29% | 1.15% | 0.49% | | | 1K<br>15.05%<br>13.34%<br>13.76%<br>16.64%<br>22.01% | 15.05% 8.57%<br>13.34% 7.24%<br>13.76% 7.00%<br>16.64% 7.78% | 1K 4K 16K 15.05% 8.57% 3.94% 13.34% 7.24% 2.87% 13.76% 7.00% 2.64% 16.64% 7.78% 2.77% | 1K 4K 16K 64K 15.05% 8.57% 3.94% 2.04% 13.34% 7.24% 2.87% 1.35% 13.76% 7.00% 2.64% 1.06% 16.64% 7.78% 2.77% 1.02% | FIGURE 5.12 Actual miss rate versus block size for five different-sized caches in Figure 5.11. Note that for a 1-KB cache, 64-byte, 128-byte, and 256-byte blocks have a higher miss rate than 32-byte blocks. In this example, the cache would have to be 256 KB in order for a 256-byte block to decrease misses. 2/15/99 CS520S99 Memory C. Edward Chow Page 34 #### **Block Size Tradeoffs** - · Larger block sizes... - Take advantage of spatial locality - Incur larger miss penalty since it takes longer to transfer the block into the cache - Can increase the average hit time and miss rate - Average Access Time (AMAT) = HitTime + MissPenalty\*MR Avg mem access time = Hit time $_{L1}$ + Miss rate $_{L1}$ × Miss penalty $_{L1}$ Miss penalty $_{L1}$ = Hit time $_{L2}$ + Miss rate $_{L2}$ × Miss penalty $_{L2}$ # **Handle Cache Miss** # Types of Cache Misses: The Three C's - 1 Compulsory: On the first access to a block; the block must be brought into the cache; also called cold start misses, or first reference misses. - 2 Capacity: Occur because blocks are being discarded from cache because cache cannot contain all blocks needed for program execution (program working set is much larger than cache capacity). - 3 Conflict: In the case of set associative or direct mapped block placement strategies, conflict misses occur when several blocks are mapped to the same set or block frame; also called collision misses or interference misses. EECC551 - Shaaban ## **Cache Performance** # **Cache Performance Equations** · Memory stalls per program (blocking cache): $$MemoryStallCycle = IC \times (\frac{MemoryAccesses}{Instruction}) \times MissRate \times MissPenalty$$ $$MemoryStallCycle = IC \times (\frac{Misses}{Instruction}) \times MissPenalty$$ CPU time formula: $$CPU\ Time = IC \times (CPI_{Execu} + \frac{Memory\ Stall\ Cycle}{Instruction}) \times Cycle\ Time$$ · More cache performance will be given later! 46 #### Cache Performance - CPI contributed by cache = CPIc = miss rate \* number of cycles to handle the miss - Another important metric Average memory access time = cache hit time \* hit rate + Miss penalty \* (1 - hit rate) Caches CSE 471 15 #### 2-level Cache Performance Equations - L1 AMAT = HitTimeL1 + MissRateL1 \* MissPenaltyL1 - MissLatencyL1 is low, so optimize HitTimeL1 - MissPenaltyL1 = HitTimeL2 + MissRateL2 \* MissPenaltyL2 - MissLatencyL2 is high, so optimize MissRateL2 - MissPenaltyL2 = DRAMaccessTime + (BlockSize/Bandwidth) - If DRAM time high or bandwidth high, use larger block size - · L2 miss rate: - Global: L2 misses / total CPU references - Local: L2 misses / CPU references that miss in L1 - The equation above assumes local miss rate **Improving Cache Performance** #### Miss Rate Reduction Techniques: - \* Increased cache capacity - \* Higher associativity - \* Hardware prefetching of instructions and data - \* Compiler-controlled prefetching - \* Larger block size - Victim caches - Pseudo-associative Caches - \* Compiler optimizations #### Cache Miss Penalty Reduction Techniques: - \* Giving priority to read misses over writes - \* Early restart and critical word first - \* Second-level cache (L2) - Sub-block placement - Non-blocking caches - Cache Hit Time Reduction Techniques: - \* Small and simple caches - \* Avoiding address translation during cache indexing - \* Pipelining writes for fast write hits EECC551 - Shaaban #7 Lec # 10 Winter2000 1-23-2000 # Qualitative Cache Performance Model #### Miss Types - · Compulsory ("Cold Start") Misses - First access to line not in cache - · Capacity Misses - Active portion of memory exceeds cache size - · Conflict Misses - Active portion of address space fits in cache, but too many lines map to same cache entry - Direct mapped and set associative placement only - · Coherence Misses - Block invalidated by multiprocessor cache coherence mechanism #### Hit Types - · Temporal locality hit - Accessing same word that previously accessed - · Spatial locality hit - Accessing word spatially near previously accessed word - 42 - \_\_\_\_\_ CS 740 F'07 = #### **Address Translation Mechanism** **Translating Logical Address into Physical Address** #### **Address Translation Mechanism** virtual page # offset Address from CPU (virtual address) **Main memory** physical page # Page (4KB) Page table register base address of physical page physical page # 1 1 0 1 OS programs this register (for example, CR3 in x86) Page Table Disk storage (in main memory) **Korea Univ** **Address Translation with Cache** # Virtual Address Translation The hardware converts each valid virtual address to a physical address # Address Translation Overview # **Memory Management** #### 國立青華大學 Address Binding – Compile Time ■ Program is written as symbolic code ■ Compiler translates symbolic code into absolute code ■ If starting location changes → recompile ■ Example: MS-DOS .COM format binary Compile int data; .BASE 0x1000 main() { PUSH **PUSH** data = 3 \* 7: MOVE AX 3 MOVE AX, 3 print(data); MULT AX, 7 MULT AX. 7 (0x1018), AX MOVE (0x1018), AX print, (0x1018) CALL print, (0x1018) POP POP END SPACE (4) **Memory Content** Disk Image Source Program Chapter8 Memory Management Operating System Concepts - NTHU LSA Lab 國立情華大學 Address Binding – Load Time Compiler translates symbolic code into relocatable code Relocatable code: > Machine language that can be run from any memory location ■ If starting location changes → reload the code Compile int data: START main() { **PUSH** data = 3 \* 7; MOVE AX. 3 0x2000 **PUSH** MULT AX, 7 MOVE AX, 3 print(data); (.BS+0x18), AX MULT CALL orint, (.BS+0x18) MOVE (0x2018), AX 0x2010 CALL print, (0x2018) END POP **Memory Content** Disk Image Source Program Chapter8 Memory Management Operating System Concepts - NTHU LSA Lab Address Binding – Execution Time 國立情華大學 Compiler translates symbolic code into logical-address (i.e. virtual-address) code Special hardware (i.e. MMU) is needed for this scheme ■ Most general-purpose OS use this method Compile int data: main() { **PUSH** AX, 3 PUSH data = 3 \* 7; MOVE AX, 3 Virtual addr. print(data); MOVE (0x18), AX MULT AX, 7 (0x18), AX print, (0x18) MOVE CALL print, (0x18) POP SPACE (4) Disk Image Physical addr. **Memory Content** Source Program # Storage Management ## **Disk Structure** #### DISK DRIVE STRUCTURE Operating System Concepts - 7th Edition, Jan 1, 2005 - Data stored on surfaces - Up to two surfaces per platter - · One or more platters per disk - Data in concentric tracks - Tracks broken into sectors 256B-1KB per sector - Cylinder: corresponding tracks on all surfaces - Data read and written by heads - · Actuator moves heads - · Heads move in unison Silberschatz, Galvin and Gagne # **Redundant Array of Inexpensive Disk** # RAID 5: Distributed Parity - 國立青華大學 - Read BW increases by N times, because all four disks can serve a read request - Write BW - Method1: (1)read out all unmodified (N-2) data bits. (2) re-compute parity bit. (3) write both modified bit and parity bit to disks. - $\rightarrow$ write BW = N / ((N-2)+2) = 1 $\rightarrow$ remains the same - Method2: (1) only read the parity bit and modified bit. (2) re-compute parity bit by the difference. (3) write both modified bit and parity bit. - $\rightarrow$ write BW = N / (2+2) = N/4 times faster # RAID 6: P+Q Dual Parity Redundancy - Like RAID 5, but stores extra redundant information to guard against multiple disk failure - Use ECE code (i.e. Error Correction Code) instead of single parity bit - Parity bits are also striped across disks 國立情華大學 Hybrid RAID ■ RAID 0+1: Stripe then replicate ■ RAID 1+0: Replicate then stripe **RAID 1+0 RAID 0+1** RAID 0 RAID 1 RAID 1 RAID 0 RAID 0 RAID 1 Disk 1 \*First level often control by a controller. Therefore, RAID 10 has better fault tolerance than RAID 01 when multiple disk fails http://www.thegeekstuff.com/2011/10/raid10-vs-raid01/ Chapter12 Mass Storage System Operating System Concepts - NTHULSA Lab - PCB(Process Control Block) # PROCESS CONTROL BLOCK (PCB) # - Scheduling ## Schedulers 國立青華大學 - Short-term scheduler (CPU scheduler)— selects which process should be executed and allocated CPU (Ready state → Run state) - Long-term scheduler (job scheduler) selects which processes should be loaded into memory and brought into the ready queue (New state → Ready state) - Medium-term scheduler selects which processes should be swapped in/out memory (Ready state → Wait state) ### Long-Term Scheduler 國立青華大學 - Control degree of multiprogramming - Execute less frequently (e.g. invoked only when a process leaves the system or once several minutes) - Select a good mix of CPU-bound & I/O-bound processes to increase system overall performance - UNIX/NT: no long-term scheduler - > Created process placed in memory for short-term scheduler - Multiprogramming degree is bounded by hardware limitation (e.g., # of terminals) or on the self-adjusting nature of users Chapter3 Processes Concept Operating System Concepts - NTHU LSA Lab ### Short-Term Scheduler - Execute quite frequently (e.g. once per 100ms) - Must be efficient: - > if 10 ms for picking a job, 100 ms for such a pick, - → overhead = 10 / 110 = 9% Chapter3 Processes Concept Operating System Concepts - NTHU LSA Lab 國立青華大學 ### Medium-Term Scheduler - swap out: removing processes from memory to reduce the degree of multiprogramming - swap in: reintroducing swap-out processes into memory - Purpose: improve process mix, free up memory - Most modern OS doesn't have medium-term scheduler because having sufficient physical memory or using | Differentiate between short term and long term scheduler. | | | | | | | |--------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|--|--|--|--| | **Note: Any other relevant difference shall be considered**} | | | | | | | | Sr.<br>No | Short term scheduler | Long term scheduler | | | | | | 1 | It is a CPU scheduler | It is a job scheduler | | | | | | 2 | It selects processes from ready<br>queue which are ready to<br>execute and allocates CPU to<br>one of them. | It selects processes from job pool and loads them into memory for execution. | | | | | | 3 | Access ready queue and CPU. | Access job pool and ready queue | | | | | | 4 | It executes frequently. It executes when CPU is available for allocation. | 1 | | | | | | 5 | Speed is fast | Speed is less than short term scheduler | | | | | | 6 | It provides lesser control over degree of multiprogramming | It controls the degree of multiprogramming | | | | | ### - Context Switch ### **Context Switch** - Context Switch: Kernel saves the state of the old process and loads the saved state for the new process - Context-switch time is purely overhead - Switch time (about 1~1000 ms) depends on - > memory speed - > number of registers - > existence of special instructions - \* a single instruction to save/load all registers - > hardware support - multiple sets of registers (Sun UltraSPARC a context switch means changing register file pointer) Chapter3 Processes Concept Operating System Concepts - NTHU LSA Lab 11 ### KernelLand ### FCFS System Process SJF Foreground Process RR Background Process SRJ Student Process Lowest Priority created by Notes Jam ### CPU SCHEDULING CRITERIA - CPU Utilization: Percent of time that the CPU is busy executing a process. - Throughput: Number of processes executed per unit time. - Turnaround Time: The interval of time between submission of a process and its completion. - Waiting Time: The amount of time the process spends in the ready queue waiting for the CPU. - Response Time: The time between submission of requests and first response to the request. ### Round Robin(RR) - Each process is allotted a time slot(q). After this time has elapsed, the process is pre-empted and added to the end of the ready queue. - o Performance of the round robin algorithm - $\circ$ q large $\rightarrow$ FCFS - $q \text{ small } \rightarrow q \text{ must be greater than the context}$ switch time; otherwise, the overhead is too high ### CPU Scheduling (40 points) | Process | Burst Time | Priority | Arrival Time | |---------|------------|----------|--------------| | P1 | 12 | 3 | 0 | | P2 | 6 | 4 | 2 | | Р3 | 4 | 1 | 4 | | P4 | 18 | 2 | 6 | Table 1: Process Information. Consider the processes described in Table 1. Questions: What is the average waiting time of those processes for each of the following scheduling algorithms? (Draw a Gantt chart for each algorithm.) - (a) First Come First Serve (FCFS) - (b) Non-preemptive Shortest Job First (NP-SJF) - (c) Preemptive Shortest Job First (P-SJF) - (d) Priority Scheduling - (e) Round Robin, with the following assumptions: Assumption (1). The scheduling time quantum is 5 time units. Assumption (2). If a new process arrives at the same time as the time slice of the executing process expires, the OS puts the executing process in the ready queue, followed by the new process. **Question 3:** Consider the following set of processes, their arrival times, length of the CPU burst time given in milliseconds: | Process | Arrival Time | Burst Time | Priority | |---------|--------------|------------|----------| | P1 | 1 | 12 | 3 | | P2 | 2 | 3 | 1 | | Р3 | 2 | 16 | 4 | | P4 | 3 | 10 | 2 | | P5 | 7 | 1 | 1 | Let us schedule the execution of these processes using the following scheduling algorithms: First Come First Served (FCFS), Shortest Job First (SJF), Round Robin (RR), and a new type of scheduling algorithm called non-preemptive priority scheduling. The following are the assumptions: - 1) Larger priority number implies higher priority - 2) Assume the quantum (aka time slice) of 2 for RR algorithm - Non-Preemptive means once scheduled, a process cannot be preempted (i.e. it runs to completion). <u>3a [10 points]</u> What is the response (completion) time of each process for each of the scheduling algorithms. Write the times in the table below. | | FCFS | SJF | Priority | RR | |----|------|-----|----------|----| | P1 | | | | | | P2 | | | | | | P3 | | | | | | P4 | | | | | | P5 | | | | | **3b** [10 points] Which of the above methods results in the longest average wait time. Show calculations and explain. ### -Deadlock ### **Deadlock Characterization** Deadlock can arise if four conditions hold simultaneously. - Mutual exclusion: only one process at a time can use a resource - Hold and wait: a process holding at least one resource is waiting to acquire additional resources held by other processes - No preemption: a resource can be released only voluntarily by the process holding it, after that process has completed its task - Circular wait: there exists a set {P<sub>0</sub>, P<sub>1</sub>, ..., P<sub>n</sub>} of waiting processes such that P<sub>0</sub> is waiting for a resource that is held by P<sub>1</sub>, P<sub>1</sub> is waiting for a resource that is held by P<sub>2</sub>, ..., P<sub>n-1</sub> is waiting for a resource that is held by P<sub>n</sub>, and P<sub>n</sub> is waiting for a resource that is held by P<sub>0</sub>. 7.5 ### - Resource Allocation Graph ### **Resource-Allocation Graph: Definition** A set of vertices V and a set of edges E. - V is partitioned into two types: - $P = \{P_1, P_2, ..., P_n\}$ , the set consisting of all the processes in the system. - $R = \{R_1, R_2, ..., R_m\}$ , the set consisting of all resource types in the system. - □ request edge directed edge $P_1 \rightarrow R_i$ - □ assignment edge directed edge $R_i \rightarrow P_i$ The resource-allocation graph is therefore a bipartite directed graph. What would be the graph representing the bridge-crossing example? Operating System Concepts with Java – 7th Edition, Nov 15, 2006 7.8 ### Resource Allocation Graph Process · Resource type with 2 instances · Pi requests instance of Rj · An instance of Rj is allocated to Pi · A graph with a deadlock: Yair Amir Fall 00/ Lecture 4 ### Resource Allocation Graph (cont.) · A graph with a cycle but without a deadlock: - · If there are no cycles then there is no deadlock. - · If there is a cycle: - If there is only one instance per resource type then there is a deadlock. - If there is more than one instance for some resource type, there may or may not be a deadlock. Yair Amir Fall 00 / Lecture 4 ### Resource Allocation Graph With A Cycle But No Deadlock [6 pts] Fill in the resource matrix according to the graph above: | | Allocation | | Need | | | Available | | | | | | | |----|------------|----|------|----|----|-----------|----|----|----|----|----|----| | | R0 | R1 | R2 | R3 | R0 | R1 | R2 | R3 | R0 | R1 | R2 | R3 | | P0 | | | | | | | | | | | | | | P1 | | | | | | | | | | | | | | P2 | | | | | | | | | | | | | | P3 | | | | | | | | | | | | | ### Approaches to Deadlock Prevention | Condition | Approach | |------------------|---------------------------------| | Mutual exclusion | Spool everything | | Hold and wait | Request all resources initially | | No preemption | Take resources away | | Circular wait | Order resources numerically | Figure 6-14. Summary of approaches to deadlock prevention. Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639 ### **Deadlock Avoidance** - Deadlock avoidance is a technique used to avoid deadlock. - It requires information about how different processes would request different resources. - Safe state: if deadlock not occur then safe state. - Unsafe state: if deadlock occur then unsafe state. - The idea of avoiding a deadlock is simply not allow the system to enter an unsafe state the may cause a deadlock. deadlock sate ### 3.5 Deadlock Detection - If a system does not employ either a deadlock-prevention or a deadlock avoidance algorithm, then a deadlock situation may occur. - · In this environment, the system may provide: - An algorithm that examines the state of the system to determine whether a deadlock has occurred. - · An algorithm to recover from the deadlock. - Single Instance of Each Resource Type. - · Several Instances of a Resource Type. - · When should we invoke the detection algorithm? - The answer depends on two factors: - 1. How often is a deadlock likely to occur? - 2. How many processes will be affected by deadlock when it happens? 20 ### The Difference Between Deadlock Prevention and Deadlock Avoidance Deadlock Prevention: - Preventing deadlocks by constraining how requests for resources can be made in the system and how they are handled (system design). - The goal is to ensure that at least one of the necessary conditions for deadlock can never hold. - Deadlock Avoidance: - The system dynamically considers every request and decides whether it is safe to grant it at this point, - The system requires additional apriori information regarding the overall potential use of each resource for each process. - Allows more concurrency Similar to the difference between a traffic light and a police officer directing traffic. Yair Amir Fall 00 / Lecture 4 11 ### **Handling Deadlocks** ### 國立青華大學 - Ensure the system will *never* enter a deadlock state - deadlock prevention: ensure that at least one of the 4 necessary conditions cannot hold - deadlock avoidance: dynamically examines the resource-allocation state before allocation - Allow to enter a deadlock state and then recover - > deadlock detection - > deadlock recovery - Ignore the problem and pretend that deadlocks never occur in the system - used by most operating systems, including UNIX. ### Bankers' Algorithm ### The banker's algorithm - A state is safe iff there exists a sequence {P1..Pn} where each Pi is allocated all of its needed resources to be run to completion - ♦ i.e.: we can always run all the processes to completion from a safe state - The safety algorithm is the part that determines if a state is safe - Initialization: - ◆ all processes are said to be "unfinished" - ◆ set the work vector to the amount resources available: W(i) = V(i) for all i; 41 ### Banker's Algorithm - 1. Look for a new row in R which is smaller than A. If no such row exists the system will eventually deadlock → not safe. - 2. If such a row exists, the process may finish. mark that process (row) as terminate and add all of its resources to A. - 3. Repeat Steps 1 and 2 until all rows are marked → safe state If some are not marked $\rightarrow$ not safe. 30 ### **Example of Banker's Algorithm** | | Allocation<br>A B C | |----|---------------------| | P0 | 010 | | P1 | 200 | | P2 | 302 | | P3 | 211 | | P4 | 002 | | | Need<br>A B C | |----|---------------| | P0 | 7 4 3 | | P1 | 122 | | P2 | 600 | | P3 | 0 1 1 | | P4 | 4 3 1 | | Available<br>A B C | | |--------------------|--| | 332 | | Try to find a row in Need; that is <= Available. P1. run completion. Available becomes = [3 3 2] + [2 0 0] = [5 3 2] P3. run completion. Available becomes = [5 3 2] + [2 1 1] = [7 4 3] P4. run completion. Available becomes = [7 4 3] + [0 0 2] = [7 4 5] P2. run completion. Available becomes = [7 4 5] + [3 0 2] = [10 4 7] P0. run completion. Available becomes = [10 4 7] + [0 1 0] = [10 5 7] We found a sequence of execution: P1, P3, P4, P2, P0. State is safe 40 ### Banker's Algorithm for a single resource | | Has | Max | |--------------|-----|-----| | A | 0 | 6 | | В | 0 | 5 | | $\mathbf{C}$ | 0 | 4 | | D | 0 | 7 | | | Has | Max | |---|-----|-----| | A | 1 | 6 | | В | 1 | 5 | | С | 2 | 4 | | D | 4 | 7 | | | | | Free: 2 Free: 1 Any sequence finishes C,B,A,D finishes Deadlock (unsafe state) - Bankers' algorithm: before granting a request, ensure that a sequence exists that will allow all processes to complete - Use previous methods to find such a sequence - If a sequence exists, allow the requests - If there's no such sequence, deny the request - Can be slow: must be done on each request! - Using the banker's algorithm, determine whether the following state is unsafe based on the snapshot of the system below. If the state is safe, provide a safe sequence of execution. Otherwise explain why it is unsafe. Show your calculations for full credit. [10 points] | | <u>Allocation</u> | | | Max (Demand) | | | | Available = $(1, 0, 0, 2)$ | | |-------|-------------------|---|---|--------------|---|---|---|----------------------------|--| | | Α | В | C | D | Α | В | C | D | | | $P_0$ | 3 | 0 | 1 | 4 | 5 | 1 | 1 | 7 | | | $P_1$ | 2 | 2 | 1 | 0 | 3 | 2 | 1 | 1 | | | $P_2$ | 3 | 1 | 2 | 1 | 3 | 3 | 2 | 1 | | | $P_3$ | 0 | 5 | 1 | 0 | 4 | 6 | 1 | 2 | | | $P_4$ | 4 | 2 | 1 | 2 | 6 | 3 | 2 | 5 | | | | | | | | | | | | | ### Sample question (bankers algorithm) | | Allocation | Max | Need | Available | | |----|------------|---------|---------|-----------|--| | | | | | 3 2 2 1 | | | | ABCD | A B C D | A B C D | A B C D | | | P0 | 4 0 0 1 | 7 0 2 1 | | | | | P1 | 1 1 0 0 | 1 6 5 0 | | | | | P2 | 1 0 4 5 | 3 3 4 6 | | | | | P3 | 0 4 2 1 | 1 5 6 2 | | | | | P4 | 0 3 1 2 | 2 4 3 2 | | | | Using Bankers algorithm answer the following: - 1. How many resources of type A, B, C and D are there? - 2. What are the contents of the Need matrix? - 3. Is the system in a safe state? Provide reasoning for your answer (show the sequence in which the processes would finish) - If a request from process P2 arrives for additional resources of {0, 2, 0, 0}, can the Bankers algorithm grant the request immediately? Provide reasoning for your answer. (14 Marks) Q1. **Deadlocks.** The Banker's algorithm is used for deadlock avoidance. Consider the state of resource availability and allocation defined by the following matrices. Claim Matrix R1 R2 R3 P1 3 1 4 P2 6 1 3 P3 3 2 2 P4 4 2 2 | Allocation Matrix | | | | | | |-------------------|----|----|----|--|--| | | R1 | R2 | R3 | | | | P1 | 2 | 1 | 1 | | | | P2 | 5 | 1 | 1 | | | | P3 | 2 | 0 | 1 | | | | P4 | 0 | 0 | 2 | | | - (1) Assuming that the total amounts for resources R1, R2, and R3 are 10, 2, and 10, should a new request to the Banker's algorithm by process P3 to acquire one additional resource from R1 and one additional resource from R3 be approved or denied? Explain why or why not. - (2) Assuming that the total amounts for resources R1, R2, and R3 are 10, 2, and 10, should a new request to the Banker's algorithm by process P4 to acquire one additional resource from R3 be approved or denied? Explain why or why not. - (3) Assuming that the total amounts for resources R1 and R2 are 10 and 2, what is the minimum amount for resource R3 that would render the above state a safe state under the Banker's algorithm? - (4) Given your answer for part (3) what are all the possible orderings for the four processes P1, P2, P3, and P4 to complete their execution subject to the Banker's algorithm. - (5) Assuming that the total amounts for resources R1 and R2 are 10 and 2, what is the minimum amount for resource R3 that would make it possible for the Banker's algorithm to allow process P1 to complete its execution before all other three processes? ### The Banker's Algorithm - Idea: know what each process might ask for - Only make allocations that leave the system in a safe state - Inefficient Resource allocation state space ### **Process Create** ### Message-Passing System - Implementation of communication link - > physical (e.g., shared memory, HW bus, or network 國立青華大學 - ➤ logical (e.g., logical properties) - · Direct or indirect communication - Symmetric or asymmetric communication - · Blocking or non-blocking - · Automatic or explicit buffering - Send by copy or send by reference - Fixed-sized or variable-sized messages ### **Threads** ### **Thread Control Block** - Thread Control Block (TCB) is a data structure in the operating system kernel which contains thread-specific information needed to manage it. The TCB is "the manifestation of a thread in an operating system". - · An example of information contained within a TCB is: - · Stack pointer: Points to thread's stack in the process - · Program counter - · State of the thread (running, ready, waiting, start, done) - · Thread's register values - Pointer to the Process control block (PCB) of the process that the thread lives on The Thread Control Block acts as a library of information about the threads in a system. Specific information is stored in the thread control block highlighting important information about each thread. Figure 4.1 Threads and Processes [ANDE97] | Thread | Process | |-------------------------------------------------------------|-------------------------------------------------------------------------------------------| | It is lightweight entity. | It is heavy weight entity | | If a thread ends working process keep working. | Process may keep working and if a process terminates all its threads will terminate also. | | Communication b/w threads happens via memory. | Communication b/w Process happens via OS. | | The Creation of thread and context switching is inexpensive | The creation of the process is expensive. | ### Benefits of Multithreading - Responsiveness: allow a program to continue running even if part of it is blocked or is performing a lengthy operation - Resource sharing: several different threads of activity all within the same address space - Utilization of MP arch.: Several thread may be running in parallel on different processors - Economy: Allocating memory and resources for process creation is costly. In Solaris, creating a process is about 30 times slower than is creating a thread, and context switching is about five times slower. A register set switch is still required, but no memory-management related work is needed Chapter4 Multithreaded Operating System Concepts - NTHU LSA Lab ### 國立青華大學 國立唐華大學 ### Challenges in Multicore Programming - **Dividing activities:** divide program into concurrent tasks - Data splitting: divide data accessed and manipulated by the tasks - Data dependency: synchronize data access - Balance: evenly distribute tasks to cores - Testing and debugging Chapter4 Multithreaded Operating System Concepts - NTHU LSA Lab # User vs. Kernel Threads User threads – thread management done by user-level threads library POSIX Pthreads Win32 threads Java threads Java threads Kernel threads – supported by the kernel (OS) directly Windows 2000 (NT) Solaris Linux Tru64 UNIX Chapter4 Multithreaded Operating System Concepts – NTHULSA Lab ### User vs. Kernel Threads 國立青華大學 - User threads - Thread library provides support for thread creation, scheduling, and deletion - ➤ Generally fast to create and manage - ➤ If the kernel is single-threaded, a user-thread blocks → entire process blocks even if other threads are ready to run - Kernel threads - > The kernel performs thread creation, scheduling, etc. - ➤ Generally slower to create and manage - If a thread is blocked, the kernel can schedule another thread for execution Chapter4 Multithreaded Operating System Concepts – NTHU LSA Lab 10 ### Semantics of fork() and exec() 國立青華大學 - Does fork() duplicate only the calling thread or all threads? - ➤ Some UNIX system support two versions of fork() - execlp() works the same; replace the entire process - ➤ If exec() is called immediately after forking, then duplicating all threads is unnecessary ### Signal Handling - Signals (synchronous or asynchronous) are used in UN systems to notify a process that an event has occurred - > Synchronous: illegal memory access - > Asynchronous: <control-C> - A signal handler is used to process signals - 1. Signal is generated by particular event - 2. Signal is delivered to a process - 3. Signal is handled - Options - > Deliver the signal to the thread to which the signal applies - > Deliver the signal to every thread in the process - > Deliver the signal to certain threads in the process - Assign a specific thread to receive all signals for the process Chapter4 Multithreaded Operating System Concepts - NTHULSA Lab 28 ### **Thread Pools** - Create a number of threads in a pool where they await work - Advantages - > Usually slightly faster to service a request with an existing thread than create a new thread - > Allows the number of threads in the application(s) to be bound to the size of the pool - # of threads: # of CPUs, expected # of requests, amount of physical memory Chapter4 Multithreaded Operating System Concepts - NTHU LSA Lab 29 ### I/O System Mangement ### I/O Hardware - Computers support a wide variety of I/O devices, but common concepts apply to all: - Port: connection point for a device - Bus: set of wires that connects to many devices, with a protocol specifying how information can be transmitted over the wires - Controller: a chip (or part of a chip) that operates a port, a bus, or a device - How can the processor communicate with a device? - Special instructions allow the processor to transfer data and commands to registers on the device controller - The device controller may support memory-mapped I/O: - The same address bus is used for both memory and device access. - The same CPU instructions can access memory locations or devices. ## ■ Depending on how to address a device: Depending on how to address a device: Port-mapped I/O Use different address space from memory Access by special I/O instruction (e.g. IN, OUT) Memory-mapped I/O Reserve specific memory space for device Access by standard data-transfer instruction (e.g. MOV) More efficient for large memory I/O (e.g. graphic card) Vulnerable to accidental modification, error ### **IO System Performance** ### Common Concepts in I/O Hardware - ► Common concepts: signals from I/O devices interface with computer. - Port: connection point for device - Bus: set of wires and a protocol that specifies a set of messages that can be sent on the wires. - Controller: a collection of electronics that can operate a port, a bus, or a device. - Sometimes integrated and sometimes separate circuit board (host adapter) - Contains processor, microcode, private memory, bus controller, etc Amir H. Payberah (Tehran Polytechnic) I/O Systems 1393/9/15 6 / 57 ### **DMA(Direct Memroy Access)** -- Memo End --