8086 Coprocessor and Advanced Features

Explore the advanced capabilities of the 8086 microprocessor, including the coprocessor interface, 8087 floating-point unit integration, system programming features, and performance optimization techniques.

Coprocessor Concept and Architecture

A coprocessor is a specialized processor designed to work alongside the main CPU to handle specific tasks more efficiently. The 8086 was designed with built-in coprocessor support.


Coprocessor Interface Features:

  • Transparent Operation: Coprocessor instructions appear in normal instruction stream
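  • Automatic Detection: The 8087 monitors the instruction stream and picks out its own (ESC) instructions; the 8086 treats them as no-ops apart from generating any memory address they need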
  • Shared Memory: Both processors access same memory space
  • Synchronization: Hardware ensures proper coordination
  • Exception Handling: Coprocessor can signal errors to main CPU

Coprocessor Interface Signals:


Signal Lines for Coprocessor Interface:
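- QS0, QS1: Queue Status signals (let the 8087 track the 8086 instruction queue)
- TEST: Input to CPU, driven by the 8087 BUSY output; the WAIT instruction polls it
- BUSY: 8087 output, high while the coprocessor is executing an instruction
- INT: 8087 interrupt output, asserted on an unmasked exception (the 8087 has no ERROR pin; that signal appears on the 80287 and later)
- RQ/GT: Request/Grant for bus mastership (max mode only)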

8087 Floating Point Unit (FPU)

The 8087 is the most common coprocessor for the 8086, providing hardware floating-point arithmetic.

8087 Specifications:


Data Types Supported:
- Word Integer: 16-bit signed integer
- Short Integer: 32-bit signed integer  
- Long Integer: 64-bit signed integer
- Short Real: 32-bit floating point
- Long Real: 64-bit floating point
- Temporary Real: 80-bit extended precision
- Packed Decimal: 80-bit BCD, 18 decimal digits plus sign

Register Set:
- 8 Data Registers: ST(0) to ST(7) - 80-bit each
- Control Register: 16-bit
- Status Register: 16-bit  
- Tag Register: 16-bit
- Instruction Pointer: 20-bit address of the last 8087 instruction, stored with its opcode
- Data Pointer: 20-bit address of the last memory operand

8087 Architecture:


Floating Point Data Formats:


IEEE 754 Floating Point Formats:

Short Real (32-bit):
[S][8-bit Exponent][23-bit Mantissa]
Range: ±1.18×10⁻³⁸ to ±3.40×10³⁸

Long Real (64-bit):  
[S][11-bit Exponent][52-bit Mantissa]
Range: ±2.23×10⁻³⁰⁸ to ±1.80×10³⁰⁸

Temporary Real (80-bit):
[S][15-bit Exponent][64-bit Mantissa]
Range: ±3.37×10⁻⁴⁹³² to ±1.18×10⁴⁹³²

S = Sign bit (0=positive, 1=negative)
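
To make the short real layout concrete, here is a minimal Python sketch that splits a value into sign, exponent, and fraction fields and rebuilds it. The function names are illustrative only, and the sketch assumes a normalized value (biased exponent neither 0 nor 255).

```python
import struct

def decode_short_real(value: float) -> dict:
    """Split a value, rounded to a 32-bit short real, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # stored with a bias of 127
    fraction = bits & 0x7FFFFF          # 23 stored bits; the leading 1 is implicit
    return {"sign": sign, "exponent": exponent, "fraction": fraction}

def encode_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized value: (-1)^S * 1.F * 2^(E - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

fields = decode_short_real(-6.5)
print(fields)
print(encode_fields(**fields))
```

The long real format works the same way with an 11-bit exponent (bias 1023) and 52 fraction bits. The temporary real format differs in that its leading integer bit is stored explicitly in the 64-bit mantissa.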

8087 Programming Model

Register Stack Operation

The 8087 uses a stack-based architecture with 8 registers organized as a circular stack.


Stack Pointer (TOP field in Status Register):
- Points to current top of stack
- Automatically updated during push/pop operations
- Stack grows downward (TOP decrements on push)

Register Naming:
ST(0) = Top of stack (most recent data)
ST(1) = Second from top
...
ST(7) = Bottom of accessible stack

Example Stack Operations:
FLD REAL_VAR    ; Push REAL_VAR onto stack (becomes ST(0))
FADD ST(1)      ; Add ST(1) to ST(0), result in ST(0)
FSTP RESULT     ; Pop ST(0) to RESULT variable
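
The relationship between TOP and the ST(i) names can be illustrated with a toy Python model. This is a sketch only: the class and method names are made up for illustration, and it ignores the tag word, exceptions, and 80-bit precision.

```python
class FpuStack:
    """Toy model of the 8087 register stack: 8 physical registers plus TOP."""

    def __init__(self):
        self.regs = [None] * 8   # physical registers (None = empty)
        self.top = 0             # 3-bit TOP field

    def _phys(self, i):
        return (self.top + i) % 8          # ST(i) -> physical register index

    def st(self, i=0):
        return self.regs[self._phys(i)]

    def fld(self, value):
        self.top = (self.top - 1) % 8      # push: TOP decrements, wrapping 0 -> 7
        self.regs[self.top] = value

    def fadd_st(self, i):
        """FADD ST(i): ST(0) = ST(0) + ST(i), no pop."""
        self.regs[self.top] = self.st(0) + self.st(i)

    def fstp(self):
        value = self.st(0)
        self.regs[self.top] = None         # register becomes empty
        self.top = (self.top + 1) % 8      # pop: TOP increments
        return value

fpu = FpuStack()
fpu.fld(1.5)        # ST(0) = 1.5
fpu.fld(2.25)       # ST(0) = 2.25, ST(1) = 1.5
fpu.fadd_st(1)      # ST(0) = 3.75
result = fpu.fstp()
print(result, fpu.top, fpu.st(0))
```

Because ST(i) is always resolved relative to TOP, the same value is called ST(0) right after it is loaded and ST(1) after the next load, which is why stack bookkeeping comments are worth keeping next to each instruction.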

Basic 8087 Instructions


Data Transfer Instructions:
FLD source      ; Load floating point value onto stack
FST dest        ; Store ST(0) to destination
FSTP dest       ; Store and pop ST(0) to destination
FILD source     ; Load integer and convert to float
FIST dest       ; Store float as integer
FISTP dest      ; Store as integer and pop

Arithmetic Instructions:
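FADD            ; No operands: ST(1) = ST(1) + ST(0), then pop (result becomes ST(0))
FSUB            ; No operands: ST(1) = ST(1) - ST(0), then pop
FMUL            ; No operands: ST(1) = ST(1) * ST(0), then pop
FDIV            ; No operands: ST(1) = ST(1) / ST(0), then pop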
FSQRT           ; Square root of ST(0)
FABS            ; Absolute value of ST(0)
FCHS            ; Change sign of ST(0)

Comparison Instructions:
FCOM            ; Compare ST(0) with ST(1)
FCOMP           ; Compare and pop
FTST            ; Compare ST(0) with 0.0
FXAM            ; Examine ST(0) and set status

Programming Example:


; Calculate: result = (a * b) + (c / d)
; Where a, b, c, d are floating point variables

.DATA
a       DD 3.14159      ; Short real (32-bit)
b       DD 2.71828      ; Short real  
c       DD 10.0         ; Short real
d       DD 3.0          ; Short real
result  DD ?            ; Result storage

.CODE
FLD a           ; Load a onto stack: ST(0)=a
FMUL b          ; Multiply by b: ST(0)=a*b
FLD c           ; Load c: ST(0)=c, ST(1)=a*b
FDIV d          ; Divide by d: ST(0)=c/d, ST(1)=a*b
FADD            ; Add: ST(0)=(a*b)+(c/d)
FSTP result     ; Store result and pop stack

Coprocessor Synchronization

Instruction Synchronization

The 8086 and 8087 must be properly synchronized to ensure correct program execution.


Synchronization Mechanisms:

1. Queue Status (QS0, QS1):
   - 8086 informs 8087 about instruction queue operations
   - 8087 tracks instruction stream to identify its instructions

2. WAIT Instruction:
   - CPU waits for coprocessor to complete operation
   - Polls the TEST pin until it goes active (low), i.e. until the 8087 drops BUSY
   - Automatically inserted by the assembler before most 8087 instructions

3. FWAIT Instruction:
   - Explicit wait for coprocessor
   - Alternate mnemonic for WAIT (same opcode, 9BH)

Example with synchronization:
FLD a           ; The assembler emits WAIT before each 8087 instruction,
FADD b          ; so back-to-back 8087 instructions need no explicit FWAIT
FSTP result     ; 8087 starts writing result to memory
FWAIT           ; Wait until the store has completed
MOV AX, WORD PTR result   ; Now the 8086 can safely read the result

Exception Handling


8087 Exception Types:
1. Invalid Operation (e.g., sqrt of negative number)
2. Denormalized Operand
3. Divide by Zero
4. Overflow (result too large)
5. Underflow (result too small)  
6. Inexact Result (rounding occurred)

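Exception Response:
- Set appropriate bit in Status Register
- If the exception is masked, apply the built-in default fix-up and continue
- If the exception is unmasked, assert the INT output to request an interrupt
- The 8086 handler can read the status word and handle the exception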

Control Word Exception Masks:
Bit 0: Invalid Operation Mask
Bit 1: Denormalized Operand Mask  
Bit 2: Zero Divide Mask
Bit 3: Overflow Mask
Bit 4: Underflow Mask
Bit 5: Precision Mask
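
As a worked illustration of the mask bits, the following Python sketch clears selected mask bits in a control word value. The helper names are hypothetical, and the starting value 03FFH is assumed to be the control word an 8087 holds after FINIT (all exceptions masked).

```python
# Exception mask bit positions in the control word, as listed above.
MASKS = {
    "invalid": 0, "denormal": 1, "zero_divide": 2,
    "overflow": 3, "underflow": 4, "precision": 5,
}

def with_unmasked(control_word: int, *names: str) -> int:
    """Return control_word with the named exceptions unmasked (mask bit = 0)."""
    for name in names:
        control_word &= ~(1 << MASKS[name])
    return control_word & 0xFFFF

def masked_exceptions(control_word: int) -> list:
    """List the exceptions whose mask bit is still set."""
    return [name for name, bit in MASKS.items() if (control_word >> bit) & 1]

cw = with_unmasked(0x03FF, "invalid", "zero_divide")
print(hex(cw), masked_exceptions(cw))
```

On real hardware the resulting value would be written to a memory word and loaded with FLDCW; an unmasked exception then raises an interrupt instead of being handled silently by the 8087.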

Advanced 8086 Features

String Processing Capabilities

The 8086 provides string instructions that process large blocks of data with a single repeated instruction.


String Instructions:
MOVS    ; Move string (byte or word)
CMPS    ; Compare strings
SCAS    ; Scan string for value
LODS    ; Load string element
STOS    ; Store string element

Repeat Prefixes:
REP     ; Repeat CX times
REPE    ; Repeat while equal (and CX > 0)
REPNE   ; Repeat while not equal (and CX > 0)

High-Performance String Operations:
; Copy 1000 words from source to destination
; (DS must address SOURCE and ES must address DEST)
MOV SI, OFFSET SOURCE
MOV DI, OFFSET DEST  
MOV CX, 1000
CLD                 ; Clear direction flag (forward)
REP MOVSW          ; Repeat move word

Performance: about 17 cycles per word (9 + 17n cycles in total) vs 25+ cycles per word for a manual loop

Interrupt System Enhancement


Advanced Interrupt Features:

1. Interrupt Vector Table (IVT):
   - 256 interrupt vectors (0-255)
   - Each vector: 4 bytes (IP:CS)
   - Located at memory 0000:0000 to 0000:03FF

2. Interrupt Types:
   - Hardware Interrupts (INTR, NMI)
   - Software Interrupts (INT instruction)
   - Exception Interrupts (divide by zero, etc.)

3. Interrupt Priority:
   Priority (highest to lowest):
   1. Divide Error, INTO, INT instructions
   2. NMI (Non-Maskable Interrupt)
   3. INTR (Maskable Hardware Interrupt)
   4. Single Step (Trap Flag)

4. Interrupt Service Routine Template:
ISR_TEMPLATE PROC
    PUSH AX         ; Save registers
    PUSH BX
    PUSH CX
    ; ... save other registers
    
    ; Interrupt service code here
    
    POP CX          ; Restore registers (reverse order)
    POP BX
    POP AX
    IRET            ; Return from interrupt
ISR_TEMPLATE ENDP
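
The vector table arithmetic from item 1 can be checked with a short Python sketch. The function names are illustrative, and the handler address F000:1234 is a made-up value used only to show the byte order.

```python
def vector_address(n: int) -> int:
    """Physical address of interrupt vector n (4 bytes each, table at 0000:0000)."""
    if not 0 <= n <= 255:
        raise ValueError("vector number must be 0..255")
    return n * 4

def read_vector(memory: bytes, n: int) -> tuple:
    """Return (CS, IP) for vector n: the offset word comes first, then the segment."""
    base = vector_address(n)
    ip = int.from_bytes(memory[base:base + 2], "little")
    cs = int.from_bytes(memory[base + 2:base + 4], "little")
    return cs, ip

ivt = bytearray(1024)                                   # 256 vectors * 4 bytes
ivt[0x21 * 4:0x21 * 4 + 4] = bytes([0x34, 0x12, 0x00, 0xF0])  # made-up F000:1234
print(hex(vector_address(0x21)), [hex(x) for x in read_vector(ivt, 0x21)])
```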

Memory Management Features


Bus Interface Features


Advanced Bus Features:

1. Minimum vs Maximum Mode:
   Minimum Mode (MN/MX = 1):
   - Single processor system
   - Simple control signals
   - Direct memory and I/O control

   Maximum Mode (MN/MX = 0):
   - Multiprocessor capable
   - Bus arbitration support
   - Coprocessor interface

2. Bus Arbitration (Maximum Mode):
   - RQ/GT0, RQ/GT1: Request/Grant lines
   - LOCK: Bus lock for atomic operations
   - Support for multiple processors

3. Queue Status Output:
   - QS0, QS1 inform external devices about instruction queue
   - Used by coprocessors and debugging tools

Performance Optimization Techniques

Coprocessor Performance Benefits

Performance Comparison: Floating point operations with and without 8087.


Operation Performance (8MHz system):

32-bit Floating Point Addition:
Software (8086 only): ~1000 cycles = 125 μs
Hardware (8087): ~70 cycles = 8.75 μs
Speedup: 1000/70 = 14.3×

32-bit Floating Point Multiplication:
Software (8086 only): ~1600 cycles = 200 μs  
Hardware (8087): ~90 cycles = 11.25 μs
Speedup: 1600/90 = 17.8×

32-bit Floating Point Division:
Software (8086 only): ~2400 cycles = 300 μs
Hardware (8087): ~145 cycles = 18.1 μs
Speedup: 2400/145 = 16.6×

Transcendental Functions (tangent, arctangent, log, exponential; sin and cos are derived from FPTAN):
Software (8086 only): ~10,000+ cycles
Hardware (8087): ~300-500 cycles  
Speedup: 20-30×
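
The conversions in this table follow from one formula, time = cycles / clock frequency. A minimal Python sketch that reproduces the figures, using the approximate cycle counts quoted above (the names are illustrative):

```python
CLOCK_HZ = 8_000_000  # the 8 MHz clock assumed in the table above

def cycles_to_us(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles * 1e6 / clock_hz

# (software cycles, 8087 cycles) copied from the table; all are approximate
TIMINGS = {"add": (1000, 70), "mul": (1600, 90), "div": (2400, 145)}

for op, (sw, hw) in TIMINGS.items():
    print(f"{op}: {cycles_to_us(sw):.2f} us vs {cycles_to_us(hw):.2f} us, "
          f"speedup {sw / hw:.1f}x")
```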

Memory Access Optimization


Optimization Techniques:

1. Instruction Prefetch Queue:
   - 6-byte prefetch queue reduces memory access
   - Sequential instructions execute faster
   - Branch instructions may cause queue flush

2. Memory Organization:
   - Place code in fast memory
   - Organize data for sequential access
   - Minimize segment changes

3. Register Usage:
   - Keep frequently used data in registers
   - Minimize memory-to-memory operations
   - Use string instructions for block operations

Example Optimization:
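; Inefficient: RESULT is read and rewritten in memory at every step
; (the 8086 has no memory-to-memory ADD, so each step goes through AX anyway)
MOV AX, [VAR1]
MOV [RESULT], AX
MOV AX, [VAR2]
ADD [RESULT], AX
MOV AX, [VAR3]
ADD [RESULT], AX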

; Efficient: Use accumulator
MOV AX, [VAR1]
ADD AX, [VAR2]  
ADD AX, [VAR3]
MOV [RESULT], AX

System-Level Programming


Advanced System Programming Techniques:

1. Custom Interrupt Handlers:
   ; Install custom interrupt handler
   CLI                 ; Disable interrupts
   MOV AX, 0          ; Segment 0 (IVT)
   MOV ES, AX
   MOV BX, 21H*4      ; INT 21H vector offset
   MOV WORD PTR ES:[BX], OFFSET MY_HANDLER
   MOV WORD PTR ES:[BX+2], SEG MY_HANDLER
   STI                ; Re-enable interrupts

2. Memory-Mapped I/O:
   ; Access I/O device through memory mapping
   MOV AX, 0B800H     ; Video memory segment
   MOV ES, AX
   MOV DI, 0          ; Offset 0
   MOV AL, 'A'        ; Character to display
   MOV ES:[DI], AL    ; Write to video memory

3. DMA Programming:
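   ; Set up DMA transfer (requires external DMA controller, e.g. an 8237 as in the IBM PC)
   MOV AL, 04H        ; Bit 2 = set mask, bits 1-0 = channel 0
   OUT 0AH, AL        ; Single-mask register: disable channel 0 while programming it
   ; Set up address and count registers
   ; Clear the mask bit to enable the DMA transfer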

Numerical Problems and Applications

Problem 1: Coprocessor Performance Analysis

Question: Calculate time savings when using 8087 for a program that performs 1000 floating-point operations.

Solution:


Given:
- 1000 operations: 250 additions, 250 multiplications, 250 divisions, 250 square roots
- 8086 @ 8MHz (125ns per cycle)

Software Implementation (8086 only):
Additions: 250 × 1000 cycles = 250,000 cycles
Multiplications: 250 × 1600 cycles = 400,000 cycles  
Divisions: 250 × 2400 cycles = 600,000 cycles
Square roots: 250 × 5000 cycles = 1,250,000 cycles
Total: 2,500,000 cycles = 312.5 ms

Hardware Implementation (8087):
Additions: 250 × 70 cycles = 17,500 cycles
Multiplications: 250 × 90 cycles = 22,500 cycles
Divisions: 250 × 145 cycles = 36,250 cycles  
Square roots: 250 × 180 cycles = 45,000 cycles
Total: 121,250 cycles = 15.16 ms

Time Saved: 312.5 - 15.16 = 297.34 ms (95.2% reduction)
Speedup: 312.5 / 15.16 = 20.6×
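
The same arithmetic as a short Python sketch, which makes it easy to try a different operation mix or clock rate. The per-operation cycle counts are the approximate figures from the solution above; the variable names are illustrative.

```python
NS_PER_CYCLE = 125  # 8 MHz clock
COUNT = 250         # operations of each kind

software = {"add": 1000, "mul": 1600, "div": 2400, "sqrt": 5000}
hardware = {"add": 70, "mul": 90, "div": 145, "sqrt": 180}

def total_ms(cycles_per_op: dict) -> float:
    """Total run time in milliseconds for COUNT operations of each kind."""
    cycles = sum(COUNT * c for c in cycles_per_op.values())
    return cycles * NS_PER_CYCLE / 1e6

sw_ms, hw_ms = total_ms(software), total_ms(hardware)
print(f"software {sw_ms} ms, 8087 {hw_ms:.2f} ms")
print(f"saved {sw_ms - hw_ms:.2f} ms, speedup {sw_ms / hw_ms:.1f}x")
```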

Problem 2: Memory Layout for Coprocessor System

Question: Design memory layout for a system with 8086 + 8087 handling scientific calculations.

Solution:


Memory Layout Design:

0000:0000 - 0000:03FF  Interrupt Vector Table (1KB)
0000:0400 - 0000:04FF  BIOS Data Area (256 bytes)
0000:0500 - 9000:FFFF  Available RAM (~640KB)

Recommended Organization:
2000:0000 - 2000:FFFF  Program Code (64KB)
3000:0000 - 3000:FFFF  Floating Point Data (64KB) 
4000:0000 - 4000:FFFF  Integer Data (64KB)
5000:0000 - 5000:FFFF  Stack Space (64KB)
6000:0000 - 9000:FFFF  Dynamic Memory/Buffers (256KB)

Segment Register Setup:
CS = 2000H    ; Code segment
DS = 3000H    ; Floating point data
ES = 4000H    ; Integer data  
SS = 5000H    ; Stack segment

Benefits:
- Clear separation of data types
- Optimal for 8087 operations
- Good memory utilization
- Easy debugging and maintenance
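
A layout like this can be sanity-checked by converting each segment value to a physical address (segment × 16 + offset). The Python sketch below does that for the four segment register values above; the helper names are illustrative.

```python
def physical(segment: int, offset: int) -> int:
    """8086 physical address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

def segments_overlap(seg_a: int, seg_b: int, size: int = 0x10000) -> bool:
    """True if two size-byte regions starting at seg_a:0 and seg_b:0 overlap."""
    a, b = physical(seg_a, 0), physical(seg_b, 0)
    return abs(a - b) < size

layout = {"CS": 0x2000, "DS": 0x3000, "ES": 0x4000, "SS": 0x5000}
for name, seg in layout.items():
    print(name, hex(physical(seg, 0)), "-", hex(physical(seg, 0xFFFF)))
print(segments_overlap(0x2000, 0x3000), segments_overlap(0x2000, 0x2FFF))
```

Segment values 1000H apart give back-to-back 64KB regions with no overlap, whereas nearby segment values such as 2000H and 2FFFH address almost entirely the same memory.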

Problem 3: Interrupt Latency Calculation

Question: Calculate worst-case interrupt latency for 8086 system.

Solution:


Interrupt Latency Components:

1. Current Instruction Completion:
   A REP string instruction is interruptible between iterations, so
   REP MOVSW with CX=1000 delays an interrupt by at most one iteration (~17 cycles).
   Worst case: a long non-interruptible instruction
   Example: IDIV with a 16-bit register operand
   Time: up to ~184 cycles

2. Interrupt Acknowledge Sequence (INTR):
   - Two interrupt-acknowledge bus cycles to read the vector number
   - Push flags, then clear IF and TF
   - Push CS and IP
   - Load the new CS:IP from the vector table
   Total: ~61 cycles

3. ISR Entry Overhead:
   - PUSH AX, BX, CX: 3 × 11 = 33 cycles
   - Setup: ~5 cycles
   Total: 38 cycles

Worst Case Total:
184 + 61 + 38 = 283 cycles

At 8MHz: 283 × 125ns = 35.4 μs

Best Case (simple instruction):
2 + 61 + 38 = 101 cycles = 12.6 μs

Answer: Latency range about 12.6 μs to 35.4 μs, assuming interrupts are enabled (IF = 1)

Practical Applications and Examples

Scientific Calculator Implementation


; Simple scientific calculator using 8087
.DATA
num1        DD ?        ; First operand
num2        DD ?        ; Second operand  
result      DD ?        ; Result
operation   DB ?        ; Operation code

.CODE
CALCULATOR PROC
    ; Get first number
    CALL INPUT_FLOAT    ; Returns number in ST(0)
    FSTP num1          ; Store first number
    
    ; Get operation
    CALL INPUT_OPERATION ; Returns operation in AL
    MOV operation, AL
    
    ; Get second number
    CALL INPUT_FLOAT    ; Returns number in ST(0)
    FSTP num2          ; Store second number
    
    ; Perform calculation
    FLD num1            ; Load first number
    FLD num2            ; Load second number
    
    CMP operation, '+'
    JE DO_ADD
    CMP operation, '-'  
    JE DO_SUB
    CMP operation, '*'
    JE DO_MUL
    CMP operation, '/'
    JE DO_DIV
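    CMP operation, 's'  ; Square root
    JE DO_SQRT
    
    ; Unknown operation: discard both operands instead of falling into DO_ADD
    FSTP ST(0)
    FSTP ST(0)
    RET
    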
DO_ADD:
    FADD                ; ST(0) = num1 + num2
    JMP STORE_RESULT
    
DO_SUB:
    FSUB                ; ST(0) = num1 - num2  
    JMP STORE_RESULT
    
DO_MUL:
    FMUL                ; ST(0) = num1 * num2
    JMP STORE_RESULT
    
DO_DIV:
    FDIV                ; ST(0) = num1 / num2
    JMP STORE_RESULT
    
DO_SQRT:
    FSTP ST(0)          ; Pop num2, leaving num1 in ST(0)
    FSQRT               ; ST(0) = sqrt(num1)
    
STORE_RESULT:
    FSTP result         ; Store result
    
    ; Display result
    CALL DISPLAY_FLOAT
    
    RET
CALCULATOR ENDP

Matrix Operations with Coprocessor


; Matrix multiplication using 8087
; Assumes ROW, COL, K are DW variables and MATRIX_A/B/C are 3x3 arrays of DD
MATRIX_MULTIPLY PROC
    ; Multiply two 3x3 matrices: C = A × B
    PUSH AX             ; Save registers (PUSHA/POPA need an 80186 or later)
    PUSH BX
    PUSH CX
    PUSH DX             ; MUL overwrites DX
    
    MOV CX, 3           ; Matrix dimension, used as the MUL operand throughout
    MOV ROW, 0          ; Initialize row counter
    
ROW_LOOP:
    MOV COL, 0          ; Initialize column counter
    
COL_LOOP:
    FLDZ                ; Initialize sum to 0
    MOV K, 0            ; Initialize inner loop counter
    
INNER_LOOP:
    ; Calculate A[row][k] * B[k][col]
    MOV AX, ROW
    MUL CX              ; AX = row * 3
    ADD AX, K           ; AX = row * 3 + k
    SHL AX, 1           ; The 8086 shifts by 1 or by CL only,
    SHL AX, 1           ; so shift twice: AX = (row * 3 + k) * 4 (float size)
    MOV BX, AX          ; AX cannot address memory; use BX as the index
    FLD MATRIX_A[BX]    ; Load A[row][k]
    
    MOV AX, K
    MUL CX              ; AX = k * 3
    ADD AX, COL         ; AX = k * 3 + col
    SHL AX, 1
    SHL AX, 1           ; AX = (k * 3 + col) * 4
    MOV BX, AX
    FMUL MATRIX_B[BX]   ; Multiply by B[k][col]
    
    FADD                ; Add product to sum and pop: ST(0) = sum
    
    INC K
    CMP K, 3
    JB INNER_LOOP
    
    ; Store result C[row][col]
    MOV AX, ROW
    MUL CX
    ADD AX, COL
    SHL AX, 1
    SHL AX, 1
    MOV BX, AX
    FSTP MATRIX_C[BX]   ; Store sum and pop
    
    INC COL
    CMP COL, 3
    JB COL_LOOP
    
    INC ROW  
    CMP ROW, 3
    JB ROW_LOOP
    
    FWAIT               ; Make sure the last store has completed
    POP DX              ; Restore registers (reverse order)
    POP CX
    POP BX
    POP AX
    RET
MATRIX_MULTIPLY ENDP

Debugging and Development Tools

Coprocessor State Monitoring


; Procedure to display 8087 status for debugging
DISPLAY_8087_STATUS PROC
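    ; Read 8087 status word. The 8087 can store it only to memory
    ; (FNSTSW AX needs an 80287 or later); STATUS_WORD is assumed to be a DW variable
    FNSTSW STATUS_WORD  ; Store status word to memory (no wait)
    FWAIT               ; Wait for the store to complete
    MOV AX, STATUS_WORD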
    
    ; Check condition codes
    TEST AX, 4000H      ; Test C3
    JZ C3_CLEAR
    ; C3 is set
    CALL DISPLAY_C3_SET
    JMP CHECK_C2
C3_CLEAR:
    CALL DISPLAY_C3_CLEAR
    
CHECK_C2:
    TEST AX, 0400H      ; Test C2
    JZ C2_CLEAR
    ; C2 is set
    CALL DISPLAY_C2_SET
    JMP CHECK_C1
C2_CLEAR:
    CALL DISPLAY_C2_CLEAR
    
CHECK_C1:
    ; Similar checks for C1 (0200H) and C0 (0100H)...
    
    ; Check exception flags
    TEST AX, 0001H      ; Invalid operation
    JNZ INVALID_OP_ERROR
    
    TEST AX, 0004H      ; Zero divide
    JNZ ZERO_DIV_ERROR
    
    ; Display stack top pointer
    MOV BX, AX
    AND BX, 3800H       ; Mask TOP field (bits 11-13)
    MOV CL, 11          ; The 8086 needs multi-bit shift counts in CL
    SHR BX, CL          ; Shift to get TOP value (0-7) in BX
    ; Display TOP value...
    
    RET
DISPLAY_8087_STATUS ENDP

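; Procedure to save complete 8087 state
; COPROCESSOR_STATE must be a 94-byte buffer
SAVE_8087_STATE PROC
    FNSAVE COPROCESSOR_STATE    ; Save entire 8087 state, then reinitialize the 8087
    FWAIT                       ; Wait until the save has completed
    RET
SAVE_8087_STATE ENDP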

; Procedure to restore 8087 state  
RESTORE_8087_STATE PROC
    FRSTOR COPROCESSOR_STATE    ; Restore 8087 state
    RET
RESTORE_8087_STATE ENDP
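
When checking the bit masks used in DISPLAY_8087_STATUS, it can help to decode a captured status word off-line. Here is a small Python sketch under that assumption; the function name is illustrative and the sample value 7804H is made up.

```python
def decode_status(sw: int) -> dict:
    """Extract the fields tested in the assembly routine from a 16-bit status word."""
    return {
        "C3": (sw >> 14) & 1,          # same bit as TEST AX, 4000H
        "C2": (sw >> 10) & 1,          # same bit as TEST AX, 0400H
        "C1": (sw >> 9) & 1,
        "C0": (sw >> 8) & 1,
        "TOP": (sw >> 11) & 0b111,     # same field as AND 3800H, shift right 11
        "invalid_op": bool(sw & 0x0001),
        "zero_divide": bool(sw & 0x0004),
    }

print(decode_status(0x7804))
```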

Performance Profiling


; Simple performance measurement using timer
MEASURE_PERFORMANCE PROC
    ; Save current timer value
    MOV AH, 00H         ; Get system time
    INT 1AH             ; BIOS timer interrupt
    MOV START_TIME_LOW, DX
    MOV START_TIME_HIGH, CX
    
    ; Execute code to measure
    CALL FUNCTION_TO_MEASURE
    
    ; Get end time
    MOV AH, 00H
    INT 1AH
    MOV END_TIME_LOW, DX
    MOV END_TIME_HIGH, CX
    
    ; Calculate elapsed time
    SUB DX, START_TIME_LOW
    SBB CX, START_TIME_HIGH
    
    ; Convert to microseconds (timer ticks at about 18.2 Hz)
    ; Each tick ≈ 54,925 microseconds, so only routines that run
    ; for many ticks can be measured meaningfully
    ; ... conversion code ...
    
    ; Display elapsed time
    CALL DISPLAY_TIME
    
    RET
MEASURE_PERFORMANCE ENDP
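
The tick-to-microsecond conversion left open in the listing can be sketched in Python. This assumes a PC-compatible system whose timer runs from a 1,193,182 Hz clock divided by 65,536; the function names are illustrative, and midnight rollover of the BIOS counter is not handled.

```python
PIT_HZ = 1_193_182          # timer input clock (assumes a PC-compatible system)
TICK_HZ = PIT_HZ / 65536    # about 18.2 ticks per second

def join(cx: int, dx: int) -> int:
    """Combine the CX (high) and DX (low) words returned by INT 1AH, AH=00H."""
    return (cx << 16) | dx

def elapsed_us(start_ticks: int, end_ticks: int) -> float:
    """Elapsed microseconds between two tick counts (no midnight rollover handling)."""
    return (end_ticks - start_ticks) * 1e6 / TICK_HZ

start = join(0x0001, 0xFFFF)
end = join(0x0002, 0x0001)   # two ticks later, with a carry into the high word
print(round(elapsed_us(start, end)))
```

With a resolution of roughly 55 ms per tick, this approach suits routines that run for a second or more; shorter code is usually timed by running it in a loop and dividing by the iteration count.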

Summary

The 8086's coprocessor interface and advanced features significantly expanded its capabilities beyond basic integer processing. The 8087 FPU partnership created a powerful platform for scientific and engineering applications, while advanced features like string processing and sophisticated interrupt handling made it suitable for system-level programming.

Key Advanced Features:

  • Transparent coprocessor interface enabling specialized processing units
  • 8087 FPU providing IEEE 754 floating-point arithmetic
  • Advanced string processing with repeat prefixes
  • Comprehensive interrupt system with 256 vectors
  • Flexible memory management through segmentation
  • Bus arbitration support for multiprocessor systems
  • Performance optimization through instruction prefetch

Impact on Computing:

  • Enabled practical floating-point computation on microprocessors
  • Established coprocessor architecture pattern
  • Provided foundation for modern x86 advanced features
  • Made microprocessors viable for scientific applications
  • Demonstrated importance of specialized processing units