8086 Coprocessor and Advanced Features
Explore the advanced capabilities of the 8086 microprocessor, including the coprocessor interface, 8087 floating-point unit integration, system programming features, and performance optimization techniques.
Coprocessor Concept and Architecture
A coprocessor is a specialized processor designed to work alongside the main CPU to handle specific tasks more efficiently. The 8086 was designed with built-in coprocessor support.
Coprocessor Interface Features:
- Transparent Operation: Coprocessor instructions appear in normal instruction stream
- Automatic Detection: The coprocessor monitors the instruction stream and picks up its own ESC (escape) opcodes; the 8086 only calculates any memory operand address for them
- Shared Memory: Both processors access same memory space
- Synchronization: Hardware ensures proper coordination
- Exception Handling: Coprocessor can signal errors to main CPU
Coprocessor Interface Signals:
Signal Lines for Coprocessor Interface:
- QS0, QS1: Queue Status signals (inform coprocessor of instruction fetching)
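- TEST: Input to CPU, driven by the coprocessor's BUSY output; the WAIT instruction polls it
- BUSY: Coprocessor busy indicator (8087 output, connected to TEST)
- INT: Coprocessor error signal for unmasked exceptions (the 8087 pin is named INT; the ERROR pin name belongs to the later 80287)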
- RQ/GT: Request/Grant for bus mastership (max mode only)
8087 Floating Point Unit (FPU)
The 8087 is the most common coprocessor for the 8086, providing hardware floating-point arithmetic capabilities.
8087 Specifications:
Data Types Supported:
- Word Integer: 16-bit signed integer
- Short Integer: 32-bit signed integer
- Long Integer: 64-bit signed integer
- Short Real: 32-bit floating point
- Long Real: 64-bit floating point
- Temporary Real: 80-bit extended precision
- Packed Decimal: 80-bit packed BCD (18 decimal digits plus sign)
Register Set:
- 8 Data Registers: ST(0) to ST(7) - 80-bit each
- Control Register: 16-bit
- Status Register: 16-bit
- Tag Register: 16-bit
- Instruction Pointer: 20-bit physical address of the last 8087 instruction
- Data Pointer: 20-bit physical address of the last memory operand
8087 Architecture:
Floating Point Data Formats:
IEEE 754 Floating Point Formats:
Short Real (32-bit):
[S][8-bit Exponent][23-bit Mantissa]
Range: ±1.18×10⁻³⁸ to ±3.40×10³⁸
Long Real (64-bit):
[S][11-bit Exponent][52-bit Mantissa]
Range: ±2.23×10⁻³⁰⁸ to ±1.80×10³⁰⁸
Temporary Real (80-bit):
[S][15-bit Exponent][64-bit Mantissa]
Range: ±3.37×10⁻⁴⁹³² to ±1.18×10⁴⁹³²
S = Sign bit (0=positive, 1=negative)
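The short real layout can be checked by splitting a value into its three fields. The sketch below is written in Python rather than 8086 assembly so it can be run anywhere; it uses only the standard struct module, and the function name decode_short_real is an illustrative choice, not something taken from 8087 documentation.

```python
import struct

def decode_short_real(value):
    """Split a 32-bit short real into (sign, biased exponent, fraction bits)."""
    # Pack as little-endian single precision, then reread the same bytes as an integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31                   # 1 sign bit
    exponent = (bits >> 23) & 0xFF      # 8-bit exponent, stored with a bias of 127
    fraction = bits & 0x7FFFFF          # 23 stored mantissa bits
    return sign, exponent, fraction

print(decode_short_real(1.0))    # (0, 127, 0)
print(decode_short_real(-2.5))   # (1, 128, 2097152)
```

For -2.5 the value is -1.25 × 2¹, so the stored exponent is 127 + 1 = 128 and the fraction field holds 0.25 × 2²³ = 2097152.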
8087 Programming Model
Register Stack Operation
The 8087 uses a stack-based architecture with 8 registers organized as a circular stack.
Stack Pointer (TOP field in Status Register):
- Points to current top of stack
- Automatically updated during push/pop operations
- Stack grows downward (TOP decrements on push)
Register Naming:
ST(0) = Top of stack (most recent data)
ST(1) = Second from top
...
ST(7) = Bottom of accessible stack
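The relationship between ST(i) names and the eight physical registers can be modeled in a few lines. This is a Python sketch of the addressing rule only (register contents and tags are left out), and the helper names are hypothetical.

```python
def physical_register(top, i):
    """Return the physical register number (0-7) that ST(i) refers to."""
    return (top + i) & 7      # wraps around because the stack is circular

def push(top):
    """A load decrements TOP (modulo 8) before writing the new ST(0)."""
    return (top - 1) & 7

def pop(top):
    """A pop increments TOP (modulo 8) after ST(0) has been read."""
    return (top + 1) & 7

top = 0                 # TOP after initialization (FINIT)
top = push(top)         # first FLD: TOP becomes 7
print(top, physical_register(top, 0), physical_register(top, 1))  # 7 7 0
```

After one load, ST(0) is physical register 7 and ST(1) wraps around to physical register 0.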
Example Stack Operations:
FLD REAL_VAR ; Push REAL_VAR onto stack (becomes ST(0))
FADD ST(1) ; Add ST(1) to ST(0), result in ST(0)
FSTP RESULT ; Pop ST(0) to RESULT variable
Basic 8087 Instructions
Data Transfer Instructions:
FLD source ; Load floating point value onto stack
FST dest ; Store ST(0) to destination
FSTP dest ; Store and pop ST(0) to destination
FILD source ; Load integer and convert to float
FIST dest ; Store float as integer
FISTP dest ; Store as integer and pop
Arithmetic Instructions:
FADD ; ST(1) = ST(1) + ST(0), then pop (result ends up in ST(0))
FSUB ; ST(1) = ST(1) - ST(0), then pop
FMUL ; ST(1) = ST(1) * ST(0), then pop
FDIV ; ST(1) = ST(1) / ST(0), then pop
FSQRT ; Square root of ST(0)
FABS ; Absolute value of ST(0)
FCHS ; Change sign of ST(0)
Comparison Instructions:
FCOM ; Compare ST(0) with ST(1)
FCOMP ; Compare and pop
FTST ; Compare ST(0) with 0.0
FXAM ; Examine ST(0) and set status
Programming Example:
; Calculate: result = (a * b) + (c / d)
; Where a, b, c, d are floating point variables
.DATA
a DD 3.14159 ; Short real (32-bit)
b DD 2.71828 ; Short real
c DD 10.0 ; Short real
d DD 3.0 ; Short real
result DD ? ; Result storage
.CODE
FLD a ; Load a onto stack: ST(0)=a
FMUL b ; Multiply by b: ST(0)=a*b
FLD c ; Load c: ST(0)=c, ST(1)=a*b
FDIV d ; Divide by d: ST(0)=c/d, ST(1)=a*b
FADD ; Add: ST(0)=(a*b)+(c/d)
FSTP result ; Store result and pop stack
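The stack effects of the sequence above can be traced with an ordinary list standing in for the register stack. This is a Python sketch under simplifying assumptions: Python floats carry 53 bits of precision rather than the 8087's 64-bit internal mantissa, and to_short_real is a hypothetical helper that mimics storing to a DD variable.

```python
import struct

def to_short_real(x):
    """Round a Python float to 32-bit precision, as storing to a DD would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

stack = []                          # stack[-1] plays the role of ST(0)
a, b, c, d = (to_short_real(v) for v in (3.14159, 2.71828, 10.0, 3.0))

stack.append(a)                     # FLD a     -> ST(0)=a
stack[-1] *= b                      # FMUL b    -> ST(0)=a*b
stack.append(c)                     # FLD c     -> ST(0)=c, ST(1)=a*b
stack[-1] /= d                      # FDIV d    -> ST(0)=c/d
top = stack.pop()                   # FADD      -> add ST(0) into ST(1), then pop
stack[-1] += top
result = to_short_real(stack.pop()) # FSTP result -> store and pop
print(result, len(stack))
```

The list is empty at the end, which matches the register stack being balanced after FSTP.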
Coprocessor Synchronization
Instruction Synchronization
The 8086 and 8087 must be properly synchronized to ensure correct program execution.
Synchronization Mechanisms:
1. Queue Status (QS0, QS1):
- 8086 informs 8087 about instruction queue operations
- 8087 tracks instruction stream to identify its instructions
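2. WAIT Instruction:
- CPU waits for coprocessor to complete operation
- Polls the TEST pin and resumes when it goes active (low), which happens when the 8087 is no longer busy
- Automatically inserted by the assembler before each 8087 instruction (except the no-wait FN forms such as FNSTSW)
3. FWAIT Instruction:
- Explicit wait for coprocessor
- Alternate mnemonic for WAIT (same opcode, 9BH); needed before the 8086 reads memory that the 8087 may still be writing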
Example with synchronization:
FLD a ; Load a (the assembler already places a WAIT before each 8087 instruction)
FADD b ; Add b
FSTP result ; Store result to memory
FWAIT ; Wait until the 8087 has finished writing result
MOV AX, WORD PTR result ; Now the 8086 can safely read result
Exception Handling
8087 Exception Types:
1. Invalid Operation (e.g., sqrt of negative number)
2. Denormalized Operand
3. Divide by Zero
4. Overflow (result too large)
5. Underflow (result too small)
6. Inexact Result (rounding occurred)
Exception Response:
- Set appropriate bit in Status Register
- If exception is unmasked, assert the 8087's INT (error interrupt) output
- 8086 can read status and handle exception
Control Word Exception Masks:
Bit 0: Invalid Operation Mask
Bit 1: Denormalized Operand Mask
Bit 2: Zero Divide Mask
Bit 3: Overflow Mask
Bit 4: Underflow Mask
Bit 5: Precision Mask
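The mask bits can be manipulated as ordinary bit fields. The sketch below is Python, with hypothetical helper names; it assumes the control word value 03FFH that FINIT loads, in which all six exceptions are masked.

```python
# Bit positions of the exception masks in the low byte of the control word.
MASKS = {
    "invalid": 0, "denormal": 1, "zero_divide": 2,
    "overflow": 3, "underflow": 4, "precision": 5,
}

def unmasked_exceptions(control_word):
    """Return the names of exceptions whose mask bit is 0 (unmasked)."""
    return [name for name, bit in MASKS.items()
            if not (control_word >> bit) & 1]

def with_unmasked(control_word, *names):
    """Return a copy of control_word with the given exceptions unmasked."""
    for name in names:
        control_word &= ~(1 << MASKS[name])
    return control_word & 0xFFFF

cw = 0x03FF                                  # FINIT default: all six masked
cw = with_unmasked(cw, "invalid", "zero_divide")
print(hex(cw), unmasked_exceptions(cw))      # 0x3fa ['invalid', 'zero_divide']
```

In assembly the resulting value would be written to a memory word and loaded with FLDCW.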
Advanced 8086 Features
String Processing Capabilities
The 8086 provides powerful string processing instructions that work with large blocks of data.
String Instructions:
MOVS ; Move string (byte or word)
CMPS ; Compare strings
SCAS ; Scan string for value
LODS ; Load string element
STOS ; Store string element
Repeat Prefixes:
REP ; Repeat CX times
REPE ; Repeat while equal (and CX > 0)
REPNE ; Repeat while not equal (and CX > 0)
High-Performance String Operations:
; Copy 1000 words from source to destination
MOV SI, OFFSET SOURCE
MOV DI, OFFSET DEST
MOV CX, 1000
CLD ; Clear direction flag (forward)
REP MOVSW ; Repeat move word
Performance: about 17 cycles per word for REP MOVSW (plus about 9 cycles of one-time setup) vs 25+ cycles per word for a manual loop
Interrupt System Enhancement
Advanced Interrupt Features:
1. Interrupt Vector Table (IVT):
- 256 interrupt vectors (0-255)
- Each vector: 4 bytes (IP:CS)
- Located at memory 0000:0000 to 0000:03FF
2. Interrupt Types:
- Hardware Interrupts (INTR, NMI)
- Software Interrupts (INT instruction)
- Exception Interrupts (divide by zero, etc.)
3. Interrupt Priority:
Priority (highest to lowest):
1. Divide Error, INTO, INT instructions
2. NMI (Non-Maskable Interrupt)
3. INTR (Maskable Hardware Interrupt)
4. Single Step (Trap Flag)
4. Interrupt Service Routine Template:
ISR_TEMPLATE PROC
PUSH AX ; Save registers
PUSH BX
PUSH CX
; ... save other registers
; Interrupt service code here
POP CX ; Restore registers (reverse order)
POP BX
POP AX
IRET ; Return from interrupt
ISR_TEMPLATE ENDP
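The vector table layout described above reduces to simple address arithmetic: entry n starts at physical address n × 4, with the offset (IP) in the low word and the segment (CS) in the high word. The Python sketch below models the table as a byte array; the handler address C000:1234 is a made-up value for illustration.

```python
def vector_address(n):
    """Return the physical address of the 4-byte IVT entry for interrupt n."""
    if not 0 <= n <= 255:
        raise ValueError("interrupt number must be 0-255")
    return n * 4

def read_vector(memory, n):
    """Return (segment, offset) of the handler stored for interrupt n."""
    base = vector_address(n)
    offset = memory[base] | (memory[base + 1] << 8)        # IP, low word
    segment = memory[base + 2] | (memory[base + 3] << 8)   # CS, high word
    return segment, offset

memory = bytearray(1024)                                # the 1 KB table at 0000:0000
memory[0x84:0x88] = bytes([0x34, 0x12, 0x00, 0xC0])     # hypothetical handler C000:1234
segment, offset = read_vector(memory, 0x21)
print(hex(vector_address(0x21)), hex(segment), hex(offset))  # 0x84 0xc000 0x1234
```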
Memory Management Features
Bus Interface Features
Advanced Bus Features:
1. Minimum vs Maximum Mode:
Minimum Mode (MN/MX = 1):
- Single processor system
- Simple control signals
- Direct memory and I/O control
Maximum Mode (MN/MX = 0):
- Multiprocessor capable
- Bus arbitration support
- Coprocessor interface
2. Bus Arbitration (Maximum Mode):
- RQ/GT0, RQ/GT1: Request/Grant lines
- LOCK: Bus lock for atomic operations
- Support for multiple processors
3. Queue Status Output:
- QS0, QS1 inform external devices about instruction queue
- Used by coprocessors and debugging tools
Performance Optimization Techniques
Coprocessor Performance Benefits
Performance Comparison: Floating point operations with and without 8087.
Operation Performance (8MHz system):
32-bit Floating Point Addition:
Software (8086 only): ~1000 cycles = 125 μs
Hardware (8087): ~70 cycles = 8.75 μs
Speedup: 1000/70 = 14.3×
32-bit Floating Point Multiplication:
Software (8086 only): ~1600 cycles = 200 μs
Hardware (8087): ~90 cycles = 11.25 μs
Speedup: 1600/90 = 17.8×
32-bit Floating Point Division:
Software (8086 only): ~2400 cycles = 300 μs
Hardware (8087): ~145 cycles = 18.1 μs
Speedup: 2400/145 = 16.6×
Transcendental Functions (tan, arctan, log, exponential; the 8087 has no direct sin or cos instruction, so those are derived from FPTAN):
Software (8086 only): ~10,000+ cycles
Hardware (8087): ~300-500 cycles
Speedup: 20-30×
Memory Access Optimization
Optimization Techniques:
1. Instruction Prefetch Queue:
- 6-byte prefetch queue reduces memory access
- Sequential instructions execute faster
- Branch instructions may cause queue flush
2. Memory Organization:
- Place code in fast memory
- Organize data for sequential access
- Minimize segment changes
3. Register Usage:
- Keep frequently used data in registers
- Minimize memory-to-memory operations
- Use string instructions for block operations
Example Optimization:
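; Inefficient: RESULT is read and rewritten in memory at every step
; (the 8086 has no memory-to-memory ADD, so each value passes through AX)
MOV AX, [VAR1]
MOV [RESULT], AX
MOV AX, [VAR2]
ADD [RESULT], AX
MOV AX, [VAR3]
ADD [RESULT], AX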
; Efficient: Use accumulator
MOV AX, [VAR1]
ADD AX, [VAR2]
ADD AX, [VAR3]
MOV [RESULT], AX
System-Level Programming
Advanced System Programming Techniques:
1. Custom Interrupt Handlers:
; Install custom interrupt handler
; (this overwrites the existing vector; save the old one first if it must be restored or chained)
CLI ; Disable interrupts
MOV AX, 0 ; Segment 0 (IVT)
MOV ES, AX
MOV BX, 21H*4 ; INT 21H vector offset
MOV WORD PTR ES:[BX], OFFSET MY_HANDLER
MOV WORD PTR ES:[BX+2], SEG MY_HANDLER
STI ; Re-enable interrupts
2. Memory-Mapped I/O:
; Access I/O device through memory mapping
MOV AX, 0B800H ; Video memory segment
MOV ES, AX
MOV DI, 0 ; Offset 0
MOV AL, 'A' ; Character to display
MOV ES:[DI], AL ; Write to video memory
3. DMA Programming:
; Set up DMA transfer (requires external DMA controller)
MOV AL, 04H ; Bit 2 = 1 sets the mask, bits 1-0 = 00 select channel 0
OUT 0AH, AL ; 8237 single-channel mask register (PC port map): disable channel 0 while programming
; Set up address and count registers
; Enable DMA transfer
Numerical Problems and Applications
Problem 1: Coprocessor Performance Analysis
Question: Calculate time savings when using 8087 for a program that performs 1000 floating-point operations.
Solution:
Given:
- 1000 operations: 250 additions, 250 multiplications, 250 divisions, 250 square roots
- 8086 @ 8MHz (125ns per cycle)
Software Implementation (8086 only):
Additions: 250 × 1000 cycles = 250,000 cycles
Multiplications: 250 × 1600 cycles = 400,000 cycles
Divisions: 250 × 2400 cycles = 600,000 cycles
Square roots: 250 × 5000 cycles = 1,250,000 cycles
Total: 2,500,000 cycles = 312.5 ms
Hardware Implementation (8087):
Additions: 250 × 70 cycles = 17,500 cycles
Multiplications: 250 × 90 cycles = 22,500 cycles
Divisions: 250 × 145 cycles = 36,250 cycles
Square roots: 250 × 180 cycles = 45,000 cycles
Total: 121,250 cycles = 15.16 ms
Time Saved: 312.5 - 15.16 = 297.34 ms (95.2% reduction)
Speedup: 312.5 / 15.16 = 20.6×
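The totals in this solution can be reproduced with a short script, which also makes it easy to try a different operation mix. This is a Python sketch; the cycle counts are the ones assumed in the problem, and the names COSTS and totals are illustrative.

```python
CYCLE_NS = 125   # one clock period at 8 MHz

# Cycle counts per operation as used in Problem 1: (software, 8087)
COSTS = {
    "add": (1000, 70), "mul": (1600, 90),
    "div": (2400, 145), "sqrt": (5000, 180),
}

def totals(counts):
    """Return total (software_cycles, hardware_cycles) for an operation mix."""
    sw = sum(n * COSTS[op][0] for op, n in counts.items())
    hw = sum(n * COSTS[op][1] for op, n in counts.items())
    return sw, hw

sw, hw = totals({"add": 250, "mul": 250, "div": 250, "sqrt": 250})
sw_ms = sw * CYCLE_NS / 1e6
hw_ms = hw * CYCLE_NS / 1e6
print(sw, hw)                        # 2500000 121250
print(sw_ms, round(hw_ms, 2))        # 312.5 15.16
print(round(sw / hw, 1))             # 20.6
```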
Problem 2: Memory Layout for Coprocessor System
Question: Design memory layout for a system with 8086 + 8087 handling scientific calculations.
Solution:
Memory Layout Design:
0000:0000 - 0000:03FF Interrupt Vector Table (1KB)
0000:0400 - 0000:04FF BIOS Data Area (256 bytes)
0000:0500 - 9000:FFFF Available RAM (~640KB)
Recommended Organization:
2000:0000 - 2000:FFFF Program Code (64KB)
3000:0000 - 3000:FFFF Floating Point Data (64KB)
4000:0000 - 4000:FFFF Integer Data (64KB)
5000:0000 - 5000:FFFF Stack Space (64KB)
6000:0000 - 9000:FFFF Dynamic Memory/Buffers (~256KB)
Segment Register Setup:
CS = 2000H ; Code segment
DS = 3000H ; Floating point data
ES = 4000H ; Integer data
SS = 5000H ; Stack segment
Benefits:
- Clear separation of data types
- Floating-point operands are reached through DS, so 8087 memory instructions need no segment override
- Good memory utilization
- Easy debugging and maintenance
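A segment value only fixes the start of a 64 KB window, so it helps to check the physical range each segment register actually covers. The Python sketch below applies the 8086 rule physical = segment × 16 + offset to the four segment values used in this layout; the function names are illustrative.

```python
def physical(segment, offset):
    """20-bit physical address formed by the 8086: segment * 16 + offset."""
    return ((segment << 4) + offset) & 0xFFFFF

def segment_span(segment):
    """First and last physical address reachable with offsets 0000-FFFF."""
    return physical(segment, 0x0000), physical(segment, 0xFFFF)

for name, seg in [("code", 0x2000), ("fp data", 0x3000),
                  ("int data", 0x4000), ("stack", 0x5000)]:
    first, last = segment_span(seg)
    print(f"{name:8s} {first:05X}-{last:05X} ({(last - first + 1) // 1024} KB)")
```

Each segment spans exactly 64 KB, and consecutive segment values 1000H apart do not overlap.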
Problem 3: Interrupt Latency Calculation
Question: Calculate worst-case interrupt latency for 8086 system.
Solution:
Interrupt Latency Components:
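1. Current Instruction Completion:
The 8086 recognizes interrupts at instruction boundaries; a REP string instruction is the exception: it can be interrupted after any iteration, so even REP MOVSW with CX=1000 adds at most one iteration (about 17 cycles)
Worst case: longest single instruction, such as IDIV with a 16-bit memory operand
Time: approximately 200 cycles (up to 190 cycles plus effective address calculation)
2. Interrupt Acknowledge Sequence:
- Two interrupt acknowledge bus cycles to read the vector type
- Push flags, then clear IF and TF
- Push CS and IP
- Load new IP and CS from the vector table
Total: approximately 61 cycles for INTR
3. ISR Entry Overhead:
- PUSH registers: ~8 cycles
- Setup: ~5 cycles
Total: 13 cycles
Worst Case Total:
200 + 61 + 13 = 274 cycles
At 8MHz: 274 × 125ns = 34.25 μs
Best Case (simple instruction):
2 + 61 + 13 = 76 cycles = 9.5 μs
Answer: Latency range 9.5 μs to 34.25 μs (longer if interrupts are disabled with CLI or another bus master holds the bus)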
Practical Applications and Examples
Scientific Calculator Implementation
; Simple scientific calculator using 8087
.DATA
num1 DD ? ; First operand
num2 DD ? ; Second operand
result DD ? ; Result
operation DB ? ; Operation code
.CODE
CALCULATOR PROC
; Get first number
CALL INPUT_FLOAT ; Returns number in ST(0)
FSTP num1 ; Store first number
; Get operation
CALL INPUT_OPERATION ; Returns operation in AL
MOV operation, AL
; Get second number
CALL INPUT_FLOAT ; Returns number in ST(0)
FSTP num2 ; Store second number
; Perform calculation
FLD num1 ; Load first number
FLD num2 ; Load second number
CMP operation, '+'
JE DO_ADD
CMP operation, '-'
JE DO_SUB
CMP operation, '*'
JE DO_MUL
CMP operation, '/'
JE DO_DIV
CMP operation, 's' ; Square root
JE DO_SQRT
DO_ADD:
FADD ; ST(0) = num1 + num2
JMP STORE_RESULT
DO_SUB:
FSUB ; ST(0) = num1 - num2
JMP STORE_RESULT
DO_MUL:
FMUL ; ST(0) = num1 * num2
JMP STORE_RESULT
DO_DIV:
FDIV ; ST(0) = num1 / num2
JMP STORE_RESULT
DO_SQRT:
FSTP ST(0) ; Pop num2 off the stack, leaving num1 in ST(0)
FSQRT ; ST(0) = sqrt(num1)
STORE_RESULT:
FSTP result ; Store result
; Display result
CALL DISPLAY_FLOAT
RET
CALCULATOR ENDP
Matrix Operations with Coprocessor
; Matrix multiplication using 8087
MATRIX_MULTIPLY PROC
; Multiply two 3x3 matrices: C = A × B
PUSH AX ; Save registers individually (PUSHA requires an 80186 or later)
PUSH BX
PUSH CX
PUSH DX ; MUL overwrites DX
; ROW, COL and K are assumed to be DW variables in the data segment
MOV ROW, 0 ; Initialize row counter
ROW_LOOP:
MOV COL, 0 ; Initialize column counter
COL_LOOP:
FLDZ ; Initialize sum to 0
MOV K, 0 ; Initialize inner loop counter
INNER_LOOP:
; Calculate A[row][k] * B[k][col]
MOV AX, ROW
MOV CX, 3 ; 3 columns per row
MUL CX ; AX = row * 3
ADD AX, K ; AX = row * 3 + k
SHL AX, 1 ; The 8086 shifts by 1 or by CL only
SHL AX, 1 ; AX = (row * 3 + k) * 4 (float size)
MOV BX, AX ; AX cannot address memory; use BX as the index
FLD DWORD PTR MATRIX_A[BX] ; Load A[row][k]
MOV AX, K
MOV CX, 3
MUL CX ; AX = k * 3
ADD AX, COL ; AX = k * 3 + col
SHL AX, 1
SHL AX, 1 ; AX = (k * 3 + col) * 4
MOV BX, AX
FMUL DWORD PTR MATRIX_B[BX] ; Multiply by B[k][col]
FADD ; Add to sum and pop: ST(0) = running sum
INC K
CMP K, 3
JB INNER_LOOP
; Store result C[row][col]
MOV AX, ROW
MOV CX, 3
MUL CX
ADD AX, COL
SHL AX, 1
SHL AX, 1
MOV BX, AX
FSTP DWORD PTR MATRIX_C[BX]
INC COL
CMP COL, 3
JB COL_LOOP
INC ROW
CMP ROW, 3
JB ROW_LOOP
POP DX ; Restore registers (reverse order)
POP CX
POP BX
POP AX
RET
MATRIX_MULTIPLY ENDP
Debugging and Development Tools
Coprocessor State Monitoring
; Procedure to display 8087 status for debugging
DISPLAY_8087_STATUS PROC
; Read 8087 status word
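FNSTSW STATUS_WORD ; Store status word to memory (no wait); the FNSTSW AX form needs an 80287 or later
FWAIT ; Wait until the 8087 has finished the store
MOV AX, STATUS_WORD ; STATUS_WORD is assumed to be a DW variable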
; Check condition codes
TEST AX, 4000H ; Test C3
JZ C3_CLEAR
; C3 is set
CALL DISPLAY_C3_SET
JMP CHECK_C2
C3_CLEAR:
CALL DISPLAY_C3_CLEAR
CHECK_C2:
TEST AX, 0400H ; Test C2
JZ C2_CLEAR
; C2 is set
CALL DISPLAY_C2_SET
JMP CHECK_C1
C2_CLEAR:
CALL DISPLAY_C2_CLEAR
CHECK_C1:
; Similar checks for C1 and C0...
; Check exception flags
TEST AX, 0001H ; Invalid operation
JNZ INVALID_OP_ERROR
TEST AX, 0004H ; Zero divide
JNZ ZERO_DIV_ERROR
; Display stack top pointer
MOV BX, AX
AND BX, 3800H ; Mask TOP field
MOV CL, 11 ; The 8086 takes multi-bit shift counts in CL only
SHR BX, CL ; BX = TOP value (0-7)
; Display TOP value...
RET
DISPLAY_8087_STATUS ENDP
; Procedure to save complete 8087 state
SAVE_8087_STATE PROC
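FNSAVE COPROCESSOR_STATE ; Save entire 8087 state (94-byte buffer) and reinitialize the 8087
FWAIT ; Wait for the save to finish before the CPU touches the buffer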
RET
SAVE_8087_STATE ENDP
; Procedure to restore 8087 state
RESTORE_8087_STATE PROC
FRSTOR COPROCESSOR_STATE ; Restore 8087 state
RET
RESTORE_8087_STATE ENDP
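The bit tests used when monitoring the status word (C3 at 4000H, C2 at 0400H, TOP at 3800H, exception flags in the low bits) can be cross-checked with a small decoder. This is a Python sketch with hypothetical names; the sample value 7804H is made up for illustration.

```python
EXCEPTION_FLAGS = ["invalid", "denormal", "zero_divide",
                   "overflow", "underflow", "precision"]

def decode_status(sw):
    """Break a 16-bit 8087 status word into its main fields."""
    return {
        "busy": (sw >> 15) & 1,
        "c3": (sw >> 14) & 1,          # same bit as TEST AX, 4000H
        "top": (sw >> 11) & 7,         # same as AND 3800H, then shift right 11
        "c2": (sw >> 10) & 1,          # same bit as TEST AX, 0400H
        "c1": (sw >> 9) & 1,
        "c0": (sw >> 8) & 1,
        "exceptions": [name for bit, name in enumerate(EXCEPTION_FLAGS)
                       if (sw >> bit) & 1],
    }

status = decode_status(0x7804)
print(status["c3"], status["top"], status["exceptions"])  # 1 7 ['zero_divide']
```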
Performance Profiling
; Simple performance measurement using timer
MEASURE_PERFORMANCE PROC
; Save current timer value
MOV AH, 00H ; Get system time
INT 1AH ; BIOS timer interrupt
MOV START_TIME_LOW, DX
MOV START_TIME_HIGH, CX
; Execute code to measure
CALL FUNCTION_TO_MEASURE
; Get end time
MOV AH, 00H
INT 1AH
MOV END_TIME_LOW, DX
MOV END_TIME_HIGH, CX
; Calculate elapsed time
SUB DX, START_TIME_LOW
SBB CX, START_TIME_HIGH
; Convert to microseconds (timer ticks at about 18.2065 Hz)
; Each tick = about 54,925 microseconds, so only long-running code can be measured this way
; ... conversion code ...
; Display elapsed time
CALL DISPLAY_TIME
RET
MEASURE_PERFORMANCE ENDP
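The elided conversion step can be prototyped outside assembly first. The Python sketch below assumes PC-compatible timer hardware (a 1,193,182 Hz input clock divided by 65,536) and ignores the midnight rollover of the BIOS tick counter; the function names and sample tick values are illustrative.

```python
PIT_HZ = 1_193_182          # PC timer input clock (assumed PC-compatible hardware)
TICK_DIVISOR = 65_536       # default divisor, giving about 18.2 ticks per second

def elapsed_ticks(start_high, start_low, end_high, end_low):
    """32-bit difference of two CX:DX tick counts, as SUB/SBB computes it."""
    start = (start_high << 16) | start_low
    end = (end_high << 16) | end_low
    return (end - start) & 0xFFFFFFFF

def ticks_to_microseconds(ticks):
    """Convert BIOS timer ticks to whole microseconds."""
    return ticks * TICK_DIVISOR * 1_000_000 // PIT_HZ

ticks = elapsed_ticks(0x0001, 0xFFF0, 0x0002, 0x0005)
print(ticks, ticks_to_microseconds(ticks))
```

The example also shows the borrow from the low word into the high word, which is what the SBB instruction handles in the assembly version.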
Summary
The 8086's coprocessor interface and advanced features significantly expanded its capabilities beyond basic integer processing. The 8087 FPU partnership created a powerful platform for scientific and engineering applications, while advanced features like string processing and sophisticated interrupt handling made it suitable for system-level programming.
Key Advanced Features:
- Transparent coprocessor interface enabling specialized processing units
- 8087 FPU providing IEEE 754 floating-point arithmetic
- Advanced string processing with repeat prefixes
- Comprehensive interrupt system with 256 vectors
- Flexible memory management through segmentation
- Bus arbitration support for multiprocessor systems
- Performance optimization through instruction prefetch
Impact on Computing:
- Enabled practical floating-point computation on microprocessors
- Established coprocessor architecture pattern
- Provided foundation for modern x86 advanced features
- Made microprocessors viable for scientific applications
- Demonstrated importance of specialized processing units