Day 2: Matrix Operations & Image Representation in OpenCV
What Are the Prerequisites for Today's Learning?
Before diving into matrix operations and image representation, ensure you have the following packages installed:
pip install opencv-python numpy matplotlib scipy
Import statements for all our examples:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import eig
import math
What Is a Matrix in the Context of Images?
A matrix is a rectangular array of numbers arranged in rows and columns. In computer vision, images are represented as matrices where each element represents a pixel's intensity or color value.
Key Mathematical Concepts:
For a matrix A with dimensions m×n:
→ Element Access: A[i,j] represents the element in row i, column j
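As a minimal sketch of the indexing rule above, assuming NumPy's zero-based indexing (the first row and the first column both have index 0), a small matrix can be built and read as follows. The matrix values are made up purely for illustration.

```python
import numpy as np

# A hypothetical 3x4 matrix (m = 3 rows, n = 4 columns); values are illustrative
A = np.arange(12).reshape(3, 4)

m, n = A.shape           # (rows, columns)
element = A[1, 2]        # row 1, column 2 (zero-based)
second_row = A[1, :]     # every column of row 1
third_column = A[:, 2]   # every row of column 2

print(m, n, element)
```

The same A[i, j] pattern is what later sections use to read and write individual pixels.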
What Are Eigenvalues and Eigenvectors in Computer Vision?
An eigenvector of a matrix is a direction that the transformation leaves unchanged apart from scaling; the eigenvalue is the factor by which that direction is scaled.
For a square matrix A and a nonzero vector v: A v = λ v
→ v is an eigenvector
→ λ is the corresponding eigenvalue
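The defining relation can be checked numerically. The sketch below assumes a small symmetric matrix chosen only for illustration, and uses np.linalg.eigh, which is intended for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

# Illustrative symmetric 2x2 matrix (an assumption, not derived from an image)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric (Hermitian) matrices; eigenvalues come back ascending
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Column k of `eigenvectors` pairs with eigenvalues[k]
residuals = []
for lam, v in zip(eigenvalues, eigenvectors.T):
    # A v and lam * v should agree up to floating-point rounding
    residuals.append(np.linalg.norm(A @ v - lam * v))

print(eigenvalues, residuals)
```

Each residual should be close to zero, which confirms that multiplying by A only rescales each eigenvector.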
Applications in Computer Vision:
→ Principal Component Analysis (PCA)
→ Corner detection (Harris corners)
→ Object orientation analysis
→ Dimensionality reduction
def explore_eigenvalues_in_image_analysis():
    """
    Demonstrate eigenvalue applications in image analysis
    """
    # Create a covariance matrix (common in image analysis)
    covariance_matrix = np.array([
        [25.0, 15.0],
        [15.0, 20.0]
    ])
    # Calculate eigenvalues and eigenvectors (eigenvectors are the columns)
    eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)
    # Interpretation for image analysis: the largest eigenvalue marks
    # the direction of greatest variance
    dominant_index = np.argmax(eigenvalues)
    dominant_eigenvalue = eigenvalues[dominant_index]
    dominant_eigenvector = eigenvectors[:, dominant_index]
    # Calculate the angle of the dominant direction
    # (an eigenvector's sign is arbitrary, so the angle is defined up to 180 degrees)
    angle_degrees = np.degrees(np.arctan2(dominant_eigenvector[1], dominant_eigenvector[0]))
    return dominant_eigenvalue, dominant_eigenvector, angle_degrees
How Are Digital Images Represented as Matrices?
Digital images are essentially matrices of pixel values:
→ Grayscale: Single channel, values typically 0-255
→ Color (RGB): Three channels (Red, Green, Blue)
→ Color (BGR): OpenCV's default format (Blue, Green, Red)
def demonstrate_image_representation():
    """
    Show how images are represented as matrices
    """
    height, width = 100, 100
    # Grayscale gradient image
    grayscale_img = np.zeros((height, width), dtype=np.uint8)
    for i in range(height):
        for j in range(width):
            grayscale_img[i, j] = (i + j) % 256
    # Color image with different patterns in each channel
    # (channel order here is RGB: index 0 = red, 1 = green, 2 = blue)
    color_img = np.zeros((height, width, 3), dtype=np.uint8)
    # Red channel - horizontal gradient (the row of values is broadcast down every row)
    color_img[:, :, 0] = np.linspace(0, 255, width).astype(np.uint8)
    # Green channel - vertical gradient (the column of values is broadcast across every column)
    color_img[:, :, 1] = np.linspace(0, 255, height).reshape(-1, 1).astype(np.uint8)
    # Blue channel - checkerboard pattern of 10x10 squares
    for i in range(height):
        for j in range(width):
            color_img[i, j, 2] = ((i // 10) + (j // 10)) % 2 * 255
    return grayscale_img, color_img
Image Matrix Properties:
→ Grayscale image shape: (height, width)
→ Color image shape: (height, width, channels)
→ Data type: typically uint8 (0-255 range)
→ Coordinate system: NumPy indexing is (row, column), i.e. img[y, x]; OpenCV functions that take points, such as cv2.circle, expect (x, y)
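The properties listed above can be inspected directly on a NumPy array. This is a minimal sketch with a deliberately tiny, non-square image so that the height and width are easy to tell apart; the sizes and pixel values are arbitrary assumptions.

```python
import numpy as np

height, width = 4, 6   # small and non-square on purpose

gray = np.zeros((height, width), dtype=np.uint8)       # (height, width)
color = np.zeros((height, width, 3), dtype=np.uint8)   # (height, width, channels)

# Index order is [row, column], which is [y, x]
y, x = 1, 5
gray[y, x] = 255
color[y, x] = (10, 20, 30)

print(gray.shape, gray.dtype)
print(color.shape, color.ndim)
```

Swapping the indices here (gray[5, 1]) would raise an IndexError, because there are only 4 rows. On a square image the same mistake would pass silently and touch the wrong pixel.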
What's the Difference Between RGB and BGR Formats?
OpenCV uses BGR (Blue, Green, Red) format by default, while most other libraries use RGB (Red, Green, Blue). This is a common source of confusion.
def demonstrate_rgb_bgr_difference():
    """
    Show the difference between RGB and BGR formats
    """
    # Create a simple color image with distinct regions
    height, width = 100, 150
    rgb_img = np.zeros((height, width, 3), dtype=np.uint8)
    # Create red, green, and blue regions
    rgb_img[:, 0:50, 0] = 255     # Red region in R channel
    rgb_img[:, 50:100, 1] = 255   # Green region in G channel
    rgb_img[:, 100:150, 2] = 255  # Blue region in B channel
    # Convert RGB to BGR (OpenCV format)
    bgr_img = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2BGR)
    # Important: When displaying with matplotlib, we need RGB format,
    # so rgb_img (not bgr_img) is the one to pass to plt.imshow
    return rgb_img, bgr_img
Key Points:
→ OpenCV loads images in BGR format
→ Matplotlib expects RGB format for display
→ Convert an OpenCV image before passing it to an RGB-based library: cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
→ Incorrect format leads to wrong colors in visualization
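To see what the conversion actually does to the data, here is a NumPy-only sketch. For a 3-channel image, swapping RGB and BGR amounts to reversing the channel axis, which is the same reordering cv2.cvtColor performs with cv2.COLOR_RGB2BGR. The one-pixel image is an assumption chosen to keep the result easy to read.

```python
import numpy as np

# A single pure-red pixel stored in RGB order
rgb = np.zeros((1, 1, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)

# Reversing the last axis swaps the first and third channels (RGB <-> BGR)
bgr = rgb[:, :, ::-1].copy()   # copy() makes the result a contiguous array

# Reversing a second time restores the original ordering
round_trip = bgr[:, :, ::-1]

print(rgb[0, 0], bgr[0, 0])
```

The pixel values never change, only their positions. A red pixel interpreted in the wrong order therefore shows up as blue, which is the typical symptom of a missed conversion.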
How Do We Manipulate Individual Pixels?
Understanding pixel manipulation is crucial for many image processing tasks:
def demonstrate_pixel_manipulation():
    """
    Show various pixel manipulation techniques
    """
    # Create or load a sample image
    # (channels are treated as RGB here; an image loaded with cv2.imread would be BGR)
    img = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
    # 1. Accessing individual pixels
    pixel_value = img[100, 100, :]  # Get RGB values at row 100, column 100
    # 2. Modifying individual pixels
    img[100, 100] = [255, 0, 0]  # Set to red (this would be blue in BGR order)
    # 3. Accessing regions (slicing); a slice is a view, not a copy
    top_left_region = img[0:50, 0:50]
    # 4. Modifying regions
    img[0:50, 0:50] = [0, 255, 0]  # Set to green
    # 5. Channel-wise operations
    red_channel = img[:, :, 0]
    green_channel = img[:, :, 1]
    blue_channel = img[:, :, 2]
    # 6. Conditional pixel modification
    # Make all pixels with low red values completely black
    mask = img[:, :, 0] < 100
    img[mask] = [0, 0, 0]
    return img
# Safe pixel access function
def safe_pixel_access(img, y, x):
    h, w = img.shape[:2]
    if 0 <= y < h and 0 <= x < w:
        return img[y, x]
    else:
        return None
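# Safe arithmetic operations
# (img1 and img2 are assumed to be uint8 images of the same shape)
result = cv2.add(img1, img2)  # Saturates at 255, instead of img1 + img2 which wraps around
result = cv2.convertScaleAbs(img, alpha=1.5)  # Scales and saturates to 0-255, instead of img * 1.5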
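The reason for preferring saturating arithmetic is that plain uint8 addition in NumPy wraps around modulo 256. The sketch below uses NumPy only and mimics saturation by widening the data type and clipping. It illustrates the behavior and is not OpenCV's own implementation; the sample values are arbitrary.

```python
import numpy as np

a = np.array([200, 100, 50], dtype=np.uint8)
b = np.array([100, 100, 50], dtype=np.uint8)

# Plain uint8 addition wraps modulo 256: 200 + 100 = 300 becomes 44
wrapped = a + b

# Saturating alternative: widen to int16, add, clip to 0-255, narrow back to uint8
saturated = np.clip(a.astype(np.int16) + b.astype(np.int16), 0, 255).astype(np.uint8)

print(wrapped, saturated)
```

In an image, the wrapped result turns the brightest areas dark, while the saturated result simply caps them at white.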
What Are the Key Takeaways from Day 2?
Mathematical Foundations:
→ Matrices represent images as arrays of pixel values