Computer Vision Notes

The document discusses key concepts in computer vision including image formation, geometric primitives, digital cameras, and point operators. It covers topics such as photometric image formation, common geometric shapes and transformations, components of digital cameras, and basic point-based image processing operations.


EAIDS254 – COMPUTER VISION JEPPIAAR UNIVERSITY

UNIT-1
INTRODUCTION TO IMAGE FORMATION AND PROCESSING
Computer Vision - Geometric primitives and transformations - Photometric
image formation - The digital camera - Point operators - Linear filtering - More
neighborhood operators - Fourier transforms - Pyramids and wavelets -
Geometric transformations - Global optimization.

1. Computer Vision:

Computer vision is a multidisciplinary field that enables machines to interpret and make
decisions based on visual data. It involves the development of algorithms and systems
that allow computers to gain high-level understanding from digital images or videos. The
goal of computer vision is to replicate and improve upon human vision capabilities,
enabling machines to recognize and understand visual information.

Key tasks in computer vision include:

1. Image Recognition: Identifying objects, people, or patterns within images.

2. Object Detection: Locating and classifying multiple objects within an image or video
stream.

3. Image Segmentation: Dividing an image into meaningful segments or regions, often
to identify boundaries and structures.

4. Face Recognition: Identifying and verifying individuals based on facial features.

5. Gesture Recognition: Understanding and interpreting human gestures from images
or video.

6. Scene Understanding: Analyzing and comprehending the content and context of a
scene.

B.Tech [AIML/DS]

7. Motion Analysis: Detecting and tracking movements within video sequences.

8. 3D Reconstruction: Creating three-dimensional models of objects or scenes from two-dimensional images.

Computer vision applications are diverse and found in various fields, including
healthcare (medical image analysis), autonomous vehicles, surveillance,
augmented reality, robotics, industrial automation, and more. Advances in deep learning,
especially convolutional neural networks (CNNs), have significantly contributed to the
progress and success of computer vision tasks by enabling efficient feature learning from
large datasets.

2. Geometric primitives and transformations:

Geometric primitives and transformations are fundamental concepts in computer graphics and computer vision.
They form the basis for representing and manipulating visual elements in both 2D and 3D spaces. Let's explore
each of these concepts:
Geometric Primitives:
1. Points: Represented by coordinates (x, y) in 2D or (x, y, z) in 3D space.

2. Lines and Line Segments: Defined by two points or a point and a direction vector.

3. Polygons: Closed shapes with straight sides. Triangles, quadrilaterals, and other polygons are common
geometric primitives.

4. Circles and Ellipses: Defined by a center point and radii (or axes in the case of ellipses).

5. Curves: Bézier curves, spline curves, and other parametric curves are used to represent smooth shapes.
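
The point and line primitives listed above can be illustrated with homogeneous coordinates, a common convention in computer vision that these notes do not spell out. The sketch below is an assumption-laden illustration rather than a prescribed method: the helper names (to_homogeneous, line_through, intersection) are hypothetical, and a 2D line is stored as (a, b, c) with ax + by + c = 0.

```python
import numpy as np

def to_homogeneous(p):
    """Append 1 to a 2D point: (x, y) -> (x, y, 1)."""
    return np.array([p[0], p[1], 1.0])

def line_through(p, q):
    """Homogeneous line (a, b, c), with ax + by + c = 0, through points p and q."""
    return np.cross(to_homogeneous(p), to_homogeneous(q))

def intersection(l1, l2):
    """Intersection of two homogeneous lines, returned as a 2D point (x, y)."""
    x = np.cross(l1, l2)
    if np.isclose(x[2], 0.0):
        raise ValueError("lines are parallel")
    return x[:2] / x[2]

diag = line_through((0, 0), (2, 2))   # the line y = x
anti = line_through((0, 2), (2, 0))   # the line y = 2 - x
meet = intersection(diag, anti)       # the two diagonals cross at (1, 1)
```

Both "line through two points" and "intersection of two lines" reduce to a cross product in this representation, which is one reason it is popular.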

Geometric Transformations:
Geometric transformations involve modifying the position, orientation, and scale of geometric primitives.
Common transformations include


1. Translation: Moves an object by a certain distance along a specified direction.

2. Rotation: Rotates an object around a specified point or axis.

3. Scaling: Changes the size of an object along different axes.

4. Shearing: Distorts the shape of an object by stretching or compressing along one or more axes.

5. Reflection: Mirrors an object across a specified line (in 2D) or plane (in 3D).

6. Affine Transformations: Combine translation, rotation, scaling, and shearing.

7. Projective Transformations: Also called homographies; they generalize affine transformations and model perspective effects, preserving straight lines but not parallelism.

Applications:

Computer Graphics: Geometric primitives and transformations are fundamental for rendering 2D and 3D
graphics in applications such as video games, simulations, and virtual reality.

Computer-Aided Design (CAD): Used for designing and modeling objects in engineering and architecture.

Computer Vision: Geometric transformations are applied to align and process images, correct distortions, and
perform other tasks in image analysis.

Robotics: Essential for robot navigation, motion planning, and spatial reasoning.

Understanding geometric primitives and transformations is crucial for creating realistic and visually appealing
computer-generated images, as well as for solving various problems in computer vision and robotics.

3. Photometric image formation:

Photometric image formation refers to the process by which light interacts with surfaces and is captured by a
camera, resulting in the creation of a digital image. This process involves various factors related to the properties
of light, the surfaces of objects, and the characteristics of the imaging system. Understanding photometric image
formation is crucial in computer vision, computer graphics, and image processing.


Here are some key concepts involved:

Illumination:
- Ambient Light: The overall illumination of a scene that comes from all directions.
- Directional Light: Light coming from a specific direction, which can create highlights and shadows.

Reflection:
- Diffuse Reflection: Light that is scattered in various directions by rough surfaces.
- Specular Reflection: Light that reflects off smooth surfaces in a concentrated direction, creating highlights.

Shading:
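- Lambertian Shading: A model that assumes ideal diffuse reflection, so a surface point looks equally bright from every viewing direction and its brightness depends only on the angle between the surface normal and the light direction.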
- Phong Shading: A more sophisticated model that considers specular reflection, creating more realistic
highlights.

Surface Properties:
- Reflectance Properties: Material characteristics that determine how light is reflected (e.g., diffuse and specular
reflectance).
- Albedo: The inherent reflectivity of a surface, representing the fraction of incident light that is reflected.

Lighting Models:
- Phong Lighting Model: Combines diffuse and specular reflection components to model lighting.
- Blinn-Phong Model: Similar to the Phong model, but it uses the halfway vector between the light and view directions instead of the reflection vector, which is cheaper to compute.
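
The lighting models above can be sketched for a single surface point. This is a minimal illustration, not a full renderer: the function name blinn_phong and the default values for albedo, k_specular, shininess and ambient are assumptions chosen for the example.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def blinn_phong(normal, light_dir, view_dir,
                albedo=0.8, k_specular=0.5, shininess=32, ambient=0.1):
    """Intensity at one surface point: ambient + Lambertian diffuse + Blinn-Phong specular."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = max(float(np.dot(n, l)), 0.0)
    diffuse = albedo * n_dot_l                 # Lambertian term: depends only on n . l
    specular = 0.0
    if n_dot_l > 0.0:                          # no highlight when the light is behind the surface
        h = normalize(l + v)                   # halfway vector between light and viewer
        specular = k_specular * max(float(np.dot(n, h)), 0.0) ** shininess
    return ambient + diffuse + specular

head_on = blinn_phong([0, 0, 1], [0, 0, 1], [0, 0, 1])   # light and viewer along the normal
grazing = blinn_phong([0, 0, 1], [1, 0, 0], [0, 0, 1])   # light parallel to the surface
```

With the light along the normal the diffuse and specular terms are both at their maximum; with grazing light only the ambient term remains.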

Shadows:
- Cast Shadows: Darkened areas on surfaces where light is blocked by other objects.
- Self Shadows: Shadows cast by parts of an object onto itself.


Color and Intensity:


- Color Reflection Models: Incorporate the color properties of surfaces in addition to reflectance.
- Intensity: The brightness of light or color in an image.

Cameras:
- Camera Exposure: The amount of light allowed to reach the camera sensor or film.
- Camera Response Function: Describes how a camera responds to light of different intensities.

4. The digital camera:

A digital camera is an electronic device that captures and stores digital images. It differs from traditional film
cameras in that it uses electronic sensors to record images rather than photographic film. Digital cameras have
become widespread due to their convenience, ability to instantly review images, and ease of sharing and storing
photos digitally. Here are key components and concepts related to digital cameras:

Image Sensor:
- Digital cameras use image sensors (such as CCD or CMOS) to convert light into electrical signals.
- The sensor captures the image by measuring the intensity of light at each pixel location.

Lens:
- The lens focuses light onto the image sensor.
- Zoom lenses allow users to adjust the focal length, providing optical zoom.

Aperture:
- The aperture is an adjustable opening in the lens that controls the amount of light entering the camera.

- It affects the depth of field and exposure.

Shutter:
- The shutter mechanism controls the duration of light exposure to the image sensor.
- Fast shutter speeds freeze motion, while slower speeds create motion blur.

Viewfinder and LCD Screen:


- Digital cameras typically have an optical or electronic viewfinder for composing shots.
- LCD screens on the camera back allow users to view and frame images.

Image Processor:
- Digital cameras include a built-in image processor to convert raw sensor data into a viewable image.
- Image processing algorithms may enhance color and sharpness and reduce noise.

Memory Card:
- Digital images are stored on removable memory cards, such as SD or CF cards.
- Memory cards provide a convenient and portable way to store and transfer images.

Autofocus and Exposure Systems:


- Autofocus systems automatically adjust the lens to ensure a sharp image.
- Exposure systems determine the optimal combination of aperture, shutter speed, and ISO sensitivity for proper
exposure.

White Balance:
- White balance settings adjust the color temperature of the captured image to match different lighting
conditions.

Modes and Settings:


- Digital cameras offer various shooting modes (e.g., automatic, manual, portrait, landscape) and settings to
control image parameters.

Connectivity:
- USB, HDMI, or wireless connectivity allows users to transfer images to computers, share online, or connect to
other devices.

Battery:
- Digital cameras are powered by rechargeable batteries, providing the necessary energy for capturing and
processing images.


5. Point operators:
Point operators, also known as point processing or pixel-wise operations, are basic image processing operations that
operate on individual pixels independently. These operations are applied to each pixel in an image without considering the
values of neighboring pixels. Point operators typically involve mathematical operations or functions that transform the
pixel values, resulting in changes to the image's appearance. Here are some common point operators:

Brightness Adjustment:
- Addition/Subtraction: Increase or decrease the intensity of all pixels by adding or subtracting a constant value.
- Multiplication/Division: Scale the intensity values by multiplying or dividing them by a constant factor.

Contrast Adjustment:
- Linear Contrast Stretching: Rescale the intensity values to cover the full dynamic range.
- Histogram Equalization: Adjust the distribution of pixel intensities to enhance contrast.

Gamma Correction:
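- Apply a power-law mapping to normalized intensities (output = input^gamma); a gamma below 1 brightens mid-tones and a gamma above 1 darkens them, which changes the overall brightness and contrast of the image.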

Thresholding:
- Convert a grayscale image to binary by setting a threshold value. Pixels with values above the threshold become white,
and those below become black.

Bit-plane Slicing:
- Decompose an image into its binary representation by considering individual bits.

Color Mapping:
- Apply color transformations to change the color balance or convert between color spaces (e.g., RGB to grayscale).

Inversion:
- Invert the intensity values of pixels, turning bright areas dark and vice versa.

Image Arithmetic:
- Perform arithmetic operations between pixels of two images, such as addition, subtraction, multiplication, or division.
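
Several of the point operators listed above can be sketched in a few lines of NumPy. The sketch assumes 8-bit grayscale images stored as uint8 arrays; the function names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def adjust_brightness(img, offset):
    """Add a constant to every pixel, clipping to the valid 0..255 range."""
    return np.clip(img.astype(np.int16) + offset, 0, 255).astype(np.uint8)

def contrast_stretch(img):
    """Linearly rescale intensities so they span the full 0..255 range."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                       # constant image: nothing to stretch
        return img.copy()
    scaled = (img.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return np.round(scaled).astype(np.uint8)

def gamma_correct(img, gamma):
    """Power-law mapping on intensities normalized to [0, 1]."""
    return np.round(255.0 * (img / 255.0) ** gamma).astype(np.uint8)

def threshold(img, t):
    """Pixels above t become white (255), all others black (0)."""
    return np.where(img > t, 255, 0).astype(np.uint8)

def invert(img):
    """Photographic negative."""
    return 255 - img

img = np.array([[50, 100], [150, 200]], dtype=np.uint8)
brighter = adjust_brightness(img, 100)
stretched = contrast_stretch(img)
binary = threshold(img, 120)
negative = invert(img)
```

Each function looks only at the pixel's own value, which is what distinguishes point operators from the neighborhood operators discussed later.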


Point operators are foundational in image processing and form the basis for more complex operations. They are
often used in combination to achieve desired enhancements or modifications to images. These operations are
computationally efficient, as they can be applied independently to each pixel, making them suitable for real-time
applications and basic image manipulation tasks.
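
Histogram equalization, mentioned under contrast adjustment, is still a point operator because it reduces to a per-intensity lookup table. The following is a minimal sketch assuming an 8-bit grayscale image; the name equalize_histogram is hypothetical, and the mapping used is the common cumulative-histogram formulation.

```python
import numpy as np

def equalize_histogram(img):
    """Map intensities through the normalized cumulative histogram (uint8 input)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the darkest occupied level
    total = img.size
    if total == cdf_min:               # constant image: leave unchanged
        return img.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                    # apply the lookup table pixel by pixel

low_contrast = np.array([[100, 100], [101, 102]], dtype=np.uint8)
equalized = equalize_histogram(low_contrast)
```

The input above occupies only three adjacent gray levels; after equalization the darkest level maps to 0 and the brightest to 255.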

It's important to note that while point operators are powerful for certain tasks, more advanced image processing
techniques, such as filtering and convolution, involve considering the values of neighboring pixels and are
applied to local image regions.

Linear filtering:

Linear filtering is a fundamental concept in image processing that involves applying a linear operator to an
image. The linear filter operates on each pixel in the image by combining its value with the values of its
neighboring pixels according to a predefined convolution kernel or matrix. The convolution operation is a
mathematical operation that computes the weighted sum of pixel values in the image, producing a new value for
the center pixel.
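
The weighted-sum operation described above can be written out directly. This is a deliberately slow, loop-based sketch for clarity; it assumes zero padding at the borders and an odd-sized kernel, and the name convolve2d is chosen for the example.

```python
import numpy as np

def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding (kernel sides must be odd)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="constant")
    flipped = kernel[::-1, ::-1]       # convolution flips the kernel; correlation would not
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * flipped)   # weighted sum over the neighborhood
    return out

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
kernel = np.arange(9, dtype=float).reshape(3, 3)
response = convolve2d(impulse, kernel)   # convolving an impulse reproduces the kernel
```

Convolving a single bright pixel with a kernel places a copy of the kernel around that pixel, which is a quick way to check an implementation.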

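The general formula for linear filtering or convolution is given by:

$$g(i, j) = \sum_{k,l} f(i - k,\, j - l)\, h(k, l)$$

Where:
- $f$ is the input image,
- $h$ is the convolution kernel (filter mask), with $(k, l)$ ranging over its entries,
- $g$ is the filtered output image.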

Common linear filtering operations include:


Blurring/Smoothing:
- Average filter: Each output pixel is the average of its neighboring pixels.
- Gaussian filter: Applies a Gaussian distribution to compute weights for pixel averaging.


Edge Detection:
- Sobel filter: Emphasizes edges by computing gradients in the x and y directions.
- Prewitt filter: Similar to Sobel but uses a different kernel for gradient computation.
Sharpening:
- Laplacian filter: Enhances high-frequency components to highlight edges.
- High-pass filter: Emphasizes details by subtracting a blurred version of the image.
Embossing:
- Applies an embossing effect by highlighting changes in intensity.
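
The filters listed above differ only in their kernels. The sketch below applies a few standard kernels to a synthetic step edge; it assumes NumPy 1.20 or later (for sliding_window_view) and edge-replicated borders, and the helper name filter2d is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter2d(image, kernel):
    """Correlate an image with an odd-sized kernel, replicating border pixels."""
    ph, pw = kernel.shape[0] // 2, kernel.shape[1] // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    windows = sliding_window_view(padded, kernel.shape)   # shape (H, W, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

BOX = np.full((3, 3), 1.0 / 9.0)                                  # average filter
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

# Vertical step edge: left half dark, right half bright.
step = np.zeros((5, 6))
step[:, 3:] = 1.0

gx = filter2d(step, SOBEL_X)              # responds to the horizontal intensity change
gy = filter2d(step, SOBEL_Y)              # no vertical change in this image
magnitude = np.hypot(gx, gy)
sharpened = step + (step - filter2d(step, BOX))   # add back what the blur removed
```

The Sobel response is non-zero only in the two columns next to the step, while the vertical gradient stays zero everywhere.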


Linear filtering is a versatile technique and forms the basis for more advanced image processing operations. The
same convolution operation is the core building block of convolutional neural networks (CNNs) in deep
learning, where the filters are learned during the training process to perform tasks such as image recognition,
segmentation, and denoising. The choice of filter kernel and parameters determines the specific effect achieved
through linear filtering.

6. More neighborhood operators:
Neighborhood operators in image processing involve the consideration of pixel values in the vicinity of a target
pixel, usually within a defined neighborhood or window. Unlike point operators that operate on individual pixels,
neighborhood operators take into account the local structure of the image. Here are some common neighborhood
operators:


Median Filter:
- Computes the median value of pixel intensities within a local neighborhood.
- Effective for removing salt-and-pepper noise while preserving edges.

Gaussian Filter:
- Applies a weighted average to pixel values using a Gaussian distribution.
- Used for blurring and noise reduction; unlike the median or bilateral filter, it smooths across edges as well.

Bilateral Filter:
- Combines spatial and intensity information to smooth images while preserving edges.
- Uses two Gaussian distributions, one for spatial proximity and one for intensity similarity.

Non-local Means Filter:


- Computes the weighted average of pixel values based on similarity in a larger non-local neighborhood.
- Effective for denoising while preserving fine structures.

Anisotropic Diffusion:
- Reduces noise while preserving edges by iteratively diffusing intensity values along edges.
- Particularly useful for images with strong edges.

Morphological Operators:
- Dilation: Expands bright regions by taking the maximum pixel value in a neighborhood.
- Erosion: Contracts bright regions by taking the minimum pixel value in a neighborhood.
- Used for operations like noise reduction, object segmentation, and shape analysis.

Laplacian of Gaussian (LoG):


- Applies a Gaussian smoothing followed by the Laplacian operator.
- Useful for edge detection.

Canny Edge Detector:


- Combines Gaussian smoothing, gradient computation, non-maximum suppression, and edge tracking by
hysteresis.
- Widely used for edge detection in computer vision applications.

Homomorphic Filtering:
- Adjusts image intensity by separating the image into illumination and reflectance components.
- Useful for enhancing images with non-uniform illumination.

Adaptive Histogram Equalization:


- Improves contrast by adjusting the histogram of pixel intensities based on local neighborhoods.
- Effective for enhancing images with varying illumination.
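
Three of the non-linear operators above (median filtering, dilation, erosion) share the same structure: gather the window around each pixel and reduce it with a median, maximum, or minimum. The sketch below assumes NumPy 1.20 or later, a square window, and edge-replicated borders; the helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighborhoods(image, size=3):
    """Return an (H, W, size, size) view holding the window around every pixel."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    return sliding_window_view(padded, (size, size))

def median_filter(image, size=3):
    return np.median(neighborhoods(image, size), axis=(2, 3))

def dilate(image, size=3):
    return neighborhoods(image, size).max(axis=(2, 3))

def erode(image, size=3):
    return neighborhoods(image, size).min(axis=(2, 3))

# Flat gray image with a single "salt" pixel in the middle.
noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255

cleaned = median_filter(noisy)   # the outlier is replaced by the local median
grown = dilate(noisy)            # the bright pixel spreads to its 3x3 neighborhood
shrunk = erode(noisy)            # the bright pixel disappears
```

The median removes the isolated outlier completely, which a linear average of the same size would only dilute.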

These neighborhood operators play a crucial role in image enhancement, denoising, edge detection, and other
image processing tasks. The choice of operator depends on the specific characteristics of the image and the
desired outcome.

7. Fourier transforms:

Fourier transforms play a significant role in computer vision for analyzing and processing images. They are used
to decompose an image into its frequency components, providing valuable information for tasks such as image
filtering, feature extraction, and pattern recognition. Here are some ways Fourier transforms are employed in
computer vision:

Frequency Analysis:
- Fourier transforms help in understanding the frequency content of an image. High-frequency components
correspond to edges and fine details, while low-frequency components represent smooth regions.

Image Filtering:
- Filtering in the frequency domain allows for efficient operations such as blurring or sharpening. Low-pass filters
remove high-frequency noise, while high-pass filters enhance edges and fine details.

Image Enhancement:
- Adjusting the amplitude of specific frequency components can enhance or suppress certain features in an
image. This is commonly used in image enhancement techniques.

Texture Analysis:
- Fourier analysis is useful in characterizing and classifying textures based on their frequency characteristics. It
helps distinguish between textures with different patterns.

Pattern Recognition:
- Fourier descriptors, which capture shape information, are used for representing and recognizing objects in
images. They provide a compact representation of shape by capturing the dominant frequency components.

Image Compression:
- Transform-based image compression, such as JPEG compression, uses the discrete cosine transform (DCT), a close
relative of the Fourier transform, to convert image data into the frequency domain. This allows for efficient
quantization and coding of frequency components.

Image Registration:
- Fourier transforms are used in image registration, aligning images or transforming them to a common
coordinate system. Cross-correlation in the frequency domain is often employed for this purpose.

Optical Character Recognition (OCR):


- Fourier descriptors are used in OCR systems for character recognition. They help in capturing the shape
information of characters, making the recognition process more robust.

Homomorphic Filtering:
- Homomorphic filtering, which takes the logarithm of the image and then filters it in the frequency domain using
Fourier transforms, is used in applications such as document analysis and enhancement.


Image Reconstruction:
- Fourier transforms are involved in techniques like computed tomography (CT) or magnetic resonance imaging
(MRI) for reconstructing images from their projections.

The efficient computation of Fourier transforms, particularly through the use of the Fast Fourier Transform (FFT)
algorithm, has made these techniques computationally feasible for real-time applications in computer vision. The ability to
analyze images in the frequency domain provides valuable insights and contributes to the development of advanced image
processing techniques.
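
Frequency-domain filtering as described above can be sketched with NumPy's FFT routines. The example uses an ideal (hard cut-off) low-pass filter purely for illustration; in practice a smoother filter is usually preferred because a hard cut-off can cause ringing. The function name ideal_lowpass and the test signal are assumptions made for this sketch.

```python
import numpy as np

def ideal_lowpass(image, cutoff):
    """Zero every frequency component farther than `cutoff` (cycles per image) from DC."""
    rows, cols = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))    # move DC to the centre
    v = np.arange(rows) - rows // 2
    u = np.arange(cols) - cols // 2
    dist = np.hypot(*np.meshgrid(u, v))               # distance from DC, shape (rows, cols)
    spectrum[dist > cutoff] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

# Test image: a slow horizontal wave (2 cycles) plus a fast one (12 cycles).
x = np.arange(32)
low = np.cos(2 * np.pi * 2 * x / 32)
high = 0.5 * np.cos(2 * np.pi * 12 * x / 32)
image = np.tile(low + high, (32, 1))

smoothed = ideal_lowpass(image, cutoff=5)   # keeps the slow wave, removes the fast one
```

Because the two waves sit at clearly separated frequencies, the filter recovers the slow component almost exactly.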

Pyramids and Wavelets:

Pyramids and wavelets are both techniques used in image processing for multi-resolution analysis, allowing the
representation of an image at different scales. They are valuable for tasks such as image compression, feature extraction,
and image analysis.

Image Pyramids:
Image pyramids are a series of images representing the same scene but at different resolutions. There are two main types of
image pyramids:

Gaussian Pyramid:
- Created by repeatedly applying Gaussian smoothing and downsampling to an image.
- At each level, the image is smoothed to remove high-frequency information, and then it is subsampled to reduce its size.
- Useful for tasks like image blending, image matching, and coarse-to-fine image processing.

Laplacian Pyramid:
- Derived from the Gaussian pyramid.
- Each level of the Laplacian pyramid is obtained by subtracting the expanded version of the next coarser Gaussian level
from the Gaussian level at the current resolution.
- Useful for image compression and coding, where the Laplacian pyramid represents the residual information not captured
by the Gaussian pyramid.

Image pyramids are especially useful for creating multi-scale representations of images, which can be beneficial for various
computer vision tasks.
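
The construction of both pyramids can be sketched as follows. This is a simplified illustration: it assumes a 5-tap binomial kernel as the Gaussian approximation, reflected borders, and nearest-neighbour expansion followed by a blur; all function names are hypothetical.

```python
import numpy as np

KERNEL_1D = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0   # binomial approximation of a Gaussian

def blur(image):
    """Separable 5-tap blur with reflected borders."""
    h, w = image.shape
    padded = np.pad(image, 2, mode="reflect")
    rows = sum(wt * padded[:, k:k + w] for k, wt in enumerate(KERNEL_1D))
    return sum(wt * rows[k:k + h, :] for k, wt in enumerate(KERNEL_1D))

def downsample(image):
    """Smooth, then keep every second row and column."""
    return blur(image)[::2, ::2]

def upsample(image, shape):
    """Expand to `shape` by pixel repetition, then smooth."""
    expanded = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return blur(expanded[:shape[0], :shape[1]])

def gaussian_pyramid(image, levels):
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def laplacian_pyramid(image, levels):
    gauss = gaussian_pyramid(image, levels)
    laplace = [g - upsample(g_next, g.shape) for g, g_next in zip(gauss, gauss[1:])]
    laplace.append(gauss[-1])             # coarsest level keeps the low-pass residual
    return laplace

def reconstruct(laplace):
    image = laplace[-1]
    for detail in reversed(laplace[:-1]):
        image = detail + upsample(image, detail.shape)
    return image

image = np.random.default_rng(0).random((16, 16))
gauss = gaussian_pyramid(image, 3)
restored = reconstruct(laplacian_pyramid(image, 3))
```

Because each Laplacian level stores exactly what the expansion step fails to predict, adding the levels back from coarse to fine restores the original image.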

Wavelets:
Wavelets are mathematical functions that can be used to analyze signals and images. Wavelet transforms provide a multi-
resolution analysis by decomposing an image into approximation (low-frequency) and detail (high-frequency) components.
Key concepts include:

Wavelet Transform:
- The wavelet transform decomposes an image into different frequency components by convolving the image with wavelet
functions.
- The result is a set of coefficients that represent the image at various scales and orientations.

Multi-resolution Analysis:
- Wavelet transforms offer a multi-resolution analysis, allowing the representation of an image at different scales.
- The approximation coefficients capture the low-frequency information, while detail coefficients capture high-frequency
information.


Haar Wavelet:
- The Haar wavelet is a simple wavelet function used in basic wavelet transforms.
- It represents changes in intensity between adjacent pixels.

Wavelet Compression:
- Wavelet-based image compression techniques, such as JPEG2000, utilize wavelet transforms to efficiently represent
image data in both spatial and frequency domains.

Image Denoising:
- Wavelet-based thresholding techniques can be applied to denoise images by thresholding the wavelet coefficients.

Edge Detection:
- Wavelet transforms can be used for edge detection by analyzing the high-frequency components of the image.

Both pyramids and wavelets offer advantages in multi-resolution analysis, but they differ in terms of their representation
and construction. Pyramids use a hierarchical structure of smoothed and subsampled images, while wavelets use a
transform-based approach that decomposes the image into frequency components. The choice between pyramids and
wavelets often depends on the specific requirements of the image processing task at hand.
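
One level of a 2D Haar decomposition can be sketched with pairwise averages and differences. The sketch uses the unnormalized (averaging) form of the Haar transform and assumes an image with even side lengths; the function names haar2d and inverse_haar2d are hypothetical.

```python
import numpy as np

def haar2d(image):
    """One level of the 2D Haar transform (image sides must be even)."""
    a = image.astype(float)
    # Along each row: pairwise averages (low-pass) and differences (high-pass).
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # Repeat along each column for both halves.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0    # approximation
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0    # detail: changes between rows
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0    # detail: changes between columns
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0    # detail: diagonal changes
    return ll, lh, hl, hh

def inverse_haar2d(ll, lh, hl, hh):
    """Undo haar2d exactly."""
    lo = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2, :], lo[1::2, :] = ll + lh, ll - lh
    hi[0::2, :], hi[1::2, :] = hl + hh, hl - hh
    out = np.empty((lo.shape[0], lo.shape[1] * 2))
    out[:, 0::2], out[:, 1::2] = lo + hi, lo - hi
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar2d(image)
restored = inverse_haar2d(ll, lh, hl, hh)
```

Wavelet denoising and compression work on the detail bands: small coefficients are set to zero or coarsely quantized before the inverse transform is applied.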

8. Geometric transformations:
Geometric transformations are operations that modify the spatial configuration of objects in a digital image. These
transformations are applied to change the position, orientation, scale, or shape of objects while preserving certain geometric
properties. Geometric transformations are commonly used in computer graphics, computer vision, and image processing.
Here are some fundamental geometric transformations:


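1. Translation:
● Description: Moves an object by a specified distance along the x and/or y axes.
● Transformation Matrix (2D), in homogeneous coordinates:

$$T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

● Applications: Object movement, image registration.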

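2. Rotation:
● Description: Rotates an object by a specified angle about a fixed point.
● Transformation Matrix (2D), for a counterclockwise rotation by $\theta$ about the origin:

$$R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

● Applications: Image rotation, orientation adjustment.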


3. Scaling:
● Description: Changes the size of an object by multiplying its coordinates by
scaling factors.
● Transformation Matrix (2D):

$$S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

● Applications: Zooming in/out, resizing.

4. Shearing:
● Description: Distorts the shape of an object by varying its coordinates linearly.
● Transformation Matrix (2D):

$$H = \begin{bmatrix} 1 & sh_x & 0 \\ sh_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

● Applications: Skewing, slanting.

5. Affine Transformation:
● Description: Combines translation, rotation, scaling, and shearing.
● Transformation Matrix (2D):

$$A = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

● Applications: Generalized transformations.

6. Perspective Transformation:
● Description: Represents a perspective projection, useful for simulating three-
dimensional effects.
● Transformation Matrix (3D):

● Applications: 3D rendering, simulation.

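7. Projective Transformation:
● Description: The general form of the perspective transformation: a linear map on homogeneous coordinates that preserves straight lines but not parallelism. In 2D it can be determined from four point correspondences (control points).
● Transformation Matrix (3D): More complex than the perspective transformation matrix.
● Applications: Computer graphics, augmented reality.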


These transformations are crucial for various applications, including image manipulation, computer-aided design (CAD),
computer vision, and graphics rendering. Understanding and applying geometric transformations are fundamental skills in
computer science and engineering fields related to digital image processing.
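
The 2D transformations described in this section can be composed by multiplying 3x3 matrices in homogeneous coordinates. The sketch below is one possible illustration; the function names are hypothetical, and rotation is taken as counterclockwise about the origin.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.diag([sx, sy, 1.0])

def shear(shx, shy):
    return np.array([[1, shx, 0], [shy, 1, 0], [0, 0, 1]], dtype=float)

def apply(matrix, points):
    """Transform an (N, 2) array of points; dividing by w also covers projective maps."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ matrix.T
    return mapped[:, :2] / mapped[:, 2:3]

# Rotate 90 degrees about the point (1, 1): move it to the origin, rotate, move back.
about_point = translation(1, 1) @ rotation(np.pi / 2) @ translation(-1, -1)
corners = np.array([[1, 1], [2, 1]])
moved = apply(about_point, corners)    # (1, 1) stays fixed, (2, 1) goes to (1, 2)
```

Matrices are applied right to left, so the order of multiplication matters: rotating and then translating generally gives a different result from translating and then rotating.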

9. Global optimization:

Global optimization is a branch of optimization that focuses on finding the global minimum or maximum of a
function over its entire feasible domain. Unlike local optimization, which aims to find the optimal solution
within a specific region, global optimization seeks the best possible solution across the entire search space.
Global optimization problems are often challenging due to the presence of multiple local optima or complex,
non-convex search spaces.
Here are key concepts and approaches related to global optimization:

Concepts:
Objective Function:
- The function to be minimized or maximized.

Feasible Domain:
- The set of input values (parameters) for which the objective function is defined.

Global Minimum/Maximum:
- The lowest or highest value of the objective function over the entire feasible domain.

Local Minimum/Maximum:
- A minimum or maximum within a specific region of the feasible domain.


Approaches:
Grid Search:
- Dividing the feasible domain into a grid and evaluating the objective function at each grid point to find the optimal
solution.

Random Search:
- Randomly sampling points in the feasible domain and evaluating the objective function to explore different regions.

Evolutionary Algorithms:
- Genetic algorithms, particle swarm optimization, and other evolutionary techniques use populations of solutions and
genetic operators to iteratively evolve toward the optimal solution.

Simulated Annealing:
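- Inspired by the annealing process in metallurgy, simulated annealing occasionally accepts worse solutions with a
probability controlled by a temperature parameter. The temperature is decreased gradually, so the algorithm can
escape local optima early on and settles into a good solution later.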

Ant Colony Optimization:


- Inspired by the foraging behavior of ants, this algorithm uses pheromone trails to guide the search for the optimal
solution.

Genetic Algorithms:
- Inspired by biological evolution, genetic algorithms use mutation, crossover, and selection to evolve a population of
potential solutions.

Particle Swarm Optimization:


- Simulates the social behavior of birds or fish, where a swarm of particles moves through the search space to find the
optimal solution.

Bayesian Optimization:
- Utilizes probabilistic models to model the objective function and guide the search toward promising regions.

Quasi-Newton Methods:
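- Iterative optimization methods (such as BFGS) that use an approximation of the Hessian matrix to converge efficiently. They are local optimizers, so for global optimization they are typically restarted from many different starting points.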

Global optimization is applied in various fields, including engineering design, machine learning,
finance, and parameter tuning in algorithmic optimization. The choice of a specific global
optimization method depends on the characteristics of the objective function, the dimensionality
of the search space, and the available computational resources.
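
Two of the approaches above, grid search and simulated annealing, can be sketched on a one-dimensional test function. Everything specific here is an assumption made for illustration: the objective function, the search interval, the step size, and the cooling schedule are hypothetical choices, not recommended settings.

```python
import math
import random

def objective(x):
    """Hypothetical test function with several local minima."""
    return x * x + 10.0 * math.sin(3.0 * x)

def grid_search(f, lo, hi, n_points):
    """Evaluate f on an evenly spaced grid and return the best point found."""
    step = (hi - lo) / (n_points - 1)
    candidates = [lo + i * step for i in range(n_points)]
    best_x = min(candidates, key=f)
    return best_x, f(best_x)

def simulated_annealing(f, x0, lo, hi, steps=5000, t0=5.0, cooling=0.999, seed=0):
    """Random local moves; worse moves are accepted with probability exp(-delta / t)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        candidate = min(hi, max(lo, x + rng.gauss(0.0, 0.5)))
        fc = f(candidate)
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                      # gradually lower the temperature
    return best_x, best_f

grid_x, grid_f = grid_search(objective, -5.0, 5.0, 1001)
sa_x, sa_f = simulated_annealing(objective, x0=4.0, lo=-5.0, hi=5.0)
```

Grid search is exhaustive but scales poorly with the number of dimensions, whereas simulated annealing samples the space stochastically and offers no guarantee of reaching the global optimum in a finite number of steps.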


UNIT II
FEATURE DETECTION, MATCHING AND SEGMENTATION
Points and patches - Edges - Lines - Segmentation - Active contours - Split and merge - Mean
shift and mode finding - Normalized cuts - Graph cuts and energy-based methods.

1. Points and Patches:

Points:

Definition: Points in the context of computer vision typically refer to specific locations
or coordinates within an image.

Usage: Points are often used as key interest points or landmarks. These can be locations
with unique features, such as corners, edges, or distinctive textures.

Applications: Points are crucial in various computer vision tasks, including
feature matching, image registration, and object tracking. Algorithms often
detect and use points as reference locations for comparing and analyzing images.

Patches:

Definition: Patches are small, localized regions or segments within an image.

Usage: In computer vision, patches are often extracted from images to
focus on specific areas of interest. These areas can be defined by points or other criteria.

Applications: Patches are commonly used in feature extraction and
representation. Instead of analyzing entire images, algorithms may work
with patches to capture detailed information about textures, patterns, or
structures within the image. Patches are also utilized in tasks like image classification
and object recognition.


While "points" usually refer to specific coordinates or locations within an image, "patches"
are small, localized regions or segments extracted from images. Both
concepts are fundamental in various computer vision applications, providing essential
information for tasks such as image analysis, recognition, and understanding. Points
and patches play a crucial role in the extraction of meaningful features that contribute to the overall
interpretation of visual data by computer vision systems.
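
The relationship between points and patches can be sketched as follows: a patch is cut out around a point and then compared with other patches. The comparison measure used here, the sum of squared differences (SSD), is one simple choice assumed for the example; the function names are hypothetical.

```python
import numpy as np

def extract_patch(image, point, size=5):
    """Return the size x size patch centred on point = (row, col), or None near the border."""
    half = size // 2
    r, c = point
    if r < half or c < half or r + half >= image.shape[0] or c + half >= image.shape[1]:
        return None
    return image[r - half:r + half + 1, c - half:c + half + 1]

def ssd(patch_a, patch_b):
    """Sum of squared differences: 0 for identical patches, larger for dissimilar ones."""
    diff = patch_a.astype(float) - patch_b.astype(float)
    return float(np.sum(diff * diff))

image = np.arange(100).reshape(10, 10)
centre = extract_patch(image, (5, 5), size=3)
neighbour = extract_patch(image, (5, 6), size=3)
at_corner = extract_patch(image, (0, 0), size=3)    # too close to the border
score = ssd(centre, neighbour)
```

In feature matching, a patch around an interest point in one image is compared against candidate patches in another image, and the candidate with the lowest score is taken as the match.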

2. Edges

In image processing and computer vision, "edges" refer to significant changes in
intensity or color within an image. Edges often represent boundaries or transitions
between different objects or regions in an image. Detecting edges is a fundamental step in various
computer vision tasks, as edges contain important information about the
structure and content of an image. Here are
key points about edges:
Definition:


● Anedgeisasetofpixelswherethereisarapidtransitioninintensityor color. This


transition can occur between objects, textures, or other
featuresinanimage.
Importance:
● Edges are crucial for understanding the structure of an image. They
representboundariesbetweendifferentobjectsorregions,providing
valuableinformationforobjectrecognitionandsceneunderstanding.
EdgeDetection:
● Edgedetectionistheprocessofidentifyingandhighlightingedgeswithin
animage.Variousedgedetectionalgorithms,suchastheSobeloperator,
Cannyedgedetector,andLaplacianofGaussian(LoG),arecommonly used for this
purpose.
Applications:
● ObjectRecognition:Edgeshelpindefiningthecontoursandshapesof objects,
facilitating their recognition.
● ImageSegmentation:Edgesassistindividinganimageintomeaningful segments or
regions.
● FeatureExtraction:Edgesareimportantfeaturesthatcanbeextractedand used in
higher-level analysis.
● ImageCompression:Informationaboutedgescanbeusedtoreducethe amount of
data needed to represent an image.
Types of Edges:
● Step Edges: Sharp transitions in intensity.
● Ramp Edges: Gradual transitions in intensity.
● Roof Edges: Intensity rises and then falls again over a short distance, like two ramp edges
back to back.
Challenges:
● Edge detection may be sensitive to noise in the image, and selecting an appropriate edge
detection algorithm depends on the characteristics of the image and the specific application.
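
As an illustration of edge detection, the following is a minimal sketch of the Sobel operator
in plain Python. The function name sobel_magnitude and the toy image are assumptions made for
this example only; border pixels are simply left at zero, and a practical implementation would
normally use an image-processing library.

```python
# Sobel kernels that respond to horizontal (x) and vertical (y) intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Return the gradient magnitude for the interior pixels of a 2D list image."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    p = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * p
                    gy += SOBEL_Y[dr + 1][dc + 1] * p
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

# Toy image: dark left half, bright right half (a vertical step edge).
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
mag = sobel_magnitude(image)
# Columns of the middle row where the gradient magnitude is non-zero.
edge_cols = [c for c in range(6) if mag[2][c] > 0]
```

The response is non-zero only in the two columns on either side of the step, which is where a
threshold on the magnitude would mark the edge.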

3. Lines
In the context of image processing and computer vision, "lines" refer to straight or curved
segments within an image. Detecting and analyzing lines is a fundamental aspect of image
understanding and is important in various computer vision applications. Here are key points
about lines:

Definition:
● A line is a set of connected pixels with similar characteristics, typically representing a
continuous or approximate curve or straight segment within an image.
Line Detection:
● Line detection is the process of identifying and extracting lines from an image. The Hough
Transform is a popular technique used for line detection, especially for straight lines.

Types of Lines:
● Straight Lines: Linear segments with a constant slope.
● Curved Lines: Non-linear segments with varying curvature.
● Line Segments: Partial lines with a starting and ending point.
Applications:
● Object Detection: Lines can be important features in recognizing and understanding objects
within an image.
● Lane Detection: In the context of autonomous vehicles, detecting and tracking lanes on a
road.
● Document Analysis: Recognizing and extracting lines of text in document images.
● Industrial Inspection: Inspecting and analyzing patterns or structures in manufacturing
processes.
Representation:
● Lines can be represented using mathematical equations, such as the slope-intercept form
(y = mx + b) for straight lines.
Challenges:
● Line detection may be affected by noise in the image or variations in lighting conditions.
Robust algorithms are needed to handle these challenges.
Line Segmentation:
● Line segmentation involves dividing an image into segments based on the presence of lines.
This is useful in applications like document layout analysis and text extraction.
Hough Transform:
● The Hough Transform is a widely used technique for detecting lines in an image. It
represents lines in a parameter space and identifies peaks in this space as potential lines.
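
The voting idea behind the Hough Transform can be sketched as follows. This is a minimal
illustration, assuming the common (rho, theta) parameterization rho = x cos(theta) +
y sin(theta); the function name hough_lines, the one-degree angular resolution and the sample
points are choices made for this example, not part of the notes.

```python
import math

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Let every edge point vote for all (rho, theta) lines passing through it."""
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (round(rho / rho_step), t)   # quantized accumulator cell
            votes[key] = votes.get(key, 0) + 1
    return votes

# Ten edge points on the horizontal line y = 3, plus one outlier.
points = [(x, 3) for x in range(0, 100, 10)] + [(40, 70)]
votes = hough_lines(points)
# The accumulator peak corresponds to the most strongly supported line.
(rho_bin, theta_bin), count = max(votes.items(), key=lambda kv: kv[1])
```

The peak lies at theta = 90 degrees and rho = 3, which is the line y = 3; the outlier
contributes votes elsewhere and does not disturb the peak.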

In summary, lines are important features in images and play a crucial role in computer vision
applications. Detecting and understanding lines contributes to tasks such as object
recognition, image segmentation, and analysis of structural patterns. The choice of line
detection method depends on the specific characteristics of the image and the goals of the
computer vision application.

4. Segmentation
Image segmentation is a computer vision task that involves partitioning an image into
meaningful and semantically coherent regions or segments. The goal is to group together pixels
or regions that share similar visual characteristics, such as color, texture, or intensity.
Image segmentation is a crucial step in various computer vision applications as it provides a
more detailed and meaningful understanding of the content within an image. Here are key points
about image segmentation:

Definition:
● Image segmentation is the process of dividing an image into distinct and meaningful
segments. Each segment typically corresponds to a region or object in the image.
Purpose:
● Segmentation is used to simplify the representation of an image, making it easier to analyze
and understand. It helps in identifying and delineating different objects or regions within
the image.
Types of Segmentation:
● Semantic Segmentation: Assigning a specific class label to each pixel in the image,
resulting in a detailed understanding of the object categories present.
● Instance Segmentation: Identifying and delineating individual instances of objects within
the image. Each instance is assigned a unique label.
● Boundary or Edge-based Segmentation: Detecting edges or boundaries between different regions
in the image.
● Region-based Segmentation: Grouping pixels into homogeneous regions based on similarity
criteria.
Algorithms:
● Various algorithms are used for image segmentation, including region-growing methods,
clustering algorithms (e.g., K-means), watershed algorithms, and deep learning-based
approaches using convolutional neural networks (CNNs).
Applications:
● Object Recognition: Segmentation helps in isolating and recognizing individual objects
within an image.
● Medical Imaging: Identifying and segmenting structures or anomalies in medical images.
● Autonomous Vehicles: Segmenting the environment to detect and understand objects on the
road.
● Satellite Image Analysis: Partitioning satellite images into meaningful regions for land
cover classification.
● Robotics: Enabling robots to understand and interact with their environment by segmenting
objects and obstacles.
Challenges:
● Image segmentation can be challenging due to variations in lighting, complex object shapes,
occlusions, and the presence of noise in the image.
Evaluation Metrics:
● Common metrics for evaluating segmentation algorithms include Intersection over Union (IoU),
Dice coefficient, and Pixel Accuracy.
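
The three metrics named above can be computed from pixel counts. The sketch below assumes
binary masks stored as lists of 0/1 rows and a non-empty union of the two masks; the function
name segmentation_scores and the toy masks are illustrative only.

```python
def segmentation_scores(pred, truth):
    """Compute IoU, Dice coefficient, and pixel accuracy for two binary masks."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1          # predicted foreground, truly foreground
            elif p:
                fp += 1          # predicted foreground, truly background
            elif t:
                fn += 1          # missed foreground
            else:
                tn += 1          # correctly predicted background
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return iou, dice, accuracy

pred = [[1, 1, 0, 0],
        [1, 1, 0, 0]]
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
iou, dice, accuracy = segmentation_scores(pred, truth)
```

For these masks the overlap is 2 pixels and the union is 6 pixels, so IoU is 1/3 while Dice
and pixel accuracy are both 0.5. Dice is always at least as large as IoU for the same masks.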

In summary, image segmentation is a fundamental task in computer vision that involves dividing
an image into meaningful segments to facilitate more advanced analysis and understanding. The
choice of segmentation method depends on the specific characteristics of the images and the
requirements of the application.

5. Active Contours

Active contours, also known as snakes, are a concept in computer vision and image processing
that refers to deformable models used for image segmentation. The idea behind active contours
is to evolve a curve or contour within an image in a way that captures the boundaries of
objects or regions of interest. These curves deform under the influence of internal forces
(encouraging smoothness) and external forces (attracted to features in the image).

Key features of active contours include:

Initialization:
● Active contours are typically initialized near the boundaries of the objects to be
segmented. The initial contour can be a closed curve or an open curve depending on the
application.
Energy Minimization:
● The evolution of the active contour is guided by an energy function that combines internal
and external forces. The goal is to minimize this energy to achieve an optimal contour that
fits the boundaries of the object.
Internal Forces:
● Internal forces are associated with the deformation of the contour itself. They include
terms that encourage smoothness and continuity of the curve. The internal energy helps
prevent the contour from oscillating or exhibiting unnecessary deformations.
External Forces:
● External forces are derived from the image data and drive the contour toward the boundaries
of objects. These forces are attracted to features such as edges, intensity changes, or
texture gradients in the image.
Snakes Algorithm:
● The snakes algorithm is a well-known method for active contour modeling. It was introduced
by Michael Kass, Andrew Witkin, and Demetri Terzopoulos in 1987. The algorithm involves
iterative optimization of the energy function to deform the contour.
Applications:
● Active contours are used in various image segmentation applications, such as medical image
analysis, object tracking, and computer vision tasks where precise delineation of object
boundaries is required.
Challenges:
● Active contours may face challenges in the presence of noise, weak edges, or complex object
structures. Careful parameter tuning and initialization are often required.
Variations:
● There are variations of active contours, including geodesic active contours and level-set
methods, which offer different formulations for contour evolution and segmentation.
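
To make the internal and external energy terms concrete, the sketch below evaluates a
discretized snake energy for a closed contour. It only evaluates the energy and does not
perform the iterative minimization. The weights alpha and beta, the function names, and the
synthetic edge_strength function (which peaks on a circle of radius 10) are assumptions for
illustration.

```python
import math

def internal_energy(contour, alpha=1.0, beta=1.0):
    """Smoothness energy of a closed contour given as a list of (x, y) points."""
    n = len(contour)
    total = 0.0
    for i in range(n):
        (xp, yp), (x, y), (xn, yn) = contour[i - 1], contour[i], contour[(i + 1) % n]
        stretch = (xn - x) ** 2 + (yn - y) ** 2                    # first difference
        bend = (xn - 2 * x + xp) ** 2 + (yn - 2 * y + yp) ** 2     # second difference
        total += alpha * stretch + beta * bend
    return total

def external_energy(contour, edge_strength):
    """Negative edge strength summed along the contour (lower means closer to edges)."""
    return -sum(edge_strength(x, y) for x, y in contour)

def edge_strength(x, y):
    """Hypothetical image term: strongest on a circular object boundary of radius 10."""
    return math.exp(-(math.hypot(x, y) - 10.0) ** 2)

circle = [(10 * math.cos(2 * math.pi * k / 16), 10 * math.sin(2 * math.pi * k / 16))
          for k in range(16)]
# The same contour with alternate points pushed outward and pulled inward.
jagged = [(x * (1.3 if k % 2 else 0.7), y * (1.3 if k % 2 else 0.7))
          for k, (x, y) in enumerate(circle)]

smooth_total = internal_energy(circle) + external_energy(circle, edge_strength)
jagged_total = internal_energy(jagged) + external_energy(jagged, edge_strength)
```

The smooth contour lying on the boundary has lower internal and lower external energy than
the jagged one, which is why energy minimization pulls the snake toward it.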

Active contours provide a flexible framework for interactive and semi-automatic segmentation
by allowing users to guide the evolution of the contour. While they have been widely used, the
choice of segmentation method depends on the specific characteristics of the images and the
requirements of the application.

6. Split and Merge
Split and Merge is a recursive image segmentation algorithm that divides an image into
homogeneous regions based on certain criteria. The primary idea behind the algorithm is to
recursively split an image into smaller blocks until certain conditions are met, and then
merge those blocks if they are sufficiently homogeneous. This process continues iteratively
until the desired level of segmentation is achieved.

Here is an overview of the Split and Merge algorithm:
Splitting Phase:
● The algorithm starts with the entire image as a single block.
● It evaluates a splitting criterion to determine if the block is sufficiently homogeneous or
should be split further.
● If the splitting criterion is met, the block is divided into four equal sub-blocks
(quadrants), and the process is applied recursively to each sub-block.
Merging Phase:
● Once the recursive splitting reaches a certain level or the splitting criterion is no
longer satisfied, the merging phase begins.
● Adjacent blocks are examined to check if they are homogeneous enough to be merged.
● If the merging criterion is satisfied, neighboring blocks are merged into a larger block.
● The merging process continues until no further merging is possible, and the segmentation is
complete.
Homogeneity Criteria:
● The homogeneity of a block or region is determined based on certain criteria, such as color
similarity, intensity, or texture. For example, blocks may be considered homogeneous if the
variance of pixel values within the block is below a certain threshold.
Recursive Process:
● The splitting and merging phases are applied recursively, leading to a hierarchical
segmentation of the image.
Applications:
● Split and Merge can be used for image segmentation in various applications, including object
recognition, scene analysis, and computer vision tasks where delineation of regions is
essential.
Challenges:
● The performance of Split and Merge can be affected by factors such as noise, uneven
lighting, or the presence of complex structures in the image.
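
The splitting phase can be sketched as a recursive quadtree decomposition. The example below
covers only the splitting phase and uses the variance threshold mentioned above as the
homogeneity criterion; it assumes a square image whose side is a power of two, and the names
split, variance and threshold are illustrative. The merging phase is omitted.

```python
def variance(block):
    """Variance of all pixel values in a 2D list block."""
    vals = [v for row in block for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def split(image, r0, c0, size, threshold, min_size=1):
    """Recursively split a square region; return leaf blocks as (row, col, size)."""
    block = [row[c0:c0 + size] for row in image[r0:r0 + size]]
    if size <= min_size or variance(block) <= threshold:
        return [(r0, c0, size)]              # homogeneous enough: stop splitting
    half = size // 2
    leaves = []
    for dr in (0, half):                     # visit the four quadrants
        for dc in (0, half):
            leaves += split(image, r0 + dr, c0 + dc, half, threshold, min_size)
    return leaves

image = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 9, 9],
    [1, 1, 9, 1],
]
leaves = split(image, 0, 0, 4, threshold=0.5)
```

Three quadrants are uniform and stay as 2x2 blocks, while the mixed bottom-right quadrant is
split down to single pixels. A merging pass would then join the three 9-valued pixels into one
region and the remaining 1-valued blocks into another.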

The Split and Merge algorithm provides a way to divide an image into regions of homogeneous
content, creating a hierarchical structure. While it has been used historically, more recent
image segmentation methods often involve advanced techniques, such as machine learning-based
approaches (e.g., convolutional neural networks) or other region-growing algorithms. The
choice of segmentation method depends on the characteristics of the images and the specific
requirements of the application.

7. Mean Shift and Mode Finding
Mean Shift is a non-parametric clustering algorithm commonly used for image segmentation and
object tracking. The algorithm works by iteratively shifting a set of data points towards the
mode or peak of the data distribution. In the context of image processing, Mean Shift can be
applied to group pixels with similar characteristics into coherent segments.
Here's a brief overview of the Mean Shift algorithm:
Kernel Density Estimation:
● The algorithm begins by estimating the probability density function (PDF) of the input data
points. This is typically done using a kernel function, such as a Gaussian kernel.
Initialization:
● Each data point is considered as a candidate cluster center.
Mean Shift Iterations:
● For each data point, a mean shift vector is computed. The mean shift vector points towards
the mode or peak of the underlying data distribution.
● Data points are iteratively shifted in the direction of the mean shift vector until
convergence.
Convergence Criteria:
● The algorithm converges when the mean shift vectors become very small or when the points
reach local modes in the data distribution.
Cluster Assignment:
● After convergence, data points that converge to the same mode are assigned to the same
cluster.
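
The steps above can be sketched for one-dimensional data such as pixel intensities. This is a
minimal illustration with a Gaussian kernel; the bandwidth, tolerance, merge distance and
function names are assumptions chosen for the example rather than recommended settings.

```python
import math

def mean_shift(points, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Shift each 1D point to a density mode using a Gaussian kernel."""
    modes = []
    for x in points:
        for _ in range(max_iter):
            weights = [math.exp(-((x - p) / bandwidth) ** 2 / 2) for p in points]
            new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            if abs(new_x - x) < tol:         # mean shift vector is negligible
                break
            x = new_x
        modes.append(x)
    return modes

def assign_clusters(modes, merge_dist=0.5):
    """Give the same label to points whose modes lie within merge_dist of each other."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) < merge_dist:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

intensities = [10.0, 10.5, 11.0, 50.0, 50.5, 51.0]
labels = assign_clusters(mean_shift(intensities))
```

The dark and bright intensities converge to two separate modes, so they receive two different
labels without the number of clusters being specified in advance.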

Mean Shift has been successfully applied to image segmentation, where it effectively groups
pixels with similar color or intensity values into coherent segments.

Now, let's talk about mode finding:

In statistics and data analysis, a "mode" refers to the value or values that appear most
frequently in a dataset. Mode finding, in the context of Mean Shift or other clustering
algorithms, involves identifying the modes or peaks in the data distribution.

For Mean Shift:

● Mode Finding in Mean Shift:
● The mean shift process involves iteratively shifting towards the modes of the underlying
data distribution.
● Each cluster is associated with a mode, and the mean shift vectors guide the data points
toward these modes during the iterations.

Mean Shift is an algorithm that performs mode finding to identify clusters in a dataset. In
image processing, it is often used for segmentation by iteratively shifting towards modes in
the color or intensity distribution, effectively grouping pixels into coherent segments.

8. Normalized Cuts
Normalized Cuts is a graph-based image segmentation algorithm that seeks to divide an image
into meaningful segments by considering both the similarity between pixels and the
dissimilarity between different segments. It was introduced by Jianbo Shi and Jitendra Malik
in 2000 and has been widely used in computer vision and image processing.

Here's a high-level overview of the Normalized Cuts algorithm:
Graph Representation:
● The image is represented as an undirected graph, where each pixel is a node in the graph,
and edges represent relationships between pixels. Edges are weighted based on the similarity
between pixel values.
Affinity Matrix:
● An affinity matrix is constructed to capture the similarity between pixels. The entries of
this matrix represent the weights of edges in the graph, and the values are determined by a
similarity metric, such as color similarity or texture similarity.
Segmentation Objective:
● The goal is to partition the graph into two or more segments in a way that minimizes the
similarity between segments and maximizes the similarity within segments.
Normalized Cuts Criterion:
● The algorithm formulates the segmentation problem using a normalized cut criterion. For a
partition into segments A and B, it is the total weight of the edges cut between A and B,
normalized by each segment's total connection weight to all nodes V in the graph:
Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V).
● Optimization techniques are applied to find the partition that minimizes this criterion.
Eigenvalue Problem:
● Minimizing the criterion exactly is intractable, so the problem is relaxed into a
generalized eigenvalue problem derived from the affinity matrix W and its diagonal degree
matrix D: (D - W) y = λ D y. The eigenvector with the second-smallest eigenvalue provides the
information needed to split the graph into two segments.
Recursive Approach:
● To achieve multi-segmentation, the algorithm employs a recursive approach. After the initial
segmentation, each segment is further divided into sub-segments by applying the same procedure
recursively.
Advantages:
● Normalized Cuts is capable of capturing both spatial and color information in the
segmentation process.
● It avoids the bias towards small, compact segments, making it suitable for segmenting images
with non-uniform structures.
Challenges:
● The computational complexity of solving the eigenvalue problem can be a limitation,
particularly for large images.
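
To illustrate what the normalized cut criterion measures, the sketch below evaluates
Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V) on a tiny four-node graph and
finds the best two-way split by brute force. The brute-force search stands in for the
eigenvector-based solution only because the graph is tiny; the affinity values and the
function name ncut are made up for this example.

```python
from itertools import combinations

def ncut(W, A):
    """Normalized cut value for separating node set A from the remaining nodes."""
    n = len(W)
    B = [j for j in range(n) if j not in A]
    cut = sum(W[i][j] for i in A for j in B)               # weight removed by the cut
    assoc_a = sum(W[i][j] for i in A for j in range(n))    # A's links to all nodes
    assoc_b = sum(W[i][j] for i in B for j in range(n))    # B's links to all nodes
    return cut / assoc_a + cut / assoc_b

# Symmetric affinity matrix: pixels 0 and 1 are similar, pixels 2 and 3 are similar.
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
# Enumerate every bipartition once by always placing node 0 in segment A.
candidates = [{0, *c} for k in range(3) for c in combinations([1, 2, 3], k)]
best = min(candidates, key=lambda A: ncut(W, A))
```

The minimum is reached by grouping nodes 0 and 1 against nodes 2 and 3. Cutting off a single
node would remove less total weight in some graphs, but the normalization penalizes such
unbalanced splits.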

Normalized Cuts has been widely used in image segmentation tasks, especially when capturing
global structures and relationships between pixels is essential. It has applications in
computer vision, medical image analysis, and other areas where precise segmentation is
crucial.

9. Graph Cuts and Energy-Based Methods
Graph cuts and energy-based methods are widely used in computer vision and image processing
for solving optimization problems related to image segmentation. These methods often leverage
graph representations of images and use energy functions to model the desired properties of
the segmentation.

Graph Cuts:
Graph cuts involve partitioning a graph into two disjoint sets such that the cut cost (the sum
of weights of edges crossing the cut) is minimized. In image segmentation, pixels are
represented as nodes, and edges are weighted so that cutting between similar pixels is
expensive and cutting between dissimilar pixels is cheap.

Graph Representation:
● Each pixel is a node, and edges connect adjacent pixels. The weights of edges are high for
similar pixels and low for dissimilar ones (e.g., in color or intensity).
Energy Minimization:
● The problem is formulated as an energy minimization task, where the energy function includes
terms encouraging similarity within segments and dissimilarity between segments.
Binary Graph Cut:
● In the simplest case, the goal is to partition the graph into two sets (foreground and
background) by finding the cut with the minimum energy.
Multiclass Graph Cut:
● The approach can be extended to handle multiple classes or segments by using move-making
techniques such as alpha-expansion and alpha-beta swap, which solve a sequence of binary
graph cuts.
Applications:
● Graph cuts are used in image segmentation, object recognition, stereo vision, and other
computer vision tasks.

Energy-Based Methods:
Energy-based methods involve formulating an energy function that measures the quality of a
particular configuration or assignment of labels to pixels. The optimization process aims to
find the label assignment that minimizes the energy.

Energy Function:
● The energy function is defined based on factors such as data terms (measuring agreement with
observed data) and smoothness terms (encouraging spatial coherence).
Unary and Pairwise Terms:
● Unary terms are associated with individual pixels and capture the likelihood of a pixel
belonging to a particular class. Pairwise terms model relationships between neighboring
pixels and enforce smoothness.
Markov Random Fields (MRFs) and Conditional Random Fields (CRFs):
● MRFs and CRFs are common frameworks for modeling energy-based methods. MRFs model the joint
distribution of labels and observations through local interactions, while CRFs model the
distribution of labels conditioned directly on the observed image.
Iterative Optimization:
● Optimization techniques like belief propagation or graph cuts are often used iteratively to
find the label assignment that minimizes the energy.
Applications:
● Energy-based methods are applied in image segmentation, image denoising, image restoration,
and various other vision tasks.
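
The role of unary and pairwise terms can be seen on a one-dimensional row of pixels. The
sketch below assumes a squared-difference data term against two class means and a Potts
smoothness penalty, and it minimizes the energy by exhaustive search, which is feasible only
because the row has 8 pixels. Graph cuts or belief propagation would be used on real images.
The function name energy, the class means and the weight smooth are illustrative.

```python
from itertools import product

def energy(labels, observed, means=(0.0, 1.0), smooth=0.3):
    """Unary data term plus Potts pairwise smoothness term on a 1D row of pixels."""
    unary = sum((obs - means[lab]) ** 2 for obs, lab in zip(observed, labels))
    pairwise = sum(smooth for a, b in zip(labels, labels[1:]) if a != b)
    return unary + pairwise

observed = [0.1, 0.2, 0.6, 0.1, 0.9, 0.8, 0.4, 0.9]    # noisy two-level signal
# Exhaustive search over all 2^8 binary labelings.
best = min(product((0, 1), repeat=len(observed)), key=lambda lab: energy(lab, observed))
# Labeling each pixel independently, ignoring the pairwise term.
pixelwise = tuple(int(o > 0.5) for o in observed)
```

Thresholding each pixel on its own produces isolated label flips at the two noisy pixels,
whereas the minimum-energy labeling is a clean split into a dark half and a bright half.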

Both graph cuts and energy-based methods provide powerful tools for image segmentation by
incorporating information about pixel relationships and modeling the desired properties of
segmented regions. The choice between them often depends on the specific characteristics of
the problem at hand.

UNIT III
FEATURE-BASED ALIGNMENT & MOTION ESTIMATION
2D and 3D feature-based alignment - Pose estimation - Geometric intrinsic calibration -
Triangulation - Two-frame structure from motion - Factorization - Bundle adjustment -
Constrained structure and motion - Translational alignment - Parametric motion - Spline-based
motion - Optical flow - Layered motion.

1. 2D and 3D feature-based alignment:

2D and 3D Feature-Based Alignment

Feature-based alignment is a technique used in computer vision and image processing to align
or match corresponding features in different images or scenes. The alignment can be performed
in either 2D or 3D space, depending on the nature of the data.

2D Feature-Based Alignment:
● Definition: In 2D feature-based alignment, the goal is to align and match features in two or
more 2D images.
● Features: Features can include points, corners, edges, or other distinctive patterns.
● Applications: Commonly used in image stitching, panorama creation, object recognition, and
image registration.
3D Feature-Based Alignment:
● Definition: In 3D feature-based alignment, the goal is to align and match features in
three-dimensional space, typically in the context of 3D reconstruction or scene
understanding.
● Features: Features can include keypoints, landmarks, or other distinctive 3D points.
● Applications: Used in 3D reconstruction, simultaneous localization and mapping (SLAM),
object recognition in 3D scenes, and augmented reality.
Techniques for 2D and 3D Feature-Based Alignment:
● Correspondence Matching: Identifying corresponding features in different images or 3D point
clouds.
● RANSAC (Random Sample Consensus): Robust estimation technique to find the best-fitting model
despite the presence of outliers.
● Transformation Models: Applying transformation models (affine, homography for 2D; rigid
body, affine for 3D) to align features.
● Iterative Optimization: Refining the alignment through iterative optimization methods such
as Levenberg-Marquardt.
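
The RANSAC idea can be sketched with the simplest transformation model, a pure 2D
translation, for which a single match is enough to propose a hypothesis. The function name
ransac_translation, the inlier threshold, the iteration count and the sample matches are
assumptions for illustration; affine or homography models follow the same pattern with larger
minimal samples.

```python
import random

def ransac_translation(src, dst, threshold=1.0, iterations=50, seed=0):
    """Estimate a 2D translation with dst close to src + t, tolerating outlier matches."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        i = rng.randrange(len(src))                  # one match fixes a translation
        tx, ty = dst[i][0] - src[i][0], dst[i][1] - src[i][1]
        inliers = [k for k, ((x, y), (u, v)) in enumerate(zip(src, dst))
                   if (x + tx - u) ** 2 + (y + ty - v) ** 2 <= threshold ** 2]
        if len(inliers) > len(best_inliers):         # keep the largest consensus set
            best_inliers = inliers
    # Refit on all inliers: the least-squares translation is the mean offset.
    n = len(best_inliers)
    tx = sum(dst[k][0] - src[k][0] for k in best_inliers) / n
    ty = sum(dst[k][1] - src[k][1] for k in best_inliers) / n
    return (tx, ty), best_inliers

src = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (3, 8)]
dst = [(4, -2), (14, -2), (4, 8), (14, 8), (9, 3), (40, 40)]   # last match is an outlier
t, inliers = ransac_translation(src, dst)
```

The wrong match at index 5 is excluded from the consensus set, so the final translation is
computed from the five consistent matches only.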
Challenges:
● Noise and Outliers: Real-world data often contains noise and outliers, requiring robust
techniques for feature matching.
● Scale and Viewpoint Changes: Features may undergo changes in scale or viewpoint, requiring
methods that are invariant to such variations.
Applications:
● Image Stitching: Aligning and stitching together multiple images to create panoramic views.
● Robotics and SLAM: Aligning consecutive frames in the context of robotic navigation and
simultaneous localization and mapping.
● Medical Imaging: Aligning 2D slices or 3D volumes for accurate medical image analysis.
Evaluation:
● Accuracy and Robustness: The accuracy and robustness of feature-based alignment methods are
crucial for their successful application in various domains.

Feature-based alignment is a fundamental task in computer vision, enabling the integration of
information from multiple views or modalities for improved analysis and understanding of the
visual world.

2. Pose estimation:

Pose estimation is a computer vision task that involves determining the position and
orientation of an object or camera relative to a coordinate system. It is a crucial aspect of
understanding the spatial relationships between objects in a scene. Pose estimation can be
applied to both 2D and 3D scenarios, and it finds applications in various fields, including
robotics, augmented reality, autonomous vehicles, and human-computer interaction.

2D Pose Estimation:
● Definition: In 2D pose estimation, the goal is to estimate the position (translation) and
orientation (rotation) of an object in a two-dimensional image.
● Methods: Techniques include keypoint-based approaches, where distinctive points (such as
corners or joints) are detected and used to estimate pose, for example by fitting a rotation
and translation to the matched keypoints.
3D Pose Estimation:
● Definition: In 3D pose estimation, the goal is to estimate the position and orientation of
an object in three-dimensional space.
● Methods: Often involves associating 2D keypoints with corresponding 3D points.
Perspective-n-Point (PnP) algorithms recover the pose from such 2D-3D correspondences, and
there are other methods like Iterative Closest Point (ICP) for aligning a 3D model with a
point cloud.
Applications:
● Robotics: Pose estimation is crucial for robotic systems to navigate and interact with the
environment.
● Augmented Reality: Enables the alignment of virtual objects with the real-world environment.
● Autonomous Vehicles: Used for understanding the position and orientation of the vehicle in
its surroundings.
● Human Pose Estimation: Estimating the pose of a person, often used in applications like
gesture recognition and action recognition.
Camera Pose Estimation:
● Definition: Estimating the pose of a camera, which involves determining its position and
orientation in the scene.
● Methods: Camera pose can be estimated using visual odometry, SLAM (Simultaneous Localization
and Mapping), or using known reference points in the environment.
Challenges:
● Ambiguity: Limited information or similar appearance of different poses can introduce
ambiguity.
● Occlusion: Partially or fully occluded objects can make pose estimation challenging.
● Real-time Requirements: Many applications, especially in robotics and augmented reality,
require real-time pose estimation.
Evaluation Metrics:
● Common metrics include translation and rotation errors, which measure the accuracy of the
estimated pose compared to ground truth.
Deep Learning Approaches:
● Recent advances in deep learning have led to the development of neural network-based methods
for pose estimation, leveraging architectures like convolutional neural networks (CNNs) for
feature extraction.
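
For the 2D case, pose estimation from matched keypoints and the two error metrics mentioned
above can be sketched as follows. The example uses a closed-form least-squares fit of a
rotation angle and translation and assumes noise-free, already matched points. The function
names, the model shape and the ground-truth pose are made up for this illustration.

```python
import math

def estimate_pose_2d(model, observed):
    """Least-squares rotation angle (radians) and translation mapping model to observed."""
    n = len(model)
    mx = sum(p[0] for p in model) / n
    my = sum(p[1] for p in model) / n
    ox = sum(p[0] for p in observed) / n
    oy = sum(p[1] for p in observed) / n
    s = c = 0.0
    for (x, y), (u, v) in zip(model, observed):
        x, y, u, v = x - mx, y - my, u - ox, v - oy    # center both point sets
        s += x * v - y * u                             # accumulates sin(theta) terms
        c += x * u + y * v                             # accumulates cos(theta) terms
    theta = math.atan2(s, c)
    tx = ox - (math.cos(theta) * mx - math.sin(theta) * my)
    ty = oy - (math.sin(theta) * mx + math.cos(theta) * my)
    return theta, (tx, ty)

def apply_pose(points, theta, t):
    """Rotate points by theta and translate them by t."""
    ct, st = math.cos(theta), math.sin(theta)
    return [(ct * x - st * y + t[0], st * x + ct * y + t[1]) for x, y in points]

model = [(0, 0), (2, 0), (2, 1), (0, 1)]               # keypoints on the object
true_theta, true_t = math.radians(30), (5.0, -3.0)     # ground-truth pose
observed = apply_pose(model, true_theta, true_t)       # keypoints seen in the image

theta, t = estimate_pose_2d(model, observed)
rotation_error = abs(theta - true_theta)
translation_error = math.hypot(t[0] - true_t[0], t[1] - true_t[1])
```

With exact correspondences both errors are zero up to floating-point rounding. With noisy
keypoints the same fit returns the pose that minimizes the summed squared distances.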

Pose estimation is a fundamental task in computer vision with widespread applications. It
plays a crucial role in enabling machines to understand the spatial relationships between
objects and the environment.

3. Geometric intrinsic calibration:

Geometric intrinsic calibration is a process in computer vision and camera calibration that
involves determining the intrinsic parameters of a camera. Intrinsic parameters describe the
internal characteristics of a camera, such as its focal length, principal point, and lens
distortion coefficients. Accurate calibration is essential for applications like 3D
reconstruction, object tracking, and augmented reality, where knowing the intrinsic properties
of the camera is crucial for accurate scene interpretation.

Here are key points related to geometric intrinsic calibration:
Intrinsic Parameters:
● Focal Length (f): Represents the distance from the camera's optical center to the image
plane. It is a critical parameter for determining the scale of objects in the scene.
● Principal Point (c): Denotes the point where the optical axis meets the image plane, which
usually lies close to the image center. Its coordinates are measured as an offset from the
top-left corner of the image.
● Lens Distortion Coefficients: Describe imperfections in the lens, such as radial and
tangential distortions, that affect the mapping between 3D world points and 2D image points.
Camera Model:
● The camera model often used for intrinsic calibration is the pinhole camera model. This
model assumes that light enters the camera through a single point (pinhole) and projects onto
the image plane.
Calibration Patterns:
● Intrinsic calibration is typically performed using calibration patterns with known geometric
features, such as chessboard patterns. These patterns allow for the extraction of
corresponding points in both 3D world coordinates and 2D image coordinates.
Calibration Process:
● Image Capture: Multiple images of the calibration pattern are captured from different
viewpoints.
● Feature Extraction: Detected features (corners, intersections) in the calibration pattern
are identified in both image and world coordinates.
● Parameter Estimation: The intrinsic parameters are estimated using mathematical optimization
techniques, such as nonlinear least squares optimization.
● Evaluation: The accuracy of calibration is often assessed by reprojecting 3D points onto the
images and comparing with the detected 2D points.
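Radial and Tangential Distortions:
● Radial Distortion: Deviation from a perfect pinhole camera model in which image points are
displaced along the radius from the image center, by an amount that grows with distance from
the center (for example, barrel or pincushion distortion). Corrected using radial distortion
coefficients.
● Tangential Distortion: Caused by the lens not being perfectly parallel to the image plane.
Corrected using tangential distortion coefficients.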
Multiple Views:
● Calibration is often performed using multiple views to improve accuracy and handle lens
distortions more effectively.
Applications:
● Intrinsic calibration is essential for various computer vision applications, including 3D
reconstruction, camera pose estimation, and stereo vision.
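
How the intrinsic parameters enter the projection can be sketched with the pinhole model and
a two-coefficient radial distortion term. The parameter names fx, fy, cx, cy, k1 and k2 follow
a common convention but are assumptions here, as are the numeric values. Tangential distortion
is left out to keep the example short.

```python
def project(point, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3D point in the camera frame to pixel coordinates."""
    X, Y, Z = point
    x, y = X / Z, Y / Z                        # normalized image coordinates
    r2 = x * x + y * y                         # squared distance from the optical axis
    d = 1.0 + k1 * r2 + k2 * r2 * r2           # radial distortion factor
    return fx * x * d + cx, fy * y * d + cy    # scale by focal length, shift by principal point

point = (0.5, -0.25, 2.0)
ideal = project(point, fx=800, fy=800, cx=320, cy=240)
distorted = project(point, fx=800, fy=800, cx=320, cy=240, k1=-0.2)

# Reprojection error between an ideal and a distorted observation, in pixels.
error = ((ideal[0] - distorted[0]) ** 2 + (ideal[1] - distorted[1]) ** 2) ** 0.5
```

Calibration runs this model in reverse: it searches for the parameter values that make the
projected pattern corners agree with the detected ones, that is, it minimizes the reprojection
error over all images.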

Accurate geometric intrinsic calibration is a critical step in ensuring that the camera model
accurately represents the mapping between the 3D world and the 2D image, facilitating precise
computer vision tasks.

4. Triangulation:

Triangulation, in the context of computer vision and 3D computer graphics, is a technique used
to determine the 3D coordinates of a point in space by computing its position relative to
multiple camera viewpoints. The process involves finding the intersection point of lines or
rays originating from corresponding 2D image points in different camera views.

Here are key points related to triangulation:

Basic Concept:
● Triangulation is based on the principle of finding the 3D location of a point in space by
measuring its projection onto two or more image planes.
Camera Setup:
● Triangulation requires at least two cameras (stereo vision) or more to capture the same
scene from different viewpoints. Each camera provides a 2D projection of the 3D point.
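Mathematical Representation:
● Each camera i, described by a 3x4 projection matrix P_i, maps the 3D point X to its image
point x_i in homogeneous coordinates: x_i ~ P_i X (equality up to scale). Triangulation solves
these equations for X given the x_i and P_i.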
Epipolar Geometry:
● Epipolar geometry is utilized to relate the 2D projections of a point in different camera
views. It defines the geometric relationship between the two camera views and helps establish
correspondences between points.
Triangulation Methods:
● Direct Linear Transform (DLT): An algorithmic approach that involves solving a system of
linear equations to find the 3D coordinates.
● Iterative Methods: Algorithms like the Gauss-Newton algorithm or the Levenberg-Marquardt
algorithm can be used for refining the initial estimate obtained through DLT.
Accuracy and Precision:
● The accuracy of triangulation is influenced by factors such as the calibration accuracy of
the cameras, the quality of feature matching, and the level of noise in the image data.
Bundle Adjustment:
● Triangulation is often used in conjunction with bundle adjustment, a technique that
optimizes the parameters of the cameras and the 3D points simultaneously to minimize the
reprojection error.
Applications:
● 3D Reconstruction: Triangulation is fundamental to creating 3D models of scenes or objects
from multiple camera views.
● Structure from Motion (SfM): Used in SfM pipelines to estimate the 3D structure of a scene
from a sequence of images.
● Stereo Vision: Essential for depth estimation in stereo vision systems.
Challenges:
● Ambiguity: Ambiguities may arise when triangulating points from two views if the views are
not well-separated or if the point is near the baseline connecting the cameras.
● Noise and Errors: Triangulation results can be sensitive to noise and errors in feature
matching and camera calibration.
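
The DLT method listed above can be sketched with NumPy for the two-view case. Each image
point contributes two linear equations in the homogeneous 3D point, and the solution is the
null-space direction found by SVD. The two projection matrices describe hypothetical
normalized cameras (identity intrinsics) and are assumptions for this example, as is the
function name triangulate_dlt.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices."""
    A = np.array([
        x1[0] * P1[2] - P1[0],     # two equations from the first view
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],     # two equations from the second view
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                     # right singular vector of the smallest singular value
    return X[:3] / X[3]            # convert from homogeneous coordinates

# First camera at the origin, second camera shifted by one unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]    # projection into view 1
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]    # projection into view 2
X_est = triangulate_dlt(P1, P2, x1, x2)
```

With noise-free projections the original point is recovered. With noisy measurements the DLT
result is usually taken as the starting point for the iterative refinement mentioned above.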

Triangulation is a core technique in computer vision that enables the reconstruction of 3D
geometry from multiple 2D images. It plays a crucial role in applications such as 3D modeling,
augmented reality, and structure-from-motion pipelines.

5. Two-frame structure from motion:

Two-Frame Structure from Motion

Structure from Motion (SfM) is a computer vision technique that aims to reconstruct the
three-dimensional structure of a scene from a sequence of two-dimensional images. Two-frame
Structure from Motion specifically refers to the reconstruction of scene geometry using
information from only two images (frames) taken from different viewpoints. This process
involves estimating both the 3D structure of the scene and the camera motion between the two
frames.

Here are key points related to Two-Frame Structure from Motion:

Basic Concept:
● Two-frame Structure from Motion reconstructs the 3D structure of a scene by analyzing the
information from just two images taken from different perspectives.
Correspondence Matching:
● Establishing correspondences between points or features in the two images is a crucial
step. This is often done by identifying key features (such as keypoints) in both images and
finding their correspondences.
Epipolar Geometry:
● Epipolar geometry describes the relationship between corresponding points in two images
taken by different cameras. It helps constrain the possible 3D structures and camera motions.
Essential Matrix:
● The essential matrix is a 3x3 matrix in epipolar geometry that encapsulates the relative
pose (rotation and translation direction) of two calibrated cameras. It is the calibrated
counterpart of the fundamental matrix.
Camera Pose Estimation:
● The camera poses (positions and orientations) are estimated for both frames. This involves
solving for the rotation and translation between the two camera viewpoints.
Triangulation:
● Triangulation is applied to find the 3D coordinates of points in the scene. By knowing the
camera poses and corresponding points, the depth of scene points can be estimated.
Bundle Adjustment:
● Bundle adjustment is often used to refine the estimates of camera poses and 3D points. It
is an optimization process that minimizes the error between observed and predicted image
points.
Depth Ambiguity:
● Two-frame SfM is susceptible to depth ambiguity, meaning that the reconstructed scene could
be scaled or mirrored without affecting the projections onto the images.
Applications:
● Robotics: Two-frame SfM is used in robotics for environment mapping and navigation.
● Augmented Reality: Reconstruction of the 3D structure for overlaying virtual objects onto
the real-world scene.
● Computer Vision Research: Studying the principles of SfM and epipolar geometry.
Challenges:
● Noise and Outliers: The accuracy of the reconstruction can be affected by noise and outliers
in the correspondence matching process.
● Limited Baseline: With only two frames, the baseline (distance between camera viewpoints)
may be limited, leading to potential depth ambiguities.
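
The epipolar constraint that ties these steps together can be checked numerically. The sketch
below builds an essential matrix as E = [t]x R from an assumed relative pose and verifies that
a pair of corresponding normalized image points satisfies x2^T E x1 = 0. The rotation,
translation and 3D point are made-up values, and the small matrix helpers are written out only
to keep the example free of dependencies.

```python
import math

def matvec(A, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) applied to v equals t x v."""
    return [[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]]

# Hypothetical relative pose: rotate 10 degrees about the y axis, then translate.
a = math.radians(10)
R = [[math.cos(a), 0, math.sin(a)], [0, 1, 0], [-math.sin(a), 0, math.cos(a)]]
t = [1.0, 0.0, 0.2]
E = matmul(skew(t), R)                       # essential matrix

X1 = [0.3, -0.4, 5.0]                        # a 3D point in the first camera's frame
X2 = [r + ti for r, ti in zip(matvec(R, X1), t)]    # the same point in the second frame
x1 = [c / X1[2] for c in X1]                 # normalized homogeneous image points
x2 = [c / X2[2] for c in X2]

residual = sum(p * q for p, q in zip(x2, matvec(E, x1)))   # x2^T E x1
```

The residual is zero up to rounding. Scaling t by any non-zero factor scales E but leaves the
constraint satisfied, which is one way to see why two-frame reconstruction cannot recover the
absolute scale of the scene.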

Two-frame Structure from Motion is a fundamental concept in computer vision, providing a
foundation for understanding 3D scene structure from a pair of images. It is often extended to
multi-frame SfM for more robust reconstructions in scenarios where more images are available.

6. Factorization:

Factorization in the context of computer vision typically refers to the factorization of
matrices or tensors representing data in various computer vision tasks. One common application
is in the field of structure from motion (SfM) and multiple-view geometry. Here are key points
related to factorization in computer vision:

MatrixFactorizationinSfM:
● ProblemStatement:Instructurefrommotion,thegoalistoreconstruct the 3D
structure of a scene from a sequence of 2D images taken from different
viewpoints.
● MatrixRepresentation: The correspondencematrix, alsoknown as the
measurementmatrix,isconstructedbystackingtheimagecoordinatesof
corresponding points from multiple views.
● Matrix Factorization: Factorizing the correspondence matrix into two
matricesrepresentingcameraparametersand3Dstructureisacommon approach.
This factorization is often achieved through techniques like
SingularValueDecomposition(SVD).
SingularValueDecomposition(SVD):
● Application:SVDisfrequentlyusedinmatrixfactorizationproblemsin computer
vision.

Applications:
● Structure from Motion (SfM): Factorization is used to recover camera poses and
3D scene structure from 2D image correspondences.
● Background Subtraction: Matrix factorization techniques are employed in
background subtraction methods for video analysis.
● Face Recognition: Eigenface and Fisherface methods involve factorizing
covariance matrices for facial feature representation.
Non-Negative Matrix Factorization (NMF):
● Application: NMF is a variant of matrix factorization where the factors are
constrained to be non-negative.
● Use Cases: It is applied in areas such as topic modeling, image
segmentation, and feature extraction.
Tensor Factorization:
● Extension to Higher Dimensions: In some cases, data is represented as tensors,
and factorization techniques are extended to tensors for applications like
multi-way data analysis.
● Example: Canonical Polyadic Decomposition (CPD) is a tensor
factorization technique.
Robust Factorization:
● Challenges: Noise and outliers in the data can affect the accuracy of factorization.
● Robust Methods: Robust factorization techniques are designed to handle noisy data
and outliers, providing more reliable results.
Deep Learning Approaches:
● Autoencoders and Neural Networks: Deep learning models, including
autoencoders, can be considered as a form of nonlinear factorization.
Factorization Machine (FM):
● Application: Factorization Machines are used in collaborative filtering and
recommendation systems to model interactions between features.

Factorization plays a crucial role in various computer vision and machine learning tasks,
providing a mathematical framework for extracting meaningful representations from
data and solving complex problems like 3D reconstruction and dimensionality reduction.
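
The SfM factorization described in this section can be written compactly. The sketch below assumes the classic orthographic setting (often associated with the Tomasi-Kanade method, which the notes do not name explicitly), with F frames and P tracked points; the symbols W, M, S, and A are notation introduced here for illustration.

```latex
\begin{align}
% Registered measurement matrix: image coordinates with per-frame centroids subtracted
\tilde{W} &= M S, \qquad
  \tilde{W} \in \mathbb{R}^{2F \times P},\;
  M \in \mathbb{R}^{2F \times 3},\;
  S \in \mathbb{R}^{3 \times P},\;
  \operatorname{rank}(\tilde{W}) \le 3 \\
% Rank-3 truncation of the SVD gives one valid pair of factors
\tilde{W} &= U \Sigma V^{\top}
  \;\Rightarrow\;
  \hat{M} = U_3 \Sigma_3^{1/2}, \qquad
  \hat{S} = \Sigma_3^{1/2} V_3^{\top} \\
% The factorization is only defined up to an invertible 3x3 matrix A
\tilde{W} &= (\hat{M} A)(A^{-1} \hat{S})
\end{align}
```

Here M stacks the camera (motion) rows and S holds the 3D points. The remaining ambiguity A is usually resolved by imposing orthonormality constraints on the rows of the motion matrix.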

7. Bundle adjustment:

Bundle Adjustment is a crucial optimization technique in computer vision and
photogrammetry. It is used to refine the parameters of a 3D scene, such as camera
poses and 3D points, by minimizing the reprojection error between the observed image
points and their corresponding projections from the 3D scene. Bundle Adjustment is
commonly employed in the context of structure from motion (SfM), simultaneous
localization and mapping (SLAM), and 3D reconstruction.

Here are key points related to Bundle Adjustment:
Optimization Objective:
● Minimization of Reprojection Error: Bundle Adjustment aims to find the
optimal set of parameters (camera poses, 3D points) that minimizes the
difference between the observed 2D image points and their projections onto the
image planes based on the estimated 3D scene.
Parameters to Optimize:
● Camera Parameters: Intrinsic parameters (focal length, principal point) and
extrinsic parameters (camera poses - rotation and translation).
● 3D Scene Structure: Coordinates of 3D points in the scene.
Reprojection Error:
● Definition: The reprojection error is the difference between the observed
2D image points and the projections of the corresponding 3D points onto the
image planes.
● Sum of Squared Differences: The objective is to minimize the sum of squared
differences between observed and projected points.

Bundle Adjustment Process:
● Initialization: Start with initial estimates of camera poses and 3D points.
● Objective Function: Define an objective function that measures the
reprojection error.
● Optimization: Use optimization algorithms (such as Levenberg-Marquardt,
Gauss-Newton, or others) to iteratively refine the parameters, minimizing the
reprojection error.
Sparse and Dense Bundle Adjustment:
● Sparse BA: Considers a subset of 3D points and image points, making it
computationally more efficient.
● Dense BA: Involves all 3D points and image points, providing higher accuracy
but requiring more computational resources.
Sequential and Global Bundle Adjustment:
● Sequential BA: Optimizes camera poses and 3D points sequentially, typically
in a sliding window fashion.
● Global BA: Optimizes all camera poses and 3D points simultaneously.
Provides a more accurate solution but is computationally more demanding.
Applications:
● Structure from Motion (SfM): Refines the reconstruction of 3D scenes from a
sequence of images.
● Simultaneous Localization and Mapping (SLAM): Improves the accuracy of camera
pose estimation and map reconstruction in real-time
environments.
● 3D Reconstruction: Enhances the accuracy of reconstructed 3D models from
images.
Challenges:
● Local Minima: The optimization problem may have multiple local minima, making it
essential to use robust optimization methods.
● Outliers and Noise: Bundle Adjustment needs to be robust to outliers and noise in the
input data.
Integration with Other Techniques:
● Feature Matching: Often used in conjunction with feature matching
techniques to establish correspondences between 2D and 3D points.
● Camera Calibration: Bundle Adjustment may be preceded by or integrated with
camera calibration to refine intrinsic parameters.

Bundle Adjustment is a fundamental optimization technique that significantly improves
the accuracy of 3D reconstructions and camera pose estimations in computer vision
applications. It has become a cornerstone in many systems dealing with 3D scene
understanding and reconstruction.
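
The objective that Bundle Adjustment minimizes can be sketched in a few lines. The example below assumes a simple pinhole camera with a single focal length f and principal point (cx, cy), and it only evaluates the sum of squared reprojection errors; it does not perform the Levenberg-Marquardt or Gauss-Newton update itself. All names and the data layout are illustrative assumptions.

```python
def project(point, R, t, f, cx, cy):
    """Project a 3D world point with a pinhole camera (rotation R, translation t)."""
    xc = [sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3)]
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)

def reprojection_error(cameras, points, observations):
    """Sum of squared differences between observed and projected image points.

    cameras:      list of (R, t, f, cx, cy) tuples
    points:       list of 3D points (x, y, z)
    observations: list of (camera_index, point_index, (u, v)) tuples
    """
    total = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        R, t, f, cx, cy = cameras[cam_idx]
        pu, pv = project(points[pt_idx], R, t, f, cx, cy)
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total

# Hypothetical one-camera, one-point scene.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(identity, [0.0, 0.0, 0.0], 100.0, 0.0, 0.0)]
points = [(1.0, 2.0, 10.0)]                       # projects to (10, 20)

error_exact = reprojection_error(cameras, points, [(0, 0, (10.0, 20.0))])
error_noisy = reprojection_error(cameras, points, [(0, 0, (11.0, 22.0))])
print(error_exact, error_noisy)
```

A real bundle adjuster would pass the individual residuals (rather than their sum) to a nonlinear least-squares solver and exploit the sparsity of the problem, since each observation depends on only one camera and one point.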

8. Constrained structure and motion:

Constrained Structure and Motion refers to a set of techniques and methods in
computer vision and photogrammetry that incorporate additional constraints into the
structure from motion (SfM) process. The goal is to improve the accuracy and reliability
of 3D reconstruction by imposing constraints on the estimated camera poses and 3D
scene points. These constraints may come from prior knowledge about the scene,
sensor characteristics, or additional information.

Here are key points related to Constrained Structure and Motion:
Introduction of Constraints:
● Prior Information: Constraints can be introduced based on prior
knowledge about the scene, such as known distances, planar structures, or
object shapes.

● Sensor Constraints: Information about the camera system, such as focal length or
aspect ratio, can be incorporated as constraints.
Types of Constraints:
● Geometric Constraints: Constraints that enforce geometric relationships, such as
parallel lines, perpendicularity, or known distances between
points.
● Semantic Constraints: Incorporating semantic information about the
scene, such as the knowledge that certain points belong to a specific object or
structure.
Bundle Adjustment with Constraints:
● Objective Function: The bundle adjustment problem is formulated with an
objective function that includes the reprojection error, as well as additional
terms representing the constraints.
● Optimization: Optimization techniques, such as Levenberg-Marquardt or
Gauss-Newton, are used to minimize the combined cost function.
Advantages:
● Improved Accuracy: Incorporating constraints can lead to more accurate
and reliable reconstructions, especially in scenarios with limited or noisy data.
● Handling Ambiguities: Constraints help in resolving ambiguities that may arise in
typical SfM scenarios.
Common Types of Constraints:
● Planar Constraints: Assuming that certain structures in the scene lie on planes,
which can be enforced during reconstruction.
● Scale Constraints: Fixing or constraining the scale of the scene to prevent scale
ambiguity in the reconstruction.
● Object Constraints: Incorporating constraints related to specific objects or
entities in the scene.
Applications:
● Architectural Reconstruction: Constraining the reconstruction based on known
architectural elements or planar surfaces.
● Robotics and Autonomous Systems: Utilizing constraints to enhance the accuracy of
pose estimation and mapping in robotic navigation.
● Augmented Reality: Incorporating semantic constraints for more accurate alignment
of virtual objects with the real world.
Challenges:
● Correctness of Constraints: The accuracy of the reconstruction depends on the
correctness of the imposed constraints.

● Computational Complexity: Some constraint types may increase the
computational complexity of the optimization problem.
Integration with Semantic Technologies:
● Semantic 3D Reconstruction: Integrating semantic information into the
reconstruction process to improve the understanding of the scene.

Constrained Structure and Motion provides a way to incorporate additional information
and domain knowledge into the reconstruction process, making it a valuable approach
for scenarios where such information is available and reliable. It contributes to more
accurate and meaningful 3D reconstructions in computer vision applications.
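
A combined cost of the kind described under "Bundle Adjustment with Constraints" might look like the following sketch. It assumes the constraint is a set of known distances between pairs of 3D points (a scale constraint) added as a weighted quadratic penalty; the function name, the weight parameter, and the data layout are hypothetical.

```python
import math

def constrained_cost(reprojection_residuals, points, distance_constraints, weight=1.0):
    """Reprojection cost plus a soft penalty for violated known distances.

    reprojection_residuals: flat list of image-space residuals (in pixels)
    points:                 list of estimated 3D points
    distance_constraints:   list of (i, j, known_distance) tuples
    weight:                 relative importance of the constraint term (assumed)
    """
    reprojection_term = sum(r * r for r in reprojection_residuals)
    penalty = 0.0
    for i, j, known in distance_constraints:
        estimated = math.dist(points[i], points[j])
        penalty += (estimated - known) ** 2
    return reprojection_term + weight * penalty

# Two estimated points are 5 units apart, but the scene is known to have them 4 apart.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
cost = constrained_cost([1.0, 2.0], points, [(0, 1, 4.0)], weight=10.0)
print(cost)
```

A soft penalty like this keeps the problem an ordinary least-squares one; hard constraints would instead require a constrained optimizer or a reparameterization of the unknowns.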

9. Translational alignment

Translational alignment, in the context of computer vision and image processing, refers
to the process of aligning two or more images based on translational transformations.
Translational alignment involves adjusting the position of images along the x and y axes
to bring corresponding features or points into alignment. This type of alignment is often
a fundamental step in various computer vision tasks, such as image registration,
panorama stitching, and motion correction.
Here are key points related to translational alignment:
Objective:
● The primary goal of translational alignment is to align images by minimizing
the translation difference between corresponding points or features in the
images.
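Translation Model:
● A pure translation maps each point (x, y) in one image to (x + tx, y + ty) in
the other, where tx and ty are the shifts along the x and y axes.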

Correspondence Matching:
● Correspondence matching involves identifying corresponding features or points in
the images that can be used as reference for alignment.
Common techniques include keypoint detection and matching.
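Alignment Process:
● The translational alignment process typically involves finding correspondences
between the images, estimating the translation that best maps one set of points
onto the other, and shifting one image by that translation.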

Applications:
● Image Stitching: In panorama creation, translational alignment is used to align
images before merging them into a seamless panorama.
● Motion Correction: In video processing, translational alignment corrects for
translational motion between consecutive frames.
● Registration in Medical Imaging: Aligning medical images acquired from different
modalities or at different time points.
Evaluation:
● The success of translational alignment is often evaluated by measuring
the accuracy of the alignment, typically in terms of the distance between
corresponding points before and after alignment.
Robustness:
● Translational alignment is relatively straightforward and computationally
efficient. However, it may be sensitive to noise and outliers, particularly in the
presence of large rotations or distortions.
Integration with Other Transformations:
● Translational alignment is frequently used as an initial step in more
complex alignment processes that involve additional transformations, such as
rotational alignment or affine transformations.
Automated Alignment:
● In many applications, algorithms for translational alignment are designed to
operate automatically without requiring manual intervention.

Translational alignment serves as a foundational step in various computer vision
applications, providing a simple and effective means to align images before further
processing or analysis.
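
Given matched points, the least-squares translation is simply the mean of the point differences. The sketch below assumes correspondences are already available as two equally long lists of (x, y) tuples; the function names are illustrative, and no outlier rejection is included.

```python
import math

def estimate_translation(src_pts, dst_pts):
    """Least-squares translation (tx, ty) that maps src_pts onto dst_pts."""
    n = len(src_pts)
    tx = sum(d[0] - s[0] for s, d in zip(src_pts, dst_pts)) / n
    ty = sum(d[1] - s[1] for s, d in zip(src_pts, dst_pts)) / n
    return tx, ty

def mean_alignment_error(src_pts, dst_pts, translation):
    """Mean distance between shifted source points and their targets."""
    tx, ty = translation
    distances = [math.hypot(s[0] + tx - d[0], s[1] + ty - d[1])
                 for s, d in zip(src_pts, dst_pts)]
    return sum(distances) / len(distances)

# Hypothetical matches: the second image is shifted by (3, -1).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
dst = [(3.0, -1.0), (4.0, -1.0), (3.0, 1.0)]

shift = estimate_translation(src, dst)
error_before = mean_alignment_error(src, dst, (0.0, 0.0))
error_after = mean_alignment_error(src, dst, shift)
print(shift, error_before, error_after)
```

Because the mean is sensitive to bad matches, practical pipelines often replace it with a median or wrap the estimate in a RANSAC loop.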

10. Parametric motion

Parametric motion refers to the modeling and representation of motion in computer
vision and computer graphics using parametric functions or models. Instead of directly
capturing the motion with a set of discrete frames, parametric motion models describe
how the motion evolves over time using a set of parameters. These models are often
employed in various applications, such as video analysis, animation, and tracking.
Here are key points related to parametric motion:
Parametric Functions:
● Parametric motion models use mathematical functions with parameters
to represent the motion of objects or scenes over time. These functions could be
simple mathematical equations or more complex models.
Types of Parametric Motion Models:
● Linear Models: Simplest form of parametric motion, where motion is
represented by linear equations. For example, linear interpolation between keyframes.
● Polynomial Models: Higher-order polynomial functions can be used to model
more complex motion. Cubic splines are commonly used for smooth motion
interpolation.
● Trigonometric Models: Sinusoidal functions can be employed to represent periodic
motion, such as oscillations or repetitive patterns.
● Exponential Models: Capture behaviors that exhibit exponential growth or decay,
suitable for certain types of motion.
Keyframe Animation:
● In parametric motion, keyframes are specified at certain points in time,
and the motion between keyframes is defined by the parametric motion
model. Interpolation is then used to generate frames between keyframes.
Control Points and Handles:
● Parametric models often involve control points and handles that influence the shape
and behavior of the motion curve. Adjusting these parameters allows for creative
control over the motion.

Applications:
● Computer Animation: Used for animating characters, objects, or camera movements
in 3D computer graphics and animation.
● Video Compression: Parametric motion models can be used to describe the motion
between video frames, facilitating efficient compression
techniques.
● Video Synthesis: Generating realistic videos or predicting future frames in a video
sequence based on learned parametric models.
● Motion Tracking: Tracking the movement of objects in a video by fitting parametric
motion models to observed trajectories.
Smoothness and Continuity:
● One advantage of parametric motion models is their ability to provide smooth and
continuous motion, especially when using interpolation techniques between
keyframes.
Constraints and Constraint-Based Motion:
● Parametric models can be extended to include constraints, ensuring that
the motion adheres to specific rules or conditions. For example, enforcing constant
velocity or maintaining specific orientations.
Machine Learning Integration:
● Parametric motion models can be learned from data using machine learning
techniques. Machine learning algorithms can learn the
parameters of the motion model from observed examples.
Challenges:
● Designing appropriate parametric models that accurately capture the
desired motion can be challenging, especially for complex or non-linear motions.
● Ensuring that the motion remains physically plausible and visually appealing
is crucial in animation and simulation.

Parametric motion provides a flexible framework for representing and controlling
motion in various visual computing applications. The choice of parametric model
depends on the specific characteristics of the motion to be represented and the desired
level of control and realism.
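
The simplest model mentioned in this section, linear interpolation between keyframes, can be sketched as follows. The keyframe format (a time-sorted list of (time, position) pairs) and the function name are assumptions made for illustration.

```python
def interpolate_keyframes(keyframes, t):
    """Linearly interpolate a position at time t.

    keyframes: list of (time, (x, y)) pairs sorted by time.
    Times outside the keyframe range are clamped to the first or last pose.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            alpha = (t - t0) / (t1 - t0)          # 0 at t0, 1 at t1
            return tuple((1 - alpha) * a + alpha * b for a, b in zip(p0, p1))

# Hypothetical motion path with three keyframes.
keyframes = [(0.0, (0.0, 0.0)), (2.0, (4.0, 2.0)), (3.0, (4.0, 5.0))]

position_a = interpolate_keyframes(keyframes, 1.0)   # halfway through segment 1
position_b = interpolate_keyframes(keyframes, 2.5)   # halfway through segment 2
print(position_a, position_b)
```

Linear interpolation is continuous in position but not in velocity, so the motion changes direction abruptly at each keyframe; the polynomial and spline models listed above address exactly that.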

11. Spline-based motion

Spline-based motion refers to the use of spline curves to model and interpolate motion
in computer graphics, computer-aided design, and animation. Splines are mathematical
curves that provide a smooth and flexible way to represent motion paths and
trajectories. They are widely used in 3D computer graphics and animation for creating
natural and visually pleasing motion, particularly in scenarios where continuous and
smooth paths are desired.
Here are key points related to spline-based motion:
Spline Definition:
● Spline Curve: A spline is a piecewise-defined polynomial curve. It consists
of several polynomial segments (typically low-degree) that are smoothly connected
at specific points called knots. The shape of the curve is steered by control points.
● Types of Splines: Common types of splines include B-splines, cubic splines, and
Bezier splines.
Spline Interpolation:
● Spline curves are often used to interpolate keyframes or control points in
animation. This means the curve passes through or follows the specified keyframes,
creating a smooth motion trajectory.
B-spline (Basis Spline):
● B-splines are widely used for spline-based motion. They are defined by a set of
control points, and their shape is influenced by a set of basis
functions.
● Local Control: Modifying the position of a control point affects only a local portion
of the curve, making B-splines versatile for animation.
Cubic Splines:
● Cubic splines are a specific type of spline where each polynomial segment is a cubic
(degree-3) polynomial.
● Natural Motion: Cubic splines are often used for creating natural motion paths due
to their smoothness and continuity.
Bezier Splines:
● Bezier splines are a type of spline that is defined by a set of control points. They
have intuitive control handles that influence the shape of the curve.
● Bezier Curves: Cubic Bezier curves, in particular, are frequently used for creating
motion paths in animation.
Spline Tangents and Curvature:
● Spline-based motion allows control over the tangents at control points,
influencing the direction of motion. Curvature continuity ensures smooth transitions
between segments.

Applications:
● Computer Animation: Spline-based motion is extensively used for
animating characters, camera movements, and objects in 3D scenes.
● Path Generation: Designing smooth and visually appealing paths for objects to
follow in simulations or virtual environments.
● Motion Graphics: Creating dynamic and aesthetically pleasing visual effects in
motion graphics projects.
Parametric Representation:
● Spline-based motion is parametric, meaning the position of a point on the spline is
determined by a parameter. This allows for easy manipulation
and control over the motion.
Interpolation Techniques:
● Keyframe Interpolation: Spline curves interpolate smoothly between
keyframes, providing fluid motion transitions.
● Hermite Interpolation: Splines can be constructed using Hermite
interpolation, where both position and tangent information at control points are
considered.
Challenges:
● Overfitting: In some cases, spline curves can be overly flexible and lead to
overfitting if not properly controlled.
● Control Point Placement: Choosing the right placement for control points is crucial
for achieving the desired motion characteristics.

Spline-based motion provides animators and designers with a versatile tool for creating
smooth and controlled motion paths in computer-generated imagery. The ability to
adjust the shape of the spline through control points and handles makes it a popular
choice for a wide range of animation and graphics applications.
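
As a concrete instance of the curves discussed above, the sketch below evaluates a cubic Bezier segment in Bernstein form. The four control points and the function name are illustrative; a full motion path would chain several such segments.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    p0 and p3 are the end points; p1 and p2 act as the control handles.
    """
    u = 1.0 - t
    # Bernstein basis polynomials of degree 3 (they always sum to 1).
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

# Hypothetical arch-shaped motion path in 2D.
p0, p1, p2, p3 = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)

start = cubic_bezier(p0, p1, p2, p3, 0.0)
middle = cubic_bezier(p0, p1, p2, p3, 0.5)
end = cubic_bezier(p0, p1, p2, p3, 1.0)
print(start, middle, end)
```

The curve passes through p0 and p3 but is only pulled toward p1 and p2, which is why Bezier handles feel intuitive to animators: moving a handle bends the path without moving its end points.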

12. Optical flow

Optical flow is a computer vision technique that involves estimating the motion of
objects or surfaces in a visual scene based on the observed changes in brightness or
intensity over time. It is a fundamental concept used in various applications, including
motion analysis, video processing, object tracking, and scene understanding.

Here are key points related to optical flow:

Motion Estimation:
● Objective: The primary goal of optical flow is to estimate the velocity
vector (optical flow vector) for each pixel in an image, indicating the apparent
motion of that pixel in the scene.
● Pixel-level Motion: Optical flow provides a dense representation of motion at the
pixel level.
Brightness Constancy Assumption:
● Assumption: Optical flow is based on the assumption of brightness
constancy, which states that the brightness of a point in the scene remains
constant over time.

Optical Flow Equation:
● Derivation: The optical flow equation is derived from the brightness
constancy assumption using partial derivatives with respect to spatial coordinates
and time.
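
The derivation referred to here is short enough to write out. The notation below is the standard one and is introduced as an assumption, since the notes do not fix symbols: I(x, y, t) is the image intensity, (u, v) is the flow vector, and I_x, I_y, I_t are the partial derivatives of I.

```latex
\begin{align}
% Brightness constancy: a moving point keeps its intensity
I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) &= I(x, y, t) \\
% First-order Taylor expansion of the left-hand side
I(x, y, t) + I_x\,u\,\delta t + I_y\,v\,\delta t + I_t\,\delta t &\approx I(x, y, t) \\
% Cancel I(x, y, t) and divide by \delta t: the optical flow constraint equation
I_x\,u + I_y\,v + I_t &= 0
\end{align}
```

This is one equation in two unknowns per pixel, so the flow cannot be recovered from a single pixel alone (the aperture problem). The methods listed below differ mainly in the extra assumption they add to resolve it.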

Dense and Sparse Optical Flow:
● Dense Optical Flow: Estimating optical flow for every pixel in the image, providing
a complete motion field.
● Sparse Optical Flow: Estimating optical flow only for selected keypoints or
features in the image.
Computational Methods:
● Correlation-based Methods: Match image patches or windows between consecutive
frames to estimate motion.
● Gradient-based Methods: Utilize image gradients to compute optical flow.

● Variational Methods: Formulate energy minimization problems to estimate optical
flow.
Lucas-Kanade Method:
● A well-known differential method for estimating optical flow, particularly suited for
small motion and local analysis.
Horn-Schunck Method:
● A variational method that minimizes a global energy function, taking into account
smoothness constraints in addition to brightness constancy.
Applications:
● Video Compression: Optical flow is used in video compression algorithms to predict
motion between frames.
● Object Tracking: Tracking moving objects in a video sequence.
● Robotics: Providing visual feedback for navigation and obstacle
avoidance.
● Augmented Reality: Aligning virtual objects with the real-world scene.
Challenges:
● Illumination Changes: Optical flow may be sensitive to changes in
illumination.
● Occlusions: Occlusions and complex motion patterns can pose challenges for
accurate optical flow estimation.
● Large Displacements: Traditional methods may struggle with handling large
displacements.
Deep Learning for Optical Flow:
● Recent advances in deep learning have led to the development of neural
network-based methods for optical flow estimation, such as FlowNet and PWC-Net.

Optical flow is a valuable tool for understanding and analyzing motion in visual data.
While traditional methods have been widely used, the integration of deep learning has
brought new perspectives and improved performance in optical flow estimation.
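
The Lucas-Kanade idea can be sketched for a single window. The example assumes the image gradients Ix, Iy and the temporal difference It have already been computed for every pixel in the window and are passed in as flat lists; it then solves the 2×2 least-squares system for one flow vector (u, v). The function name and the singularity threshold are illustrative choices.

```python
def lucas_kanade_window(ix, iy, it):
    """Estimate one flow vector (u, v) for a window of pixels.

    ix, iy, it: spatial and temporal derivatives, one value per pixel.
    Solves the normal equations of  Ix*u + Iy*v = -It  in the least-squares sense.
    """
    a11 = sum(gx * gx for gx in ix)
    a12 = sum(gx * gy for gx, gy in zip(ix, iy))
    a22 = sum(gy * gy for gy in iy)
    b1 = -sum(gx * gt for gx, gt in zip(ix, it))
    b2 = -sum(gy * gt for gy, gt in zip(iy, it))

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # Gradients all point the same way (or vanish): the aperture problem.
        raise ValueError("window has insufficient texture to determine the flow")
    u = (a22 * b1 - a12 * b2) / det
    v = (a11 * b2 - a12 * b1) / det
    return u, v

# Synthetic window: derivatives generated from a known flow of (0.5, -0.25).
ix = [1.0, 0.0, 2.0, 1.0]
iy = [0.0, 1.0, 1.0, -1.0]
true_u, true_v = 0.5, -0.25
it = [-(gx * true_u + gy * true_v) for gx, gy in zip(ix, iy)]

u, v = lucas_kanade_window(ix, iy, it)
print(u, v)
```

The determinant check reflects why sparse Lucas-Kanade trackers are usually run on corner-like keypoints: only windows with gradients in two directions give a well-conditioned system.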

13. Layered motion

Layered motion, in the context of computer vision and motion analysis, refers to the
representation and analysis of a scene where different objects or layers move
independently of each other. It assumes that the motion in a scene can be decomposed
into multiple layers, each associated with a distinct object or surface. Layered motion
models are employed to better capture complex scenes with multiple moving entities,
handling occlusions and interactions between objects.
Here are key points related to layered motion:
Layered Motion Models:
● Objective: The goal of layered motion models is to represent the motion of
distinct objects or surfaces in a scene independently, allowing for a more accurate
description of complex motion scenarios.
● Assumption: It assumes that the observed motion in a scene can be decomposed
into the motion of different layers.
Key Concepts:
● Independence: Layers are assumed to move independently of each other, simplifying
the modeling of complex scenes.
● Occlusions: Layered motion models can handle occlusions more
effectively, as each layer represents a separate entity in the scene.
Motion Layer Segmentation:
● Segmentation Process: The process of identifying and separating the
different motion layers in a video sequence is referred to as motion layer segmentation.
● Foreground and Background: Layers might represent the foreground and background
elements in a scene.
Challenges in Layered Motion:
● Interaction Handling: Representing the interaction between layers, such as
occlusions or overlapping motions.
● Dynamic Scene Changes: Adapting to changes in the scene, including the appearance
or disappearance of objects.
Optical Flow for Layered Motion:
● Optical flow techniques can be extended to estimate the motion of individual
layers in a scene.
● Layer-Specific Optical Flow: Applying optical flow independently to different
layers.
Multiple Object Tracking:
● Layered motion models are closely related to multiple object tracking, as each layer
can correspond to a tracked object.
Applications:
● Surveillance and Security: Tracking and analyzing the motion of multiple objects in
surveillance videos.

● Robotics: Layered motion models can aid robots in understanding and navigating
dynamic environments.
● Augmented Reality: Aligning virtual objects with the real-world scene by
understanding the layered motion.
Representation Formats:
● Layers can be represented in various formats, such as depth maps, segmentation
masks, or explicit motion models for each layer.
Integration with Scene Understanding:
● Layered motion models can be integrated with higher-level scene understanding.

UNIT IV
3D RECONSTRUCTION
Shape from X - Active range finding - Surface representations -
Point-based representations - Volumetric representations - Model-based reconstruction
- Recovering texture maps and albedos.

1. Shape from X:

"Shape from X" refers to a category of computer vision and computer graphics
techniques that aim to recover the three-dimensional (3D) shape or structure of objects
or scenes from different types of information or cues, represented by the variable "X".
The "X" can stand for various sources or modalities that provide information about the
scene. Some common examples include:

Shape from Shading (SfS): This technique involves recovering 3D shape
information from variations in brightness and shading in 2D images. It assumes
that the shading patterns in an image are influenced by the underlying 3D
geometry.

Shape from Stereo: This method utilizes the disparity or parallax
information between two or more images of a scene taken from different
viewpoints. By triangulating corresponding points, the 3D structure of the scene
can be reconstructed.

Shape from Motion (SfM): SfM aims to recover the 3D structure of a scene by
analyzing the motion of objects or the camera itself. This is often achieved by
tracking features across multiple frames of a video sequence.

Shape from Texture (SfT): SfT relies on the variations in texture patterns across
surfaces to infer their 3D structure. The assumption is that different surface
orientations result in distinct texture variations.

Shape from Focus (SfF): In SfF, the depth information is inferred from the
variation in image sharpness or focus. By analyzing the focus information at
different depths, the 3D shape can be estimated.

Shape from Defocus (SfD): Similar to SfF, SfD leverages the effects of defocusing
in images to estimate depth information. Objects at different distances from the
camera will exhibit different degrees of blur.

Shape from Light (SfL): This technique involves using information about the
lighting conditions in a scene to infer 3D shape. The interaction between light and
surfaces provides cues about the geometry.

These approaches demonstrate the diversity of methods used in computer vision to
recover 3D shape information from different types of visual cues. The choice of the
specific "X" (shading, stereo, motion, etc.) depends on the available data and the
characteristics of the scene being reconstructed.
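
For the stereo case, triangulation reduces to a one-line formula when the two cameras are rectified (parallel optical axes, purely horizontal offset). The sketch below assumes that setup, with the focal length in pixels and the baseline in metres; these parameter names are illustrative and not taken from the notes.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two camera centres in metres
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 800 px focal length, 0.5 m baseline.
near_depth = depth_from_disparity(800.0, 0.5, 40.0)
far_depth = depth_from_disparity(800.0, 0.5, 4.0)
print(near_depth, far_depth)
```

Depth is inversely proportional to disparity, so a fixed matching error of one pixel costs far more accuracy for distant points than for nearby ones.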

2. Active range finding:
Active range finding is a technique used in computer vision and remote sensing to
determine the distance to objects in a scene actively. Unlike passive methods that rely
on existing ambient illumination, active range finding involves emitting a signal or probe
towards the target and measuring the time it takes for the signal to return. This process
is often based on the principles of time-of-flight or phase-shift measurement. The goal
is to obtain accurate depth or distance information about the surfaces in the scene.

Here are a few common methods of active range finding:

Laser Range Finding: This method involves emitting laser beams towards the
target and measuring the time it takes for the laser pulses to travel to the object
and back. By knowing the speed of light, the distance to the object can be
calculated.

Structured Light: In structured light range finding, a known light pattern, often a
grid or a set of stripes, is projected onto the scene. Cameras capture the
deformed pattern on surfaces, and the distortion helps calculate depth
information based on the known geometry of the projected pattern.

Time-of-Flight (ToF) Cameras: ToF cameras emit modulated light signals (often
infrared) and measure the time it takes for the light to travel to the object and
return. The phase shift of the modulated signal is used to determine the distance
to the object.

Ultrasound Range Finding: Ultrasound waves are emitted, and the time it takes
for the waves to bounce back to a sensor is measured. This method is commonly
used in environments where optical methods may be less effective, such as in
low-light conditions.

Active range finding has various applications, including robotics, 3D scanning,
autonomous vehicles, augmented reality, and industrial inspection. The ability to actively
measure distances is valuable in scenarios where ambient lighting conditions may vary
or when accurate depth information is essential for understanding the environment.
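
The two measurement principles named in this section, round-trip time and phase shift, translate into short formulas. The sketch below assumes an ideal sensor with no noise and, for the phase method, a distance within one unambiguous range of the modulation frequency; function and parameter names are illustrative.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0   # metres per second
SPEED_OF_SOUND = 343.0           # metres per second in air at about 20 °C (approximate)

def distance_from_round_trip(time_s, speed=SPEED_OF_LIGHT):
    """Distance from a pulse's round-trip time: the signal covers the range twice."""
    return speed * time_s / 2.0

def distance_from_phase(phase_rad, modulation_hz, speed=SPEED_OF_LIGHT):
    """Distance from the phase shift of a continuously modulated signal.

    d = c * phase / (4 * pi * f); valid while the phase stays below 2*pi.
    """
    return speed * phase_rad / (4.0 * math.pi * modulation_hz)

# Laser pulse returning after 20 nanoseconds.
laser_distance = distance_from_round_trip(20e-9)

# Ultrasound echo returning after 10 milliseconds.
ultrasound_distance = distance_from_round_trip(0.01, speed=SPEED_OF_SOUND)

# ToF camera modulated at 20 MHz measuring a quarter-cycle phase shift.
tof_distance = distance_from_phase(math.pi / 2.0, 20e6)
print(laser_distance, ultrasound_distance, tof_distance)
```

The phase method wraps around once the phase exceeds a full cycle, which limits the unambiguous range to c / (2f); lower modulation frequencies extend the range at the cost of depth resolution.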

3. Surface representations:

Surface representations in computer vision refer to the ways in which the geometry or
shape of surfaces in a three-dimensional (3D) scene is represented. These
representations are crucial for tasks such as 3D reconstruction, computer graphics, and
virtual reality. Different methods exist for representing surfaces, and the choice often
depends on the application's requirements and the characteristics of the data. Here are
some common surface representations:

Polygonal Meshes:
● Description: Meshes are composed of vertices, edges, and faces that
define the surface geometry. Triangular and quadrilateral meshes are most
common.
● Application: Widely used in computer graphics, gaming, and 3D modeling.
Point Clouds:
● Description: A set of 3D points in space, each representing a sample on the
surface of an object.
● Application: Generated by 3D scanners, LiDAR, or depth sensors; used in
applications like autonomous vehicles, robotics, and environmental
mapping.
Implicit Surfaces:
● Description: Represent surfaces as the zero level set of a scalar function. Points
inside the surface have negative values, points outside have positive values, and
points on the surface have the value zero.
● Application: Used in physics-based simulations, medical imaging, and shape
modeling.
NURBS (Non-Uniform Rational B-Splines):
● Description: Mathematical representations using control points and basis
functions to define smooth surfaces.
● Application: Commonly used in computer-aided design (CAD), automotive design,
and industrial design.
Voxel Grids:
● Description: 3D grids where each voxel (volumetric pixel) represents a small
volume in space, and the surface is defined by the boundary
between occupied and unoccupied voxels.
● Application: Used in medical imaging, volumetric data analysis, and
computational fluid dynamics.
Level Set Methods:
● Description: Represent surfaces as the zero level set of a
higher-dimensional function. The evolution of this function over time captures the
motion of the surface.
● Application: Used in image segmentation, shape optimization, and fluid dynamics
simulations.
Octrees:
● Description: Hierarchical tree structures that recursively divide space into
octants. Each leaf node contains information about the geometry within that region.

● Application: Used in real-time rendering, collision detection, and efficient storage
of 3D data.

The choice of surface representation depends on factors such as the nature of the
scene, the desired level of detail, computational efficiency, and the specific
requirements of the application.
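
The implicit-surface idea can be made concrete with a sphere. The sketch below uses a signed distance function (one common choice of scalar function, assumed here for illustration) that is negative inside, zero on the surface, and positive outside; the names and the tolerance are hypothetical.

```python
import math

def sphere_sdf(point, center, radius):
    """Signed distance from a point to a sphere's surface (negative inside)."""
    return math.dist(point, center) - radius

def classify(value, tolerance=1e-9):
    """Map an implicit-function value to inside / on / outside the surface."""
    if value < -tolerance:
        return "inside"
    if value > tolerance:
        return "outside"
    return "on surface"

center, radius = (0.0, 0.0, 0.0), 1.0
labels = [classify(sphere_sdf(p, center, radius))
          for p in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]]
print(labels)
```

Sampling such a function on a regular grid produces a voxel-style representation, and extracting its zero level set (for example with marching cubes) yields a polygonal mesh, so the representations in this list can be converted into one another.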

4. Point-based representations:
Point-based representations in computer vision and computer graphics refer to
methods that represent surfaces or objects using a set of individual points in
three-dimensional (3D) space. Instead of explicitly defining the connectivity between
points as in polygonal meshes, point-based representations focus on the spatial
distribution of points to describe the surface geometry. Here are some common
point-based representations:

Point Clouds:
● Description: A collection of 3D points in space, each representing a sample on the
surface of an object or a scene.
● Application: Point clouds are generated by 3D scanners, LiDAR, depth sensors, or
photogrammetry. They find applications in robotics, autonomous vehicles,
environmental mapping, and 3D modeling.
Dense Point Clouds:
● Description: Similar to point clouds but with a high density of points, providing
more detailed surface information.

● Application: Used in applications requiring detailed 3D reconstructions, such as
cultural heritage preservation, archaeological studies, and
industrial inspections.
Sparse Point Sets:
● Description: Representations where only a subset of points is used to
describe the surface, resulting in a sparser dataset compared to a dense point cloud.
● Application: Sparse point sets are useful in scenarios where
computational efficiency is crucial, such as real-time applications and large-scale
environments.
Point Splats:
● Description: Represent each point as a disc or a splat in 3D space. The size and
orientation of the splats can convey additional information.
● Application: Commonly used in point-based rendering and visualization to represent
dense point clouds efficiently.
Point Features:
● Description: Represent surfaces using distinctive points or keypoints,
each associated with local features such as normals, color, or texture information.
● Application: Widely used in feature-based registration, object recognition, and 3D
reconstruction.
Point Set Surfaces:
● Description: Represent surfaces as a set of unorganized points without connectivity
information. Surface properties can be interpolated from neighboring points.
● Application: Used in surface reconstruction from point clouds and point-based
rendering.
Radial Basis Function (RBF) Representations:
● Description: Use radial basis functions to interpolate surface properties between
points. These functions define a smooth surface that passes through the given
points.
● Application: Commonly used in shape modeling, surface reconstruction, and
computer-aided design.

Point-based representations are particularly useful when dealing with unstructured or
irregularly sampled data. They provide flexibility in representing surfaces with varying
levels of detail and are well-suited for capturing complex and intricate structures.
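
One simple way to move from a dense point cloud to a sparse point set is voxel-grid downsampling: all points that fall into the same cubic cell are replaced by their centroid. The sketch below illustrates the idea in plain Python; the function name voxel_downsample and the voxel_size parameter are chosen for this example and do not refer to any particular library.

```python
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that share a voxel by their centroid."""
    buckets = defaultdict(list)
    for p in points:
        # Integer index of the voxel that contains the point.
        key = tuple(int(c // voxel_size) for c in p)
        buckets[key].append(p)
    sparse = []
    for pts in buckets.values():
        n = len(pts)
        sparse.append(tuple(sum(q[i] for q in pts) / n for i in range(3)))
    return sparse
```

A larger voxel_size gives a sparser result, trading surface detail for speed and memory.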

5. Volumetric representations:

Volumetric representations in computer vision and computer graphics are methods
used to describe and model three-dimensional (3D) space in a volumetric manner.
Unlike surface representations, which focus on defining the surface geometry explicitly,
volumetric representations capture information about the entire volume, including the
interior of objects. Here are some common volumetric representations:

Voxel Grids:
● Description: A regular grid of small volume elements, called voxels, where each voxel
represents a small unit of 3D space.
● Application: Used in medical imaging, computer-aided design (CAD),
computational fluid dynamics, and robotics. Voxel grids are effective for representing
both the exterior and interior of objects.
Octrees:
● Description: A hierarchical data structure that recursively divides 3D space into octants.
Each leaf node in the octree contains information about the occupied or
unoccupied status of the corresponding volume.
● Application: Octrees are employed for efficient storage and representation of
volumetric data, particularly in real-time rendering, collision detection,
and adaptive resolution.
Signed Distance Fields (SDF):
● Description: Represent the distance from each point in space to the
nearest surface of an object, with the sign indicating whether the point lies inside or
outside the object (a common convention is negative inside and positive outside).
● Application: Used in shape modeling, surface reconstruction, and
physics-based simulations. SDFs provide a compact representation of geometry and are
often used in conjunction with implicit surfaces.
3D Texture Maps:

● Description: Extend the concept of 2D texture mapping to 3D space,
associating color or other properties with voxels in a volumetric grid.
● Application: Employed in computer graphics, simulations, and
visualization to represent complex volumetric details such as smoke, clouds, or other
phenomena.
Point Clouds with Occupancy Information:
● Description: Combine the idea of point clouds with additional information about the
occupancy of each point in space.
● Application: Useful in scenarios where capturing both the surface and interior
details of objects is necessary, such as in robotics and 3D
reconstruction.
Tensor Fields:
● Description: Represent the local structure of a volumetric region using tensors.
Tensor fields capture directional information, making them suitable for
anisotropic materials and shapes.
● Application: Commonly used in materials science, biomechanics, and
simulations where capturing anisotropic properties is important.
Shell Maps:
● Description: Represent the surfaces of objects as a collection of shells or layers, each
encapsulating the object's geometry.
● Application: Used in computer graphics and simulation to efficiently
represent complex objects and enable dynamic level-of-detail rendering.

Volumetric representations are valuable in various applications where a comprehensive
understanding of the 3D space is required, and they offer flexibility in capturing both
surface and interior details of objects. The choice of representation often depends on
the specific requirements of the task at hand and the characteristics of the data being
modeled.
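
To make the link between voxel grids and signed distance fields concrete, the sketch below samples the SDF of a sphere at the centres of a small voxel grid and thresholds it into an occupancy grid. Sign conventions differ between sources; this example assumes negative values inside the object and positive values outside. The sphere, the grid bounds, and all function names are illustrative assumptions.

```python
import math


def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface."""
    return math.dist(p, center) - radius


def sample_sdf_grid(n, lo, hi, center, radius):
    """Sample the SDF at the centres of an n x n x n grid spanning [lo, hi]^3."""
    step = (hi - lo) / n
    coords = [lo + (i + 0.5) * step for i in range(n)]
    return [[[sphere_sdf((x, y, z), center, radius) for z in coords]
             for y in coords] for x in coords]


def occupancy(grid):
    """Binary voxel grid: a voxel is occupied when its centre is inside."""
    return [[[d <= 0.0 for d in row] for row in plane] for plane in grid]
```

The SDF grid keeps sub-voxel information about where the surface lies, which the binary occupancy grid discards.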

6. Model-based reconstruction:
Model-based reconstruction in computer vision refers to a category of techniques that
involve creating a 3D model of a scene or object based on predefined models or
templates. These methods leverage prior knowledge about the geometry, appearance, or
structure of the objects being reconstructed. Model-based reconstruction is often
used in scenarios where a known model can be fitted to the observed data, providing a
structured and systematic approach to understanding the scene. Here are some key aspects
and applications of model-based reconstruction:

Prior Model Representation:
● Description: In model-based reconstruction, a mathematical
representation or a geometric model of the object or scene is assumed or known in
advance.
● Application: Commonly used in computer-aided design (CAD), medical
imaging, and industrial inspection, where known shapes or structures can be explicitly
represented.
Model Fitting:
● Description: The reconstruction process involves adjusting the parameters
of the model to best fit the observed data, typically obtained from images or sensor
measurements.
● Application: Used in applications such as object recognition, pose
estimation, and 3D reconstruction by aligning the model with the observed features.
Geometric Constraints:
● Description: Constraints on the geometry of the scene, such as the
relationships between different components or the expected shape
characteristics, are incorporated into the reconstruction process.
● Application: Applied in robotics, augmented reality, and computer vision tasks
where geometric relationships play a crucial role.
Deformable Models:
● Description: Models that can adapt and deform to fit the observed data, allowing for
more flexible and realistic representations.
● Application: Commonly used in medical imaging for organ segmentation and shape
analysis, as well as in computer graphics for character
animation.
Stereo Vision with Model Constraints:
● Description: Stereo vision techniques that incorporate known models to improve
depth estimation and 3D reconstruction.

● Application: Used in stereo matching algorithms and 3D reconstruction pipelines to
enhance accuracy by considering geometric priors.
Parametric Surfaces:
● Description: Representing surfaces using parametric equations or
functions, allowing for efficient adjustment of parameters during the
reconstruction process.
● Application: Applied in computer graphics, virtual reality, and industrial design
where surfaces can be described mathematically.
Multi-View Reconstruction with Known Models:
● Description: Leveraging multiple views or images of a scene to reconstruct a 3D model
while incorporating information from known models.
● Application: Common in photogrammetry and structure-from-motion
applications where multiple perspectives contribute to accurate 3D
reconstruction.

Model-based reconstruction is valuable when there is prior knowledge about the objects
or scenes being reconstructed, as it allows for more efficient and accurate
reconstruction compared to purely data-driven approaches. This approach is
particularly useful in fields where a well-defined understanding of the underlying
geometry is available.
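
As a small worked instance of model fitting, the sketch below adjusts two parameters of a known template (a uniform scale s and a translation t) so that the transformed model points match the observed points in the least-squares sense. It assumes that point correspondences are already known and that no rotation is involved; both are simplifications made for illustration, and the function name is hypothetical.

```python
def fit_scale_translation(model_pts, observed_pts):
    """Least-squares scale s and translation t such that s * m + t ~ p."""
    n = len(model_pts)
    dim = len(model_pts[0])
    m_mean = [sum(m[i] for m in model_pts) / n for i in range(dim)]
    p_mean = [sum(p[i] for p in observed_pts) / n for i in range(dim)]
    num = 0.0  # sum of dot products of centred model and observed points
    den = 0.0  # sum of squared norms of centred model points
    for m, p in zip(model_pts, observed_pts):
        for i in range(dim):
            mc = m[i] - m_mean[i]
            num += mc * (p[i] - p_mean[i])
            den += mc * mc
    s = num / den
    t = [p_mean[i] - s * m_mean[i] for i in range(dim)]
    return s, t
```

With only a handful of parameters to estimate, the fit remains stable even when the observations are noisy or sparse, which is the main appeal of model-based approaches.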

7. Recovering texture maps and albedos:
Recovering texture maps and albedos in computer vision and computer graphics
involves estimating the surface appearance, color, and reflectance properties of objects
in a scene. These processes are integral to creating realistic and detailed 3D models for
applications like virtual reality, computer games, and simulations. Here's a brief
overview of these concepts:

Texture Maps:
● Description: Texture mapping involves applying a 2D image, known as a texture
map, onto a 3D model's surface to simulate surface details, patterns, or color
variations.
● Recovery Process: Texture maps can be recovered through various
methods, including image-based techniques, photogrammetry, or using specialized
3D scanners. These methods capture color information
associated with the surface geometry.
● Application: Used in computer graphics, gaming, and virtual reality to
enhance the visual appearance of 3D models by adding realistic surface details.
Albedo:
● Description: Albedo represents the intrinsic color or reflectance of a
surface, independent of lighting conditions. It is a measure of how much light a surface
reflects.
● Recovery Process: Albedo can be estimated by decoupling surface
reflectance from lighting effects. Photometric stereo, shape-from-shading, or using
multi-view images are common methods to recover albedo
information.
● Application: Albedo information is crucial in computer vision applications, such as
material recognition, object tracking, and realistic rendering in
computer graphics.

Recovering texture maps and albedos often involves the following techniques:
Photometric Stereo:

● Description: A technique that uses multiple images of an object
illuminated from different directions to recover surface normals and, subsequently,
albedo information.
● Application: Used in scenarios where detailed surface properties are needed,
such as facial recognition, material analysis, and industrial
inspection.
Shape-from-Shading:
● Description: Inferring the shape of a surface based on variations in
brightness or shading in images. By decoupling shading from geometry, albedo
information can be estimated.
● Application: Applied in computer vision for shape recovery, as well as in computer
graphics to enhance the realism of rendered images.
Multi-View Stereo (MVS):
● Description: In the context of 3D reconstruction, MVS involves capturing
images of a scene from multiple viewpoints and recovering both geometry and texture
information.
● Application: Commonly used in 3D modeling, virtual reality, and cultural heritage
preservation to create detailed and textured 3D models.
Reflectance Transformation Imaging (RTI):
● Description: A technique that captures a series of images with controlled
lighting conditions to reveal surface details, including albedo variations.
● Application: Widely used in cultural heritage preservation and art
restoration for capturing fine details on surfaces.

Recovering texture maps and albedos is crucial for creating visually appealing and
realistic 3D models. These techniques bridge the gap between the geometry of the
objects and their appearance, contributing to the overall fidelity of virtual or augmented
environments.
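
Photometric stereo can be illustrated at a single pixel. Under the Lambertian assumption, the intensity observed under light direction l_k is I_k = albedo * (l_k . n), so three images with known, non-coplanar light directions give a 3x3 linear system whose solution g = albedo * n yields both quantities. The sketch below assumes exactly three lights, no shadows or highlights, and a non-zero albedo; the helper names are illustrative.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = det3(a)
    x = []
    for col in range(3):
        m = [list(row) for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det3(m) / d)
    return x


def photometric_stereo_pixel(lights, intensities):
    """Recover (albedo, unit normal) at one pixel from three lit images."""
    g = solve3(lights, intensities)            # g = albedo * normal
    albedo = math.sqrt(sum(v * v for v in g))  # length of g is the albedo
    normal = [v / albedo for v in g]
    return albedo, normal
```

With more than three lights, the same system would normally be solved by least squares, which also reduces the effect of noise.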

UNIT V
IMAGE-BASED RENDERING AND RECOGNITION
View interpolation - Layered depth images - Light fields and Lumigraphs -
Environment mattes - Video-based rendering - Object detection - Face recognition -
Instance recognition - Category recognition - Context and scene understanding -
Recognition databases and test sets.

1. View Interpolation:
View interpolation is a technique used in computer graphics and computer vision to
generate new views of a scene that are not present in the original set of captured or
rendered views. The goal is to create additional viewpoints between existing ones,
providing a smoother transition and a more immersive experience. This is particularly
useful in applications like 3D graphics, virtual reality, and video processing. Here are
key points about view interpolation:

Description:
● View interpolation involves synthesizing views from known viewpoints in a way that
appears visually plausible and coherent.
● The primary aim is to provide a sense of continuity and smooth transitions between the
available views.
Methods:
● Image-Based Methods: These methods use image warping or morphing techniques to
generate new views by blending or deforming existing
images.

● 3D Reconstruction Methods: These approaches involve estimating the 3D geometry
of the scene and generating new views based on the
reconstructed 3D model.
Applications:
● Virtual Reality (VR): In VR applications, view interpolation helps create a
more immersive experience by generating views based on the user's head movements.
● Free-viewpoint Video: View interpolation is used in video processing to generate
additional views for a more dynamic and interactive video
experience.
Challenges:
● Depth Discontinuities: Handling depth changes in the scene can be
challenging, especially when interpolating between views with different depths.
● Occlusions: Addressing occlusions, where objects in the scene may block the view of
others, is a common challenge.
Techniques:
● Linear Interpolation: Basic linear interpolation is often used to generate
intermediate views by blending the pixel values of adjacent views.
● Depth-Image-Based Rendering (DIBR): This method involves warping
images based on depth information to generate new views.
● Neural Network Approaches: Deep learning techniques, including
convolutional neural networks (CNNs), have been employed for view synthesis
tasks.
Use Cases:
● 3D Graphics: View interpolation is used to smoothly transition between different
camera angles in 3D graphics applications and games.
● 360-Degree Videos: In virtual tours or immersive videos, view interpolation helps create
a continuous viewing experience.

View interpolation is a valuable tool for enhancing the visual quality and user experience
in applications where dynamic or interactive viewpoints are essential. It enables the
creation of more natural and fluid transitions between views, contributing to a more
realistic and engaging visual presentation.
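
The two simplest techniques listed above, linear blending and depth-image-based warping, can be sketched as follows. The example works on tiny grayscale images stored as nested lists and assumes rectified views, so that a pixel with disparity d moves horizontally by t * d in a virtual view placed at fraction t of the baseline. The function names and the use of None to mark holes are choices made for this illustration.

```python
def blend_views(view_a, view_b, t):
    """Cross-dissolve two equally sized images; t=0 gives view_a, t=1 view_b."""
    return [[(1.0 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(view_a, view_b)]


def warp_scanline(colors, disparities, t):
    """Forward-warp one scanline to a virtual view at fraction t of the baseline."""
    width = len(colors)
    out = [None] * width        # None marks holes (disoccluded pixels)
    best = [-1.0] * width       # largest disparity written so far per target
    for x in range(width):
        d = disparities[x]
        xt = int(round(x - t * d))            # target column in the new view
        if 0 <= xt < width and d > best[xt]:
            out[xt] = colors[x]               # nearer pixels (larger d) win
            best[xt] = d
    return out
```

The holes left by the warp correspond to the depth discontinuity and occlusion challenges mentioned above; practical systems fill them from a second view or by inpainting.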

2. Layered Depth Images:

Layered Depth Images (LDI) is a technique used in computer graphics for efficiently
representing complex scenes with multiple layers of geometry at varying depths. The
primary goal of Layered Depth Images is to provide an effective representation of scenes
with transparency and occlusion effects. Here are key points about Layered Depth Images:

Description:
● Layered Representation: LDI represents a scene as a stack of images,
where each image corresponds to a specific depth layer within the scene.
● Depth Information: Each pixel location in the LDI can hold several samples along its
line of sight. Each sample contains color information as well as depth
information, indicating its position along the view
direction.
Representation:
● 2D Array of Images: Conceptually, an LDI can be thought of as a 2D array of images,
where each image represents a different layer of the scene.
● Depth Slice: The images in the array are often referred to as "depth slices,"
and the order of the slices corresponds to the depth ordering of the layers.
Advantages:
● Efficient Storage: LDIs can provide more efficient storage for scenes with
transparency compared to traditional methods like z-buffers.
● Occlusion Handling: LDIs naturally handle occlusions and transparency,
making them suitable for rendering scenes with complex layering effects.

Use Cases:
● Augmented Reality: LDIs are used in augmented reality applications where virtual
objects need to be integrated seamlessly with the real world, considering
occlusions and transparency.
● Computer Games: LDIs can be employed in video games to efficiently handle
scenes with transparency effects, such as foliage or glass.
Scene Composition:
● Compositing: To render a scene from a particular viewpoint, the images
from different depth slices are composited together, taking into account the depth
values to handle transparency and occlusion.
Challenges:
● Memory Usage: Depending on the complexity of the scene and the
number of depth layers, LDIs can consume a significant amount of memory.
● Anti-aliasing: Handling smooth transitions between layers, especially when
dealing with transparency, can pose challenges for anti-aliasing.
Extensions:
● Sparse Layered Representations: Some extensions of LDIs involve using
sparse representations to reduce memory requirements while maintaining the benefits
of layered depth information.

Layered Depth Images are particularly useful in scenarios where traditional rendering
techniques, such as z-buffer-based methods, struggle to handle transparency and
complex layering. By representing scenes as a stack of images, LDIs provide a more
natural way to deal with the challenges posed by rendering scenes with varying depths
and transparency effects.
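
The compositing step can be sketched for a single LDI pixel. Here each pixel is assumed to store a list of (depth, color, alpha) samples, which are sorted by depth and combined front to back with the standard "over" operator. Grayscale colors and the function name are simplifications for this example.

```python
def composite_ldi_pixel(samples):
    """Front-to-back 'over' compositing of one LDI pixel.

    samples: list of (depth, color, alpha) tuples in any order.
    """
    color = 0.0
    transmittance = 1.0                   # fraction of light still unblocked
    for depth, c, a in sorted(samples):   # nearest sample first
        color += transmittance * a * c
        transmittance *= (1.0 - a)
        if transmittance <= 0.0:          # an opaque layer hides the rest
            break
    return color
```

For example, a half-transparent black layer in front of an opaque white layer gives composite_ldi_pixel([(5.0, 1.0, 1.0), (2.0, 0.0, 0.5)]) == 0.5.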

3. Light Fields and Lumigraphs:

Light Fields:

● Definition: A light field is a representation of all the light rays traveling in all
directions through every point in a 3D space.
● Components: It consists of both the intensity and the direction of light at each point
in space.

● Capture: Light fields can be captured using an array of cameras or
specialized camera setups to record the rays of light from different perspectives.
● Applications: Used in computer graphics for realistic rendering, virtual
reality, and post-capture refocusing where the focus point can be adjusted after the
image is captured.


Lumigraphs:
● Definition: A lumigraph is a type of light field that represents the visual information
in a scene as a function of both space and direction.
● Capture: Lumigraphs are typically captured using a set of images from a dense
camera array, capturing the scene from various viewpoints.
● Components: Similar to light fields, they include information about the intensity
and direction of light at different points in space.
● Applications: Primarily used in computer graphics and computer vision for 3D
reconstruction, view interpolation, and realistic rendering of complex
scenes.
Comparison:
● Difference: While the terms are often used interchangeably, both describe the same 4D
function of rays, usually in a two-plane parameterization. The lumigraph additionally
uses approximate scene geometry to correct the interpolation of rays between sampled
views.
● Similarities: Both light fields and lumigraphs aim to capture a
comprehensive set of visual information about a scene to enable realistic rendering and
various computational photography applications.
Advantages:
● Realism: Light fields and lumigraphs contribute to realistic rendering by capturing
the full complexity of how light interacts with a scene.

● Flexibility: They allow for post-capture manipulation, such as changing the viewpoint or
adjusting focus, providing more flexibility in the rendering
process.
Challenges:
● Data Size: Light fields and lumigraphs can generate large amounts of data, requiring
significant storage and processing capabilities.
● Capture Setup: Acquiring a high-quality light field or lumigraph often requires
specialized camera arrays or complex setups.
Applications:
● Virtual Reality: Used to enhance the realism of virtual environments by providing
a more immersive visual experience.
● 3D Reconstruction: Applied in computer vision for reconstructing 3D scenes
and objects from multiple viewpoints.
Future Developments:
● Computational Photography: Ongoing research explores advanced
computational photography techniques leveraging light fields for
applications like refocusing, depth estimation, and novel view synthesis.
● Hardware Advances: Continued improvements in camera technology may lead to
more accessible methods for capturing high-quality light fields.

Light fields and lumigraphs are powerful concepts in computer graphics and computer
vision, offering a rich representation of visual information that opens up possibilities
for creating more immersive and realistic virtual experiences.
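
Post-capture refocusing can be illustrated with the classic shift-and-add idea: each view is shifted in proportion to its camera offset and the shifted views are averaged, so that points at the matching depth line up and appear sharp. The sketch below works on a 2D slice L[u][s] of the light field (u indexes the camera, s the pixel) and assumes integer pixel shifts; the function and parameter names are illustrative.

```python
def refocus(light_field, shift):
    """Shift-and-add refocusing of a 2D light-field slice L[u][s].

    shift is the per-camera pixel shift (an integer here) that selects
    the depth plane brought into focus.
    """
    n_views = len(light_field)
    width = len(light_field[0])
    centre = n_views // 2
    out = []
    for s in range(width):
        total, count = 0.0, 0
        for u, view in enumerate(light_field):
            src = s + shift * (u - centre)   # where this ray lands in view u
            if 0 <= src < width:
                total += view[src]
                count += 1
        out.append(total / count)            # the centre view always counts
    return out
```

In the usage below, a bright point moves by one pixel per camera. Refocusing with shift=1 aligns the three views on it, while shift=0 spreads it across neighbouring pixels:

```python
lf = [[0, 9, 0, 0, 0],
      [0, 0, 9, 0, 0],
      [0, 0, 0, 9, 0]]
sharp = refocus(lf, 1)    # sharp[2] == 9.0
blurred = refocus(lf, 0)  # blurred[2] == 3.0
```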

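4. Environment Mattes:

Definition:

● Environment mattes refer to the process of separating the foreground
elements from the background in an image or video to enable compositing or
replacement of the background. Strictly speaking, an environment matte also records how
a transparent or reflective foreground object refracts and reflects light from its
surroundings, so that the object can be composited convincingly over a new background.
Purpose:
● Isolation of Foreground Elements: The primary goal is to isolate the
objects or people in the foreground from the original background, creating a "matte"
that can be replaced or composited with a new background.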

Techniques:
● Chroma Keying: Commonly used in film and television, chroma keying
involves shooting the subject against a uniformly colored background (often green
or blue) that can be easily removed in post-production.
● Rotoscoping: Involves manually tracing the outlines of the subject frame
by frame, providing precise control over the matte but requiring significant labor.
● Depth-based Mattes: In 3D applications, depth information can be used to create a
matte, allowing for more accurate separation of foreground and background
elements.
Applications:
● Film and Television Production: Widely used in the entertainment industry
to create special effects, insert virtual backgrounds, or composite actors into different
scenes.
● Virtual Studios: In virtual production setups, environment mattes are crucial for
seamlessly integrating live-action footage with
computer-generated backgrounds.
Challenges:
● Soft Edges: Achieving smooth and natural transitions between the
foreground and background is challenging, especially when dealing with fine details
like hair or transparent objects.
● Motion Dynamics: Handling dynamic scenes with moving subjects or
dynamic camera movements requires advanced techniques to maintain accurate mattes.
Spill Suppression:

● Definition: Spill refers to the unwanted influence of the background color
on the foreground subject. Spill suppression techniques are employed to minimize
this effect.
● Importance: Ensures that the foreground subject looks natural when placed
against a new background.
Foreground-Background Integration:
● Lighting and Reflection Matching: For realistic results, it's essential to
match the lighting and reflections between the foreground and the new background.
● Shadow Casting: Consideration of shadows cast by the foreground
elements to ensure they align with the lighting conditions of the new background.
Advanced Techniques:
● Machine Learning: Advanced machine learning techniques, including
semantic segmentation and deep learning, are increasingly being applied to automate
and enhance the environment matte creation process.
● Real-time Compositing: In some applications, especially in live events or broadcasts,
real-time compositing technologies are used to create
environment mattes on the fly.
Evolution with Technology:
● HDR and 3D Capture: High Dynamic Range (HDR) imaging and 3D capture
technologies contribute to more accurate and detailed environment
mattes.
● Real-time Processing: Advances in real-time processing enable more efficient
and immediate creation of environment mattes, reducing
post-production time.

Environment mattes play a crucial role in modern visual effects and virtual production,
allowing filmmakers and content creators to seamlessly integrate real and virtual
elements to tell compelling stories.
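
Compositing with a matte follows the matting equation C = alpha * F + (1 - alpha) * B, where F is the foreground color, B the new background color, and alpha the matte value at that pixel. The sketch below pairs it with a deliberately crude green-screen key; the threshold value and the "green dominates red and blue" rule are assumptions for illustration, and real keyers produce soft, fractional alpha values.

```python
def chroma_key_alpha(pixel, threshold=0.3):
    """Crude green-screen matte: 0.0 = background, 1.0 = foreground."""
    r, g, b = pixel
    greenness = g - max(r, b)        # how strongly green dominates
    return 0.0 if greenness > threshold else 1.0


def composite(fg, bg, alpha):
    """Matting equation C = alpha * F + (1 - alpha) * B, per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))
```

A fractional alpha, for instance at hair or motion-blurred edges, mixes the two colors: composite((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.25) gives (0.25, 0.0, 0.75).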

5. Video-based Rendering:

Definition:

● Video-based Rendering (VBR) refers to the process of generating
novel views or frames of a scene by utilizing information from a set of input
video sequences.

Capture Techniques:

● Multiple Viewpoints: VBR often involves capturing a scene from
multiple viewpoints, either through an array of cameras or by utilizing video
footage captured from different angles.
● Light Field Capture: Some VBR techniques leverage light field
capture methods to acquire both spatial and directional information, allowing
for more flexibility in view synthesis.
Techniques:

● View Synthesis: The core objective of video-based rendering is to
synthesize new views or frames that were not originally captured but can be
realistically generated from the available footage.
● Image-Based Rendering (IBR): Image-based rendering techniques
use captured images or video frames as the basis for view
synthesis.
Applications:

● Virtual Reality (VR): VBR is used in VR applications to provide a more
immersive experience by allowing users to explore scenes from various
perspectives.
● Free-Viewpoint Video: VBR techniques enable the creation of free-viewpoint
video, allowing users to interactively choose their viewpoint
within a scene.
View Synthesis Challenges:

● Occlusions: Handling occlusions and ensuring that synthesized
views account for objects obstructing the line of sight is a significant challenge.
● Consistency: Ensuring visual consistency and coherence across
synthesized views to avoid artifacts or discrepancies.
3D Reconstruction:

● Depth Estimation: Some video-based rendering approaches involve estimating
depth information from the input video sequences, enabling more
accurate view synthesis.
● Multi-View Stereo (MVS): Utilizing multiple viewpoints for 3D
reconstruction to enhance the quality of synthesized views.
Real-time Video-based Rendering:

● Live Events: In certain scenarios, real-time video-based rendering is employed
for live events, broadcasts, or interactive applications.
● Low Latency: Minimizing latency is crucial for applications where the rendered
views need to be presented in real-time.
Emerging Technologies:

● Deep Learning: Advances in deep learning, particularly convolutional neural
networks (CNNs) and generative models, have been applied to video-based
rendering tasks, enhancing the quality of synthesized views.
● Neural Rendering: Techniques like neural rendering leverage neural
networks to generate realistic novel views, addressing challenges like
specular reflections and complex lighting conditions.
Hybrid Approaches:

● Combining Techniques: Some video-based rendering methods
combine traditional computer graphics approaches with machine learning
techniques for improved results.
● Incorporating VR/AR: VBR is often integrated with virtual reality (VR)
and augmented reality (AR) systems to provide more immersive and interactive
experiences.
Future Directions:

● Improved Realism: Ongoing research aims to enhance the realism of
synthesized views, addressing challenges related to complex scene
dynamics, lighting variations, and realistic material rendering.
● Applications Beyond Entertainment: Video-based rendering is
expanding into fields like remote collaboration, telepresence, and
interactive content creation.

Video-based rendering is a dynamic field that plays a crucial role in shaping immersive
experiences across various domains, including entertainment,
communication, and virtual exploration. Advances in technology and research
continue to push the boundaries of what is achievable in terms of realistic view
synthesis.

6. Object Detection:

Definition:

● Object Detection is a computer vision task that involves identifying and
locating objects within an image or video. The goal is to draw bounding
boxes around the detected objects and assign a label to each identified object.

Object Localization vs. Object Recognition:
● Object Localization: In addition to identifying objects, object detection also involves
providing precise coordinates (bounding box) for the location of each detected
object within the image.
● Object Recognition: Assigns a category label to an object. Object detection
combines this recognition step with localization.
Methods:

● Two-Stage Detectors: These methods first propose regions in the image
that might contain objects and then classify and refine those proposals. Examples
include Faster R-CNN.
● One-Stage Detectors: These methods simultaneously predict object
bounding boxes and class labels without a separate proposal stage.
Examples include YOLO (You Only Look Once) and SSD (Single Shot
Multibox Detector).
● Anchor-based and Anchor-free Approaches: Some methods use anchor
boxes to predict object locations and sizes, while others adopt anchor-free strategies.
Applications:
● Autonomous Vehicles: Object detection is crucial for autonomous vehicles to identify
pedestrians, vehicles, and other obstacles.
● Surveillance and Security: Used in surveillance systems to detect and track
objects or individuals of interest.
● Retail: Applied in retail for inventory management and customer behavior analysis.
● Medical Imaging: Object detection is used to identify and locate
abnormalities in medical images.
● Augmented Reality: Utilized for recognizing and tracking objects in AR
applications.
Challenges:
● Scale Variations: Objects can appear at different scales in images, requiring
detectors to be scale-invariant.
● Occlusions: Handling situations where objects are partially or fully occluded
by other objects.
● Real-time Processing: Achieving real-time performance for applications like video
analysis and robotics.
Evaluation Metrics:
● Intersection over Union (IoU): Measures the overlap between the predicted and ground
truth bounding boxes.
● Precision and Recall: Precision is the fraction of detections that are correct, and
recall is the fraction of ground truth objects that are detected. Together they capture
the trade-off between false positives and missed objects.
Deep Learning in Object Detection:
● Convolutional Neural Networks (CNNs): Deep learning, especially CNNs, has
significantly improved object detection accuracy.
● Region-based CNNs (R-CNN): Classify candidate region proposals with a CNN. Faster
R-CNN later introduced the region proposal network (RPN) to generate those proposals
and improve object localization.

● Single Shot Multibox Detector (SSD), You Only Look Once (YOLO):
One-stage detectors that are faster and suitable for real-time applications.
Transfer Learning:
● Pre-trained Models: Transfer learning involves using pre-trained models on large
datasets and fine-tuning them for specific object detection tasks.
● Popular Architectures: Models like ResNet, VGG, and MobileNet are often used as
backbone architectures for object detection.
Recent Advancements:
● EfficientDet: An efficient object detection model that balances accuracy and
efficiency.
● CenterNet: Focuses on predicting object centers and regressing bounding box
parameters.
Object Detection Datasets:
● COCO (Common Objects in Context): Widely used for evaluating object detection
algorithms.
● PASCAL VOC (Visual Object Classes): Another benchmark dataset for object
detection tasks.
● ImageNet: Originally known for image classification, ImageNet has also been used
for object detection challenges.

Object detection is a fundamental task in computer vision with widespread applications
across various industries. Advances in deep learning and the availability of large-scale
datasets have significantly improved the accuracy and efficiency of object detection
models in recent years.
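
The evaluation metrics listed above are short enough to write out. The sketch below computes IoU for axis-aligned boxes given as (x1, y1, x2, y2) and then derives precision and recall with a simple greedy matching at a fixed IoU threshold. The matching rule and the 0.5 threshold are simplifying assumptions; benchmark protocols such as COCO additionally sort detections by confidence and average over several thresholds.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, threshold=0.5):
    """Greedy one-to-one matching of predicted boxes to ground truth boxes."""
    unmatched = list(ground_truth)
    tp = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            tp += 1                   # true positive: consume this box
            unmatched.remove(best)
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

For two 2x2 boxes offset by one unit in each direction, the intersection is 1 and the union is 7, so the IoU is 1/7 and the pair would not count as a match at the 0.5 threshold.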

7. Face Recognition:

Definition:

● Face Recognition is a biometric technology that involves identifying and
verifying individuals based on their facial features. It aims to match the
unique patterns and characteristics of a person's face against a database of known
faces.
Components:
● Face Detection: The process of locating faces in an image or video frame so that
their features can be extracted.

● Feature Extraction: Capturing distinctive features of the face, such as the distances
between eyes, nose, and mouth, and creating a unique
representation.
● Matching Algorithm: Comparing the extracted features with pre-existing templates
to identify or verify a person.

Methods:
● Eigenfaces: A technique that represents faces as linear combinations of principal
components.
● Local Binary Patterns (LBP): A texture-based method that captures patterns of
pixel intensities in local neighborhoods.
● Deep Learning: Convolutional Neural Networks (CNNs) have significantly
improved face recognition accuracy, with architectures like FaceNet and VGGFace.
Applications:
● Security and Access Control: Commonly used in secure access systems, unlocking
devices, and building access.
● Law Enforcement: Applied for identifying individuals in criminal
investigations and monitoring public spaces.
● Retail: Used for customer analytics, personalized advertising, and enhancing
customer experiences.
● Human-Computer Interaction: Implemented in applications for facial expression
analysis, emotion recognition, and virtual avatars.
Challenges:
● Variability in Pose: Recognizing faces under different poses and
orientations.
● Illumination Changes: Handling variations in lighting conditions that can affect the
appearance of faces.

● AgingandEnvironmentalFactors:Adaptingtochangesinappearancedue to aging,
facial hair, or accessories.
PrivacyandEthicalConsiderations:
● DataPrivacy:Concernsaboutthecollectionandstorageoffacialdataand the potential
misuse of such information.
● Bias and Fairness: Ensuring fairness and accuracy, particularly across
diversedemographicgroups,toavoidbiasesinfacerecognitionsystems.
LivenessDetection:
● Definition:Atechniqueusedtodeterminewhetherthepresentedfaceis from a live
person or a static image.
● Importance:Preventsunauthorizedaccessusingphotosorvideostotrick the system.
MultimodalBiometrics:
● Fusionwith OtherModalities: Combining facerecognition with other
biometricmethods,suchasfingerprintoririsrecognition,forimproved accuracy.
Real-time Face Recognition:
● Applications: Real-time face recognition is essential for applications like video
surveillance, access control, and human-computer interaction.
● Challenges: Ensuring low latency and high accuracy in real-time scenarios.
Benchmark Datasets:
● Labeled Faces in the Wild (LFW): A popular dataset for face recognition, containing
images collected from the internet.
● CelebA: Dataset with celebrity faces for training and evaluation.
● MegaFace: Benchmark for evaluating the performance of face recognition systems at
a large scale.

Face recognition is a rapidly evolving field with numerous applications and ongoing
research to address challenges and enhance its capabilities. It plays a crucial role in various industries,
from security to personalized services, contributing to the advancement of biometric
technologies.

8. Instance Recognition:

Definition:

● Instance Recognition, also known as instance-level recognition or
instance-level segmentation, involves identifying and distinguishing
individual instances of objects or entities within an image or a scene. It
goes beyond category-level recognition by assigning unique identifiers to different
instances of the same object category.


Object Recognition vs. Instance Recognition:
● Object Recognition: Identifies object categories in an image without
distinguishing between different instances of the same category.
● Instance Recognition: Assigns unique identifiers to individual instances of objects,
allowing for differentiation between multiple occurrences of the same category.
Semantic Segmentation and Instance Segmentation:
● Semantic Segmentation: Assigns a semantic label to each pixel in an
image, indicating the category to which it belongs (e.g., road, person, car).
● Instance Segmentation: Extends semantic segmentation by assigning a unique
identifier to each instance of an object, enabling differentiation between
separate objects of the same category.
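
The difference between the two outputs can be made concrete with a toy sketch. It takes a semantic label map and gives each connected region of one class its own instance id. This is a deliberate simplification and an assumption of this example: connected-component labelling merges objects that touch, which is exactly the case learned instance segmentation methods are designed to handle.

```python
from collections import deque

def label_instances(semantic, target):
    """Give each 4-connected region of `target` pixels its own instance id (1, 2, ...)."""
    rows, cols = len(semantic), len(semantic[0])
    instances = [[0] * cols for _ in range(rows)]   # 0 means "not this class"
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if semantic[r][c] != target or instances[r][c]:
                continue
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:                             # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and semantic[ny][nx] == target
                            and not instances[ny][nx]):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id

CAR = 1  # hypothetical class id
semantic = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
instances, count = label_instances(semantic, CAR)
print(count)
for row in instances:
    print(row)
```

The semantic map says only "car" for both regions; the instance map separates them into car 1 and car 2.
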
Methods:
● Mask R-CNN: A popular instance segmentation method that extends the Faster R-CNN
architecture to provide pixel-level masks for each detected object instance.
● Point-based Methods: Some instance recognition approaches operate on point clouds
or 3D data to identify and distinguish individual instances.
● Feature Embeddings: Utilizing deep learning methods to learn
discriminative feature embeddings for different instances.
Applications:
● Autonomous Vehicles: Instance recognition is crucial for detecting and tracking
individual vehicles, pedestrians, and other objects in the
environment.

● Robotics: Used for object manipulation, navigation, and scene
understanding in robotics applications.
● Augmented Reality: Enables the accurate overlay of virtual objects onto the real
world by recognizing and tracking specific instances.
● Medical Imaging: Identifying and distinguishing individual structures or anomalies
in medical images.
Challenges:
● Occlusions: Handling situations where objects partially or fully occlude each
other.
● Scale Variations: Recognizing instances at different scales within the same
image or scene.
● Complex Backgrounds: Dealing with cluttered or complex backgrounds that may
interfere with instance recognition.
Datasets:
● COCO (Common Objects in Context): While primarily used for object
detection and segmentation, COCO also contains instance segmentation annotations.
● Cityscapes: A dataset designed for urban scene understanding, including pixel-level
annotations for object instances.
● ADE20K: A large-scale dataset for semantic and instance segmentation in diverse
scenes.
Evaluation Metrics:
● Intersection over Union (IoU): Measures the overlap between predicted and
ground truth masks.
● Mean Average Precision (mAP): Commonly used for evaluating the precision
of instance segmentation algorithms.
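
IoU for two binary masks is the number of pixels set in both, divided by the number set in either. A minimal sketch follows, assuming masks are given as equally sized nested lists of 0/1 values; the function name `mask_iou` is chosen for this example.

```python
def mask_iou(pred, truth):
    """Intersection over Union of two binary masks of the same shape."""
    intersection = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks have no overlap to measure; report 0.0 by convention here.
    return intersection / union if union else 0.0

pred = [[1, 1, 0],
        [1, 1, 0]]
truth = [[0, 1, 1],
         [0, 1, 1]]
iou = mask_iou(pred, truth)   # 2 shared pixels out of 6 covered pixels
print(round(iou, 3))
```

When mAP is computed, a predicted instance usually counts as correct only if its IoU with a ground truth instance exceeds a chosen threshold (0.5 is a common choice).
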
Real-time Instance Recognition:
● Applications: In scenarios where real-time processing is crucial, such as robotics,
autonomous vehicles, and augmented reality.
● Challenges: Balancing accuracy with low-latency requirements for real-time
performance.
Future Directions:
● Weakly Supervised Learning: Exploring methods that require less
annotation effort, such as weakly supervised or self-supervised learning for instance
recognition.
● Cross-Modal Instance Recognition: Extending instance recognition to
operate across different modalities, such as combining visual and textual information
for more comprehensive recognition.

Instance recognition is a fundamental task in computer vision that enhances our ability
to understand and interact with the visual world by providing detailed information about individual
instances of objects or entities within a scene.

9. Category Recognition:

Definition:

● Category Recognition, also known as object category recognition or image
categorization, involves assigning a label or category to an entire image
based on the objects or scenes it contains. The goal is to identify the
overall content or theme of an image without necessarily distinguishing individual
instances or objects within it.
Scope:
● Whole-Image Recognition: Category recognition focuses on recognizing and
classifying the entire content of an image rather than identifying
specific instances or details within the image.


Methods:
● Convolutional Neural Networks (CNNs): Deep learning methods,
particularly CNNs, have shown significant success in image categorization tasks,
learning hierarchical features.
● Bag-of-Visual-Words: Traditional computer vision approaches that
represent images as histograms of visual words based on local features.
● Transfer Learning: Leveraging pre-trained models on large datasets and fine-tuning
them for specific category recognition tasks.
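
The bag-of-visual-words representation can be sketched in a few lines. The example assumes that local descriptors have already been extracted and that a vocabulary of visual words already exists (in practice it is usually obtained by clustering descriptors from training images); the 2-D descriptors and three-word vocabulary below are toy values for illustration.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean distance)."""
    def sq_dist(word):
        return sum((d - w) ** 2 for d, w in zip(descriptor, word))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))

def bovw_histogram(descriptors, vocabulary):
    """Normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else counts

# Toy vocabulary of three visual words and four local descriptors from one image.
vocabulary = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [0.8, 0.9], [0.1, 0.9]]
hist = bovw_histogram(descriptors, vocabulary)
print(hist)
```

The resulting fixed-length histogram is what a conventional classifier would take as input, regardless of how many local features the image produced.
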
Applications:
● Image Tagging: Automatically assigning relevant tags or labels to images for
organization and retrieval.

● Content-Based Image Retrieval (CBIR): Enabling the retrieval of images based on
their content rather than textual metadata.
● Visual Search: Powering applications where users can search for similar images by
providing a sample image.
Challenges:
● Intra-class Variability: Dealing with variations within the same category, such as
different poses, lighting conditions, or object appearances.
● Fine-grained Categorization: Recognizing subtle differences between closely
related categories.
● Handling Clutter: Recognizing the main category in images with complex
backgrounds or multiple objects.
Datasets:
● ImageNet: A large-scale dataset commonly used for image classification tasks,
consisting of a vast variety of object categories.
● CIFAR-10 and CIFAR-100: Datasets with smaller images and multiple
categories, often used for benchmarking image categorization models.
● Open Images: A dataset with a large number of annotated images covering
diverse categories.
Evaluation Metrics:
● Top-k Accuracy: Measures the proportion of images for which the correct category is
among the top-k predicted categories.
● Confusion Matrix: Provides a detailed breakdown of correct and incorrect predictions
across different categories.
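
Both metrics can be computed directly from a table of per-class scores. The sketch below assumes one row of scores per image and integer class labels; the four-image, three-class data is invented for illustration.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

def confusion_matrix(scores, labels, num_classes):
    """matrix[t][p] counts samples with true class t and predicted (top-1) class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=lambda i: row[i])
        matrix[label][predicted] += 1
    return matrix

# Invented class scores for four images over three classes.
scores = [
    [0.70, 0.20, 0.10],   # true class 0: correct at top-1
    [0.40, 0.35, 0.25],   # true class 1: wrong at top-1, correct at top-2
    [0.50, 0.30, 0.20],   # true class 2: wrong even at top-2
    [0.10, 0.10, 0.80],   # true class 2: correct at top-1
]
labels = [0, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
matrix = confusion_matrix(scores, labels, num_classes=3)
print(top1, top2)
for row in matrix:
    print(row)
```

Off-diagonal entries of the matrix show which classes are confused with which; here every error is a prediction of class 0.
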
Multi-Label Categorization:
● Definition: Extends category recognition to handle cases where an image may belong
to multiple categories simultaneously.
● Applications: Useful in scenarios where images can have complex content that falls
into multiple distinct categories.
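
A common way to obtain multiple labels, sketched here under stated assumptions, is to score each category independently with a sigmoid and keep every category above a threshold, rather than picking the single highest score. The class names, logits, and 0.5 threshold are illustrative values, not output from a real model.

```python
import math

def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, class_names, threshold=0.5):
    """Return every class whose independent probability reaches the threshold."""
    return [name for name, z in zip(class_names, logits) if sigmoid(z) >= threshold]

# Illustrative raw scores for one image.
class_names = ["beach", "dog", "car", "sunset"]
logits = [2.1, 0.4, -3.0, 1.2]

predicted = predict_labels(logits, class_names)
print(predicted)
```

Because each category is decided on its own, the image can receive several labels or none, which single-label classification cannot express.
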
Real-world Applications:
● E-commerce: Categorizing product images for online shopping platforms.
● Content Moderation: Identifying and categorizing content for moderation purposes,
such as detecting inappropriate or unsafe content.
● Automated Tagging: Automatically categorizing and tagging images in digital
libraries or social media platforms.
Future Trends:
● Weakly Supervised Learning: Exploring methods that require less
annotated data for training, such as weakly supervised or self-supervised learning for
category recognition.

● Interpretable Models: Developing models that provide insights into the
decision-making process for better interpretability and trustworthiness.

Category recognition forms the basis for various applications in image understanding
and retrieval, providing a way to organize and interpret visual information at a broader
level. Advances in deep learning and the availability of large-scale datasets continue to drive
improvements in the accuracy and scalability of category recognition models.

10. Context and Scene Understanding:

Definition:

● Context and Scene Understanding in computer vision involves
comprehending the overall context of a scene, recognizing relationships
between objects, and understanding the semantic meaning of the visual elements
within an image or a sequence of images.
Scene Understanding vs. Object Recognition:
● Object Recognition: Focuses on identifying and categorizing individual objects
within an image.
● Scene Understanding: Encompasses a broader understanding of the
relationships, interactions, and contextual information that characterize the overall
scene.
Elements of Context and Scene Understanding:
● Spatial Relationships: Understanding the spatial arrangement and relative positions of
objects within a scene.
● Temporal Context: Incorporating information from a sequence of images or frames
to understand changes and dynamics over time.
● Semantic Context: Recognizing the semantic relationships and meanings associated
with objects and their interactions.
Methods:
● Graph-based Representations: Modeling scenes as graphs, where nodes
represent objects and edges represent relationships, to capture contextual information.
● Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM):
Utilizing recurrent architectures for processing sequences of images and capturing
temporal context.

● Graph Neural Networks (GNNs): Applying GNNs to model complex
relationships and dependencies in scenes.
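
The graph-based representation described above can be sketched as a small data structure: nodes hold object categories and directed, labelled edges hold relationships. The class name `SceneGraph`, the object ids, and the example relations are hypothetical and serve only to show the shape of the data a graph-based method would consume.

```python
from collections import defaultdict

class SceneGraph:
    """Objects as nodes, pairwise relationships as directed labelled edges."""

    def __init__(self):
        self.objects = {}                 # object id -> category name
        self.edges = defaultdict(list)    # subject id -> [(predicate, object id)]

    def add_object(self, obj_id, category):
        self.objects[obj_id] = category

    def add_relation(self, subject, predicate, obj):
        self.edges[subject].append((predicate, obj))

    def triples(self):
        """All (subject category, predicate, object category) triples in the scene."""
        return [(self.objects[s], p, self.objects[o])
                for s, relations in self.edges.items()
                for p, o in relations]

# A hypothetical street scene.
graph = SceneGraph()
graph.add_object("o1", "person")
graph.add_object("o2", "bicycle")
graph.add_object("o3", "road")
graph.add_relation("o1", "riding", "o2")
graph.add_relation("o2", "on", "o3")
triples = graph.triples()
print(triples)
```

A graph neural network would attach feature vectors to these nodes and edges and pass messages along them; the structure itself stays the same.
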
Applications:
● Autonomous Vehicles: Scene understanding is critical for autonomous
navigation, as it involves comprehending the road, traffic, and dynamic elements in
the environment.
● Robotics: Enabling robots to understand and navigate through indoor and outdoor
environments.
● Augmented Reality: Integrating virtual objects into the real world in a way that
considers the context and relationships with the physical
environment.
● Surveillance and Security: Enhancing the analysis of surveillance footage by
understanding activities and anomalies in scenes.
Challenges:
● Ambiguity: Scenes can be ambiguous, and objects may have multiple
interpretations depending on context.
● Scale and Complexity: Handling large-scale scenes with numerous objects and complex
interactions.
● Dynamic Environments: Adapting to changes in scenes over time, especially
in dynamic and unpredictable environments.
Semantic Segmentation and Scene Parsing:
● Semantic Segmentation: Assigning semantic labels to individual pixels in an image,
providing a detailed understanding of object boundaries.
● Scene Parsing: Extending semantic segmentation to recognize and understand
the overall scene layout and context.
Hierarchical Representations:
● Multiscale Representations: Capturing information at multiple scales, from individual
objects to the overall scene layout.
● Hierarchical Models: Employing hierarchical structures to represent objects,
sub-scenes, and the global context.
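
One concrete multiscale representation is an image pyramid, in which each level is a half-resolution copy of the previous one. The sketch below is an assumption-laden simplification: it downsamples by averaging 2x2 blocks, whereas a Gaussian pyramid would smooth with a Gaussian kernel before subsampling.

```python
def downsample(image):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def build_pyramid(image, levels):
    """Return [full resolution, half, quarter, ...] with at most `levels` entries."""
    pyramid = [image]
    while (len(pyramid) < levels
           and len(pyramid[-1]) >= 2 and len(pyramid[-1][0]) >= 2):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

# A toy 4x4 grayscale image made of four flat 2x2 regions.
image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
pyramid = build_pyramid(image, levels=3)
for level in pyramid:
    print(level)
```

Fine levels keep object detail while coarse levels summarise layout, which is the sense in which the representation spans "individual objects to the overall scene".
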
Context-Aware Object Recognition:
● Definition: Enhancing object recognition by considering the contextual
information surrounding objects.
● Example: Understanding that a "bat" in a scene with a ball and a glove is likely
associated with the sport of baseball.
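
The "bat" example can be turned into a small numerical sketch. It assumes an appearance-only classifier has produced ambiguous scores for two readings of "bat", and that a table of co-occurrence weights with other detected objects is available; every number below is invented for illustration, and multiplying the weights treats the context objects as independent, which is a simplifying assumption.

```python
def rescore(candidates, context_objects, cooccurrence):
    """Scale each candidate's appearance score by its fit with the context, then renormalise."""
    adjusted = {}
    for label, score in candidates.items():
        for other in context_objects:
            # Pairs missing from the table get a small default weight.
            score *= cooccurrence.get((label, other), 0.1)
        adjusted[label] = score
    total = sum(adjusted.values())
    return {label: score / total for label, score in adjusted.items()}

# Invented appearance scores: on its own, the classifier slightly prefers the animal.
candidates = {"baseball bat": 0.45, "bat (animal)": 0.55}

# Invented co-occurrence weights between a label and other objects in the scene.
cooccurrence = {
    ("baseball bat", "ball"): 0.8,
    ("baseball bat", "glove"): 0.9,
    ("bat (animal)", "ball"): 0.1,
    ("bat (animal)", "glove"): 0.05,
}

result = rescore(candidates, ["ball", "glove"], cooccurrence)
best = max(result, key=result.get)
print(best, round(result[best], 3))
```

With the ball and glove taken into account, the baseball reading dominates even though the appearance score alone favoured the animal.
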
Future Directions:
● Cross-Modal Understanding: Integrating information from different
modalities, such as combining visual and textual information for a more
comprehensive understanding.

● Explainability and Interpretability: Developing models that can provide
explanations for their decisions to enhance transparency and trust.

Context and scene understanding are essential for creating intelligent systems that can interpret and
interact with the visual world in a manner similar to human perception.
Ongoing research in this field aims to improve the robustness, adaptability, and
interpretability of computer vision systems in diverse real-world scenarios.

11. Recognition Databases and Test Sets:

Recognition databases and test sets play a crucial role in the development and evaluation of
computer vision algorithms, providing standardized datasets for
training, validating, and benchmarking various recognition tasks. These datasets
often cover a wide range of domains, from object recognition to scene
understanding. Here are some commonly used recognition databases and test sets:

ImageNet:
● Task: Image Classification, Object Recognition
● Description: ImageNet contains millions of labeled images across thousands of
categories. Its ImageNet Large Scale Visual Recognition Challenge (ILSVRC) subset,
which covers 1,000 categories, is a widely used benchmark for image classification
and object detection.
COCO (Common Objects in Context):
● Tasks: Object Detection, Instance Segmentation, Keypoint Detection
● Description: COCO is a large-scale dataset that includes complex scenes with
multiple objects and diverse annotations. It is commonly used for evaluating
algorithms in object detection and segmentation tasks.
PASCAL VOC (Visual Object Classes):
● Tasks: Object Detection, Image Segmentation, Object Recognition
● Description: PASCAL VOC datasets provide annotated images with various
object categories. They are widely used for benchmarking object detection and
segmentation algorithms.
MOT (Multiple Object Tracking) Datasets:
● Task: Multiple Object Tracking
● Description: MOT datasets focus on tracking multiple objects in video sequences.
They include challenges related to object occlusion,
appearance changes, and interactions.

KITTI Vision Benchmark Suite:
● Tasks: Object Detection, Stereo, Visual Odometry
● Description: The KITTI dataset is designed for autonomous driving research
and includes tasks such as object detection, stereo estimation, and visual odometry
using data collected from a car.
ADE20K:
● Tasks: Scene Parsing, Semantic Segmentation
● Description: ADE20K is a dataset for semantic segmentation and scene
parsing. It contains images with detailed annotations for pixel-level object categories
and scene labels.
Cityscapes:
● Tasks: Semantic Segmentation, Instance Segmentation
● Description: The Cityscapes dataset focuses on urban scenes and is
commonly used for semantic segmentation and instance segmentation tasks in the
context of autonomous driving and robotics.
CelebA:
● Tasks: Face Recognition, Attribute Recognition
● Description: CelebA is a dataset containing images of celebrities with annotations
for face recognition and attribute recognition tasks.
LFW (Labeled Faces in the Wild):
● Task: Face Verification
● Description: The LFW dataset is widely used for face verification tasks,
consisting of images of faces collected from the internet with labeled pairs of
matching and non-matching faces.
Open Images Dataset:
● Tasks: Object Detection, Image Classification
● Description: The Open Images Dataset is a large-scale dataset that includes
images with annotations for object detection, image classification, and visual
relationship prediction.

These recognition databases and test sets serve as benchmarks for evaluating the
performance of computer vision algorithms. They provide standardized and diverse
data, allowing researchers and developers to compare the effectiveness of different approaches
across a wide range of tasks and applications.
