Computer Vision Notes
UNIT-1
 INTRODUCTION TO IMAGE FORMATION AND PROCESSING
 Computer Vision - Geometric primitives and transformations – Photometric
 image formation-The digital camera-Point operators- Linear filtering - More
 neighborhood operators - Fourier transforms - Pyramids and wavelets -
 Geometric transformations - Global optimization.
1. Computer Vision:
 Computer vision is a multidisciplinary field that enables machines to interpret and make
 decisions based on visual data. It involves the development of algorithms and systems
 that allow computers to gain high-level understanding from digital images or videos. The
 goal of computer vision is to replicate and improve upon human vision capabilities,
 enabling machines to recognize and understand visual information.
 2. Object Detection: Locating and classifying multiple objects within an image or video
 stream.
B.Tech [AIML/DS]
EAIDS254 – COMPUTER VISION JEPPIAAR UNIVERSITY
8. 3D Reconstruction: Creating
 Computer vision applications are diverse and found in various fields, including
 healthcare (medical image analysis), autonomous vehicles, surveillance,
 augmented reality, robotics, industrial automation, and more. Advances in deep learning,
 especially convolutional neural networks (CNNs), have significantly contributed to the
 progress and success of computer vision tasks by enabling efficient feature learning from
 large datasets.
 Geometric primitives and transformations are fundamental concepts in computer graphics and computer vision.
 They form the basis for representing and manipulating visual elements in both 2D and 3D spaces. Let's explore
 each of these concepts:
 Geometric Primitives:
 1. Points: Represented by coordinates (x, y) in 2D or (x, y, z) in 3D space.
2. Lines and Line Segments: Defined by two points or a point and a direction vector.
 3. Polygons: Closed shapes with straight sides. Triangles, quadrilaterals, and other polygons are common
 geometric primitives.
4. Circles and Ellipses: Defined by a center point and radii (or axes in the case of ellipses).
5. Curves: Bézier curves, spline curves, and other parametric curves are used to represent smooth shapes.
 Geometric Transformations:
Geometric transformations involve modifying the position, orientation, and scale of geometric primitives.
Common transformations include
4. Shearing: Distorts the shape of an object by stretching or compressing along one or more axes.
Applications:
 Computer Graphics: Geometric primitives and transformations are fundamental for rendering 2D and 3D
 graphics in applications such as video games, simulations, and virtual reality.
Computer-Aided Design (CAD): Used for designing and modeling objects in engineering and architecture.
 Computer Vision: Geometric transformations are applied to align and process images, correct distortions, and
 perform other tasks in image analysis.
Robotics: Essential for robot navigation, motion planning, and spatial reasoning.
 Understanding geometric primitives and transformations is crucial for creating realistic and visually appealing
 computer-generated images, as well as for solving various problems in computer vision and robotics.
Photometric image formation refers to the process by which light interacts with surfaces and is captured by a
camera, resulting in the creation of a digital image. This process involves various factors related to the properties
of light, the surfaces of objects, and the characteristics of the imaging system. Understanding photometric image
formation is crucial in computer vision, computer graphics, and image processing.
Illumination:
- Ambient Light: The overall illumination of a scene that comes from all directions.
- Directional Light: Light coming from a specific direction, which can create highlights and shadows.
Reflection:
- Diffuse Reflection: Light that is scattered in various directions by rough surfaces.
- Specular Reflection: Light that reflects off smooth surfaces in a concentrated direction, creating highlights.
Shading:
- Lambertian Shading: A model of ideal diffuse reflection in which brightness depends on the angle between the surface normal and the light direction, not on the viewing direction.
- Phong Shading: A more sophisticated model that considers specular reflection, creating more realistic
highlights.
Surface Properties:
- Reflectance Properties: Material characteristics that determine how light is reflected (e.g., diffuse and specular
reflectance).
- Albedo: The inherent reflectivity of a surface, representing the fraction of incident light that is reflected.
Lighting Models:
- Phong Lighting Model: Combines diffuse and specular reflection components to model lighting.
- Blinn-Phong Model: Similar to the Phong model but computationally more efficient.
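The reflection models listed above can be compared with a minimal sketch. It assumes a single directional light and scalar intensities; the function names and the coefficients `kd`, `ks`, and `shininess` are illustrative choices, not values taken from these notes.

```python
import numpy as np

def normalize(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def lambertian(normal, light_dir, albedo=1.0):
    """Diffuse intensity: albedo times the cosine of the angle between normal and light."""
    n, l = normalize(normal), normalize(light_dir)
    return albedo * max(0.0, float(n @ l))

def phong(normal, light_dir, view_dir, kd=0.7, ks=0.3, shininess=16):
    """Diffuse term plus a specular term based on the mirror-reflected light direction."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = float(n @ l)
    if n_dot_l <= 0.0:
        return 0.0  # surface faces away from the light
    r = 2.0 * n_dot_l * n - l  # reflection of the light direction about the normal
    return kd * n_dot_l + ks * max(0.0, float(r @ v)) ** shininess

def blinn_phong(normal, light_dir, view_dir, kd=0.7, ks=0.3, shininess=16):
    """Same idea as Phong, but the specular term uses the half vector between light and view."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = float(n @ l)
    if n_dot_l <= 0.0:
        return 0.0
    h = normalize(l + v)  # half vector avoids computing the reflection vector
    return kd * n_dot_l + ks * max(0.0, float(n @ h)) ** shininess
```

With the light and the viewer both directly above the surface, all three models return their maximum intensity; tilting the light lowers the diffuse term by the cosine of the tilt angle.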
Shadows:
- Cast Shadows: Darkened areas on surfaces where light is blocked by other objects.
- Self Shadows: Shadows cast by parts of an object onto itself.
Cameras:
- Camera Exposure: The amount of light allowed to reach the camera sensor or film.
- Camera Response Function: Describes how a camera responds to light of different intensities.
A digital camera is an electronic device that captures and stores digital images. It differs from traditional film
cameras in that it uses electronic sensors to record images rather than photographic film. Digital cameras have
become widespread due to their convenience, ability to instantly review images, and ease of sharing and storing
photos digitally. Here are key components and concepts related to digital cameras:
Image Sensor:
- Digital cameras use image sensors (such as CCD or CMOS) to convert light into electrical signals.
- The sensor captures the image by measuring the intensity of light at each pixel location.
Lens:
- The lens focuses light onto the image sensor.
- Zoom lenses allow users to adjust the focal length, providing optical zoom.
Aperture:
- The aperture is an adjustable opening in the lens that controls the amount of light entering the camera.
Shutter:
- The shutter mechanism controls the duration of light exposure to the image sensor.
- Fast shutter speeds freeze motion, while slower speeds create motion blur.
Image Processor:
- Digital cameras include a built-in image processor to convert raw sensor data into a viewable image.
- Image processing algorithms may enhance color, sharpness, and reduce noise.
Memory Card:
- Digital images are stored on removable memory cards, such as SD or CF cards.
- Memory cards provide a convenient and portable way to store and transfer images.
White Balance:
- White balance settings adjust the color temperature of the captured image to match different lighting
conditions.
Connectivity:
- USB, HDMI, or wireless connectivity allows users to transfer images to computers, share online, or connect to
other devices.
Battery:
- Digital cameras are powered by rechargeable batteries, providing the necessary energy for capturing and
processing images.
 5. Point operators:
Point operators, also known as point processing or pixel-wise operations, are basic image processing operations that
operate on individual pixels independently. These operations are applied to each pixel in an image without considering the
values of neighboring pixels. Point operators typically involve mathematical operations or functions that transform the
pixel values, resulting in changes to the image's appearance. Here are some common point operators:
Brightness Adjustment:
- Addition/Subtraction: Increase or decrease the intensity of all pixels by adding or subtracting a constant value.
- Multiplication/Division: Scale the intensity values by multiplying or dividing them by a constant factor.
Contrast Adjustment:
- Linear Contrast Stretching: Rescale the intensity values to cover the full dynamic range.
- Histogram Equalization: Adjust the distribution of pixel intensities to enhance contrast.
Gamma Correction:
- Adjust the gamma value to control the overall brightness and contrast of an image.
Thresholding:
- Convert a grayscale image to binary by setting a threshold value. Pixels with values above the threshold become white,
and those below become black.
Bit-plane Slicing:
- Decompose an image into its binary representation by considering individual bits.
Color Mapping:
- Apply color transformations to change the color balance or convert between color spaces (e.g., RGB to grayscale).
Inversion:
- Invert the intensity values of pixels, turning bright areas dark and vice versa.
Image Arithmetic:
- Perform arithmetic operations between pixels of two images, such as addition, subtraction, multiplication, or division.
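Several of the point operators above can be sketched in a few lines of NumPy. The sketch assumes 8-bit grayscale images stored as `uint8` arrays; the function names are illustrative.

```python
import numpy as np

def adjust_brightness(img, offset):
    """Add a constant to every pixel, clipping to the valid 0..255 range."""
    return np.clip(img.astype(np.int16) + offset, 0, 255).astype(np.uint8)

def stretch_contrast(img):
    """Linearly rescale intensities so they span the full 0..255 range."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:
        return img.copy()  # flat image: nothing to stretch
    out = (img.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return np.round(out).astype(np.uint8)

def gamma_correct(img, gamma):
    """Apply out = 255 * (in / 255) ** gamma to every pixel."""
    norm = img.astype(np.float64) / 255.0
    return np.round(255.0 * norm ** gamma).astype(np.uint8)

def threshold(img, t):
    """Pixels above t become white (255), all others black (0)."""
    return np.where(img > t, 255, 0).astype(np.uint8)

def invert(img):
    """Swap bright and dark: out = 255 - in."""
    return 255 - img
```

Each function looks only at the pixel's own value, which is what makes these operations independent of neighboring pixels.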
Point operators are foundational in image processing and form the basis for more complex operations. They are
often used in combination to achieve desired enhancements or modifications to images. These operations are
computationally efficient, as they can be applied independently to each pixel, making them suitable for real-time
applications and basic image manipulation tasks.
It's important to note that while point operators are powerful for certain tasks, more advanced image processing
techniques, such as filtering and convolution, involve considering the values of neighboring pixels and are
applied to local image regions.
Linear filtering:
Linear filtering is a fundamental concept in image processing that involves applying a linear operator to an
image. The linear filter operates on each pixel in the image by combining its value with the values of its
neighboring pixels according to a predefined convolution kernel or matrix. The convolution operation is a
mathematical operation that computes the weighted sum of pixel values in the image, producing a new value for
the center pixel.
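In standard notation, the convolution of an image $f$ with a kernel $h$ is
$$g(i, j) = \sum_{k}\sum_{l} f(i - k,\, j - l)\, h(k, l)$$
Where:
- $f$ is the input image and $g$ is the filtered output image.
- $h$ is the convolution kernel, and $k$, $l$ index its rows and columns.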
Edge Detection:
- Sobel filter: Emphasizes edges by computing gradients in the x and y directions.
- Prewitt filter: Similar to Sobel but uses a different kernel for gradient computation.
Sharpening:
- Laplacian filter: Enhances high-frequency components to highlight edges.
- High-pass filter: Emphasizes details by subtracting a blurred version of the image.
Embossing:
- Applies an embossing effect by highlighting changes in intensity.
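The weighted-sum operation behind these filters can be sketched directly. This is a minimal, unoptimized version that assumes an odd-sized kernel and zero padding at the image border; the names `convolve2d` and `sobel_magnitude` are illustrative.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding (kernel sides assumed odd)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel; correlation does not
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

def sobel_magnitude(image):
    """Gradient magnitude from the horizontal and vertical Sobel responses."""
    gx = convolve2d(image, SOBEL_X)
    gy = convolve2d(image, SOBEL_Y)
    return np.hypot(gx, gy)
```

On an image containing a vertical step edge, the magnitude is large along the step and zero in the flat regions on either side. The zero padding also produces a response along the image border, which is an artifact of this sketch.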
Linear filtering is a versatile technique and forms the basis for more advanced image processing operations.
Convolution is also the core operation of convolutional neural networks (CNNs) in deep learning, where the
filters are learned during training rather than designed by hand, for tasks such as image recognition,
segmentation, and denoising. The choice of filter kernel and parameters determines the specific effect achieved
through linear filtering.
 6. More neighborhood operators :
Neighborhood operators in image processing involve the consideration of pixel values in the vicinity of a target
pixel, usually within a defined neighborhood or window. Unlike point operators that operate on individual pixels,
neighborhood operators take into account the local structure of the image. Here are some common neighborhood
operators:
Median Filter:
- Computes the median value of pixel intensities within a local neighborhood.
- Effective for removing salt-and-pepper noise while preserving edges.
Gaussian Filter:
- Applies a weighted average to pixel values using a Gaussian distribution.
- Used for blurring and smoothing; it suppresses noise but also softens edges.
Bilateral Filter:
- Combines spatial and intensity information to smooth images while preserving edges.
- Uses two Gaussian distributions, one for spatial proximity and one for intensity similarity.
Anisotropic Diffusion:
- Reduces noise while preserving edges by iteratively diffusing intensity values within regions and along edges rather than across them.
- Particularly useful for images with strong edges.
Morphological Operators:
- Dilation: Expands bright regions by considering the maximum pixel value in a neighborhood.
- Erosion: Contracts bright regions by considering the minimum pixel value in a neighborhood.
- Used for operations like noise reduction, object segmentation, and shape analysis.
Homomorphic Filtering:
- Adjusts image intensity by separating the image into illumination and reflectance components.
- Useful for enhancing images with non-uniform illumination.
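Three of the operators above (median filtering, dilation, and erosion) share one pattern: gather each pixel's neighborhood, then reduce it to a single value. The sketch below assumes a square window and edge-replicating padding; the helper `_windows` is a hypothetical name introduced for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(image, size):
    """Return an array of size x size neighborhoods, one per pixel."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    return sliding_window_view(padded, (size, size))

def median_filter(image, size=3):
    """Replace each pixel by the median of its neighborhood."""
    return np.median(_windows(image, size), axis=(-2, -1))

def dilate(image, size=3):
    """Grayscale dilation: neighborhood maximum."""
    return _windows(image, size).max(axis=(-2, -1))

def erode(image, size=3):
    """Grayscale erosion: neighborhood minimum."""
    return _windows(image, size).min(axis=(-2, -1))
```

A single bright "salt" pixel on a dark background is removed by the median filter and by erosion, while dilation grows it into a 3 x 3 bright square.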
These neighborhood operators play a crucial role in image enhancement, denoising, edge detection, and other
image processing tasks. The choice of operator depends on the specific characteristics of the image and the
desired outcome.
7. Fourier transforms:
Fourier transforms play a significant role in computer vision for analyzing and processing images. They are used
to decompose an image into its frequency components, providing valuable information for tasks such as image
filtering, feature extraction, and pattern recognition. Here are some ways Fourier transforms are employed in
computer vision:
Frequency Analysis:
- Fourier transforms help in understanding the frequency content of an image. High-frequency components
correspond to edges and fine details, while low-frequency components represent smooth regions.
Image Filtering:
- Filtering in the frequency domain allows for efficient operations such as blurring or sharpening. Low-pass filters
remove high-frequency noise, while high-pass filters enhance edges and fine details.
Image Enhancement:
- Adjusting the amplitude of specific frequency components can enhance or suppress certain features in an
image. This is commonly used in image enhancement techniques.
Texture Analysis:
- Fourier analysis is useful in characterizing and classifying textures based on their frequency characteristics. It
helps distinguish between textures with different patterns.
Pattern Recognition:
- Fourier descriptors, which capture shape information, are used for representing and recognizing objects in
images. They provide a compact representation of shape by capturing the dominant frequency components.
Image Compression:
- Transform-based image compression, such as JPEG, uses the discrete cosine transform (DCT), a close relative of
the Fourier transform, to move image data into the frequency domain. This allows for efficient quantization and
coding of frequency components.
Image Registration:
- Fourier transforms are used in image registration, aligning images or transforming them to a common
coordinate system. Cross-correlation in the frequency domain is often employed for this purpose.
Homomorphic Filtering:
- Homomorphic filtering takes the logarithm of the image so that illumination and reflectance become additive,
filters the result in the frequency domain using Fourier transforms, and then exponentiates. It is used in
applications such as document analysis and enhancement.
Image Reconstruction:
- Fourier transforms are involved in techniques like computed tomography (CT) or magnetic resonance imaging
(MRI) for reconstructing images from their projections.
The efficient computation of Fourier transforms, particularly through the use of the Fast Fourier Transform (FFT)
algorithm, has made these techniques computationally feasible for real-time applications in computer vision. The ability to
analyze images in the frequency domain provides valuable insights and contributes to the development of advanced image
processing techniques.
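Frequency-domain filtering can be illustrated with an ideal low-pass filter built on NumPy's FFT routines. This is a sketch under simple assumptions: a grayscale image, a circular cutoff measured in frequency bins, and a hard (non-smooth) mask, which is known to cause ringing on real images.

```python
import numpy as np

def lowpass_fft(image, cutoff):
    """Keep only frequency components within `cutoff` bins of the zero frequency."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move zero frequency to the center
    h, w = image.shape
    y, x = np.ogrid[:h, :w]
    dist = np.hypot(y - h // 2, x - w // 2)  # distance of each bin from the center
    mask = dist <= cutoff
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.real(filtered)
```

Applied to a constant image plus a pixel-level checkerboard (the highest representable frequency), a small cutoff removes the checkerboard and leaves only the constant level.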
Image Pyramids:
Image pyramids are a series of images representing the same scene but at different resolutions. There are two main types of
image pyramids:
Gaussian Pyramid:
- Created by repeatedly applying Gaussian smoothing and downsampling to an image.
- At each level, the image is smoothed to remove high-frequency information, and then it is subsampled to reduce its size.
- Useful for tasks like image blending, image matching, and coarse-to-fine image processing.
Laplacian Pyramid:
- Derived from the Gaussian pyramid.
- Each level of the Laplacian pyramid is obtained by subtracting the expanded version of the next coarser Gaussian level
from the Gaussian level at the current resolution.
- Useful for image compression and coding, where the Laplacian pyramid represents the residual information not captured
by the Gaussian pyramid.
Image pyramids are especially useful for creating multi-scale representations of images, which can be beneficial for various
computer vision tasks.
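The two pyramid constructions can be sketched as follows. The sketch assumes a 5-tap binomial kernel as the smoothing filter and pixel repetition followed by smoothing as the expand step; both are common choices but are assumptions here, not details given in these notes.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0  # binomial approximation of a Gaussian

def blur(image):
    """Separable 5-tap smoothing with edge-replicating padding."""
    h, w = image.shape
    padded = np.pad(image, 2, mode="edge")
    rows = sum(KERNEL[k] * padded[:, k:k + w] for k in range(5))   # horizontal pass
    return sum(KERNEL[k] * rows[k:k + h, :] for k in range(5))     # vertical pass

def downsample(image):
    """Smooth, then keep every second row and column."""
    return blur(image)[::2, ::2]

def upsample(image, shape):
    """Expand to `shape` by pixel repetition followed by smoothing."""
    up = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return blur(up[:shape[0], :shape[1]])

def gaussian_pyramid(image, levels):
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def laplacian_pyramid(image, levels):
    g = gaussian_pyramid(image, levels)
    lap = [g[i] - upsample(g[i + 1], g[i].shape) for i in range(levels - 1)]
    lap.append(g[-1])  # the coarsest level is stored as-is
    return lap

def reconstruct(lap):
    """Invert the Laplacian pyramid by adding back each expanded coarser level."""
    image = lap[-1]
    for level in reversed(lap[:-1]):
        image = level + upsample(image, level.shape)
    return image
```

Because each Laplacian level stores exactly what the expanded coarser level fails to capture, adding the levels back in reverse order recovers the original image.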
 Wavelets:
Wavelets are mathematical functions that can be used to analyze signals and images. Wavelet transforms provide a multi-resolution
analysis by decomposing an image into approximation (low-frequency) and detail (high-frequency) components.
Key concepts include:
Wavelet Transform:
- The wavelet transform decomposes an image into different frequency components by convolving the image with wavelet
functions.
- The result is a set of coefficients that represent the image at various scales and orientations.
Multi-resolution Analysis:
- Wavelet transforms offer a multi-resolution analysis, allowing the representation of an image at different scales.
- The approximation coefficients capture the low-frequency information, while detail coefficients capture high-frequency
information.
Haar Wavelet:
- The Haar wavelet is a simple wavelet function used in basic wavelet transforms.
- It represents changes in intensity between adjacent pixels.
Wavelet Compression:
- Wavelet-based image compression techniques, such as JPEG2000, utilize wavelet transforms to efficiently represent
image data in both spatial and frequency domains.
Image Denoising:
- Wavelet-based thresholding techniques can be applied to denoise images by thresholding the wavelet coefficients.
Edge Detection:
- Wavelet transforms can be used for edge detection by analyzing the high-frequency components of the image.
Both pyramids and wavelets offer advantages in multi-resolution analysis, but they differ in terms of their representation
and construction. Pyramids use a hierarchical structure of smoothed and subsampled images, while wavelets use a
transform-based approach that decomposes the image into frequency components. The choice between pyramids and
wavelets often depends on the specific requirements of the image processing task at hand.
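A single level of the 2D Haar wavelet transform makes the approximation/detail split concrete. The sketch assumes an image with even height and width and uses the orthonormal scaling factor 1/sqrt(2); the sub-band names `ll`, `lh`, `hl`, `hh` are conventional labels used here for illustration.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(image):
    """One-level 2D Haar transform; returns approximation and three detail sub-bands."""
    a = np.asarray(image, dtype=float)
    lo = (a[:, 0::2] + a[:, 1::2]) / SQRT2  # averages of horizontally adjacent pixels
    hi = (a[:, 0::2] - a[:, 1::2]) / SQRT2  # differences of horizontally adjacent pixels
    ll = (lo[0::2] + lo[1::2]) / SQRT2      # approximation (low-frequency) coefficients
    lh = (lo[0::2] - lo[1::2]) / SQRT2      # detail: changes between rows
    hl = (hi[0::2] + hi[1::2]) / SQRT2      # detail: changes between columns
    hh = (hi[0::2] - hi[1::2]) / SQRT2      # detail: diagonal changes
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    lo = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2], hi[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    out = np.empty((lo.shape[0], lo.shape[1] * 2))
    out[:, 0::2], out[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return out
```

For a constant image all detail coefficients are zero, which is why thresholding small detail coefficients is a workable basis for compression and denoising.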
 8. Geometric transformations :
Geometric transformations are operations that modify the spatial configuration of objects in a digital image. These
transformations are applied to change the position, orientation, scale, or shape of objects while preserving certain geometric
properties. Geometric transformations are commonly used in computer graphics, computer vision, and image processing.
Here are some fundamental geometric transformations:
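 1. Translation:
- Description: Moves an object by a specified distance along the x and/or y axes.
- Transformation Matrix (2D), in homogeneous coordinates, for offsets $t_x$ and $t_y$:
 $$\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}$$
 2. Rotation:
 ● Description: Rotates an object by a specified angle about a fixed point.
 ● Transformation Matrix (2D), for a counterclockwise angle $\theta$ about the origin:
 $$\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$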
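 3. Scaling:
 ● Description: Changes the size of an object by multiplying its coordinates by
 scaling factors.
 ● Transformation Matrix (2D), for scale factors $s_x$ and $s_y$:
 $$\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
 4. Shearing:
 ● Description: Distorts the shape of an object by varying its coordinates linearly.
 ● Transformation Matrix (2D), for shear factors $sh_x$ and $sh_y$:
 $$\begin{bmatrix} 1 & sh_x & 0 \\ sh_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
 5. Affine Transformation:
 ● Description: Combines translation, rotation, scaling, and shearing.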
 ● Transformation Matrix (2D), with six free parameters:
 $$\begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix}$$
 6. Perspective Transformation:
 ● Description: Represents a perspective projection, useful for simulating three-dimensional
 effects.
 ● Transformation Matrix (3D):
 7. Projective Transformation:
 ● Description: Generalization of perspective transformation with additional control points.
 ● Transformation Matrix (3D): More complex than the perspective transformation matrix.
 ● Applications: Computer graphics, augmented reality.
These transformations are crucial for various applications, including image manipulation, computer-aided design (CAD),
computer vision, and graphics rendering. Understanding and applying geometric transformations are fundamental skills in
computer science and engineering fields related to digital image processing.
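The 2D transformations in this section can be written as 3 x 3 matrices acting on homogeneous coordinates, which lets them be combined by matrix multiplication. The sketch below assumes that convention, with counterclockwise rotation about the origin; the function names are illustrative.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def shear(shx, shy):
    return np.array([[1, shx, 0], [shy, 1, 0], [0, 0, 1]], dtype=float)

def apply_transform(matrix, points):
    """Apply a 3x3 transform to an (N, 2) array of points."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    out = homogeneous @ matrix.T
    return out[:, :2] / out[:, 2:3]  # divide by w; w is 1 for affine transforms
```

Order matters when composing: `translation(2, 3) @ rotation(np.pi / 2)` rotates first and then translates, so the point (1, 0) maps to (2, 4). The final division by the third coordinate is what allows the same helper to handle projective matrices whose last row is not (0, 0, 1).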
9. Global optimization:
Global optimization is a branch of optimization that focuses on finding the global minimum or maximum of a
function over its entire feasible domain. Unlike local optimization, which aims to find the optimal solution
within a specific region, global optimization seeks the best possible solution across the entire search space.
Global optimization problems are often challenging due to the presence of multiple local optima or complex,
non-convex search spaces.
 Here are key concepts and approaches related to global optimization:
 Concepts:
Objective Function:
- The function to be minimized or maximized.
Feasible Domain:
- The set of input values (parameters) for which the objective function is defined.
Global Minimum/Maximum:
- The lowest or highest value of the objective function over the entire feasible domain.
Local Minimum/Maximum:
 ● A minimum or maximum within a specific region of the feasible domain.
 Approaches:
Grid Search:
- Dividing the feasible domain into a grid and evaluating the objective function at each grid point to find the optimal
solution.
Random Search:
- Randomly sampling points in the feasible domain and evaluating the objective function to explore different regions.
Evolutionary Algorithms:
- Genetic algorithms and related population-based techniques, such as particle swarm optimization, maintain a population of
candidate solutions and update it iteratively to move toward the optimal solution.
Simulated Annealing:
- Inspired by the annealing process in metallurgy, simulated annealing accepts occasional worse solutions with a probability
that shrinks as a temperature parameter is gradually decreased, which allows the algorithm to escape local optima.
Genetic Algorithms:
- Inspired by biological evolution, genetic algorithms use mutation, crossover, and selection to evolve a population of
potential solutions.
Bayesian Optimization:
- Utilizes probabilistic models to model the objective function and guide the search toward promising regions.
Quasi-Newton Methods:
- Iterative optimization methods that use an approximation of the Hessian matrix to find the optimal solution efficiently.
Global optimization is applied in various fields, including engineering design, machine learning,
finance, and parameter tuning in algorithmic optimization. The choice of a specific global
optimization method depends on the characteristics of the objective function, the dimensionality
of the search space, and the available computational resources.
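Two of the approaches above, grid search and simulated annealing, can be sketched on a one-dimensional test function with many local minima. The test function (a 1D Rastrigin function, whose global minimum is at x = 0) and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def rastrigin_1d(x):
    """Multimodal test function with global minimum f(0) = 0."""
    return x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0

def grid_search(f, lo, hi, n):
    """Evaluate f on n evenly spaced points and return the best point and value."""
    xs = np.linspace(lo, hi, n)
    values = f(xs)
    i = int(np.argmin(values))
    return float(xs[i]), float(values[i])

def simulated_annealing(f, x0, bounds, steps=5000, t0=5.0, cooling=0.999,
                        step_size=0.5, seed=0):
    """Random local moves; worse moves are accepted with probability exp(-delta / T)."""
    rng = np.random.default_rng(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        candidate = float(np.clip(x + rng.normal(0.0, step_size), *bounds))
        fc = f(candidate)
        if fc < fx or rng.random() < np.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # gradually lower the temperature
    return best_x, best_f
```

Grid search is exhaustive but its cost grows exponentially with the number of dimensions, whereas simulated annealing scales better at the price of a stochastic result that depends on the cooling schedule and the random seed.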
 UNIT II
 FEATURE DETECTION, MATCHING AND SEGMENTATION
 Points and patches - Edges - Lines - Segmentation - Active contours - Split and merge - Mean
 shift and mode finding - Normalized cuts - Graph cuts and energy-based methods.
1. Points and Patches:
Points:
 Applications: Points are crucial in various computer vision tasks, including
 feature matching, image registration, and object tracking. Algorithms often
 detect and use points as reference locations for comparing and analyzing images.
Patches:
Definition: Patches are small, localized regions or segments within an image.
 While "points" usually refer to specific coordinates or locations within an image, "patches"
 are small, localized regions or segments extracted from images. Both
 concepts are fundamental in various computer vision applications, providing essential
 information for tasks such as image analysis, recognition, and understanding. Points
 and patches play a crucial role in the extraction of meaningful features that contribute to the overall
 interpretation of visual data by computer vision systems.
2. Edges
 In image processing and computer vision, "edges" refer to significant changes in
 intensity or color within an image. Edges often represent boundaries or transitions
 between different objects or regions in an image. Detecting edges is a fundamental step in various
 computer vision tasks, as edges contain important information about the
 structure and content of an image. Here are
 key points about edges:
 Definition:
 3. Lines
 In the context of image processing and computer vision, "lines" refer to straight or
 curved segments within an image. Detecting and analyzing lines is a fundamental aspect of
 image understanding and is important in various computer vision
 applications. Here are key points about lines:
 Definition:
 ● A line is a set of connected pixels with similar characteristics, typically representing
 a continuous or approximate curve or straight segment within an image.
 Line Detection:
 ● Line detection is the process of identifying and extracting lines from an
 image. The Hough Transform is a popular technique used for line detection, especially
 for straight lines.
 Types of Lines:
 ● Straight Lines: Linear segments with a constant slope.
 ● Curved Lines: Non-linear segments with varying curvature.
 ● Line Segments: Partial lines with a starting and ending point.
 Applications:
 ● Object Detection: Lines can be important features in recognizing and
 understanding objects within an image.
 ● Lane Detection: In the context of autonomous vehicles, detecting and tracking
 lanes on a road.
 ● Document Analysis: Recognizing and extracting lines of text in document images.
 ● Industrial Inspection: Inspecting and analyzing patterns or structures in
 manufacturing processes.
 Representation:
 ● Lines can be represented using mathematical equations, such as the slope-intercept
 form (y = mx + b) for straight lines.
 Challenges:
 ● Line detection may be affected by noise in the image or variations in lighting
 conditions. Robust algorithms are needed to handle these
 challenges.
 Line Segmentation:
 ● Line segmentation involves dividing an image into segments based on the presence of
 lines. This is useful in applications like document layout analysis and text
 extraction.
 Hough Transform:
 ● The Hough Transform is a widely used technique for detecting lines in an
 image. It represents lines in a parameter space and identifies peaks in this space as
 potential lines.
 In summary, lines are important features in images and play a crucial role in computer vision applications.
 Detecting and understanding lines contribute to tasks such as object
 recognition, image segmentation, and analysis of structural patterns. The choice of line
 detection methods depends on the specific characteristics of the image and the goals of the computer
 vision application.
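The voting idea behind the Hough Transform can be sketched as follows. The sketch assumes a binary edge image and the normal-form parameterization rho = x cos(theta) + y sin(theta) with theta sampled in 1-degree steps, which is a common alternative to slope-intercept form because it can represent vertical lines; the function name and accumulator layout are illustrative.

```python
import numpy as np

def hough_lines(edges):
    """Accumulate votes in (rho, theta) space for every nonzero edge pixel."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))          # largest possible |rho|
    thetas = np.deg2rad(np.arange(180))          # 0..179 degrees
    theta_idx = np.arange(len(thetas))
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, theta_idx] += 1         # offset so negative rho fits in the array
    return acc, thetas, diag
```

Each edge pixel votes for every line that could pass through it, so pixels lying on a common line pile their votes into the same accumulator cell. For a vertical line at x = 5, the cell for theta = 0 and rho = 5 receives one vote per pixel on the line. Neighboring cells can tie with the peak because of the coarse rounding, which is why practical detectors apply non-maximum suppression to the accumulator.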
 4. Segmentation
 Image segmentation is a computer vision task that involves partitioning an image into meaningful and
 semantically coherent regions or segments. The goal is to group
 together pixels or regions that share similar visual characteristics, such as color, texture, or intensity.
 Image segmentation is a crucial step in various computer vision
 applications as it provides a more detailed and meaningful understanding of the content within an image.
 Here are key points about image segmentation:
Definition:
 ● Image segmentation is the process of dividing an image into distinct and
 meaningful segments. Each segment typically corresponds to a region or object in the
 image.
 Purpose:
 ● Segmentation is used to simplify the representation of an image, making it easier to
 analyze and understand. It helps in identifying and delineating
 different objects or regions within the image.
 Types of Segmentation:
 ● Semantic Segmentation: Assigning a specific class label to each pixel in
 the image, resulting in a detailed understanding of the object categories present.
 ● Instance Segmentation: Identifying and delineating individual instances of objects
 within the image. Each instance is assigned a unique label.
 ● Boundary or Edge-based Segmentation: Detecting edges or boundaries between
 different regions in the image.
 ● Region-based Segmentation: Grouping pixels into homogeneous regions based on
 similarity criteria.
 Algorithms:
 ● Various algorithms are used for image segmentation, including
 region-growing methods, clustering algorithms (e.g., K-means), watershed algorithms,
 and deep learning-based approaches using convolutional
 neural networks (CNNs).
 Applications:
 ● Object Recognition: Segmentation helps in isolating and recognizing individual
 objects within an image.
 ● Medical Imaging: Identifying and segmenting structures or anomalies in medical
 images.
 ● Autonomous Vehicles: Segmenting the environment to detect and
 understand objects on the road.
 ● Satellite Image Analysis: Partitioning satellite images into meaningful regions
 for land cover classification.
 ● Robotics: Enabling robots to understand and interact with their
 environment by segmenting objects and obstacles.
 Challenges:
 ● Image segmentation can be challenging due to variations in lighting,
 complex object shapes, occlusions, and the presence of noise in the image.
 Evaluation Metrics:
 ● Common metrics for evaluating segmentation algorithms include
 Intersection over Union (IoU), Dice coefficient, and Pixel Accuracy.
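The three evaluation metrics named above can be computed directly from a predicted mask and a ground-truth mask. The sketch assumes binary masks of equal shape and treats two empty masks as a perfect match, which is a convention chosen here rather than one stated in these notes.

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union: |A and B| / |A or B|."""
    pred, target = np.asarray(pred, dtype=bool), np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(np.logical_and(pred, target).sum() / union)

def dice(pred, target):
    """Dice coefficient: 2 |A and B| / (|A| + |B|)."""
    pred, target = np.asarray(pred, dtype=bool), np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / total)

def pixel_accuracy(pred, target):
    """Fraction of pixels whose labels agree."""
    return float((np.asarray(pred) == np.asarray(target)).mean())
```

For a prediction `[[1, 1], [0, 0]]` against a ground truth `[[1, 0], [0, 0]]`, the overlap is one pixel and the union is two, giving an IoU of 0.5, a Dice coefficient of 2/3, and a pixel accuracy of 0.75. Pixel accuracy is the most forgiving of the three when the background dominates the image.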
5. Active Contours
 Active contours, also known as snakes, are a concept in computer vision and image
 processing that refers to deformable models used for image segmentation. The idea
 behind active contours is to evolve a curve or contour within an image in a way that
 captures the boundaries of objects or regions of interest. These curves deform under the influence of
 internal forces (encouraging smoothness) and external forces
 (attracted to features in the image).
Key features of active contours include:
 Initialization:
 ● Active contours are typically initialized near the boundaries of the objects to be
 segmented. The initial contour can be a closed curve or an open
 curve depending on the application.
 Energy Minimization:
 ● The evolution of the active contour is guided by an energy function that
 combines internal and external forces. The goal is to minimize this energy to achieve an
 optimal contour that fits the boundaries of the object.
 Internal Forces:
 ● Internal forces are associated with the deformation of the contour itself.
 They include terms that encourage smoothness and continuity of the
 curve. The internal energy helps prevent the contour from oscillating or exhibiting
 unnecessary deformations.
 External Forces:
 ● External forces are derived from the image data and drive the contour
 toward the boundaries of objects. These forces are attracted to features such as edges,
 intensity changes, or texture gradients in the image.
 Snakes Algorithm:
 ● The snakes algorithm is a well-known method for active contour modeling. It was
 introduced by Michael Kass, Andrew Witkin, and Demetri Terzopoulos in 1987.
 The algorithm involves iterative optimization of the
 energy function to deform the contour.
 Applications:
 ● Active contours are used in various image segmentation applications,
 such as medical image analysis, object tracking, and computer vision tasks where
 precise delineation of object boundaries is required.
 Challenges:
 ● Active contours may face challenges in the presence of noise, weak edges, or
 complex object structures. Careful parameter tuning and
 initialization are often required.
 Variations:
 ● There are variations of active contours, including geodesic active contours and level-set
 methods, which offer different formulations for contour
 evolution and segmentation.
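The energy that a snake minimizes can be made concrete with a discrete sketch. It assumes a closed contour given as (row, column) points, an elasticity weight `alpha`, a bending weight `beta`, and an external term equal to the negative edge strength sampled at the contour points; these names and this particular discretization are illustrative assumptions, not the exact formulation of the original snakes paper.

```python
import numpy as np

def snake_energy(points, edge_strength, alpha=0.1, beta=0.1):
    """Total energy of a closed contour: internal smoothness terms plus external image term."""
    pts = np.asarray(points, dtype=float)
    prev_pts = np.roll(pts, 1, axis=0)    # neighbor before each point (contour is closed)
    next_pts = np.roll(pts, -1, axis=0)   # neighbor after each point
    elastic = np.sum((next_pts - pts) ** 2)                  # penalizes stretching
    bending = np.sum((next_pts - 2.0 * pts + prev_pts) ** 2) # penalizes sharp turns
    internal = alpha * elastic + beta * bending
    rows = np.clip(np.round(pts[:, 0]).astype(int), 0, edge_strength.shape[0] - 1)
    cols = np.clip(np.round(pts[:, 1]).astype(int), 0, edge_strength.shape[1] - 1)
    external = -np.sum(edge_strength[rows, cols])            # strong edges lower the energy
    return float(internal + external)
```

With no image information the internal terms favor a shorter, smoother contour, so a small circle has lower energy than a large one. Placing strong edge responses under the contour lowers the total energy, which is what pulls the curve toward object boundaries during minimization.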
 Active contours provide a flexible framework for interactive and semi-automatic
 segmentation by allowing users to guide the evolution of the contour. While they have
 6. Split and Merge
 Split and Merge is a recursive image segmentation algorithm that divides an image into
 homogeneous regions based on certain criteria. The primary idea behind the algorithm
 is to recursively split an image into smaller blocks until certain conditions are met, and
 then merge those blocks if they are sufficiently homogeneous. This process continues
 iteratively until the desired level of segmentation is achieved.
 Here is an overview of the Split and Merge algorithm:
 Splitting Phase:
 ● The algorithm starts with the entire image as a single block.
 ● It evaluates a splitting criterion to determine if the block is sufficiently
 homogeneous or should be split further.
 ● If the splitting criterion is met, the block is divided into four equal
 sub-blocks (quadrants), and the process is applied recursively to each sub-block.
 Merging Phase:
 ● Once the recursive splitting reaches a certain level or the splitting criterion is no
 longer satisfied, the merging phase begins.
 ● Adjacent blocks are examined to check if they are homogeneous enough to be merged.
 ● If the merging criterion is satisfied, neighboring blocks are merged into a larger
 block.
 ● The merging process continues until no further merging is possible, and the
 segmentation is complete.
 Homogeneity Criteria:
 ● The homogeneity of a block or region is determined based on certain criteria, such as
 color similarity, intensity, or texture. For example, blocks may be considered
 homogeneous if the variance of pixel values within the block is below a certain
 threshold.
 Recursive Process:
 ● The splitting and merging phases are applied recursively, leading to a hierarchical
 segmentation of the image.
 Applications:
 ● Split and Merge can be used for image segmentation in various applications, including
 object recognition, scene analysis, and computer vision tasks where delineation of
 regions is essential.
 Challenges:
 ● The performance of Split and Merge can be affected by factors such as noise, uneven
 lighting, or the presence of complex structures in the image.
 The Split and Merge algorithm provides a way to divide an image into regions of
 homogeneous content, creating a hierarchical structure. While it has been used
 historically, more recent image segmentation methods often involve advanced techniques,
 such as machine learning-based approaches (e.g., convolutional neural networks) or other
 region-growing algorithms. The choice of segmentation method depends on the
 characteristics of the images and the specific requirements of the application.
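 The splitting and merging phases can be sketched on a tiny grayscale image. This is a
 minimal sketch under stated assumptions: the image is a square list of lists whose side
 is a power of two, homogeneity is tested with the variance threshold mentioned above,
 and merging compares block means with a tolerance. The helper names (split, merge,
 adjacent) and the thresholds are hypothetical.

```python
def pixels(image, leaf):
    x, y, size = leaf
    return [v for row in image[y:y + size] for v in row[x:x + size]]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split(image, x, y, size, threshold):
    """Recursively split a square block into quadrants until each is homogeneous."""
    if size == 1 or variance(pixels(image, (x, y, size))) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split(image, x + dx, y + dy, half, threshold)
    return leaves


def adjacent(a, b):
    """True when two blocks share part of an edge."""
    ax, ay, asz = a
    bx, by, bsz = b
    side = (ax + asz == bx or bx + bsz == ax) and ay < by + bsz and by < ay + asz
    stacked = (ay + asz == by or by + bsz == ay) and ax < bx + bsz and bx < ax + asz
    return side or stacked


def merge(image, leaves, tolerance):
    """Give adjacent blocks with similar mean intensity the same region label."""
    means = [sum(pixels(image, leaf)) / len(pixels(image, leaf)) for leaf in leaves]
    labels = list(range(len(leaves)))
    for i in range(len(leaves)):
        for j in range(i + 1, len(leaves)):
            if adjacent(leaves[i], leaves[j]) and abs(means[i] - means[j]) <= tolerance:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
leaves = split(image, 0, 0, 4, threshold=1.0)
labels = merge(image, leaves, tolerance=5.0)
print(leaves, labels)
```

 On this image the split phase produces four quadrants and the merge phase joins the
 three dark ones, leaving two regions.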
 7. Mean Shift and Mode Finding
 Mean Shift is a non-parametric clustering algorithm commonly used for image segmentation
 and object tracking. The algorithm works by iteratively shifting a set of data points
 towards the mode or peak of the data distribution. In the context of image processing,
 Mean Shift can be applied to group pixels with similar characteristics into coherent
 segments.
 Here's a brief overview of the Mean Shift algorithm:
 Kernel Density Estimation:
 ● The algorithm begins by estimating the probability density function (PDF) of the input
 data points. This is typically done using a kernel function, such as a Gaussian kernel.
 Initialization:
 ● Each data point is considered as a candidate cluster center.
 Mean Shift Iterations:
 ● For each data point, a mean shift vector is computed. The mean shift vector points
 towards the mode or peak of the underlying data distribution.
 ● Data points are iteratively shifted in the direction of the mean shift vector until
 convergence.
 Convergence Criteria:
 ● The algorithm converges when the mean shift vectors become very small or when the
 points reach local modes in the data distribution.
 Cluster Assignment:
 ● After convergence, data points that converge to the same mode are assigned to the same
 cluster.
Now, let's talk about mode finding:
 In statistics and data analysis, a "mode" refers to the value or values that appear most
 frequently in a dataset. Mode finding, in the context of Mean Shift or other clustering
 algorithms, involves identifying the modes or peaks in the data distribution.
For Mean Shift:
 ● Mode Finding in Mean Shift: The mean shift process involves iteratively shifting
 towards the modes of the underlying data distribution.
 Mean Shift is an algorithm that performs mode finding to identify clusters in a dataset.
 In image processing, it is often used for segmentation by iteratively shifting towards
 modes in the color or intensity distribution, effectively grouping pixels into coherent
 segments.
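 The iteration and cluster assignment steps can be sketched in one dimension, for example
 on pixel intensities. This is a minimal sketch assuming a Gaussian kernel; the bandwidth,
 the merge distance, and the function names are illustrative choices, not values from
 these notes.

```python
import math


def mean_shift_1d(points, bandwidth, iterations=50, tol=1e-6):
    """Shift every point uphill on the kernel density estimate until it stops moving."""
    modes = []
    for start in points:
        x = float(start)
        for _ in range(iterations):
            # Gaussian kernel weights of all data points relative to the current position.
            weights = [math.exp(-((p - x) ** 2) / (2 * bandwidth ** 2)) for p in points]
            # The weighted mean is the new position; (new_x - x) is the mean shift vector.
            new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


def assign_clusters(modes, merge_dist):
    """Points whose converged positions lie close together share a cluster label."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) < merge_dist:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


intensities = [10, 12, 11, 13, 200, 198, 202, 201]
modes = mean_shift_1d(intensities, bandwidth=5.0)
centers, labels = assign_clusters(modes, merge_dist=5.0)
print(centers, labels)
```

 No number of clusters is specified in advance: the two groups emerge because the density
 estimate has two peaks at this bandwidth. A larger bandwidth would eventually merge them.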
 8. Normalized Cuts
 Normalized Cuts is a graph-based image segmentation algorithm that seeks to divide an
 image into meaningful segments by considering both the similarity between pixels and the
 dissimilarity between different segments. It was introduced by Jianbo Shi and Jitendra
 Malik in 2000 and has been widely used in computer vision and image processing.
 Here's a high-level overview of the Normalized Cuts algorithm:
 Graph Representation:
 ● The image is represented as an undirected graph, where each pixel is a node in the
 graph, and edges represent relationships between pixels. Edges are weighted based on the
 similarity between pixel values.
 Affinity Matrix:
 ● An affinity matrix is constructed to capture the similarity between pixels. The entries
 of this matrix represent the weights of edges in the graph, and the values are determined
 by a similarity metric, such as color similarity or texture similarity.
 Segmentation Objective:
 Normalized Cuts has been widely used in image segmentation tasks, especially when
 capturing global structures and relationships between pixels is essential. It has
 applications in computer vision, medical image analysis, and other areas where precise
 segmentation is crucial.
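 As a small illustration of the affinity matrix and of the normalized cut value from Shi
 and Malik's formulation, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), the
 sketch below scores two candidate partitions of five pixels. It assumes a fully connected
 graph with Gaussian intensity affinities; the sigma value and function names are
 illustrative, and the eigenvector-based search for the best partition is not shown.

```python
import math


def affinity(values, sigma):
    """Affinity matrix: Gaussian similarity of pixel values, zero on the diagonal."""
    n = len(values)
    return [
        [0.0 if i == j else math.exp(-((values[i] - values[j]) ** 2) / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]


def ncut(W, group_a):
    """Normalized cut value of the partition (group_a, everything else)."""
    n = len(W)
    a = set(group_a)
    b = set(range(n)) - a
    cut = sum(W[i][j] for i in a for j in b)             # weight crossing the partition
    assoc_a = sum(W[i][j] for i in a for j in range(n))  # total connection of A to all nodes
    assoc_b = sum(W[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


values = [10, 12, 11, 200, 205]      # three dark pixels, two bright pixels
W = affinity(values, sigma=10.0)
good = ncut(W, [0, 1, 2])            # dark vs. bright
bad = ncut(W, [0, 3])                # mixes one dark and one bright pixel
print(good, bad)
```

 The partition that separates dark from bright pixels has a normalized cut close to zero,
 while the mixed partition scores far higher. Practical implementations usually also weight
 edges by spatial proximity.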
 9. Graph Cuts and Energy-Based Methods
 Graph cuts and energy-based methods are widely used in computer vision and image
 processing for solving optimization problems related to image segmentation. These methods
 often leverage graph representations of images and use energy functions to model the
 desired properties of the segmentation.
 Graph Cuts:
 Graph cuts involve partitioning a graph into two disjoint sets such that the cut cost
 (the sum of weights of edges crossing the cut) is minimized. In image segmentation,
 pixels are represented as nodes, and edges are weighted based on the similarity between
 pixels, so that cutting between dissimilar pixels is cheap.
 Graph Representation:
 ● Each pixel is a node, and edges connect adjacent pixels. The weights of edges reflect
 how similar the connected pixels are (e.g., in color or intensity).
 Energy Minimization:
 ● The problem is formulated as an energy minimization task, where the energy function
 includes terms encouraging similarity within segments and dissimilarity between segments.
 Binary Graph Cut:
 ● In the simplest case, the goal is to partition the graph into two sets (foreground and
 background) by finding the cut with the minimum energy.
 Multiclass Graph Cut:
 ● The approach can be extended to handle multiple classes or segments by using
 move-making techniques such as alpha-expansion and alpha-beta swap.
 Applications:
 ● Graph cuts are used in image segmentation, object recognition, stereo vision, and other
 computer vision tasks.
 Energy-Based Methods:
 Energy-based methods involve formulating an energy function that measures the quality of
 a particular configuration or assignment of labels to pixels. The optimization process
 aims to find the label assignment that minimizes the energy.
 Energy Function:
 ● The energy function is defined based on factors such as data terms (measuring agreement
 with observed data) and smoothness terms (encouraging spatial coherence).
 Unary and Pairwise Terms:
 ● Unary terms are associated with individual pixels and capture the likelihood of a pixel
 belonging to a particular class. Pairwise terms model relationships between neighboring
 pixels and enforce smoothness.
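 Markov Random Fields (MRFs) and Conditional Random Fields (CRFs):
 ● MRFs and CRFs are common frameworks for modeling energy-based methods. MRFs model the
 joint distribution of labels and observations through local interactions, while CRFs
 model the labels conditioned on the observed image, so their pairwise terms can depend on
 the image data.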
 Iterative Optimization:
 ● Optimization techniques like belief propagation or graph cuts are often used
 iteratively to find the label assignment that minimizes the energy.
 Applications:
 ● Energy-based methods are applied in image segmentation, image denoising, image
 restoration, and various other vision tasks.
 Both graph cuts and energy-based methods provide powerful tools for image segmentation by
 incorporating information about pixel relationships and modeling the desired properties
 of segmented regions. The choice between them often depends on the specific
 characteristics of the problem at hand.
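 The unary and pairwise terms can be made concrete on a one-dimensional row of pixels. The
 sketch below is an assumption-laden toy: it uses a squared-difference data term, a Potts
 smoothness term, and exhaustive search over all labelings in place of graph cuts or
 belief propagation, which is only feasible because the signal has seven pixels. All names
 and numbers are illustrative.

```python
from itertools import product


def energy(labels, observed, class_means, smooth_weight):
    """Total energy = unary data term + pairwise Potts smoothness term."""
    # Unary term: how badly each pixel value agrees with the mean of its assigned class.
    unary = sum((obs - class_means[lab]) ** 2 for obs, lab in zip(observed, labels))
    # Pairwise term: a fixed penalty for every pair of neighbors with different labels.
    pairwise = sum(smooth_weight for a, b in zip(labels, labels[1:]) if a != b)
    return unary + pairwise


def minimize(observed, class_means, smooth_weight):
    """Brute-force search over all labelings (stand-in for a real optimizer)."""
    candidates = product(range(len(class_means)), repeat=len(observed))
    return min(candidates, key=lambda lab: energy(lab, observed, class_means, smooth_weight))


observed = [0.1, 0.0, 0.6, 0.1, 0.9, 1.0, 0.8]   # noisy background/foreground row
class_means = [0.0, 1.0]                          # background = 0, foreground = 1

data_only = minimize(observed, class_means, smooth_weight=0.0)
smoothed = minimize(observed, class_means, smooth_weight=0.5)
print(data_only, smoothed)
```

 With the smoothness weight set to zero, the noisy third pixel is labeled foreground on
 its own. Adding the pairwise term makes that isolated label more expensive than the data
 mismatch, so the minimizer returns one clean boundary.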
 UNIT III
 FEATURE-BASED ALIGNMENT & MOTION ESTIMATION
 2D and 3D feature-based alignment - Pose estimation - Geometric intrinsic calibration -
 Triangulation - Two-frame structure from motion - Factorization - Bundle adjustment -
 Constrained structure and motion - Translational alignment - Parametric motion -
 Spline-based motion - Optical flow - Layered motion.
1. 2D and 3D feature-based alignment:
 Feature-based alignment is a technique used in computer vision and image processing to
 align or match corresponding features in different images or scenes. The alignment can be
 performed in either 2D or 3D space, depending on the nature of the data.
 2D Feature-Based Alignment:
 ● Definition: In 2D feature-based alignment, the goal is to align and match features in
 two or more 2D images.
 ● Features: Features can include points, corners, edges, or other distinctive patterns.
 ● Applications: Commonly used in image stitching, panorama creation, object recognition,
 and image registration.
 3D Feature-Based Alignment:
 ● Definition: In 3D feature-based alignment, the goal is to align and match features in
 three-dimensional space, typically in the context of 3D reconstruction or scene
 understanding.
 ● Features: Features can include keypoints, landmarks, or other distinctive 3D points.
 ● Applications: Used in 3D reconstruction, simultaneous localization and mapping (SLAM),
 object recognition in 3D scenes, and augmented reality.
 Techniques for 2D and 3D Feature-Based Alignment:
 ● Correspondence Matching: Identifying corresponding features in different images or 3D
 point clouds.
 ● RANSAC (Random Sample Consensus): Robust estimation technique to find the best-fitting
 model despite the presence of outliers.
 ● Transformation Models: Applying transformation models (affine, homography for 2D; rigid
 body, affine for 3D) to align features.
 ● Iterative Optimization: Refining the alignment through iterative optimization methods
 such as Levenberg-Marquardt.
 Challenges:
 ● Noise and Outliers: Real-world data often contains noise and outliers, requiring robust
 techniques for feature matching.
 ● Scale and Viewpoint Changes: Features may undergo changes in scale or viewpoint,
 requiring methods that are invariant to such variations.
 Applications:
 ● Image Stitching: Aligning and stitching together multiple images to create panoramic
 views.
 ● Robotics and SLAM: Aligning consecutive frames in the context of robotic navigation and
 simultaneous localization and mapping.
 ● Medical Imaging: Aligning 2D slices or 3D volumes for accurate medical image analysis.
 Evaluation:
 ● Accuracy and Robustness: The accuracy and robustness of feature-based alignment methods
 are crucial for their successful application in various domains.
 Feature-based alignment is a fundamental task in computer vision, enabling the
 integration of information from multiple views or modalities for improved analysis and
 understanding of the visual world.
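 Once correspondences are available, fitting a transformation model is often a small
 least-squares problem. The sketch below assumes a 2D rigid model (rotation plus
 translation) and outlier-free matches, and uses the closed-form least-squares solution;
 in a full pipeline this fit would typically sit inside a RANSAC loop. The function name
 and the sample points are illustrative.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst points."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    cos_sum = sin_sum = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy   # remove centroids
        cos_sum += px * qx + py * qy      # proportional to cos(theta)
        sin_sum += px * qy - py * qx      # proportional to sin(theta)
    theta = math.atan2(sin_sum, cos_sum)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the destination centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)


# Synthetic correspondences: rotate by 30 degrees, then translate by (2, -1).
true_theta, true_t = math.radians(30.0), (2.0, -1.0)
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (-1.0, 1.0)]
dst = [
    (math.cos(true_theta) * x - math.sin(true_theta) * y + true_t[0],
     math.sin(true_theta) * x + math.cos(true_theta) * y + true_t[1])
    for x, y in src
]

theta, t = fit_rigid_2d(src, dst)
print(math.degrees(theta), t)
```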
2. Pose estimation:
 Pose estimation is a computer vision task that involves determining the position and
 orientation of an object or camera relative to a coordinate system. It is a crucial
 aspect of understanding the spatial relationships between objects in a scene. Pose
 estimation can be applied to both 2D and 3D scenarios, and it finds applications in
 various fields, including robotics, augmented reality, autonomous vehicles, and
 human-computer interaction.
 2D Pose Estimation:
 ● Definition: In 2D pose estimation, the goal is to estimate the position (translation)
 and orientation (rotation) of an object in a two-dimensional image.
 ● Methods: Techniques include keypoint-based approaches, where distinctive points (such
 as corners or joints) are detected and used to estimate pose.
 3D Pose Estimation:
 ● Definition: In 3D pose estimation, the goal is to estimate the position and orientation
 of an object in three-dimensional space.
 ● Methods: Often involves associating 2D keypoints with corresponding 3D points. PnP
 (Perspective-n-Point) algorithms recover the 3D pose from such 2D-3D correspondences, and
 there are other methods like Iterative Closest Point (ICP) for aligning a 3D model with a
 point cloud.
 Applications:
 ● Robotics: Pose estimation is crucial for robotic systems to navigate and interact with
 the environment.
 ● Augmented Reality: Enables the alignment of virtual objects with the real-world
 environment.
 ● Autonomous Vehicles: Used for understanding the position and orientation of the vehicle
 in its surroundings.
 ● Human Pose Estimation: Estimating the pose of a person, often used in applications like
 gesture recognition and action recognition.
 Camera Pose Estimation:
 ● Definition: Estimating the pose of a camera, which involves determining its position
 and orientation in the scene.
 ● Methods: Camera pose can be estimated using visual odometry, SLAM (Simultaneous
 Localization and Mapping), or using known reference points in the environment.
 Challenges:
 ● Ambiguity: Limited information or similar appearance of different poses can introduce
 ambiguity.
 ● Occlusion: Partially or fully occluded objects can make pose estimation challenging.
 ● Real-time Requirements: Many applications, especially in robotics and augmented
 reality, require real-time pose estimation.
 Evaluation Metrics:
 ● Common metrics include translation and rotation errors, which measure the accuracy of
 the estimated pose compared to ground truth.
 Deep Learning Approaches:
 ● Recent advances in deep learning have led to the development of neural network-based
 methods for pose estimation, leveraging architectures like convolutional neural networks
 (CNNs) for feature extraction.
 Pose estimation is a fundamental task in computer vision with widespread applications. It
 plays a crucial role in enabling machines to understand the spatial relationships between
 objects and the environment.
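 The translation and rotation errors mentioned under Evaluation Metrics are commonly
 computed as a Euclidean distance and as the angle of the relative rotation. The sketch
 below assumes poses given as 3x3 rotation matrices (nested lists) and translation
 vectors; the helper rot_z and the sample poses are only there to build test data.

```python
import math


def rotation_error_deg(R_est, R_gt):
    """Angle (in degrees) of the rotation that takes R_est to R_gt."""
    # trace(R_est^T R_gt) equals the sum of element-wise products of the two matrices.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))   # clamp against rounding
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))


def rot_z(degrees):
    """Rotation about the z axis, used here only to construct example poses."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


rot_err = rotation_error_deg(rot_z(10.0), rot_z(25.0))
trans_err = translation_error((1.0, 2.0, 3.0), (1.0, 5.0, 7.0))
print(rot_err, trans_err)
```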
3. Geometric intrinsic calibration:
 Here are key points related to geometric intrinsic calibration:
 Intrinsic Parameters:
 ● Focal Length (f): Represents the distance from the camera's optical center to the image
 plane. It is a critical parameter for determining the scale of objects in the image.
 ● Principal Point (c): Denotes the point where the optical axis meets the image plane,
 usually close to the image center. In pixel coordinates it is expressed as an offset from
 the top-left corner of the image.
 ● Lens Distortion Coefficients: Describe imperfections in the lens, such as radial and
 tangential distortions, that affect the mapping between 3D world points and 2D image
 points.
 Camera Model:
 ● The camera model often used for intrinsic calibration is the pinhole camera model. This
 model assumes that light enters the camera through a single point (pinhole) and projects
 onto the image plane.
 Calibration Patterns:
 ● Intrinsic calibration is typically performed using calibration patterns with known
 geometric features, such as chessboard patterns. These patterns allow for the extraction
 of corresponding points in both 3D world coordinates and 2D image coordinates.
 Calibration Process:
 ● Image Capture: Multiple images of the calibration pattern are captured from different
 viewpoints.
 ● Feature Extraction: Features (corners, intersections) in the calibration pattern are
 detected and identified in both image and world coordinates.
 Accurate geometric intrinsic calibration is a critical step in ensuring that the camera
 model accurately represents the mapping between the 3D world and the 2D image,
 facilitating precise computer vision tasks.
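 To show how the intrinsic parameters enter the pinhole model, the sketch below projects a
 3D point given in camera coordinates to pixel coordinates. It assumes separate focal
 lengths fx and fy in pixels, a principal point (cx, cy), and a single radial distortion
 coefficient k1; real calibrations usually estimate more distortion terms. The parameter
 values are made up for illustration.

```python
def project(point, fx, fy, cx, cy, k1=0.0):
    """Project a 3D point (camera coordinates, Z > 0) to pixel coordinates."""
    X, Y, Z = point
    x, y = X / Z, Y / Z                  # perspective division: normalized coordinates
    r2 = x * x + y * y                   # squared distance from the optical axis
    scale = 1.0 + k1 * r2                # one-term radial distortion model
    u = fx * x * scale + cx              # focal length scales, principal point offsets
    v = fy * y * scale + cy
    return u, v


# Hypothetical intrinsics for a 640x480 image.
fx = fy = 800.0
cx, cy = 320.0, 240.0

on_axis = project((0.0, 0.0, 5.0), fx, fy, cx, cy)
ideal = project((1.0, 0.5, 2.0), fx, fy, cx, cy)
distorted = project((1.0, 0.5, 2.0), fx, fy, cx, cy, k1=-0.1)
print(on_axis, ideal, distorted)
```

 A point on the optical axis lands exactly on the principal point, and a negative k1 pulls
 off-axis points toward the center (barrel distortion). Calibration estimates these
 parameters by minimizing the gap between such predictions and the detected pattern
 corners.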
4. Triangulation:
Here are key points related to triangulation:
 Basic Concept:
 ● Triangulation is based on the principle of finding the 3D location of a point in space
 by measuring its projection onto two or more image planes.
 Camera Setup:
 ● Triangulation requires at least two views of the same scene from different viewpoints,
 captured by two or more cameras (stereo vision) or by a single moving camera. Each view
 provides a 2D projection of the 3D point.
 Mathematical Representation:
 Epipolar Geometry:
 ● Epipolar geometry is utilized to relate the 2D projections of a point in different
 camera views. It defines the geometric relationship between the two camera views and
 helps establish correspondences between points.
 Triangulation Methods:
 ● Direct Linear Transform (DLT): An algorithmic approach that involves solving a system
 of linear equations to find the 3D coordinates.
 ● Iterative Methods: Algorithms like the Gauss-Newton algorithm or the
 Levenberg-Marquardt algorithm can be used for refining the initial estimate obtained
 through DLT.
 Accuracy and Precision:
 ● The accuracy of triangulation is influenced by factors such as the calibration accuracy
 of the cameras, the quality of feature matching, and the level of noise in the image
 data.
 Bundle Adjustment:
 ● Triangulation is often used in conjunction with bundle adjustment, a technique that
 optimizes the parameters of the cameras and the 3D points simultaneously to minimize the
 reprojection error.
 Applications:
 ● 3D Reconstruction: Triangulation is fundamental to creating 3D models of scenes or
 objects from multiple camera views.
 ● Structure from Motion (SfM): Used in SfM pipelines to estimate the 3D structure of a
 scene from a sequence of images.
 ● Stereo Vision: Essential for depth estimation in stereo vision systems.
 Challenges:
 ● Ambiguity: Ambiguities may arise when triangulating points from two views if the views
 are not well-separated or if the point is near the baseline connecting the cameras.
 ● Noise and Errors: Triangulation results can be sensitive to noise and errors in feature
 matching and camera calibration.
 Triangulation is a core technique in computer vision that enables the reconstruction of
 3D geometry from multiple 2D images. It plays a crucial role in applications such as 3D
 modeling, augmented reality, and structure-from-motion pipelines.
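 A geometric way to see triangulation is to intersect the two viewing rays. The sketch
 below uses the midpoint method, a simpler alternative to the DLT named above, chosen here
 because it needs no linear algebra library: it finds the closest points on the two rays
 and returns their midpoint. Camera centers and ray directions are assumed to be already
 known in a common world frame; the function name and numbers are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b                 # approaches zero when the rays are parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not recoverable")
    s = (b * e - c * d) / denom           # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom           # parameter of the closest point on ray 2
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two camera centers one unit apart, both observing the same scene point.
scene_point = (1.0, 2.0, 10.0)
center1, center2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray1 = tuple(p - c for p, c in zip(scene_point, center1))
ray2 = tuple(p - c for p, c in zip(scene_point, center2))

estimate = triangulate_midpoint(center1, ray1, center2, ray2)
print(estimate)
```

 The denominator shrinks as the rays become parallel, which is the numerical face of the
 small-baseline problem listed under Challenges.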
5. Two-frame structure from motion:
 Structure from Motion (SfM) is a computer vision technique that aims to reconstruct the
 three-dimensional structure of a scene from a sequence of two-dimensional images.
 Two-frame Structure from Motion specifically refers to the reconstruction of scene
 geometry using information from only two images (frames) taken from different viewpoints.
 This process involves estimating both the 3D structure of the scene and the camera motion
 between the two frames.
Here are key points related to Two-Frame Structure from Motion:
 Basic Concept:
 ● Two-frame Structure from Motion reconstructs the 3D structure of a scene by analyzing
 the information from just two images taken from different perspectives.
 Correspondence Matching:
 ● Establishing correspondences between points or features in the two images is a crucial
 step. This is often done by identifying key features (such as keypoints) in both images
 and finding their correspondences.
 Epipolar Geometry:
 ● Epipolar geometry describes the relationship between corresponding points in two images
 taken from different viewpoints. It helps constrain the possible 3D structures and camera
 motions.
 Essential Matrix:
 ● The essential matrix is the form of the fundamental matrix that applies to calibrated
 cameras. It encapsulates the relative pose (rotation and direction of translation) of the
 two cameras.
 Camera Pose Estimation:
 ● The camera poses (positions and orientations) are estimated for both frames. This
 involves solving for the rotation and translation between the two camera viewpoints.
 Triangulation:
 ● Triangulation is applied to find the 3D coordinates of points in the scene. By knowing
 the camera poses and corresponding points, the depth of scene points can be estimated.
 Bundle Adjustment:
 ● Bundle adjustment is often used to refine the estimates of camera poses and 3D points.
 It is an optimization process that minimizes the error between observed and predicted
 image points.
 Depth Ambiguity:
 ● Two-frame SfM is susceptible to scale ambiguity: the reconstructed scene and the camera
 translation can be scaled together (or, in some formulations, mirrored) without affecting
 the projections onto the images.
 Applications:
 ● Robotics: Two-frame SfM is used in robotics for environment mapping and navigation.
 ● Augmented Reality: Reconstruction of the 3D structure for overlaying virtual objects
 onto the real-world scene.
 ● Computer Vision Research: Studying the principles of SfM and epipolar geometry.
 Challenges:
 ● Noise and Outliers: The accuracy of the reconstruction can be affected by noise and
 outliers in the correspondence matching process.
 ● Limited Baseline: With only two frames, the baseline (distance between camera
 viewpoints) may be limited, leading to potential depth ambiguities.
 Two-frame Structure from Motion is a fundamental concept in computer vision, providing a
 foundation for understanding 3D scene structure from a pair of images. It is often
 extended to multi-frame SfM for more robust reconstructions in scenarios where more
 images are available.
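 The role of the essential matrix can be checked numerically. Assuming the second camera
 sees a point as X2 = R X1 + t, the essential matrix is E = [t]x R and every true
 correspondence in normalized image coordinates satisfies x2^T E x1 = 0. The sketch below
 builds E from a made-up pose and evaluates that constraint; the pose values and helper
 names are illustrative.

```python
import math


def skew(t):
    """Cross-product matrix [t]x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def epipolar_residual(E, x1, x2):
    """Value of x2^T E x1; zero for a perfect correspondence."""
    return sum(a * b for a, b in zip(x2, matvec(E, x1)))


# Hypothetical relative pose: 10 degree rotation about the y axis plus a translation.
angle = math.radians(10.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [1.0, 0.0, 0.2]
E = matmul(skew(t), R)

X1 = [0.5, -0.3, 4.0]                                  # point in camera-1 coordinates
X2 = [r + ti for r, ti in zip(matvec(R, X1), t)]       # same point in camera-2 coordinates
x1 = [c / X1[2] for c in X1]                           # normalized image coordinates
x2 = [c / X2[2] for c in X2]

match_residual = epipolar_residual(E, x1, x2)
wrong_residual = epipolar_residual(E, x1, [x2[0], x2[1] + 0.1, 1.0])
print(match_residual, wrong_residual)
```

 Scaling t by any nonzero factor leaves the zero residual unchanged, which is one way to
 see why two views cannot fix the overall scale of the scene.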
6. Factorization:
 Factorization in the context of computer vision typically refers to the factorization of
 matrices or tensors representing data in various computer vision tasks. One common
 application is in the field of structure from motion (SfM) and multiple-view geometry.
 Here are key points related to factorization in computer vision:
 Matrix Factorization in SfM:
 ● Problem Statement: In structure from motion, the goal is to reconstruct the 3D
 structure of a scene from a sequence of 2D images taken from different viewpoints.
 ● Matrix Representation: The correspondence matrix, also known as the measurement matrix,
 is constructed by stacking the image coordinates of corresponding points from multiple
 views.
 ● Matrix Factorization: Factorizing the correspondence matrix into two matrices
 representing camera parameters and 3D structure is a common approach. This factorization
 is often achieved through techniques like Singular Value Decomposition (SVD).
 Singular Value Decomposition (SVD):
 ● Application: SVD is frequently used in matrix factorization problems in computer
 vision.
 Applications:
 ● Structure from Motion (SfM): Factorization is used to recover camera poses and 3D scene
 structure from 2D image correspondences.
 ● Background Subtraction: Matrix factorization techniques are employed in background
 subtraction methods for video analysis.
 ● Face Recognition: Eigenface and Fisherface methods involve factorizing covariance
 matrices for facial feature representation.
 Non-Negative Matrix Factorization (NMF):
 ● Application: NMF is a variant of matrix factorization where the factors are constrained
 to be non-negative.
 ● Use Cases: It is applied in areas such as topic modeling, image segmentation, and
 feature extraction.
 Tensor Factorization:
 ● Extension to Higher Dimensions: In some cases, data is represented as tensors, and
 factorization techniques are extended to tensors for applications like multi-way data
 analysis.
 ● Example: Canonical Polyadic Decomposition (CPD) is a tensor factorization technique.
 Robust Factorization:
 ● Challenges: Noise and outliers in the data can affect the accuracy of factorization.
 ● Robust Methods: Robust factorization techniques are designed to handle noisy data and
 outliers, providing more reliable results.
 Deep Learning Approaches:
 ● Autoencoders and Neural Networks: Deep learning models, including autoencoders, can be
 considered as a form of nonlinear factorization.
 Factorization Machine (FM):
 ● Application: Factorization Machines are used in collaborative filtering and
 recommendation systems to model interactions between features.
 Factorization plays a crucial role in various computer vision and machine learning tasks,
 providing a mathematical framework for extracting meaningful representations from data
 and solving complex problems like 3D reconstruction and dimensionality reduction.
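 The SVD-based factorization of the measurement matrix can be demonstrated on synthetic
 data. The sketch below assumes orthographic cameras (in the spirit of the classic
 Tomasi-Kanade setting) and noise-free tracks, and it relies on NumPy. The point count,
 number of views, and random seed are arbitrary. It shows only the rank-3 split into a
 motion factor and a structure factor; the further step that resolves the remaining 3x3
 ambiguity is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(3, 20))                 # 20 synthetic 3D points (columns)

rows = []
for view in range(4):                             # 4 synthetic orthographic views
    Q, _upper = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthonormal axes
    # Keep two axes (orthographic projection drops depth), then add a 2D image offset.
    rows.append(Q[:2] @ points + rng.normal(size=(2, 1)))
W = np.vstack(rows)                               # measurement matrix, shape (2F, P) = (8, 20)

# Subtracting each row's mean removes the per-view translation.
W_centered = W - W.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
M = U[:, :3] * np.sqrt(s[:3])                     # motion factor, shape (8, 3)
S = np.sqrt(s[:3])[:, None] * Vt[:3]              # structure factor, shape (3, 20)

print(np.round(s, 6))                             # only three singular values are non-zero
```

 With noisy tracks the fourth and later singular values are small rather than zero, and
 truncating to rank 3 acts as a least-squares fit.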
7. Bundle adjustment:
 Here are key points related to Bundle Adjustment:
 Optimization Objective:
 ● Minimization of Reprojection Error: Bundle Adjustment aims to find the optimal set of
 parameters (camera poses, 3D points) that minimizes the difference between the observed
 2D image points and their projections onto the image planes based on the estimated 3D
 scene.
 Parameters to Optimize:
 ● Camera Parameters: Intrinsic parameters (focal length, principal point) and extrinsic
 parameters (camera poses - rotation and translation).
 ● 3D Scene Structure: Coordinates of 3D points in the scene.
 Reprojection Error:
 ● Definition: The reprojection error is the difference between the observed 2D image
 points and the projections of the corresponding 3D points onto the image planes.
 ● Sum of Squared Differences: The objective is to minimize the sum of squared differences
 between observed and projected points.
 Bundle Adjustment Process:
 ● Initialization: Start with initial estimates of camera poses and 3D points.
 ● Objective Function: Define an objective function that measures the reprojection error.
 ● Optimization: Use optimization algorithms (such as Levenberg-Marquardt, Gauss-Newton,
 or others) to iteratively refine the parameters, minimizing the reprojection error.
 Sparse and Dense Bundle Adjustment:
 ● Sparse BA: Considers a sparse subset of 3D points and image points (typically matched
 features), making it computationally more efficient.
 ● Dense BA: Involves all 3D points and image points, providing higher accuracy but
 requiring more computational resources.
 Sequential and Global Bundle Adjustment:
 ● Sequential BA: Optimizes camera poses and 3D points sequentially, typically in a
 sliding window fashion.
 ● Global BA: Optimizes all camera poses and 3D points simultaneously. Provides a more
 accurate solution but is computationally more demanding.
 Applications:
 ● Structure from Motion (SfM): Refines the reconstruction of 3D scenes from a sequence of
 images.
 ● Simultaneous Localization and Mapping (SLAM): Improves the accuracy of camera pose
 estimation and map reconstruction in real-time environments.
 ● 3D Reconstruction: Enhances the accuracy of reconstructed 3D models from images.
 Challenges:
 ● Local Minima: The optimization problem may have multiple local minima, making good
 initial estimates and robust optimization methods essential.
 ● Outliers and Noise: Bundle Adjustment needs to be robust to outliers and noise in the
 input data.
 Integration with Other Techniques:
 ● Feature Matching: Often used in conjunction with feature matching techniques to
 establish correspondences between 2D and 3D points.
 ● Camera Calibration: Bundle Adjustment may be preceded by or integrated with camera
 calibration to refine intrinsic parameters.
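 The objective that bundle adjustment minimizes can be written down directly. The sketch
 below evaluates the sum of squared reprojection errors for a toy setup and deliberately
 leaves the optimizer out. To stay short it assumes cameras with a focal length and a
 translation only (no rotation, principal point at the origin); the data layout and names
 are illustrative.

```python
def project(point, camera):
    """Pinhole projection for a simplified camera: (focal_length, (tx, ty, tz))."""
    focal, (tx, ty, tz) = camera
    X, Y, Z = point[0] + tx, point[1] + ty, point[2] + tz   # world to camera frame
    return focal * X / Z, focal * Y / Z


def reprojection_error(points, cameras, observations):
    """Sum of squared differences between observed and predicted image points."""
    total = 0.0
    for (cam_idx, pt_idx), (u_obs, v_obs) in observations.items():
        u, v = project(points[pt_idx], cameras[cam_idx])
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


cameras = [(500.0, (0.0, 0.0, 0.0)), (500.0, (-1.0, 0.0, 0.0))]
true_points = [(0.2, -0.1, 4.0), (1.0, 0.5, 6.0)]

# Observations keyed by (camera index, point index), generated from the true geometry.
observations = {
    (c, p): project(true_points[p], cameras[c])
    for c in range(len(cameras))
    for p in range(len(true_points))
}

perturbed_points = [(0.25, -0.1, 4.2), (1.0, 0.5, 6.0)]
error_true = reprojection_error(true_points, cameras, observations)
error_perturbed = reprojection_error(perturbed_points, cameras, observations)
print(error_true, error_perturbed)
```

 A solver such as Levenberg-Marquardt would repeatedly adjust the entries of the point and
 camera lists to drive this value down from the perturbed estimate toward zero.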
8. Constrained structure and motion:
 Constrained Structure and Motion refers to a set of techniques and methods in computer
 vision and photogrammetry that incorporate additional constraints into the structure from
 motion (SfM) process. The goal is to improve the accuracy and reliability of 3D
 reconstruction by imposing constraints on the estimated camera poses and 3D scene points.
 These constraints may come from prior knowledge about the scene, sensor characteristics,
 or additional information.
 Here are key points related to Constrained Structure and Motion:
 Introduction of Constraints:
 ● Prior Information: Constraints can be introduced based on prior knowledge about the
 scene, such as known distances, planar structures, or object shapes.
 ● Sensor Constraints: Information about the camera system, such as focal length or aspect
 ratio, can be incorporated as constraints.
 Types of Constraints:
 ● Geometric Constraints: Constraints that enforce geometric relationships, such as
 parallel lines, perpendicularity, or known distances between points.
 ● Semantic Constraints: Incorporating semantic information about the scene, such as the
 knowledge that certain points belong to a specific object or structure.
 Bundle Adjustment with Constraints:
 ● Objective Function: The bundle adjustment problem is formulated with an objective
 function that includes the reprojection error, as well as additional terms representing
 the constraints.
 ● Optimization: Optimization techniques, such as Levenberg-Marquardt or Gauss-Newton, are
 used to minimize the combined cost function.
 Advantages:
 ● Improved Accuracy: Incorporating constraints can lead to more accurate and reliable
 reconstructions, especially in scenarios with limited or noisy data.
 ● Handling Ambiguities: Constraints help in resolving ambiguities that may arise in
 typical SfM scenarios.
 Common Types of Constraints:
 ● Planar Constraints: Assuming that certain structures in the scene lie on planes, which
 can be enforced during reconstruction.
 ● Scale Constraints: Fixing or constraining the scale of the scene to prevent scale
 ambiguity in the reconstruction.
 ● Object Constraints: Incorporating constraints related to specific objects or entities
 in the scene.
 Applications:
 ● Architectural Reconstruction: Constraining the reconstruction based on known
 architectural elements or planar surfaces.
 ● Robotics and Autonomous Systems: Utilizing constraints to enhance the accuracy of pose
 estimation and mapping in robotic navigation.
 ● Augmented Reality: Incorporating semantic constraints for more accurate alignment of
 virtual objects with the real world.
 Challenges:
 ● Correctness of Constraints: The accuracy of the reconstruction depends on the
 correctness of the imposed constraints.
 ● Computational Complexity: Some constraint types may increase the computational
 complexity of the optimization problem.
 Integration with Semantic Technologies:
 ● Semantic 3D Reconstruction: Integrating semantic information into the reconstruction
 process to improve the understanding of the scene.
 Constrained Structure and Motion provides a way to incorporate additional information and
 domain knowledge into the reconstruction process, making it a valuable approach for
 scenarios where such information is available and reliable. It contributes to more
 accurate and meaningful 3D reconstructions in computer vision applications.
9. Translational alignment
 Correspondence Matching:
 ● Correspondence matching involves identifying corresponding features or points in the
 images that can be used as reference for alignment. Common techniques include keypoint
 detection and matching.
 Alignment Process:
 ● The translational alignment process typically involves the following steps:
 Applications:
 ● Image Stitching: In panorama creation, translational alignment is used to align images
 before merging them into a seamless panorama.
 ● Motion Correction: In video processing, translational alignment corrects for
 translational motion between consecutive frames.
 ● Registration in Medical Imaging: Aligning medical images acquired from different
 modalities or at different time points.
 Evaluation:
 ● The success of translational alignment is often evaluated by measuring the accuracy of
 the alignment, typically in terms of the distance between corresponding points before and
 after alignment.
 Robustness:
 ● Translational alignment is relatively straightforward and computationally efficient.
 However, it is sensitive to noise and outliers, and a pure translation cannot model large
 rotations or distortions between the images.
 Integration with Other Transformations:
 ● Translational alignment is frequently used as an initial step in more complex alignment
 processes that involve additional transformations, such as rotational alignment or affine
 transformations.
 Automated Alignment:
 ● In many applications, algorithms for translational alignment are designed to operate
 automatically without requiring manual intervention.
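 Given matched points, a translation can be estimated and evaluated in a few lines. The
 sketch below is one possible approach rather than the procedure these notes had in mind:
 it takes the per-axis median of the displacement vectors, which tolerates a minority of
 bad matches, and then reports the mean residual distance used in the Evaluation point
 above. Function names and sample matches are illustrative.

```python
import math
from statistics import median


def estimate_translation(src, dst):
    """Per-axis median displacement between matched points (tolerates a few outliers)."""
    dx = median(q[0] - p[0] for p, q in zip(src, dst))
    dy = median(q[1] - p[1] for p, q in zip(src, dst))
    return dx, dy


def mean_residual(src, dst, shift):
    """Average distance between shifted source points and their matches."""
    dists = [math.hypot(p[0] + shift[0] - q[0], p[1] + shift[1] - q[1])
             for p, q in zip(src, dst)]
    return sum(dists) / len(dists)


src = [(10, 10), (40, 15), (25, 60), (70, 30), (55, 80)]
dst = [(15, 7), (45, 12), (30, 57), (75, 27), (95, 120)]   # last match is an outlier

shift = estimate_translation(src, dst)
inlier_residual = mean_residual(src[:4], dst[:4], shift)
print(shift, mean_residual(src, dst, (0, 0)), inlier_residual)
```

 A plain mean of the displacements would be dragged toward the outlier; the median returns
 the shift shared by the four consistent matches.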
10. Parametric motion
 Parametric motion refers to the modeling and representation of motion in computer vision
 and computer graphics using parametric functions or models. Instead of directly capturing
 the motion with a set of discrete frames, parametric motion models describe how the
 motion evolves over time using a set of parameters. These models are often employed in
 various applications, such as video analysis, animation, and tracking. Here are key
 points related to parametric motion:
 Parametric Functions:
 ● Parametric motion models use mathematical functions with parameters to represent the
 motion of objects or scenes over time. These functions could be simple mathematical
 equations or more complex models.
 Types of Parametric Motion Models:
 ● Linear Models: Simplest form of parametric motion, where motion is represented by
 linear equations. For example, linear interpolation between keyframes.
 ● Polynomial Models: Higher-order polynomial functions can be used to model more complex
 motion. Cubic splines are commonly used for smooth motion interpolation.
 ● Trigonometric Models: Sinusoidal functions can be employed to represent periodic
 motion, such as oscillations or repetitive patterns.
 ● Exponential Models: Capture behaviors that exhibit exponential growth or decay,
 suitable for certain types of motion.
 Keyframe Animation:
 ● In parametric motion, keyframes are specified at certain points in time, and the motion
 between keyframes is defined by the parametric motion model. Interpolation is then used
 to generate frames between keyframes.
 ControlPointsandHandles:
 ● Parametricmodelsofteninvolvecontrolpointsandhandlesthatinfluence the shape
 and behavior of the motion curve. Adjusting these parameters allows for creative
 control over the motion.
 Applications:
 ● Computer Animation: Used for animating characters, objects, or camera movements
 in 3D computer graphics and animation.
 ● Video Compression: Parametric motion models can be used to describe the motion
 between video frames, facilitating efficient compression
 techniques.
 ● Video Synthesis: Generating realistic videos or predicting future frames in a video
 sequence based on learned parametric models.
 ● Motion Tracking: Tracking the movement of objects in a video by fitting parametric
 motion models to observed trajectories.
 Smoothness and Continuity:
 ● One advantage of parametric motion models is their ability to provide smooth and
 continuous motion, especially when using interpolation techniques between
 keyframes.
 Constraints and Constraint-Based Motion:
 ● Parametric models can be extended to include constraints, ensuring that
 the motion adheres to specific rules or conditions. For example, enforcing constant
 velocity or maintaining specific orientations.
 Machine Learning Integration:
 ● Parametric motion models can be learned from data using machine learning
 techniques. Machine learning algorithms can learn the
 parameters of the motion model from observed examples.
 Challenges:
 ● Designing appropriate parametric models that accurately capture the
 desired motion can be challenging, especially for complex or non-linear motions.
 ● Ensuring that the motion remains physically plausible and visually appealing
 is crucial in animation and simulation.
 Parametric motion provides a flexible framework for representing and controlling
 motion in various visual computing applications. The choice of parametric model
 depends on the specific characteristics of the motion to be represented and the desired level of
 control and realism.
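 To make the idea of fitting a parametric model to an observed trajectory concrete, here is a
 minimal sketch that fits a polynomial motion model to tracked 2D positions by least squares
 and then extrapolates it. The helper names fit_polynomial_motion and evaluate_motion, and the
 synthetic trajectory, are assumptions made for this illustration.

```python
import numpy as np

def fit_polynomial_motion(times, positions, degree):
    """Least-squares polynomial fit; returns coefficients of shape (degree + 1, n_coords)."""
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # np.polyfit fits every column of `positions` independently.
    return np.polyfit(times, positions, degree)

def evaluate_motion(coeffs, t):
    """Evaluate the fitted model at time t, one value per coordinate."""
    return np.array([np.polyval(coeffs[:, k], t) for k in range(coeffs.shape[1])])

# Synthetic trajectory: constant velocity in x, constant acceleration in y.
times = np.linspace(0.0, 2.0, 9)
positions = np.column_stack([1.0 + 2.0 * times, 0.5 * times ** 2])

coeffs = fit_polynomial_motion(times, positions, degree=2)
predicted = evaluate_motion(coeffs, 3.0)  # extrapolate beyond the observed frames
```

 The motion is now described by a handful of coefficients rather than by the individual
 frames, which is the point of a parametric representation.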
11. Spline-based motion
 Spline-based motion refers to the use of spline curves to model and interpolate motion
 in computer graphics, computer-aided design, and animation. Splines are mathematical curves that
 provide a smooth and flexible way to represent motion paths and
 trajectories. They are widely used in 3D computer graphics and animation for creating
 natural and visually pleasing motion, particularly in scenarios where continuous and smooth
 paths are desired.
 Here are key points related to spline-based motion:
 Spline Definition:
 ● Spline Curve: A spline is a piecewise-defined polynomial curve. It consists
 of several polynomial segments (typically low-degree) that are smoothly joined at
 points called knots. The shape of the curve is governed by a set of control points.
 ● Types of Splines: Common types of splines include B-splines, cubic splines, and
 Bezier splines.
 Spline Interpolation:
 ● Spline curves are often used to interpolate keyframes or control points in
 animation. This means the curve passes through or follows the specified keyframes,
 creating a smooth motion trajectory.
 B-spline (Basis Spline):
 ● B-splines are widely used for spline-based motion. They are defined by a set of control
 points, and their shape is influenced by a set of basis
 functions.
 ● Local Control: Modifying the position of a control point affects only a local portion of
 the curve, making B-splines versatile for animation.
 Cubic Splines:
 ● Cubic splines are a specific type of spline where each polynomial segment is a cubic
 (degree-3) polynomial.
 ● Natural Motion: Cubic splines are often used for creating natural motion paths due to
 their smoothness and continuity.
 Bezier Splines:
 ● Bezier splines are a type of spline that is defined by a set of control points. They have
 intuitive control handles that influence the shape of the curve.
 ● Bezier Curves: Cubic Bezier curves, in particular, are frequently used for creating
 motion paths in animation.
 Spline Tangents and Curvature:
 ● Spline-based motion allows control over the tangents at control points,
 influencing the direction of motion. Curvature continuity ensures smooth transitions
 between segments.
 Applications:
 ● Computer Animation: Spline-based motion is extensively used for
 animating characters, camera movements, and objects in 3D scenes.
 ● Path Generation: Designing smooth and visually appealing paths for objects to
 follow in simulations or virtual environments.
 ● Motion Graphics: Creating dynamic and aesthetically pleasing visual effects in
 motion graphics projects.
 Parametric Representation:
 ● Spline-based motion is parametric, meaning the position of a point on the spline is
 determined by a parameter. This allows for easy manipulation
 and control over the motion.
 Interpolation Techniques:
 ● Keyframe Interpolation: Spline curves interpolate smoothly between
 keyframes, providing fluid motion transitions.
 ● Hermite Interpolation: Splines can be constructed using Hermite
 interpolation, where both position and tangent information at control points are
 considered.
 Challenges:
 ● Overfitting: In some cases, spline curves can be overly flexible and lead to overfitting if
 not properly controlled.
 ● Control Point Placement: Choosing the right placement for control points is crucial
 for achieving the desired motion characteristics.
 Spline-based motion provides animators and designers with a versatile tool for creating
 smooth and controlled motion paths in computer-generated imagery. The ability to adjust the
 shape of the spline through control points and handles makes it a popular choice for a wide
 range of animation and graphics applications.
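 The parametric nature of a spline can be seen in a single cubic Bezier segment, sketched
 below using the standard Bernstein form. The function name cubic_bezier and the control
 points are illustrative assumptions; a full motion path would chain several such segments.

```python
import numpy as np

def cubic_bezier(control_points, t):
    """Evaluate a cubic Bezier segment at parameter values t in [0, 1]."""
    p0, p1, p2, p3 = np.asarray(control_points, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]  # column for broadcasting
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Four 2D control points: the curve starts at the first and ends at the last,
# while the two inner points act as handles that pull the curve.
control = [(0, 0), (1, 2), (3, 2), (4, 0)]
path = cubic_bezier(control, np.linspace(0.0, 1.0, 5))
```

 Sampling the parameter more densely gives more in-between positions, which is how frames
 between keyframes can be generated.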
12. Optical flow
 Optical flow is a computer vision technique that involves estimating the motion of
 objects or surfaces in a visual scene based on the observed changes in brightness or
 intensity over time. It is a fundamental concept used in various applications, including motion
 analysis, video processing, object tracking, and scene understanding.
 Here are key points related to optical flow:
 Motion Estimation:
 ● Objective: The primary goal of optical flow is to estimate the velocity
 vector (optical flow vector) for each pixel in an image, indicating the apparent
 motion of that pixel in the scene.
 ● Pixel-level Motion: Optical flow provides a dense representation of motion at the pixel
 level.
 Brightness Constancy Assumption:
 ● Assumption: Optical flow is based on the assumption of brightness
 constancy, which states that the brightness of a point in the scene remains
 constant over time.
 Optical Flow Equation:
 ● Derivation: The optical flow equation is derived from the brightness
 constancy assumption using partial derivatives with respect to spatial coordinates and
 time.
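 The derivation mentioned above can be written out as follows. The notation is an assumption
 of this sketch: I(x, y, t) is image intensity, (u, v) is the flow vector of a pixel over one
 time step, and I_x, I_y, I_t are the partial derivatives of I with respect to x, y and t.

```latex
% Brightness constancy: a point keeps its intensity as it moves by (u, v).
I(x + u,\; y + v,\; t + 1) = I(x, y, t)

% First-order Taylor expansion of the left-hand side (valid for small motion):
I(x + u,\; y + v,\; t + 1) \approx I(x, y, t) + I_x\,u + I_y\,v + I_t

% Combining the two gives the optical flow constraint equation:
I_x\,u + I_y\,v + I_t = 0
```

 This is one equation in two unknowns per pixel, so the flow cannot be recovered from a
 single pixel alone (the aperture problem). The computational methods listed below add
 extra assumptions, such as locally constant or smooth flow, to resolve this.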
 Dense and Sparse Optical Flow:
 ● Dense Optical Flow: Estimating optical flow for every pixel in the image, providing
 a complete motion field.
 ● Sparse Optical Flow: Estimating optical flow only for selected keypoints or features in
 the image.
 Computational Methods:
 ● Correlation-based Methods: Match image patches or windows between consecutive
 frames to estimate motion.
 ● Gradient-based Methods: Utilize image gradients to compute optical flow.
 ● Variational Methods: Formulate energy minimization problems to estimate optical
 flow.
 Lucas-Kanade Method:
 ● A well-known differential method for estimating optical flow, particularly suited for
 small motion and local analysis.
 Horn-Schunck Method:
 ● A variational method that minimizes a global energy function, taking into account
 smoothness constraints in addition to brightness constancy.
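 A minimal sketch of the Lucas-Kanade idea for a single window is given below. It assumes that
 every pixel in the window shares one flow vector (u, v) and solves the resulting
 over-determined system of brightness constancy constraints by least squares. The function
 name lucas_kanade_window and the synthetic Gaussian blob are assumptions for illustration; a
 practical tracker would add image pyramids and iteration.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1):
    """Estimate one flow vector (u, v) for a whole window, assuming small motion."""
    frame0 = np.asarray(frame0, dtype=float)
    frame1 = np.asarray(frame1, dtype=float)
    # Spatial gradients (np.gradient returns d/drow first, then d/dcol).
    grad_y, grad_x = np.gradient(frame0)
    grad_t = frame1 - frame0  # temporal derivative
    # One constraint per pixel: Ix * u + Iy * v = -It
    A = np.stack([grad_x.ravel(), grad_y.ravel()], axis=1)
    b = -grad_t.ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # array [u, v] in pixels

# Synthetic test pattern: a smooth blob that moves by a sub-pixel amount.
ys, xs = np.mgrid[0:41, 0:41]

def blob(cx, cy, sigma=6.0):
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

frame0 = blob(20.0, 20.0)
frame1 = blob(20.3, 19.8)  # moved by about (+0.3, -0.2) pixels
u, v = lucas_kanade_window(frame0, frame1)  # expected to be close to (0.3, -0.2)
```

 Because the method relies on a first-order approximation, the estimate degrades as the
 displacement grows, which is consistent with the note above that it suits small motion.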
 Applications:
 ● Video Compression: Optical flow is used in video compression algorithms to predict
 motion between frames.
 ● Object Tracking: Tracking moving objects in a video sequence.
 ● Robotics: Providing visual feedback for navigation and obstacle
 avoidance.
 ● Augmented Reality: Aligning virtual objects with the real-world scene.
 Challenges:
 ● Illumination Changes: Optical flow may be sensitive to changes in
 illumination.
 ● Occlusions: Occlusions and complex motion patterns can pose challenges for accurate
 optical flow estimation.
 ● Large Displacements: Traditional methods may struggle with handling large
 displacements.
 Deep Learning for Optical Flow:
 ● Recent advances in deep learning have led to the development of neural
 network-based methods for optical flow estimation, such as FlowNet and PWC-Net.
 Optical flow is a valuable tool for understanding and analyzing motion in visual data.
 While traditional methods have been widely used, the integration of deep learning has brought
 new perspectives and improved performance in optical flow estimation.
13. Layered motion
 Layered motion models are employed to better capture complex scenes with multiple moving
 entities, handling occlusions and interactions between objects.
 Here are key points related to layered motion:
 Layered Motion Models:
 ● Objective: The goal of layered motion models is to represent the motion of
 distinct objects or surfaces in a scene independently, allowing for a more accurate
 description of complex motion scenarios.
 ● Assumption: It assumes that the observed motion in a scene can be decomposed
 into the motion of different layers.
 Key Concepts:
 ● Independence: Layers are assumed to move independently of each other, simplifying
 the modeling of complex scenes.
 ● Occlusions: Layered motion models can handle occlusions more
 effectively, as each layer represents a separate entity in the scene.
 Motion Layer Segmentation:
 ● Segmentation Process: The process of identifying and separating the
 different motion layers in a video sequence is referred to as motion layer segmentation.
 ● Foreground and Background: Layers might represent the foreground and background
 elements in a scene.
 Challenges in Layered Motion:
 ● Interaction Handling: Representing the interaction between layers, such as occlusions or
 overlapping motions.
 ● Dynamic Scene Changes: Adapting to changes in the scene, including the appearance
 or disappearance of objects.
 Optical Flow for Layered Motion:
 ● Optical flow techniques can be extended to estimate the motion of individual
 layers in a scene.
 ● Layer-Specific Optical Flow: Applying optical flow independently to different
 layers.
 Multiple Object Tracking:
 ● Layered motion models are closely related to multiple object tracking, as each layer
 can correspond to a tracked object.
 Applications:
 ● Surveillance and Security: Tracking and analyzing the motion of multiple objects in
 surveillance videos.
 ● Robotics: Layered motion models can aid robots in understanding and navigating
 dynamic environments.
 ● Augmented Reality: Aligning virtual objects with the real-world scene by
 understanding the layered motion.
 Representation Formats:
 ● Layers can be represented in various formats, such as depth maps, segmentation
 masks, or explicit motion models for each layer.
 Integration with Scene Understanding:
 ● Layered motion models can be integrated with higher-level scene understanding.
 UNIT IV
 3D RECONSTRUCTION
 Shape from X - Active range finding - Surface representations -
 Point-based representations - Volumetric representations - Model-based reconstruction
 - Recovering texture maps and albedos.
1. Shape from X:
 "Shape from X" refers to a category of computer vision and computer graphics
 techniques that aim to recover the three-dimensional (3D) shape or structure of objects or
 scenes from different types of information or cues, represented by the variable "X".
 The "X" can stand for various sources or modalities that provide information about the scene.
 Some common examples include:
 Shape from Shading (SfS): This technique involves recovering 3D shape
 information from variations in brightness and shading in 2D images. It assumes that the
 shading patterns in an image are influenced by the underlying 3D
 geometry.
 Shape from Stereo: This method utilizes the disparity or parallax
 information between two or more images of a scene taken from different
 viewpoints. By triangulating corresponding points, the 3D structure of the scene can be
 reconstructed.
 Shape from Motion (SfM): SfM, more commonly called structure from motion, aims to recover
 the 3D structure of a scene by analyzing the motion of objects or the camera itself. This is
 often achieved by tracking features across multiple frames of a video sequence.
 Shape from Focus (SfF): In SfF, the depth information is inferred from the
 variation in image sharpness or focus. By analyzing the focus information at different depths,
 the 3D shape can be estimated.
 Shape from Defocus (SfD): Similar to SfF, SfD leverages the effects of defocusing
 in images to estimate depth information. Objects at different distances from the camera will
 exhibit different degrees of blur.
 Shape from Light (SfL): This technique involves using information about the
 lighting conditions in a scene to infer 3D shape. The interaction between light and surfaces
 provides cues about the geometry.
 These approaches demonstrate the diversity of methods used in computer vision to
 recover 3D shape information from different types of visual cues. The choice of the specific "X"
 (shading, stereo, motion, etc.) depends on the available data and the
 characteristics of the scene being reconstructed.
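 For the stereo case, triangulation reduces to a simple formula when the two cameras are
 rectified (image rows aligned): depth = focal length × baseline / disparity. The sketch
 below assumes that setup; the function name depth_from_disparity and the numeric values are
 illustrative only.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px    horizontal pixel offset of the point between the two images
    focal_length_px focal length expressed in pixels
    baseline_m      distance between the two camera centres in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline.
near = depth_from_disparity(42.0, focal_length_px=700.0, baseline_m=0.12)
far = depth_from_disparity(21.0, focal_length_px=700.0, baseline_m=0.12)
```

 Halving the disparity doubles the estimated depth, which is why distant points, whose
 disparities are small, are reconstructed less precisely.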
 2. Active range finding:
 Active range finding is a technique used in computer vision and remote sensing to
 determine the distance to objects in a scene actively. Unlike passive methods that rely
 on existing ambient illumination, active range finding involves emitting a signal or probe
 towards the target and measuring the time it takes for the signal to return. This process
 is often based on the principles of time-of-flight or phase-shift measurement. The goal is to
 obtain accurate depth or distance information about the surfaces in the scene.
 Here are a few common methods of active range finding:
 Laser Range Finding: This method involves emitting laser beams towards the
 target and measuring the time it takes for the laser pulses to travel to the object and back.
 By knowing the speed of light, the distance to the object can be
 calculated.
 Ultrasound Range Finding: Ultrasound waves are emitted, and the time it takes
 for the waves to bounce back to a sensor is measured. This method is commonly used in
 environments where optical methods may be less effective, such as in low-light
 conditions.
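 Both methods above rest on the same time-of-flight relation: the signal travels to the
 target and back, so distance = speed × round-trip time / 2. A minimal sketch follows; the
 function name range_from_round_trip is hypothetical, and the speed of sound is an
 approximate value for air at about 20 °C.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre
SPEED_OF_SOUND_M_S = 343.0          # approximate, air at about 20 degrees Celsius

def range_from_round_trip(round_trip_time_s, propagation_speed_m_s):
    """Distance to the target given the time for a pulse to go out and come back."""
    return propagation_speed_m_s * round_trip_time_s / 2.0

laser_range_m = range_from_round_trip(66.7e-9, SPEED_OF_LIGHT_M_S)    # roughly 10 m
ultrasound_range_m = range_from_round_trip(0.01, SPEED_OF_SOUND_M_S)  # roughly 1.7 m
```

 The laser example shows why optical time-of-flight needs very fine timing: a 10 m range
 corresponds to a round trip of only about 67 nanoseconds.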
3. Surface representations:
 Polygonal Meshes:
 ● Description: Meshes are composed of vertices, edges, and faces that
 define the surface geometry. Triangular and quadrilateral meshes are most
 common.
 ● Application: Widely used in computer graphics, gaming, and 3D modeling.
 Point Clouds:
 ● Description: A set of 3D points in space, each representing a sample on the surface of
 an object.
 ● Application: Generated by 3D scanners, LiDAR, or depth sensors; used in
 applications like autonomous vehicles, robotics, and environmental
 mapping.
 Implicit Surfaces:
 ● Description: Represent surfaces as the zero level set of a scalar function. Points inside
 the surface have negative values, points outside have positive values, and points
 on the surface have a value of zero.
 ● Application: Used in physics-based simulations, medical imaging, and shape
 modeling.
 NURBS (Non-Uniform Rational B-Splines):
 ● Description: Mathematical representations using control points and basis functions to
 define smooth surfaces.
 ● Application: Commonly used in computer-aided design (CAD), automotive design,
 and industrial design.
 Voxel Grids:
 ● Description: 3D grids where each voxel (volumetric pixel) represents a small
 volume in space, and the surface is defined by the boundary
 between occupied and unoccupied voxels.
 ● Application: Used in medical imaging, volumetric data analysis, and
 computational fluid dynamics.
 Level Set Methods:
 ● Description: Represent surfaces as the zero level set of a
 higher-dimensional function. The evolution of this function over time captures the
 motion of the surface.
 ● Application: Used in image segmentation, shape optimization, and fluid dynamics
 simulations.
 Octrees:
 ● Description: Hierarchical tree structures that recursively divide space into
 octants. Each leaf node contains information about the geometry within that region.
 ● Application: Used in real-time rendering, collision detection, and efficient storage of
 3D data.
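 As a small illustration of the implicit surface representation described above, the sketch
 below defines a sphere as the zero level set of a scalar function and classifies points by
 the sign of that function (negative inside, zero on the surface, positive outside). The
 function name sphere_implicit is an assumption for this example.

```python
import numpy as np

def sphere_implicit(points, center, radius):
    """Scalar function whose zero level set is a sphere.

    Returns negative values inside, zero on the surface, positive outside.
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    return np.linalg.norm(points - center, axis=1) - radius

query = [(0.0, 0.0, 0.0),   # centre of the sphere: inside
         (1.0, 0.0, 0.0),   # exactly on the surface
         (2.0, 0.0, 0.0)]   # outside
values = sphere_implicit(query, center=(0.0, 0.0, 0.0), radius=1.0)
inside = values < 0
```

 No vertices or faces are stored: the surface exists only as the set of points where the
 function evaluates to zero, which is what distinguishes it from a polygonal mesh.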
 4. Point-based representations:
 Point-based representations in computer vision and computer graphics refer to methods that
 represent surfaces or objects using a set of individual points in
 three-dimensional (3D) space. Instead of explicitly defining the connectivity between
 points as in polygonal meshes, point-based representations focus on the spatial
 distribution of points to describe the surface geometry. Here are some common point-based
 representations:
 Point Clouds:
 ● Description: A collection of 3D points in space, each representing a sample on the
 surface of an object or a scene.
 ● Application: Point clouds are generated by 3D scanners, LiDAR, depth sensors, or
 photogrammetry. They find applications in robotics, autonomous vehicles,
 environmental mapping, and 3D modeling.
 Dense Point Clouds:
 ● Description: Similar to point clouds but with a high density of points, providing
 more detailed surface information.
 ● Application: Used in applications requiring detailed 3D reconstructions, such as
 cultural heritage preservation, archaeological studies, and
 industrial inspections.
 Sparse Point Sets:
 ● Description: Representations where only a subset of points is used to
 describe the surface, resulting in a sparser dataset compared to a dense point cloud.
 ● Application: Sparse point sets are useful in scenarios where
 computational efficiency is crucial, such as real-time applications and large-scale
 environments.
 Point Splats:
 ● Description: Represent each point as a disc or a splat in 3D space. The size and
 orientation of the splats can convey additional information.
 ● Application: Commonly used in point-based rendering and visualization to represent
 dense point clouds efficiently.
 Point Features:
 ● Description: Represent surfaces using distinctive points or keypoints,
 each associated with local features such as normals, color, or texture information.
 ● Application: Widely used in feature-based registration, object recognition, and 3D
 reconstruction.
 Point Set Surfaces:
 ● Description: Represent surfaces as a set of unorganized points without connectivity
 information. Surface properties can be interpolated from neighboring points.
 ● Application: Used in surface reconstruction from point clouds and point-based
 rendering.
 Radial Basis Function (RBF) Representations:
 ● Description: Use radial basis functions to interpolate surface properties between
 points. These functions define a smooth surface that passes through the given
 points.
 ● Application: Commonly used in shape modeling, surface reconstruction, and
 computer-aided design.
 Point-based representations are particularly useful when dealing with unstructured or
 irregularly sampled data. They provide flexibility in representing surfaces with varying
 levels of detail and are well-suited for capturing complex and intricate structures.
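 Since a point cloud carries no connectivity, local surface properties such as normals have
 to be estimated from neighboring points. One common way, sketched below, fits a plane to a
 neighborhood by principal component analysis and takes the direction of least variance as
 the normal. The function name estimate_normal and the synthetic planar patch are assumptions
 for illustration.

```python
import numpy as np

def estimate_normal(neighbors):
    """Estimate a unit surface normal from a small neighborhood of 3D points."""
    pts = np.asarray(neighbors, dtype=float)
    centered = pts - pts.mean(axis=0)
    scatter = centered.T @ centered  # 3x3 scatter (unnormalised covariance) matrix
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # direction in which the points vary least: the plane normal.
    _, eigenvectors = np.linalg.eigh(scatter)
    return eigenvectors[:, 0]

# Synthetic neighborhood sampled from the plane z = 2.
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(50, 2))
plane_points = np.column_stack([xy, np.full(50, 2.0)])
normal = estimate_normal(plane_points)  # expected to be (0, 0, 1) up to sign
```

 The sign of the normal is ambiguous with this approach; real pipelines usually orient it
 toward the sensor position.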
5. Volumetric representations:
 Volumetric representations in computer vision and computer graphics are methods
 used to describe and model three-dimensional (3D) space in a volumetric manner.
 Unlike surface representations, which focus on defining the surface geometry explicitly,
 volumetric representations capture information about the entire volume, including the
 interior of objects. Here are some common volumetric representations:
 Voxel Grids:
 ● Description: A regular grid of small volume elements, called voxels, where each voxel
 represents a small unit of 3D space.
 ● Application: Used in medical imaging, computer-aided design (CAD),
 computational fluid dynamics, and robotics. Voxel grids are effective for representing
 both the exterior and interior of objects.
 Octrees:
 ● Description: A hierarchical data structure that recursively divides 3D space into octants.
 Each leaf node in the octree contains information about the occupied or
 unoccupied status of the corresponding volume.
 ● Application: Octrees are employed for efficient storage and representation of
 volumetric data, particularly in real-time rendering, collision detection,
 and adaptive resolution.
 Signed Distance Fields (SDF):
 ● Description: Represent the distance from each point in space to the
 nearest surface of an object, with a sign that indicates the side of the surface: negative
 values inside the object and positive values outside, matching the convention used for
 implicit surfaces above (the opposite sign convention is also in use).
 ● Application: Used in shape modeling, surface reconstruction, and
 physics-based simulations. SDFs provide a compact representation of geometry and are
 often used in conjunction with implicit surfaces.
 3D Texture Maps:
 ● Description: Extend the concept of 2D texture mapping to 3D space,
 associating color or other properties with voxels in a volumetric grid.
 ● Application: Employed in computer graphics, simulations, and
 visualization to represent complex volumetric details such as smoke, clouds, or other
 phenomena.
 Point Clouds with Occupancy Information:
 ● Description: Combine the idea of point clouds with additional information about the
 occupancy of each point in space.
 ● Application: Useful in scenarios where capturing both the surface and interior
 details of objects is necessary, such as in robotics and 3D
 reconstruction.
 Tensor Fields:
 ● Description: Represent the local structure of a volumetric region using tensors.
 Tensor fields capture directional information, making them suitable for
 anisotropic materials and shapes.
 ● Application: Commonly used in materials science, biomechanics, and
 simulations where capturing anisotropic properties is important.
 Shell Maps:
 ● Description: Represent the surfaces of objects as a collection of shells or layers, each
 encapsulating the object's geometry.
 ● Application: Used in computer graphics and simulation to efficiently
 represent complex objects and enable dynamic level-of-detail rendering.
 Volumetric representations are valuable in various applications where a comprehensive
 understanding of the 3D space is required, and they offer flexibility in capturing both
 surface and interior details of objects. The choice of representation often depends on
 the specific requirements of the task at hand and the characteristics of the data being modeled.
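 The simplest of the volumetric representations above, the voxel occupancy grid, can be
 sketched in a few lines: each 3D point marks the voxel that contains it as occupied. The
 function name voxelize, the grid origin at (0, 0, 0), and the sample points are assumptions
 made for this example.

```python
import numpy as np

def voxelize(points, voxel_size, grid_shape):
    """Boolean occupancy grid with its origin at (0, 0, 0); True marks occupied voxels."""
    grid = np.zeros(grid_shape, dtype=bool)
    # Index of the voxel containing each point.
    indices = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
    # Discard points that fall outside the grid bounds.
    in_bounds = np.all((indices >= 0) & (indices < np.array(grid_shape)), axis=1)
    indices = indices[in_bounds]
    grid[indices[:, 0], indices[:, 1], indices[:, 2]] = True
    return grid

points = [(0.10, 0.10, 0.10),
          (0.15, 0.12, 0.10),   # falls in the same voxel as the first point
          (0.90, 0.90, 0.90),
          (5.00, 0.00, 0.00)]   # outside the 1 m cube covered by the grid
grid = voxelize(points, voxel_size=0.25, grid_shape=(4, 4, 4))
```

 Memory grows with the cube of the resolution, which is the main motivation for the
 hierarchical octree structure described above.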
 6. Model-based reconstruction:
 Model-based reconstruction in computer vision refers to a category of techniques that involve
 creating a 3D model of a scene or object based on predefined models or
 templates. These methods leverage prior knowledge about the geometry, appearance, or
 structure of the objects being reconstructed. Model-based reconstruction is often
 used in scenarios where a known model can be fitted to the observed data, providing a
 structured and systematic approach to understanding the scene. Here are some key aspects and
 applications of model-based reconstruction:
 Prior Model Representation:
 ● Description: In model-based reconstruction, a mathematical
 representation or a geometric model of the object or scene is assumed or known in
 advance.
 ● Application: Commonly used in computer-aided design (CAD), medical
 imaging, and industrial inspection, where known shapes or structures can be explicitly
 represented.
 Model Fitting:
 ● Description: The reconstruction process involves adjusting the parameters
 of the model to best fit the observed data, typically obtained from images or sensor
 measurements.
 ● Application: Used in applications such as object recognition, pose
 estimation, and 3D reconstruction by aligning the model with the observed features.
 Geometric Constraints:
 ● Description: Constraints on the geometry of the scene, such as the
 relationships between different components or the expected shape
 characteristics, are incorporated into the reconstruction process.
 ● Application: Applied in robotics, augmented reality, and computer vision tasks
 where geometric relationships play a crucial role.
 Deformable Models:
 ● Description: Models that can adapt and deform to fit the observed data, allowing for
 more flexible and realistic representations.
 ● Application: Commonly used in medical imaging for organ segmentation and shape
 analysis, as well as in computer graphics for character
 animation.
 Stereo Vision with Model Constraints:
 ● Description: Stereo vision techniques that incorporate known models to improve
 depth estimation and 3D reconstruction.
 ● Application: Used in stereo matching algorithms and 3D reconstruction pipelines to
 enhance accuracy by considering geometric priors.
 Parametric Surfaces:
 ● Description: Representing surfaces using parametric equations or
 functions, allowing for efficient adjustment of parameters during the
 reconstruction process.
 ● Application: Applied in computer graphics, virtual reality, and industrial design
 where surfaces can be described mathematically.
 Multi-View Reconstruction with Known Models:
 ● Description: Leveraging multiple views or images of a scene to reconstruct a 3D model
 while incorporating information from known models.
 ● Application: Common in photogrammetry and structure-from-motion
 applications where multiple perspectives contribute to accurate 3D
 reconstruction.
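 Model fitting, as described above, means choosing model parameters so that the model agrees
 with the observed data. The sketch below takes the simplest possible case: the prior model
 is a sphere, and its centre and radius are recovered from 3D points by linear least squares.
 The sphere prior, the function name fit_sphere, and the synthetic data are assumptions chosen
 for illustration; real systems fit far richer models.

```python
import numpy as np

def fit_sphere(points):
    """Fit centre c and radius r of a sphere to 3D points by linear least squares.

    Uses the identity |p|^2 = 2 c.p + (r^2 - |c|^2), which is linear in the
    unknowns c and k = r^2 - |c|^2.
    """
    p = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * p, np.ones((len(p), 1))])
    b = np.sum(p ** 2, axis=1)
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = solution[:3]
    radius = np.sqrt(solution[3] + center @ center)
    return center, radius

# Synthetic observations: noise-free samples from a known sphere.
rng = np.random.default_rng(1)
directions = rng.normal(size=(100, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
observed = np.array([1.0, -2.0, 0.5]) + 3.0 * directions

center, radius = fit_sphere(observed)
```

 The prior reduces reconstruction to estimating four numbers, so the fit stays well
 constrained even when only part of the surface is observed.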
 7. Recovering texture maps and albedos:
 Recovering texture maps and albedos in computer vision and computer graphics
 involves estimating the surface appearance, color, and reflectance properties of objects
 in a scene. These processes are integral to creating realistic and detailed 3D models for
 applications like virtual reality, computer games, and simulations. Here's a brief
 overview of these concepts:
 Texture Maps:
 ● Description: Texture mapping involves applying a 2D image, known as a texture
 map, onto a 3D model's surface to simulate surface details, patterns, or color
 variations.
 ● Recovery Process: Texture maps can be recovered through various
 methods, including image-based techniques, photogrammetry, or using specialized
 3D scanners. These methods capture color information
 associated with the surface geometry.
 ● Application: Used in computer graphics, gaming, and virtual reality to
 enhance the visual appearance of 3D models by adding realistic surface details.
 Albedo:
 ● Description: Albedo represents the intrinsic color or reflectance of a
 surface, independent of lighting conditions. It is a measure of how much light a surface
 reflects.
 ● Recovery Process: Albedo can be estimated by decoupling surface
 reflectance from lighting effects. Photometric stereo, shape-from-shading, or using
 multi-view images are common methods to recover albedo
 information.
 ● Application: Albedo information is crucial in computer vision applications, such as
 material recognition, object tracking, and realistic rendering in
 computer graphics.
 Photometric Stereo:
 ● Description: A technique that uses multiple images of an object
 illuminated from different directions to recover surface normals and, subsequently,
 albedo information.
 ● Application: Used in scenarios where detailed surface properties are needed,
 such as facial recognition, material analysis, and industrial
 inspection.
 Shape-from-Shading:
 ● Description: Inferring the shape of a surface based on variations in
 brightness or shading in images. By decoupling shading from geometry, albedo
 information can be estimated.
 ● Application: Applied in computer vision for shape recovery, as well as in computer
 graphics to enhance the realism of rendered images.
 Multi-View Stereo (MVS):
 ● Description: In the context of 3D reconstruction, MVS involves capturing
 images of a scene from multiple viewpoints and recovering both geometry and texture
 information.
 ● Application: Commonly used in 3D modeling, virtual reality, and cultural heritage
 preservation to create detailed and textured 3D models.
 Reflectance Transformation Imaging (RTI):
 ● Description: A technique that captures a series of images with controlled
 lighting conditions to reveal surface details, including albedo variations.
 ● Application: Widely used in cultural heritage preservation and art
 restoration for capturing fine details on surfaces.
 Recovering texture maps and albedos is crucial for creating visually appealing and
 realistic 3D models. These techniques bridge the gap between the geometry of the
 objects and their appearance, contributing to the overall fidelity of virtual or augmented
 environments.
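 The way photometric stereo separates albedo from geometry can be sketched under a Lambertian
 assumption, where the intensity of a pixel is albedo × (light direction · surface normal).
 With three or more known light directions, the product albedo × normal is recovered by least
 squares; its length is the albedo and its direction is the normal. The function name
 photometric_stereo, the light directions, and the two synthetic pixels are assumptions for
 this illustration, and shadows and specular highlights are ignored.

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover per-pixel albedo and unit normals from K images under known lights.

    intensities: array of shape (K, P), one row per image, one column per pixel
    light_dirs:  array of shape (K, 3), unit vectors pointing toward each light
    """
    L = np.asarray(light_dirs, dtype=float)
    I = np.asarray(intensities, dtype=float)
    # Solve L @ G = I for G = albedo * normal, shape (3, P).
    G, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(G, axis=0)
    normals = (G / np.maximum(albedo, 1e-12)).T  # shape (P, 3)
    return albedo, normals

lights = np.array([[0.0, 0.0, 1.0],
                   [0.6, 0.0, 0.8],
                   [0.0, 0.6, 0.8]])
true_normals = np.array([[0.0, 0.0, 1.0],
                         [0.6, 0.0, 0.8]])
true_albedo = np.array([0.8, 0.5])
# Render synthetic Lambertian intensities: one column per pixel.
images = (lights @ true_normals.T) * true_albedo

albedo, normals = photometric_stereo(images, lights)
```

 Using more than three lights makes the system over-determined, which helps average out
 sensor noise in practice.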
 UNIT V
 IMAGE-BASED RENDERING AND RECOGNITION
 View interpolation - Layered depth images - Light fields and Lumigraphs -
 Environment mattes - Video-based rendering - Object detection - Face recognition -
 Instance recognition - Category recognition - Context and scene understanding -
 Recognition databases and test sets.
 1. View Interpolation:
 View interpolation is a technique used in computer graphics and computer vision to
 generate new views of a scene that are not present in the original set of captured or rendered
 views. The goal is to create additional viewpoints between existing ones,
 providing a smoother transition and a more immersive experience. This is particularly
 useful in applications like 3D graphics, virtual reality, and video processing. Here are key
 points about view interpolation:
 Description:
 ● View interpolation involves synthesizing views from known viewpoints in a way that
 appears visually plausible and coherent.
 ● The primary aim is to provide a sense of continuity and smooth transitions between the
 available views.
 Methods:
 ● Image-Based Methods: These methods use image warping or morphing techniques to
 generate new views by blending or deforming existing
 images.
 ● 3D Reconstruction Methods: These approaches involve estimating the 3D geometry
 of the scene and generating new views based on the
 reconstructed 3D model.
 Applications:
 ● Virtual Reality (VR): In VR applications, view interpolation helps create a
 more immersive experience by generating views based on the user's head movements.
 ● Free-viewpoint Video: View interpolation is used in video processing to generate
 additional views for a more dynamic and interactive video
 experience.
 Challenges:
 ● Depth Discontinuities: Handling depth changes in the scene can be
 challenging, especially when interpolating between views with different depths.
 ● Occlusions: Addressing occlusions, where objects in the scene may block the view of
 others, is a common challenge.
 Techniques:
 ● Linear Interpolation: Basic linear interpolation is often used to generate
 intermediate views by blending the pixel values of adjacent views.
 ● Depth-Image-Based Rendering (DIBR): This method involves warping
 images based on depth information to generate new views.
 ● Neural Network Approaches: Deep learning techniques, including
 convolutional neural networks (CNNs), have been employed for view synthesis
 tasks.
 Use Cases:
 ● 3D Graphics: View interpolation is used to smoothly transition between different
 camera angles in 3D graphics applications and games.
 ● 360-Degree Videos: In virtual tours or immersive videos, view interpolation helps create
 a continuous viewing experience.
 View interpolation is a valuable tool for enhancing the visual quality and user experience in
 applications where dynamic or interactive viewpoints are essential. It enables the
 creation of more natural and fluid transitions between views, contributing to a more realistic
 and engaging visual presentation.
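 The basic linear interpolation technique listed above amounts to a per-pixel weighted
 average of two neighboring views. The sketch below shows only that blending step; the
 function name blend_views and the tiny constant images are assumptions for illustration.

```python
import numpy as np

def blend_views(view_a, view_b, alpha):
    """Linear blend of two views: alpha = 0 returns view_a, alpha = 1 returns view_b."""
    view_a = np.asarray(view_a, dtype=float)
    view_b = np.asarray(view_b, dtype=float)
    return (1.0 - alpha) * view_a + alpha * view_b

view_a = np.zeros((2, 2, 3))        # hypothetical 2x2 RGB view, all black
view_b = np.full((2, 2, 3), 100.0)  # hypothetical neighboring view, uniform grey
quarter_way = blend_views(view_a, view_b, alpha=0.25)
```

 Blending pixel values without first warping the images into correspondence produces
 ghosting wherever the scene has parallax, which is why depth-based methods such as DIBR
 warp the views before combining them.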
2. Layered Depth Images:
 Layered Depth Images (LDI) is a technique used in computer graphics for efficiently
 representing complex scenes with multiple layers of geometry at varying depths. The primary
 goal of Layered Depth Images is to provide an effective representation of scenes with
 transparency and occlusion effects. Here are key points about Layered Depth Images:
 Description:
 ● Layered Representation: LDI represents a scene as a stack of images,
 where each image corresponds to a specific depth layer within the scene.
 ● Depth Information: Each pixel in the LDI contains color information as well as depth
 information, indicating the position of the pixel along the view
 direction.
 Representation:
 ● 2D Array of Images: Conceptually, an LDI can be thought of as a 2D array of images,
 where each image represents a different layer of the scene.
 ● Depth Slice: The images in the array are often referred to as "depth slices,"
 and the order of the slices corresponds to the depth ordering of the layers.
 Advantages:
 ● Efficient Storage: LDIs can provide more efficient storage for scenes with
 transparency compared to traditional methods like z-buffers.
 ● Occlusion Handling: LDIs naturally handle occlusions and transparency,
 making them suitable for rendering scenes with complex layering effects.
 Use Cases:
 ● Augmented Reality: LDIs are used in augmented reality applications where virtual
 objects need to be integrated seamlessly with the real world, considering
 occlusions and transparency.
 ● Computer Games: LDIs can be employed in video games to efficiently handle
 scenes with transparency effects, such as foliage or glass.
 Scene Composition:
 ● Compositing: To render a scene from a particular viewpoint, the images
 from different depth slices are composited together, taking into account the depth
 values to handle transparency and occlusion.
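 The compositing step can be sketched as front-to-back alpha blending of the samples stored
 at one pixel. This is an illustrative sketch, assuming each sample is a hypothetical
 (depth, gray_value, alpha) tuple; the function names are not taken from any library.

```python
def composite_pixel(samples):
    """Composite the layered samples of one pixel, nearest sample first.

    samples: list of (depth, gray_value, alpha) tuples, in any order.
    """
    color = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for _depth, value, alpha in sorted(samples):  # sort by depth, near to far
        color += transmittance * alpha * value
        transmittance *= 1.0 - alpha
        if transmittance <= 0.0:  # an opaque sample hides everything behind it
            break
    return color


def composite_ldi(ldi):
    """Composite a 2D grid in which every cell holds a list of samples."""
    return [[composite_pixel(samples) for samples in row] for row in ldi]


# A half-transparent surface (depth 2) in front of an opaque one (depth 5).
glass_over_wall = [(5.0, 100.0, 1.0), (2.0, 200.0, 0.5)]
result = composite_pixel(glass_over_wall)
```

 Sorting by depth is what handles occlusion here: a fully opaque near sample drives the
 transmittance to zero, so farther samples contribute nothing.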
 Challenges:
 ● Memory Usage: Depending on the complexity of the scene and the
 number of depth layers, LDIs can consume a significant amount of memory.
 ● Anti-aliasing: Handling smooth transitions between layers, especially when
 dealing with transparency, can pose challenges for anti-aliasing.
 Extensions:
 ● Sparse Layered Representations: Some extensions of LDIs involve using
 sparse representations to reduce memory requirements while maintaining the benefits
 of layered depth information.
 Layered Depth Images are particularly useful in scenarios where traditional rendering
 techniques, such as z-buffer-based methods, struggle to handle transparency and
 complex layering. By representing scenes as a stack of images, LDIs provide a more
 natural way to deal with the challenges posed by rendering scenes with varying depths and transparency
 effects.
3. Light Fields and Lumigraphs:
Light Fields:
 ● Definition: A light field is a representation of all the light rays traveling in all directions
 through every point in a 3D space.
 ● Components: It consists of both the intensity and the direction of light at each point in
 space.
 ● Capture: Light fields can be captured using an array of cameras or
 specialized camera setups to record the rays of light from different perspectives.
 ● Applications: Used in computer graphics for realistic rendering, virtual
 reality, and post-capture refocusing where the focus point can be adjusted after the image
 is captured.
 Lumigraphs:
 ● Definition: A lumigraph is a type of light field that represents the visual information
 in a scene as a function of both space and direction.
 ● Capture: Lumigraphs are typically captured using a set of images from a dense
 camera array, capturing the scene from various viewpoints.
 ● Components: Similar to light fields, they include information about the intensity
 and direction of light at different points in space.
 ● Applications: Primarily used in computer graphics and computer vision for 3D
 reconstruction, view interpolation, and realistic rendering of complex
 scenes.
 Comparison:
 ● Difference: The terms are often used interchangeably, and both describe rays with a
 4D parameterization. The Lumigraph formulation additionally uses approximate scene
 geometry to correct for depth when rendering, while the original light field
 formulation relies on dense sampling of the rays alone.
 ● Similarities: Both light fields and lumigraphs aim to capture a
 comprehensive set of visual information about a scene to enable realistic rendering and
 various computational photography applications.
 Advantages:
 ● Realism: Light fields and lumigraphs contribute to realistic rendering by capturing
 the full complexity of how light interacts with a scene.
 ● Flexibility: They allow for post-capture manipulation, such as changing the viewpoint or
 adjusting focus, providing more flexibility in the rendering
 process.
 Challenges:
 ● Data Size: Light fields and lumigraphs can generate large amounts of data, requiring
 significant storage and processing capabilities.
 ● Capture Setup: Acquiring a high-quality light field or lumigraph often requires
 specialized camera arrays or complex setups.
 Applications:
 ● Virtual Reality: Used to enhance the realism of virtual environments by providing
 a more immersive visual experience.
 ● 3D Reconstruction: Applied in computer vision for reconstructing 3D scenes
 and objects from multiple viewpoints.
 Future Developments:
 ● Computational Photography: Ongoing research explores advanced
 computational photography techniques leveraging light fields for
 applications like refocusing, depth estimation, and novel view synthesis.
 ● Hardware Advances: Continued improvements in camera technology may lead to
 more accessible methods for capturing high-quality light fields.
 Light fields and lumigraphs are powerful concepts in computer graphics and computer
 vision, offering a rich representation of visual information that opens up possibilities for creating more
 immersive and realistic virtual experiences.
4. Environment Mattes:
Definition:
 Techniques:
 ● Chroma Keying: Commonly used in film and television, chroma keying
 involves shooting the subject against a uniformly colored background (often green
 or blue) that can be easily removed in post-production.
 ● Rotoscoping: Involves manually tracing the outlines of the subject frame
 by frame, providing precise control over the matte but requiring significant labor.
 ● Depth-based Mattes: In 3D applications, depth information can be used to create a
 matte, allowing for more accurate separation of foreground and background
 elements.
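 Chroma keying can be sketched with a simple green-dominance test. This is a minimal sketch
 under stated assumptions: pixels are (r, g, b) tuples in 0..255, the backdrop is green, and
 the threshold of 40 is an arbitrary illustrative value. Production keyers compute soft
 (fractional) mattes rather than the binary one shown here.

```python
def chroma_key_alpha(pixel, threshold=40):
    """Return 0.0 for backdrop pixels and 1.0 for foreground pixels.

    A pixel counts as backdrop when green exceeds both other channels
    by more than `threshold`.
    """
    r, g, b = pixel
    dominance = g - max(r, b)
    return 0.0 if dominance > threshold else 1.0


def composite_over(foreground, background, threshold=40):
    """Replace keyed-out foreground pixels with the new background."""
    result = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for fg_pixel, bg_pixel in zip(fg_row, bg_row):
            keep = chroma_key_alpha(fg_pixel, threshold) == 1.0
            row.append(fg_pixel if keep else bg_pixel)
        result.append(row)
    return result


# One row: a green-screen pixel next to a skin-toned pixel.
shot = [[(10, 220, 20), (200, 150, 120)]]
new_background = [[(0, 0, 255), (0, 0, 255)]]
final = composite_over(shot, new_background)
```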
 Applications:
 ● Film and Television Production: Widely used in the entertainment industry
 to create special effects, insert virtual backgrounds, or composite actors into different
 scenes.
 ● Virtual Studios: In virtual production setups, environment mattes are crucial for
 seamlessly integrating live-action footage with
 computer-generated backgrounds.
 Challenges:
 ● Soft Edges: Achieving smooth and natural transitions between the
 foreground and background is challenging, especially when dealing with fine details
 like hair or transparent objects.
 ● Motion Dynamics: Handling dynamic scenes with moving subjects or
 dynamic camera movements requires advanced techniques to maintain accurate mattes.
 Spill Suppression:
 ● Definition: Spill refers to the unwanted influence of the background color
 on the foreground subject. Spill suppression techniques are employed to minimize
 this effect.
 ● Importance: Ensures that the foreground subject looks natural when placed
 against a new background.
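 One simple spill suppression heuristic is to limit the green channel so it never exceeds the
 larger of red and blue. The sketch below assumes a green backdrop and (r, g, b) integer
 pixels; it is one of several possible heuristics, not a method prescribed by these notes.

```python
def suppress_green_spill(pixel):
    """Clamp green to max(red, blue) to remove a green cast from a pixel."""
    r, g, b = pixel
    limit = max(r, b)
    return (r, min(g, limit), b)


# A skin tone tinted by reflected green light, and an untinted one.
tinted = suppress_green_spill((180, 210, 160))
untouched = suppress_green_spill((200, 150, 120))
```

 Pixels whose green is already below the limit pass through unchanged, so the correction
 only affects areas that carry a green cast.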
 Foreground-Background Integration:
 ● Lighting and Reflection Matching: For realistic results, it's essential to
 match the lighting and reflections between the foreground and the new background.
 ● Shadow Casting: Consideration of shadows cast by the foreground
 elements to ensure they align with the lighting conditions of the new background.
 Advanced Techniques:
 ● Machine Learning: Advanced machine learning techniques, including
 semantic segmentation and deep learning, are increasingly being applied to automate
 and enhance the environment matte creation process.
 ● Real-time Compositing: In some applications, especially in live events or broadcasts,
 real-time compositing technologies are used to create
 environment mattes on the fly.
 Evolution with Technology:
 ● HDR and 3D Capture: High Dynamic Range (HDR) imaging and 3D capture
 technologies contribute to more accurate and detailed environment
 mattes.
 ● Real-time Processing: Advances in real-time processing enable more efficient
 and immediate creation of environment mattes, reducing
 post-production time.
 Environment mattes play a crucial role in modern visual effects and virtual production, allowing
 filmmakers and content creators to seamlessly integrate real and virtual elements to tell
 compelling stories.
5. Video-based Rendering:
Definition:
 ● Video-based Rendering (VBR) refers to the process of generating
 novel views or frames of a scene by utilizing information from a set of input
 video sequences.
Capture Techniques:
 ● View Synthesis: The core objective of video-based rendering is to
 synthesize new views or frames that were not originally captured but can be
 realistically generated from the available footage.
 ● Image-Based Rendering (IBR): Techniques such as image-based
 rendering, which use captured images or video frames as the basis for view
 synthesis.
 Applications:
 ● Virtual Reality (VR): VBR is used in VR applications to provide a more
 immersive experience by allowing users to explore scenes from various
 perspectives.
 ● Free-Viewpoint Video: VBR techniques enable the creation of
 free-viewpoint video, allowing users to interactively choose their viewpoint
 within a scene.
 View Synthesis Challenges:
 ● Occlusions: Handling occlusions and ensuring that synthesized
 views account for objects obstructing the line of sight is a significant challenge.
 ● Consistency: Ensuring visual consistency and coherence across
 synthesized views to avoid artifacts or discrepancies.
 3D Reconstruction:
 ● Depth Estimation: Some video-based rendering approaches involve estimating
 depth information from the input video sequences, enabling more
 accurate view synthesis.
 ● Multi-View Stereo (MVS): Utilizing multiple viewpoints for 3D
 reconstruction to enhance the quality of synthesized views.
 Real-time Video-based Rendering:
 ● Live Events: In certain scenarios, real-time video-based rendering is employed
 for live events, broadcasts, or interactive applications.
 ● Low Latency: Minimizing latency is crucial for applications where the rendered
 views need to be presented in real-time.
 Emerging Technologies:
 ● Deep Learning: Advances in deep learning, particularly convolutional neural
 networks (CNNs) and generative models, have been applied to
 video-based rendering tasks, enhancing the quality of synthesized views.
 ● Neural Rendering: Techniques like neural rendering leverage neural
 networks to generate realistic novel views, addressing challenges like
 specular reflections and complex lighting conditions.
 Hybrid Approaches:
 ● Combining Techniques: Some video-based rendering methods
 combine traditional computer graphics approaches with machine learning
 techniques for improved results.
 ● Incorporating VR/AR: VBR is often integrated with virtual reality (VR)
 and augmented reality (AR) systems to provide more immersive and interactive
 experiences.
 Future Directions:
 ● Improved Realism: Ongoing research aims to enhance the realism of
 synthesized views, addressing challenges related to complex scene
 dynamics, lighting variations, and realistic material rendering.
 ● Applications Beyond Entertainment: Video-based rendering is
 expanding into fields like remote collaboration, telepresence, and
 interactive content creation.
 Video-based rendering is a dynamic field that plays a crucial role in shaping immersive
 experiences across various domains, including entertainment,
 communication, and virtual exploration. Advances in technology and research
 continue to push the boundaries of what is achievable in terms of realistic view synthesis.
6. Object Detection:
Definition:
 ● Object Detection is a computer vision task that involves identifying and
 locating objects within an image or video. The goal is to draw bounding
 boxes around the detected objects and assign a label to each identified object.
 Object Localization vs. Object Recognition:
 ● Object Localization: In addition to identifying objects, object detection also involves
 providing precise coordinates (bounding box) for the location of each detected
 object within the image.
 ● Object Recognition: Assigning a category label to each object. Object detection
 combines recognition with localization, so each detection carries both a label and a
 bounding box.
 Methods:
 ● Two-Stage Detectors: These methods first propose regions in the image
 that might contain objects and then classify and refine those proposals. Examples
 include Faster R-CNN.
 ● One-Stage Detectors: These methods simultaneously predict object
 bounding boxes and class labels without a separate proposal stage.
 Examples include YOLO (You Only Look Once) and SSD (Single Shot
 Multibox Detector).
 ● Anchor-based and Anchor-free Approaches: Some methods use anchor
 boxes to predict object locations and sizes, while others adopt anchor-free strategies.
 Applications:
 ● Autonomous Vehicles: Object detection is crucial for autonomous vehicles to identify
 pedestrians, vehicles, and other obstacles.
 ● Surveillance and Security: Used in surveillance systems to detect and track
 objects or individuals of interest.
 ● Retail: Applied in retail for inventory management and customer behavior analysis.
 ● Medical Imaging: Object detection is used to identify and locate
 abnormalities in medical images.
 ● Augmented Reality: Utilized for recognizing and tracking objects in AR
 applications.
 Challenges:
 ● Scale Variations: Objects can appear at different scales in images, requiring
 detectors to be scale-invariant.
 ● Occlusions: Handling situations where objects are partially or fully occluded
 by other objects.
 ● Real-time Processing: Achieving real-time performance for applications like video
 analysis and robotics.
 Evaluation Metrics:
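 ● Intersection over Union (IoU): Measures the overlap between the predicted and ground
 truth bounding boxes as the area of their intersection divided by the area of their union.
 ● Precision and Recall: Precision is the fraction of detections that are correct
 (TP / (TP + FP)); recall is the fraction of ground truth objects that are detected
 (TP / (TP + FN)). Together they capture the trade-off between false positives and missed
 objects.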
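 The two metrics above can be computed directly from box coordinates. This is a minimal
 sketch, assuming boxes are (x1, y1, x2, y2) tuples, predictions are already sorted by
 confidence, and a detection counts as correct when its IoU with an unmatched ground truth
 box is at least 0.5; the function names and the greedy matching rule are illustrative choices.

```python
def box_iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match each prediction to the best unmatched ground truth box."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_iou, best_index = 0.0, None
        for index, truth in enumerate(ground_truth):
            if index in matched:
                continue
            iou = box_iou(pred, truth)
            if iou > best_iou:
                best_iou, best_index = iou, index
        if best_index is not None and best_iou >= iou_threshold:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


detections = [(0, 0, 10, 10), (50, 50, 60, 60)]
annotations = [(1, 1, 10, 10), (20, 20, 30, 30)]
precision, recall = precision_recall(detections, annotations)
```

 In this example one detection matches an annotation (IoU 0.81), one detection is a false
 positive, and one annotation is missed, so both precision and recall are 0.5.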
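 Deep Learning in Object Detection:
 ● Convolutional Neural Networks (CNNs): Deep learning, especially CNNs, has
 significantly improved object detection accuracy.
 ● Region-based CNNs (R-CNN): Classify candidate regions of the image with a CNN. The
 original R-CNN relied on externally generated region proposals; Faster R-CNN later
 introduced the region proposal network (RPN) to generate them inside the network.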
 ● Single Shot Multibox Detector (SSD), You Only Look Once (YOLO):
 One-stage detectors that are faster and suitable for real-time applications.
 Transfer Learning:
 ● Pre-trained Models: Transfer learning involves using pre-trained models on large
 datasets and fine-tuning them for specific object detection tasks.
 ● Popular Architectures: Models like ResNet, VGG, and MobileNet are often used as
 backbone architectures for object detection.
 Recent Advancements:
 ● EfficientDet: An efficient object detection model that balances accuracy and
 efficiency.
 ● CenterNet: Focuses on predicting object centers and regressing bounding box
 parameters.
 Object Detection Datasets:
 ● COCO (Common Objects in Context): Widely used for evaluating object detection
 algorithms.
 ● PASCAL VOC (Visual Object Classes): Another benchmark dataset for object
 detection tasks.
 ● ImageNet: Originally known for image classification, ImageNet has also been used
 for object detection challenges.
 Object detection is a fundamental task in computer vision with widespread applications
 across various industries. Advances in deep learning and the availability of large-scale datasets have
 significantly improved the accuracy and efficiency of object detection models in recent years.
7. Face Recognition:
Definition:
 ● Feature Extraction: Capturing distinctive features of the face, such as the distances
 between eyes, nose, and mouth, and creating a unique
 representation.
 ● Matching Algorithm: Comparing the extracted features with pre-existing templates
 to identify or verify a person.
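 The matching step can be sketched as a nearest-template search over feature vectors. This is
 an illustrative sketch: the three-dimensional vectors, the names in the gallery, and the
 distance threshold of 0.6 are all hypothetical, and real systems use much longer feature
 vectors with thresholds tuned on validation data.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe, gallery, max_distance=0.6):
    """Return the name of the closest enrolled template, or None if too far."""
    best_name, best_distance = None, float("inf")
    for name, template in gallery.items():
        distance = euclidean_distance(probe, template)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None


# Hypothetical enrolled templates (feature vectors extracted beforehand).
gallery = {
    "alice": [0.10, 0.90, 0.30],
    "bob": [0.80, 0.20, 0.50],
}

known = identify([0.15, 0.85, 0.35], gallery)
unknown = identify([5.0, 5.0, 5.0], gallery)
```

 Identification searches the whole gallery as shown; verification would compare the probe
 against a single claimed identity and apply the same threshold.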
 Methods:
 ● Eigenfaces: A technique that represents faces as linear combinations of principal
 components.
 ● Local Binary Patterns (LBP): A texture-based method that captures patterns of
 pixel intensities in local neighborhoods.
 ● Deep Learning: Convolutional Neural Networks (CNNs) have significantly
 improved face recognition accuracy, with architectures like FaceNet and VGGFace.
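 The Local Binary Patterns method can be illustrated with the basic 3x3 operator: each
 neighbor is compared with the center pixel and the eight results form one byte. This is a
 minimal sketch, assuming a grayscale image stored as a list of rows; the clockwise bit order
 starting at the top-left neighbor is one common convention, chosen here for illustration.

```python
# Neighbor offsets (row, col), clockwise starting at the top-left pixel.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code of an interior pixel: bit i is set when neighbor i >= center."""
    center = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(NEIGHBOR_OFFSETS):
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(patch, 1, 1)  # neighbors 9, 7 and 8 are >= 6: bits 1, 3, 4
```

 A face descriptor is typically built by computing such histograms over a grid of image
 regions and concatenating them.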
 Applications:
 ● Security and Access Control: Commonly used in secure access systems, unlocking
 devices, and building access.
 ● Law Enforcement: Applied for identifying individuals in criminal
 investigations and monitoring public spaces.
 ● Retail: Used for customer analytics, personalized advertising, and enhancing
 customer experiences.
 ● Human-Computer Interaction: Implemented in applications for facial expression
 analysis, emotion recognition, and virtual avatars.
 Challenges:
 ● Variability in Pose: Recognizing faces under different poses and
 orientations.
 ● Illumination Changes: Handling variations in lighting conditions that can affect the
 appearance of faces.
 ● Aging and Environmental Factors: Adapting to changes in appearance due to aging,
 facial hair, or accessories.
 Privacy and Ethical Considerations:
 ● Data Privacy: Concerns about the collection and storage of facial data and the potential
 misuse of such information.
 ● Bias and Fairness: Ensuring fairness and accuracy, particularly across
 diverse demographic groups, to avoid biases in face recognition systems.
 Liveness Detection:
 ● Definition: A technique used to determine whether the presented face is from a live
 person or a static image.
 ● Importance: Prevents unauthorized access using photos or videos to trick the system.
 Multimodal Biometrics:
 ● Fusion with Other Modalities: Combining face recognition with other
 biometric methods, such as fingerprint or iris recognition, for improved accuracy.
 Real-time Face Recognition:
 ● Applications: Real-time face recognition is essential for applications like video
 surveillance, access control, and human-computer interaction.
 ● Challenges: Ensuring low latency and high accuracy in real-time scenarios.
 Benchmark Datasets:
 ● Labeled Faces in the Wild (LFW): A popular dataset for face recognition, containing
 images collected from the internet.
 ● CelebA: Dataset with celebrity faces for training and evaluation.
 ● MegaFace: Benchmark for evaluating the performance of face recognition systems at
 a large scale.
 Face recognition is a rapidly evolving field with numerous applications and ongoing
 research to address challenges and enhance its capabilities. It plays a crucial role in various industries,
 from security to personalized services, contributing to the advancement of biometric
 technologies.
8. Instance Recognition:
Definition:
 ●
 Object Recognition vs. Instance Recognition:
 ● Object Recognition: Identifies object categories in an image without
 distinguishing between different instances of the same category.
 ● Instance Recognition: Assigns unique identifiers to individual instances of objects,
 allowing for differentiation between multiple occurrences of the same category.
 Semantic Segmentation and Instance Segmentation:
 ● Semantic Segmentation: Assigns a semantic label to each pixel in an
 image, indicating the category to which it belongs (e.g., road, person, car).
 ● Instance Segmentation: Extends semantic segmentation by assigning a unique
 identifier to each instance of an object, enabling differentiation between
 separate objects of the same category.
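 The difference between the two outputs can be illustrated by turning a semantic mask into an
 instance map with connected-component labeling. This is a deliberately simplified sketch: it
 assumes instances of a class never touch, which does not hold in general, and it is not how
 learned methods such as Mask R-CNN work. The class ids and the function name are hypothetical.

```python
from collections import deque


def label_instances(semantic, target_class):
    """Give each 4-connected region of `target_class` its own instance id.

    Returns (instance_map, instance_count); id 0 means "not this class".
    """
    rows, cols = len(semantic), len(semantic[0])
    instance = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if semantic[r][c] != target_class or instance[r][c] != 0:
                continue
            next_id += 1
            instance[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if (inside and semantic[ny][nx] == target_class
                            and instance[ny][nx] == 0):
                        instance[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance, next_id


# Semantic mask: 1 = "car", 0 = background. Both cars share the same label.
semantic_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
instance_map, car_count = label_instances(semantic_mask, target_class=1)
```

 The semantic mask only says which pixels are "car"; the instance map additionally says
 which car each pixel belongs to.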
 Methods:
 ● Mask R-CNN: A popular instance segmentation method that extends the
 Faster R-CNN architecture to provide pixel-level masks for each detected object instance.
 ● Point-based Methods: Some instance recognition approaches operate on point clouds
 or 3D data to identify and distinguish individual instances.
 ● Feature Embeddings: Utilizing deep learning methods to learn
 discriminative feature embeddings for different instances.
 Applications:
 ● Autonomous Vehicles: Instance recognition is crucial for detecting and tracking
 individual vehicles, pedestrians, and other objects in the
 environment.
 ● Robotics: Used for object manipulation, navigation, and scene
 understanding in robotics applications.
 ● Augmented Reality: Enables the accurate overlay of virtual objects onto the real
 world by recognizing and tracking specific instances.
 ● Medical Imaging: Identifying and distinguishing individual structures or anomalies
 in medical images.
 Challenges:
 ● Occlusions: Handling situations where objects partially or fully occlude each
 other.
 ● Scale Variations: Recognizing instances at different scales within the same
 image or scene.
 ● Complex Backgrounds: Dealing with cluttered or complex backgrounds that may
 interfere with instance recognition.
 Datasets:
 ● COCO (Common Objects in Context): While primarily used for object
 detection and segmentation, COCO also contains instance segmentation annotations.
 ● Cityscapes: A dataset designed for urban scene understanding, including pixel-level
 annotations for object instances.
 ● ADE20K: A large-scale dataset for semantic and instance segmentation in diverse
 scenes.
 Evaluation Metrics:
 ● Intersection over Union (IoU): Measures the overlap between predicted and
 ground truth masks.
 ● Mean Average Precision (mAP): Commonly used for evaluating the precision
 of instance segmentation algorithms.
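 For masks, IoU is computed by counting pixels rather than multiplying box sides. A minimal
 sketch, assuming both masks are equally sized grids of 0/1 values; the function name is
 illustrative.

```python
def mask_iou(predicted, truth):
    """IoU of two binary masks given as equally sized lists of rows."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 0.0


predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
truth_mask = [[1, 0, 0],
              [0, 1, 1]]
overlap = mask_iou(predicted_mask, truth_mask)  # 2 shared pixels out of 4 covered
```

 A predicted instance is usually counted as correct when this value exceeds a chosen
 threshold, and average precision is then computed from the resulting matches.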
 Real-time Instance Recognition:
 ● Applications: In scenarios where real-time processing is crucial, such as robotics,
 autonomous vehicles, and augmented reality.
 ● Challenges: Balancing accuracy with low-latency requirements for real-time
 performance.
 Future Directions:
 ● Weakly Supervised Learning: Exploring methods that require less
 annotation effort, such as weakly supervised or self-supervised learning for instance
 recognition.
 ● Cross-Modal Instance Recognition: Extending instance recognition to
 operate across different modalities, such as combining visual and textual information
 for more comprehensive recognition.
 Instance recognition is a fundamental task in computer vision that enhances our ability
 to understand and interact with the visual world by providing detailed information about individual
 instances of objects or entities within a scene.
9. Category Recognition:
Definition:
 ● Category Recognition, also known as object category recognition or image
 categorization, involves assigning a label or category to an entire image
 based on the objects or scenes it contains. The goal is to identify the
 overall content or theme of an image without necessarily distinguishing individual
 instances or objects within it.
 Scope:
 ● Whole-Image Recognition: Category recognition focuses on recognizing and
 classifying the entire content of an image rather than identifying
 specific instances or details within the image.
 Methods:
 ● Convolutional Neural Networks (CNNs): Deep learning methods,
 particularly CNNs, have shown significant success in image categorization tasks,
 learning hierarchical features.
 ● Bag-of-Visual-Words: Traditional computer vision approaches that
 represent images as histograms of visual words based on local features.
 ● Transfer Learning: Leveraging pre-trained models on large datasets and fine-tuning
 them for specific category recognition tasks.
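 The Bag-of-Visual-Words representation can be sketched as nearest-codeword counting. This is
 a minimal sketch under stated assumptions: local descriptors have already been extracted, the
 vocabulary (normally learned by clustering descriptors from training images) is given, and
 the tiny 2D vectors are hypothetical stand-ins for real descriptors.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared distance)."""
    best_index, best_distance = 0, float("inf")
    for index, word in enumerate(vocabulary):
        distance = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def bovw_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else [0.0] * len(vocabulary)


# Hypothetical vocabulary of three visual words and four image descriptors.
vocabulary = [[0, 0], [10, 10], [0, 10]]
descriptors = [[1, 1], [9, 9], [0, 9], [1, 0]]
histogram = bovw_histogram(descriptors, vocabulary)
```

 The histogram discards where each feature occurred, which is why the model is called a
 "bag"; the resulting fixed-length vector can be fed to any standard classifier.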
 Applications:
 ● Image Tagging: Automatically assigning relevant tags or labels to images for
 organization and retrieval.
 ● Content-Based Image Retrieval (CBIR): Enabling the retrieval of images based on
 their content rather than textual metadata.
 ● Visual Search: Powering applications where users can search for similar images by
 providing a sample image.
 Challenges:
 ● Intra-class Variability: Dealing with variations within the same category, such as
 different poses, lighting conditions, or object appearances.
 ● Fine-grained Categorization: Recognizing subtle differences between closely
 related categories.
 ● Handling Clutter: Recognizing the main category in images with complex
 backgrounds or multiple objects.
 Datasets:
 ● ImageNet: A large-scale dataset commonly used for image classification tasks,
 consisting of a vast variety of object categories.
 ● CIFAR-10 and CIFAR-100: Datasets with smaller images and multiple
 categories, often used for benchmarking image categorization models.
 ● Open Images: A dataset with a large number of annotated images covering
 diverse categories.
 Evaluation Metrics:
 ● Top-k Accuracy: Measures the proportion of images for which the correct category is
 among the top-k predicted categories.
 ● Confusion Matrix: Provides a detailed breakdown of correct and incorrect predictions
 across different categories.
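 Both metrics can be computed from a table of per-class scores. A minimal sketch, assuming
 each image has one score per class and integer class labels; the scores below are made-up
 values chosen without ties.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def confusion_matrix(predicted, labels, num_classes):
    """matrix[true][predicted] counts how often each pairing occurred."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred, true in zip(predicted, labels):
        matrix[true][pred] += 1
    return matrix


# Hypothetical scores for three images over three classes.
scores = [
    [0.70, 0.20, 0.10],
    [0.25, 0.40, 0.35],
    [0.20, 0.50, 0.30],
]
labels = [0, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
top1_predictions = [max(range(len(row)), key=lambda i: row[i]) for row in scores]
matrix = confusion_matrix(top1_predictions, labels, num_classes=3)
```

 Here the confusion matrix shows that class 2 is always mistaken for class 1, a pattern that
 a single accuracy number would hide.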
 Multi-Label Categorization:
 ● Definition: Extends category recognition to handle cases where an image may belong
 to multiple categories simultaneously.
 ● Applications: Useful in scenarios where images can have complex content that falls
 into multiple distinct categories.
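 A common way to allow several labels per image is to score each category independently and
 keep every category whose probability passes a threshold. The sketch below assumes a model
 has already produced one raw score (logit) per category; the category names, logits, and the
 0.5 threshold are hypothetical.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, class_names, threshold=0.5):
    """Return every class whose independent probability reaches the threshold."""
    return [
        name
        for name, logit in zip(class_names, logits)
        if sigmoid(logit) >= threshold
    ]


class_names = ["beach", "car", "person"]
logits = [2.0, -1.0, 0.3]  # hypothetical model outputs for one image
tags = predict_labels(logits, class_names)
```

 Unlike a softmax, the per-class sigmoid does not force the probabilities to sum to one, so
 an image can be tagged with several categories or with none.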
 Real-world Applications:
 ● E-commerce: Categorizing product images for online shopping platforms.
 ● Content Moderation: Identifying and categorizing content for moderation purposes,
 such as detecting inappropriate or unsafe content.
 ● Automated Tagging: Automatically categorizing and tagging images in digital
 libraries or social media platforms.
 Future Trends:
 ● Weakly Supervised Learning: Exploring methods that require less
 annotated data for training, such as weakly supervised or self-supervised learning for
 category recognition.
 ● Interpretable Models: Developing models that provide insights into the
 decision-making process for better interpretability and trustworthiness.
 Category recognition forms the basis for various applications in image understanding
 and retrieval, providing a way to organize and interpret visual information at a broader
 level. Advances in deep learning and the availability of large-scale datasets continue to drive
 improvements in the accuracy and scalability of category recognition models.
10. Context and Scene Understanding:
Definition:
 ● Context and Scene Understanding in computer vision involves
 comprehending the overall context of a scene, recognizing relationships
 between objects, and understanding the semantic meaning of the visual elements
 within an image or a sequence of images.
 Scene Understanding vs. Object Recognition:
 ● Object Recognition: Focuses on identifying and categorizing individual objects
 within an image.
 ● Scene Understanding: Encompasses a broader understanding of the
 relationships, interactions, and contextual information that characterize the overall
 scene.
 Elements of Context and Scene Understanding:
 ● Spatial Relationships: Understanding the spatial arrangement and relative positions of
 objects within a scene.
 ● Temporal Context: Incorporating information from a sequence of images or frames
 to understand changes and dynamics over time.
 ● Semantic Context: Recognizing the semantic relationships and meanings associated
 with objects and their interactions.
 Methods:
 ● Graph-based Representations: Modeling scenes as graphs, where nodes
 represent objects and edges represent relationships, to capture contextual information.
 ● Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM):
 Utilizing recurrent architectures for processing sequences of images and capturing
 temporal context.
 ● Graph Neural Networks (GNNs): Applying GNNs to model complex
 relationships and dependencies in scenes.
 Applications:
 ● Autonomous Vehicles: Scene understanding is critical for autonomous
 navigation, as it involves comprehending the road, traffic, and dynamic elements in
 the environment.
 ● Robotics: Enabling robots to understand and navigate through indoor and outdoor
 environments.
 ● Augmented Reality: Integrating virtual objects into the real world in a way that
 considers the context and relationships with the physical
 environment.
 ● Surveillance and Security: Enhancing the analysis of surveillance footage by
 understanding activities and anomalies in scenes.
 Challenges:
 ● Ambiguity: Scenes can be ambiguous, and objects may have multiple
 interpretations depending on context.
 ● Scale and Complexity: Handling large-scale scenes with numerous objects and complex
 interactions.
 ● Dynamic Environments: Adapting to changes in scenes over time, especially
 in dynamic and unpredictable environments.
 Semantic Segmentation and Scene Parsing:
 ● Semantic Segmentation: Assigning semantic labels to individual pixels in an image,
 providing a detailed understanding of object boundaries.
 ● Scene Parsing: Extending semantic segmentation to recognize and understand
 the overall scene layout and context.
 Hierarchical Representations:
 ● Multiscale Representations: Capturing information at multiple scales, from individual
 objects to the overall scene layout.
 ● Hierarchical Models: Employing hierarchical structures to represent objects,
 sub-scenes, and the global context.
 Context-Aware Object Recognition:
 ● Definition: Enhancing object recognition by considering the contextual
 information surrounding objects.
 ● Example: Understanding that a "bat" in a scene with a ball and a glove is likely
 associated with the sport of baseball.
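 The "bat" example can be sketched as rescoring ambiguous candidates with co-occurrence
 weights from the surrounding objects. This is a toy illustration: the candidate scores and
 the co-occurrence table are invented numbers, and the multiplicative update is just one
 simple way to combine appearance with context.

```python
def rescore_with_context(candidate_scores, context_objects, cooccurrence):
    """Multiply each candidate's score by its co-occurrence weights, then normalize.

    cooccurrence maps (candidate_label, context_object) to a weight; pairs that
    are not listed default to 1.0 (no evidence either way).
    """
    rescored = {}
    for label, score in candidate_scores.items():
        weight = 1.0
        for obj in context_objects:
            weight *= cooccurrence.get((label, obj), 1.0)
        rescored[label] = score * weight
    total = sum(rescored.values())
    if total == 0:
        return rescored
    return {label: value / total for label, value in rescored.items()}


# Appearance alone cannot decide between the two meanings of "bat".
candidates = {"baseball bat": 0.5, "bat (animal)": 0.5}
scene_objects = ["ball", "glove"]

# Hypothetical weights: values above 1 mean the pair tends to appear together.
cooccurrence = {
    ("baseball bat", "ball"): 3.0,
    ("baseball bat", "glove"): 4.0,
    ("bat (animal)", "ball"): 0.5,
}

posterior = rescore_with_context(candidates, scene_objects, cooccurrence)
best = max(posterior, key=posterior.get)
```

 With the ball and glove present, the sports interpretation ends up with almost all of the
 probability mass even though the appearance scores were tied.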
 Future Directions:
 ● Cross-Modal Understanding: Integrating information from different
 modalities, such as combining visual and textual information for a more
 comprehensive understanding.
 ● Explainability and Interpretability: Developing models that can provide
 explanations for their decisions to enhance transparency and trust.
11. Recognition Databases and Test Sets:
 Recognition databases and test sets play a crucial role in the development and evaluation of
 computer vision algorithms, providing standardized datasets for
 training, validating, and benchmarking various recognition tasks. These datasets
 often cover a wide range of domains, from object recognition to scene
 understanding. Here are some commonly used recognition databases and test sets:
 ImageNet:
 ● Task: Image Classification, Object Recognition
 ● Description: ImageNet is a large image database with millions of labeled images
 across thousands of categories. Its ImageNet Large Scale Visual Recognition Challenge
 (ILSVRC) subset is a widely used benchmark for image classification and object detection.
 COCO (Common Objects in Context):
 ● Tasks: Object Detection, Instance Segmentation, Keypoint Detection
 ● Description: COCO is a large-scale dataset that includes complex scenes with
 multiple objects and diverse annotations. It is commonly used for evaluating
 algorithms in object detection and segmentation tasks.
 PASCAL VOC (Visual Object Classes):
 ● Tasks: Object Detection, Image Segmentation, Object Recognition
 ● Description: PASCAL VOC datasets provide annotated images with various
 object categories. They are widely used for benchmarking object detection and
 segmentation algorithms.
 MOT (Multiple Object Tracking) Datasets:
 ● Task: Multiple Object Tracking
 ● Description: MOT datasets focus on tracking multiple objects in video sequences.
 They include challenges related to object occlusion,
 appearance changes, and interactions.
 KITTI Vision Benchmark Suite:
 ● Tasks: Object Detection, Stereo, Visual Odometry
 ● Description: The KITTI dataset is designed for autonomous driving research
 and includes tasks such as object detection, stereo estimation, and visual odometry
 using data collected from a car.
 ADE20K:
 ● Tasks: Scene Parsing, Semantic Segmentation
 ● Description: ADE20K is a dataset for semantic segmentation and scene
 parsing. It contains images with detailed annotations for pixel-level object categories
 and scene labels.
 Cityscapes:
 ● Tasks: Semantic Segmentation, Instance Segmentation
 ● Description: The Cityscapes dataset focuses on urban scenes and is
 commonly used for semantic segmentation and instance segmentation tasks in the
 context of autonomous driving and robotics.
 CelebA:
 ● Tasks: Face Recognition, Attribute Recognition
 ● Description: CelebA is a dataset containing images of celebrities with annotations
 for face recognition and attribute recognition tasks.
 LFW (Labeled Faces in the Wild):
 ● Task: Face Verification
 ● Description: The LFW dataset is widely used for face verification tasks,
 consisting of images of faces collected from the internet with labeled pairs of
 matching and non-matching faces.
 Open Images Dataset:
 ● Tasks: Object Detection, Image Classification
 ● Description: Open Images Dataset is a large-scale dataset that includes
 images with annotations for object detection, image classification, and visual
 relationship prediction.
 These recognition databases and test sets serve as benchmarks for evaluating the
 performance of computer vision algorithms. They provide standardized and diverse
 data, allowing researchers and developers to compare the effectiveness of different approaches
 across a wide range of tasks and applications.