As part of my studies in the School of Spatial Planning & Development I authored the thesis “Land Cover Analysis using OBIA: a case study of Oraiokastro, Chalkidona and Delta” , which I defended in 2020.
The term Object-Based in Image Analysis refers to the scientific notion of analyzing an image in generally meaningful chunks or pieces (i.e. groups of pixels), by first segmenting it. Image segmentation as a preprocessing step in image analysis was an almost natural evolution of the classic pixel-based approach long before Convolutional Neural Networks became accessible.
There are plenty of algorithms for segmenting an image into meaningful objects, depending on the usecase, but they can be generalized in two groups of strategies - spectral (or color) and spatial. The former refers to grouping pixels of similar color (e.g. KMeans clustering), while the latter refers to grouping according to certain shape or spatial relations (e.g. image gridding, grouping pixels in $n \times n$ cells). In my thesis I focused on a rather computationally expensive algorithm developed by Trimble called multiresolution segmentation, which tries to creatively combine these two aspects of image segmentation.
Multiresolution segmentation is an iterative algorithm that aims to build image objects from the ground up according to color and shape rules until a desired threshold is reached. This means that segmenting an image depends on trial and error to find the optimal parameters. Specifically, the algorithm defines two kinds of heterogeneity contained within the objects, one of color and one of shape, and the only required parameter it needs to function is that of scale. Scale is a scalar effectively describing the size of the desired image objects to be generated, but in reality it is nothing more than the average heterogeneity of the objects. Because of that, an appropriate value for scale depends on the radiometry of a specific scene and is not transferable to another. Starting from single pixels, this algorithm tries to form objects following the criterion of maintaining object heterogeneity minimal in every step throughout the scene.
An approximation of the merging criterion for a current object $c$ against an $m$ amount of object candidates can be defined as follows:
\begin{align}
\\
\hline
&\textbf{Declare } \Delta H[m]; \\
\hline
&\textbf{for} \; i = 1 \; \textbf{to} \;m\; \textbf{do} \\
&\hspace{5mm} \Delta h_i \quad \gets h_\text{merged} - (h_c + h_i) \\
&\hspace{5mm} \Delta H[i] \gets \Delta h_i \\
&\textbf{if} \operatorname{min} (\Delta H) < scale \\
&\hspace{5mm} candidate \gets \operatorname{argmin} (\Delta H) \\
&\hspace{5mm} \operatorname{merge} (candidate)
\end{align}
The color heterogeneity of the algorithm is defined as follows and it increases with object size and contrast of contents:
\begin{align}
\\
\hline
&n: \text{number of object pixels} \\
&\sigma: \text{standard deviation of pixel intensities} \\
\hline
&h_c = n \cdot \sigma \tag{1} \\
\end{align}
The algorithm goes on to define an object shape heterogeneity that consists of two components:
\begin{align}
\\
\hline
&l : \text{object perimeter length}\\
&b : \text{perimeter length of object’s bounding box}\\
&n : \text{size of object in pixels}\\
\hline
&compactness =\frac{l}{\sqrt{n}}& \tag{2} \\
&smoothness = \frac{l}{b}& \tag{3} \\
&h_s = \alpha \cdot compactness + (1 - \alpha) \cdot smoothness,
\; \alpha \in [0, 1]& \tag{4}
\end{align}
The $compactness$ scalar aims to keep the perimeter length low relative to the object size, while the $smoothness$ value keeps a low perimeter relative to the bounding box. Both of those magnitudes keep the object’s shape from deviating quickly.
Finally, a weighted average of $h_c$ and $h_s$ is used as the criterion for merging.