Convolutions on Images

For this section, we will no longer be focusing on signals, but instead images (arrays filled with elements of red, green, and blue values). That said, for the code examples, greyscale images may be used such that each array element is composed of some floating-point value instead of color. In addition, we will not be discussing boundary conditions too much in this chapter and will instead be using the simple boundaries introduced in the section on one-dimensional convolutions.

The extension of one-dimensional convolutions to two dimensions requires a little thought about indexing and the like, but is ultimately the same operation. Here is an animation of a convolution for a two-dimensional image:

In this case, we convolved the image with a $3\times 3$ square filter, all filled with values of $\frac{1}{9}$. This created a simple blurring effect, which is somewhat expected from the discussion in the previous section. In code, a two-dimensional convolution might look like this:

function convolve_linear(signal::Array{T, 2}, filter::Array{T, 2},
                         output_size) where {T <: Number}

    # convolutional output
    out = Array{Float64,2}(undef, output_size)
    sum = 0

    for i = 1:output_size[1]
        for j = 1:output_size[2]
            for k = max(1, i-size(filter)[1]):i
                for l = max(1, j-size(filter)[2]):j
                    if k <= size(signal)[1] && i-k+1 <= size(filter)[1] &&
                       l <= size(signal)[2] && j-l+1 <= size(filter)[2]
                        sum += signal[k,l] * filter[i-k+1, j-l+1]
                    end
                end
            end

            out[i,j] = sum
            sum = 0
        end
    end

    return out
end

def convolve_linear(signal, filter, output_size):
    out = np.zeros(output_size)
    sum = 0

    for i in range(output_size[0]):
        for j in range(output_size[1]):
            for k in range(max(0, i-filter.shape[0]), i+1):
                for l in range(max(0, j-filter.shape[1]), j+1):
                    with suppress(IndexError):
                        sum += signal[k, l] * filter[i-k, j-l]
            out[i, j] = sum
            sum = 0

    return out

This is very similar to what we have shown in previous sections; however, it essentially requires four iterable dimensions because we need to iterate through each axis of the output domain and the filter.
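As a small worked example of the blurring filter described above, here is a minimal sketch that convolves a single bright pixel with a $3\times 3$ filter of $\frac{1}{9}$ values. The helper name `convolve_full` is hypothetical and is not the chapter's `convolve_linear`: it organizes the same sum as shifted, scaled copies of the image, and it assumes an output of size (image + kernel - 1) along each axis rather than the (image + kernel) size passed to the functions above.

```python
import numpy as np


def convolve_full(image, kernel):
    """Full linear 2D convolution as a sum of shifted, scaled image copies."""
    rows = image.shape[0] + kernel.shape[0] - 1
    cols = image.shape[1] + kernel.shape[1] - 1
    out = np.zeros((rows, cols))

    for a in range(kernel.shape[0]):
        for b in range(kernel.shape[1]):
            # kernel[a, b] contributes a copy of the image shifted by (a, b)
            out[a:a + image.shape[0],
                b:b + image.shape[1]] += kernel[a, b] * image

    return out


# 5x5 image with a single bright pixel in the middle
image = np.zeros((5, 5))
image[2, 2] = 1.0

# 3x3 box filter, every element equal to 1/9
box = np.full((3, 3), 1.0 / 9.0)

blurred = convolve_full(image, box)

print(blurred.shape)
print(blurred[2:5, 2:5])
```

The single pixel is smeared into a $3\times 3$ block of $\frac{1}{9}$ values, and the total intensity of the image is unchanged because the filter's elements sum to 1.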

At this stage, it is worth highlighting common filters used for convolutions of images. In particular, we will further discuss the Gaussian filter introduced in the previous section, and then introduce another set of kernels known as Sobel operators, which are used for naïve edge detection or image derivatives.

The Gaussian kernel

The Gaussian kernel serves as an effective blurring operation for images. As a reminder, the formula for a two-dimensional Gaussian distribution is

$g(x,y) = \frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}},$

where $\sigma$ is the standard deviation and is a measure of the width of the Gaussian. A larger $\sigma$ means a wider Gaussian; however, remember that the Gaussian must fit onto the filter, otherwise it will be cut off! For example, if you are using a $3\times 3$ filter, you should not be using $\sigma = 10$. Some definitions allow users to have a separate deviation in $x$ and $y$ to create an elliptical Gaussian, but for the purposes of this chapter, we will assume $\sigma_x = \sigma_y$. As a general rule of thumb, the larger the filter and standard deviation, the more "smeared" the final convolution will be.
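To make the warning about cutting off the Gaussian concrete, here is a rough sketch that samples $g(x,y)$ on a small grid and adds up the samples. With unit grid spacing, this sum approximates how much of the Gaussian's total weight (which integrates to 1) lands on the filter. The function name `gaussian_on_grid` is an illustrative assumption and does not appear elsewhere in this chapter.

```python
import numpy as np


def gaussian_on_grid(kernel_size, sigma):
    """Sample g(x, y) on a kernel_size x kernel_size grid centered on zero."""
    half = (kernel_size - 1) / 2
    coords = np.arange(kernel_size) - half
    x, y = np.meshgrid(coords, coords, indexing="ij")
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)


# Sum of samples, a crude estimate of the weight captured by the filter
narrow = gaussian_on_grid(3, 0.5).sum()
wide = gaussian_on_grid(3, 10.0).sum()

print(f"sigma = 0.5 on a 3x3 grid captures roughly {narrow:.3f}")
print(f"sigma = 10  on a 3x3 grid captures roughly {wide:.3f}")
```

For the narrow Gaussian the sum comes out close to 1, while for $\sigma = 10$ only a percent or so of the weight fits on the $3\times 3$ grid, so the filter is effectively flat.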

At this stage, it is important to write some code, so we will write a simple function that returns a Gaussian kernel for a specified filter size, with the standard deviation derived from that size.

function create_gaussian_kernel(kernel_size)

    kernel = zeros(kernel_size, kernel_size)

    # The center must be offset by 0.5 to find the correct index
    center = kernel_size * 0.5 + 0.5

    sigma = sqrt(0.1*kernel_size)

    for i = 1:kernel_size
        for j = 1:kernel_size
            kernel[i,j] = exp(-((i-center)^2 + (j-center)^2) / (2*sigma^2))
        end
    end

    return normalize(kernel)

end

def create_gaussian_kernel(kernel_size):
    kernel = np.zeros((kernel_size, kernel_size))

    # The center must be offset by 0.5 to find the correct index
    center = kernel_size*0.5 + 0.5

    sigma = np.sqrt(0.1*kernel_size)

    def kernel_function(x, y):
        return np.exp(-((x-center+1)**2 + (y-center+1)**2)/(2*sigma**2))

    kernel = np.fromfunction(kernel_function, (kernel_size, kernel_size))
    return kernel / np.linalg.norm(kernel)

Though it is entirely possible to create a Gaussian kernel whose standard deviation is independent of the kernel size, we have decided to enforce a relation between the two in this chapter. As always, we encourage you to play with the code and create your own Gaussian kernels any way you want! As a note, all the kernels are scaled (normalized) at the end so that the output of the convolution does not have an obnoxious scale factor associated with it. The code above divides by the Euclidean norm of the kernel, which is the default for normalize in Julia and np.linalg.norm in Python; dividing by the sum of all elements instead makes the kernel sum to 1, which keeps the overall brightness of the image unchanged.
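To illustrate why the choice of normalization matters, here is a minimal sketch comparing two options: dividing a kernel by the sum of its elements, and dividing it by its Euclidean norm (what `np.linalg.norm` computes by default for an array). The kernel is built with the same size-to-$\sigma$ relation as the code above; the variable names are illustrative assumptions.

```python
import numpy as np

kernel_size = 5
center = (kernel_size - 1) / 2      # zero-based center index
sigma = np.sqrt(0.1 * kernel_size)  # same relation as in the chapter's code

i, j = np.indices((kernel_size, kernel_size))
kernel = np.exp(-((i - center)**2 + (j - center)**2) / (2 * sigma**2))

sum_normalized = kernel / kernel.sum()
norm_normalized = kernel / np.linalg.norm(kernel)

# Convolving a constant image of ones gives, away from the borders,
# the sum of the kernel's elements at every pixel
print(sum_normalized.sum())
print(norm_normalized.sum())
```

The sum-normalized kernel adds up to 1, so a flat region of the image keeps its value after blurring. The norm-normalized kernel adds up to more than 1, so the blurred image is uniformly brighter by that factor, which only matters if absolute intensities are important.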

Below are a few images generated by applying a kernel generated with the code above to a black and white image of a circle.

In (a), we show the original image, which is just a white circle at the center of a $50\times 50$ grid. In (b), we show the image after convolution with a $3\times 3$ kernel. In (c), we show the image after convolution with a $20\times 20$ kernel. Here, we see that (c) is significantly fuzzier than (b), which is a direct consequence of the kernel size.

There is a lot more that we could talk about, but now is a good time to move on to a slightly more complicated convolutional method: the Sobel operator.

The Sobel operator

The Sobel operator effectively performs a gradient operation on an image by highlighting areas where the intensity changes sharply. In essence, this means that this operation can be thought of as a naïve edge detector. The $n$-dimensional Sobel operator is composed of $n$ separate gradient convolutions (one for each dimension) that are then combined into a final output array. Again, for the purposes of this chapter, we will stick to two dimensions, so there will be two separate gradients along the $x$ and $y$ directions. Each gradient will be created by convolving our image with its corresponding Sobel operator:

$\begin{align} S_x &= \left(\begin{bmatrix} 1 \\ 2 \\ 1 \\ \end{bmatrix} \otimes [1~0~-1] \right) = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \\ \end{bmatrix}\\ S_y &= \left( \begin{bmatrix} 1 \\ 0 \\ -1 \\ \end{bmatrix} \otimes [1~2~1] \right) = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \\ \end{bmatrix}. \end{align}$
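The outer products in these definitions can be checked directly. The following sketch uses NumPy's `np.outer` to build both operators from a smoothing vector and a difference vector; the names `smooth` and `diff` are labels chosen here for illustration.

```python
import numpy as np

smooth = np.array([1.0, 2.0, 1.0])  # smoothing along one axis
diff = np.array([1.0, 0.0, -1.0])   # finite difference along the other

# Column vector times row vector, as in the equations for S_x and S_y
Sx = np.outer(smooth, diff)
Sy = np.outer(diff, smooth)

print(Sx)
print(Sy)
```

The two operators are transposes of each other, which reflects that they measure the same kind of change along perpendicular directions.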

The gradients can then be found with a convolution, such that:

$\begin{align} G_x &= S_x*A \\ G_y &= S_y*A. \end{align}$

Here, $A$ is the input array or image. Finally, these gradients can be summed in quadrature to find the magnitude of the total image gradient:

$G_{\text{total}} = \sqrt{G_x^2 + G_y^2}$
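As a small worked example of these three equations, the sketch below applies them to a $6\times 6$ image containing a single vertical edge. The helper `convolve_full` is a hypothetical, compact full convolution written for this example (output size image + kernel - 1), not the chapter's `convolve_linear`, and only the interior of the output is inspected so that the zero-padded borders do not show up as spurious edges.

```python
import numpy as np


def convolve_full(image, kernel):
    """Full linear 2D convolution as a sum of shifted, scaled image copies."""
    rows = image.shape[0] + kernel.shape[0] - 1
    cols = image.shape[1] + kernel.shape[1] - 1
    out = np.zeros((rows, cols))

    for a in range(kernel.shape[0]):
        for b in range(kernel.shape[1]):
            out[a:a + image.shape[0],
                b:b + image.shape[1]] += kernel[a, b] * image

    return out


Sx = np.outer([1.0, 2.0, 1.0], [1.0, 0.0, -1.0])
Sy = np.outer([1.0, 0.0, -1.0], [1.0, 2.0, 1.0])

# Image A: dark on the left, bright on the right (one vertical edge)
A = np.zeros((6, 6))
A[:, 3:] = 1.0

Gx = convolve_full(A, Sx)
Gy = convolve_full(A, Sy)
G_total = np.sqrt(Gx**2 + Gy**2)

# Output pixels where the 3x3 operator lies fully inside the image
interior = G_total[2:6, 2:6]
print(interior)
```

Every interior row reads 0, 4, 4, 0: the gradient is zero in the flat regions and nonzero only in the two columns that straddle the edge. Because the image does not change from row to row, the interior of `Gy` is zero and the whole response comes from `Gx`.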

So let us now show what it does in practice:

In this diagram, we start with the circle image on the right, and then convolve it with the $S_x$ and $S_y$ operators to find the gradients along $x$ and $y$ before summing them in quadrature to get the final image gradient. Here, we see that the edges of our input image have been highlighted, showing the outline of our circle. This is why the Sobel operator is also known as naïve edge detection and is an integral component of many more sophisticated edge detection methods, like the one proposed by Canny [1].

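In code, the Sobel operator involves first finding the operators in $x$ and $y$ and then applying them with a traditional convolution. Note that the code below builds the difference vector as $[-1~0~1]$, the opposite sign convention from the equations above, and scales each operator by $\frac{1}{9}$. Neither choice changes where edges appear: the sign disappears when the gradients are squared, and the scale factor multiplies the whole output uniformly.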

function create_sobel_operators()
    Sx = [1.0, 2.0, 1.0]*[-1.0 0.0 1.0] / 9
    Sy = [-1.0, 0.0, 1.0]*[1.0 2.0 1.0] / 9

    return Sx, Sy
end

function compute_sobel(signal)
    Sx, Sy = create_sobel_operators()

    Gx = convolve_linear(signal, Sx, size(signal) .+ size(Sx))
    Gy = convolve_linear(signal, Sy, size(signal) .+ size(Sy))

    return sqrt.(Gx.^2 .+ Gy.^2)
end

def create_sobel_operators():
    Sx = np.dot([[1.0], [2.0], [1.0]], [[-1.0, 0.0, 1.0]]) / 9
    Sy = np.dot([[-1.0], [0.0], [1.0]], [[1.0, 2.0, 1.0]]) / 9

    return Sx, Sy

def sum_matrix_dimensions(mat1, mat2):
    return (mat1.shape[0] + mat2.shape[0], 
            mat1.shape[1] + mat2.shape[1])

def compute_sobel(signal):
    Sx, Sy = create_sobel_operators()

    Gx = convolve_linear(signal, Sx, sum_matrix_dimensions(signal, Sx))
    Gy = convolve_linear(signal, Sy, sum_matrix_dimensions(signal, Sy))

    return np.sqrt(np.power(Gx, 2) + np.power(Gy, 2))

With that, we are at a good place to stop the discussion of two-dimensional convolutions. We will definitely return to this topic in the future as new algorithms require more information.

Example Code

For the code in this section, we have modified the visualizations from the one-dimensional convolution chapter to add a two-dimensional variant for blurring an image of random white noise. We have also added code to create the Gaussian kernel and Sobel operators and apply them to the circle, as shown in the text.

using DelimitedFiles
using LinearAlgebra

function convolve_linear(signal::Array{T, 2}, filter::Array{T, 2},
                         output_size) where {T <: Number}

    # convolutional output
    out = Array{Float64,2}(undef, output_size)
    sum = 0

    for i = 1:output_size[1]
        for j = 1:output_size[2]
            for k = max(1, i-size(filter)[1]):i
                for l = max(1, j-size(filter)[2]):j
                    if k <= size(signal)[1] && i-k+1 <= size(filter)[1] &&
                       l <= size(signal)[2] && j-l+1 <= size(filter)[2]
                        sum += signal[k,l] * filter[i-k+1, j-l+1]
                    end
                end
            end

            out[i,j] = sum
            sum = 0
        end
    end

    return out
end

function create_gaussian_kernel(kernel_size)

    kernel = zeros(kernel_size, kernel_size)

    # The center must be offset by 0.5 to find the correct index
    center = kernel_size * 0.5 + 0.5

    sigma = sqrt(0.1*kernel_size)

    for i = 1:kernel_size
        for j = 1:kernel_size
            kernel[i,j] = exp(-((i-center)^2 + (j-center)^2) / (2*sigma^2))
        end
    end

    return normalize(kernel)

end

function create_sobel_operators()
    Sx = [1.0, 2.0, 1.0]*[-1.0 0.0 1.0] / 9
    Sy = [-1.0, 0.0, 1.0]*[1.0 2.0 1.0] / 9

    return Sx, Sy
end

function compute_sobel(signal)
    Sx, Sy = create_sobel_operators()

    Gx = convolve_linear(signal, Sx, size(signal) .+ size(Sx))
    Gy = convolve_linear(signal, Sy, size(signal) .+ size(Sy))

    return sqrt.(Gx.^2 .+ Gy.^2)
end

# Simple function to create a square grid with a circle embedded inside of it
function create_circle(image_resolution, grid_extents, radius)
    out = zeros(image_resolution, image_resolution)

    for i = 1:image_resolution
        x_position = ((i-1)*grid_extents/image_resolution)-0.5*grid_extents
        for j = 1:image_resolution
            y_position = ((j-1)*grid_extents/image_resolution)-0.5*grid_extents
            if x_position^2 + y_position^2 <= radius^2
                out[i,j] = 1.0
            end
        end
    end 

    return out
end

function main()

    # Random distribution in x
    x = rand(100, 100)

    # Gaussian signals
    y = [exp(-(((i-50)/100)^2 + ((j-50)/100)^2)/.01) for i = 1:100, j=1:100]

    # Normalization is not strictly necessary, but good practice
    normalize!(x)
    normalize!(y)

    # full convolution, output will be the size of x + y
    full_linear_output = convolve_linear(x, y, size(x) .+ size(y))

    # simple boundaries
    simple_linear_output = convolve_linear(x, y, size(x))

    # outputting convolutions to different files for plotting in external code
    writedlm("full_linear.dat", full_linear_output)
    writedlm("simple_linear.dat", simple_linear_output)

    # creating simple circle and 2 different Gaussian kernels
    circle = create_circle(50,2,0.5)

    normalize!(circle)

    small_kernel = create_gaussian_kernel(3)
    large_kernel = create_gaussian_kernel(25)

    small_kernel_output = convolve_linear(circle, small_kernel,
                                          size(circle).+size(small_kernel))
    large_kernel_output = convolve_linear(circle, large_kernel,
                                          size(circle).+size(large_kernel))

    writedlm("small_kernel.dat", small_kernel_output)
    writedlm("large_kernel.dat", large_kernel_output)

    # Using the circle for Sobel operations as well
    sobel_output = compute_sobel(circle)

    writedlm("sobel_output.dat", sobel_output)

end

main()

import numpy as np
from contextlib import suppress


def convolve_linear(signal, filter, output_size):
    out = np.zeros(output_size)
    sum = 0

    for i in range(output_size[0]):
        for j in range(output_size[1]):
            for k in range(max(0, i-filter.shape[0]), i+1):
                for l in range(max(0, j-filter.shape[1]), j+1):
                    with suppress(IndexError):
                        sum += signal[k, l] * filter[i-k, j-l]
            out[i, j] = sum
            sum = 0

    return out


def create_gaussian_kernel(kernel_size):
    kernel = np.zeros((kernel_size, kernel_size))

    # The center must be offset by 0.5 to find the correct index
    center = kernel_size*0.5 + 0.5

    sigma = np.sqrt(0.1*kernel_size)

    def kernel_function(x, y):
        return np.exp(-((x-center+1)**2 + (y-center+1)**2)/(2*sigma**2))

    kernel = np.fromfunction(kernel_function, (kernel_size, kernel_size))
    return kernel / np.linalg.norm(kernel)


def create_sobel_operators():
    Sx = np.dot([[1.0], [2.0], [1.0]], [[-1.0, 0.0, 1.0]]) / 9
    Sy = np.dot([[-1.0], [0.0], [1.0]], [[1.0, 2.0, 1.0]]) / 9

    return Sx, Sy

def sum_matrix_dimensions(mat1, mat2):
    return (mat1.shape[0] + mat2.shape[0], 
            mat1.shape[1] + mat2.shape[1])

def compute_sobel(signal):
    Sx, Sy = create_sobel_operators()

    Gx = convolve_linear(signal, Sx, sum_matrix_dimensions(signal, Sx))
    Gy = convolve_linear(signal, Sy, sum_matrix_dimensions(signal, Sy))

    return np.sqrt(np.power(Gx, 2) + np.power(Gy, 2))


def create_circle(image_resolution, grid_extents, radius):
    out = np.zeros((image_resolution, image_resolution))

    for i in range(image_resolution):
        x_position = ((i * grid_extents / image_resolution)
                      - 0.5 * grid_extents)
        for j in range(image_resolution):
            y_position = ((j * grid_extents / image_resolution)
                          - 0.5 * grid_extents)
            if x_position ** 2 + y_position ** 2 <= radius ** 2:
                out[i, j] = 1.0

    return out


def main():

    # Random distribution in x
    x = np.random.rand(100, 100)

    # Gaussian signals
    def create_gaussian_signals(i, j):
        return np.exp(-(((i-50)/100) ** 2 +
                        ((j-50)/100) ** 2) / .01)
    y = np.fromfunction(create_gaussian_signals, (100, 100))

    # Normalization is not strictly necessary, but good practice
    x /= np.linalg.norm(x)
    y /= np.linalg.norm(y)

    # full convolution, output will be the size of x + y
    full_linear_output = convolve_linear(x, y, sum_matrix_dimensions(x, y))

    # simple boundaries
    simple_linear_output = convolve_linear(x, y, x.shape)

    np.savetxt("full_linear.dat", full_linear_output)
    np.savetxt("simple_linear.dat", simple_linear_output)

    # creating simple circle and 2 different Gaussian kernels
    circle = create_circle(50, 2, 0.5)

    circle = circle / np.linalg.norm(circle)

    small_kernel = create_gaussian_kernel(3)
    large_kernel = create_gaussian_kernel(25)

    small_kernel_output = convolve_linear(circle, small_kernel,
                                          sum_matrix_dimensions(circle,
                                                                small_kernel))

    large_kernel_output = convolve_linear(circle, large_kernel,
                                          sum_matrix_dimensions(circle,
                                                                large_kernel))

    np.savetxt("small_kernel.dat", small_kernel_output)
    np.savetxt("large_kernel.dat", large_kernel_output)

    # using the circle for Sobel operations as well
    sobel_output = compute_sobel(circle)

    np.savetxt("sobel_output.dat", sobel_output)


if __name__ == "__main__":
    main()

Bibliography

[1] Canny, John, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986.

License

Code Examples

The code examples are licensed under the MIT license (found in LICENSE.md).

Images/Graphics

The image "8bit Heart" was created by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
The image "Circle Blur" was created by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
The image "Sobel Filters" was created by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
The video "2D Convolution" was created by James Schloss and Grant Sanderson and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

Text

The text of this chapter was written by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

Pull Requests

After initial licensing (#560), the following pull requests have modified the text or graphics of this chapter:

none
