Convolutions on Images
For this section, we will no longer be focusing on signals, but instead images (arrays filled with elements of red, green, and blue values). That said, for the code examples, greyscale images may be used such that each array element is composed of some floating-point value instead of color. In addition, we will not be discussing boundary conditions too much in this chapter and will instead be using the simple boundaries introduced in the section on one-dimensional convolutions.
The extension of one-dimensional convolutions to two dimensions requires a little thought about indexing and the like, but is ultimately the same operation. Here is an animation of a convolution for a two-dimensional image:
In this case, we convolved the image with a 3x3 square filter, all filled with values of . This created a simple blurring effect, which is somewhat expected from the discussion in the previous section. In code, a two-dimensional convolution might look like this:
function convolve_linear(signal::Array{T, 2}, filter::Array{T, 2},
output_size) where {T <: Number}
# convolutional output
out = Array{Float64,2}(undef, output_size)
sum = 0
for i = 1:output_size[1]
for j = 1:output_size[2]
for k = max(1, i-size(filter)[1]):i
for l = max(1, j-size(filter)[2]):j
if k <= size(signal)[1] && i-k+1 <= size(filter)[1] &&
l <= size(signal)[2] && j-l+1 <= size(filter)[2]
sum += signal[k,l] * filter[i-k+1, j-l+1]
end
end
end
out[i,j] = sum
sum = 0
end
end
return out
end
def convolve_linear(signal, filter, output_size):
out = np.zeros(output_size)
sum = 0
for i in range(output_size[0]):
for j in range(output_size[1]):
for k in range(max(0, i-filter.shape[0]), i+1):
for l in range(max(0, j-filter.shape[1]), j+1):
with suppress(IndexError):
sum += signal[k, l] * filter[i-k, j-l]
out[i, j] = sum
sum = 0
return out
This is very similar to what we have shown in previous sections; however, it essentially requires four iterable dimensions because we need to iterate through each axis of the output domain and the filter.
At this stage, it is worth highlighting common filters used for convolutions of images. In particular, we will further discuss the Gaussian filter introduced in the previous section, and then introduce another set of kernels known as Sobel operators, which are used for naïve edge detection or image derivatives.
The Gaussian kernel
The Gaussian kernel serves as an effective blurring operation for images. As a reminder, the formula for any Gaussian distribution is
where is the standard deviation and is a measure of the width of the Gaussian. A larger means a larger Gaussian; however, remember that the Gaussian must fit onto the filter, otherwise it will be cut off! For example, if you are using a filter, you should not be using . Some definitions of allow users to have a separate deviation in and to create an ellipsoid Gaussian, but for the purposes of this chapter, we will assume . As a general rule of thumb, the larger the filter and standard deviation, the more "smeared" the final convolution will be.
At this stage, it is important to write some code, so we will generate a simple function that returns a Gaussian kernel with a specified standard deviation and filter size.
function create_gaussian_kernel(kernel_size)
kernel = zeros(kernel_size, kernel_size)
# The center must be offset by 0.5 to find the correct index
center = kernel_size * 0.5 + 0.5
sigma = sqrt(0.1*kernel_size)
for i = 1:kernel_size
for j = 1:kernel_size
kernel[i,j] = exp(-((i-center)^2 + (j-center)^2) / (2*sigma^2))
end
end
return normalize(kernel)
end
def create_gaussian_kernel(kernel_size):
kernel = np.zeros((kernel_size, kernel_size))
# The center must be offset by 0.5 to find the correct index
center = kernel_size*0.5 + 0.5
sigma = np.sqrt(0.1*kernel_size)
def kernel_function(x, y):
return np.exp(-((x-center+1)**2 + (y-center+1)**2)/(2*sigma**2))
kernel = np.fromfunction(kernel_function, (kernel_size, kernel_size))
return kernel / np.linalg.norm(kernel)
Though it is entirely possible to create a Gaussian kernel whose standard deviation is independent on the kernel size, we have decided to enforce a relation between the two in this chapter. As always, we encourage you to play with the code and create your own Gaussian kernels any way you want! As a note, all the kernels will be scaled (normalized) at the end by the sum of all internal elements. This ensures that the output of the convolution will not have an obnoxious scale factor associated with it.
Below are a few images generated by applying a kernel generated with the code above to a black and white image of a circle.
In (a), we show the original image, which is just a white circle at the center of a grid. In (b), we show the image after convolution with a kernel. In (c), we show the image after convolution with a kernel. Here, we see that (c) is significantly fuzzier than (b), which is a direct consequence of the kernel size.
There is a lot more that we could talk about, but now is a good time to move on to a slightly more complicated convolutional method: the Sobel operator.
The Sobel operator
The Sobel operator effectively performs a gradient operation on an image by highlighting areas where a large change has been made. In essence, this means that this operation can be thought of as a naïve edge detector. Essentially, the -dimensional Sobel operator is composed of separate gradient convolutions (one for each dimension) that are then combined together into a final output array. Again, for the purposes of this chapter, we will stick to two dimensions, which will be composed of two separate gradients along the and directions. Each gradient will be created by convolving our image with their corresponding Sobel operator:
The gradients can then be found with a convolution, such that:
Here, is the input array or image. Finally, these gradients can be summed in quadrature to find the total Sobel operator or image gradient:
So let us now show what it does in practice:
In this diagram, we start with the circle image on the right, and then convolve it with the and operators to find the gradients along and before summing them in quadrature to get the final image gradient. Here, we see that the edges of our input image have been highlighted, showing outline of our circle. This is why the Sobel operator is also known as naïve edge detection and is an integral component to many more sophisticated edge detection methods like one proposed by Canny [1].
In code, the Sobel operator involves first finding the operators in and and then applying them with a traditional convolution:
function create_sobel_operators()
Sx = [1.0, 2.0, 1.0]*[-1.0 0.0 1.0] / 9
Sy = [-1.0, 0.0, 1.0]*[1.0 2.0 1.0] / 9
return Sx, Sy
end
function compute_sobel(signal)
Sx, Sy = create_sobel_operators()
Gx = convolve_linear(signal, Sx, size(signal) .+ size(Sx))
Gy = convolve_linear(signal, Sy, size(signal) .+ size(Sy))
return sqrt.(Gx.^2 .+ Gy.^2)
end
def create_sobel_operators():
Sx = np.dot([[1.0], [2.0], [1.0]], [[-1.0, 0.0, 1.0]]) / 9
Sy = np.dot([[-1.0], [0.0], [1.0]], [[1.0, 2.0, 1.0]]) / 9
return Sx, Sy
def sum_matrix_dimensions(mat1, mat2):
return (mat1.shape[0] + mat2.shape[0],
mat1.shape[1] + mat2.shape[1])
def compute_sobel(signal):
Sx, Sy = create_sobel_operators()
Gx = convolve_linear(signal, Sx, sum_matrix_dimensions(signal, Sx))
Gy = convolve_linear(signal, Sy, sum_matrix_dimensions(signal, Sy))
return np.sqrt(np.power(Gx, 2) + np.power(Gy, 2))
With that, I believe we are at a good place to stop discussions on two-dimensional convolutions. We will definitely return to this topic in the future as new algorithms require more information.
Example Code
For the code in this section, we have modified the visualizations from the one-dimensional convolution chapter to add a two-dimensional variant for blurring an image of random white noise. We have also added code to create the Gaussian kernel and Sobel operator and apply it to the circle, as shown in the text.
using DelimitedFiles
using LinearAlgebra
function convolve_linear(signal::Array{T, 2}, filter::Array{T, 2},
output_size) where {T <: Number}
# convolutional output
out = Array{Float64,2}(undef, output_size)
sum = 0
for i = 1:output_size[1]
for j = 1:output_size[2]
for k = max(1, i-size(filter)[1]):i
for l = max(1, j-size(filter)[2]):j
if k <= size(signal)[1] && i-k+1 <= size(filter)[1] &&
l <= size(signal)[2] && j-l+1 <= size(filter)[2]
sum += signal[k,l] * filter[i-k+1, j-l+1]
end
end
end
out[i,j] = sum
sum = 0
end
end
return out
end
function create_gaussian_kernel(kernel_size)
kernel = zeros(kernel_size, kernel_size)
# The center must be offset by 0.5 to find the correct index
center = kernel_size * 0.5 + 0.5
sigma = sqrt(0.1*kernel_size)
for i = 1:kernel_size
for j = 1:kernel_size
kernel[i,j] = exp(-((i-center)^2 + (j-center)^2) / (2*sigma^2))
end
end
return normalize(kernel)
end
function create_sobel_operators()
Sx = [1.0, 2.0, 1.0]*[-1.0 0.0 1.0] / 9
Sy = [-1.0, 0.0, 1.0]*[1.0 2.0 1.0] / 9
return Sx, Sy
end
function compute_sobel(signal)
Sx, Sy = create_sobel_operators()
Gx = convolve_linear(signal, Sx, size(signal) .+ size(Sx))
Gy = convolve_linear(signal, Sy, size(signal) .+ size(Sy))
return sqrt.(Gx.^2 .+ Gy.^2)
end
# Simple function to create a square grid with a circle embedded inside of it
function create_circle(image_resolution, grid_extents, radius)
out = zeros(image_resolution, image_resolution)
for i = 1:image_resolution
x_position = ((i-1)*grid_extents/image_resolution)-0.5*grid_extents
for j = 1:image_resolution
y_position = ((j-1)*grid_extents/image_resolution)-0.5*grid_extents
if x_position^2 + y_position^2 <= radius^2
out[i,j] = 1.0
end
end
end
return out
end
function main()
# Random distribution in x
x = rand(100, 100)
# Gaussian signals
y = [exp(-(((i-50)/100)^2 + ((j-50)/100)^2)/.01) for i = 1:100, j=1:100]
# Normalization is not strictly necessary, but good practice
normalize!(x)
normalize!(y)
# full convolution, output will be the size of x + y
full_linear_output = convolve_linear(x, y, size(x) .+ size(y))
# simple boundaries
simple_linear_output = convolve_linear(x, y, size(x))
# outputting convolutions to different files for plotting in external code
writedlm("full_linear.dat", full_linear_output)
writedlm("simple_linear.dat", simple_linear_output)
# creating simple circle and 2 different Gaussian kernels
circle = create_circle(50,2,0.5)
normalize!(circle)
small_kernel = create_gaussian_kernel(3)
large_kernel = create_gaussian_kernel(25)
small_kernel_output = convolve_linear(circle, small_kernel,
size(circle).+size(small_kernel))
large_kernel_output = convolve_linear(circle, large_kernel,
size(circle).+size(large_kernel))
writedlm("small_kernel.dat", small_kernel_output)
writedlm("large_kernel.dat", large_kernel_output)
# Using the circle for Sobel operations as well
sobel_output = compute_sobel(circle)
writedlm("sobel_output.dat", sobel_output)
end
import numpy as np
from contextlib import suppress
def convolve_linear(signal, filter, output_size):
out = np.zeros(output_size)
sum = 0
for i in range(output_size[0]):
for j in range(output_size[1]):
for k in range(max(0, i-filter.shape[0]), i+1):
for l in range(max(0, j-filter.shape[1]), j+1):
with suppress(IndexError):
sum += signal[k, l] * filter[i-k, j-l]
out[i, j] = sum
sum = 0
return out
def create_gaussian_kernel(kernel_size):
kernel = np.zeros((kernel_size, kernel_size))
# The center must be offset by 0.5 to find the correct index
center = kernel_size*0.5 + 0.5
sigma = np.sqrt(0.1*kernel_size)
def kernel_function(x, y):
return np.exp(-((x-center+1)**2 + (y-center+1)**2)/(2*sigma**2))
kernel = np.fromfunction(kernel_function, (kernel_size, kernel_size))
return kernel / np.linalg.norm(kernel)
def create_sobel_operators():
Sx = np.dot([[1.0], [2.0], [1.0]], [[-1.0, 0.0, 1.0]]) / 9
Sy = np.dot([[-1.0], [0.0], [1.0]], [[1.0, 2.0, 1.0]]) / 9
return Sx, Sy
def sum_matrix_dimensions(mat1, mat2):
return (mat1.shape[0] + mat2.shape[0],
mat1.shape[1] + mat2.shape[1])
def compute_sobel(signal):
Sx, Sy = create_sobel_operators()
Gx = convolve_linear(signal, Sx, sum_matrix_dimensions(signal, Sx))
Gy = convolve_linear(signal, Sy, sum_matrix_dimensions(signal, Sy))
return np.sqrt(np.power(Gx, 2) + np.power(Gy, 2))
def create_circle(image_resolution, grid_extents, radius):
out = np.zeros((image_resolution, image_resolution))
for i in range(image_resolution):
x_position = ((i * grid_extents / image_resolution)
- 0.5 * grid_extents)
for j in range(image_resolution):
y_position = ((j * grid_extents / image_resolution)
- 0.5 * grid_extents)
if x_position ** 2 + y_position ** 2 <= radius ** 2:
out[i, j] = 1.0
return out
def main():
# Random distribution in x
x = np.random.rand(100, 100)
# Gaussian signals
def create_gaussian_signals(i, j):
return np.exp(-(((i-50)/100) ** 2 +
((j-50)/100) ** 2) / .01)
y = np.fromfunction(create_gaussian_signals, (100, 100))
# Normalization is not strictly necessary, but good practice
x /= np.linalg.norm(x)
y /= np.linalg.norm(y)
# full convolution, output will be the size of x + y
full_linear_output = convolve_linear(x, y, sum_matrix_dimensions(x, y))
# simple boundaries
simple_linear_output = convolve_linear(x, y, x.shape)
np.savetxt("full_linear.dat", full_linear_output)
np.savetxt("simple_linear.dat", simple_linear_output)
# creating simple circle and 2 different Gaussian kernels
circle = create_circle(50, 2, 0.5)
circle = circle / np.linalg.norm(circle)
small_kernel = create_gaussian_kernel(3)
large_kernel = create_gaussian_kernel(25)
small_kernel_output = convolve_linear(circle, small_kernel,
sum_matrix_dimensions(circle,
small_kernel))
large_kernel_output = convolve_linear(circle, large_kernel,
sum_matrix_dimensions(circle,
large_kernel))
np.savetxt("small_kernel.dat", small_kernel_output)
np.savetxt("large_kernel.dat", large_kernel_output)
circle = create_circle(50, 2, 0.5)
# Normalization
circle = circle / np.linalg.norm(circle)
# using the circle for sobel operations as well
sobel_output = compute_sobel(circle)
np.savetxt("sobel_output.dat", sobel_output)
Bibliography
License
Code Examples
The code examples are licensed under the MIT license (found in LICENSE.md).
Images/Graphics
- The image "8bit Heart" was created by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The image "Circle Blur" was created by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The image "Sobel Filters" was created by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The video "2D Convolution" was created by James Schloss and Grant Sanderson and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Text
The text of this chapter was written by James Schloss and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Pull Requests
After initial licensing (#560), the following pull requests have modified the text or graphics of this chapter:
- none