Learning to apply linear filters to real image data

Release time: 2020-04-28

Visualizing learned features and how they change over time can provide useful insight into how a network learns. In practice, however, a network usually has far more than a few layers, and the large number of convolution kernels makes it difficult to interpret and analyze the learned features visually.

However, we can demonstrate through controlled experiments how the weights of the convolution kernels evolve as the network learns. Because the features the network should learn are known in advance, that is, the process and parameters that generate the data are fully defined and under our control, the learning task is easy to specify. We can do this by building a very simple single-layer convolutional network and training it to perform linear filtering with several different kernels.

In the next experiment, we apply Sobel edge filtering, a traditional edge detection method commonly used in image processing and computer vision, to the dataset, and train our model to perform a similar linear mapping. We also use kernels larger than the Sobel filter to learn more general, arbitrary filters.
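As a rough illustration of what Sobel filtering computes, the sketch below applies the horizontal-gradient Sobel kernel to a tiny synthetic image. The helper name `filter2d`, the 6x6 test image, and the choice of no padding are assumptions made for this example, not details from the experiment itself.

```python
import numpy as np

# Horizontal-gradient Sobel kernel (responds to vertical edges).
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])


def filter2d(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation).

    Hypothetical helper for illustration: it accumulates one shifted,
    weighted copy of the image per kernel entry.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for u in range(kh):
        for v in range(kw):
            out += kernel[u, v] * image[u:u + out_h, v:v + out_w]
    return out


# Synthetic 6x6 image (an assumption): dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = filter2d(image, SOBEL_X)
print(edges)
```

The output is nonzero only in the columns whose 3x3 window straddles the dark-to-bright boundary, which is the edge response the filter is designed to produce.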

These experiments give us a sense of how convolutional layers in neural networks operate on input data, how the weights of the convolution kernels change during training, and how neural network training can be viewed as a minimization problem.

First, we must process the image data X with a linear filter to obtain the filtered result Y of the original image. Linear filter operations can be summarized in the following form:
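One common way to write this operation is sketched below. It assumes a square kernel of size k x k and no padding. The symbols K and k and the index convention are assumptions introduced here; only X and Y come from the text.

```latex
Y_{i,j} = \sum_{u=0}^{k-1} \sum_{v=0}^{k-1} K_{u,v} \, X_{i+u,\, j+v}
```

This is the cross-correlation form that convolutional layers in most deep learning libraries compute. A convolution in the strict mathematical sense flips the kernel along both axes before applying the same sum.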

Linear filters have well-defined operations for any set of parameters (convolution kernels) or input data we can think of.

We can now construct a single-layer, single-kernel convolutional neural network that approximates linear filtering. The computations in the two approaches, linear filters and convolutional neural networks, are exactly the same. The only difference is that the network's kernel parameters must be learned from the data.
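To make the idea concrete, here is a minimal sketch of such a single-kernel "network" trained by plain gradient descent on a mean squared error loss. It uses NumPy in place of a deep learning framework and random synthetic images in place of real image data. The helper names, image sizes, learning rate, and step count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target filter that the model should recover from data.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])


def filter2d(image, kernel):
    """Apply `kernel` to `image` without padding (cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for u in range(kh):
        for v in range(kw):
            out += kernel[u, v] * image[u:u + out_h, v:v + out_w]
    return out


def loss_gradient(image, residual, kernel_shape):
    """Gradient of mean(residual**2) with respect to each kernel weight."""
    kh, kw = kernel_shape
    out_h, out_w = residual.shape
    grad = np.zeros(kernel_shape)
    for u in range(kh):
        for v in range(kw):
            window = image[u:u + out_h, v:v + out_w]
            grad[u, v] = 2.0 * np.mean(residual * window)
    return grad


# Synthetic training set (an assumption): random images and their
# Sobel-filtered versions, so the true kernel is known in advance.
images = rng.standard_normal((16, 12, 12))
targets = np.stack([filter2d(x, SOBEL_X) for x in images])

# The "network" is a single 3x3 kernel with small random initial weights.
kernel = 0.1 * rng.standard_normal((3, 3))
learning_rate = 0.1

for step in range(200):
    grad = np.zeros_like(kernel)
    for x, y in zip(images, targets):
        residual = filter2d(x, kernel) - y
        grad += loss_gradient(x, residual, kernel.shape)
    # Full-batch gradient descent step on the averaged gradient.
    kernel -= learning_rate * grad / len(images)

print(np.round(kernel, 3))
```

Because the model's output is linear in the kernel weights, the loss is a quadratic function of them, and gradient descent moves the learned kernel steadily toward the Sobel kernel that generated the targets.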