An Achromatic Approach to Compressing CNN Filters Using Pattern-specific Receptive Fields
Team Members: Guanzhong Chen, Shiyu Liu, Guansu (Frances) Niu, Cangcheng Tang, Zhi Wang
GitHub Repo: https://github.com/tangcc35/Canned_Pineapple
Screencast: https://youtu.be/rAFYVpmmK1Q
Final Blog Post: bit.ly/CNN_Compression
Introduction
In image recognition, many datasets consist of gray-scale pictures, such as chest radiographs. The common practice is to train on these images with models that were essentially designed for color pictures, such as DenseNet, which introduces many redundant parameters. This project therefore aimed to develop a methodology for modifying models trained on color images so that they can be applied to gray-scale images.
import tensorflow as tf
import keras
from keras.datasets import cifar10
from keras.models import Model, Sequential
from keras.layers import Dense, Dropout, Flatten, Input, AveragePooling2D, merge, Activation, SpatialDropout2D
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, SeparableConv2D
from keras.layers import Concatenate
from keras.optimizers import Adam, RMSprop, SGD
from keras import regularizers
from keras import backend as K
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import copy
from sklearn.preprocessing import MinMaxScaler
from skimage.metrics import structural_similarity as ssim
from sklearn.cluster import DBSCAN, SpectralClustering, KMeans
from sklearn.metrics import mean_squared_error
from scipy import spatial
import re
import math

mmscaler = MinMaxScaler()

To adapt the architecture of the DenseNet model to gray-scale images, this project first visualizes the filters in each conv layer. Therefore, the original model and the dataset are loaded at the beginning.
1. Load Original DenseNet Model
# Hyperparameters
batch_size = 128
num_classes = 10
epochs = 100
l = 12
num_filter = 36  # added 24 more filters
compression = 0.5
dropout_rate = 0.2
img_height, img_width, channel = 32, 32, 3

# Dense Block
# removed the dropout
def add_denseblock(input, num_filter = 12, dropout_rate = 0.2):
    global compression
    temp = input
    for _ in range(l):
        BatchNorm = BatchNormalization()(temp)
        relu = Activation('relu')(BatchNorm)
        Conv2D_3_3 = Conv2D(int(num_filter*compression), (3,3), use_bias=False, padding='same')(relu)
        #if dropout_rate>0:
        #    Conv2D_3_3 = Dropout2D(dropout_rate)(Conv2D_3_3)
        concat = Concatenate(axis=-1)([temp, Conv2D_3_3])
        temp = concat
    return temp

def add_transition(input, num_filter = 12, dropout_rate = 0.2):
    global compression
    BatchNorm = BatchNormalization()(input)
    relu = Activation('relu')(BatchNorm)
    Conv2D_BottleNeck = Conv2D(int(num_filter*compression), (1,1), use_bias=False, kernel_regularizer=regularizers.l1(), padding='same')(relu)
    #if dropout_rate>0:
    #    Conv2D_BottleNeck = Dropout2D(dropout_rate)(Conv2D_BottleNeck)
    avg = AveragePooling2D(pool_size=(2,2))(Conv2D_BottleNeck)
    return avg

# converted the last Dense Layer to a Fully Convolution N/w as use of Dense Layer was prohibited
def output_layer(input):
    global compression
    BatchNorm = BatchNormalization()(input)
    relu = Activation('relu')(BatchNorm)
    AvgPooling = AveragePooling2D(pool_size=(2,2))(relu)
    temp = Conv2D(num_classes, kernel_size=(2,2))(AvgPooling)
    output = Activation('softmax')(temp)
    flat = Flatten()(output)
    return flat

num_filter = 36
dropout_rate = 0.2
l = 12

input = Input(shape=(img_height, img_width, channel,))
First_Conv2D = Conv2D(num_filter, (3,3), use_bias=False, padding='same')(input)
First_Block = add_denseblock(First_Conv2D, num_filter, dropout_rate)
First_Transition = add_transition(First_Block, num_filter, dropout_rate)
Second_Block = add_denseblock(First_Transition, num_filter, dropout_rate)
Second_Transition = add_transition(Second_Block, num_filter, dropout_rate)
Third_Block = add_denseblock(Second_Transition, num_filter, dropout_rate)
Third_Transition = add_transition(Third_Block, num_filter, dropout_rate)
Last_Block = add_denseblock(Third_Transition, num_filter, dropout_rate)
output = output_layer(Last_Block)

model = Model(inputs=[input], outputs=[output])
model.summary()
model.load_weights(path + '/190epochs.h5')

# determine Loss function and Optimizer
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(0.01, momentum=0.7),
              metrics=['accuracy'])

2. Load Image
# Load CIFAR10 Data
import numpy as np

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

x_train = tf.image.rgb_to_grayscale(x_train, name=None)
x_train = tf.broadcast_to(x_train, [50000, 32, 32, 3])
x_test = tf.image.rgb_to_grayscale(x_test, name=None)
x_test = tf.broadcast_to(x_test, [10000, 32, 32, 3])
x_train = x_train.numpy()
x_test = x_test.numpy()

img_height, img_width, channel = x_train.shape[1], x_train.shape[2], x_train.shape[3]

# convert to one-hot encoding
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

img_tensor = x_train[0]
img_tensor = np.expand_dims(img_tensor, axis=0)
plt.imshow(img_tensor[0])

3. Reduce Number of Filters
This section shows the method of reducing the number of filters in the conv layers of the DenseNet model.
3.1 Visualize filters
The first step is to visualize the filters in each conv layer. Each filter is represented by an image obtained through gradient ascent in input space, i.e., the input image that activates that filter the most.
This notebook only shows the process for the first conv layer of the model. In the actual project, the team reduced the filters in the first two conv layers, because many of their filters look similar in grayscale; the deeper the layer, the fewer filters look similar.
The first conv layer was modified to only take a single gray-scale channel as input.
def deprocess_image_grayscale_for_plot(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # convert to RGB array
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')

    # convert to grayscale by weighting the 3 channels
    # x = np.mean(x, axis=2)
    x = np.dot(x[..., :3], [0.2989, 0.5870, 0.1140])
    return x

def deprocess_image_grayscale(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # convert to grayscale by weighting the 3 channels
    # x = np.mean(x, axis=2)
    x = np.dot(x[..., :3], [0.2989, 0.5870, 0.1140])
    return x

def deprocess_image_color(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # convert to RGB array
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x

def generate_pattern(model, layer_name, filter_index, size=32, grayscale=True, for_plot=False):
    # Build a loss function that maximizes the activation
    # of the nth filter of the layer considered.
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])

    # Compute the gradient of the input picture wrt this loss
    grads = K.gradients(loss, model.input)[0]

    # Normalization trick: we normalize the gradient
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)

    # This function returns the loss and grads given the input picture
    iterate = K.function([model.input], [loss, grads])

    # We start from a gray image with some noise
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.

    # Run gradient ascent for 100 steps
    step = 0.5
    for i in range(100):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step

    img = input_img_data[0]
    if grayscale:
        if not for_plot:
            return deprocess_image_grayscale(img)
        else:
            return deprocess_image_grayscale_for_plot(img)
    if not grayscale:
        return deprocess_image_color(img)

def plot_filters(model_name, layer_list, img_size, mar, row=3, col=6, greyscale_flag=True, plot_flag=True, col_customized=None):
    """Plot all filters in a layer"""
    for layer_name in layer_list:
        size = img_size
        margin = mar
        for_trans = []
        if not greyscale_flag:
            results = np.zeros((row * size + 7 * margin, col * size + 7 * margin, 3))
        else:
            # This is an empty (black) image where we will store our results.
            results = np.zeros((row * size + 7 * margin, col * size + 7 * margin))

        for i in range(row):  # iterate over the rows of our results grid
            for j in range(col):  # iterate over the columns of our results grid
                # Generate the pattern for filter `j + (i * row)` in `layer_name`
                filter_img = generate_pattern(model_name, layer_name, j + (i * row), size=size,
                                              grayscale=greyscale_flag, for_plot=plot_flag)

                # Put the result in the square `(i, j)` of the results grid
                horizontal_start = i * size + i * margin
                horizontal_end = horizontal_start + size
                vertical_start = j * size + j * margin
                vertical_end = vertical_start + size
                if not greyscale_flag:
                    results[horizontal_start: horizontal_end, vertical_start: vertical_end, :] = filter_img / 255
                else:
                    results[horizontal_start: horizontal_end, vertical_start: vertical_end] = filter_img / 255
                # print(np.amin(filter_img[:]), np.amax(filter_img[:]))
                print('Finished filter no.', j + (i * row), "in", layer_name, end='. ')

        # Display the results grid
        print(layer_name)
        plt.figure(figsize=(15, 15))
        if col_customized:
            plt.imshow(results, cmap=col_customized)
        else:
            plt.imshow(results)
        plt.show()

3.1.1 All Filters in Color
The figure below shows the filters in the first conv layer. Each square represents a specific filter.
plot_filters(model, ["conv2d_1"], 128, 5, row = 6, col = 6, greyscale_flag = False, plot_flag = True)3.1.2 All Filters in Grayscale
Since the input images are gray-scale, these representations were converted to grayscale as well by applying linear weights to the RGB channel values [1]. This is the same method used to convert color images to gray-scale images in image preprocessing. The processed images then show what each filter is looking for in a gray-scale feature space. The figure below shows the gray-scale filters obtained by transforming the figure above; again, each square represents a specific filter. To increase the contrast for better visualization, the gray-scale filters are rendered in blue and yellow.
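For reference, the conversion applied in the deprocess_image_grayscale functions above is the standard luma weighting [1]:

$$Y = 0.2989\,R + 0.5870\,G + 0.1140\,B$$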
plot_filters(model, ["conv2d_1"], 128, 5, row = 6, col = 6, greyscale_flag = True, plot_flag = True)3.2 Cluster Filters
To delete filters that are in charge of color information and to reduce the number of filters focusing on similar patterns, we need to cluster them into groups. To represent each filter, we again use the input that maximizes its response. In this way, each filter can be represented as a 32-by-32 matrix and, after being flattened to one dimension, treated as a single observation. We then calculate the distance between each pair of these images as a measure of their similarity.
For the first convolutional layer, which focuses entirely on brightness, we used MSE (essentially a scaled squared Euclidean distance) because we are estimating the relative difference in overall brightness. For the following layers, which focus more on image patterns, we tried the Structural Similarity Index (SSIM) and the Image Euclidean Distance (IMED). These two measures take relative pixel positions into account when comparing pixel values, which alleviates the problem of two images containing similar patterns that are shifted by a few pixels.
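Concretely, for two flattened filter representations $x, y \in \mathbb{R}^{N}$ (here $N = 32 \times 32$), the two distances implemented in the code below are

$$d_{\mathrm{MSE}}(x, y) = \frac{1}{N}\sum_{k=1}^{N}(x_k - y_k)^2, \qquad d_{\mathrm{IMED}}(x, y) = \sqrt{(x - y)^{\top} G\,(x - y)}, \qquad G_{ij} = \frac{1}{2\pi r^2}\exp\!\left(-\frac{\lVert P_i - P_j\rVert^2}{2 r^2}\right),$$

where $P_i$ and $P_j$ are the 2-D coordinates of pixels $i$ and $j$, and $r = 1$ as in the call get_G(32, 1) below [5].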
DBSCAN [2] was used as the clustering method. Previous studies clustered CNN kernels with k-means [3], but the team found that k-means always distributes kernels evenly across clusters, which is not suitable here, where the convolutional layers contain only 18 filters. Two ways were used to measure the distance between each pair of filters (in matrix form) for clustering: MSE [4] in the first layer and IMED [5] in the second layer. A threshold was set to select the filters that could be grouped together, with the goal of keeping slightly more than 50% of the filters in each layer.
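As a minimal, illustrative sketch of this clustering step (the 4-by-4 distance matrix below is made up; the real distance matrices are computed in the next cells), DBSCAN with a precomputed distance matrix and min_samples=1 groups any filters whose pairwise distance falls under the eps threshold and leaves unique filters as singleton clusters:

import numpy as np
from sklearn.cluster import DBSCAN

# Toy distance matrix: filters 0 and 1 are near-duplicates,
# filters 2 and 3 are far from everything else.
toy_dist = np.array([
    [0.00, 0.02, 0.90, 0.80],
    [0.02, 0.00, 0.85, 0.75],
    [0.90, 0.85, 0.00, 0.60],
    [0.80, 0.75, 0.60, 0.00],
])

# eps acts as the similarity threshold; min_samples=1 keeps
# unique filters as singleton clusters instead of marking them as noise.
labels = DBSCAN(eps=0.05, min_samples=1, metric='precomputed').fit(toy_dist).labels_
print(labels)  # [0 0 1 2] -> filters 0 and 1 merge; 2 and 3 stay separate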
# Get filter information to an array
def get_filter_array(model_name, layer_name, filter_num):
    filter_collection = []
    for i in range(filter_num):
        filter_img_temp = generate_pattern(model_name, layer_name, i, size=32, grayscale=True, for_plot=False)
        filter_collection.append(filter_img_temp)
    filter_collection = np.array(filter_collection)
    return filter_collection

def get_G(img_h, r):
    '''Calculate pixel distances for IMED'''
    G_matrix = np.zeros((img_h**2, img_h**2))
    for i1 in range(img_h):
        for i2 in range(img_h):
            for j1 in range(img_h):
                for j2 in range(img_h):
                    pixel_dist = 1/(2*math.pi*(r**2)) * math.exp(-((i1-i2)**2 + (j1-j2)**2)/(2*r**2))
                    G_matrix[img_h*i1+j1, img_h*i2+j2] = pixel_dist
    return G_matrix

G = get_G(32, 1)

def get_imed(img1, img2):
    '''Calculate image distance'''
    img1 = img1.flatten()
    img2 = img2.flatten()
    distance = math.sqrt(np.dot(np.dot(np.transpose(img1-img2), G), (img1-img2)))
    return distance

MSE, SSIM, cosine similarity, brightness difference, and IMED are all ways to measure the distance between two matrices. After trying each of them, the team found that MSE worked best in the first conv layer, which can be seen as a color palette, while IMED gave the best result in the second layer, which still contains most of the color information.
def calculate_dist(filter_collection, method="mse"):
    filter_num = filter_collection.shape[0]
    print("Calculated Distance Matrix Using", method, "Method.")
    print("Filter number:", filter_num)

    if method == "ssim":
        # calculate distance matrix using ssim
        filter_distance_matrix_ssim = np.zeros((filter_num, filter_num))
        for i in range(filter_num):
            for j in range(filter_num):
                filter_distance_matrix_ssim[i, j] = ssim(filter_collection[i], filter_collection[j])
        filter_distance_matrix_ssim = 1 - filter_distance_matrix_ssim
        return filter_distance_matrix_ssim

    if method == "mse":
        # calculate distance matrix using mse
        filter_distance_matrix_mse = np.zeros((filter_num, filter_num))
        for i in range(filter_num):
            for j in range(filter_num):
                filter_distance_matrix_mse[i, j] = mean_squared_error(filter_collection[i].flatten(),
                                                                      filter_collection[j].flatten())
        return filter_distance_matrix_mse

    if method == "cosine":
        # calculate distance matrix using cosine similarity
        filter_distance_matrix_cosine = np.zeros((filter_num, filter_num))
        for i in range(filter_num):
            for j in range(filter_num):
                filter_distance_matrix_cosine[i, j] = \
                    1 - spatial.distance.cosine(filter_collection[i].flatten(), filter_collection[j].flatten())
        filter_distance_matrix_cosine = 1 - filter_distance_matrix_cosine
        return filter_distance_matrix_cosine

    if method == "brightness":
        # calculate distance matrix using brightness
        filter_distance_matrix_brightness = np.zeros((filter_num, filter_num))
        for i in range(filter_num):
            for j in range(filter_num):
                filter_distance_matrix_brightness[i, j] = \
                    np.sum(filter_collection[i]) - np.sum(filter_collection[j])
        filter_distance_matrix_brightness = mmscaler.fit_transform(
            filter_distance_matrix_brightness.flatten().reshape(-1, 1)).reshape((filter_num, filter_num))
        return filter_distance_matrix_brightness

    if method == "imed":
        # calculate distance matrix using imed
        filter_distance_matrix_imed = np.zeros((filter_num, filter_num))
        for i in range(filter_num):
            for j in range(filter_num):
                filter_distance_matrix_imed[i, j] = get_imed(filter_collection[i].flatten(),
                                                             filter_collection[j].flatten())
        return filter_distance_matrix_imed

def get_cluster_(distance_mat, weight, row, col, min_samp=1):
    h = len(distance_mat)
    distance_mat = mmscaler.fit_transform(distance_mat.flatten().reshape(-1, 1)).reshape(h, h)
    std = np.std([distance_mat[i, j] for i in range(h) for j in range(h) if i != j])
    clustering = DBSCAN(eps=weight, min_samples=min_samp, metric='precomputed').fit(distance_mat)
    filter_clusters = clustering.labels_
    return filter_clusters

conv2d_1 = get_filter_array(model, 'conv2d_1', 36)
conv2d_1_mse = calculate_dist(conv2d_1, method="mse")

In the first conv layer, there are 36 filters in total, and they were clustered into 20 groups.
filter_clusters = get_cluster_(conv2d_1_mse, weight=0.027, row=6, col=6, min_samp=1)
print('Number of clusters:', max(filter_clusters) + 1)
filter_clusters.reshape(6, 6)

3.3 Merge Filters
Every filter is a three-dimensional tensor. For a convolutional layer that needs to be shrunk, the filters are simply averaged within each cluster. However, doing this also reduces the number of output channels and causes a shape mismatch. Therefore, in the following convolutional layer, the weights of each filter are summed across the third (input-channel) dimension according to how the previous layer was clustered.
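As a toy sketch of this merging logic (the shapes and cluster assignment below are hypothetical, and plain numpy is used instead of the notebook's TensorFlow-based merge_filters defined next): averaging one layer's kernels over its output-channel axis within each cluster, then summing the next layer's kernels over its input-channel axis with the same grouping, keeps the two layers' shapes compatible.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights: layer 1 has 4 output channels, layer 2 consumes those 4 channels.
w1 = rng.normal(size=(3, 3, 3, 4))   # (kh, kw, in_ch, out_ch)
w2 = rng.normal(size=(3, 3, 4, 6))   # (kh, kw, in_ch, out_ch)

clusters = np.array([0, 0, 1, 1])    # filters 0 & 1 merge, filters 2 & 3 merge
n_clusters = clusters.max() + 1

# Average layer-1 filters within each cluster (shrinks the output-channel axis).
w1_merged = np.stack([w1[..., clusters == c].mean(axis=-1) for c in range(n_clusters)], axis=-1)

# Sum layer-2 weights over the input channels that were merged upstream.
w2_merged = np.stack([w2[:, :, clusters == c, :].sum(axis=2) for c in range(n_clusters)], axis=2)

print(w1_merged.shape, w2_merged.shape)  # (3, 3, 3, 2) (3, 3, 2, 6)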
def merge_filters(original_weights, cluster_res, cluster_res_prev=np.array([0, 0, 0]), layer_type='conv'):
    """Merge original weights, based on clustering results and number of in_channels"""
    # clean clustering result for outliers, record them as a new cluster
    max_cluster = max(cluster_res)
    for i, cls in enumerate(cluster_res):
        if cls == -1:
            max_cluster += 1
            cluster_res[i] = max_cluster

    # clean prev clustering result for outliers, record them as a new cluster
    max_cluster_prev = max(cluster_res_prev)
    for i, cls in enumerate(cluster_res_prev):
        if cls == -1:
            max_cluster_prev += 1
            cluster_res_prev[i] = max_cluster_prev

    if layer_type == 'conv':
        # average over the 4th dimension
        # zero tensor to record weights
        clustered_weights = np.zeros(list(original_weights.shape[:3]) + [max_cluster + 1])

        # load new filters with averaged weights
        for filter_idx in range(max_cluster + 1):
            # get index for the filters belonging to that cluster
            idx = tf.constant([i for i in range(len(cluster_res)) if cluster_res[i] == filter_idx])
            # reduce-average over the filters
            clustered_filter = tf.reduce_mean(tf.gather(original_weights, idx, axis=3), axis=3)
            clustered_weights[:, :, :, filter_idx] = clustered_filter

        # sum over the 3rd dimension according to previous cluster results
        # zero tensor to record weights
        reduced_clustered_weights = np.zeros(list(original_weights.shape[:2]) + [max_cluster_prev + 1] + [max_cluster + 1])

        # load new filters with summed weights
        for filter_idx in range(max_cluster_prev + 1):
            # get index for the filters belonging to that cluster
            idx = tf.constant([i for i in range(len(cluster_res_prev)) if cluster_res_prev[i] == filter_idx])
            # reduce-sum over the filters
            sumed_filter = tf.reduce_sum(tf.gather(clustered_weights, idx, axis=2), axis=2)
            reduced_clustered_weights[:, :, filter_idx, :] = sumed_filter

        return reduced_clustered_weights

    if layer_type == 'bn':
        # average the weights across channels
        # list to record weights
        clustered_weights = [np.zeros([max_cluster + 1]),
                             np.zeros([max_cluster + 1]),
                             np.zeros([max_cluster + 1]),
                             np.zeros([max_cluster + 1])]

        # load new filters with averaged weights
        for bn_idx, bn_weights in enumerate(clustered_weights):
            for filter_idx in range(max_cluster + 1):
                # get index for the filters belonging to that cluster
                idx = tf.constant([i for i in range(len(cluster_res)) if cluster_res[i] == filter_idx])
                # reduce-average over the filters
                clustered_filter = tf.reduce_mean(tf.gather(original_weights[bn_idx], idx, axis=0), axis=0)
                bn_weights[filter_idx] = clustered_filter

        return clustered_weights

The figure below is an example showing the pipeline of merging the filters of the first convolutional layer in a color model.

Here we merge the first convolutional layer according to our clustering result filter_clusters. The filters are first averaged within each cluster, then reduce-summed along their third dimension to match the single-channel input. This gives us new weights test_conv_1 for the first convolutional layer.
test_conv_1 = merge_filters(model.get_layer('conv2d_1').weights[0], filter_clusters,
                            np.array([0, 0, 0]), layer_type='conv')
print('Shape of clustered 1st conv layer:', test_conv_1.shape)

Next is the batch-normalization layer: its parameters are reduce-averaged according to the same clustering result, filter_clusters, so that its dimensions match the clustered first conv layer.
test_bn_1 = merge_filters(model.get_layer('batch_normalization_1').weights, filter_clusters, layer_type='bn')
print('Shape of clustered 1st bn layer:', test_bn_1[0].shape)

Now we reduce-sum the second conv layer's input-channel dimension according to how we clustered the first conv layer.
test_conv_2 = merge_filters(model.get_layer('conv2d_2').weights[0], np.arange(18), filter_clusters, layer_type='conv')
print('Shape of 2nd conv layer:', test_conv_2.shape)

3.4 Load Weights
This section loads the preserved weights from the original model together with the adjusted weights from the clustering step. These weights are used to build the new model and to compare its results with the old model.
num_filter = 36
dropout_rate = 0.2
l = 12

input = Input(shape=(img_height, img_width, 1,))
First_Conv2D = Conv2D(test_conv_1.shape[-1], (3,3), use_bias=False, padding='same')(input)
First_Block = add_denseblock(First_Conv2D, num_filter, dropout_rate)
First_Transition = add_transition(First_Block, num_filter, dropout_rate)
Second_Block = add_denseblock(First_Transition, num_filter, dropout_rate)
Second_Transition = add_transition(Second_Block, num_filter, dropout_rate)
Third_Block = add_denseblock(Second_Transition, num_filter, dropout_rate)
Third_Transition = add_transition(Third_Block, num_filter, dropout_rate)
Last_Block = add_denseblock(Third_Transition, num_filter, dropout_rate)
output = output_layer(Last_Block)

model_clustered = Model(inputs=[input], outputs=[output])
model_clustered.summary()

# load previous weights
layer_weight_dic = {}
for layer_idx in range(len(model.layers)):
    layer_name_clustered = model_clustered.layers[layer_idx].name
    layer_weights_prev = model.layers[layer_idx].get_weights()
    layer_weight_dic[layer_name_clustered] = layer_weights_prev

Because the conv layers within each dense block are densely connected, a loop is used to update all the downstream layers affected by the clustering.
# update to new weights
layer_weight_dic['conv2d_55'] = [test_conv_1]
layer_weight_dic['conv2d_56'] = [test_conv_2]
layer_weight_dic['batch_normalization_53'] = test_bn_1

bn_count = 54
conv_count = 57
num_clusters = max(filter_clusters) + 1
for i in range(12):
    # get new height for conv and bn
    ttl_dim = model_clustered.get_layer(f'conv2d_{conv_count}').weights[0].shape[2]

    # update conv dimension
    tmp_wt_conv = merge_filters(model.get_layer(f'conv2d_{conv_count - 54}').weights[0],
                                np.arange(18),
                                np.concatenate((filter_clusters, np.arange(num_clusters, ttl_dim))),
                                layer_type='conv')
    layer_weight_dic[f'conv2d_{conv_count}'] = [tmp_wt_conv]

    # update bn dimension
    tmp_wt_bn = merge_filters(model.get_layer(f'batch_normalization_{bn_count - 52}').weights,
                              np.concatenate((filter_clusters, np.arange(num_clusters, ttl_dim))),
                              layer_type='bn')
    layer_weight_dic[f'batch_normalization_{bn_count}'] = tmp_wt_bn

    bn_count += 1
    conv_count += 1

Now we can load our updated weights with the correct dimensions.
# load new weights to model
for layer in model_clustered.layers:
    layer.set_weights(layer_weight_dic[layer.name])

And compile the clustered model.
# determine Loss function and Optimizer for the new model
model_clustered.compile(loss='categorical_crossentropy',
                        optimizer=SGD(0.01, momentum=0.7),
                        metrics=['accuracy'])

4. Testing
4.1 Raw Results
The original DenseNet model gives an accuracy of 83%, while the clustered model initially tests at 73%, indicating a significant loss of information during the process. The pipeline above preserves most of the image-pattern information, but DenseNet has another important kind of layer: batch normalization. These layers can be viewed as brightness and contrast adjustments. Such features are not captured in the gradient-ascent results and are hard to cluster. Therefore, these parameters were re-estimated by training only the batch-normalization layers for one epoch, just enough to restore the proper contrast and brightness.
4.1.1 Original Model
_, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print('Original acc:', test_accuracy)

4.1.2 Clustered Model
_, test_accuracy_clustered = model_clustered.evaluate(
    np.expand_dims(x_test[:, :, :, 0], axis=3), y_test, verbose=0)
print('Clustered acc:', test_accuracy_clustered)

4.2 Only Train BN Layers
for layer in model_clustered.layers:
    if 'batch_normalization' not in layer.name:
        layer.trainable = False

# determine Loss function and Optimizer for the new model
model_clustered.compile(loss='categorical_crossentropy',
                        optimizer=SGD(0.01, momentum=0.7),
                        metrics=['accuracy'])

model_clustered.fit(np.expand_dims(x_train[:, :, :, 0], axis=3), y_train,
                    epochs=1, verbose=0,
                    validation_data=(np.expand_dims(x_test[:, :, :, 0], axis=3), y_test))

_, test_accuracy_clustered = model_clustered.evaluate(
    np.expand_dims(x_test[:, :, :, 0], axis=3), y_test, verbose=0)
print('Clustered acc after bn adjustment:', test_accuracy_clustered)

The final accuracy of the new gray-scale model is 0.85, which is slightly higher than the result given by the baseline model.
Shortcomings
Depending on the method to merge filters, the information loss during the process can be significant.
Gradient ascent gives slightly different results every time, so uncertainty exists in the process.
Potential Improvements
To reduce information loss, the method could be improved by tuning the threshold value and by changing the distance measure and the clustering method.
To minimize the uncertainty caused by gradient ascent, the filter representation can be averaged over multiple gradient-ascent runs, as sketched below.
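A minimal sketch of this improvement, reusing the generate_pattern and calculate_dist functions defined above (the helper name and the number of runs are our own, illustrative choices):

def generate_pattern_averaged(model, layer_name, filter_index, n_runs=5, size=32):
    """Average several gradient-ascent visualizations of one filter
    to reduce run-to-run variation (illustrative helper, not from the notebook)."""
    runs = [generate_pattern(model, layer_name, filter_index, size=size,
                             grayscale=True, for_plot=False)
            for _ in range(n_runs)]
    return np.mean(runs, axis=0)

# e.g. build the distance matrix from the averaged representations
# conv2d_1_avg = np.array([generate_pattern_averaged(model, 'conv2d_1', i) for i in range(36)])
# conv2d_1_mse = calculate_dist(conv2d_1_avg, method="mse")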
Works Cited
[1] Luma (video). (2019, July 3). Retrieved from https://en.wikipedia.org/wiki/Luma_(video)
[2] 2.3. Clustering. (n.d.). Retrieved from https://scikit-learn.org/stable/modules/clustering.html#dbscan
[3] Son, S., Nah, S., & Lee, K. M. (2018). Clustering Convolutional Kernels to Compress Deep Neural Networks. Computer Vision – ECCV 2018 Lecture Notes in Computer Science, 225–240. doi: 10.1007/978-3-030-01237-3_14
[4] Llvll. (2016, January 19). llvll/imgcluster. Retrieved from https://github.com/llvll/imgcluster
[5] Wang, L., Zhang, Y., & Feng, J. (2005). On the Euclidean distance of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1334–1339. doi: 10.1109/tpami.2005.165