Deeply Understanding How Convolutional Neural Networks (CNNs) Work

  • Preface
  • Basics and data preparation
  • Building a CNN with Keras
  • Building a CNN with TensorFlow

Preface

Those of us who came to neural networks by a roundabout route often take the shortcut of first learning how to build projects, and only later coming back for the fundamentals.
This article is meant for beginners only; experts, please signal and drive around.

Basics and Data Preparation

A convolutional neural network is a network structure that computes each layer from the previous layer's data, mainly through convolution operations. It was first used in the field of image recognition; once people saw how powerful it was, they wanted to try it in every field.

Its history is not covered here, since we are not historians. The convolution formula is not shown either, because the mainstream deep learning frameworks never ask you to implement it yourself; look it up on your own if you are curious.
The dataset used in this experiment is mnist, the classic dataset of handwritten digit images.

# import common
import keras
from keras.datasets import mnist
from keras.utils import to_categorical
import numpy as np

Prepare the data. Convolution operates on images that carry channel information, so we add an explicit channel dimension.

# data prepare
X_train = None
y_train = None
X_test = None
y_test = None
def prepare_data():
    global X_train, y_train, X_test, y_test
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    X_train = np.expand_dims(X_train, 3)  # (60000, 28, 28) -> (60000, 28, 28, 1)
    X_test = np.expand_dims(X_test, 3)    # add the single grayscale channel
    # one-hot encode the labels if needed (done later, per framework)
    # y_train = to_categorical(y_train)
    # y_test = to_categorical(y_test)
RANDOM_STATE = 2  # fixed seed for the train/validation split below
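
A quick check (a usage sketch; it assumes prepare_data() above has already been defined) confirms the channel axis is in place:

# usage sketch: verify the added channel dimension
prepare_data()
print(X_train.shape, X_test.shape)  # (60000, 28, 28, 1) (10000, 28, 28, 1)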

Knowledge expansion: the usual format of an image
An image is usually composed of three color channels: R, G, and B. In numpy terms, an image's shape is (x, y, channel): the first two are the image's dimensions (its resolution, measured in pixels), and the last is the number of color channels it is built from, usually 3, namely the R (red), G (green), B (blue) layers we all know.
[Figure: an image decomposed into its R, G, B channel layers]
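
To make the shape story concrete, here is a minimal sketch using only numpy (the sizes are made up for illustration):

# a minimal sketch: shapes of a color image vs. an mnist digit
import numpy as np

rgb_image = np.zeros((480, 640, 3))  # height x width x 3 channels (R, G, B)
mnist_digit = np.zeros((28, 28, 1))  # 28 x 28 pixels, a single grayscale channel
print(rgb_image.shape, mnist_digit.shape)  # (480, 640, 3) (28, 28, 1)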

Building a CNN with Keras

This part is the one you will probably use most often. We use the LeNet network; its principles are not detailed here, see (N ways to add layers in Keras).
CNNs are explained in more detail in the TensorFlow section.
Import the libraries we will use.

# import common
import keras
from keras.layers import Conv2D, MaxPool2D, Activation, Flatten, Dense, Dropout
from keras.losses import categorical_crossentropy
from keras.models import Sequential
from keras.optimizers import Adam
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

Preprocess the data: one-hot encode the labels and split off a validation set.

# data preprocess
prepare_data()
y_train = to_categorical(y_train)  # one-hot labels for categorical_crossentropy
y_test = to_categorical(y_test)
X_training, X_valing, y_training, y_valing = train_test_split(
    X_train, y_train, test_size=0.2, random_state=RANDOM_STATE)

Build the neural network and train it.

def net(input_shape, output_class, activation='tanh', is_compile=True):
    model = Sequential()
    model.add(Conv2D(20, (3, 3), input_shape=input_shape))  # 20 kernels of 3x3
    model.add(MaxPool2D((2, 2)))                            # halve the feature map
    model.add(Activation(activation))
    model.add(Conv2D(30, (3, 3)))                           # 30 kernels of 3x3
    model.add(MaxPool2D((2, 2)))
    model.add(Activation(activation))
    model.add(Dropout(0.5))
    model.add(Flatten())                                    # to one long vector
    model.add(Dense(1024))
    model.add(Dropout(0.5))
    model.add(Dense(output_class))
    model.add(Activation('softmax'))                        # class probabilities
    if is_compile:                                          # honor the flag
        model.compile(Adam(), categorical_crossentropy, ['accuracy'])
    return model

def main():
    model = net(X_training.shape[1:], y_training.shape[1])
    model.fit(X_training, y_training, batch_size=1000, epochs=4, validation_data=(X_valing, y_valing))

main()
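
The trained model lives inside main(). If you also want to score the untouched test set, a minimal sketch follows (an assumption: main() is modified to end with return model, and y_test was one-hot encoded during preprocessing):

# usage sketch: evaluate on the held-out test set
# (assumes main() was changed to end with `return model`)
model = main()
test_loss, test_acc = model.evaluate(X_test, y_test, batch_size=1000)
print('test loss: {:.4f}, test accuracy: {:.4f}'.format(test_loss, test_acc))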

Why use Adam here? See the experiment (Experiments with the common optimizers in Keras).
Results:

Train on 48000 samples, validate on 12000 samples
Epoch 1/4
48000/48000 [==============================] - 31s 638us/step - loss: 0.6272 - accuracy: 0.8065 - val_loss: 0.1858 - val_accuracy: 0.9439
Epoch 2/4
48000/48000 [==============================] - 34s 701us/step - loss: 0.2146 - accuracy: 0.9320 - val_loss: 0.1221 - val_accuracy: 0.9625
Epoch 3/4
48000/48000 [==============================] - 35s 724us/step - loss: 0.1638 - accuracy: 0.9484 - val_loss: 0.0994 - val_accuracy: 0.9700
Epoch 4/4
48000/48000 [==============================] - 32s 664us/step - loss: 0.1417 - accuracy: 0.9556 - val_loss: 0.0903 - val_accuracy: 0.9725

Building a CNN with TensorFlow

Keras makes building a neural network quick and easy, but we never see what it actually does inside. As the saying goes, "what you learn on paper is shallow; to truly understand, you must do it yourself", so let's build a network by hand.
The architecture figure below is taken from the original LeNet paper.
[Figure: LeNet architecture]
The LeNet architecture can be summarized as follows:

  1. Take a raw input image and run a sliding convolution over it.
  2. This yields a feature map with greater depth (say, from 3 channels to 20).
  3. Pool the map by region, e.g. taking the maximum or average over each small window, which shrinks the image.
  4. This yields a smaller map; run another sliding convolution over it.
  5. This yields an even deeper feature map.
  6. Pool it by region again.
  7. This yields another, smaller map.
  8. Flatten everything into one long row of neurons.
  9. Run fully connected forward propagation.
  10. Finally, output one value per class.

Taking mnist as an example, the input has the form 28×28×1; the worked trace below follows it through the network.
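
To make the shape bookkeeping concrete, here is a worked trace through the net() defined in the Keras section (note: its Conv2D layers use Keras's default 'valid' padding, so every convolution shrinks the map; the TensorFlow version below uses 'SAME' padding, so its sizes stay put):

# worked shape trace of the Keras net() on one mnist input
# input:            28 x 28 x 1    (a grayscale digit)
# conv 3x3, 20  ->  26 x 26 x 20   (28 - 3 + 1 = 26; depth becomes 20)
# pool 2x2      ->  13 x 13 x 20
# conv 3x3, 30  ->  11 x 11 x 30   (13 - 3 + 1 = 11; depth becomes 30)
# pool 2x2      ->   5 x  5 x 30   (floor(11 / 2) = 5)
# flatten       ->  750 neurons -> dense 1024 -> dense 10 (one per digit class)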
A neural network trains the parameters that transform one layer into the next. In a convolutional neural network, the values inside the convolution kernels are exactly the parameters that tensorflow will train.
The kernel slides across the image, computing a convolution value for each region it covers, as in the sketch below.
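
To see what "sliding" means, here is a deliberately naive sketch of a single-channel 2D convolution (strictly, cross-correlation, which is what deep learning frameworks actually compute); it is for intuition only, not what tf.nn.conv2d literally executes internally:

# a minimal sketch of a sliding 2D convolution over one channel
import numpy as np

def naive_conv2d(image, kernel):
    ih, iw = image.shape
    kh, kw = kernel.shape
    # 'valid' padding: the output shrinks by kernel_size - 1 in each direction
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[i:i+kh, j:j+kw]       # the window under the kernel
            out[i, j] = np.sum(region * kernel)  # multiply elementwise, then sum
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1., 0.], [0., -1.]])
print(naive_conv2d(image, kernel))  # shape (3, 3): one value per kernel position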

The code is as follows.
Import the libraries we will use.

# import common
import numpy as np  # used below for np.argmax
import tensorflow as tf
from tensorflow.nn import conv2d, max_pool
from tensorflow.nn import sparse_softmax_cross_entropy_with_logits
from tensorflow.train import AdamOptimizer
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from input_data import DataSet  # local mini-batching helper, as in the TensorFlow mnist tutorial
from sklearn.metrics import accuracy_score

Process the data.

# data process
prepare_data()
print(X_test.shape)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

X_training, X_valing, y_training, y_valing = train_test_split(X_train, y_train, test_size=0.2, random_state=RANDOM_STATE)

train_data = DataSet(X_training, y_training)
validation_data = DataSet(X_valing, y_valing)
test_data = DataSet(X_test, y_test)
INPUT_NODE = X_train.shape[:]   # (num_examples, height, width, channels)
IMAGE_SIZE = INPUT_NODE[1]
IMAGE_CHANNEL = INPUT_NODE[3]
print(INPUT_NODE, IMAGE_SIZE, IMAGE_CHANNEL)  # (60000, 28, 28, 1) 28 1

CONV1_SIZE = 3
CONV1_DEEP = 20
CONV2_SIZE = 3
CONV2_DEEP = 30
MAXPOOL1_SIZE = 2
MAXPOOL2_SIZE = 2
FC_SIZE = 1024

OUTPUT_NODE = y_train.shape[1]  # 10 classes after one-hot encoding

Build and train the network.

tf.reset_default_graph()
def inference(input_tensor, is_train=False):
    conv1_weights = tf.get_variable(
        'conv1_weights', 
        shape=[CONV1_SIZE, CONV1_SIZE, IMAGE_CHANNEL, CONV1_DEEP],
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    conv1_biases = tf.get_variable(
        'conv1_biases',
        shape=[CONV1_DEEP],
        initializer=tf.constant_initializer(0.0)
    )
    layer_conv1 = conv2d(
        input_tensor, 
        conv1_weights, 
        strides=[1, 1, 1, 1], 
        padding='SAME'
    )
    layer_relu1 = tf.nn.relu(tf.nn.bias_add(layer_conv1, conv1_biases))
    layer_pool1 = max_pool(
        layer_relu1, 
        [1, MAXPOOL1_SIZE, MAXPOOL1_SIZE, 1],
        [1, 2, 2, 1],
        'SAME'
    )
    
    conv2_weights = tf.get_variable(
        'conv2_weights',
        shape=[CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    conv2_biases = tf.get_variable(
        'conv2_biases',
        shape=[CONV2_DEEP],
        initializer=tf.constant_initializer(0.1)
    )
    layer_conv2 = conv2d(
        layer_pool1, 
        conv2_weights, 
        strides=[1, 1, 1, 1], 
        padding='SAME'
    )
    layer_relu2 = tf.nn.relu(tf.nn.bias_add(layer_conv2, conv2_biases))
    layer_pool2 = max_pool(
        layer_relu2, 
        [1, MAXPOOL2_SIZE, MAXPOOL2_SIZE, 1],
        [1, 2, 2, 1],
        'SAME'
    )
    
    pool_shape = layer_pool2.get_shape().as_list()  # num * size * size * depth
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    layer_reshaped = tf.reshape(layer_pool2, [pool_shape[0], nodes])
    fc1_weights = tf.get_variable(
        'fc1_weights', 
        shape=[nodes, FC_SIZE], 
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    fc1_biases = tf.get_variable(
        'fc1_biases', 
        shape=[FC_SIZE],
        initializer=tf.constant_initializer(0.0)
    )
    layer_fc1 = tf.nn.relu(tf.matmul(layer_reshaped, fc1_weights) + fc1_biases)
    if is_train:
        layer_fc1 = tf.nn.dropout(layer_fc1, 0.5)  # keep_prob=0.5, applied only when is_train is set
    fc2_weights = tf.get_variable(
        'fc2_weights', 
        shape=[FC_SIZE, OUTPUT_NODE],
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    fc2_biases = tf.get_variable(
        'fc2_biases', 
        shape=[OUTPUT_NODE],
        initializer=tf.constant_initializer(0.1)
    )
    layer_output = tf.matmul(layer_fc1, fc2_weights) + fc2_biases
    return layer_output

TRAIN_TIMES = 500
BATCH_SIZE = 1000
def train():
    X_input = tf.placeholder(tf.float32, shape=[BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNEL], name='X_input')
    y_output = tf.placeholder(tf.float32, shape=[None, OUTPUT_NODE], name='y_output')
    y_inference = inference(X_input, True)  # built once with is_train=True, so dropout also stays active in the validation/test runs below
    
    loss = tf.reduce_mean(sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_output, 1), logits=y_inference))
    train_op = AdamOptimizer(0.001).minimize(loss)
    
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        for i in range(TRAIN_TIMES):
            x, y = train_data.next_batch(BATCH_SIZE)
            x = x.reshape(-1, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNEL)
            _, loss_end = sess.run([train_op, loss], feed_dict={X_input: x, y_output: y})
            if i % 100 == 0:
                print('After {} training, loss is {}'.format(i, loss_end))
                x_v, y_v = validation_data.next_batch(BATCH_SIZE)
                x_v = x_v.reshape(-1, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNEL)
                loss_val = sess.run(loss, feed_dict={X_input: x_v, y_output: y_v})
                y_val_pred = sess.run(y_inference, feed_dict={X_input: x_v})
                y_val_pred = np.argmax(y_val_pred, 1)
                y_v = np.argmax(y_v, 1)
                acc = accuracy_score(y_v, y_val_pred)
                print('After {} training, validation loss is {}, accuracy is {}'.format(i, loss_val, acc))
        acc_mean = 0
        for i in range(test_data.num_examples // BATCH_SIZE):
            x_t, y_t = test_data.next_batch(BATCH_SIZE)
            x_t = x_t.reshape(-1, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNEL)
            y_test_pred = sess.run(y_inference, feed_dict={X_input: x_t})
            y_test_pred = np.argmax(y_test_pred, 1)
            y_t = np.argmax(y_t, 1)
            acc = accuracy_score(y_t, y_test_pred)
            acc_mean = acc_mean + (acc - acc_mean) / (i+1)  # running mean of the per-batch accuracies
        print('End Accuracy {}'.format(acc_mean))
            
train()
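
One detail worth a second look: the test loop keeps a running mean of the per-batch accuracies instead of storing them all. After batch i+1 it updates

m_{i+1} = m_i + (a_{i+1} - m_i) / (i + 1)

which is algebraically the plain average of the first i+1 batch accuracies: m_1 = a_1, then m_2 = m_1 + (a_2 - m_1)/2 = (a_1 + a_2)/2, and so on.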

The results are as follows.

After 0 training, loss is 3.286005973815918
After 0 training, validation loss is 3.2123682498931885, accuracy is 0.149
After 100 training, loss is 0.1718430370092392
After 100 training, validation loss is 0.17546264827251434, accuracy is 0.939
After 200 training, loss is 0.11053168773651123
After 200 training, validation loss is 0.07477676123380661, accuracy is 0.977
After 300 training, loss is 0.0553237646818161
After 300 training, validation loss is 0.07718825340270996, accuracy is 0.974
After 400 training, loss is 0.062298230826854706
After 400 training, validation loss is 0.0564616397023201, accuracy is 0.981
End Accuracy 0.9853999999999999

From the above we can see that, after enough optimization, values for the convolution kernels in the convolutional layers can be found that minimize the loss for this problem.
