
Optimizing a CNN architecture with Keras in Python 3


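I am trying to raise my CNN's validation accuracy from 76% (currently) to over 90%. All the information about the CNN's performance and configuration is shown below.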

Essentially, I want my CNN to distinguish between two classes of mel spectrograms:

Class #1:

[image: class #1 mel spectrogram]

Class #2:

[image: class #2 mel spectrogram]

Here is the graph of accuracy vs. epoch:

[image: accuracy vs. epoch]

Here is the graph of loss vs. epoch:

[image: loss vs. epoch]

Finally, here is the model architecture configuration:

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',input_shape=(3, 640, 480)))
model.add(Conv2D(64, (3, 3), activation='relu', dim_ordering="th"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

Here are my calls to model.compile() and model.fit():

model.compile(loss=keras.losses.categorical_crossentropy,
          optimizer=keras.optimizers.SGD(lr=0.001),
          metrics=['accuracy'])
print("Compiled model")

history = model.fit(X_train, Y_train,
      batch_size=8,
      epochs=50,
      verbose=1,
      validation_data=(X_test, Y_test))

How can I change my CNN configuration to improve the validation accuracy score?

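Things I have tried:

  • Decreasing the learning rate to prevent sporadic fluctuations in accuracy.

  • Decreasing batch_size from 64 to 8.

  • Increasing the number of epochs to 50 (but I am not sure this is enough).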

Any help would be greatly appreciated!

UPDATE #1: I increased the number of epochs to 200, and after letting the program run overnight I got a validation accuracy score of about 76.31%.

I am posting the plots of accuracy vs. epoch and loss vs. epoch below:

[image: accuracy vs. epoch, 200 epochs]

[image: loss vs. epoch, 200 epochs]

What else, specifically about my model architecture, could I change to get better accuracy?

1 Answer

  • 1

    First, you have to get music_tagger_cnn.py and put it in your project path. After that, you can build your model:

    from music_tagger_cnn import *
    input_tensor = Input(shape=(1, 18, 119))
    model = MusicTaggerCNN(input_tensor=input_tensor, include_top=False, weights='msd')
    

    You can change the input tensor to whatever dimensions you need. I usually use the Theano dim ordering with TensorFlow as the backend, which is why I set:

    from keras import backend as K
    K.set_image_dim_ordering('th')
    

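    With the Theano dim ordering, keep in mind that you have to change the order of the sample dimensions so that the channel axis comes right after the batch axis: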

    X_train = X_train.transpose(0, 3, 2, 1)
    X_val = X_val.transpose(0, 3, 2, 1)
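    To make the effect of the axis permutation concrete, here is a minimal NumPy sketch. It assumes a hypothetical channels-last batch of shape (2, 640, 480, 3); the array and its size are illustrative and not taken from the question. Note that (0, 3, 2, 1) also swaps the two spatial axes, while (0, 3, 1, 2) only moves the channel axis.

```python
import numpy as np

# Hypothetical batch: 2 samples, 640 rows, 480 columns, 3 channels (channels-last).
X = np.zeros((2, 640, 480, 3), dtype=np.float32)

# (0, 3, 1, 2) moves channels to axis 1 and keeps rows/columns in their order.
X_keep = X.transpose(0, 3, 1, 2)

# (0, 3, 2, 1), as in the snippet above, additionally swaps rows and columns.
X_swap = X.transpose(0, 3, 2, 1)

print(X_keep.shape)  # (2, 3, 640, 480)
print(X_swap.shape)  # (2, 3, 480, 640)
```

    Whichever permutation you pick, the resulting shape has to match the shape given to Input(...).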
    

    After that, you have to freeze the layers you do not want to be updated:

    for layer in model.layers: 
         layer.trainable = False
    

    Now you can set up your own output, for example:

    last_layer = model.get_layer('pool3').output
    out = Flatten()(last_layer)
    out = Dense(128, activation='relu', name='fc2')(out)
    out = Dropout(0.5)(out)
    out = Dense(n_classes, activation='softmax', name='fc3')(out)
    model = Model(input=model.input, output=out)
    

    After that, you should be able to train it:

    sgd = SGD(lr=0.01, momentum=0, decay=0.002, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    history = model.fit(X_train, labels_train,
                              validation_data=(X_val, labels_val), nb_epoch=100, batch_size=5)
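    As a rough illustration of what the decay argument does: in the Keras 1.x/2.x SGD optimizer, to my understanding, the learning rate is scaled by 1 / (1 + decay * iterations), where iterations counts batch updates rather than epochs. The helper name decayed_lr below is hypothetical, and the sketch only reproduces that formula in plain Python.

```python
def decayed_lr(base_lr, decay, iteration):
    """Time-based decay: base_lr / (1 + decay * iteration)."""
    return base_lr / (1.0 + decay * iteration)

# Effective learning rate for lr=0.01, decay=0.002 after some batch updates.
for it in (0, 500, 5000):
    print(it, decayed_lr(0.01, 0.002, it))
```

    With batch_size=5 the iteration counter grows quickly, so the effective learning rate drops well below 0.01 within the first epochs on any sizeable dataset.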
    

    Note that the labels should be one-hot encoded.
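    A minimal sketch of one-hot encoding with NumPy, assuming integer class labels starting at 0. The one_hot helper is hypothetical; Keras ships an equivalent utility (to_categorical), which you may prefer.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Turn integer labels of shape (n,) into a (n, n_classes) 0/1 matrix."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, n_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels_train = one_hot([0, 1, 1, 0], n_classes=2)
print(labels_train)
```

    Each row then has a single 1 in the column of its class, which is the format categorical_crossentropy with a softmax output expects.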

    I hope this helps!

    Update: posting the code so that these lines can be debugged and the crash prevented.

    input_tensor = Input(shape=(3, 640, 480))
    model = MusicTaggerCNN(input_tensor=input_tensor, include_top=False, weights='msd')
    
    for layer in model.layers: 
         layer.trainable = False
    
    
    last_layer = model.get_layer('pool3').output
    out = Flatten()(last_layer)
    out = Dense(128, activation='relu', name='fc2')(out)
    out = Dropout(0.5)(out)
    out = Dense(n_classes, activation='softmax', name='fc3')(out)
    model = Model(input=model.input, output=out)
    
    sgd = SGD(lr=0.01, momentum=0, decay=0.002, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    history = model.fit(X_train, labels_train,
                              validation_data=(X_test, Y_test), nb_epoch=100, batch_size=5)
    

    Edit #2

    # -*- coding: utf-8 -*-
    '''MusicTaggerCNN model for Keras.
    
    # Reference:
    
    - [Automatic tagging using deep convolutional neural networks](https://arxiv.org/abs/1606.00298)
    - [Music-auto_tagging-keras](https://github.com/keunwoochoi/music-auto_tagging-keras)
    
    '''
    from __future__ import print_function
    from __future__ import absolute_import
    
    from keras import backend as K
    from keras.layers import Input, Dense, Dropout, Flatten
    from keras.models import Model
    from keras.layers.convolutional import Convolution2D
    from keras.layers.convolutional import MaxPooling2D, ZeroPadding2D
    from keras.layers.normalization import BatchNormalization
    from keras.layers.advanced_activations import ELU
    from keras.utils.data_utils import get_file
    
    TH_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_theano.h5'
    TF_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_tensorflow.h5'
    
    
    def MusicTaggerCNN(weights='msd', input_tensor=None,
                       include_top=True):
        '''Instantiate the MusicTaggerCNN architecture,
        optionally loading weights pre-trained
        on Million Song Dataset. Note that when using TensorFlow,
        for best performance you should set
        `image_dim_ordering="tf"` in your Keras config
        at ~/.keras/keras.json.
    
        The model and the weights are compatible with both
        TensorFlow and Theano. The dimension ordering
        convention used by the model is the one
        specified in your Keras config file.
    
        For preparing mel-spectrogram input, see
        `audio_conv_utils.py` in [applications](https://github.com/fchollet/keras/tree/master/keras/applications).
        You will need to install [Librosa](http://librosa.github.io/librosa/)
        to use it.
    
        # Arguments
            weights: one of `None` (random initialization)
                or "msd" (pre-training on Million Song Dataset).
            input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
                to use as image input for the model.
            include_top: whether to include the 1 fully-connected
                layer (output layer) at the top of the network.
                If False, the network outputs 256-dim features.
    
    
        # Returns
            A Keras model instance.
        '''
        if weights not in {'msd', None}:
            raise ValueError('The `weights` argument should be either '
                             '`None` (random initialization) or `msd` '
                             '(pre-training on Million Song Dataset).')
    
        # Determine proper input shape
        if K.image_dim_ordering() == 'th':
            # channels first: (channels, freq, time)
            input_shape = (3, 640, 480)
        else:
            # channels last: (freq, time, channels)
            input_shape = (640, 480, 3)
    
        if input_tensor is None:
            melgram_input = Input(shape=input_shape)
        else:
            if not K.is_keras_tensor(input_tensor):
                melgram_input = Input(tensor=input_tensor, shape=input_shape)
            else:
                melgram_input = input_tensor
    
        # Determine input axis
        if K.image_dim_ordering() == 'th':
            channel_axis = 1
            freq_axis = 2
            time_axis = 3
        else:
            channel_axis = 3
            freq_axis = 1
            time_axis = 2
    
        # Input block
        x = BatchNormalization(axis=freq_axis, name='bn_0_freq')(melgram_input)
    
        # Conv block 1
        x = Convolution2D(64, 3, 3, border_mode='same', name='conv1')(x)
        x = BatchNormalization(axis=channel_axis, mode=0, name='bn1')(x)
        x = ELU()(x)
        x = MaxPooling2D(pool_size=(2, 4), name='pool1')(x)
    
        # Conv block 2
        x = Convolution2D(128, 3, 3, border_mode='same', name='conv2')(x)
        x = BatchNormalization(axis=channel_axis, mode=0, name='bn2')(x)
        x = ELU()(x)
        x = MaxPooling2D(pool_size=(2, 4), name='pool2')(x)
    
        # Conv block 3
        x = Convolution2D(128, 3, 3, border_mode='same', name='conv3')(x)
        x = BatchNormalization(axis=channel_axis, mode=0, name='bn3')(x)
        x = ELU()(x)
        x = MaxPooling2D(pool_size=(2, 4), name='pool3')(x)
    
    
    
        # Output
        x = Flatten()(x)
        if include_top:
            x = Dense(50, activation='sigmoid', name='output')(x)
    
        # Create model
        model = Model(melgram_input, x)
        if weights is None:
            return model
        else:
            # Load input
            if K.image_dim_ordering() == 'tf':
                raise RuntimeError("Please set image_dim_ordering == 'th'. "
                                   "You can set it at ~/.keras/keras.json")
            model.load_weights('data/music_tagger_cnn_weights_%s.h5' % K._BACKEND,
                               by_name=True)
            return model
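    To see what the 'pool3' output looks like for a (3, 640, 480) input, here is a small shape calculation in plain Python. It assumes channels-first ordering, 'same' padding in the convolutions (so they keep the spatial size), and pooling without padding (so each pooling floors the division). The pooled_size helper is hypothetical.

```python
def pooled_size(size, pool, n_layers):
    """Apply n_layers of non-padded pooling with the given pool width."""
    for _ in range(n_layers):
        size //= pool
    return size

freq_bins = pooled_size(640, 2, 3)  # 640 -> 320 -> 160 -> 80
time_bins = pooled_size(480, 4, 3)  # 480 -> 120 -> 30 -> 7
flat_features = 128 * freq_bins * time_bins  # conv3 has 128 filters

print(freq_bins, time_bins, flat_features)  # 80 7 71680
```

    Under these assumptions, the Flatten() placed after 'pool3' feeds 71680 features into the first Dense layer.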
    

    Edit #3

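    I tried the Keras example that uses MusicTaggerCRNN as a feature extractor for melgrams. Then I trained a simple NN with two dense layers and a binary output. The samples in my example are also for a binary classifier. I used keras==1.2.2 and tensorflow-gpu==1.0.0, and it works for me.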

    Here is the code:

    from keras.applications.music_tagger_crnn import MusicTaggerCRNN
    from keras.applications.music_tagger_crnn import preprocess_input, decode_predictions
    import numpy as np
    from keras.layers import Input, Dense
    from keras.models import Model
    from keras.layers import Dense, Dropout, Flatten
    from keras.optimizers import SGD
    
    
    model = MusicTaggerCRNN(weights='msd', include_top=False)
    #Samples simulation
    audio_paths_train = ['data/genres/blues/blues.00000.au','data/genres/classical/classical.00000.au','data/genres/classical/classical.00002.au', 'data/genres/blues/blues.00003.au']
    audio_paths_test = ['data/genres/blues/blues.00001.au', 'data/genres/classical/classical.00001.au', 'data/genres/blues/blues.00002.au', 'data/genres/classical/classical.00003.au']
    labels_train = [0,1,1,0]
    labels_test = [0, 1, 0, 1]
    melgrams_train = [preprocess_input(audio_path) for audio_path in audio_paths_train]
    melgrams_test = [preprocess_input(audio_path) for audio_path in audio_paths_test]
    feats_train = [model.predict(np.expand_dims(melgram, axis=0)) for melgram in melgrams_train]
    feats_test = [model.predict(np.expand_dims(melgram, axis=0)) for melgram in melgrams_test]
    feats_train = np.array(feats_train)
    feats_test = np.array(feats_test)
    
    _input = Input(shape=(1,32))
    x = Flatten(name='flatten')(_input)
    x = Dense(128, activation='relu', name='fc6')(x)
    x = Dense(64, activation='relu', name='fc7')(x)
    # a single output unit needs sigmoid; softmax over one unit is always 1.0
    x = Dense(1, activation='sigmoid', name='fc8')(x)
    class_model = Model(_input, x)
    
    sgd = SGD(lr=0.01, momentum=0, decay=0.02, nesterov=True)
    class_model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    history = class_model.fit(feats_train, labels_train, validation_data=(feats_test, labels_test), nb_epoch=100, batch_size=5, class_weight='auto')
    print(history.history['acc'])
    
    # Final evaluation of the model
    scores = class_model.evaluate(feats_test, labels_test, verbose=0)
    print("Accuracy: %.2f%%" % (scores[1] * 100))
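    A note on the activation of a one-unit output layer: softmax normalizes across the units of the layer, so with a single unit it returns 1.0 for every input, whereas a sigmoid yields a usable probability. The NumPy sketch below uses made-up logits to show the difference; the softmax and sigmoid helpers are written out here for illustration and are not Keras functions.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three samples, one output unit each (illustrative logits).
logits = np.array([[-3.0], [0.0], [2.5]])

print(softmax(logits).ravel())  # every entry is 1.0
print(sigmoid(logits).ravel())  # values strictly between 0 and 1
```

    For a binary classifier trained with binary_crossentropy, a single sigmoid unit (or two softmax units with one-hot labels) is the usual choice.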
    
