Deep Learning with a Convolutional Neural Network: Classifying MNIST Handwritten Digits (MATLAB)
Hitoshi Arai (Waseda University)
Ver. 1.3, January 12, 2025: text revisions
Ver. 1.2, January 6, 2025: text revisions
Ver. 1.1, January 5, 2025
This is a MATLAB implementation of the program named in the title. It was written on the basis of the Python code in references [1]-[4] (especially [3]) listed under "Program references" at the end; here, however, MATLAB cell arrays are used so that layers can be added freely. In addition, the momentum method is adopted, and dropout can be applied. For educational purposes, the Deep Learning Toolbox is not used.
Detailed tutorials on the convolution Cnv2, Im2Col, Col2Im, max pooling, and unpooling have also been written;
please refer to them as needed.
In what follows, the convolutions are of 'same' type.
Loading the MNIST images
*) TrainImages, TrainLabels, TestImages, and TestLabels were downloaded in advance and saved as a mat file.
combinedData = load('MNIST.mat');
TrainImages = combinedData.TrainImages_file1;
TrainLabels = combinedData.TrainLabels_file2;
TestImages = combinedData.TestImages_file3;
TestLabels = combinedData.TestLabels_file4;
Size of each data array
whos TrainImages
Name Size Bytes Class Attributes
TrainImages 28x28x1x60000 376320000 double
whos TrainLabels
Name Size Bytes Class Attributes
TrainLabels 1x60000 480000 double
whos TestImages
Name Size Bytes Class Attributes
TestImages 28x28x1x10000 62720000 double
whos TestLabels
Name Size Bytes Class Attributes
TestLabels 1x10000 80000 double
With the above preparation, we begin the classification of handwritten digit images by deep learning with a convolutional neural network.
TrainImages = permute(TrainImages,[4 3 1 2]);
Ground-truth data for the training images
% e.g., if the correct answer is 3, change it to the form 0 0 1 0 0 0 0 0 0 0
TrainLabels(TrainLabels == 0) = 10; % relabel the digit 0 as class 10
TrainLabels = permute(TrainLabels,[2 1]);
TrainLabels_vector = zeros(60000,10);
for i = 1:60000
    k = TrainLabels(i);
    TrainLabels_vector(i,k) = 1;
end
TrainLabels_vector(1:3,:)
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 0
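The relabeling and one-hot conversion above can be sketched in NumPy. The labels below are hypothetical examples; as in the script, the digit 0 has already been remapped to class 10, so class indices run from 1 to 10.

```python
import numpy as np

labels = np.array([5, 10, 4])     # classes of the first three rows shown above

one_hot = np.zeros((len(labels), 10))
one_hot[np.arange(len(labels)), labels - 1] = 1   # column k-1 marks class k
```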
Setting up the basic architecture of the convolutional neural network
init_image_size = [1,28,28]; % [number of channels, height, width]
% [1,28,28] corresponds to the MNIST handwritten digit data.
L1 = 2; % number of convolutional layers; because of the image size and pooling, only L1 = 1 or 2 is allowed.
n1 = [32,64]; % number of filters in each convolutional layer (layers 1 to L1)
n2 = [100 200 10]; % number of units in each fully connected layer (layers L1+1 to L1+L2)
Setting the learning rate, momentum parameter, dropout rate, number of epochs, and mini-batch size
% Momentum method. Setting Beta = 0 gives the plain gradient method.
Setting up cells to store the filters, weights, and biases
n1 = [init_image_size(1),n1];
% dimension of the input to the first fully connected layer; the convolutions are of 'same' type
M = fix(init_image_size(2)/2^L1);
L2 = length(n2); % number of fully connected layers
n3 = [n1(end)*M^2, n2]; % unit counts of the fully connected layers, including the flattened convolutional output
Setting = {L1,L2,n1,n3}; % parameters to be read by the functions
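The computation of M can be checked with a little arithmetic. Since the convolutions are 'same' type, only the 2x2 max pooling (stride 2) after each convolutional layer shrinks the spatial size, halving it (with truncation) each time; the value 64 below is the last filter count n1(end) of the configuration above.

```python
L1 = 2
M = 28 // 2**L1        # 28 -> 14 -> 7 after two pooling stages
flat_dim = 64 * M * M  # input dimension of the first fully connected layer
```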
Randomly initializing the convolution filters
FilterSize = [3,3]; % filter size (common to all filters)
FT = FilterSize(1)*FilterSize(2);
for k = 1:L1
    Filters{1,k} = randn(n1(k+1),n1(k),FilterSize(1),FilterSize(2))*sqrt(2/(n1(k)*FT)); % He initialization
    bias_c{1,k} = zeros(1,size(Filters{1,k},1));
end
Randomly initializing the fully connected weights and biases (He's method)
for k = 1:L2
    Weights{1,k} = randn(n3(k+1),n3(k))*sqrt(2/(n3(k)));
    bias_t{1,k} = zeros(1,size(Weights{1,k},1));
end
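He initialization as used here is just zero-mean Gaussian noise scaled by sqrt(2/fan_in). A NumPy sketch for the shapes of the second convolutional layer above (32 input channels, 64 filters of size 3x3):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 32 * 3 * 3                   # n1(k)*FT in the script's notation
W = rng.standard_normal((64, 32, 3, 3)) * np.sqrt(2.0 / fan_in)
```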
Initializing the momentum terms (to zero)
for k = 1:L1
    Moment_c{1,k} = zeros(size(Filters{1,k}));
    biasMoment_c{1,k} = zeros(size(bias_c{1,k}));
end
for k = 1:L2
    Moment_t{1,k} = zeros(size(Weights{1,k}));
    biasMoment_t{1,k} = zeros(size(bias_t{1,k}));
end
Forward pass and backpropagation
The following cells are set up during the computation to accumulate output data.
Convolutional layers
Z_c = cell(1,L1); % data after the affine transform
X_c = cell(1,L1); % data after activation
PI = cell(1,L1); % positions of the maxima in max pooling
D_c = cell(1,L1); % deltas for backpropagation
Fully connected layers
Z_t = cell(1,L2); % data after the affine transform
X_t = cell(1,L2); % data after activation
D_t = cell(1,L2); % deltas for backpropagation
disp(['Epoch = ',num2str(epoch)]);
p = randperm(length(TrainImages));
%p = 1:length(TrainImages); % if the order is not reshuffled
for i = 1:ceil(length(TrainImages)/BatchSize)
    indice = p((i-1)*BatchSize+1: i*BatchSize);
    X0 = TrainImages(indice,1,:,:);
    Y = TrainLabels_vector(indice,:);
    [Z_c, X_c, PI, Z_t, X_t] = Forward(X0,Filters,Weights, bias_c,bias_t,Setting,drop_ratio);
    [D_c,D_t] = Backward(Y,X_c,Z_c,X_t,Z_t,PI,Filters,Weights,Setting);
    [Filters,Weights,bias_c,bias_t,Moment_c,biasMoment_c,Moment_t,biasMoment_t]=Update_Parameters_Momentum3(D_t, ...
        D_c,X_t,X_c,X0,Filters,Weights,bias_c,bias_t,Eta, Beta,Moment_t,Moment_c,biasMoment_t,biasMoment_c,Setting);
    Error = sum(-Y.*log(X_t{1,L2}+1e-5),'all') % display the cross-entropy error
    %Error = sum(abs(-Y+X_t{1,L2}),'all') % display the absolute error
end
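The displayed Error is the cross-entropy summed over the mini-batch, with 1e-5 added inside the logarithm for numerical safety. A small numerical check with a hypothetical batch of two samples:

```python
import numpy as np

Y = np.array([[0., 1., 0.],
              [1., 0., 0.]])          # one-hot targets
P = np.array([[0.1, 0.8, 0.1],
              [0.7, 0.2, 0.1]])       # softmax outputs
err = np.sum(-Y * np.log(P + 1e-5))   # = -log(0.80001) - log(0.70001)
```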
Error = 187.0938
Error = 154.2223
Error = 117.5135
Error = 107.3014
(intermediate mini-batch errors omitted; over the remaining iterations the error decreases, with fluctuations, into the 10-30 range)
Error = 20.3158
Error = 10.1157
Error = 29.8966
Testing the learned network on the test images
NumTestImages=1000; % number of test images
X = TestImages(:,:,1:NumTestImages); % test images
Z = zeros(NumTestImages,1,28,28);
for i = 1:NumTestImages
    Z(i,1,:,:) = X(:,:,i); % rearrange to the same format as the training images
end
TX = Z;
TL = TestLabels(1:NumTestImages); % ground truth for the test images
TL(TL == 0) = 10; % relabel 0 as 10
Computing the accuracy
correct = 0;
X0test = zeros(1,1,28,28);
for i = 1:NumTestImages
    X0test(1,1,:,:)=TX(i,1,:,:);
    [~, ~, ~, ~, X_t] = Forward(X0test,Filters,Weights, bias_c,bias_t,Setting,0); % dropout rate 0 at test time
    [~,guess] = max(X_t{1,L2});
    correct = correct + (guess == TL(i));
end
acc = correct/NumTestImages;
fprintf('Accuracy: %f\n', acc);
Test images: displaying the first 10 images and their predicted values (class 10 corresponds to the digit 0)
X0test = zeros(10,1,28,28);
X0test(1:10,1,:,:)=TX(1:10,1,:,:);
[~, ~, ~, ~, X_t] = Forward(X0test,Filters,Weights, bias_c,bias_t,Setting,0);
TestData = zeros(28,28);
for i = 1:10
    TestData(:,:)=X0test(i,1,:,:);
    imshow(TestData);
    [~,answer] = max(X_t{1,L2}(i,:))
end
answer = 7
answer = 2
answer = 1
answer = 10
answer = 4
answer = 1
answer = 4
answer = 9
answer = 6
answer = 9
Function files required to run this program
Forward: performs the forward computation
function [Z_c, X_c, PI, Z_t, X_t] = Forward(X0,Filters,Weights, bias_c,bias_t,Setting,drop_ratio)
[L1,L2,~,~] = Setting{:};
[X_c{1,1},PI{1,1},Z_c{1,1}] = Conv_Layer(X0,Filters{1,1},bias_c{1,1});
for k = 2:L1
    [X_c{1,k},PI{1,k},Z_c{1,k}] = Conv_Layer(X_c{1,k-1},Filters{1,k},bias_c{1,k});
end
X_c{1,L1} = reshape_1(X_c{1,L1}); % flatten the last convolutional output
[X_t{1,1},Z_t{1,1}] = Total_Layer(X_c{1,L1}, Weights{1,1},bias_t{1,1},0);
for k = 2:L2 % if there are several fully connected layers
    [X_t{1,k},Z_t{1,k}] = Total_Layer(X_t{1,k-1}, Weights{1,k},bias_t{1,k},drop_ratio);
end
X_t{1,L2} = SoftMax(Z_t{1,L2}); % output
Conv_Layer
function [X,PI,Z] = Conv_Layer(X,W,b)
s = fix(size(W,3)/2); % padding width for a 'same' convolution
Z = Cnv2(X,W,b,1,s);
A = (Z>0).*Z; % ReLU
[X,PI] = Pooling(A,[2,2],2,0);
Cnv2 (convolution)
function Z = Cnv2(Image,Filter,Bias,Stride,Padding)
% Stride = 1, Padding = 0 -> 'valid' convolution
% Stride = 1, Padding = 1 -> 'same' convolution
%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Image = Image(Number of Images, Channel, Image_height, Image_width)
% Filter = Filter(Number of Filters, Channel, Filter_height, Filter_width)
% Output: Number of Images x Number of Filters x Output_height x Output_width
% Output_height = fix((Image_height-Filter_height+2*Padding)/Stride)+1
% Output_width = fix((Image_width-Filter_width+2*Padding)/Stride)+1
% Based on the Python programs in references [2], [3], [4] of this note.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[NumImage, ~, Image_height, Image_width] = size(Image);
[NumFilter,Channel,Filter_height,Filter_width] = size(Filter);
Output_height = fix((Image_height -Filter_height +2*Padding)/Stride)+1;
Output_width = fix((Image_width -Filter_width +2*Padding)/Stride)+1;
Image = Im2Col(Image,[Filter_height,Filter_width],Stride,Padding);
%% If Im2Col uses Python-style array ordering:
%Filter = permute(Filter,[4 3 2 1]);
%% If Im2Col uses MATLAB-style array ordering:
Filter = permute(Filter,[3 4 2 1]);
Filter = reshape(Filter,[Channel*Filter_height*Filter_width NumFilter])';
Filter = permute(Filter,[2 1]);
Z = affine_product(Image,Filter,Bias);
Z = reshape(Z,[NumFilter Output_width Output_height NumImage]);
% Z = permute(Z,[4 1 3 2]);
Z = permute(Z,[4 1 2 3]);
if mod(Filter_height,2) == 0 % for an even filter height, trim one row
    Z = Z(:,:,2:Output_height,:);
end
if mod(Filter_width,2) == 0 % for an even filter width, trim one column
    Z = Z(:,:,:,2:Output_width);
end
Affine transform
function C = affine_product(A,B,b)
% affine product of the image A, the filter B, and the bias b
C = A*B + b; % implicit expansion adds the bias row vector to every row
Pooling: max pooling
function [Pooling_Image,Max_positions] = Pooling(Image,block,Stride,Padding)
% input type: Image(Number of Images, Number of Channels, Image_height, Image_width)
% output type: Pooling_Image(Number of Images, Number of Channels, Output_height, Output_width)
% (in this program Pooling is always called with Padding = 0)
block_height = block(1);
block_width = block(2);
Image = padarray(Image,[0 0 Padding Padding],0,'both');
[NumImage, Channels, Image_height, Image_width] = size(Image);
Output_height = fix((Image_height - block_height + 2*Padding)/Stride)+1;
Output_width = fix((Image_width - block_width + 2*Padding)/Stride)+1;
Col = Im2Col(Image,[block_height,block_width],Stride,Padding);
Col = permute(Col,[2 1]);
Col = reshape(Col,[block_height*block_width Channels*NumImage*Output_height*Output_width]);
[Max_values,Max_positions] = max(Col);
Pooling_Image = reshape(Max_values,[Channels Output_height Output_width NumImage]);
Pooling_Image = permute(Pooling_Image,[4 1 3 2]);
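For comparison, 2x2 max pooling with stride 2 can be sketched in NumPy without the im2col detour. The reshape trick below assumes, as here, that the window and stride coincide and divide the image size:

```python
import numpy as np

def max_pool(img, k=2):
    """k x k (stride k) max pooling of an NCHW array via reshape.
    Assumes H and W are divisible by k, as for the 28x28 MNIST images."""
    N, C, H, W = img.shape
    return img.reshape(N, C, H // k, k, W // k, k).max(axis=(3, 5))

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
y = max_pool(x)   # the maxima of the four 2x2 blocks
```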
Total_Layer (option: Dropout)
function [Y,Z] = Total_Layer(X,W,b,drop_ratio)
Z = X*W' + b; % affine transform
Y = (Z>0).*Z; % ReLU
Y = Y.*Dropout(Y,drop_ratio); % drop_ratio = 0 keeps every unit (dropout disabled)
function ym = Dropout(y,ratio)
[m,n] = size(y);
num = round(m*n*(1-ratio)); % number of units to keep
idx = randperm(m*n,num); % randomly chosen positions of the kept units
ym = zeros(m,n);
ym(idx) = 1;
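Dropout here builds a 0/1 mask that keeps exactly round(m*n*(1-ratio)) randomly chosen units (without the 1/(1-ratio) rescaling some formulations use). A NumPy sketch of the same mask:

```python
import numpy as np

def dropout_mask(m, n, ratio, rng):
    """0/1 mask keeping exactly round(m*n*(1-ratio)) randomly chosen units."""
    num = round(m * n * (1 - ratio))          # number of kept units
    mask = np.zeros(m * n)
    mask[rng.permutation(m * n)[:num]] = 1.0  # mark the survivors
    return mask.reshape(m, n)

rng = np.random.default_rng(0)
mask = dropout_mask(4, 5, 0.2, rng)   # keeps 16 of the 20 units
```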
Softmax function
function Y = SoftMax(Y)
Y = exp(Y)./sum(exp(Y),2);
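The direct exp(Y)./sum(exp(Y),2) is fine for the moderate logits arising here; for large logits a numerically stable variant subtracts the row maximum first, which leaves the result unchanged but prevents overflow. A NumPy sketch:

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax, stabilized by subtracting the row maximum."""
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

P = softmax(np.array([[0.0, 0.0], [1000.0, 0.0]]))  # no overflow at 1000
```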
Backward
function [D_c,D_t] = Backward(Y,X_c,Z_c,X_t,Z_t,PI,Filters,Weights,Setting)
[L1,L2,~,~] = Setting{:};
D_t{1,L2} = -Y+X_t{1,L2}; % delta of the output layer
for k = L2-1:-1:1
    D_t{1,k} = (Z_t{1,k}>0).*(D_t{1,k+1}*Weights{1,k+1}); % delta of the k-th fully connected layer
end
D = (X_c{1,L1}>0).*(D_t{1,1}*Weights{1,1}); % delta of convolutional layer L1: convolution part
D_c{1,L1} = Max_unpooling(Z_c{1,L1},PI{L1},D,[2,2],2); % delta of convolutional layer L1: unpooling part
for i = L1-1:-1:1
    s = fix(size(Filters{1,i},3)/2);
    D1 = Delta_ConvLayer(X_c{1,i},D_c{1,i+1},Filters{1,i+1},1,s); % delta of the convolutional layer
    D_c{1,i} = Max_unpooling(Z_c{1,i},PI{i},D1,[2,2],2);
end
Unpooling (the inverse of max pooling)
function X = Max_unpooling(Image_type,PI,Delta,block,Stride)
%[NumImage,Channel_Images,Image_height,Image_width] = size(Image_type);
% Based on the Python program in reference [3] below.
D_flat = reshape_1(Delta);
col_D = zeros(numel(D_flat),block(1)*block(2));
for k = 1:numel(D_flat)
    col_D(k,PI(k))=D_flat(k); % route each delta back to the position of the maximum
end
X=Col2Im(col_D,Image_type,block,Stride,0);
Definition of reshape_1 used in Max_unpooling
function Y = reshape_1(X)
N = size(X,1); % number of images (the first dimension)
m2 = permute(X,ndims(X):-1:1); % ndims(X) is the number of dimensions of X
m3 = reshape(m2,numel(X)/N,N); % numel(X) is the number of elements of X
Y = permute(m3,ndims(m3):-1:1);
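reshape_1 flattens each image in row-major (C) order: MATLAB's reshape is column-major, which is why the dimensions are reversed before and after. In NumPy the same operation is simply a C-order reshape:

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)   # e.g. 2 images of size 3x4
Y = X.reshape(2, -1)                 # one row-major flattened row per image
```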
Delta_ConvLayer
function dcx=Delta_ConvLayer(Output_shape,D,W,Stride,Padding)
% Based on the Python program in reference [3] below.
% D1 = Delta_ConvLayer(X1,D2,W2,1,s); % usage in this program
[Num_D,Channel_D,D_height,D_width]=size(D);
[Num_Filter,Channel_Filter,Filter_height,Filter_width]=size(W);
D = permute(D,[2 3 4 1]); % [Channel_D D_height D_width Num_D]
D = reshape(D,[Channel_D Num_D*D_height*D_width]);
D = permute(D,[2 1]); % [Num_D*D_height*D_width Channel_D]
W = permute(W,[4 3 2 1]); % [Filter_width Filter_height Channel_Filter Num_Filter]
W = reshape(W,[Channel_Filter*Filter_height*Filter_width Num_Filter]);
W = permute(W,[2 1]); % [Num_Filter Channel_Filter*Filter_height*Filter_width]
col_D = D*W; % [Num_D*D_height*D_width Channel_Filter*Filter_height*Filter_width]
dcx = Col2Im(col_D,Output_shape,[Filter_height,Filter_width],Stride,Padding);
dcx = (Output_shape>0).*dcx; % backpropagation through ReLU
Update_Parameters_Momentum3
function [Filters,Weights,bias_c,bias_t,Moment_c,biasMoment_c,Moment_t,biasMoment_t]=Update_Parameters_Momentum3(D_t, ...
    D_c,X_t,X_c,X0,Filters, Weights,bias_c,bias_t,Eta, Beta,Moment_t,Moment_c,biasMoment_t,biasMoment_c,Setting)
[L1,L2,~,~] = Setting{:};
for k = 2:L2 % fully connected layers other than the first
    Grad_weight = D_t{1,k}'*X_t{1,k-1};
    Moment_t{1,k} = Eta*Grad_weight - Beta*Moment_t{1,k}; % the left-hand side is called "Moment", though the name is not ideal; likewise below.
    Weights{1,k} = Weights{1,k} - Moment_t{1,k};
    Grad_bias_t = sum(D_t{1,k},1);
    biasMoment_t{1,k} = Eta*Grad_bias_t - Beta*biasMoment_t{1,k};
    bias_t{1,k} = bias_t{1,k} - biasMoment_t{1,k};
end
Moment_t{1,1} = Eta*(D_t{1,1}'*X_c{1,L1}) - Beta*Moment_t{1,1}; % first fully connected layer
Weights{1,1} = Weights{1,1} - Moment_t{1,1};
D_bias = sum(D_t{1,1},1);
biasMoment_t{1,1} = Eta*D_bias - Beta*biasMoment_t{1,1};
bias_t{1,1} = bias_t{1,1} - biasMoment_t{1,1};
for k = 2:L1 % convolutional layers other than the first
    Moment_c{1,k} = Eta*Grad_filter(X_c{1,k-1},D_c{1,k},Filters{1,k})-Beta*Moment_c{1,k};
    Filters{1,k} = Filters{1,k} - Moment_c{1,k};
    biasMoment_c{1,k} = Eta*Grad_bias_c(D_c{1,k})-Beta*biasMoment_c{1,k};
    bias_c{1,k} = bias_c{1,k} - biasMoment_c{1,k};
end
Moment_c{1,1} = Eta*Grad_filter(X0,D_c{1,1},Filters{1,1})-Beta*Moment_c{1,1}; % first convolutional layer
Filters{1,1} = Filters{1,1} - Moment_c{1,1};
biasMoment_c{1,1} = Eta*Grad_bias_c(D_c{1,1})-Beta*biasMoment_c{1,1};
bias_c{1,1} = bias_c{1,1} - biasMoment_c{1,1};
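The update rule applied throughout Update_Parameters_Momentum3 can be sketched for a single parameter array (the scalar values below are hypothetical):

```python
import numpy as np

def momentum_step(W, grad, v, eta, beta):
    """One update with the script's sign convention:
    v <- eta*grad - beta*v, then W <- W - v.
    With beta = 0 this reduces to plain gradient descent W <- W - eta*grad."""
    v = eta * grad - beta * v
    return W - v, v

W = np.array([1.0])
v = np.zeros(1)
W, v = momentum_step(W, np.array([0.5]), v, eta=0.1, beta=0.0)
```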
Grad_filter
function COL = Grad_filter(X,D,F)
% Based on the Python program in reference [3] below.
[NumFilter,ChannelFilter,Filter_height,Filter_width] = size(F);
[NumDelta,ChannelDelta,Delta_height,Delta_width] = size(D);
D = permute(D,[4 3 1 2]);
D = reshape(D,[NumDelta*Delta_height*Delta_width ChannelDelta]);
s = fix(Filter_height/2); % padding width of the 'same' convolution
Col_X = Im2Col(X,[Filter_height,Filter_width],1,s);
COL = D'*Col_X; % gradient with respect to the filters, in column form
COL = permute(COL,[2 1]);
COL = reshape(COL,[Filter_width Filter_height ChannelFilter NumFilter]);
COL = permute(COL, [4 3 2 1]);
Grad_bias_c
function B = Grad_bias_c(D)
% Based on the Python program in reference [3] below.
[NumDelta,ChannelDelta,Delta_height,Delta_width] = size(D);
D = permute(D,[4 3 1 2]);
D = reshape(D,[NumDelta*Delta_height*Delta_width ChannelDelta]);
B = sum(D,1); % sum the deltas over the batch and spatial positions
Im2Col
function Col = Im2Col(Image,block,Stride,Padding)
%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Image = Image(Number of Images, Channel, Image_height, Image_width)
% block = [number1, number2]
% Stride = number, Padding = number
% Output: (NumImage*Output_height*Output_width) x (Channel*Block_height*Block_width)
% Output_height = fix((Image_height-Block_height+2*Padding)/Stride)+1
% Output_width = fix((Image_width-Block_width+2*Padding)/Stride)+1
% Based on the Python programs in references [1], [2], [3], [4].
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Block_height = block(1);
Block_width = block(2);
[NumImage, Channel, Image_height, Image_width] = size(Image);
Output_height = fix((Image_height-Block_height+2*Padding)/Stride)+1;
Output_width = fix((Image_width-Block_width+2*Padding)/Stride)+1;
col = zeros(NumImage,Block_height*Block_width,Channel,Output_height,Output_width);
Image = padarray(Image,[0 0 Padding Padding],0,'both');
WS = Stride*(Output_height-1)+1; % extent of the first index range
HS = Stride*(Output_width-1)+1; % extent of the second index range (for the square images used here, HS = WS)
% To obtain the MATLAB-style im2col array ordering:
for h = 1:Block_height
    for w = 1:Block_width
        col(:,(h-1)*Block_width+w,:,:,:)=Image(:,:,w:Stride:w-1+WS ,h:Stride:h-1+HS);
    end
end
% To follow the Python versions in the references instead:
% col(:,(h-1)*Block_width+w,:,:,:)=Image(:,:, h:Stride:h-1+HS, w:Stride:w-1+WS);
% MATLAB-style im2col ordering:
Col = permute(col,[2 3 4 5 1]);
% To follow the Python versions in the references instead:
% Col = permute(col,[2 3 5 4 1]);
Col = reshape(Col, [Channel*Block_height*Block_width NumImage*Output_height*Output_width ]);
Col = permute(Col,[2 1]);
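A compact NumPy sketch of the same patch-extraction idea (without reproducing the MATLAB-specific row and column orderings discussed above):

```python
import numpy as np

def im2col(img, bh, bw, stride=1, pad=0):
    """im2col for an NCHW batch: one flattened bh x bw patch per row,
    giving an (N*out_h*out_w) x (C*bh*bw) matrix."""
    img = np.pad(img, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    N, C, H, W = img.shape
    out_h = (H - bh) // stride + 1
    out_w = (W - bw) // stride + 1
    cols = np.zeros((N, out_h, out_w, C, bh, bw))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i, j] = img[:, :, i*stride:i*stride+bh, j*stride:j*stride+bw]
    return cols.reshape(N * out_h * out_w, C * bh * bw)

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
cols = im2col(x, 2, 2, stride=2)   # four non-overlapping 2x2 patches
```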
Col2Im
function Img = Col2Im(Col, Image_type, Block_size,Stride,Padding)
%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Assumes that, for Col = Im2Col(Image,block,Stride,Padding),
% Img = Col2Im(Col,Image_type,block,Stride,Padding) is taken.
% [NumImage, Channel, Image_height, Image_width] = size(Image)
% Image_type has the form [NumImage, Channel, Image_height, Image_width]
% block is the block size used to partition the data; it corresponds to the filter size of the convolution.
% Stride is the shift width; it corresponds to the stride of the convolution.
% Padding is the zero-padding width of the image; it determines the output size of the convolution.
% [NumFilter,Channel,Image_height,Image_width] = size(Image_type);
% Based on the Python programs in [2], [3], [4].
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Block_height = Block_size(1);
Block_width = Block_size(2);
[NumFilter,Channel,Image_height,Image_width] = size(Image_type);
Output_height = fix((Image_height - Block_height + 2*Padding)/Stride)+1;
Output_width = fix((Image_width - Block_width + 2*Padding)/Stride)+1;
Col = permute(Col, [2 1]);
Col = reshape(Col,[Block_height*Block_width Channel Output_width Output_height NumFilter]);
Col = permute(Col,[5 1 2 4 3]); % [NumFilter Block_height*Block_width Channel Output_height Output_width]
im_scheme = zeros(NumFilter,Channel,Image_height+2*Padding+Stride-1, ...
    Image_width+2*Padding+Stride-1);
coll = zeros(NumFilter,Channel,Output_height,Output_width);
for h = 1:Block_height
    for w = 1:Block_width
        coll(:,:,:,:) = Col(:,(h-1)*Block_width+w,:,:,:);
        im_scheme(:,:, h:Stride:(h-1)+Output_height*Stride, w:Stride:(w-1)+Output_width*Stride) = im_scheme(:,:, ...
            h:Stride:(h-1)+Output_height*Stride, w:Stride:(w-1)+Output_width*Stride)+coll(:,:,:,:);
    end
end
Img = im_scheme(:,:,Padding+1:Image_height+Padding,Padding+1:Image_width+Padding);
Program references
[1] https://docs.chainer.org/en/v7.8.1.post1/reference/generated/chainer.functions.im2col.html
[2] 斎藤康毅,ゼロから作る Deep Learning,O'Reilly, 2016.
[3] 立石賢吾,やさしく学ぶディープラーニングがわかる数学のきほん,マイナビ,2019.
[4] 我妻幸長,はじめてのディープラーニング - Python で学ぶニューラルネットワークとバックプロパゲーション,SB Creative, 2018.
Copyright © Hitoshi Arai, 2025