除多年編程經(jīng)驗之外，還有什么能區(qū)分一個程序員是“老手”還是“新手”？編程技巧當然是一部分，但它絕非是全部。
聰明的程序員可能比他們的同行擁有更出眾的編程技巧，但那不足以說明他們就是“老手”。同樣，僅僅因為擁有10年編程經(jīng)驗也并不意味著他們就是高手。在工作崗位上，擁有多年編程經(jīng)驗也不能說明問題。即便沒被炒魷魚，那也不能提升你的價值。

下面列舉的事情是大多數(shù)高級程序員都會做的。

1．至少掌握一門編程語言

我相信有些優(yōu)秀的程序員只懂（并精通）一門編程語言，但在某種程度上而言，這其實會限制一個人的思維。就像當你手拿一把錘子時，任何東西看起來都像釘子。我認為，知道并成功使用至少一門編程語言，這是程序員從新手走向老手的重要一步。我要說的是，像JavaScript和SQL這樣的輔助編程語言，只有當你確實已經(jīng)開發(fā)了完整的應用程序，并在其中使用這些編程語言時，它們才有價值。

2．工作之余也經(jīng)常編程

我抱怨過把開源作為招賢的一項要求，但那僅僅因為許多充滿激情的程序員把時間花在別的地方。除了對開源有所貢獻，你還可以做兼職顧問，兼職創(chuàng)業(yè)，開發(fā)自己的產(chǎn)品或者創(chuàng)辦自己的微型軟件公司。當然，你也可以嘗試從外部接些兼職項目，可參考伯樂在線的這篇《成功接項目需要注意的幾個要點》。

注：mISV即MicroISV,是一個只有一名員工組成的軟件公司，是一種微型公司。

3．經(jīng)歷完整的軟件開發(fā)過程，從概念設計到產(chǎn)品實現(xiàn)，再到產(chǎn)品維護

有的程序員希望不用自己動手就可以得到詳細的設計說明，然后把缺陷代碼交給測試/維護小組，這是平庸程序員的一個縮影。任何稱職的程序員都會跟客戶密切合作，去制定需求分析，然后編碼實現(xiàn)，當然也要維護。如果你在編碼實現(xiàn)階段偷懶了，那你在維護階段不得不付出代價。

4．不斷創(chuàng)新

創(chuàng)新就是做一些你身邊的人沒有做過的事情，用來改善你的過程或產(chǎn)品。你不一定非得是世界上第一個做這件事的人，只要發(fā)現(xiàn)一個問題，找到解決方法然后實現(xiàn)它就行。

5．編寫的軟件能解決實際問題

有一副虛構的場景：一名黑客，僅僅是出于對技術以及自己所做事情的愛，一天到晚都在編寫代碼。但這幾乎無助于成就一名優(yōu)秀的開發(fā)者。事實上，我曾見過有些開發(fā)人員和客戶爭論，來采用更好但不太有助客戶的技術。這會適得其反。你可以利用自己的時間來完善。但涉及工作時，你最好還是編寫能實際改進并解決問題的代碼，而不是使用那些不同尋常的算法或接口。

這些問題對于任何想成為高級開發(fā)人員的朋友來說，都合情合理。因為這些問題和擁有多少年編程經(jīng)驗并沒有關聯(lián)。如果你能做到上面4-5條，那你就是高級程序員。

posted @ 2013-12-05 10:30 杰哥閱讀(537) | 評論 (0) | 編輯收藏

[zz]【面向代碼】學習 Deep Learning（一）Neural Network

http://blog.csdn.net/dark_scope/article/details/9421061

==========================================================================================

最近一直在看Deep Learning，各類博客、論文看得不少

但是說實話，這樣做有些疏于實現(xiàn)，一來呢自己的電腦也不是很好，二來呢我目前也沒能力自己去寫一個toolbox

只是跟著Andrew Ng的UFLDL tutorial 寫了些已有框架的代碼(這部分的代碼見github)

后來發(fā)現(xiàn)了一個matlab的Deep Learning的toolbox，發(fā)現(xiàn)其代碼很簡單，感覺比較適合用來學習算法

再一個就是matlab的實現(xiàn)可以省略掉很多數(shù)據(jù)結構的代碼，使算法思路非常清晰

所以我想在解讀這個toolbox的代碼的同時來鞏固自己學到的，同時也為下一步的實踐打好基礎

(本文只是從代碼的角度解讀算法，具體的算法理論步驟還是需要去看paper的

我會在文中給出一些相關的paper的名字，本文旨在梳理一下算法過程，不會深究算法原理和公式)

==========================================================================================

使用的代碼：DeepLearnToolbox ，下載地址：點擊打開，感謝該toolbox的作者

==========================================================================================

第一章從分析NN(neural network)開始，因為這是整個deep learning的大框架，參見UFLDL

==========================================================================================

首先看一下\tests\test_example_NN.m ，跳過對數(shù)據(jù)進行normalize的部分，最關鍵的就是：

（為了注釋顯示有顏色，我把matlab代碼中的%都改成了//）

[cpp] view plain copy print ?

nn = nnsetup([784 100 10]);
opts.numepochs = 1; // Number of full sweeps through data
opts.batchsize = 100; // Take a mean gradient step over this many samples
[nn, L] = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);

nn = nnsetup([784 100 10]); opts.numepochs =  1;   //  Number of full sweeps through data opts.batchsize = 100;  //  Take a mean gradient step over this many samples [nn, L] = nntrain(nn, train_x, train_y, opts); [er, bad] = nntest(nn, test_x, test_y);

很簡單的幾步就訓練了一個NN，我們發(fā)現(xiàn)其中最重要的幾個函數(shù)就是nnsetup,nntrain和nntest了

那么我們分別來分析著幾個函數(shù)，\NN\nnsetup.m

nnsetup

[cpp] view plain copy print ?

function nn = nnsetup(architecture)
//首先從傳入的architecture中獲得這個網(wǎng)絡的整體結構，nn.n表示這個網(wǎng)絡有多少層，可以參照上面的樣例調(diào)用nnsetup([784 100 10])加以理解
nn.size = architecture;
nn.n = numel(nn.size);
//接下來是一大堆的參數(shù)，這個我們到具體用的時候再加以說明
nn.activation_function = 'tanh_opt'; // Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
nn.learningRate = 2; // learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
nn.momentum = 0.5; // Momentum
nn.scaling_learningRate = 1; // Scaling factor for the learning rate (each epoch)
nn.weightPenaltyL2 = 0; // L2 regularization
nn.nonSparsityPenalty = 0; // Non sparsity penalty
nn.sparsityTarget = 0.05; // Sparsity target
nn.inputZeroMaskedFraction = 0; // Used for Denoising AutoEncoders
nn.dropoutFraction = 0; // Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
nn.testing = 0; // Internal variable. nntest sets this to one.
nn.output = 'sigm'; // output unit 'sigm' (=logistic), 'softmax' and 'linear'
//對每一層的網(wǎng)絡結構進行初始化，一共三個參數(shù)W,vW，p，其中W是主要的參數(shù)
//vW是更新參數(shù)時的臨時參數(shù)，p是所謂的sparsity，(等看到代碼了再細講)
for i = 2 : nn.n
// weights and weight momentum
nn.W{i - 1} = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
nn.vW{i - 1} = zeros(size(nn.W{i - 1}));
// average activations (for use with sparsity)
nn.p{i} = zeros(1, nn.size(i));
end
end

function nn = nnsetup(architecture) //首先從傳入的architecture中獲得這個網(wǎng)絡的整體結構，nn.n表示這個網(wǎng)絡有多少層，可以參照上面的樣例調(diào)用nnsetup([784 100 10])加以理解      nn.size   = architecture;     nn.n      = numel(nn.size);     //接下來是一大堆的參數(shù)，這個我們到具體用的時候再加以說明     nn.activation_function              = 'tanh_opt';   //  Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).     nn.learningRate                     = 2;            //  learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.     nn.momentum                         = 0.5;          //  Momentum     nn.scaling_learningRate             = 1;            //  Scaling factor for the learning rate (each epoch)     nn.weightPenaltyL2                  = 0;            //  L2 regularization     nn.nonSparsityPenalty               = 0;            //  Non sparsity penalty     nn.sparsityTarget                   = 0.05;         //  Sparsity target     nn.inputZeroMaskedFraction          = 0;            //  Used for Denoising AutoEncoders     nn.dropoutFraction                  = 0;            //  Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)     nn.testing                          = 0;            //  Internal variable. nntest sets this to one.     nn.output                           = 'sigm';       //  output unit 'sigm' (=logistic), 'softmax' and 'linear'     //對每一層的網(wǎng)絡結構進行初始化，一共三個參數(shù)W,vW，p，其中W是主要的參數(shù)       //vW是更新參數(shù)時的臨時參數(shù)，p是所謂的sparsity，(等看到代碼了再細講)        for i = 2 : nn.n            // weights and weight momentum         nn.W{i - 1} = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));         nn.vW{i - 1} = zeros(size(nn.W{i - 1}));                  // average activations (for use with sparsity)         nn.p{i}     = zeros(1, nn.size(i));        end end

nntrain

setup大概就這樣一個過程，下面就到了train了，打開\NN\nntrain.m

我們跳過那些檢驗傳入數(shù)據(jù)是否正確的代碼，直接到關鍵的部分

denoising 的部分請參考論文：Extracting and Composing Robust Features with Denoising Autoencoders

[cpp] view plain copy print ?

m = size(train_x, 1);
//m是訓練樣本的數(shù)量
//注意在調(diào)用的時候我們設置了opt，batchsize是做batch gradient時候的大小
batchsize = opts.batchsize; numepochs = opts.numepochs;
numbatches = m / batchsize; //計算batch的數(shù)量
assert(rem(numbatches, 1) == 0, 'numbatches must be a integer');
L = zeros(numepochs*numbatches,1);
n = 1;
//numepochs是循環(huán)的次數(shù)
for i = 1 : numepochs
tic;
kk = randperm(m);
//把batches打亂順序進行訓練，randperm(m)生成一個亂序的1到m的數(shù)組
for l = 1 : numbatches
batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);
//Add noise to input (for use in denoising autoencoder)
//加入noise，這是denoising autoencoder需要使用到的部分
//這部分請參見《Extracting and Composing Robust Features with Denoising Autoencoders》這篇論文
//具體加入的方法就是把訓練樣例中的一些數(shù)據(jù)調(diào)整變?yōu)?，inputZeroMaskedFraction表示了調(diào)整的比例
if(nn.inputZeroMaskedFraction ~= 0)
batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
end
batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);
//這三個函數(shù)
//nnff是進行前向傳播，nnbp是后向傳播，nnapplygrads是進行梯度下降
//我們在下面分析這些函數(shù)的代碼
nn = nnff(nn, batch_x, batch_y);
nn = nnbp(nn);
nn = nnapplygrads(nn);
L(n) = nn.L;
n = n + 1;
end
t = toc;
if ishandle(fhandle)
if opts.validation == 1
loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
else
loss = nneval(nn, loss, train_x, train_y);
end
nnupdatefigures(nn, fhandle, loss, opts, i);
end
disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1))))]);
nn.learningRate = nn.learningRate * nn.scaling_learningRate;
end

m = size(train_x, 1); //m是訓練樣本的數(shù)量 //注意在調(diào)用的時候我們設置了opt，batchsize是做batch gradient時候的大小 batchsize = opts.batchsize; numepochs = opts.numepochs; numbatches = m / batchsize;  //計算batch的數(shù)量 assert(rem(numbatches, 1) == 0, 'numbatches must be a integer'); L = zeros(numepochs*numbatches,1); n = 1; //numepochs是循環(huán)的次數(shù) for i = 1 : numepochs     tic;     kk = randperm(m);     //把batches打亂順序進行訓練，randperm(m)生成一個亂序的1到m的數(shù)組     for l = 1 : numbatches         batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);         //Add noise to input (for use in denoising autoencoder)         //加入noise，這是denoising autoencoder需要使用到的部分         //這部分請參見《Extracting and Composing Robust Features with Denoising Autoencoders》這篇論文         //具體加入的方法就是把訓練樣例中的一些數(shù)據(jù)調(diào)整變?yōu)?，inputZeroMaskedFraction表示了調(diào)整的比例         if(nn.inputZeroMaskedFraction ~= 0)             batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);         end         batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);         //這三個函數(shù)         //nnff是進行前向傳播，nnbp是后向傳播，nnapplygrads是進行梯度下降         //我們在下面分析這些函數(shù)的代碼         nn = nnff(nn, batch_x, batch_y);         nn = nnbp(nn);         nn = nnapplygrads(nn);         L(n) = nn.L;         n = n + 1;     end          t = toc;     if ishandle(fhandle)         if opts.validation == 1             loss = nneval(nn, loss, train_x, train_y, val_x, val_y);         else             loss = nneval(nn, loss, train_x, train_y);         end         nnupdatefigures(nn, fhandle, loss, opts, i);     end              disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1))))]);     nn.learningRate = nn.learningRate * nn.scaling_learningRate; end

下面分析三個函數(shù)nnff,nnbp和nnapplygrads

nnff

nnff就是進行feedforward pass，其實非常簡單，就是整個網(wǎng)絡正向跑一次就可以了

當然其中有dropout和sparsity的計算

具體的參見論文“Improving Neural Networks with Dropout“和Autoencoders and Sparsity

[cpp] view plain copy print ?

function nn = nnff(nn, x, y)
//NNFF performs a feedforward pass
// nn = nnff(nn, x, y) returns an neural network structure with updated
// layer activations, error and loss (nn.a, nn.e and nn.L)
n = nn.n;
m = size(x, 1);
x = [ones(m,1) x];
nn.a{1} = x;
//feedforward pass
for i = 2 : n-1
//根據(jù)選擇的激活函數(shù)不同進行正向傳播計算
//你可以回過頭去看nnsetup里面的第一個參數(shù)activation_function
//sigm就是sigmoid函數(shù),tanh_opt就是tanh的函數(shù)，這個toolbox好像有一點改變
//tanh_opt是1.7159*tanh(2/3.*A)
switch nn.activation_function
case 'sigm'
// Calculate the unit's outputs (including the bias term)
nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
case 'tanh_opt'
nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
end
//dropout的計算部分部分 dropoutFraction 是nnsetup中可以設置的一個參數(shù)
if(nn.dropoutFraction > 0)
if(nn.testing)
nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
else
nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
end
end
//計算sparsity，nonSparsityPenalty 是對沒達到sparsitytarget的參數(shù)的懲罰系數(shù)
//calculate running exponential activations for use with sparsity
if(nn.nonSparsityPenalty>0)
nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);
end
//Add the bias term
nn.a{i} = [ones(m,1) nn.a{i}];
end
switch nn.output
case 'sigm'
nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
case 'linear'
nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
case 'softmax'
nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));
nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));
end
//error and loss
//計算error
nn.e = y - nn.a{n};
switch nn.output
case {'sigm', 'linear'}
nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;
case 'softmax'
nn.L = -sum(sum(y .* log(nn.a{n}))) / m;
end
end

function nn = nnff(nn, x, y) //NNFF performs a feedforward pass // nn = nnff(nn, x, y) returns an neural network structure with updated // layer activations, error and loss (nn.a, nn.e and nn.L)      n = nn.n;     m = size(x, 1);          x = [ones(m,1) x];     nn.a{1} = x;      //feedforward pass     for i = 2 : n-1         //根據(jù)選擇的激活函數(shù)不同進行正向傳播計算         //你可以回過頭去看nnsetup里面的第一個參數(shù)activation_function         //sigm就是sigmoid函數(shù),tanh_opt就是tanh的函數(shù)，這個toolbox好像有一點改變         //tanh_opt是1.7159*tanh(2/3.*A)         switch nn.activation_function              case 'sigm'                 // Calculate the unit's outputs (including the bias term)                 nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');             case 'tanh_opt'                 nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');         end                  //dropout的計算部分部分 dropoutFraction 是nnsetup中可以設置的一個參數(shù)         if(nn.dropoutFraction > 0)             if(nn.testing)                 nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);             else                 nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);                 nn.a{i} = nn.a{i}.*nn.dropOutMask{i};             end         end         //計算sparsity，nonSparsityPenalty 是對沒達到sparsitytarget的參數(shù)的懲罰系數(shù)         //calculate running exponential activations for use with sparsity         if(nn.nonSparsityPenalty>0)             nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);         end                  //Add the bias term         nn.a{i} = [ones(m,1) nn.a{i}];     end     switch nn.output          case 'sigm'             nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');         case 'linear'             nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';         case 'softmax'             nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';             nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));             nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));      end     //error and loss 	//計算error     nn.e = y - nn.a{n};          switch nn.output         case {'sigm', 'linear'}             nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;          case 'softmax'             nn.L = -sum(sum(y .* log(nn.a{n}))) / m;     end end

nnbp

代碼：\NN\nnbp.m

nnbp呢是進行back propagation的過程，過程還是比較中規(guī)中矩，和ufldl中的Neural Network講的基本一致

值得注意的還是dropout和sparsity的部分

[cpp] view plain copy print ?

if(nn.nonSparsityPenalty>0)
pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];
end
// Backpropagate first derivatives
if i+1==n % in this case in d{n} there is not the bias term to be removed
d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; // Bishop (5.56)
else // in this case in d{i} the bias term has to be removed
d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;
end
if(nn.dropoutFraction>0)
d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
end

        if(nn.nonSparsityPenalty>0)             pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);             sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];         end                  // Backpropagate first derivatives         if i+1==n % in this case in d{n} there is not the bias term to be removed                          d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; // Bishop (5.56)         else // in this case in d{i} the bias term has to be removed             d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;         end                  if(nn.dropoutFraction>0)             d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];         end

這只是實現(xiàn)的內(nèi)容，代碼中的d{i}就是這一層的delta值，在ufldl中有講的

dW{i}基本就是計算的gradient了，只是后面還要加入一些東西，進行一些修改

具體原理參見論文“Improving Neural Networks with Dropout“ 以及 Autoencoders and Sparsity的內(nèi)容

nnapplygrads

代碼文件：\NN\nnapplygrads.m

[cpp] view plain copy print ?

for i = 1 : (nn.n - 1)
if(nn.weightPenaltyL2>0)
dW = nn.dW{i} + nn.weightPenaltyL2 * nn.W{i};
else
dW = nn.dW{i};
end
dW = nn.learningRate * dW;
if(nn.momentum>0)
nn.vW{i} = nn.momentum*nn.vW{i} + dW;
dW = nn.vW{i};
end
nn.W{i} = nn.W{i} - dW;
end

    for i = 1 : (nn.n - 1)         if(nn.weightPenaltyL2>0)             dW = nn.dW{i} + nn.weightPenaltyL2 * nn.W{i};         else             dW = nn.dW{i};         end                  dW = nn.learningRate * dW;                  if(nn.momentum>0)             nn.vW{i} = nn.momentum*nn.vW{i} + dW;             dW = nn.vW{i};         end                      nn.W{i} = nn.W{i} - dW;     end

這個內(nèi)容就簡單了，nn.weightPenaltyL2 是weight decay的部分，也是nnsetup時可以設置的一個參數(shù)

有的話就加入weight Penalty，防止過擬合，然后再根據(jù)momentum的大小調(diào)整一下，最后改變nn.W{i}即可

nntest

nntest再簡單不過了，就是調(diào)用一下nnpredict，在和test的集合進行比較

[cpp] view plain copy print ?

function [er, bad] = nntest(nn, x, y)
labels = nnpredict(nn, x);
[~, expected] = max(y,[],2);
bad = find(labels ~= expected);
er = numel(bad) / size(x, 1);
end

function [er, bad] = nntest(nn, x, y)     labels = nnpredict(nn, x);     [~, expected] = max(y,[],2);     bad = find(labels ~= expected);         er = numel(bad) / size(x, 1); end

nnpredict

代碼文件：\NN\nnpredict.m

[cpp] view plain copy print ?

function labels = nnpredict(nn, x)
nn.testing = 1;
nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));
nn.testing = 0;
[~, i] = max(nn.a{end},[],2);
labels = i;
end

function labels = nnpredict(nn, x)     nn.testing = 1;     nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));     nn.testing = 0;          [~, i] = max(nn.a{end},[],2);     labels = i; end

繼續(xù)非常簡單，predict不過是nnff一次，得到最后的output~~

max(nn.a{end},[],2); 是返回每一行的最大值以及所在的列數(shù)，所以labels返回的就是標號啦

(這個test好像是專門用來test 分類問題的，我們知道nnff得到最后的值即可)

總結

總的來說，神經(jīng)網(wǎng)絡的代碼比較常規(guī)易理解，基本上和UFLDL中的內(nèi)容相差不大

只是加入了dropout的部分和denoising的部分

本文的目的也不奢望講清楚這些東西，只是給出一個路線，可以跟著代碼去學習，加深對算法的理解和應用能力

posted @ 2013-11-27 20:10 杰哥閱讀(1075) | 評論 (0) | 編輯收藏

戴爾臺式機系列問題：每次開機要按F1

每次開機要按F1，曾經(jīng)網(wǎng)上搜索過沒找到原因。因為CPU風扇響，今天20131122戴爾客服上門服務，講是因為我沒有使用獨立顯卡的問題。如果顯示器加上白色的轉(zhuǎn)接頭，使用獨立顯卡就沒有問題。原來一直使用的是集成顯卡。這樣要耗內(nèi)存。
按F8進入不了安全模式，20131202電話戴爾客服，只解決硬件問題，軟件問題不負責上門解決。

posted @ 2013-11-22 11:47 杰哥閱讀(502) | 評論 (0) | 編輯收藏

[zz]Matlab圖形中顯示數(shù)學公式

http://blog.sina.com.cn/s/blog_401aa0540100ufl1.html

Matlab 繪圖時，可以通過設置 "interpreter" 屬性顯示數(shù)學公式。

E.g.

xlabel('$n$','interpreter','latex');

ylabel('${\gamma}$','interpreter','latex');

title('${\gamma}(n)$','interpreter','latex');

text(0.5,0.5, '$\int_{a}^{b}f(x)dx$', 'interpreter','latex');

例1：我的程序Flicker.m (所有字體都是Latex)

h = legend('ours','$l_p$-norm MKL',1);

set(h,'Interpreter','latex');

例2：我的程序AR_Sparse.m (僅僅公式字體是latex)

h = legend('l1','DLSR-FS',4);

h1 = findobj(get(h,'Children'),'String','l1');

set(h1,'String','$l_1$','Interpreter','LaTex');

---------------------------如何在matlab中的xlabel，ylabel，legend和text函數(shù)中使用latex(以下關于legend的不必看，已到我的例二)---------------------------
http://blog.sina.com.cn/s/blog_6e0693f70100nj22.html
以下例子中展示了如何用在matlab函數(shù)中使用latex
t = 0:0.1:2*pi;

x = sin(t);

hold on;

plot(t,x,'-*');

plot(t,2*x,'-.');

%% Using Latex in xlabel and ylabel

xlabel('$Time$','Interpreter','LaTex');

ylabel('$Value$','Interpreter','LaTex');

%% Using Latex in legend

h = legend('sin(x)__','2*sin(x)__');

h1 = findobj(get(h,'Children'),'String','sin(x)__');

set(h1,'String','$sin(\hat{x})$','Interpreter','LaTex');

h2 = findobj(get(h,'Children'),'String','2*sin(x)__');

set(h2,'String','$2*sin(\hat{x})$','Interpreter','LaTex');

%% Using Latex in text

text('Interpreter','latex',...

'String','$$\int_0^x\!\int_y dF(u,v)$$',...

'Position',[3 1],...

'FontSize',16);

%% Using Latex in title

title('$How \ to \ use \ latex \ in \ figure$','Interpreter','LaTex');

posted @ 2013-11-02 20:00 杰哥閱讀(3610) | 評論 (0) | 編輯收藏

元胞(cell)與結構體(struct)

元胞數(shù)組cell array

Element:cell;以下標index訪問cell，以元胞內(nèi)編址content addressing訪問元胞內(nèi)容；cell中可以存放任何類型，任何大小的數(shù)組

cell的創(chuàng)建

cell()%創(chuàng)建元胞數(shù)組

c=cell(2);%創(chuàng)建2×2的

c=cell(m,n);%創(chuàng)建m×n的,用cell函數(shù)創(chuàng)建元胞數(shù)組，創(chuàng)建的數(shù)組為空元胞。cell函數(shù)創(chuàng)建空元胞數(shù)組的主要目的是為數(shù)組預先分配連續(xù)的存儲空間，節(jié)約內(nèi)存占用，提高執(zhí)行效率。同cppblog/MATLAB程序的優(yōu)化、預先分配存儲空間

c_str=char('This is cell array');c_R=rand(3,3);c_comp=1+2i;c_sym=sym('sin(-3*t)*exp(-t)');

兩種賦值方式：index賦值和content addressing賦值

index賦值：A(1,1)={c_str};A(1,2)={c_R};A(2,1)={c_comp};A(2,2)={c_sym};

content addressing賦值:B{1,1}=c_str;B{1,2}=c_R;B{2,1}=c_comp;B{2,2}=c_sym;% The class({c_str}) and class(c_str) are cell and string, respectively.

cell array的訪問
B{1,2}
B{1,2}(1,2)

問題：我有一個cell變量。比方說3*1的size 每一個里面存了一個1?9的矩陣。matlab中有沒有快速的語句把每一個1*9的矩陣，reshape成3*3的矩陣。也就是說對一個cell變量中的每一個元素(應該是一個矩陣)進行reshape操作。當然不想用循環(huán)做。
答案：A = {rand(1,9),rand(1,9),rand(1,9)}; cellfun(@(x) reshape(x, 3, 3).', A, 'UniformOutput', false)

cell轉(zhuǎn)化成矩陣的函數(shù)cell2mat
例：A = {rand(1,9),rand(1,9),rand(1,9)};cell2mat(A)

結構體數(shù)組 structure array

Element:structure

域訪問

域中可以存放任何類型、任何大小的數(shù)組

類C

cell和struct的轉(zhuǎn)換cell2struct.

Matlab提供了兩種定義結構的方式：直接應用和使用struct函數(shù)。
1. 使用直接引用方式定義結構
與建立數(shù)值型數(shù)組一樣，建立新struct對象不需要事先申明，可以直接引用，而且可以動態(tài)擴充。比如建立一個復數(shù)變量x：
x.real = 0; % 創(chuàng)建字段名為real，并為該字段賦值為0
x.imag = 0 % 為x創(chuàng)建一個新的字段imag，并為該字段賦值為0
x =
real: 0
imag: 0
然后可以將旗動態(tài)擴充為數(shù)組：
x(2).real = 0; % 將x擴充為1×2的結構數(shù)組
x(2).imag = 0;
在任何需要的時候，也可以為數(shù)組動態(tài)擴充字段，如增加字段scale：
x(1).scale = 0;
這樣，所有x都增加了一個scale字段，而x(1)之外的其他變量的scale字段為空：
x(1) % 查看結構數(shù)組的第一個元素的各個字段的內(nèi)容
ans =
real: 0
imag: 0
scale: 0
2. 使用struct函數(shù)創(chuàng)建結構
使用struct函數(shù)也可以創(chuàng)建結構，該函數(shù)產(chǎn)生或吧其他形式的數(shù)據(jù)轉(zhuǎn)換為結構數(shù)組。
struct的使用格式為：
s = sturct('field1',values1,'field2',values2,…);
該函數(shù)將生成一個具有指定字段名和相應數(shù)據(jù)的結構數(shù)組，其包含的數(shù)據(jù)values1、valuese2等必須為具有相同維數(shù)的數(shù)據(jù)，數(shù)據(jù)的存放位置域其他結構位置一一對應的。對于struct的賦值用到了元胞數(shù)組。數(shù)組values1、values2等可以是元胞數(shù)組、標量元胞單元或者單個數(shù)值。每個values的數(shù)據(jù)被賦值給相應的field字段。
當valuesx為元胞數(shù)組的時候，生成的結構數(shù)組的維數(shù)與元胞數(shù)組的維數(shù)相同。而在數(shù)據(jù)中不包含元胞的時候，得到的結構數(shù)組的維數(shù)是1×1的。例如：
s = struct('type',{'big','little'},'color',{'blue','red'},'x',{3,4})
s =
1x2 struct array with fields:
type
color
x

參考文獻：http://blog.sciencenet.cn/blog-436588-320694.html

posted @ 2013-09-21 11:31 杰哥閱讀(3981) | 評論 (0) | 編輯收藏

matlab fscanf

結合我的教材P31

matlab文件操作：見電腦目錄Blessing of Dimensionality\Features\code\readFea.m 和 findImsFeaIdx.m

http://blog.sina.com.cn/s/blog_4cfb5a6201015i8q.html

例1：
路徑+文件名：d:\moon.txt
內(nèi)容：

13,1,3.4

3,2.1,23

1,12,2

4,5.4,6
現(xiàn)在為了讀取moon中的數(shù)據(jù)存在一個數(shù)組里，可以用如下方法

fid=fopen('d:\moon.txt');

data=fscanf(fid,'%f,%f,%f',[3,inf]) ;%這里得用單引號

fclose(fid);

這時data中的數(shù)據(jù)如下：

13 3 1 4

1 2.1 12 5.4

3.4 23 2 6

例2：
數(shù)據(jù)在d:\test.txt

0.00 good 2

0.10 bot 3

1.02 yes 4

1.00 yes 5

1.00 yes 6

1.00 yes 3

1.00 yes 5

1.00 yes 6

1.00 yes 1

1.00 yes 3

1.00 yes 7

1.00 yes 3

1.00 yes 2

程序：

fid = fopen('d:\test.txt', 'r');

a = fscanf(fid, '%f %*s %d ', [2 inf]) % It has two rows now.

fclose(fid);
a

解釋下：第一列和第二列之間有四個空格，format也要四空格哦！有三列即三種類型，要有三種format，%*s即為不輸出字符串型。

fid = fopen('E:\temp\test.txt', 'r');

a = fscanf(fid, '%f %*s %*f ', 5) % It has five rows and one column now. %*s %*f 即這兩個不輸出

fclose(fid);
a

程序結果為：

a =

0.1000

1.0200

1.0000

posted @ 2013-09-13 14:34 杰哥閱讀(1271) | 評論 (0) | 編輯收藏

ICCV-Final-Results

摘要: http://www.cs.toronto.edu/~kyros/local_outgoing/ICCV-Final-Results/ICCV 2013 Results (29 Aug, 2013)Papers submitted: 1629Withdrawals and administrative rejections: 128Accepted as Orals: 41 (2.52% oral... 閱讀全文

posted @ 2013-09-05 16:02 杰哥閱讀(14203) | 評論 (0) | 編輯收藏

LibSVM學習記錄

在電腦文件夾E:\other\matlab 2007a\work\SVM\libsvm-mat-3.0-1 ，這個是已經(jīng)編譯好的，到64位機上要重新編譯(不要利用別人傳的，因為可能改過SVM程序，例如Libing wang他改過其中程序，最原始版本：E:\other\matlab 2007a\work\SVM\libsvm-mat-3.0-1.zip,從http://www.csie.ntu.edu.tw/~cjlin/libsvm/matlab/libsvm-mat-3.0-1.zip下載)。svmtrain在matlab自帶的工具箱中也有這個函數(shù)， libing 講libsvm-mat-3.0-1放到C:\Program Files\MATLAB\R2010a\toolbox\目錄，再adddpath和savepath即可。如果產(chǎn)生以下問題：每次都要 adddpath和savepath ，在matlab重新啟動后要重新
adddpath和savepath。解決方案：可以在要運行的程序前面添加如下語句即可：
addpath('C:\Program Files\MATLAB\R2010a\toolbox\libsvm-mat-3.0-1');

README文件寫得很好，其中的Examples完全理解(包括Precomputed Kernels.Constructing a linear kernel matrix and then using the precomputed kernel gives exactly the same testing error as using the LIBSVM built-in linear kernel.核就是相似度，自己想定義什么相似度都可以)

(1) model = svmtrain(training_label_vector, training_instance_matrix [, 'libsvm_options']);

libsvm_options的設置：

Examples of options: -s 0 -c 10 -t 1 -g 1 -r 1 -d 3
Classify a binary data with polynomial kernel (u'v+1)^3 and C = 10

options:

-s svm_type : set type of SVM (default 0)

0 -- C-SVC

1 -- nu-SVC

2 -- one-class SVM

3 -- epsilon-SVR

4 -- nu-SVR
C-SVC全稱是什么？
C-SVC(C-support vector classification),nu-SVC(nu-support vector classification),one-class SVM(distribution estimation),epsilon-SVR(epsilon-support vector regression),nu-SVR(nu-support vector regression)

-t kernel_type : set type of kernel function (default 2)

0 -- linear: u'*v

1 -- polynomial: (gamma*u'*v + coef0)^degree

2 -- radial basis function: exp(-gamma*|u-v|^2)

3 -- sigmoid: tanh(gamma*u'*v + coef0)

-d degree : set degree in kernel function (default 3)

-g gamma : set gamma in kernel function (default 1/num_features)

-r coef0 : set coef0 in kernel function (default 0)

-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)

-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)

-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)

-m cachesize : set cache memory size in MB (default 100)

-e epsilon : set tolerance of termination criterion (default 0.001)

-h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1)

-b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)

-wi weight: set the parameter C of class i to weight*C, for C-SVC (default 1)

The k in the -g option means the number of attributes in the input data.

(2)如何采用線性核？

matlab> % Linear Kernel

matlab> model_linear = svmtrain(train_label, train_data, '-t 0');

嚴格講，線性核也要像高斯核一樣調(diào)整c這個參數(shù)，Libing wang講一般C=1效果比較好，可能調(diào)整效果差異不大，當然要看具體的數(shù)據(jù)集。c大，從SVM目標函數(shù)可以看出，c越大，相當于懲罰松弛變量，希望松弛變量接近0，即都趨向于對訓練集全分對的情況，這樣對訓練集測試時準確率很高，但推廣能力未必好，即在測試集上未必好。c小點，相當于邊界的有些點容許分錯，將他們當成噪聲點，這樣外推能力比較好。

(3)如何采用高斯核？

matlab> load heart_scale.mat

matlab> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07');

高斯的SVM比線性SVM效果要差，為什么？

20150420 libing討論，可能的解釋：樣本少，不適合高斯核。范圍有限，也許更廣泛的參數(shù)范圍會有更好的效果

(4)如何實現(xiàn)交叉驗證？

README文件有如下一句話：If the '-v' option is specified, cross validation is

conducted and the returned model is just a scalar: cross-validation

accuracy for classification and mean-squared error for regression.

(5) 如何調(diào)整高斯核的兩個參數(shù)？

思路1：在訓練集上調(diào)整兩個參數(shù)使在訓練集上測試錯誤率最低，就選這樣的參數(shù)來測試測試集

思路1的問題：Libing Wang講這樣很容易過學習，因為在訓練集上很容易達到100%準確率，但在測試集上未必好，即過學習。用思路2有交叉驗證，推廣性能比較好(交叉驗證將訓練集隨機打亂，推廣性能很好)

思路2：% E:\other\matlab 2007a\work\DCT\DCT_original\network.m

思路2的問題：針對不同的數(shù)據(jù)集，這兩個參數(shù)分別在什么范圍內(nèi)調(diào)整,有沒有什么經(jīng)驗？
方式1：就是network.m中gamma的取值
方式2：http://m.shnenglu.com/guijie/archive/2010/12/02/135243.html.
其他答案：除了在訓練集上做交叉驗證，還有另外一種思路：類似A Regularized Approach to Feature Selection for Face Detection (ACCV 2007)的4.2節(jié):訓練集、驗證集和測試集，Libing講該文4.2節(jié)調(diào)參數(shù)除了分成訓練集、驗證集和測試集，沒有其他什么的。Libing講在訓練集上交叉驗證也相當于訓練集挑了一部分做驗證，原理一樣。

(6)如何采用預定義核？

To use precomputed kernel, you must include sample serial number asthe first column of the training and testing data (assume your kernel matrix is K, # of instances is n):

matlab> K1 = [(1:n)', K]; % include sample serial number as first column

matlab> model = svmtrain(label_vector, K1, '-t 4');

matlab> [predict_label, accuracy, dec_values] = svmpredict(label_vector, K1, model); % test the training data

We give the following detailed example by splitting heart_scale into 150 training and 120 testing data. Constructing a linear kernel matrix and then using the precomputed kernel gives exactly the same testing error as using the LIBSVM built-in linear kernel.

matlab> load heart_scale.mat

matlab>

matlab> % Split Data

matlab> train_data = heart_scale_inst(1:150,:);

matlab> train_label = heart_scale_label(1:150,:);

matlab> test_data = heart_scale_inst(151:270,:);

matlab> test_label = heart_scale_label(151:270,:);

matlab>

matlab> % Linear Kernel

matlab> model_linear = svmtrain(train_label, train_data, '-t 0');

matlab> [predict_label_L, accuracy_L, dec_values_L] = svmpredict(test_label, test_data, model_linear);

matlab>

matlab> % Precomputed Kernel

matlab> model_precomputed = svmtrain(train_label, [(1:150)', train_data*train_data'], '-t 4');

matlab> [predict_label_P, accuracy_P, dec_values_P] = svmpredict(test_label, [(1:120)', test_data*train_data'], model_precomputed);

matlab>

matlab> accuracy_L % Display the accuracy using linear kernel

matlab> accuracy_P % Display the accuracy using precomputed kernel

(7)如何實現(xiàn)概率估計？

For probability estimates, you need '-b 1' for training and testing:

matlab> load heart_scale.mat

matlab> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07 -b 1');

matlab> load heart_scale.mat

matlab> [predict_label, accuracy, prob_estimates] = svmpredict(heart_scale_label, heart_scale_inst, model, '-b 1');

非概率估計

matlab> load heart_scale.mat

matlab> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07');

matlab> [predict_label, accuracy, dec_values] = svmpredict(heart_scale_label, heart_scale_inst, model); % test the training data

(8) svmpredict的用法(摘自libsvm-mat-2.9-1的README)

[predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model [, 'libsvm_options']);

輸入：testing_label_vector， If labels of test data are unknown, simply use any random values. (type must be double)。模型一旦確定，預測的標記就確定了，如果不利用第二個輸出accuracy，則testing_label_vector隨便設置，當然如果要利用accuracy，就要將testing_label_vector設置成測試標記了。(Action recognition\ASLAN database中的代碼CLSlibsvmC，第九行用到svmpredict，testing_label_vector設置成ones(size(Samples,2),1)，是無所謂的)。

svmpredict輸出的含義：

predictd_label, is a vector of predicted labels(故CLSlibsvmC的12到14行沒用);

胥志偉I-Rising(285308540) 2015/5/31 21:56:30
各位老師，同學。請問有人研究過svm 中predict的decision value嗎？麻煩幫忙解釋下怎么計算的。謝謝。
蘇松志-T-廈大(14291414) 2015/5/31 22:04:40
指的是libsvm？優(yōu)化完之后得到每個支持向量的alph_i,然后，計算輸入的x和各個支持向量的核函數(shù)距離ki，sum(ki*alph_i*yi) for all SUPPORT VECTORS,+b,就是按照教科書上的公式
說明：以上understand completely,對應楊光正教材P28公式(2.54)
胥志偉I-Rising(285308540) 2015/5/31 22:08:30
多類的問題也是這樣嗎？
蘇松志-T-廈大(14291414) 2015/5/31 22:13:41
是的，多類采用的是1vs1，所有里面的sv_coef是一個mxn的矩陣，m：支持向量的數(shù)目，n：類別數(shù)目-1，然后有一個rho向量，套用公式，計算

摘自libsvm-mat-3.0-1的README

The function 'svmpredict' has three outputs. The first one, predictd_label, is a vector of predicted labels. The second output, accuracy, is a vector including accuracy (for classification), mean squared error, and squared correlation coefficient (for regression). The third is a matrix containing decision values or probability estimates (if '-b 1' is specified). If k is the number of classes, for decision values, each row includes results of predicting k(k-1)/2 binary-class SVMs. For probabilities, each row contains k values indicating the probability that the testing instance is in each class. Note that the order of classes here is the same as 'Label' field in the model structure.

(9)LibSVM是如何采用one-versus-rest和one-verse-one實現(xiàn)多類分類的？
one-versus-rest和one-verse-one的定義見模式識別筆記第四頁反面(同時見孫即祥教材P47)。找libing wang和junge zhang,他們都講沒對這個深究過。根據(jù)“If k is the number of classes, for decision values, each row includes results of predicting k(k-1)/2 binary-class SVMs. For probabilities, each row contains k values indicating the probability that the testing instance is in each class. ”，我覺得應該是probabilities實現(xiàn)的是one-versus-rest，即采用-b 1這個選項，他倆都覺得我理解應該是正確的。junge講參加pascal競賽和imagenet，他們都是訓練k個SVM(即one-versus-rest,沒用one-versus-one,后者太慢，而且估計效果差不多)，沒有直接采用SVM做多類問題。
20130910 LibSVM作者回信：
Libsvm implements only 1vs1.
For 1vsrest, you can check the following
libsvm faq

Q: LIBSVM supports 1-vs-1 multi-class classification. If instead I would
like to use 1-vs-rest, how to implement it using MATLAB interface?
網(wǎng)址：http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f808
Q: LIBSVM supports 1-vs-1 multi-class classification. If instead I would like to use 1-vs-rest, how to implement it using MATLAB interface?

Please use code in the following directory. The following example shows how to train and test the problem dna (training and testing).

Load, train and predict data:

[trainY trainX] = libsvmread('./dna.scale');
 [testY testX] = libsvmread('./dna.scale.t');
model = ovrtrain(trainY, trainX, '-c 8 -g 4'); 
[pred ac decv] = ovrpredict(testY, testX, model);
 fprintf('Accuracy = %g%%\n', ac * 100);

Conduct CV on a grid of parameters

bestcv = 0; 
for log2c = -1:2:3,  
  for log2g = -4:2:1,    
    cmd = ['-q -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];    
    cv = get_cv_ac(trainY, trainX, cmd, 3);   
    if (cv >= bestcv),      
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;    
    end    
    fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);   
  end 
end

(9)如何實現(xiàn)驗證模式下的準確率？
見我寫的程序RVM\code\Yale\SVM\TestYale_SVM_2classes
--------------------------------------------------------------------------------------------------------------------------------------------------------
http://blog.sina.com.cn/s/blog_64b046c701018c8n.html

MATLAB自帶的svm實現(xiàn)函數(shù)與libsvm差別小議

1 MATLAB自帶的svm實現(xiàn)函數(shù)僅有的模型是C-SVC(C-support vector classification)；而libsvm工具箱有C-SVC(C-support vector classification),nu-SVC(nu-support vector classification),one-class SVM(distribution estimation),epsilon-SVR(epsilon-support vector regression),nu-SVR(nu-support vector regression)等多種模型可供使用。
2 MATLAB自帶的svm實現(xiàn)函數(shù)僅支持分類問題，不支持回歸問題；而libsvm不僅支持分類問題，亦支持回歸問題。
3 MATLAB自帶的svm實現(xiàn)函數(shù)僅支持二分類問題，多分類問題需按照多分類的相應算法編程實現(xiàn)；而libsvm采用1v1算法支持多分類。
4 MATLAB自帶的svm實現(xiàn)函數(shù)采用RBF核函數(shù)時無法調(diào)節(jié)核函數(shù)的參數(shù)gamma，貌似僅能用默認的；而libsvm可以進行該參數(shù)的調(diào)節(jié)。
5 libsvm中的二次規(guī)劃問題的解決算法是SMO；而MATLAB自帶的svm實現(xiàn)函數(shù)中二次規(guī)劃問題的解法有三種可以選擇：經(jīng)典二次方法；SMO；最小二乘。（這個是我目前發(fā)現(xiàn)的MATLAB自帶的svm實現(xiàn)函數(shù)唯一的優(yōu)點~）

--------------------------------------------------------------------------------------------------------------------------------------------------------

SVM 理論部分

SVM下面推導核化形式(Eric Xing教材)+M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold Regularization: AGeometric Framework for Learning from Labeled and Unlabeled Examples,” J. Machine Learning Research, vol. 7, pp. 2399-2434, 2006的4.3和4.4節(jié).+Ensemble Manifold Regularization (TPAMI 2012)

電腦里的"ZhuMLSS14.pdf"是很好的入門材料

posted @ 2013-09-05 14:47 杰哥閱讀(12056) | 評論 (6) | 編輯收藏

[zz]用MATLAB做T檢驗（ttest）

http://blog.csdn.net/xiao_xia_/article/details/6906465

t-檢驗：

t-檢驗，又稱student‘s t-test，可以用于比較兩組數(shù)據(jù)是否來自同一分布（可以用于比較兩組數(shù)據(jù)的區(qū)分度），假設了數(shù)據(jù)的正態(tài)性，并反應兩組數(shù)據(jù)的方差在統(tǒng)計上是否有顯著差異。

matlab中提供了兩種相同形式的方法來解決這一假設檢驗問題，分別為ttest方法和ttest2方法，兩者的參數(shù)、返回值類型均相同，不同之處在于ttest方法做的是 One-sample and paired-sample t-test，而ttest2則是 Two-sample t-test with pooled or unpooled variance estimate， performs an unpaired two-sample t-test。但是這里至于paired和unpaired之間的區(qū)別我卻還沒搞清楚，只是在Student's t-test中看到了如下這樣一段解釋：

“Two-sample t-tests for a difference in mean involve independent samples, paired samples and overlapping samples. Pairedt-tests are a form of blocking, and have greater powerthan unpaired tests when the paired units are similar with respect to "noise factors" that are independent of membership in the two groups being compared.^[8] In a different context, paired t-tests can be used to reduce the effects ofconfounding factors in an observational study.”

因此粗略認為paired是考慮了噪聲因素的。

在同樣的兩組數(shù)據(jù)上分別用ttest和ttest2方法得出的結果進行比較，發(fā)現(xiàn)ttest返回的參數(shù)p普遍更小，且置信區(qū)間ci也更小。

最常用用法：
[H,P,CI]=ttest2(x,y);（用法上ttest和ttest2相同，完整形式為[H,P,CI, STATS]=ttest2(x,y, ALPHA);）

其中，x，y均為行向量（維度必須相同），各表示一組數(shù)據(jù)，ALPHA為可選參數(shù)，表示設置一個值作為t檢驗執(zhí)行的顯著性水平（performs the test at the significance level
(100*ALPHA)%），在不設置ALPHA的情況下默認ALPHA為0.05，即計算x和y在5%的顯著性水平下是否來自同一分布(假設是否被接受）
結果:H=0，則表明零假設在5%的置信度下不被拒絕（根據(jù)當設置x=y時候，返回的H=0推斷而來），即x，y在統(tǒng)計上可看做來自同一分布的數(shù)據(jù)；H=1，表明零假設被拒絕，即x，y在統(tǒng)計上認為是來自不同分布的數(shù)據(jù)，即有區(qū)分度。

P為一個概率，matlab help中的解釋是“ the p-value, i.e., the probability of observing the given result, or one more extreme, by chance if the null hypothesis is true. Small values of P cast doubt on the validity of the null hypothesis.” 暫且認為表示判斷值在真實分布中被觀察到的概率（？不太懂）

CI為置信區(qū)間(confidence interval)，表示“a 100*(1-ALPHA)% confidence interval for the true difference of population means”，即達到100*(1-ALPHA)%的置信度的數(shù)據(jù)區(qū)間（？）

應用：常與k-fold crossValidation（交叉驗證）聯(lián)用可以用于兩種算法效果的比較，例如A1,A2兩算法得出的結果分別為x，y，且從均值上看mean(x)>mean(y)，則對[h,p,ci]=ttest2(x,y);當h=1時，表明可以從統(tǒng)計上斷定算法A1的結果大于（？）A2的結果（即兩組數(shù)據(jù)均值的比較是有意義的），h=0則表示不能根據(jù)平均值來斷定兩組數(shù)據(jù)的大小關系（因為區(qū)分度小）

臨時學的，沒經(jīng)過太多測試，不一定對，還請高手指教。

另外還有在某個ppt(http://jura.wi.mit.edu/bio/education/hot_topics/pdf/matlab.pdf)中看到這樣一頁

參考資料：

經(jīng)驗+自身理解

matlab 7.11.0（R2010b）的幫助文檔

wikipedia

http://www.biosino.org/pages/newhtm/r/schtml/One_002d-and-two_002dsample-tests.html

posted @ 2013-08-30 19:41 杰哥閱讀(4501) | 評論 (0) | 編輯收藏

[zz]Robust PCA 學習筆記

http://blog.csdn.net/abcjennifer/article/details/8572994

很久沒有寫學習筆記了，年初先后忙考試，忙課程，改作業(yè)，回家剛安定下來，讀了大神上學期給的paper，這幾天折騰數(shù)學的感覺很好，就在這里和大家一起分享一下，希望能夠有所收獲。響應了Jeffrey的建議，強制自己把筆記做成英文的，可能給大家?guī)黹喿x上的不便，希望大家理解，多讀英文的東西總沒壞處的。這里感謝大神和我一起在本文手稿部分推了一些大牛的“ easily achieved”stuff... 本文尚不成熟，我也是初接觸Robust PCA,希望各位能夠拍磚提出寶貴意見。

Robust PCA

Rachel Zhang

1. RPCA Brief Introduction

1. Why use Robust PCA?

Solve the problem withspike noise with high magnitude instead of Gaussian distributed noise.

2. Main Problem

Given C = A*+B*, where A*is a sparse spike noise matrix and B* is a Low-rank matrix, aiming at recoveringB*.

B*= UΣV’, in which U∈R^n*k ,Σ∈R^k*k ,V∈R^n*k

3. Difference from PCA

Both PCA and Robust PCAaims at Matrix decomposition, However,

In PCA, M = L0+N0, L0:low rank matrix ; N0: small idd Gaussian noise matrix,it seeks the best rank-k estimation of L0 by minimizing ||M-L||₂ subjectto rank(L)<=k. This problem can be solved by SVD.

In RPCA, M = L0+S0, L0:low rank matrix ; S0: a sparse spikes noise matrix, we’lltry to give the solution in the following sections.

2. Conditionsfor correct decomposition

4. Ill-posed problem:

Suppose sparse matrix A* and B*=e_ie_j^Tare the solution of this decomposition problem.

1) With the assumption that B* is not only low rank but alsosparse, another valid sparse-plus low-rank decomposition might be A₁= A*+ e_ie_j^T andB₁ = 0, Therefore, we need an appropriatenotion of low-rank that ensures that B* is not too sparse. Conditionswill be imposed later that require the space spanned by singular vectors U andV (i.e., the row and column spaces of B*) to be “incoherent” with the standardbasis.

2) Similarly, with the assumption that A* is sparse as well aslow-rank (e.g. the first column of A* is non-zero, while all other columns are0, then A* has rank 1 and sparse). Another valid decomposition might be A₂=0,B₂ = A*+B* (Here rank(B₂) <= rank(B*) + 1). Thus weneed the limitation that sparse matrix should beassumed not to be low rank. I.e., assume each row/column does not havetoo many non-zero entries (don’t exist dense row/column), to avoid such issues.

5. Conditions for exact recovery / decomposition:

If A* and B* are drawnfrom these classes, then we have exact recovery with high probability [1].

1) For low rank matrix L---Random orthogonal model [Candes andRecht 2008]:

A rank-k matrix B* with SVD B*=UΣV’ is constructed in this way: Thesingular vectors U,V∈R^n*k are drawnuniformly at random from the collection of rank-k partial isometries(等距算子) inR^n*k. The singular vectors in U and V need not be mutuallyindependent. No restriction is placed on the singular values.

2) For sparse matrix S---Random sparsity model:

The matrix A* such that support(A*) is chosen uniformlyat random from the collection of all support sets of size m. There is noassumption made about the values of A* at locations specified by support(A*).

[Support(M)]: thelocations of the non-zero entries in M

Latest [2] improved on the conditions andyields the ‘best’ condition.

3. Recovery Algorithms

6. Formulization

For decomposition D = A+E,in which A is low rank and error E is sparse.

1) Intuitively propose

minrank(A)+γ||E||₀, (1)

However, it is non-convex thus intractable (both of these 2are NP-hard to approximate).

2) Relax L0-norm to L1-norm and replace rank with nuclear norm

min||A||_*+ λ||E||₁,

where||A||_*=Σ_iσ_i(A) (2)

This is convex, i.e., exist a uniquely minimizer.

Reason: This relaxation ismotivated by observing that ||A||_*+ λ||E||₁is theconvex envelop (凸包) of rank(A)+γ||E||₀ over the set of (A,E) suchthat max(||A||_2,2,||E||_{1, ∞})≤1.

Moreover, there might be circumstancesunder which (2) perfectly recovers low-rank matrix A₀.[3] shows itis indeed true under surprising broad conditions.

7. Approach RPCA Optimization Algorithm

We approach in twodifferent ways. 1^st approach, use afirst-order method to solve the primal problem directly. (E.g. ProximalGradient, Accelerated Proximal Gradient (APG)), the computational bottleneck ofeach iteration is a SVD computation. 2^ndapproach is to formulate and solve the dual problem, and retrieve the primalsolution from the dual optimal solution. The dual problem too RPCA canbe written as:

max_Ytrace(D^TY) , subject to J(Y) ≤ 1

where J(Y) = max(||Y||₂,λ^-1||Y||_∞). ||A||_x means the x norm of A.(無窮范數(shù)表示矩陣中絕對值最大的一個)。This dual problem can besolved by constrained steepest ascent.

Now let’s talk about Augmented Lagrange Multiplier (ALM) and Alternating Directions Method (ADM) [2,4].

7.1. General method of ALM

For the optimizationproblem

min f(X), subj. h(X)=0 (3)

we can define the Lagrangefunction:

L(X,Y,μ) = f(X)+<Y, h(x)>+μ/2||h(X)||_F² (4)

where Y is a Lagrangemultiplier and μ is a positive scalar.

General method of ALM is:

A genetic Lagrangemultiplier algorithm would solve PCP(principle component pursuit) by repeatedlysetting (X_k) = arg min L(X_k,Y_k,μ), and then the Lagrangemultiplier matrix via Y_k+1=Y_k+μ(h_k(X))

7.2 ALM algorithm for RPCA

In RPCA, we can define (5)

X = (A,E), f(x) = ||A||_*+ λ||E||₁, andh(X) = D-A-E

Then the Lagrange functionis (6)

L(A,E,Y, μ) = ||A||_*+ λ||E||₁+ <Y, D-A-E>+μ/2·|| D-A-E ||_F²

The Optimization Flow is just like the general ALM method. The initialization Y= Y₀* isinspired by the dual problem(對偶問題) as it is likely to make the objective function value <D,Y₀*> reasonably large.

Theorem 1. For algorithm 4, any accumulation point (A*,E*) of (A_k*, E_k*)is an optimal solution to the RPCA problem and the convergence rate is at least O(μ_k^-1).[5]

Inthis RPCA algorithm, a iteration strategy is adopted. As the optimizationprocess might be confused, we impose 2 well-known facts: (7)(8)

Sε[W] = arg min_Xε||X||₁+ ½||X-W||_F^{2 （7）}

U Sε[W] V^T = arg min_Xε||X||_*+½||X-W||_F^{2 （8）}

which is used in the abovealgorithm for optimization one parameter while fixing another. In thesolutions, Sε[W] is the soft thresholding operator. Here I will impose the problemto speculate this. (To facilitate writing formulation and ploting, I use amanuscript. )

BTW,S_u(x) is easily implemented by 2 lines:

S_u(X)= max(x-u , 0);

S_u(X)= S_u(X) + min(X+u , 0);

Now we utilize formulation (7,8) into RPCA problem.

For the objective function (6) w.r.t get optimal E, wecan rewrite the objective function by deleting unrelated component into:

f(E) = λ||E||₁+ <Y, D-A-E> +μ/2·|| D-A-E||_F²

=λ||E||₁+ <Y, D-A-E> +μ/2·||D-A-E ||_F²+(μ/2)||μ^-1Y||² //add an irrelevant item w.r.t E

=λ||E||₁+(μ/2)(2(μ^-1Y· (D-A-E))+|| D-A-E ||_F²+||μ^-1Y||²) //try to transform into (7)’s form

=(λ/μ)||E||1+½||E-(D-A-μ^-1Y)||_F²

Finally we get the form of (7) and in the optimizationstep of E, we have

E = S_λ/μ[D-A-μ^-1Y]

,same as what mentioned in algorithm 4.

Similarly, for matrices X, we can prove A=US_1/μ(D-E-μ^-1Y)V is theoptimization process of A.

8. Experiments

Here I've tested on a video data. This data is achieved from a fixed point camera and the scene is same at most time, thus the active variance part can be regarded as error E and the stationary/invariant part serves as low rank matrix A. The following picture shows the result. As the person walks in, error matrix has its value. The 2 subplots below represent low rank matrix and sparse one respectively.

9. Reference:

1) E. J. Candes and B. Recht. Exact Matrix Completion Via ConvexOptimization. Submitted for publication, 2008.

2) E. J. Candes, X. Li, Y. Ma, and J. Wright. Robust PrincipalComponent Analysis Submitted for publication, 2009.

3) Wright, J., Ganesh, A., Rao, S., Peng, Y., Ma, Y.: Robustprincipal component analysis: Exact recovery of corrupted low-rank matrices viaconvex optimization. In: NIPS 2009.

4) X. Yuan and J. Yang. Sparse and low-rank matrix decompositionvia alternating direction methods. preprint, 2009.

5) Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented Lagrangemultiplier method for exact recovery of a corrupted low-rank matrices.Mathematical Programming, submitted, 2009.

6) Generalized Power method for Sparse Principal Component Analysis

posted @ 2013-08-12 15:46 杰哥閱讀(696) | 評論 (0) | 編輯收藏

常用鏈接

留言簿(57)

隨筆分類

隨筆檔案

相冊

Other

Paper submission

福彩

留學相關

論壇

搜索

學者

郵箱

中科大和中科院

搜索

最新評論

閱讀排行榜

評論排行榜

nnsetup

nntrain

nnff

nnbp

nnapplygrads

nntest

nnpredict

總結