Re: [Call for Posts] Self-Normalizing Neural Networks

Board: DataScience | Author: MXNet (MXNet) | Time: 2018/07/09 16:53 | Votes: 3 (3 push, 0 boo, 2 neutral)

5 comments, 4 participants, post 3 of 4 in thread

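※ Quoting 《PyTorch (PY火炬)》:
: Thanks to MXNet for the detailed explanation.
: I'd like to ask MXNet about something
: that has puzzled me for a while.
: SELU is supposed to "make feed-forward great again",
: but does it still have the self-normalizing effect when used in a convolution layer?
: Judging from the DCGAN experience of the author of this post:
: https://ajolicoeur.wordpress.com/cats/
: “All my initial attempts at generating cats in 128 x 128 with DCGAN failed.
: However, simply by replacing the batch normalizations and ReLUs with SELUs,
: I was able to get slow (6+ hours) but steady convergence with the same learning
: rates as before.
: SELUs are self-normalizing and thus remove the need for batch normalization.”
: it looks like SELU also works in convolution layers and still self-normalizes.
: Does the math support this as well?
: The derivation in the SELU paper assumes a feed-forward network, doesn't it?

Short answer: after reading the paper, I think the math does support it.

Long answer:

The mathematical assumptions are stated on page 4 of the paper, in the paragraph "Deriving the Mean and Variance Mapping Function g". What matters there is the assumption about the distribution of z, where z = Wx is the input to the later of two consecutive layers.

Since z is a weighted sum of many inputs, the central limit theorem says its distribution is approximately normal, centered at E(z) with variance Var(z).

So let's look at E(z):

E(z) = μ ω

Here μ is the mean of the incoming activations x, and ω is the sum of the weights feeding one unit (n times the mean of the weight vector, in the paper's notation). In the CNN case ω can be computed in the same way from the kernel weights, so there is no problem.

The more terms there are in that sum, i.e. the more parameters feed each unit in a layer, the closer z gets to a normal distribution. Wider networks fit the assumption better. The paper notes that layers with hundreds of nodes or more are common, so the assumption is taken to hold.

--
※ Origin: PTT (ptt.cc), from: 140.113.73.135
※ Article URL: https://www.ptt.cc/bbs/DataScience/M.1531126386.A.9BB.html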
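A rough numerical check of the argument above, as a minimal NumPy sketch rather than anything taken from the paper. Everything here is an assumption made for illustration: a 1D convolution stands in for the 2D case, the helper names (`conv1d`, `lecun_normal`, `mean_check`, `simulate`) are hypothetical, weights are drawn with variance 1/fan_in, and the layer sizes are arbitrary. It checks two things: that E(z) = μ ω holds per output channel when ω is the sum of that channel's kernel weights, and that the activation mean and variance stay roughly near 0 and 1 through a stack of conv + SELU layers.

```python
import numpy as np

# SELU constants from the Self-Normalizing Neural Networks paper.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805


def selu(z):
    # Clamp the argument of expm1 so large positive z cannot overflow.
    return LAMBDA * np.where(z > 0, z, ALPHA * np.expm1(np.minimum(z, 0.0)))


def conv1d(x, w):
    """Valid cross-correlation. x: (N, C_in, L), w: (C_out, C_in, K)."""
    k = w.shape[2]
    l_out = x.shape[2] - k + 1
    # patches[n, c, l, j] = x[n, c, l + j]
    patches = np.stack([x[:, :, j:j + l_out] for j in range(k)], axis=-1)
    return np.einsum("nclk,ock->nol", patches, w, optimize=True)


def lecun_normal(rng, c_out, c_in, k):
    # Assumed init: zero mean, variance 1 / fan_in, with fan_in = c_in * k.
    return rng.standard_normal((c_out, c_in, k)) / np.sqrt(c_in * k)


def mean_check(rng, mu=0.5, channels=32, k=3):
    """Compare the empirical per-channel mean of z with mu * omega."""
    x = mu + rng.standard_normal((256, channels, 64))
    w = lecun_normal(rng, channels, channels, k)
    z = conv1d(x, w)
    omega = w.sum(axis=(1, 2))  # sum of the weights feeding each output unit
    return z.mean(axis=(0, 2)), mu * omega


def simulate(rng, depth=8, channels=32, k=3):
    """Track overall (mean, variance) of activations after each conv + SELU."""
    x = rng.standard_normal((64, channels, 64))
    stats = []
    for _ in range(depth):
        x = selu(conv1d(x, lecun_normal(rng, channels, channels, k)))
        stats.append((float(x.mean()), float(x.var())))
    return stats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    empirical, predicted = mean_check(rng)
    print("max |E(z) - mu*omega|:", np.max(np.abs(empirical - predicted)))
    for layer, (m, v) in enumerate(simulate(rng), start=1):
        print(f"layer {layer}: mean={m:+.3f} var={v:.3f}")
```

This is only a sanity check at initialization with random weights; it says nothing about what happens during training, and real 2D convolutions with padding or stride may behave somewhat differently.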