[Discussion] Deep learning - the "why" question.

Board: DataScience · Author: (ka) · Posted 6 years ago (2018/10/14 19:04) · Score +8 (8 push, 0 boo, 14 neutral)
22 comments, 10 participants, latest 6 years ago · Thread 1/1
https://blog.piekniewski.info/2018/10/13/deep-learning-the-why-question/

There are many, many deep learning models out there doing various things. Depending on the exact task they are solving, they may be constructed differently. Some will use convolution followed by pooling. Some will use several convolutional layers before there is any pooling layer. Some will use max-pooling, some mean-pooling. Some will have dropout added. Some will have a batch-norm layer here and there. Some will use sigmoid neurons, some will use half-rectifiers. Some will classify and therefore optimize for cross-entropy; others will minimize mean-squared error. Some will use unpooling layers. Some will use deconvolutional layers. Some will use stochastic gradient descent with momentum; some will use ADAM. Some will have ResNet layers, some will use Inception. The choices are plentiful (see e.g. here).

Reading any of these particular papers, one is faced with a set of choices the authors had made, followed by an evaluation on the dataset of their choice. The discussion of choices typically leans heavily on the papers where the given techniques were first introduced, whereas the results section typically discusses in detail the previous state of the art. The shape of the architecture often breaks down into obvious and non-obvious decisions. The obvious ones are dictated by the particular task the authors are trying to solve (e.g., when they have an autoencoding-like task, they obviously use a form of autoencoder). The non-obvious choices raise questions like these: Why did they use a 3x3 conv followed by a 1x1 conv and only then pooling? Why did they replace only the 3 middle layers with MobileNet layers (ridiculous name, BTW)? Why did they slap batch-norm on only the middle two layers and not all of them? Why did they use max-pooling in the first two layers and no pooling whatsoever in the following three?
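The combinatorics behind "the choices are plentiful" can be made concrete with a toy sketch. The axes below are the ones the post enumerates; the specific combinations are illustrative, not drawn from any particular paper:

```python
from itertools import product

# A toy version of the design space the post describes. Each axis is one
# of the "choices" mentioned; even this crude six-axis grid yields well
# over a hundred distinct architectures to "tune" over.
choices = {
    "pooling": ["max", "mean", "none"],
    "activation": ["sigmoid", "relu"],
    "normalization": ["batch-norm", "none"],
    "loss": ["cross-entropy", "mse"],
    "optimizer": ["sgd+momentum", "adam"],
    "block": ["plain", "resnet", "inception"],
}

configs = list(product(*choices.values()))
print(len(configs))  # 3*2*2*2*2*3 = 144
```

And this ignores layer counts, kernel sizes, filter widths, and where in the stack each choice is applied, which multiply the space further.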
Obvious stuff is not discussed because it is obvious; the non-obvious stuff is not discussed because... let me get back to that in a moment. In my opinion, discussing these questions separates a paper that is at least shallowly scientific from complete charlatanry, even if the charlatanry appears to improve the results on the given dataset. The sad truth, which few even talk about, is that in the vast majority of cases the answers to the "why" questions are purely empirical: they tried a bunch of models and these worked best. It is called "hyperparameter tuning" (or meta-parameter tuning).

What does that tell us? A few things. First, the authors completely ignore the danger of multiple hypothesis testing and generally piss on any statistical foundations of their "research". Second, they probably have more GPUs available than they know what to do with (very often the case in big companies these days). Third, they just want to stamp their names on some new record-breaking benchmark, which will obviously be broken two weeks later by somebody who takes their model and does some extra blind tweaking, utilizing even more GPU power.

This is not science. It has more to do with people who build beefy PCs and submit their 3DMark results to hold a record for a few days. It is a craft, no doubt, but it is not science. The PC builders make no pretense of doing science. The deep learning people do. They write what appear to be research papers, just to describe their GPU rig and the result of their random meta-parameter search, with perhaps some tiny shreds of real scientific discussion. Benchmark results provide a nice cover to claim that the paper is in some way "novel" and interesting, but the truth of the matter is, they just overfitted that dataset some more. They might as well memorize the entire dataset in their model and achieve 100% accuracy; who cares? (Read my AI winter addendum post for some interesting literature on the subject.)
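The multiple-hypothesis-testing danger mentioned above can be demonstrated with a short simulation (my sketch, not from the post): "tune" by picking the best of many models that are, by construction, pure coin flips. The winner still looks better than chance on the fixed test set, which is exactly the selection effect that blind hyperparameter search exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

n_test = 1_000   # size of the fixed benchmark test set
n_trials = 200   # number of "hyperparameter configurations" tried

y_true = rng.integers(0, 2, size=n_test)

# Each "configuration" is a classifier with zero real skill: it guesses
# labels uniformly at random. Its expected accuracy is exactly 0.5.
accs = np.array([
    (rng.integers(0, 2, size=n_test) == y_true).mean()
    for _ in range(n_trials)
])

print(f"mean accuracy: {accs.mean():.3f}")  # stays near 0.50
print(f"best accuracy: {accs.max():.3f}")   # the "winning" config beats 0.50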
Much like the difference between chemistry and alchemy, a scientific discussion is about building a concept, a theory that enables one to make accurate predictions, something to guide experimental actions. Science does not need to make gold out of lead every time; in the case of machine learning, a real scientific paper in this field does not need to beat some current benchmark. A scientific paper does not even need to answer any questions, if it happens to ask some good ones. Now, obviously there are exceptions: a small fraction of papers have interesting stuff in them. These are mostly the ones that try to show the deficits of deep learning and engage in a discussion as to why that might be the case. So next time you read a deep learning paper, try to contemplate these quiet, never-explained choices the authors have made. You'll be shocked to see how many of them are hidden between the lines.

----------------------------

This article hits me right where it hurts. Even after taking all these courses, implementing dozens of models, competing in several Kaggle competitions, and now being about to publish a paper, I still have a deep case of imposter syndrome. Why, exactly, is this kernel 3 and not 2? Why do the numbers look better with 64 filters than with 128? Apart from data preprocessing, most of my working hours go into hyperparameter tuning that makes me question my life choices. I keep wondering whether I am doing academic research, or just results-driven data alchemy by exhaustive enumeration. Last year a Korean team removed a few BN layers and won NTIRE. Beyond everyone's favorite hindsight explanation that BN was never suited to super resolution anyway, is there really any foundation behind this that would withstand rigorous scrutiny? Is this self-doubt just my own excessive inferiority complex, or is it actually not that rare?

-----
Sent from JPTT on my Asus ASUS_Z01KDA.
--
※ Posted from: PTT (ptt.cc), IP: 61.231.154.103
※ Article URL: https://www.ptt.cc/bbs/DataScience/M.1539515084.A.E47.html

10/14 19:18, 6 years ago, 1F
Upvoting this take

10/14 19:18, 6 years ago, 2F
These days the field is just throwing models together at random to crank out publications,

10/14 19:18, 6 years ago, 3F
but whether there's any real substance underneath is hard to say

10/14 19:25, 6 years ago, 4F
It's not that bad. He brings up chemistry himself, and Rutherford famously dismissed chemistry as stamp collecting too

10/14 19:28, 6 years ago, 5F
The results are there; we're just exploring as we push forward. Chemistry, biology, medicine: which of them hasn't

10/14 19:29, 6 years ago, 6F
gone through a phase like this

10/14 21:21, 6 years ago, 7F
No way to solve it for now, since the return on investment is out of proportion. We'll probably have to wait for some

10/14 21:21, 6 years ago, 8F
kind of bottleneck, or for the hype bubble to burst

10/14 21:23, 6 years ago, 9F
This also ties into the flood of low-quality papers, and the large fraction that can't be reproduced,

10/14 21:23, 6 years ago, 10F
and problems of that sort. In any case, being aware of the issue is a good thing…

10/14 22:37, 6 years ago, 11F
That's just how it is; no need to doubt it

10/14 23:22, 6 years ago, 12F
Isn't that what research is like? Without the experience of all this garbage output, you wouldn't find the right path either

10/15 18:16, 6 years ago, 13F
Discovering something first and proving it mathematically later is pretty common, no?

10/18 20:29, 6 years ago, 14F
In industry, as long as it works it's fine; nobody demands a rigorous proof

10/19 23:17, 6 years ago, 15F
Industry standards are much higher than academic papers, especially at companies that have to ship real products

10/19 23:17, 6 years ago, 16F
rather than just hyping things up to attract money

10/20 10:57, 6 years ago, 17F
Industry demands just point in a different direction: what matters is 1. stability, 2. speed, 3. predictability

10/20 10:58, 6 years ago, 18F
A model that "might spit out something broken at any moment" is, in practice,

10/20 10:58, 6 years ago, 19F
very hard to actually deploy in production, yet DL models all too easily produce

10/20 10:59, 6 years ago, 20F
unpredictable output, and when things go wrong it's often unclear where the cause lies

10/20 11:00, 6 years ago, 21F
Whether it's innovative enough isn't really the main point; as long as someone is willing to pay

10/23 19:24, 6 years ago, 22F
I think a rigorous theoretical foundation may be needed, though I'm not sure that's right either
Article code (AID): #1RmoBCv7 (DataScience)