[Discussion] Teacher-Student Model Semi-supervised

Board: DataScience  Author: (中立評論員)  Posted: 3 years ago (2021/01/25 19:47), edited; score +1 (1 push, 0 boo, 3 neutral)
4 comments, 2 participants, latest reply 3 years ago. Thread 1/1
In "Billion-scale semi-supervised learning for image classification" (Facebook AI Research), the authors give the following reason for not training the student model on D and D̂ combined:

Remark: It is possible to use a mixture of data in D and D̂ for training like in previous approaches [34]. However, this requires searching for optimal mixing parameters, which depend on other parameters. This is resource-intensive in the case of our large-scale training. Additionally, as shown later in our analysis, taking full advantage of large-scale unlabelled data requires adopting long pre-training schedules, which adds some complexity when mixing is involved.

I'm not sure what the first reason, "searching for mixing parameters", refers to. And for the second reason, isn't D + D̂ already prepared before the student model is trained? Why would mixing add complexity?

Thanks, everyone.

--
※ Sent from: PTT (ptt.cc), from: 140.115.59.247 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/DataScience/M.1611575265.A.D9C.html

1F, 01/26 20:21: It probably refers to the sampling ratio between D and D̂; the paper they reference fixes it at 6:4.

2F, 01/26 20:21: They didn't want to bother tuning that parameter.
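
For concreteness, the "mixing parameter" being discussed would be the fraction of each mini-batch drawn from D versus D̂. A minimal sketch of what a fixed 6:4 split could look like (the function name, signature, and ratio here are illustrative assumptions, not code from the paper):

import random

def mixed_batch(D, D_hat, batch_size=64, labelled_fraction=0.6):
    # Draw a fixed share of the batch from the labelled set D and the rest
    # from the pseudo-labelled set D_hat. labelled_fraction (0.6 for a 6:4
    # split) is the mixing parameter that would have to be searched, and its
    # best value depends on other hyperparameters such as the schedule length.
    n_labelled = int(batch_size * labelled_fraction)
    batch = random.sample(D, n_labelled) + random.sample(D_hat, batch_size - n_labelled)
    random.shuffle(batch)
    return batch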

3F, 01/26 20:24: As for the second point, training on everything combined would take a very long time, so they simply don't do it.
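
As I understand the paper's setup, what they do instead is a staged schedule: a long pre-training pass over D̂ alone, followed by fine-tuning on D, so there is no mixing ratio to tune. A rough sketch (the epoch counts and the training_step interface are hypothetical placeholders, not the paper's code):

def train_student(student, D, D_hat, pretrain_epochs=100, finetune_epochs=30):
    # Stage 1: long pre-training on the pseudo-labelled set D_hat only.
    for _ in range(pretrain_epochs):
        for batch in D_hat:
            student.training_step(batch)  # hypothetical training interface
    # Stage 2: shorter fine-tuning on the clean labelled set D only.
    for _ in range(finetune_epochs):
        for batch in D:
            student.training_step(batch)
    return student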

4F, 01/26 21:20: Got it, thanks~
Article code (AID): #1W3g_XsS (DataScience)