[Discussion] Teacher-Student Model Semi-supervised Learning
In "Billion-scale semi-supervised learning for image classification" (Facebook AI Research),
the authors give their reason for not training the student model on D and D-hat combined:
Remark: It is possible to use a mixture of data in D and D̂ for training like in previous approaches [34]. However, this requires for searching for optimal mixing parameters, which depend on other parameters. This is resource-intensive in the case of our large-scale training. Additionally, as shown later in our analysis, taking full advantage of large-scale unlabelled data requires adopting long pre-training schedules, which adds some complexity when mixing is involved.
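To make the remark concrete, here is a minimal sketch of what "mixing" would look like in practice. The function and the ratio `alpha` are hypothetical illustrations, not the paper's actual code: `alpha` is the kind of mixing parameter that would have to be tuned jointly with the learning-rate schedule and training length, which is what makes the search expensive at billion-image scale.

```python
import random

def mixed_batch(labeled, pseudo_labeled, batch_size, alpha):
    """Draw one training batch where a fraction `alpha` comes from the
    labeled set D and the remainder from the pseudo-labeled set D-hat.
    `alpha` is a hypothetical mixing hyper-parameter: its best value
    depends on the other training parameters, so it would need to be
    searched, which the paper argues is too costly at large scale."""
    n_labeled = round(alpha * batch_size)
    batch = random.sample(labeled, n_labeled)
    batch += random.sample(pseudo_labeled, batch_size - n_labeled)
    random.shuffle(batch)
    return batch

# Toy data standing in for D (labeled) and D-hat (pseudo-labeled).
D = [(f"img_l{i}", "label") for i in range(100)]
D_hat = [(f"img_u{i}", "pseudo") for i in range(1000)]

# With alpha=0.25 and batch_size=8, exactly 2 examples per batch are labeled.
batch = mixed_batch(D, D_hat, batch_size=8, alpha=0.25)
```

By contrast, the paper's two-stage recipe (long pre-training on D-hat alone, then fine-tuning on D) removes `alpha` entirely, so no such search is needed.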
I'm not sure what the first reason, "searching for mixing parameters," refers to.
As for the second reason: isn't D + D-hat already prepared before the student model is trained?
Why would mixing add complexity?
Thanks, everyone.
--
※ Origin: PTT (ptt.cc), from: 140.115.59.247 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/DataScience/M.1611575265.A.D9C.html