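[Question] Scraping web data with R
I want to scrape the table at http://www.twse.com.tw/ch/trading/fund/T86/T86.php
I checked the page source and confirmed the encoding is UTF-8, but the code below still returns garbled text.
[Software familiarity]:
Beginner (3 months of R; I went through Wush Wu's flipped-classroom course once and understand the concepts)
[Code example]: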
library(rvest)

# Read the page, declaring UTF-8 as the source encoding
page.html <- read_html("http://www.twse.com.tw/ch/trading/fund/T86/T86.php", encoding = "UTF-8")
# Select the table by its id; a CSS selector takes "tag#id", not the raw tag attributes
version.block <- html_node(page.html, "table#tbl-sortable-header")
html_text(version.block)
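
For reference, here is a minimal sketch of the same read/select/extract steps on a small in-memory page, so it runs without network access. The HTML string and its single sample row are made up for illustration and are not the real TWSE page. Only the table id `tbl-sortable-header` is taken from the code above. It assumes the rvest and xml2 packages are installed.

```r
library(xml2)
library(rvest)

# Hypothetical stand-in page: one table with the id used in the post.
# Non-ASCII text is written with \u escapes so the script's own file
# encoding does not matter.
html <- paste0(
  "<html><head><meta charset='utf-8'></head><body>",
  "<table id='tbl-sortable-header'>",
  "<tr><th>code</th><th>name</th></tr>",
  "<tr><td>2330</td><td>\u53f0\u7a4d\u96fb</td></tr>",
  "</table></body></html>"
)

# Parse from raw bytes and state the encoding explicitly.
doc <- read_html(charToRaw(html), encoding = "UTF-8")

# "#id" is the CSS selector for an element with that id.
node <- html_node(doc, "#tbl-sortable-header")

# html_table() turns the <table> node into a data frame,
# which is usually more convenient than html_text() for tabular data.
tbl <- html_table(node)
print(tbl)
```

If the real page builds its table with JavaScript or serves it from a different URL, the selector may match nothing, and that case would need a different approach than this sketch.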
【Output】:
> html_text(version.block)
[1] "\n\t\n \xe5阋\xe7哐霅桧驼鈭斗\x98猟\x89\u0080\n\n \n
ENGLISH\xc2í\xa0 \n \xe6鞒\xe6珻隤鏛í\xa0\n Twitter\n
Facebook\n Plurk\n \n\n \n \n 蝺栶\xb8簧瓱\xe6鹁\n
蝬脩\xab⒠琔\xe5\x9c\x96\n 蝬脩\xab⒡\xb1弡阔\n 霅桧驼蝺函
Ⅳ\n 霅桧驼閰霶\xbd\x99\n \n \n \xe7灜\xe9\x97\x9c菝
\x8b⒠像\xe5阋\n \xe5饬\xe9\x96鹑\xb3殴\xa8篑\xa7\u0080皜祉\xab\x99\n
\xe5腖\xe6珻撣弴\xb3癴ē撠\x8e\n 蝬脰楝鞈殴\xa8箫\x95疟
\xba\x97\n \xe6\x8a厠\xb3乐犖\xe7缷霅咹雯\n 敶梢耯\xe5嘘\xe6軽蝬
\xb2\n 蝚砌\xb8栉硅\xe8\x88桧ē銝剖\xbf阵像\xe5阋\n 鞎∪\x8b⒡
\xaf鯒\xbc侠暺鞇\u0080\x9a\n \n\t\t\t\n \n\n\t\n"
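
The `\xe5`, `\xe7` escapes mixed with unrelated CJK characters in the output above are consistent with UTF-8 bytes being interpreted in a different native encoding, though the post alone does not confirm that. The base-R sketch below shows only the general mechanism: the same bytes can carry no declared encoding or be marked as UTF-8. The helper `mark_utf8` is a hypothetical name introduced here, not part of any package, and whether it helps in this case is an assumption to verify on the affected machine.

```r
# UTF-8 bytes for the two characters U+8B49 U+5238, given as raw values
# so the example does not depend on how this script file is saved.
bytes <- as.raw(c(0xe8, 0xad, 0x89, 0xe5, 0x88, 0xb8))

x <- rawToChar(bytes)
enc_before <- Encoding(x)          # no encoding declared yet

# Declare the bytes as UTF-8; the bytes themselves are not changed.
Encoding(x) <- "UTF-8"
enc_after <- Encoding(x)

n_bytes <- nchar(x, type = "bytes")  # storage size
n_chars <- nchar(x, type = "chars")  # decoded characters
code_points <- utf8ToInt(x)          # Unicode code points

# Hypothetical helper: mark every element of a character vector as UTF-8.
mark_utf8 <- function(s) {
  Encoding(s) <- "UTF-8"
  s
}
```

A possible use would be `mark_utf8(html_text(version.block))`, followed by checking `Encoding()` and `validUTF8()` on the result before relying on it.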
--
※ Posted from: PTT (批踢踢實業坊, ptt.cc), from: 49.159.166.248
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1494510281.A.472.html