Re: [Question] What to feed into htmlParse

Board: R_Language | Author: oldjojotenya (舊舅舅) | Date: 2015/01/31 13:46 | Score: 0 (0 up, 0 down, 1 neutral)

1 comment, 1 participant; post 2/2 in this thread

Afterwards I tested two kinds of web pages.

I. A page with all the information on a single page: https://tw.stock.yahoo.com/d/s/company_2330.html

1.
    url<-"https://tw.stock.yahoo.com/d/s/company_2330.html"
    content0<-htmlParse(url)
Result: succeeded, but with the warning message: XML content does not seem to be XML

I then searched Stack Overflow, where someone had answered how to handle this situation: "You can use RCurl to fetch the content and then XML seems to be able to handle it". In other words, fetching the page with RCurl's getURL first should work.

2.
    url<-getURL("https://tw.stock.yahoo.com/d/s/company_2330.html")
    content1<-htmlParse(url)
Result: succeeded

3.
    url<-"https://tw.stock.yahoo.com/d/s/company_2330.html"
    f<-file(url)
    f_size<-file.info(url)$size
    content2<-readChar(f,f_size)
    close(f)
Result:
    Error in readChar(f, f_size) : cannot open the connection
    In addition: Warning message:
    In readChar(f, f_size) : unsupported URL scheme

II. A search page: http://www.taifex.com.tw/chinese/3/7_12_1.asp

1.
    url<-"http://www.taifex.com.tw/chinese/3/7_12_1.asp"
    content0<-htmlParse(url)
Result: succeeded

2.
    url<-getURL("http://www.taifex.com.tw/chinese/3/7_12_1.asp")
    content1<-htmlParse(url)
Result: succeeded

3.
    url<-"http://www.taifex.com.tw/chinese/3/7_12_1.asp"
    f<-file(url)
    f_size<-file.info(url)$size
    content2<-readChar(f,f_size)
    close(f)
Result:
    Error: invalid 'nchars' argument

I looked up how readChar is used: nchars must not be NA, yet for some reason the f_size passed in here is NA.

Summary:
1. Either way, getURL is the safer choice.
2. When file.info points to a local file, the size it returns is that file's size. When it points to a file on the web, however, it never gets the correct size (it always shows NA), so readChar cannot be used to fetch the page content. Could someone explain why this happens?

--
※ Posted from: PTT (ptt.cc), from: 112.105.245.56
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1422683212.A.3FB.html
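The NA size in summary point 2 can be reproduced offline. This is a minimal sketch in base R, assuming that file.info() only inspects the local filesystem. Under that assumption a URL string is treated as a local path that does not exist, so every field, including size, comes back NA, and readChar() then rejects NA as nchars. The temporary HTML file and the helper read_page() are hypothetical additions for illustration. read_page() is defined but not called because it needs network access. Note also that on older R versions the default url() method may not handle https at all, which would fit the "unsupported URL scheme" warning in test I.3.

```r
# Minimal sketch, base R only, runs offline.
# Assumption: file.info() only looks at the local filesystem, so a URL string
# is looked up as a local path that does not exist.
page_url <- "http://www.taifex.com.tw/chinese/3/7_12_1.asp"
url_size <- file.info(page_url)$size
print(url_size)  # NA: there is no local file with this name

# Hypothetical local copy of a page: here file.info() does report a size,
# so readChar() can read the whole file in one call.
tmp <- tempfile(fileext = ".html")
writeLines("<html><body><p>hello</p></body></html>", tmp)
local_size <- file.info(tmp)$size
local_text <- readChar(tmp, local_size)

# Passing NA as nchars reproduces the "invalid 'nchars' argument" error.
na_result <- tryCatch(readChar(tmp, NA_integer_), error = function(e) e)
print(conditionMessage(na_result))

# Hypothetical helper: read a page without knowing its size in advance by
# reading it line by line and joining the lines (needs network, not run here).
read_page <- function(u) {
  con <- url(u)
  on.exit(close(con))  # destroy the connection when the function exits
  paste(readLines(con, warn = FALSE), collapse = "\n")
}
```

With network access, read_page(page_url) would play roughly the same role as getURL() in test 2. Its result could then be passed to htmlParse() as in that test (possibly with asText = TRUE).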