[Question] Problem scraping data with a web crawler
[Question type]: Web scraping
[Software familiarity]: Beginner
[Problem description]:
I want to scrape the data from this page, but the result I end up with is NULL, so I am not sure whether my XPath is wrong.
Any pointers would be much appreciated; please let me know if more information is needed.
I have already googled the problem and tried different packages, but the result is the same, which is why I suspect the layer-by-layer tag path in my XPath is written incorrectly.
Updated code:
myheader <- c(
  "User-Agent" = "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A306 Safari/6531.22.7",
  "Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
  "Accept-Language" = "en-us",
  "Connection" = "keep-alive",
  "Accept-Charset" = "GB2312,utf-8;q=0.7,*;q=0.7"
)
# send the request with the custom headers and collect debug output
d <- debugGatherer()
get_url <- getURL(url, httpheader = myheader, debugfunction = d$update, verbose = TRUE)
# parse the returned HTML
get_url_parse <- htmlTreeParse(get_url, encoding = "UTF-8", error = function(...) {},
                               useInternalNodes = TRUE, trim = TRUE)
cat(d$value()[3])  # inspect part of the gathered debug information
# grab the target div and extract its text
node <- getNodeSet(get_url_parse, "//div[@class='page-content-wrapper']")
info <- sapply(node, xmlValue)
info
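A minimal diagnostic sketch (my own addition, assuming the get_url and get_url_parse objects created above): if the target text never shows up in the raw HTML, the table is most likely filled in by JavaScript after the page loads, and no XPath will find it in what getURL returns.

nchar(get_url)                              # how much HTML was actually downloaded?
grepl("page-content-wrapper", get_url)      # is the wrapper div present in the raw HTML at all?
writeLines(get_url, "eex_raw.html")         # save the raw HTML and inspect it in an editor
length(getNodeSet(get_url_parse, "//div"))  # sanity check that the parse itself worked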
[Code example]:
library(XML)
library(RCurl)

url <- "https://www.eex.com/en/market-data/environmental-markets/spot-market/european-emission-allowances#!/2017/01/04"
# download the page
get_url <- getURL(url, encoding = "UTF-8")
# parse the downloaded HTML
get_url_parse <- htmlParse(get_url, encoding = "UTF-8")
# walk the tag hierarchy down to the data area and extract its text
tablehead <- xpathSApply(get_url_parse,
                         "//div[@id='content']/section[@class='clearfix']/article[@id='market_data']/div[@id='content']/div/div/div/div",
                         xmlValue)
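For comparison, a hedged sketch of the same fetch using the rvest package (not part of the original attempt); if this also returns character(0), the table is almost certainly rendered client-side by JavaScript, and the data would have to be pulled from the site's underlying JSON/XHR endpoint instead of the static HTML.

library(rvest)
page <- read_html(url)    # fetch and parse in one step
hits <- html_nodes(page, xpath = "//div[@class='page-content-wrapper']")
html_text(hits)           # empty result suggests the content is not in the static HTML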
[Environment]:
macOS 10.12.2
[Keywords]:
--
※ Posted from: PTT (ptt.cc), from: 114.36.131.182
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1489680159.A.038.html
※ Edited by: ya32347844 (114.36.131.182), 03/18/2017 23:44:44