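[Question] Error in UseMethod("xml_find_all")
Hello everyone,
This is my first time writing a web scraper in R.
I have scraped other pages successfully,
but I cannot get this one page to work.
After many days of trying I am out of ideas,
so I would be grateful for any help from the board!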
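[Question type]:
Programming help (I want to do something in R but do not know how to write it in R)
[Software familiarity]:
Beginner (I have never programmed before; R is my first language)
[Problem description]: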
library(httr)
library(rvest)
etf_url <- "http://etfdb.com/etfdb-category/high-yield-bonds/"
First I wanted to scrape the table below, and tried several approaches:
# Definitive List Of High Yield Bonds ETFs
(I)
hyetf <- read_html(etf_url) %>%
html_nodes(xpath = '//*[@id="etfs"]') %>%
html_table()
==> hyetf is shown as list()
(II)
etf <- content(GET(etf_url), as = "text", encoding = "UTF-8")
hyetf <- etf %>%
html_nodes(xpath = '//*[@id="etfs"]') %>%
html_table()
hyetf
==> Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "character";
hyetf is again shown as list()
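A note on the error in (II): content(..., as = "text") returns a plain character string, and html_nodes() has no method for character input, which matches the message above. The sketch below reproduces this offline and shows that parsing the text with read_html() first avoids the error. The HTML string and the names page_text, direct_attempt, parsed and tables are made up for illustration; this is not the real etfdb.com page, and it does not by itself explain why approach (I) returns an empty list.

```r
library(rvest)  # re-exports read_html() from xml2 and the %>% pipe

# Hypothetical stand-in for the text returned by content(GET(url), as = "text")
page_text <- '<html><body><table id="etfs"><tr><th>Symbol</th><th>Name</th></tr><tr><td>AAA</td><td>Example Fund</td></tr></table></body></html>'

# Passing the character string straight to html_nodes() raises an error;
# tryCatch() captures the message instead of stopping the script
direct_attempt <- tryCatch(
  html_nodes(page_text, xpath = '//*[@id="etfs"]'),
  error = function(e) conditionMessage(e)
)

# Parsing first gives html_nodes() a document object it can search
parsed <- read_html(page_text)
tables <- parsed %>%
  html_nodes(xpath = '//*[@id="etfs"]') %>%
  html_table()
```

Here direct_attempt holds the error message as text, while tables is a list containing one data frame.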
(III)
I then tried selecting by the <table> tag instead, but no <table> nodes were found at all:
hyetftable <- read_html(etf_url) %>%
html_nodes("table")
==> {xml_nodeset (0)}
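Not ready to give up, I tried the simplest case I could think of, the main heading at the top of the page, and that failed too.
# Top-level page heading: High Yield Bond ETFs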
(I)
hyetf <- etf %>%
html_nodes(".mm-heading-no-top-margin") %>%
html_nodes("h1") %>%
html_text()
==> Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "character";
hyetf is shown as character(0)
(II)
hyetf <- read_html(etf_url) %>%
html_nodes(".mm-heading-no-top-margin") %>%
html_nodes("h1") %>%
html_text()
==> hyetf is still shown as character(0)
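Since read_html() runs without error yet finds no <table> and no <h1>, one thing worth checking is what HTML the server actually sent back. The following is only a diagnostic sketch under assumptions: summarise_html() and fetch_and_summarise() are hypothetical helper names, the "Mozilla/5.0" User-Agent string is an arbitrary choice, and whether this site serves different content to non-browser clients or builds the table with JavaScript is something I have not confirmed.

```r
library(httr)
library(rvest)

# Hypothetical helper: summarise what a block of HTML text actually contains
summarise_html <- function(page_text) {
  doc <- read_html(page_text)
  list(
    n_chars  = nchar(page_text),
    title    = html_text(html_nodes(doc, "title")),
    n_tables = length(html_nodes(doc, "table")),
    n_h1     = length(html_nodes(doc, "h1"))
  )
}

# Hypothetical helper: fetch a page with an explicit User-Agent, then summarise it
fetch_and_summarise <- function(url, ua = "Mozilla/5.0") {
  resp <- GET(url, user_agent(ua))
  page_text <- content(resp, as = "text", encoding = "UTF-8")
  c(list(status = status_code(resp)), summarise_html(page_text))
}

# Offline demonstration on a stand-in page that has a heading but no table
demo <- summarise_html(
  "<html><head><title>Demo</title></head><body><h1>High Yield Bond ETFs</h1></body></html>"
)
```

Calling fetch_and_summarise(etf_url) would then show the status code, the page title and the node counts for the real response. A very short page or an unexpected title would suggest the server returned something other than the page seen in the browser.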
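I have been trying for several weeks and really cannot tell what is missing. Any advice from the board would be greatly appreciated!
[Environment]: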
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)
locale:
[1] zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] httr_1.2.1 Rwordseg_0.2-1 ggplot2_2.2.0
[4] jiebaR_0.9.1 jiebaRD_0.1 tmcn_0.1-4
[7] tm_0.6-2 NLP_0.1-9 rJava_0.9-8
[10] rvest_0.3.2 xml2_1.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 magrittr_1.5
[3] munsell_0.4.3 colorspace_1.2-6
[5] R6_2.1.3 stringr_1.1.0
[7] plyr_1.8.4 tools_3.3.1
[9] parallel_3.3.1 grid_3.3.1
[11] gtable_0.2.0 selectr_0.3-0
[13] lazyeval_0.2.0 assertthat_0.1
[15] tibble_1.2 curl_2.1
[17] rsconnect_0.4.3 slam_0.1-40
[19] stringi_1.1.1 scales_0.4.1
[21] XML_3.98-1.5
--
※ Origin: PTT (ptt.cc), From: 49.214.198.211
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1486207270.A.8F8.html
※ Edited: k5171 (49.214.198.211), 02/06/2017 12:22:07