[Question] Scraping a web page

Board: R_Language  Author: david31408 (Hope)  Time: 2016/08/12 18:05


[Software familiarity]: Beginner (I have written other programs but am not familiar with the syntax)

[Problem description]:
Hi everyone, I'm new to R, so I've recently been practicing by trying to scrape data from baseballreference with the XML package. I'm very much a beginner, so I started by just trying things; the code and messages are below. Could it be that not every web page can be scraped with XML?

> library("XML", lib.loc="~/R/win-library/3.2")
> url <- "http://www.baseball-reference.com/leaders/H_career.shtml"
> Hits <- readHTMLTable(url)
Error in UseMethod("xpathApply") :
  no applicable method for 'xpathApply' applied to an object of class "NULL"

In the case above, I don't know why this error message appears, but my guess is that the page itself is not a table.

After that I tried a second approach:

> url <- "http://www.baseball-reference.com/leaders/H_career.shtml"
> x <- xmlParse(url)

The error message is as follows:

Specification mandate value for attribute itemscope
attributes construct error
Couldn't find end of Start Tag html line
Extra content at the end of the document
Error: 1: Specification mandate value for attribute itemscope
2: attributes construct error
3: Couldn't find end of Start Tag html line 1
4: Extra content at the end of the document

Maybe baseballreference blocks this kind of access?

Thanks in advance for the help, everyone :)

[Keywords]: MLB, XML

--
※ Origin: 批踢踢實業坊 (ptt.cc), From: 140.109.55.227
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1470996319.A.26D.html
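One possible reading of the xmlParse() error in the post: `itemscope` is an HTML5 attribute written without a value, which a strict XML parser rejects, whereas htmlParse() from the same XML package uses a lenient HTML parser. The sketch below only illustrates that difference and is not a confirmed diagnosis of the site. The inline `page` string, its made-up table rows, and the names `strict`, `doc`, and `hits` are all hypothetical, and no request is sent to baseball-reference.com.

```r
library(XML)

# Hypothetical stand-in for a real page. The valueless `itemscope`
# attribute mimics the markup named in the error message; the table
# rows are made-up placeholder data, not real statistics.
page <- '<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/WebPage">
<body>
<table id="leaders">
<tr><th>Rank</th><th>Player</th><th>Hits</th></tr>
<tr><td>1</td><td>Player A</td><td>300</td></tr>
<tr><td>2</td><td>Player B</td><td>250</td></tr>
</table>
</body>
</html>'

# xmlParse() expects well-formed XML, so the attribute without a value
# makes it fail; the error is caught here and turned into NULL.
strict <- tryCatch(xmlParse(page, asText = TRUE),
                   error = function(e) NULL)

# htmlParse() tolerates HTML-only constructs such as valueless attributes.
doc <- htmlParse(page, asText = TRUE)

# readHTMLTable() on a parsed document returns a list with one entry
# per <table>; take the first (and only) table.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
hits <- tables[[1]]

print(is.null(strict))
print(hits)
```

For a live page, the same htmlParse() and readHTMLTable() pair would be applied to the downloaded HTML text. How the page is fetched, and whether the site permits it, is outside this sketch.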