[Question] A question about a web crawler

Board: Python, Time: 2013/08/20 19:04, Push/boo: 0(004)
4 comments, 3 participants, latest thread 1/1
I previously wrote a crawler for the Yahoo dictionary and confirmed that it worked. When I re-ran it today, I hit a strange problem. The code is as follows:

    import urllib2
    from bs4 import BeautifulSoup

    req = urllib2.Request("http://tw.dictionary.yahoo.com/dictionary?p=good")
    html = urllib2.urlopen(req)
    htmls = html.read()
    html.close()
    soup = BeautifulSoup(htmls)  # the error is raised on this line

This is the error message:

    Traceback (most recent call last):
      File "<pyshell#29>", line 1, in <module>
        soup = BeautifulSoup(html)
      File "C:\Python26\lib\site-packages\bs4\__init__.py", line 168, in __init__
        self._feed()
      File "C:\Python26\lib\site-packages\bs4\__init__.py", line 181, in _feed
        self.builder.feed(self.markup)
      File "C:\Python26\lib\site-packages\bs4\builder\_lxml.py", line 72, in feed
        self.parser.close()
      File "parser.pxi", line 1110, in lxml.etree._FeedParser.close (src/lxml/lxml.etree.c:73063)
    XMLSyntaxError: no element found

What is going wrong here? Thanks.

--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 140.135.114.19
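
One thing that may be worth ruling out: the traceback shows `BeautifulSoup(html)`, while the listing passes `htmls`. A response object can only be read once, so a parser that is handed the already-read object may see an empty document, which is one possible way to end up with "no element found". The sketch below is Python 3 and only illustrates the read-once behaviour. `read_body_once` and `fake_response` are hypothetical names, and `io.BytesIO` stands in for the object returned by `urlopen` so that the example runs without network access.

```python
import io


def read_body_once(response):
    """Read a file-like response exactly once and reject an empty body."""
    body = response.read()
    if not body.strip():
        raise ValueError("empty response body, nothing to parse")
    return body


# io.BytesIO is a stand-in for the urlopen response (an assumption made
# so the sketch is self-contained).
fake_response = io.BytesIO(b"<html><body><p>good</p></body></html>")

first = read_body_once(fake_response)  # the full document
second = fake_response.read()          # the stream is already exhausted
```

Keeping the bytes in a variable and passing that variable to the parser avoids depending on the state of the response object.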

08/20 20:02, 1F
The HTML tags may be the problem: with lxml, fromstring raises an error, but HTML has no problem.

08/21 00:07, 2F
It runs fine for me on 2.7.

08/21 08:29, 3F
OK, I found it. Yahoo changed the tags on their side,

08/21 08:30, 4F
which is what caused the problem with the later tags. ^^ Thanks
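
Since the cause turned out to be a markup change on the site, a crawler can check up front that the tags it depends on are still present and report which ones are missing, instead of failing somewhere inside the parser. The following is a minimal Python 3 sketch of that idea. It uses the standard library's `html.parser` rather than bs4 so that it runs without extra packages; `TagCounter`, `missing_tags`, the two sample pages, and the tag names `ul` and `li` are all made up for illustration and are not Yahoo's actual markup.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each start tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def missing_tags(markup, expected):
    """Return the expected tag names that never appear in markup."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return [tag for tag in expected if tag not in counter.counts]


# Hypothetical "before" and "after" versions of a page.
old_page = "<html><body><ul><li>good</li></ul></body></html>"
new_page = "<html><body><div><span>good</span></div></body></html>"

missing_before = missing_tags(old_page, ["ul", "li"])
missing_after = missing_tags(new_page, ["ul", "li"])
```

A non-empty result is a hint that the site layout changed and the selectors need updating.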
Article ID (AID): #1I4qqYgt (Python)