[Question] How do I handle <br /> with lxml?

Board: Python | Author: (5566520) | Time: 9 years ago (2016/03/14 23:06), edited 9 years ago | Pushes/boos: 2 (2 0 3)
5 comments, 3 participants, latest thread 1/1
Hi everyone. I've recently been trying to write a web crawler and want to scrape this part of a page: http://imgur.com/rNdE4hh
Here is what I tried:

    # -*- coding: utf-8 -*-
    from urllib2 import urlopen
    import xml.etree.ElementTree as ET
    from lxml import etree
    import mechanize
    import sys

    url = "http://www.tham.com.tw/recipe6.php"
    path = "//*[@id=\"left-inner\"]/div[2]/div[3]"
    html = urlopen(url).read()
    tree = etree.HTML(html)
    startindex = 4
    data = tree.xpath(path)
    print data[0].text

Output:

    >>> ================================ RESTART ================================
    >>>
    材料 2人份
    >>>

Only the first line ("Ingredients, serves 2") is printed. Looking at the page source, my guess is that the <br /> tags cut the text off. Is there a way around this?

--
※ Sent from: PTT (ptt.cc), from: 123.195.222.114
※ Article URL: https://www.ptt.cc/bbs/Python/M.1457968017.A.79E.html

03/15 00:37, 1F
Try //*[@id=\"left-inner\"]/div[2]/div[3]//text()
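A minimal sketch of why the //text() suggestion helps. In lxml, an element's .text holds only the text before its first child element, and the text that follows each <br /> is stored on that <br /> element's .tail. The SAMPLE markup below is a made-up stand-in, not the real page from the post, and it uses div[1] rather than the poster's div[2]/div[3].

```python
from lxml import etree

# Hypothetical fragment standing in for the recipe block; the real page differs.
SAMPLE = """
<div id="left-inner">
  <div class="recipe">Ingredients (serves 2)<br />2 eggs<br />100 ml milk</div>
</div>
"""

tree = etree.HTML(SAMPLE)
div = tree.xpath('//*[@id="left-inner"]/div[1]')[0]

# .text stops at the first child element, here the first <br />.
first_only = div.text

# The text after each <br /> lives on that <br /> element's .tail.
tails = [br.tail for br in div.iter("br")]

# .//text() returns every text node below the element, in document order.
all_text = [t.strip() for t in div.xpath(".//text()") if t.strip()]

# itertext() walks the same text nodes without using XPath.
joined = "\n".join(s.strip() for s in div.itertext() if s.strip())

print(first_only)
print(joined)
```

Either .//text() or itertext() should work here. The XPath form fits the one-line change suggested in the comment, while itertext() is convenient when the element is already in hand.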

03/15 19:43, 2F
Thanks, that solved it.
One more question, please: how do I debug the XPath part? Are there any tricks? The output below also looks wrong.

    # -*- coding: utf-8 -*-
    from urllib2 import urlopen
    import xml.etree.ElementTree as ET
    from lxml import etree
    import mechanize
    import sys

    url = "https://icook.tw/recipes/133425"
    html = urlopen(url).read()
    tree = etree.HTML(html)
    path = "//*[@id=\"recipes_show\"]/div[3]"
    title = tree.xpath(path)
    print title

Output:

    >>> []

※ Edited: girl5566 (123.195.222.114), 03/15/2016 20:24:59

03/16 20:18, 3F
path = "//*[@itemprop=\"name\"]"

03/16 20:19, 4F
print title[0].text

03/16 20:19, 5F
Your XPath is selecting the wrong thing.
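On the debugging question, one rough approach is to evaluate progressively longer prefixes of the failing path and see where the match count drops to zero, then switch to a more stable hook such as the itemprop attribute suggested above. The sketch below assumes a made-up SAMPLE document, and count_matches is a hypothetical helper, not part of lxml. It is also worth printing the raw html once, since the markup a script receives can differ from what the browser's inspector shows.

```python
from lxml import etree

# Hypothetical markup; the real icook.tw page is not reproduced here.
SAMPLE = """
<html><body>
  <div id="recipes_show">
    <div class="header"><h1 itemprop="name">Sample recipe</h1></div>
    <div class="body">Steps</div>
  </div>
</body></html>
"""

def count_matches(tree, paths):
    """Hypothetical helper: pair each XPath with the number of nodes it matches."""
    return [(p, len(tree.xpath(p))) for p in paths]

tree = etree.HTML(SAMPLE)

# Grow the path one step at a time to find the step that stops matching.
candidates = [
    '//*[@id="recipes_show"]',
    '//*[@id="recipes_show"]/div',
    '//*[@id="recipes_show"]/div[3]',
    '//*[@itemprop="name"]',
]
report = count_matches(tree, candidates)
for path, n in report:
    print(n, path)

# Selecting by attribute does not depend on how many sibling divs exist.
title = tree.xpath('//*[@itemprop="name"]')[0].text
print(title)
```

In this sample the div[3] step matches nothing because the container has only two child divs, which is the kind of mismatch that yields the empty list in the post. Positional paths copied from a browser tend to break this way, so attribute-based selectors are usually the safer choice.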
Article ID (AID): #1MvjEHUU (Python)