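[Question] Parsing HTML with lxml
I'm trying to parse some HTML with lxml, but I'm stuck on one part that I can't figure out.
Below is the HTML I want to parse. I need to extract the strings inside each <p> tag separately.
The trouble starts at Author: I can get the text of the <strong> tag, but I can't get the text that follows it.
I'm not sure where my handling goes wrong, so any pointers would be appreciated.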
<div class="bookinfo">
<h2>C Language Teaching Manual 4th Edition (Traditional Chinese Edition)</h2>
<p><strong>ISBN-13:</strong> <a href="/isbn/9789574424849">9789574424849</a></p>
<p><strong>ISBN-10:</strong> <a href="/isbn/9574424847">9574424847</a></p>
<p><strong>Author:</strong> HongWeiEn</p>
<p><strong>Binding:</strong> Paperback</p>
<p><strong>Publisher:</strong> QiBiaoChuBanGuFenYouXianGongSi</p>
<p><strong>Published:</strong> December 2007</p>
</div>
Here is the code:
# ISBN-13, ISBN-10, Author, Binding, Publisher, Published
book_info = inforoot[0].xpath('p')
for info in book_info:
    print(info.xpath('strong')[0].text)
    if info.xpath('a'):
        print(info.xpath('a')[0].text)
    else:
        # Problem line: info[0] is the <strong> element, so this prints its label again
        print(info[0].text)
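
A possible explanation, offered as a sketch rather than a definitive answer: in the ElementTree model that lxml follows, text that comes after a child's closing tag is stored on that child's `.tail` attribute, not on the child's `.text` or on the parent. So " HongWeiEn" would live in the `.tail` of the <strong> element. The example below parses the snippet from this post directly; the names `SNIPPET` and `extract_book_info` are made up for illustration, and the standard-library fallback import is only there so the sketch runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the post
except ImportError:
    # Standard-library fallback; it shares the same .text/.tail model.
    import xml.etree.ElementTree as etree

# The HTML from the post (well-formed, so an XML parser accepts it).
SNIPPET = """<div class="bookinfo">
<h2>C Language Teaching Manual 4th Edition (Traditional Chinese Edition)</h2>
<p><strong>ISBN-13:</strong> <a href="/isbn/9789574424849">9789574424849</a></p>
<p><strong>ISBN-10:</strong> <a href="/isbn/9574424847">9574424847</a></p>
<p><strong>Author:</strong> HongWeiEn</p>
<p><strong>Binding:</strong> Paperback</p>
<p><strong>Publisher:</strong> QiBiaoChuBanGuFenYouXianGongSi</p>
<p><strong>Published:</strong> December 2007</p>
</div>"""


def extract_book_info(markup):
    """Return (label, value) pairs, one per <p> in the bookinfo block."""
    root = etree.fromstring(markup)
    pairs = []
    for p in root.findall('p'):
        strong = p.find('strong')
        link = p.find('a')
        label = strong.text.strip()
        if link is not None:
            # ISBN rows: the value is the text of the <a> element.
            value = link.text
        else:
            # Other rows: the text after </strong> is stored in strong.tail.
            value = strong.tail
        pairs.append((label, (value or '').strip()))
    return pairs


book_info = extract_book_info(SNIPPET)
for label, value in book_info:
    print(label, value)
```

In the original loop, the equivalent change would be to read `info[0].tail` instead of `info[0].text` in the `else` branch. Joining `p.itertext()` is another option if the whole line of text is wanted in one string.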
--
There are too many people to thank, so I'll just thank Heaven
There is too much code to fix, so I'll just fix it another day
--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 220.134.71.154