[Question] How to extract strings from an HTML page
This is my first time using Python to scrape data from a web page.
My HTML file looks like this:
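{{datas.0}} {{datas.1}} {{datas.2}}<br>
Filled in with data, where (0), (1), (2) mark which template field each value comes from:
C1I230(0) 你好(1) 0(2)   (the separator between fields is a space)
466940(0) 我好(1) 0(2)   (<br> is a line break)
In the browser the page shows:
C1I230 你好 0
466940 我好 0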
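In Python I use:
import urllib  # Python 2; in Python 3 this is urllib.request.urlopen

urltmp = urllib.urlopen("http://localhost:8080/test")
urluse = urltmp.readlines()  # list of lines; each keeps its trailing "\n"
for i in urluse:
    print i
I'm not sure this approach is right (I found it on Google).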
What I get is:
C1I230 你好 0<br>
466940 我好 0<br>
(with an extra blank line after every line)
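The extra blank lines most likely come from readlines(): every string it returns keeps its own "\n", and print then adds another one. A minimal Python 3 sketch, using a hypothetical helper strip_line_endings and sample lines typed in by hand rather than fetched (in Python 2, print i.rstrip('\n') has the same effect):

```python
from typing import List


def strip_line_endings(lines: List[str]) -> List[str]:
    """Drop the trailing newline that readlines() leaves on every line."""
    # rstrip("\r\n") removes "\n" or a Windows-style "\r\n",
    # but keeps other trailing text such as "<br>".
    return [line.rstrip("\r\n") for line in lines]


# Sample lines shaped like the output above (assumed, not fetched).
sample = ["C1I230 你好 0<br>\n", "466940 我好 0<br>\n"]
for line in strip_line_endings(sample):
    print(line)  # one line each, no blank line in between
```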
I checked, and the type is string.
So I tried i.split(), and disaster struck... (split(' ') gives the same result):
['\xef\xbb\xbfC1I230', '\xe4\xb9\x9d\xe4\xbb\xbd\xe4\', '0<br>\n']
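The '\xef\xbb\xbf' glued to the first field is the UTF-8 encoding of U+FEFF, the byte order mark (BOM) that some editors write at the start of a file. The other \x.. escapes are how Python 2 displays the undecoded UTF-8 bytes of Chinese text when printing a list (its repr), so they do not by themselves mean the data is corrupted. Decoding with the "utf-8-sig" codec removes the BOM; the same codec name also works with str.decode in Python 2. A minimal Python 3 sketch with a hypothetical helper decode_page and sample bytes built by hand:

```python
def decode_page(raw: bytes) -> str:
    """Decode UTF-8 page bytes, dropping a leading byte order mark (BOM) if present."""
    # "utf-8-sig" behaves like "utf-8" but strips a leading b"\xef\xbb\xbf".
    return raw.decode("utf-8-sig")


# Sample bytes as the server might send them: a BOM, then one row (assumed).
raw = b"\xef\xbb\xbf" + "C1I230 你好 0<br>\n".encode("utf-8")
print(raw.split())               # bytes: BOM stuck to the first field, Chinese shown as \x.. escapes
print(decode_page(raw).split())  # ['C1I230', '你好', '0<br>']
```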
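Is this an encoding problem?
Is the blank line in between affected by this too?
How can I get two lists: ['C1I230','你好','0'] and ['466940','我好','0']?
Any help from the experts here would be much appreciated...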
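One possible approach, sketched in Python 3: decode the whole page once, split it on <br> into rows, then split each row on whitespace. The note that the separator between fields is a space suggests the template may emit &nbsp;, but that is an assumption, so this sketch accepts a plain space, a literal "&nbsp;" entity, or the no-break character U+00A0. The helpers parse_rows and fetch_rows are hypothetical names, and the sample string stands in for the real page:

```python
import re
from typing import List
from urllib.request import urlopen


def parse_rows(text: str) -> List[List[str]]:
    """Turn page text into one list of fields per <br>-terminated row.

    Assumes fields are separated by whitespace: a plain space, a literal
    "&nbsp;" entity, or the no-break character U+00A0.
    """
    text = text.lstrip("\ufeff")        # drop a BOM if one is still there
    text = text.replace("&nbsp;", " ")  # treat &nbsp; entities as spaces
    rows = []
    for chunk in re.split(r"<br\s*/?>", text, flags=re.IGNORECASE):
        fields = chunk.split()          # str.split() also splits on U+00A0 and "\n"
        if fields:                      # skip empty pieces, e.g. after the last <br>
            rows.append(fields)
    return rows


def fetch_rows(url: str) -> List[List[str]]:
    """Download a page and parse it (hypothetical helper; needs the server running)."""
    with urlopen(url) as resp:
        return parse_rows(resp.read().decode("utf-8-sig"))


sample = "\ufeffC1I230 你好 0<br>\n466940 我好 0<br>\n"
print(parse_rows(sample))  # [['C1I230', '你好', '0'], ['466940', '我好', '0']]
```

Against the local server this would be called as fetch_rows("http://localhost:8080/test"). The regex is only meant for template output this simple; if the real page wraps the rows in other markup, an HTML parser (the standard html.parser module, or BeautifulSoup) would be more robust.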
--
※ Posted from: PTT (批踢踢實業坊, ptt.cc)
◆ From: 114.33.86.65
→ Replies 1F to 23F: 10/02 21:22 to 10/04 16:25