[Question] Crawler problem (Error 403, 500)

Board: Python | Author: kiwistar (暴風雪之喀秋莎) | Time: 2018/04/27 02:57 | Score: 1 (1 push, 0 boo, 6 →)

7 comments, 2 participants, thread 1/1

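https://ideone.com/9pNQ0X

Following the course instructions, I wrote a simple crawler. The original example used a Google Finance URL for the demonstration, but Google Finance seems to have changed how it serves its pages, and requesting the URL the original way returns HTTP Error 403: Forbidden. I switched to a product page on the TAAZE (讀冊) bookstore and got HTTP Error 500: Internal Server Error instead. https://i.imgur.com/UZSSgQ1.jpg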

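I added a try/except block:

    import urllib.request
    from urllib.error import HTTPError

    # url is defined earlier in the script
    try:
        data = urllib.request.urlopen(url).read()
        data1 = data.decode('utf-8')
    except HTTPError as e:
        # the error response still has a body; read it to see what the server sent
        content = e.read()
        print(content)

I copied the printed text and viewed it in a browser: https://i.imgur.com/JpbFiqM.jpg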
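Besides the body, an HTTPError also carries the status code, the reason phrase, and the response headers, which can hint at why the server refused the request. A minimal sketch of collecting them follows; the helper name `describe_http_error` and the hand-built error object are illustrative assumptions, not part of the original script.

```python
import io
import urllib.error
from email.message import Message


def describe_http_error(err: urllib.error.HTTPError, limit: int = 200) -> dict:
    """Collect the parts of an HTTPError that help explain a failed request."""
    body = err.read()  # the error page the server sent back, as bytes
    return {
        "status": err.code,                    # numeric status, e.g. 403 or 500
        "reason": err.reason,                  # reason phrase, e.g. "Forbidden"
        "server": err.headers.get("Server"),   # None if the header is absent
        "body_preview": body[:limit].decode("utf-8", errors="replace"),
    }


# Hand-built error (assumed values) so the sketch runs without network access.
headers = Message()
headers["Server"] = "example-server"
fake = urllib.error.HTTPError(
    "https://example.com/", 500, "Internal Server Error",
    headers, io.BytesIO(b"<html>oops</html>"),
)
info = describe_http_error(fake)
print(info["status"], info["reason"])
```

In a real script the same helper would be called inside the `except HTTPError as e:` branch with `e` as its argument.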

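Opening the pages directly in a browser works fine, so why does fetching them with urllib cause so many problems? I tried several sites:

Google Finance returns HTTP Error 403
taaze.tw returns HTTP Error 500
flickr.com finally worked, and I could download an image

But if two out of three sites fail in ordinary use, this thing is clearly unusable as it stands. Did I forget something? Or how can I improve the code? Thanks, everyone.

--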
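One common cause worth ruling out (an assumption here, not something this thread confirms for these sites): by default urllib identifies itself with a `Python-urllib/x.y` User-Agent, and some servers reject or mishandle requests that do not look like they come from a browser. A minimal sketch that sends a browser-like header; the names `build_request` and `fetch` and the User-Agent string itself are illustrative.

```python
import urllib.request

# Illustrative browser-like value (an assumption); any realistic string would do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page as text; raises HTTPError or URLError on failure."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Example call (needs network access, so it is left commented out):
# html = fetch("https://www.taaze.tw/sing.html?pid=11100843681")
# print(html[:200])
```

If the status code stays the same with the header set, the cause lies elsewhere (cookies, redirects, or server-side changes), and the error body and headers are the next thing to inspect.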

--
※ Origin: 批踢踢實業坊 (ptt.cc), From: 123.194.179.102
※ Article URL: https://www.ptt.cc/bbs/Python/M.1524769075.A.36A.html

push kenduest 04/27 11:06 (1F)
→ kenduest 04/27 11:06 (2F)
→ kenduest 04/27 11:07 (3F)
→ kenduest 04/27 11:11 (4F)
→ kenduest 04/27 11:11 (5F)
→ coeric 04/27 11:14 (6F)

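Uh, I don't know why, but when I click through the link I posted, the URL looks different from the one on the edit page. The link I meant to post is https://www.taaze.tw/sing.html?pid=11100843681 but ideone automatically masks my URL, and I don't know why orz

※ Edited: kiwistar (123.194.179.102), 04/28/2018 16:59:48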

→ kenduest 04/29 05:38 (7F)

Article ID (AID): #1QuY4pDg (Python)