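[Question] Problem fetching web pages
Hi everyone,
I recently started learning Python and wanted to write a simple program that fetches web pages.
I installed Python 3.1.1 and used the urllib module. My test script is below: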
--------------------------------------------------------
import urllib.request
url="http://google.com"
MyWeb=urllib.request.urlopen(url)
WebContent=MyWeb.read()
MyWeb.close()
print(WebContent)
--------------------------------------------------------
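I found that for pages that are easy to fetch, such as http://google.com,
the content is downloaded correctly. But for some sites, such as http://www.wretch.cc/,
running the script produces the following message: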
Traceback (most recent call last):
  File "html2spec.py", line 6, in <module>
    MyWeb=urllib.request.urlopen(url)
  File "C:\Python31\lib\urllib\request.py", line 119, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python31\lib\urllib\request.py", line 353, in open
    response = meth(req, response)
  File "C:\Python31\lib\urllib\request.py", line 465, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python31\lib\urllib\request.py", line 391, in error
    return self._call_chain(*args)
  File "C:\Python31\lib\urllib\request.py", line 325, in _call_chain
    result = func(*args)
  File "C:\Python31\lib\urllib\request.py", line 473, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
Why does this happen? Is there a different approach I should be using?
Thanks for any help.
--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 59.120.3.16
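
One way to look into the 401 in the traceback above is to catch urllib.error.HTTPError and print the status and response headers instead of letting the script crash. The sketch below also sends a browser-like User-Agent header. That part is only a guess: nothing in this thread confirms that the site rejects Python's default User-Agent, and a 401 can just as well mean the server really does want credentials. The names build_request, fetch, and BROWSER_UA are made up for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent string (an assumption, not a required value).
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1) Python-urllib-test"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Fetch url and return the raw bytes, or None if the server answers with an HTTP error."""
    try:
        response = urllib.request.urlopen(build_request(url))
    except urllib.error.HTTPError as err:
        # err.code is the numeric status (401 in the traceback above);
        # err.headers may hint at what the server expects from the client.
        print("HTTP error:", err.code, err.msg)
        print(err.headers)
        return None
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch("http://www.wretch.cc/") would then either return the page bytes or print the error details and return None. If the headers show a WWW-Authenticate line, the User-Agent guess is probably wrong and the page needs a login.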