[Question] Fetching and parsing a Chinese web page

Board: PHP  Author: (as)  Time: 2010/04/10 15:22  Push/boo: 1(102)
3 comments, 2 participants, thread 1/1
Below is a simple script that pulls a news story out of a page. The same logic works on English pages but fails on this 聯合報 (udn.com) page. Could someone please take a look? Thanks in advance.

<?php
$search_url = "http://udn.com/NEWS/WORLD/WOR3/5528622.shtml";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $search_url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTBDFff GTB7.0 (.NET CLR 2.0.50727)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: zh-tw", "Accept-Charset: utf-8"));
$content = curl_exec($ch);
curl_close($ch);
echo $content."\n";

$pattern = "/(<div class=\"story\" id=\"story\">)(.*?)(<\/div>)/";
echo $pattern."\n";
preg_match($pattern, $content, $matches);
print_r($matches);
?>

--
For want of a nail the shoe was lost,
for want of a shoe the horse was lost,
for want of a horse the knight was lost,
for want of a knight the battle was lost,
for want of a battle the kingdom was lost.
--
※ Origin: 批踢踢實業坊 (ptt.cc) ◆ From: 24.6.21.35

04/10 15:59, 1F
Change (.*?) to ((?:.|\n)*?)

04/10 16:01, 2F
'.' matches anything EXCEPT '\n'

04/10 16:35, 3F
Thanks :D
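
For anyone landing here later, the fix from the comments can be checked offline with a small sketch like the one below. The $content string is a made-up stand-in for the fetched page (the real udn.com markup is not reproduced here); the only assumption is that the story <div> spans several lines, which is what the comments imply. The third pattern uses the PCRE "s" (DOTALL) modifier, which was not mentioned in the thread but should behave the same way as the suggested alternation.

```php
<?php
// Hypothetical stand-in for the downloaded page: the story <div> spans several lines.
$content = "<html><body>\n"
         . "<div class=\"story\" id=\"story\">\n"
         . "first paragraph\n"
         . "second paragraph\n"
         . "</div>\n"
         . "</body></html>";

// Pattern from the original post: '.' does not match "\n",
// so (.*?) can never reach </div> on a later line.
$original = '/(<div class="story" id="story">)(.*?)(<\/div>)/';

// Fix suggested in comment 1F: allow either any character or a newline.
$alternation = '/(<div class="story" id="story">)((?:.|\n)*?)(<\/div>)/';

// Alternative (not from the thread): the "s" modifier lets '.' match "\n" as well.
$dotall = '/(<div class="story" id="story">)(.*?)(<\/div>)/s';

$hitOriginal    = preg_match($original, $content, $m0);
$hitAlternation = preg_match($alternation, $content, $m1);
$hitDotall      = preg_match($dotall, $content, $m2);

var_dump($hitOriginal);     // match count for the original pattern
var_dump($hitAlternation);  // match count for the pattern from 1F
var_dump($hitDotall);       // match count for the DOTALL variant

// Group 2 holds the story text, including its line breaks.
if ($hitDotall === 1) {
    echo trim($m2[2]), "\n";
}
```

If the page really is laid out across several lines, either of the last two patterns should capture the story text in $matches[2], while the original pattern returns no match.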
Article ID (AID): #1Bm2TFDu (PHP)