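[Question] Fetching and parsing a Chinese web page
Below is a simple program that fetches a news article and extracts the story text.
The same logic works on English pages,
but it fails on this United Daily News (聯合報) page: preg_match finds no match.
Could anyone help take a look?
Thanks in advance.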
<?php
// Article page to fetch
$search_url = "http://udn.com/NEWS/WORLD/WOR3/5528622.shtml";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $search_url);
// Desktop Firefox user agent, kept on one line so no newline ends up inside the header
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTBDFff GTB7.0 (.NET CLR 2.0.50727)');
// Return the response body as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: zh-tw", "Accept-Charset: utf-8"));
$content = curl_exec($ch);
curl_close($ch);
echo $content."\n";

// Capture everything between the story <div> and the next </div>
$pattern = "/(<div class=\"story\" id=\"story\">)(.*?)(<\/div>)/";
echo $pattern."\n";
preg_match($pattern, $content, $matches);
print_r($matches);
?>
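
One possible cause, assuming the story <div> on the UDN page spans several lines (I have not verified this against the real page): without the "s" modifier, "." in a PCRE pattern does not match newlines, so (.*?) can never reach the closing </div>. The sketch below shows the difference on a made-up sample string. The markup in $sample and the helper name extract_story are hypothetical, not taken from the actual page.

```php
<?php
// Hypothetical sample markup standing in for the fetched page (not the real UDN HTML).
// The story <div> spans several lines, as article bodies usually do.
$sample = "<html><body>\n"
        . "<div class=\"story\" id=\"story\">\n"
        . "<p>First paragraph.</p>\n"
        . "<p>Second paragraph.</p>\n"
        . "</div>\n"
        . "</body></html>";

// Hypothetical helper: returns the inner HTML of the story <div>, or null if not found.
function extract_story(string $html): ?string
{
    // The trailing "s" (PCRE_DOTALL) lets "." match newlines as well.
    $pattern = '/<div class="story" id="story">(.*?)<\/div>/s';
    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

// Same pattern without "s": "." stops at the first newline, so nothing matches.
$without_s = preg_match('/<div class="story" id="story">(.*?)<\/div>/', $sample);

$story = extract_story($sample);
echo "matches without s: $without_s\n";
echo $story, "\n";
```

The lazy (.*?) still stops at the first </div>, so a nested <div> inside the story would cut the match short. Separately, if the page turns out to be served in Big5 rather than UTF-8 (an assumption), converting $content with mb_convert_encoding before matching any Chinese text may also be needed.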
--
For want of a nail the shoe was lost,
for want of a shoe the horse was lost,
for want of a horse the knight was lost,
for want of a knight the battle was lost,
for want of a battle the kingdom was lost.
--
※ Posted from: PTT (批踢踢實業坊, ptt.cc)
◆ From: 24.6.21.35