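[Question] XML: how to avoid capturing the same tag over and over
I only want to grab the first title under channel in an RSS feed, and none of the titles after it.
I'm using the XML parser for this, but it keeps going and grabs every title until the feed ends.
How can I solve this?
Here is my code: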
-------------------------------------------------------------------------
<?php
$rssFeeds = array ('http://feeds.feedburner.com/Techcrunch');
//Loop through the array, reading the feeds one by one
foreach ($rssFeeds as $feed) {
    readFeeds($feed);
}
function startElement($xp, $name, $attributes) {
    global $item, $currentElement;
    // Remember the element name so the other handlers know what we're parsing
    $currentElement = $name;
    // By default PHP converts element names to uppercase
    if ($currentElement == 'CHANNEL') {
        // This flag records that we are inside <channel>. It stays true for
        // everything nested in it, including every <item>.
        $item = true;
    }
}
function endElement($xp, $name) {
    global $item, $currentElement, $title, $description, $link;
    if ($name == 'CHANNEL') {
        // At the end of the channel element, display
        // the data, and reset the globals
        $title = iconv("UTF-8", "BIG-5", $title);
        //echo "<b>Title:</b> $title<br>";
        $description = iconv("UTF-8", "BIG-5", $description);
        //echo "<b>Description:</b> $description<br>";
        //echo "<b>Link:</b> $link<br><br>";
        $title = '';
        $description = '';
        $link = '';
        $item = false;
    }
}
function characterDataHandler($xp, $data) {
    global $item, $currentElement, $title, $description, $link;
    if ($item) {
        // Only add to the globals while the flag is set (inside <channel>).
        switch ($currentElement) {
            case "TITLE":
                // We use .= because this function may be called multiple
                // times for one element.
                $title .= $data;
                break;
            case "DESCRIPTION":
                $description .= $data;
                break;
            case "LINK":
                $link .= $data;
                break;
        }
    }
}
function readFeeds($feed) {
    // Open the feed for reading
    $fh = @fopen($feed, 'r');
    if (!$fh) {
        return 'Error opening the feed';
    }
    // Create an XML parser
    $xp = xml_parser_create();
    // Define which functions to call when an element starts/ends
    xml_set_element_handler($xp, "startElement", "endElement");
    xml_set_character_data_handler($xp, "characterDataHandler");
    while ($data = fread($fh, 4096)) {
        // The third argument tells the parser when the last chunk arrives
        if (!xml_parse($xp, $data, feof($fh))) {
            fclose($fh);
            return 'Error in the feed';
        }
    }
    fclose($fh);
}
?>
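
One possible way to get only the first title under channel, offered as a sketch rather than a definitive fix: keep a stack of the currently open element names, accept character data only when the open element is TITLE and its direct parent is CHANNEL, and set a flag once that title closes so later titles are ignored. The names `firstChannelTitle`, `$stack`, and `$done` are hypothetical and not part of the code above, and the sample feed string is made up for illustration.

```php
<?php
// Sketch: return the text of the first <title> that is a direct child of <channel>.
// Assumes the parser's default case folding, so element names arrive in uppercase.
function firstChannelTitle($xml)
{
    $stack = array();  // names of the elements that are currently open
    $title = '';       // text collected for the first channel title
    $done  = false;    // becomes true once that title has been closed

    $xp = xml_parser_create();

    xml_set_element_handler(
        $xp,
        // Start handler: push the element name onto the stack
        function ($xp, $name, $attributes) use (&$stack) {
            $stack[] = $name;
        },
        // End handler: mark the search as finished when <channel><title> closes
        function ($xp, $name) use (&$stack, &$done) {
            $depth = count($stack);
            if (!$done && $name === 'TITLE' && $depth >= 2
                && $stack[$depth - 2] === 'CHANNEL') {
                $done = true;
            }
            array_pop($stack);
        }
    );

    xml_set_character_data_handler(
        $xp,
        // Collect text only while inside the first <channel><title>
        function ($xp, $data) use (&$stack, &$title, &$done) {
            $depth = count($stack);
            if (!$done && $depth >= 2
                && $stack[$depth - 1] === 'TITLE'
                && $stack[$depth - 2] === 'CHANNEL') {
                $title .= $data;  // may be called several times for one element
            }
        }
    );

    $ok = xml_parse($xp, $xml, true);

    // Return an empty string when the document is not well-formed
    return $ok ? trim($title) : '';
}

// Made-up sample feed: the item and image titles must be skipped
$sample = '<rss version="2.0"><channel><title>Feed Title</title>'
        . '<image><title>Logo</title></image>'
        . '<item><title>Post 1</title></item>'
        . '<item><title>Post 2</title></item>'
        . '</channel></rss>';

echo firstChannelTitle($sample);  // prints "Feed Title"
```

Checking the parent element rather than only a "seen one title" flag also skips titles nested in other children of channel, such as image. The parser still reads the rest of the document; it just stops collecting text after the first match.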
--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 123.195.65.53