[Question] Extracting Discovery News headlines

Board: RegExp (Regular Expression), Author: (fire), Time: 12 years ago (2012/07/29 20:53), Push/Boo: 0 (002)
2 comments, 1 participant, latest thread 1/1
hi all:

I need to extract the news headlines from Discovery News:
http://news.discovery.com/earth/

Here is part of the page source:

=====================begin=====================
<dl class="asset-items clear clearfix">
<dd class="details">
<h2 class="title"><a href="http://news.discovery.com/earth/us-fire-map-hot-spots-120727.html" onclick="componentClickTracking.build(this, {title:'topic landing',name:'channel module',location:'1',position:'1'}); return false;">Hottest Spots in America: Big Pic </a> </h2>
<p class="source">Posted Fri Jul 27, 2012 05:57 AM ET &#160;&#160;|&#160;&#160; <span class="js-kit-comments-count comment-bubble" id="count-fe5a23fb-12e7-4008-acf3-9cd954fe4ade" uniq="/fe5a23fb-12e7-4008-acf3-9cd954fe4ade" exclude-sources="Digg, FriendFeed, Twitter">0</span></p>
<p class="description">Armed with NASA satellite data, a clever data visualization expert has produced a hotspots map of all major fires in the contiguous US from 2001 through early July 2012. <a rel="nofollow" href="http://news.discovery.com/earth/us-fire-map-hot-spots-120727.html" onclick="componentClickTracking.build(this, {title:'topic landing',name:'channel module',location:'1',position:'3'}); return false;"><strong> Read&#160;more </strong></a></p>
</dd>
<dd class="thumbnail"><a href="http://news.discovery.com/earth/us-fire-map-hot-spots-120727.html" onclick="componentClickTracking.build(this, {title:'topic landing',name:'channel module',location:'1',position:'4'}); return false;"><img src="/earth/2012/07/27/firemap-278.jpg" title="fire map" alt="fire map" class="" /></a></dd>
</dl>
<dl class="asset-items clear clearfix">
<dd class="details">
<h2 class="title"><a href="http://news.discovery.com/earth/dead-lawn-paint-it-green-dnews-nugget-.html" onclick="componentClickTracking.build(this, {title:'topic landing',name:'channel module',location:'2',position:'1'}); return false;">Dead Lawn? Paint it Green: DNews Nugget</a> </h2>
<p class="source">Posted by &#160;<a href="/contributors/christina-reed/" onclick="componentClickTracking.build(this, {title:'topic landing',name:'channel module',location:'2',position:'2'}); return false;">Christina Reed</a>&#160; Fri Jul 27, 2012 04:38 AM ET &#160;&#160;|&#160;&#160; <span class="js-kit-comments-count comment-bubble" id="count-2d57762b-b52b-4656-ae24-4f853dd4429d" uniq="/2d57762b-b52b-4656-ae24-4f853dd4429d" exclude-sources="Digg, FriendFeed, Twitter">0</span></p>
<p class="description">Residents around the country this summer are calling their local turf and lawn painters for touch-ups to the front yard or getting into the business themselves. <a rel="nofollow" href="http://news.discovery.com/earth/dead-lawn-paint-it-green-dnews-nugget-.html" onclick="componentClickTracking.build(this, {title:'topic landing',name:'channel module',location:'2',position:'3'}); return false;"><strong> Read&#160;more </strong></a></p>
</dd>
<dd class="thumbnail"><a href="http://news.discovery.com/earth/dead-lawn-paint-it-green-dnews-nugget-.html" onclick="componentClickTracking.build(this, {title:'topic landing',name:'channel module',location:'2',position:'4'}); return false;"><img src="http://blogs.discovery.com/.a/6a00d8341bf67c53ef0168ebeb2dc3970c-800wi" title="Dead Lawn? Paint it Green: DNews Nugget" alt="Dead Lawn? Paint it Green: DNews Nugget" class="" /></a></dd>
</dl>
=====================end=======================

The headlines I need are:

Hottest Spots in America: Big Pic
Dead Lawn? Paint it Green: DNews Nugget

The rule I noticed is that each headline is followed by </a> </h2>, with two whitespace characters in between, and is preceded by location and position. So I used the following expression:

location([\s\S]+)position([\s\S]+)return\sfalse([\S\s]+)</a>\s\s</h2>

But it selects everything. How should I change it?

Thanks

--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 220.133.98.77
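A minimal sketch of why the expression in the post grabs everything, assuming Python's re module (the post does not say which regex engine is in use). The block() helper and the trimmed HTML are hypothetical stand-ins for the pasted source, with two spaces between </a> and </h2> as the post describes. Each [\s\S]+ is greedy, so it runs as far as it can and the single match stretches from the first location to the last </a>  </h2>. Lazy quantifiers (+?) stop at the nearest closing tag instead.

```python
import re

def block(n, title):
    # Hypothetical, trimmed-down headline block; two spaces sit between </a> and </h2>.
    return (
        '<h2 class="title"><a href="x.html" onclick="build(this, '
        "{location:'%d',position:'1'}); return false;\">%s</a>  </h2>\n" % (n, title)
    )

html = block(1, "Hottest Spots in America: Big Pic ") + block(
    2, "Dead Lawn? Paint it Green: DNews Nugget"
)

# The expression from the post: every [\s\S]+ is greedy.
greedy = r"location([\s\S]+)position([\s\S]+)return\sfalse([\S\s]+)</a>\s\s</h2>"
greedy_matches = [m.group(0) for m in re.finditer(greedy, html)]

# Same idea with lazy quantifiers and a single capture group for the headline.
lazy = r"""location[\s\S]+?position[\s\S]+?return\sfalse;">([\s\S]+?)</a>\s\s</h2>"""
lazy_titles = [t.strip() for t in re.findall(lazy, html)]

print(len(greedy_matches))  # number of greedy matches
print(lazy_titles)          # headlines captured by the lazy variant
```

On the full page each block holds several links carrying location and position (the "Read more" and thumbnail links), so lazy quantifiers alone may still over-match there; a negated character class for the headline text is the safer choice.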

07/29 21:54, , 1F
position:'[0-9]{1}'}\); return false;\">([^>]+)</a>\s\s

07/29 21:54, , 2F
</h2> (answering my own question)
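The two comments read together give one pattern ending in </a>\s\s</h2>. A rough sketch of it in Python's re module follows; the engine is an assumption, as are the onclick() and block() helpers and the trimmed HTML, which only imitate the pasted source. The \" in the comment is presumably a string-literal escape, so it is written as a plain " here. Because [^>]+ cannot cross a tag boundary, the "Read more" link (whose text starts with <strong>) is skipped even though it also carries position.

```python
import re

def onclick(location, position):
    # Hypothetical, shortened form of the onclick attribute in the pasted source.
    return "build(this, {location:'%d',position:'%d'}); return false;" % (location, position)

def block(n, title):
    # Hypothetical headline block: a title link plus a "Read more" link.
    # Per the post, two whitespace characters separate </a> and </h2>.
    return (
        '<h2 class="title"><a href="x.html" onclick="%s">%s</a>  </h2>\n'
        '<p class="description">Text <a href="x.html" onclick="%s">'
        "<strong> Read more </strong></a></p>\n"
        % (onclick(n, 1), title, onclick(n, 3))
    )

html = block(1, "Hottest Spots in America: Big Pic ") + block(
    2, "Dead Lawn? Paint it Green: DNews Nugget"
)

# Pattern from comments 1F and 2F joined together.
pattern = re.compile(r"""position:'[0-9]{1}'}\); return false;">([^>]+)</a>\s\s</h2>""")

# Strip the trailing space that the first headline carries in the source.
titles = [t.strip() for t in pattern.findall(html)]
print(titles)
```

If the whitespace between </a> and </h2> is not always exactly two characters, \s* in place of \s\s would be more forgiving.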
Article code (AID): #1G5J9dXr (RegExp)