[Question] Extracting HTML tags
Board: RegExp (Regular Expression)  Author: yingwan (yingwan)  Date: 2008/11/08 07:39
I want to extract the opening tags (<...>) and the closing tags (</...>) from a web page separately.
The source is:
<HTML >
<HEAD><TITLE> Hello World </TITLE></HEAD >
<BODY>
<H1>Greetings</H1>
<a href="index,html"
targe=_self > Homepage </a ><p>
<strong >Tat Tval Asi</strong>
</BODY>
</HTML>
The output I want is:
These are the opening tags:
<HTML>
<HEAD>
<TITLE>
<BODY>
<H1>
<a href="index.html" targe=_self>
<p>
<strong>
These are the closing tags:
</TITLE>
</HEAD>
</H1>
</a>
</strong>
</BODY>
</HTML>
This is what I wrote in Perl:
open(IN, $file) || die "can't read $file";
@file = <IN>;
print "These are the opening tags:\n";
foreach $line (@file){
find_opening_tags($line);
}
print "\n";
print "These are the closing tags:\n";
foreach $line (@file){
find_closing_tags($line);
}
close IN;
# end of main
#-------------------
# subroutines
#-------------------
sub find_opening_tags {
my $line = $_[0];
if ($line=~ /(\<[^\/].*\>)/){
print "$1\n";
}
}
sub find_closing_tags {
my $line = $_[0];
if ($line =~ /(\<\/.*\>)/) {
print "$1\n";
}
}
The result is:
These are the opening tags:
<HTML >
<HEAD>
<BODY>
<H1>
<p>
<strong >
These are the closing tags:
</TITLE></HEAD >
</H1>
</a ><p>
</strong>
</BODY>
</HTML>
I would appreciate any pointers from the experts. Thanks.
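A possible direction, offered as a sketch and not a definitive fix. Three things in the code above work against the wanted output: each sub uses "if", so it reports at most one match per line; ".*" is greedy, so it can run on to the last ">" on the line; and the <a ...> tag is split across two lines, so a line-by-line scan never sees it whole. The sketch below instead scans the whole document as one string, uses [^>]* so a match stops at the first ">", and loops with /g. The sub names normalize_tag and extract_tags are made up for this illustration, and the sample page is embedded in a heredoc instead of being read from $file.

```perl
use strict;
use warnings;

# Tidy one tag: any run of whitespace (including a line break inside the
# tag) becomes a single space, and whitespace just before ">" is dropped.
sub normalize_tag {
    my ($tag) = @_;
    $tag =~ s/\s+/ /g;
    $tag =~ s/\s+>$/>/;
    return $tag;
}

# Return two array refs: opening tags and closing tags, in document order.
sub extract_tags {
    my ($html) = @_;
    my (@opening, @closing);
    # [^>]* cannot run past the first ">", and unlike "." it also matches
    # newlines, so a tag split across lines is still found. /g in a while
    # loop visits every tag, not just the first one.
    while ($html =~ m{(<(/?)[^>]*>)}g) {
        my ($raw, $slash) = ($1, $2);   # copy before running other regexes
        my $tag = normalize_tag($raw);
        if ($slash) { push @closing, $tag }
        else        { push @opening, $tag }
    }
    return (\@opening, \@closing);
}

# The sample page from this post, kept exactly as written above.
my $html = <<'END_HTML';
<HTML >
<HEAD><TITLE> Hello World </TITLE></HEAD >
<BODY>
<H1>Greetings</H1>
<a href="index,html"
targe=_self > Homepage </a ><p>
<strong >Tat Tval Asi</strong>
</BODY>
</HTML>
END_HTML

my ($opening, $closing) = extract_tags($html);

print "These are the opening tags:\n";
print "$_\n" for @$opening;
print "\n";
print "These are the closing tags:\n";
print "$_\n" for @$closing;
```

To run it against a real file, the whole file can be joined into one string first, for example with join('', <IN>), before calling extract_tags. This is still only pattern matching: comments, scripts, or a ">" inside an attribute value will confuse it, so for real-world pages a parser module such as HTML::Parser from CPAN is usually the safer choice.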
--
※ Origin: PTT (ptt.cc)
◆ From: 149.159.132.73