[Question] I think this is a special character... how do I filter it out?
# Below is a short excerpt from my logging code
$string = 'the string to be logged';
$string = trim(preg_replace('/\s+/', ' ', preg_replace('/[\f\r\n\t]+/s', ' ', clear_unicode_spaces(clear_invisible_unicode($string)))));
$logger->log($string); # log function
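I watch the log with tail -f.
Originally I only used preg_replace to turn \r\n and other line-break characters into spaces,
but every now and then I still see a record broken across lines, like in the screenshot below...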
http://guardlan.myweb.hinet.net/tail.png

When I open the same log in vim, I see a light blue > symbol...
http://guardlan.myweb.hinet.net/vim.png

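I assumed it was a special character, so I went looking online and found clear_invisible_unicode and clear_unicode_spaces, two functions written specifically for filtering strings, and started using them.
But the > symbol still shows up in the log.
What exactly is it, and how can I filter it out...
Here are clear_invisible_unicode and clear_unicode_spaces: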
function clear_invisible_unicode($input){
    $invisible = array(
        "\0",
        "\xc2\xad", # 'SOFT HYPHEN' (U+00AD)
        "\xcc\xb7", # 'COMBINING SHORT SOLIDUS OVERLAY' (U+0337)
        "\xcc\xb8", # 'COMBINING LONG SOLIDUS OVERLAY' (U+0338)
        "\xcd\x8f", # 'COMBINING GRAPHEME JOINER' (U+034F)
        "\xe1\x85\x9f", # 'HANGUL CHOSEONG FILLER' (U+115F)
        "\xe1\x85\xa0", # 'HANGUL JUNGSEONG FILLER' (U+1160)
        "\xe2\x80\x8b", # 'ZERO WIDTH SPACE' (U+200B)
        "\xe2\x80\x8c", # 'ZERO WIDTH NON-JOINER' (U+200C)
        "\xe2\x80\x8d", # 'ZERO WIDTH JOINER' (U+200D)
        "\xe2\x80\x8e", # 'LEFT-TO-RIGHT MARK' (U+200E)
        "\xe2\x80\x8f", # 'RIGHT-TO-LEFT MARK' (U+200F)
        "\xe2\x80\xaa", # 'LEFT-TO-RIGHT EMBEDDING' (U+202A)
        "\xe2\x80\xab", # 'RIGHT-TO-LEFT EMBEDDING' (U+202B)
        "\xe2\x80\xac", # 'POP DIRECTIONAL FORMATTING' (U+202C)
        "\xe2\x80\xad", # 'LEFT-TO-RIGHT OVERRIDE' (U+202D)
        "\xe2\x80\xae", # 'RIGHT-TO-LEFT OVERRIDE' (U+202E)
        "\xe3\x85\xa4", # 'HANGUL FILLER' (U+3164)
        "\xef\xbb\xbf", # 'ZERO WIDTH NO-BREAK SPACE' (U+FEFF)
        "\xef\xbe\xa0", # 'HALFWIDTH HANGUL FILLER' (U+FFA0)
        "\xef\xbf\xb9", # 'INTERLINEAR ANNOTATION ANCHOR' (U+FFF9)
        "\xef\xbf\xba", # 'INTERLINEAR ANNOTATION SEPARATOR' (U+FFFA)
        "\xef\xbf\xbb", # 'INTERLINEAR ANNOTATION TERMINATOR' (U+FFFB)
    );
    return str_replace($invisible, '', $input);
}
function clear_unicode_spaces($input){
    $spaces = array(
        "\x9", # 'CHARACTER TABULATION' (U+0009)
        //"\xa", # 'LINE FEED (LF)' (U+000A)
        "\xb", # 'LINE TABULATION' (U+000B)
        "\xc", # 'FORM FEED (FF)' (U+000C)
        //"\xd", # 'CARRIAGE RETURN (CR)' (U+000D)
        "\x20", # 'SPACE' (U+0020)
        "\xc2\xa0", # 'NO-BREAK SPACE' (U+00A0)
        "\xe1\x9a\x80", # 'OGHAM SPACE MARK' (U+1680)
        "\xe1\xa0\x8e", # 'MONGOLIAN VOWEL SEPARATOR' (U+180E)
        "\xe2\x80\x80", # 'EN QUAD' (U+2000)
        "\xe2\x80\x81", # 'EM QUAD' (U+2001)
        "\xe2\x80\x82", # 'EN SPACE' (U+2002)
        "\xe2\x80\x83", # 'EM SPACE' (U+2003)
        "\xe2\x80\x84", # 'THREE-PER-EM SPACE' (U+2004)
        "\xe2\x80\x85", # 'FOUR-PER-EM SPACE' (U+2005)
        "\xe2\x80\x86", # 'SIX-PER-EM SPACE' (U+2006)
        "\xe2\x80\x87", # 'FIGURE SPACE' (U+2007)
        "\xe2\x80\x88", # 'PUNCTUATION SPACE' (U+2008)
        "\xe2\x80\x89", # 'THIN SPACE' (U+2009)
        "\xe2\x80\x8a", # 'HAIR SPACE' (U+200A)
        "\xe2\x80\xa8", # 'LINE SEPARATOR' (U+2028)
        "\xe2\x80\xa9", # 'PARAGRAPH SEPARATOR' (U+2029)
        "\xe2\x80\xaf", # 'NARROW NO-BREAK SPACE' (U+202F)
        "\xe2\x81\x9f", # 'MEDIUM MATHEMATICAL SPACE' (U+205F)
        "\xe3\x80\x80", # 'IDEOGRAPHIC SPACE' (U+3000)
    );
    return str_replace($spaces, ' ', $input);
}
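
As a rough alternative to the two lookup tables above, PCRE's Unicode property classes can cover most of the same characters in two patterns. This is only a sketch: normalize_log_line is a made-up name, it assumes the input is UTF-8, and it is not an exact equivalent of the functions above (for example, the Hangul fillers and combining overlays from the first table are not in category Cf, and \0 becomes a space here instead of being removed).

```php
<?php
// Hypothetical helper for illustration; not part of the original post.
function normalize_log_line(string $input): string
{
    // Step 1: drop format characters (category Cf), e.g. U+200B or U+FEFF.
    // Step 2: collapse separators (Z) and control characters (Cc, which
    //         includes \0, \t, \r and \n) into a single space.
    // With the /u modifier, preg_replace() returns null on invalid UTF-8.
    $out = preg_replace(['/\p{Cf}+/u', '/[\p{Z}\p{Cc}]+/u'], ['', ' '], $input);

    if ($out === null) {
        // Fallback for invalid UTF-8: byte-wise collapse of ASCII
        // control characters, space and DEL only.
        $out = preg_replace('/[\x00-\x20\x7F]+/', ' ', $input);
    }

    return trim((string) $out);
}

echo normalize_log_line("a\u{200B}b\r\n\tc\u{3000}d "), PHP_EOL;
```

The fallback branch is there because a log line may contain arbitrary bytes; without it, one badly encoded string would be logged as an empty line.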
--
※ Posted from: PTT (ptt.cc)
◆ From: 111.240.54.65
Push 12/29 03:46, 1F
Push 12/29 09:44, 2F
→ 12/29 18:57, 3F
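Thanks for the hints...
I looked up the ASCII codes for that part of the log and found that the "special character" is empty... there is no such character there at all...
It turned out to be caused by pietty apparently having trouble detecting the window edge...
After I shrank the window, the > showed up on a different line =.=
How ridiculous... XDDD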
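
Since the answer came from looking at the actual byte values, here is a minimal sketch of that check in PHP. dump_suspicious_bytes is a hypothetical helper (not something from this post); it lists the ASCII control bytes left in a string, and bin2hex shows the full byte sequence. If both come back clean, as happened here, the stray symbol is probably coming from the terminal or viewer rather than from the data.

```php
<?php
// Hypothetical helper for illustration; not part of the original post.
function dump_suspicious_bytes(string $input): array
{
    $found = [];
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($input[$i]);
        // Report ASCII control bytes (0x00-0x1F) and DEL (0x7F).
        // Bytes >= 0x80 are skipped: they are parts of multi-byte UTF-8
        // sequences and need a separate check.
        if ($byte < 0x20 || $byte === 0x7F) {
            $found[] = sprintf('offset %d: 0x%02X', $i, $byte);
        }
    }
    return $found;
}

$line = "first part\rsecond part";             // sample input with a stray CR
echo bin2hex($line), PHP_EOL;                  // full hex dump of the string
print_r(dump_suspicious_bytes($line));         // control bytes with offsets
```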
※ Edited: guardlan, from: 111.240.54.65 (12/30 00:26)