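[Question] How do I split Chinese text?
The result I want is input="批踢踢實業坊" output=["批踢","踢踢","踢實","實業","業坊"] (character bigrams)
With English there are spaces, so split() does the job, but the Unicode text turns into
'\xa7\xe5\xbd\xf0\xbd\xf0\xb9\xea\xb7~\xa7{' and I don't know how to cut it.
I figure that if I could split it into ["批","踢","踢","實","業","坊"], I should be able to build the output from that.
I'm using Python 2.6.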
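
One possible approach, as a rough sketch rather than a definitive answer: the escaped string above looks like a Big5-encoded byte string (that is a guess based on the byte pattern, not something stated in the post), so decoding it to a Unicode string first should let ordinary indexing and slicing work per character. The helper name char_bigrams is made up for illustration.

```python
# -*- coding: utf-8 -*-

def char_bigrams(text):
    """Return the overlapping two-character slices of a decoded (Unicode) string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

# On a Unicode string, each index is one character, so list() splits it
# into single characters and slicing gives the bigrams.
text = u"批踢踢實業坊"
chars = list(text)            # [u'批', u'踢', u'踢', u'實', u'業', u'坊']
bigrams = char_bigrams(text)  # [u'批踢', u'踢踢', u'踢實', u'實業', u'業坊']

# Assumption: the byte string from the question is Big5-encoded.
# Decoding turns the 12 bytes into 6 characters; without decoding,
# slicing would cut each character in half.
raw = b'\xa7\xe5\xbd\xf0\xbd\xf0\xb9\xea\xb7~\xa7{'
decoded = raw.decode('big5')
decoded_bigrams = char_bigrams(decoded)
```

If the input comes from a file or terminal in a different encoding (UTF-8, for example), the argument to decode() would need to change accordingly.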
--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 220.132.67.21