Re: [Question] Regular expressions (regex) in R

Board: R_Language, Author: cywhale, Time: 2016/04/30 23:51, Push/boo score: 0 (0 push, 0 boo, 2 neutral)
2 comments, 2 participants, latest thread 3/3
※ Quoting celestialgod (天):
: ※ Quoting cywhale (cywhale):
: : [Question type]:
: : Programming help (I want to do something in R but don't know how to write it in R)
: : To keep only English letters at the start and end of a string, I wrote
: : gsub("^[^a-zA-Z]+|[^a-zA-Z]+$", "", x)
: : But if the string ends in "sp." or "spp.", I want to keep the "." instead of
: : letting the expression above strip it.
: : For example, "aaa bbb sp." should stay unchanged,
: : but in every other case the "." should be removed, e.g. "aaa bbb22." -> "aaa bbb"
: : I tried constructs such as ?: and ?! but could not get a match. Any advice is welcome~~ Thanks~
: str <- c("aaa bbb sp.", "aaa bbb sp2.")
: gsub("[^a-zA-Z]*([a-zA-Z. ]+).*", "\\1", str)
:                          ^ this space has to stay, otherwise things break XD
: # [1] "aaa bbb sp." "aaa bbb sp"
: I forgot to ask: can there be cases like "aa2 bb3 cc." that should become "aa bb cc."?
: If so, I suggest regmatches: extract "aa", "bb", "cc." and then process them QQ
: Roughly like this (it may not cover every case):
: str <- c("aaa bbb sp.", "aaa bbb sp2.", "aa2 bb3 cc.")
: sapply(regmatches(str, gregexpr("[a-zA-Z. ]+", str)), function(x){
:   paste0(x[x != "."], collapse = "")
: })
: # [1] "aaa bbb sp." "aaa bbb sp" "aa bb cc."

From the previous post (thanks celestialgod) I learned about "\\1" and got some
ideas, so I tried the code below. The results come close to my goal, which is to
simplify scientific names collected from the web. Their formats are a mess. ><
After these trials I learned a lot about handling regex... ^_^

gsub("^[^a-zA-Z]+|(?!\\.)[^a-zA-Z]+$|\\b((sp\\.)+$)|\\b((spp\\.)+$)|((\\w{0,})\\.+$)",
     "\\2\\4\\6",
     c("33aaa sp.", "aaa sp.bb33", "aaasp.bb 33 de", "aaa w2sp.", "aaa www spp. ",
       "spp.", "bb.", "XXX sp. ", "YYY spp.()", "ZZZZ.."),
     perl = TRUE)
[1] "aaa sp." "aaa sp.bb" "aaasp.bb 33 de" "aaa w2sp" "aaa www spp."
[6] "spp." "bb" "XXX sp." "YYY spp." "ZZZZ"

If you have any comments or find any bugs, just tell me! Thanks for the help~

--
※ Origin: 批踢踢實業坊 (ptt.cc), from: 36.225.163.223
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1462031510.A.4C3.html
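For comparison, here is a minimal sketch of the same cleanup split into separate steps instead of one long alternation. The helper name clean_name is hypothetical and does not come from the thread. The sketch assumes the rule is: strip non-letters from both ends, but put a single "." back when the string ended in a standalone "sp." or "spp.". It gives the same results as the gsub() call above on the ten test strings, but it may differ on inputs not listed there.

```r
# Hypothetical helper (not from the thread): step-by-step version of the cleanup.
clean_name <- function(x) {
  # 1. Drop non-letters at the start.
  x <- sub("^[^a-zA-Z]+", "", x)
  # 2. Remember which strings end in a standalone "sp." or "spp.",
  #    optionally followed by trailing non-letter junk.
  keep_dot <- grepl("\\bspp?\\.[^a-zA-Z]*$", x, perl = TRUE)
  # 3. Drop non-letters at the end (this also removes the "." for now).
  x <- sub("[^a-zA-Z]+$", "", x)
  # 4. Put the "." back where step 2 said it belongs.
  ifelse(keep_dot, paste0(x, "."), x)
}

clean_name(c("33aaa sp.", "aaa w2sp.", "YYY spp.()", "ZZZZ.."))
# [1] "aaa sp."  "aaa w2sp" "YYY spp." "ZZZZ"
```

Each step can be checked on its own, which makes it easier to see why a given name keeps or loses its ".".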

04/30 23:55, 1F
This regex is really ugly XDD

05/01 00:01, 2F
haha.. really.. @@
Article ID (AID): #1N9DIMJ3 (R_Language)