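Re: [Question] How to tidy combined amount/location data such as 1胃, 2腸 (胃 = stomach, 腸 = intestine)
※ Quoting helixc (@_2;):
: [Software familiarity]: novice + beginner
: [Problem description]:
: I have dissection data for a frog species and want to analyze its diet.
: The data were recorded like this: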
: ID,Food A,Food B,Food C,Food E
: C146,,,,3腸
: B287,,,,10腸
: C140,,,,4腸
: C133,,,1腸,
: C132,1腸,,,
: B305,,,1腸,
: C112,,2腸,,1腸
: C120,,,,1腸
: C128,,,,1腸
: I would like to reshape it into this form:
: ID, Food type, Amount, Location
: C146, E, 3, 腸
: B287, E, 10, 腸
: C140, E, 4, 腸
: C133, C, 1, 腸
library(data.table)
library(dplyr)
library(tidyr)
library(magrittr)
library(stringr)
tmp_dt = fread("ID,Food A,Food B,Food C,Food E
C146,,,,3腸
B287,,,,10腸
C140,,,,4腸
C133,,,1腸,
C132,1腸,,,
B305,,,1腸,
C112,,2腸,,1腸
C120,,,,1腸
C128,,,,1腸", colClasses = rep("character", 5))
## Method 1: pull out the leading digits and the trailing character with stringr
output_dt = tmp_dt %>% gather(foodType, tmpCol, -ID) %>%    # wide -> long
  filter(tmpCol != "") %>%                                    # drop empty cells
  mutate(Amount = str_extract(tmpCol, "\\d+"),                # leading digits, e.g. "10"
         Location = str_sub(tmpCol, -1)) %>%                  # last character, e.g. "腸"
  select(-tmpCol) %>%
  transform(foodType = as.character(foodType)) %>%
  transform(foodType = str_sub(foodType, -1))                 # "Food A" -> "A"
## Method 2: insert a comma between the digits and the location, then split on it
output_dt2 = tmp_dt %>% gather(foodType, tmpCol, -ID) %>%
  filter(tmpCol != "") %>%
  transform(foodType = as.character(foodType),
            tmpCol = sub("^(\\d+)(.+)$", "\\1,\\2", tmpCol)) %>%   # "10腸" -> "10,腸"
  separate(tmpCol, c("Amount", "Location"), sep = ",") %>%
  transform(foodType = str_sub(foodType, -1))                      # "Food A" -> "A"
## Method 3: no sub() needed; a numeric sep makes separate() split by position
## (sep = -1 splits off the last character; tidyr 0.7.2 and older needed -2 here)
output_dt3 = tmp_dt %>% gather(foodType, tmpCol, -ID) %>%
  filter(tmpCol != "") %>%
  transform(foodType = as.character(foodType)) %>%
  separate(tmpCol, c("Amount", "Location"), sep = -1) %>%
  transform(foodType = str_sub(foodType, -1))                      # "Food A" -> "A"
Output (identical for all three methods):
# ID foodType Amount Location
# 1: C132 A 1 腸
# 2: C112 B 2 腸
# 3: C133 C 1 腸
# 4: B305 C 1 腸
# 5: C146 E 3 腸
# 6: B287 E 10 腸
# 7: C140 E 4 腸
# 8: C112 E 1 腸
# 9: C120 E 1 腸
# 10: C128 E 1 腸
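
For readers on a newer tidyr (1.0.0 or later), the same reshaping can be sketched with pivot_longer() and extract(). This is an alternative I am adding, not one of the three methods above. The names raw and tidy are made up for illustration, and the regular expression assumes every non-empty cell is a run of digits followed by a location label.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, read with base R so this block stands alone.
raw <- read.csv(text = "ID,Food A,Food B,Food C,Food E
C146,,,,3腸
B287,,,,10腸
C140,,,,4腸
C133,,,1腸,
C132,1腸,,,
B305,,,1腸,
C112,,2腸,,1腸
C120,,,,1腸
C128,,,,1腸", check.names = FALSE, colClasses = "character")

tidy <- raw %>%
  # wide -> long; names_prefix strips "Food " so foodType is just the letter
  pivot_longer(-ID, names_to = "foodType", names_prefix = "Food ",
               values_to = "tmpCol") %>%
  # drop the empty cells
  filter(tmpCol != "") %>%
  # split "10腸" into digits and label; convert = TRUE makes Amount an integer
  extract(tmpCol, c("Amount", "Location"), regex = "^(\\d+)(.+)$",
          convert = TRUE)

print(tidy)
```

Unlike gather(), pivot_longer() keeps the rows in the order of the original IDs, so the result starts with C146, B287, C140, C133, as in the layout requested in the question. Amount also comes back as an integer rather than a character string, which is handy for later summing or plotting.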
--
※ Posted via: PTT (批踢踢實業坊, ptt.cc), from: 123.205.27.107
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1436512890.A.854.html
※ Edited: celestialgod (123.205.27.107), 07/10/2015 15:34:05