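Re: [Question] Merging data within a single file
※ Quoting thea (裏A):
: [Question type]:
: Programming help (I want to do something in R but don't know how to write it in R)
: [Familiarity with the software]:
: Beginner (I have programmed in other languages but am unfamiliar with the syntax)
: [Problem description]:
: I received a data file that is comma-separated.
: The data looks like this: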
: id,20140101
: user,ABC001,1
: user,ADE002,2
: user,TEX001,3
: event,ABC001,T,C
: event,ADE002,P,RUR
: event,TEX001,pej,C
: id,20140201
: user,ABC001,1
: user,ADE002,2
: user,TEX001,3
: event,ABC001,T,C
: event,ADE002,P,RUR
: event,TEX001,pej,C
: .
: .
: .
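: In other words, the data is split into blocks by id, so I need to reshape it into the following format
: (taking event as the unit of observation, add the id and user data to each event row)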
: id,event1,event2,event3,user
: 20140101,ABC001,T,C,1
: 20140101,ADE002,P,RUR,2
: 20140101,TEX001,pej,C,3
: 20140201,ABC001,T,C,1
: 20140201,ADE002,P,RUR,2
: 20140201,TEX001,pej,C,3
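: [Code example]:
: My intuition is that this should be handled with a loop?
: But there is also a lookup involved (between user and event).
: The formats I have handled before were all fairly tidy csv/excel files,
: and this is the first time I have run into a data format like this.
: I looked up some material but am still a bit lost, so I came here to ask T__T
: Thanks!!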
src = 'id,20140101
user,ABC001,1
user,ADE002,2
user,TEX001,3
event,ABC001,T,C
event,ADE002,P,RUR
event,TEX001,pej,C
id,20140201
user,ABC001,1
user,ADE002,2
user,TEX001,3
event,ABC001,T,C
event,ADE002,P,RUR
event,TEX001,pej,C
id,20140301
user,ABC001,1
user,ADE002,2
user,TEX001,3
event,ABC001,T,C
event,ADE002,P,RUR
event,TEX001,pej,C'
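lines = readLines(textConnection(src)) # for a real file, readLines(filename) works directly here
splitList = strsplit(lines, ",")
len = sapply(splitList, length) # number of fields on each line
loc = which(len == 2) # "id" lines are the only ones with 2 fields
loc = c(loc, length(len)+1) # sentinel one past the last line, so the final block has an end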
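dat = do.call(rbind, lapply(1:(length(loc)-1), function(i){
  # field counts of the lines in block i (everything after its id line)
  len_sub = len[(loc[i]+1):(loc[i+1]-1)]
  id = splitList[[loc[i]]]
  # user lines have 3 fields, event lines have 4
  user = do.call(rbind, splitList[loc[i]+which(len_sub==3)])
  event = do.call(rbind, splitList[loc[i]+which(len_sub==4)])
  # look up each event's user value by its code; drop=FALSE keeps a one-event block as a matrix
  cbind(id[2], event[,2:4,drop=FALSE], user[match(event[,2],user[,2]),3])
}))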
dat
# [,1] [,2] [,3] [,4] [,5]
# [1,] "20140101" "ABC001" "T" "C" "1"
# [2,] "20140101" "ADE002" "P" "RUR" "2"
# [3,] "20140101" "TEX001" "pej" "C" "3"
# [4,] "20140201" "ABC001" "T" "C" "1"
# [5,] "20140201" "ADE002" "P" "RUR" "2"
# [6,] "20140201" "TEX001" "pej" "C" "3"
# [7,] "20140301" "ABC001" "T" "C" "1"
# [8,] "20140301" "ADE002" "P" "RUR" "2"
# [9,] "20140301" "TEX001" "pej" "C" "3"
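
As a follow-up to the loop above, here is a minimal sketch of a loop-free variant that also produces the column names asked for in the question (id, event1, event2, event3, user). It uses base R only and assumes the file starts with an id line and that a user code appears at most once per id block. The names `fields`, `type`, `block_id`, `users`, `events` and `merged` are made up for this illustration.

```r
# Two blocks of the sample data, one string per line
src <- c("id,20140101",
         "user,ABC001,1", "user,ADE002,2", "user,TEX001,3",
         "event,ABC001,T,C", "event,ADE002,P,RUR", "event,TEX001,pej,C",
         "id,20140201",
         "user,ABC001,1", "user,ADE002,2", "user,TEX001,3",
         "event,ABC001,T,C", "event,ADE002,P,RUR", "event,TEX001,pej,C")

fields <- strsplit(src, ",")
type   <- sapply(fields, `[`, 1)   # "id", "user" or "event"
is_id  <- type == "id"

# Every line inherits the id of the most recent "id" line above it.
# Assumption: the very first line is an id line.
block_id <- sapply(fields[is_id], `[`, 2)[cumsum(is_id)]

user_rows  <- do.call(rbind, fields[type == "user"])
event_rows <- do.call(rbind, fields[type == "event"])

users <- data.frame(id     = block_id[type == "user"],
                    event1 = user_rows[, 2],
                    user   = user_rows[, 3],
                    stringsAsFactors = FALSE)
events <- data.frame(id     = block_id[type == "event"],
                     event1 = event_rows[, 2],
                     event2 = event_rows[, 3],
                     event3 = event_rows[, 4],
                     stringsAsFactors = FALSE)

# Join on (id, code); all.x = TRUE keeps events without a matching user (user = NA)
merged <- merge(events, users, by = c("id", "event1"), all.x = TRUE)
merged
```

Since `merged` is a data.frame with named columns, something like `write.csv(merged, "merged.csv", row.names = FALSE, quote = FALSE)` should write it out in the requested layout (the file name is only an example).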
--
※ Origin: 批踢踢實業坊 (ptt.cc), from: 123.205.27.107
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1442371798.A.0C3.html
Push 09/16 22:34, 1F
Push 09/20 11:16, 2F
→ 09/20 11:17, 3F
Print match(user[,2],event[,2]) and look at the result,
to see whether any index exceeds the size of the user matrix.
→ 09/24 01:01, 4F
→ 09/24 01:01, 5F
I wrote the match arguments the wrong way round...~"~ I have fixed it, sorry....
※ Edited: celestialgod (140.109.73.159), 09/24/2015 08:09:20
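
To illustrate why the argument order of match matters here, a small sketch with a made-up block in which the user and event lines are not in the same order (in the sample data they are, so both orders happen to return 1 2 3 and the mistake stays hidden). The matrices below are hypothetical and only mirror the shapes used in the answer.

```r
# Hypothetical block: three users, two events, listed in a different order
user  <- rbind(c("user", "ABC001", "1"),
               c("user", "ADE002", "2"),
               c("user", "TEX001", "3"))
event <- rbind(c("event", "TEX001", "pej", "C"),
               c("event", "ABC001", "T",   "C"))

# Intended direction: for each event row, the row of `user` with the same code
idx_ok <- match(event[, 2], user[, 2])
idx_ok            # 3 1
user[idx_ok, 3]   # "3" "1", one value per event row

# Reversed direction: for each user row, its position in `event`
idx_rev <- match(user[, 2], event[, 2])
idx_rev           # 2 NA 1, which has the wrong length and contains NA
```

The first form always yields one index per event row, which is what the cbind in the answer needs.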
Push 09/24 23:58, 6F