[Question] Merge 3 tables, summing common variables
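[Question type]:
Performance consultation (I want R to run faster)
I think I once saw a simpler way or a ready-made function for this somewhere, but I can't recall it and couldn't find it,
so I wrote fairly complicated code. Is there a faster or simpler way to do it?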
[Software familiarity]:
Beginner (have programmed in other languages, just unfamiliar with R syntax)
[Problem description]:
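Merge several data.tables by the same key. When tables share a variable (same column name), the values
should be summed during the merge, ignoring NA. The data.tables all differ in their numbers of rows and columns.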
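To pin down the intended semantics, here is a minimal base-R sketch on two small hypothetical tables, d1 and d2 (not the post's data). The helper na_add is also hypothetical; it mirrors what the Rcpp addv function in the code below is meant to do (an NA-ignoring element-wise sum), except that is.na() also treats NaN as missing whereas the C-level ISNA() does not.

```r
# Hypothetical helper (not from the post): NA-ignoring element-wise sum.
# Returns y where x is NA, x where y is NA, x + y otherwise (NA only if both are NA).
na_add <- function(x, y) {
  stopifnot(length(x) == length(y))
  out <- x + y            # NA wherever either side is NA
  ix <- is.na(x)
  out[ix] <- y[ix]        # x missing: take y (possibly NA itself)
  iy <- is.na(y) & !ix
  out[iy] <- x[iy]        # y missing, x present: take x
  out
}

# Hypothetical toy tables: key "SP"; column A is shared, B and C are not
d1 <- data.frame(SP = c(1, 2, 3), A = c(10, NA, 30), B = c(1, 2, 3))
d2 <- data.frame(SP = c(2, 3, 4), A = c(5, 6, NA), C = c(7, 8, 9))

# Full outer join on SP; the shared column comes back as A.x (from d1) and A.y (from d2)
m <- merge(d1, d2, by = "SP", all = TRUE)
m$A <- na_add(m$A.x, m$A.y)   # A becomes 10, 5, 36, NA for SP = 1, 2, 3, 4
m$A.x <- NULL
m$A.y <- NULL
print(m)
```

Only A is shared, so only A is summed; B and C are carried over unchanged, with NA for keys that are missing from their table.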
[Code example]:
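library(data.table)
library(dplyr)   # only used for the %>% pipe
# testing data, assuming merge by key = "SP"
set.seed(NULL)   # re-initializes the RNG; use e.g. set.seed(1) for reproducible data
# x: 1e5 rows x 10 columns, y: 1e4 rows x 10 columns, z: 2e4 rows x 20 columns,
# each with 10 or 20 random LETTERS as column names plus the key SP = 1..nrow
x <- matrix(sample(1e6), 1e5) %>% data.table() %>%
  setnames(1:10, sample(LETTERS, 10)) %>% .[, SP := seq_len(.N)]
y <- matrix(sample(1e5), 1e4) %>% data.table() %>%
  setnames(1:10, sample(LETTERS, 10)) %>% .[, SP := seq_len(.N)]
z <- matrix(sample(4e5), 2e4) %>% data.table() %>%
  setnames(1:20, sample(LETTERS, 20)) %>% .[, SP := seq_len(.N)]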
# helper written with Rcpp: NA-ignoring element-wise sum of two vectors
# (returns y where x is NA, x where y is NA, and x + y otherwise)
library(Rcpp)
cppFunction('NumericVector addv(NumericVector x, NumericVector y) {
  if (x.size() != y.size()) stop("x and y must have the same length");
  NumericVector out(x.size());
  NumericVector::iterator x_it, y_it, out_it;
  for (x_it = x.begin(), y_it = y.begin(), out_it = out.begin();
       x_it != x.end(); ++x_it, ++y_it, ++out_it) {
    if (ISNA(*x_it)) {
      *out_it = *y_it;
    } else if (ISNA(*y_it)) {
      *out_it = *x_it;
    } else {
      *out_it = *x_it + *y_it;
    }
  }
  return out;
}')
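### merge two data.tables with different columns/rows,
### summing columns that have identical names
outer_join2 <- function(df1, df2, byNames) {
  # non-key columns present in both tables
  tt <- intersect(setdiff(colnames(df1), byNames),
                  setdiff(colnames(df2), byNames))
  # full outer join: all of df2's columns plus df1's columns not shared with df2
  df <- merge(df2, df1[, setdiff(colnames(df1), tt), with = FALSE],
              by = byNames, all = TRUE)
  # same full outer join, keeping only df1's copies of the shared columns;
  # merge() sorts both results by the key, so their rows line up
  dt <- merge(df2[, setdiff(colnames(df2), tt), with = FALSE],
              df1[, c(byNames, tt), with = FALSE],
              by = byNames, all = TRUE)[, tt, with = FALSE]
  # add df1's shared columns into df (which holds df2's copies), ignoring NA
  for (j in tt) set(df, j = j, value = addv(df[[j]], dt[[j]]))
  df
}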
# get results by folding outer_join2 over the list (see c大's post #1LaHm_aH on R_Language)
system.time(res <- Reduce(function(x, y) outer_join2(x, y, byNames = "SP"), list(x, y, z)))
This took quite a few lines of code. The speed seems acceptable, but I'm not sure whether there is a better way to write it. Thanks!
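One possibly simpler route, sketched here as an assumption rather than a tested answer, is to stack all tables with data.table's rbindlist(fill = TRUE), which pads missing columns with NA, and then collapse rows with the same key using a grouped NA-ignoring sum. The function name sum_join is hypothetical. It should match the merge-based result (up to column order and integer vs. double type) as long as each SP value appears at most once per table, as in the test data above; its speed on the 1e5-row data has not been measured here.

```r
library(data.table)

# Hypothetical helper: stack the tables (columns a table lacks are filled with NA),
# then sum every non-key column within each key value, ignoring NA.
sum_join <- function(tables, by_col) {
  long <- rbindlist(tables, use.names = TRUE, fill = TRUE)
  # v[1L] is an NA of the column's own type, so all-NA groups stay NA
  na_sum <- function(v) if (all(is.na(v))) v[1L] else sum(v, na.rm = TRUE)
  # keyby sorts the result by the key column(s) named in by_col
  long[, lapply(.SD, na_sum), keyby = by_col]
}

# Usage with the x, y, z from the post:
# system.time(res2 <- sum_join(list(x, y, z), "SP"))
```

Because na_sum is an ordinary R function, data.table calls it once per group and column. Using `lapply(.SD, sum, na.rm = TRUE)` instead lets data.table apply its internal GForce optimization, which is likely faster, but a column that is NA in every row for a key then sums to 0 instead of staying NA.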
[Keywords]:
--
※ Posted from: PTT (ptt.cc), from: 140.112.65.48
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1444640089.A.EE0.html
※ Edited by: cywhale (36.228.159.121), 10/15/2015 22:07:23
Full thread: this post is 1 of 5.