Re: [Question] merge 3 tables with summing common var
The speed looks okay? Or have I misunderstood what cywhale wants to do?
library(data.table)
library(dplyr)
# testing data, assuming merge by key = "SP"
set.seed(NULL)
x <- matrix(sample(1e6), 1e5) %>% data.table() %>%
  setnames(1:10,sample(LETTERS,10)) %>% .[,SP:=seq_len(nrow(.))]
y <- matrix(sample(1e5), 1e4) %>% data.table() %>%
  setnames(1:10,sample(LETTERS,10)) %>% .[,SP:=seq_len(nrow(.))]
z <- matrix(sample(4e5), 2e4) %>% data.table() %>%
  setnames(1:20,sample(LETTERS,20)) %>% .[,SP:=seq_len(nrow(.))]
###### mycode
t0 <- proc.time()
# full outer join on SP; clashing column names get the suffixes .x / .y
xyz <- x %>% full_join(y, by='SP') %>% full_join(z, by='SP') %>%
  as.data.table()
# base names (the part before the suffix) of every column that clashed
mut_list <- unique(sub('\\.[xy]$', '',
                       grep('.', names(xyz), fixed=TRUE, value=TRUE)))
for (m in mut_list) {
  # match the base name exactly, so that "S" or "P" never picks up the key "SP"
  mycols <- names(xyz)[sub('\\.[xy]$', '', names(xyz)) == m]
  # na.rm=TRUE ignores NA as requested; a row that is NA in every table becomes 0
  xyz[, mySum := rowSums(.SD, na.rm=TRUE), .SDcols=mycols]
  xyz[, (mycols) := NULL]
  setnames(xyz, "mySum", m)
  cat(m, "\n")
}
proc.time() - t0
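For comparison, the join can be skipped altogether by stacking the tables and aggregating by key. What follows is only a minimal sketch under some assumptions: every non-key column is numeric, SP is unique within each table, and a cell that is missing from all tables may come out as 0 instead of NA (a side effect of sum(..., na.rm = TRUE)). The toy tables a, b and c3 are made up for illustration and are not from the thread.

```r
library(data.table)

# toy tables (hypothetical data, not from the thread)
a  <- data.table(SP = 1:3, A = c(1, 2, 3), B = c(10, 20, 30))
b  <- data.table(SP = 2:4, A = c(100, 200, 300), C = c(7, 8, 9))
c3 <- data.table(SP = c(1L, 4L), B = c(1000, 2000))

# stack the tables; a column missing from a table is filled with NA
stacked <- rbindlist(list(a, b, c3), use.names = TRUE, fill = TRUE)

# one row per key, summing every other column and skipping NA
res <- stacked[, lapply(.SD, sum, na.rm = TRUE), keyby = SP]
print(res)
```

Whether this is faster than the join-based loop on the 1e5-row test data has not been measured here; wrapping both versions in system.time() would settle it.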
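※ Quoting cywhale (cywhale):
: [Question type]:
:
: Performance consultation (I want to make R run faster)
:
: I think I once saw a simpler approach or function for this somewhere, but I can't recall it and couldn't find it,
: so I wrote more complicated code. Is there a faster or simpler way to do this?
: [Software familiarity]:
: Please delete the parts below that you don't need
: Beginner (have written other programs, just unfamiliar with the syntax)
: [Problem description]:
: Please briefly describe what you want to do, or the purpose of this program
: Merge some data tables by the same key; variables that appear in more than one table must be summed
: during the merge, ignoring NA. The tables differ from one another in both row and column counts.
: [Code example]:
:
: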
: library(data.table)
: library(dplyr)
: # testing data, assuming merge by key = "SP"
: set.seed(NULL)
: x <- matrix(sample(1e6), 1e5) %>% data.table() %>%
: setnames(1:10,sample(LETTERS,10)) %>% .[,SP:=seq_len(nrow(.))]
: y <- matrix(sample(1e5), 1e4) %>% data.table() %>%
: setnames(1:10,sample(LETTERS,10)) %>% .[,SP:=seq_len(nrow(.))]
: z <- matrix(sample(4e5), 2e4) %>% data.table() %>%
: setnames(1:20,sample(LETTERS,20)) %>% .[,SP:=seq_len(nrow(.))]
: # function.. try to write Rcpp function..
: require(Rcpp)
: cppFunction('NumericVector addv(NumericVector x, NumericVector y) {
: NumericVector out(x.size());
: NumericVector::iterator x_it,y_it,out_it;
: for (x_it = x.begin(), y_it=y.begin(), out_it = out.begin();
: x_it != x.end(); ++x_it, ++y_it, ++out_it) {
: if (ISNA(*x_it)) {
: *out_it = *y_it;
: } else if (ISNA(*y_it)) {
: *out_it = *x_it;
: } else {
: *out_it = *x_it + *y_it;
: }
: }
: return out;}')
: ### merge two data.table with different columns/rows,
: ### and summing identical column names
: outer_join2 <- function (df1,df2,byNames) {
: tt=intersect(colnames(df1)[-match(byNames,colnames(df1))],
: colnames(df2)[-match(byNames,colnames(df2))])
: df <- merge(df2,df1[,-tt,with=F],by=byNames,all=T)
: dt <- merge(df2[,-tt,with=F],df1[,c(byNames,tt),with=F],by=byNames,all=T) %>%
: .[,tt,with=F]
: for (j in colnames(dt)) {set(df,j=j,value=addv(df[[j]],dt[[j]]))}
: return (df)
: }
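: # get results, following c's post #1LaHm_aH (R_Language)
: system.time(Reduce(function(x, y) outer_join2(x, y, byNames="SP"), list(x,y,z)))
: I used quite a few lines of code to do this. The speed seems acceptable, but I'm not sure whether there is a better way to write it. Thanks!
: [Keywords]:
:
: Optional; may be useful in the future
: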
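The NA rule that addv applies in the quoted post can also be sketched in plain R, which may help when checking results without compiling anything. This is a rough equivalent, not cywhale's code: the name addv_r is hypothetical, and is.na() also treats NaN as missing, whereas ISNA() in the Rcpp version does not.

```r
# plain-R sketch of the quoted addv(): add two vectors element-wise,
# taking the other value where one side is NA (NA only if both are NA)
addv_r <- function(x, y) {
  ifelse(is.na(x), y,
         ifelse(is.na(y), x, x + y))
}

out <- addv_r(c(1, NA, 3, NA), c(10, 20, NA, NA))
print(out)
```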
--
※ Origin: PTT (ptt.cc), From: 140.109.73.102
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1444722039.A.6EB.html