Will RCpp speed up the evaluation of basic R functions? -


pardon me, not know rcpp, trying figure out if learn in order improve package writing.

i have written r package (is supposed to) efficiently , randomly samples uniformly high-dimensional, constrained space mcmc algorithm. (unfinished and) located @ https://github.com/davidkane9/kmatching.

the problem when run statistical test called gelman-rubin diagnostic see whether mcmc algorithm has converged stationary distribution, should r = 1 statistic, high numbers, tells me sampling bad , no 1 should use it. solution take more samples , skip more (taking 1 out of ever 1000 instead of every 100). takes lot of time. if want code run, here example:

##install package first data(lalonde) matchvars = c("age", "educ", "black") k = kmatch(x = lalonde, weight.var = "treat", match.var = matchvars, n = 1000, skiplength = 1000, chains = 2, verbose = true) 

looking @ rprof output of rnorm , %*% taking of time:

                       total.time total.pct self.time self.pct "kmatch"                  1453.14    100.00      0.00     0.00 "hitandrun"               1450.18     99.79    128.80     8.86 "%*%"                      757.00     52.09    757.00    52.09 "cat"                      343.18     23.62    329.82    22.70 "rnorm"                    106.34      7.32    103.50     7.12 "mirror"                    35.26      2.43     21.84     1.50 "paste"                     14.02      0.96     14.02     0.96 "stdout"                    13.36      0.92     13.36     0.92 "runif"                     13.32      0.92     13.32     0.92 "/"                         12.82      0.88     12.82     0.88 ">"                          7.42      0.51      7.42     0.51 "<"                          6.22      0.43      6.22     0.43 "-"                          5.78      0.40      5.78     0.40 "max"                        5.18      0.36      5.18     0.36 "nchar"                      5.12      0.35      5.12     0.35 "*"                          4.84      0.33      4.84     0.33 "min"                        3.94      0.27      3.94     0.27 "sum"                        3.42      0.24      3.42     0.24 "gelman.diag"                2.90      0.20      0.00     0.00 "=="                         2.86      0.20      2.86     0.20 "ncol"                       2.84      0.20      2.84     0.20 "apply"                      2.72      0.19      0.26     0.02 "+"                          2.48      0.17      2.48     0.17 "fun"                        2.32      0.16      1.66     0.11 "^"                          2.08      0.14      2.08     0.14 ":"                          1.24      0.09      1.24     0.09 "sqrt"                       0.96      0.07      0.96     0.07 "%%"                         0.90      0.06      0.90     0.06 "mean.default"               0.62      0.04      0.62     0.04 "lapply"                     0.40      0.03      0.26     0.02 "("                          0.32      0.02      0.32     0.02 "unlist"                     0.26      0.02      0.00     0.00 "array"                      0.12      0.01      0.02     0.00 "sapply"                     0.12      0.01      0.00     0.00 "matrix"                     0.06      0.00      0.02     0.00 "null"                       0.04      0.00      0.04     0.00 "print"                      0.04      0.00      0.00     0.00 "unique"                     0.04      0.00      0.00     0.00 "abs"                        0.02      0.00      0.02     0.00 "all"                        0.02      0.00      0.02     0.00 "aperm.default"              0.02      0.00      0.02     0.00 "as.matrix.mcmc"             0.02      0.00      0.02     0.00 "file.exists"                0.02      0.00      0.02     0.00 "list"                       0.02      0.00      0.02     0.00 "print.default"              0.02      0.00      0.02     0.00 "stopifnot"                  0.02      0.00      0.02     0.00 "unique.default"             0.02      0.00      0.02     0.00 "which.min"                  0.02      0.00      0.02     0.00 "<anonymous>"                0.02      0.00      0.00     0.00 "aperm"                      0.02      0.00      0.00     0.00 "as.mcmc.list"               0.02      0.00      0.00     0.00 "as.mcmc.list.default"       0.02      0.00      0.00     0.00 "data"                       0.02      0.00      0.00     0.00 "mcmc.list"                  0.02      0.00      0.00     0.00 "print.gelman.diag"          0.02      0.00      0.00     0.00 "quantile.default"           0.02      0.00      0.00     0.00 "sort"                       0.02      0.00      0.00     0.00 "sort.default"               0.02      0.00      0.00     0.00 "sort.int"                   0.02      0.00      0.00     0.00 "summary"                    0.02      0.00      0.00     0.00 "summary.default"            0.02      0.00      0.00     0.00 

if set verbose = f, cat goes away, %*% takes 70% of time. wondering if worthwhile try write code in c++ , use rcpp, or if because functions taking time basic functions (already written in c) wouldn't worth , i'll have live or find better algorithm.

edit: according rprof, 1 line holding me u = z %*% r in hitandrun:

## loop being run millions of times , taking forever for(i in 1:(n*skiplength+discard)) {         tmin<-0;tmax<-0;         ## runs counts how many times tried pick direction, if         ## high fail.         runs = 0         while(tmin ==0 && tmax ==0) {           ## r random unit vector in basis in z           r <- rnorm(ncol(z))           r <- r/sqrt(sum(r^2))            ## u unit vector in appropriate k-plane pointing in           ## random direction z %*% r same in mirror           u <- z%*%r           c <- y/u           ## determine intersections of x + t*u walls           ## limits on how far can go backward , forward           ## i.e. maximum , minimum ratio y_i/u_i negative , positive u.           tmin <- max(-c[u>0]); tmax <- min(-c[u<0]);           ## unboundedness           if(tmin == -inf || tmax == inf){             stop("problem unbounded")           }           ## if stuck on boundary point           if(tmin==0 && tmax ==0) {             runs = runs + 1             if(runs >= 1000) stop("hitandrun found can't find feasible direction, cannot generate points")           }         }          ## chose point on line segment         y <- y + (tmin + (tmax - tmin)*runif(1))*u;          ## choose point every 'skiplength' samples         if(i %% skiplength == 0) {           x[,index] <- y           index <- index + 1         }         if(verbose) for(j in 1:nchar(str)) cat("\b")         str <- paste(i)         if(verbose) cat(str)       } 

it time matrix multiplication in sampling loop, thousands of times, once per sample taking million samples , throwing out 99%.

rcpp has in fact been used lot precisely purpose: mcmc. pretty decent speed gains on order of 30 50 or 70.

one of packages whit's rcppbugs converted rcpp general ease of use after having programmed classes wrote. casual web search 'rcpp mcmc' lead few posts.

and other authors have used rcpp well. , sits on inside of (r)stan want looping constructs inherent in mcmc running fast possible. hence compiled.

i asked rcpp-devel list last week should talk in brief r user group presentation i'll give tomorrow, , 'mcmc' suggestions more or less dominated. 1 entire talk rug presented too. i'd link thread somehow fell of gmane's archive rcpp-devel.

so in sum i'd yes, want consider using rcpp here.


Comments

Popular posts from this blog

image - ClassNotFoundException when add a prebuilt apk into system.img in android -

I need to import mysql 5.1 to 5.5? -

Java, Hibernate, MySQL - store UTC date-time -