#### Faster self cross product using Armadillo

I want to compute the cross product (ZZ‘) of a matrix Z with dimensions n x p (n is the number of rows and p is the number of columns). For some analyses I have to compute this cross product repeatedly and I was looking for an efficient way to do it. I know that Eigen may be faster in some cases but I work mostly with Armadillo (use their random number routines) and did not want to re-write all my code. I have a few implementations that I tested:

• R: tcrossprod() function.
• Plain Rcpp: Since the result is symmetric I was thinking I can only calculate the upper triangle and then copy the lower triangle. However, from the results below my implementation is clearly mediocre.
• Eigen: Using code from the RcppEigen Introduction vignette.
• Combining Armadillo and the Eigen function above (since most of my code uses Armadillo routines).

The code I used is:

``````#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::depends(RcppEigen)]]

// [[Rcpp::export]]
Rcpp::NumericMatrix Crossprod_rcpp(const Rcpp::NumericMatrix & Z){
int p = Z.ncol();
int n = Z.nrow();
int i, j, k;

Rcpp::NumericMatrix XpX(n, n);
for(i = 0; i < n; i++){
for(j = i; j < n; j++){
for(k = 0; k < p; k++){
XpX(i, j) = XpX(i, j) + Z(i, k)*Z(j, k);
}
XpX(j, i) = XpX(i, j);
}
}
return(XpX);
}

// [[Rcpp::export]]
arma::mat Crossprod_arma(const arma::mat & Z){
return(Z*Z.t());
}

// [[Rcpp::export]]
Eigen::MatrixXd Crossprod_eigen(const Eigen::Map<Eigen::MatrixXd> & Z){
int n = Z.rows();
return(ZZt);
}

// [[Rcpp::export]]
arma::mat Crossprod_armeig(arma::mat Z){
int n = Z.n_rows;
int p = Z.n_cols;
Eigen::MatrixXd X = Eigen::Map<Eigen::MatrixXd>(Z.memptr(), n, p);
arma::mat ZZt = arma::mat(XXt.data(), XXt.rows(), XXt.cols(), false, false);
return(ZZt);
}

``````

To test their performance in R I used the following code:

``````library(Rcpp)
library(microbenchmark)
sourceCpp('D:/Crossprodfuns.cpp')
n = 200
p = 1000
Z = matrix(runif(n*p), ncol = p, nrow = n)
microbenchmark(tcrossprod(Z), Crossprod_rcpp(Z), Crossprod_arma(Z), Crossprod_eigen(Z), Crossprod_armeig(Z), times = 100)
``````

The results from running the code are:

``````microbenchmark(tcrossprod(Z), Crossprod_rcpp(Z), Crossprod_arma(Z), Crossprod_eigen(Z), Crossprod_armeig(Z), times = 100)
Unit: milliseconds
expr       min        lq      mean    median        uq     max neval  cld
tcrossprod(Z) 12.285202 12.477001 12.766616 12.562951 12.709301 21.1945   100   c
Crossprod_rcpp(Z) 21.764501 22.178501 22.488434 22.266651 22.404251 27.1600   100    d
Crossprod_arma(Z) 12.025001 12.387151 12.571402 12.434302 12.514851 16.3147   100   c
Crossprod_eigen(Z)  3.439802  3.531801  3.801725  3.570251  3.653002 12.4534   100 a
Crossprod_armeig(Z)  4.364201  4.506051  4.712580  4.605751  4.746951  7.0822   100  b

all.equal(tcrossprod(Z), Crossprod_rcpp(Z), Crossprod_arma(Z), Crossprod_eigen(Z), Crossprod_armeig(Z))
[1] TRUE
``````

From the output above, it is clear that the Eigen and Eigen-Armadillo hybrid implementation are the most efficient. However, I’m not sure the performance of the hybrid strategy is being fairly tested since I couldn’t figure out how to pass Z as `const arma::mat & Z` in that implementation. Also, I don’t know if there is an error in that function since the R session sometimes crashes when I used it (did not happen with the other functions). Although it does not seem as much of a difference between the pure Eigen and hybrid strategy, in real life applications, the Z matrix can be significantly larger (10000 x 50000) and thus, efficiency will matter.