Faster self cross product using Armadillo

  armadillo, c++, eigen, r

I want to compute the cross product (ZZ‘) of a matrix Z with dimensions n x p (n is the number of rows and p is the number of columns). For some analyses I have to compute this cross product repeatedly and I was looking for an efficient way to do it. I know that Eigen may be faster in some cases but I work mostly with Armadillo (use their random number routines) and did not want to re-write all my code. I have a few implementations that I tested:

  • R: tcrossprod() function.
  • Plain Rcpp: Since the result is symmetric I was thinking I can only calculate the upper triangle and then copy the lower triangle. However, from the results below my implementation is clearly mediocre.
  • Armadillo: Using Z*Z.t()
  • Eigen: Using code from the RcppEigen Introduction vignette.
  • Combining Armadillo and the Eigen function above (since most of my code uses Armadillo routines).

The code I used is:

#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::depends(RcppEigen)]]

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::NumericMatrix Crossprod_rcpp(const Rcpp::NumericMatrix & Z){
    int p = Z.ncol();
    int n = Z.nrow();
    int i, j, k;
    Rcpp::NumericMatrix XpX(n, n);
    for(i = 0; i < n; i++){
        for(j = i; j < n; j++){
            for(k = 0; k < p; k++){
                XpX(i, j) = XpX(i, j) + Z(i, k)*Z(j, k);
            XpX(j, i) = XpX(i, j);

// [[Rcpp::export]]
arma::mat Crossprod_arma(const arma::mat & Z){

// [[Rcpp::export]]
Eigen::MatrixXd Crossprod_eigen(const Eigen::Map<Eigen::MatrixXd> & Z){
    int n = Z.rows();
    Eigen::MatrixXd ZZt(Eigen::MatrixXd(n, n).setZero().selfadjointView<Eigen::Lower>().rankUpdate(Z));

// [[Rcpp::export]]
arma::mat Crossprod_armeig(arma::mat Z){
    int n = Z.n_rows;
    int p = Z.n_cols;
    Eigen::MatrixXd X = Eigen::Map<Eigen::MatrixXd>(Z.memptr(), n, p);
    Eigen::MatrixXd XXt(Eigen::MatrixXd(n, n).setZero().selfadjointView<Eigen::Lower>().rankUpdate(X));
    arma::mat ZZt = arma::mat(, XXt.rows(), XXt.cols(), false, false);

To test their performance in R I used the following code:

n = 200
p = 1000
Z = matrix(runif(n*p), ncol = p, nrow = n)
microbenchmark(tcrossprod(Z), Crossprod_rcpp(Z), Crossprod_arma(Z), Crossprod_eigen(Z), Crossprod_armeig(Z), times = 100)

The results from running the code are:

microbenchmark(tcrossprod(Z), Crossprod_rcpp(Z), Crossprod_arma(Z), Crossprod_eigen(Z), Crossprod_armeig(Z), times = 100)
Unit: milliseconds
                expr       min        lq      mean    median        uq     max neval  cld
       tcrossprod(Z) 12.285202 12.477001 12.766616 12.562951 12.709301 21.1945   100   c 
   Crossprod_rcpp(Z) 21.764501 22.178501 22.488434 22.266651 22.404251 27.1600   100    d
   Crossprod_arma(Z) 12.025001 12.387151 12.571402 12.434302 12.514851 16.3147   100   c 
  Crossprod_eigen(Z)  3.439802  3.531801  3.801725  3.570251  3.653002 12.4534   100 a   
 Crossprod_armeig(Z)  4.364201  4.506051  4.712580  4.605751  4.746951  7.0822   100  b

all.equal(tcrossprod(Z), Crossprod_rcpp(Z), Crossprod_arma(Z), Crossprod_eigen(Z), Crossprod_armeig(Z))
[1] TRUE

From the output above, it is clear that the Eigen and Eigen-Armadillo hybrid implementation are the most efficient. However, I’m not sure the performance of the hybrid strategy is being fairly tested since I couldn’t figure out how to pass Z as const arma::mat & Z in that implementation. Also, I don’t know if there is an error in that function since the R session sometimes crashes when I used it (did not happen with the other functions). Although it does not seem as much of a difference between the pure Eigen and hybrid strategy, in real life applications, the Z matrix can be significantly larger (10000 x 50000) and thus, efficiency will matter.

I’d appreciate your input.

Source: Windows Questions C++