How to Rewrite this R Code in a Single Function

  arima, parallel-processing, r, windows

This R code searches the exact seed that produces the desired coefficient of a simulated autoregressive of order 1 AR(1) time-series data using the arima.sim() and forecast::auto.arima() functions.

The Problem that Led to the Searching of Exact Seed

If one uses arima.sim() function to simulate an AR(1) time-series data and at the same time want to know the parameter or order of such simulated data, one discovers that the specified parameter ar and the order of the simulated AR(1) is not as specified in the arima.sim() function. Take a look at this.

set.seed(1)
ar1 <- arima.sim(n=10, model=list(ar=0.8, order=c(1, 0, 0)), sd=1)
auto_ar <- forecast::auto.arima(ar1, ic="aicc")
auto_ar

#Series: ar1 
#ARIMA(0,0,0) with zero mean 

#sigma^2 estimated as 1.008:  log likelihood=-14.23
#AIC=30.46   AICc=30.96   BIC=30.76

When the seed is set at 1, (set.seed(1)) in the above, order and coefficient of autocorrelation is not (1, 0, 0) and 0.8 respectively as it was set in the arima.sim() function compare to when one set the seed to 289805 (set.seed(289805)) instead of 1 or some other seed.

set.seed(289805)
ar1 <- arima.sim(n=10, model=list(ar=0.8, order=c(1, 0, 0)), sd=1)
auto_ar <- forecast::auto.arima(ar1, ic="aicc")
auto_ar

#Series: ar1 
#ARIMA(1,0,0) with zero mean 

#Coefficients:
         #ar1
      #0.8000
#s.e.  0.1458

#sigma^2 estimated as 1.849:  log likelihood=-17.25
#AIC=38.49   AICc=40.21   BIC=39.1

This R code searched for the set of seeds that print the correct parameter

.

Here is the original function

FUN <- function(i) {
  set.seed(i)
  ar1 <- arima.sim(n=100, model=list(ar=0.8, order=c(1, 0, 0)), sd=1)
  ar2 <- auto.arima(ar1, ic="aicc")
  cf <- ar2$coef
  ## case handling
  if (length(cf) == 0) rep(NA, 2)  ## sometimes result is `character(0)` -> NA
  else if (substr(cf[1], 1, 5) %in% "0.800") c(cf, i)  ## hit, that's what we want
  else rep(NA, 2)  ## all other cases -> NA
}

R <- 1e3  ## this would be your 1e5
seedv <- 1:R  ## or use custom seed vector

library(parallel)
cl <- makeCluster(detectCores() - 1)  ## for all cores remove `- 1`
clusterExport(cl, c("FUN"), envir=environment())
clusterEvalQ(cl, suppressPackageStartupMessages(library(forecast)))

res <- `colnames<-`(t(parSapply(cl, seedv, "FUN")), c("cf", "seed"))

stopCluster(cl)

What I Want

I want to rewrite this R code to be in a single function (though it may be in a composite function) such that all I need to do is to run and call it with its arguments.

Here is what I tried

FUN2 <- function(i, n, ar, sd, arr, R) {
  set.seed(i)
  ar1 <- arima.sim(n=n, model=list(ar=ar, order=c(1, 0, 0)), sd=sd)
  ar2 <- auto.arima(ar1, ic="aicc")
  (cf <- ar2$coef)
  if (length(cf) == 0) {
    rep(NA, 2)
  }
  else if (all(grepl(c("ar1|intercept"), names(cf))) &  ## using `grepl`
           substr(cf["ar1"], 1, 5) %in% "arr") { 
    c(cf, seed=i)
  }
  else {
    rep(NA, 2)
  }
}

seedv <- 1:R

library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("FUN2"), envir=environment())
clusterEvalQ(cl, suppressPackageStartupMessages(library(forecast)))

res <- parLapply(cl, seedv, "FUN")

res1 <- res[!sapply(res, anyNA)]  ## filter out NAs

stopCluster(cl)

FUN2(n = 10, ar = 0.8, sd = 1, arr = 0.800, R = 1000)
  1. n is the sample size.
  2. ar is the $phi$ value.
  3. sd is the value of standard deviation.
  4. arr is the value of $phi$ sort for.
  5. R is maximum number of seed searched starting from 1.

Source: Windows Questions

LEAVE A COMMENT