Multiple k-fold cross-validation to test for the best combination of covariates, model type and family.
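Repeated k-fold cross-validation can be sketched in base R. This is a minimal illustration of the technique, not findBestModel's internal implementation. The helpers `make_folds` and `cv_rmse` are hypothetical, and the built-in `mtcars` data with `lm` stand in for the package's own datasets and model types. Each candidate model gets one validation score per (repetition, fold), which yields a distribution of scores that can later be compared across candidates.

```r
# Hypothetical helper: assign each of n observations to one of k folds,
# repeated `times` times with a fresh random partition each time.
make_folds <- function(n, k = 5, times = 3) {
  # Each column is one repetition; values are fold ids 1..k
  sapply(seq_len(times), function(r) sample(rep_len(seq_len(k), n)))
}

# Hypothetical helper: one validation RMSE per (repetition, fold)
cv_rmse <- function(formula, data, folds) {
  response <- all.vars(formula)[1]
  out <- numeric(0)
  for (r in seq_len(ncol(folds))) {
    for (f in sort(unique(folds[, r]))) {
      valid <- folds[, r] == f                     # held-out fold
      fit <- lm(formula, data = data[!valid, ])    # fit on remaining folds
      pred <- predict(fit, newdata = data[valid, ])
      out <- c(out, sqrt(mean((data[valid, response] - pred)^2)))
    }
  }
  out
}

set.seed(1)
folds <- make_folds(n = nrow(mtcars), k = 4, times = 3)
rmse_a <- cv_rmse(mpg ~ wt, mtcars, folds)        # candidate covariate set A
rmse_b <- cv_rmse(mpg ~ wt + hp, mtcars, folds)   # candidate covariate set B
```

Both candidates are scored on the same folds, so their score distributions are directly comparable.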

findBestModel(
  x,
  datatype,
  corSpearman,
  saveWD = paste0(tempdir(), "/outputs"),
  zip.file = TRUE,
  restart = NULL,
  MPIcalc = FALSE,
  verbose = 0,
  na.max = 0.5,
  test = "wilcoxon"
)

Arguments

x

Dataset (data.frame or SpatialPointsDataFrame) prepared with the Prepare_dataset function. All columns except 'dataY' are treated as covariates. Factor or character columns should have names starting with 'factor_'.
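The 'factor_' naming convention can be applied in base R as sketched below. The data frame and its columns are hypothetical; in practice the input should still come from Prepare_dataset, whose arguments are not described here. The sketch only renames categorical columns that do not already carry the prefix.

```r
# Hypothetical covariate table: 'dataY' is the response, 'substrate' is categorical
df <- data.frame(
  dataY = c(0, 1, 1, 0),
  depth = c(10, 25, 40, 55),
  substrate = c("sand", "mud", "sand", "rock"),
  stringsAsFactors = FALSE
)

# Identify factor/character columns that lack the 'factor_' prefix
is_cat <- vapply(df, function(col) is.factor(col) || is.character(col), logical(1))
needs_prefix <- is_cat & !startsWith(names(df), "factor_")

# Prefix them so they are recognised as categorical covariates
names(df)[needs_prefix] <- paste0("factor_", names(df)[needs_prefix])
names(df)
# "dataY" "depth" "factor_substrate"
```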

datatype

Data type used for modelling, one of 'PA', 'Density', 'ContPosNull', 'Count', 'TweedGLM' or 'KrigeGLM' (see modelselect_opt for more options and information).

corSpearman

Data frame of correlations between covariates, as calculated by the Param_corr function.
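For intuition, Spearman (rank-based) correlations between covariates can be computed with base R's `cor(..., method = "spearman")`. This is only a sketch: the exact layout of the data frame returned by Param_corr is not documented here, so the one-row-per-pair table below (`var1`, `var2`, `rho`) is a hypothetical format, and the covariates are simulated.

```r
set.seed(42)
# Simulated covariates: 'temp' is built to decrease with 'depth'
covs <- data.frame(depth = runif(30, 0, 100))
covs$temp <- 20 - 0.1 * covs$depth + rnorm(30)
covs$slope <- runif(30)

# Rank-based (Spearman) correlation matrix between covariates
m <- cor(covs, method = "spearman")

# Long format: one row per covariate pair (upper triangle only)
idx <- which(upper.tri(m), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(m)[idx[, "row"]],
  var2 = colnames(m)[idx[, "col"]],
  rho  = m[idx]
)
cor_pairs
```

A table in this shape makes strongly correlated pairs, such as depth and temp here, easy to spot.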

saveWD

Directory in which all outputs of the cross-validation procedure are saved. The folder is created if it does not exist. Defaults to an "outputs" subfolder of tempdir().

zip.file

Logical, or a file path where the zipped outputs are saved. By default (TRUE), the zip file is written to the temporary directory. If FALSE, outputs are not zipped and the path to saveWD is returned.

restart

Numeric vector. If the analysis was interrupted for any reason, it can be restarted at any modeltype step. Provide a vector of positions: all modeltypes at the corresponding positions are re-calculated. When restarting, the modelselect_opt.save file stored in saveWD is loaded, so you can unzip a previously saved output file and set the unzipped folder as saveWD.

MPIcalc

Logical. Whether the function is run within an MPI cluster (TRUE) or locally (FALSE).

verbose

Numeric. 0: no messages; 1: a few messages; 2: all messages.

na.max

Maximum proportion of NA values allowed in one distribution. If the proportion of NA values exceeds na.max, the model is ranked last and no p-value is calculated.
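The na.max rule can be illustrated with a small base R sketch. The validation scores are hypothetical, and ordering the remaining models by median score is an illustrative stand-in for the statistical comparison findBestModel actually performs. The point is only that any model whose NA proportion exceeds na.max is pushed to the end of the ranking.

```r
# Hypothetical validation scores for three models; NA = fold where the fit failed
scores <- list(
  modelA = c(0.81, 0.79, 0.84, 0.80),
  modelB = c(0.85, NA, 0.83, 0.86),
  modelC = c(NA, NA, NA, 0.90)
)
na.max <- 0.5

# Proportion of NA values in each model's score distribution
na_prop <- vapply(scores, function(s) mean(is.na(s)), numeric(1))
too_many_na <- na_prop > na.max

# Illustrative ordering: models within the NA limit first (best median first),
# models exceeding na.max last regardless of their scores
median_score <- vapply(scores, median, numeric(1), na.rm = TRUE)
ranking <- names(scores)[order(too_many_na, -median_score)]
ranking
# "modelB" "modelA" "modelC"
```

modelC has the best median score but is ranked last, because 75% of its values are NA.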

test

Test used to compare distributions, passed to the test argument of svyranktest (package survey).
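To see what a "wilcoxon" comparison does, the sketch below uses base R's unweighted `wilcox.test` on two hypothetical vectors of validation errors. This swaps in the plain rank-sum test for intuition only: svyranktest from the survey package performs the design-based version of these rank tests and is what this argument refers to.

```r
# Hypothetical validation RMSEs from the same folds for two candidate models
rmse_a <- c(3.1, 2.8, 3.4, 2.9, 3.0, 3.3, 2.7, 3.2)
rmse_b <- c(2.6, 2.5, 2.9, 2.4, 2.7, 2.8, 2.3, 2.6)

# One-sided Wilcoxon rank-sum test: are model B's errors stochastically smaller?
# exact = FALSE uses the normal approximation, since the data contain ties
res <- wilcox.test(rmse_b, rmse_a, alternative = "less", exact = FALSE)
res$p.value
```

A small p-value suggests that model B's error distribution sits below model A's.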