% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/var.relations.mfi.R
\name{var.relations.mfi}
\alias{var.relations.mfi}
\title{Investigate variable relations of a specific variable with mutual forest impact (corrected mean adjusted agreement).}
\usage{
var.relations.mfi(
  x = NULL,
  y = NULL,
  num.trees = 500,
  type = "regression",
  s = NULL,
  mtry = NULL,
  min.node.size = 1,
  num.threads = NULL,
  status = NULL,
  save.ranger = FALSE,
  create.forest = TRUE,
  forest = NULL,
  save.memory = FALSE,
  case.weights = NULL,
  variables,
  candidates,
  p.t = 0.01,
  select.rel = TRUE,
  method = "janitza"
)
}
\arguments{
\item{x}{data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed)}

\item{y}{vector with values of the phenotype variable (Note: will be converted to a factor if classification mode is used). For survival forests this is the time variable.}

\item{num.trees}{number of trees. Default is 500.}

\item{type}{mode of prediction ("regression", "classification" or "survival"). Default is "regression".}

\item{s}{predefined number of surrogate splits (the actual number of surrogate splits may differ in individual nodes). Default is 1 \% of the number of variables.}

\item{mtry}{number of variables to possibly split at in each node. Default is the number of variables^(3/4) ("^3/4"), as recommended by Ishwaran (2011). Also possible are "sqrt" and "0.5", which use the square root or half of the number of variables.}

\item{min.node.size}{minimal node size. Default is 1.}

\item{num.threads}{number of threads used for determination of relations. Default is the number of CPUs available.}

\item{status}{status variable, only applicable to survival data. Use 1 for event and 0 for censoring.}

\item{save.ranger}{set TRUE if the ranger object should be saved. Default is FALSE (the ranger object is not saved).}

\item{create.forest}{set FALSE if you want to analyze an existing forest.
Default is TRUE.}

\item{forest}{the random forest that should be analyzed if create.forest is set to FALSE. (x and y still have to be given to obtain variable names)}

\item{save.memory}{Use memory saving (but slower) splitting mode. No effect for survival and GWAS data. Warning: This option slows down the tree growing, use only if you encounter memory problems. (This parameter is passed on to ranger)}

\item{case.weights}{Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.}

\item{variables}{variable names (strings) for which related variables should be searched (have to be contained in allvariables)}

\item{candidates}{vector of variable names (strings) that are candidates to be related to the variables (have to be contained in allvariables)}

\item{p.t}{p-value threshold for selection of related variables. Default is 0.01.}

\item{select.rel}{set FALSE if only relations should be calculated and no related variables should be selected. Default is TRUE.}

\item{method}{Method to compute p-values. Use "janitza" for the method by Janitza et al. (2016) or "permutation" to utilize permuted relations. Default is "janitza".}
}
\value{
a list containing:
\itemize{
\item variables: the variables to which relations are investigated.
\item surr.res: a matrix with the mutual forest impact values with variables in rows and candidates in columns.
\item surr.perm: a matrix with the mutual forest impact values of the permuted variables with variables in rows and candidates in columns.
\item p.rel: a list with the obtained p-values for the relation analysis of each variable.
\item var.rel: a list with vectors of related variables for each variable.
\item ranger: ranger objects.
\item method: Method to compute p-values: "janitza" or "permutation".
\item p.t: p-value threshold for selection of related variables.
}
}
\description{
This function corrects the mean adjusted agreement by a permutation approach and generates the relation parameter mutual forest impact. Subsequently, p-values are determined and related variables are selected.
}
\examples{
# read data
data("SMD_example_data")
x = SMD_example_data[, 2:ncol(SMD_example_data)]
y = SMD_example_data[, 1]

\donttest{
# calculate variable relations
set.seed(42)
res = var.relations.mfi(x = x, y = y, s = 10, num.trees = 100,
                        variables = c("X1", "X7"), candidates = colnames(x)[1:100])
# related variables selected for the first variable ("X1")
res$var.rel[[1]]
}
}
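\section{Sketch: p-values from a mirrored null distribution}{
The "janitza" option of \code{method} refers to Janitza et al. (2016). As a rough illustration of the general idea behind that kind of test, and not of this function's internal code, the sketch below builds a null distribution by mirroring the non-positive scores around zero and takes the share of null values that are at least as large as a score as its p-value. The vector \code{scores} holds made-up numbers; the names \code{scores}, \code{null}, \code{p} and \code{selected}, and the \code{<=} comparison with \code{p.t}, are assumptions made for this illustration only.
\preformatted{
# Hypothetical relation scores (made-up numbers, not output of var.relations.mfi)
scores <- c(c1 = 0.30, c2 = -0.02, c3 = 0.01, c4 = -0.01, c5 = 0, c6 = 0.15)

# Null distribution: negative scores, their mirror images, and exact zeros
neg <- scores[scores < 0]
zero <- scores[scores == 0]
null <- c(neg, -neg, zero)

# p-value: share of null values that are at least as large as the score
p <- sapply(scores, function(v) mean(null >= v))

# Keep the candidates whose p-value does not exceed the threshold (assumed rule)
p.t <- 0.01
selected <- names(p)[p <= p.t]
selected
}
With only a handful of non-positive scores the null distribution is coarse, so a sketch like this is meaningful mainly when many candidates are evaluated.
}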