% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/var.relations.mfi.R
\name{var.relations.mfi}
\alias{var.relations.mfi}
\title{Investigate variable relations of a specific variable with mutual forest impact (corrected mean adjusted agreement).}
\usage{
var.relations.mfi(
x = NULL,
y = NULL,
num.trees = 500,
type = "regression",
s = NULL,
mtry = NULL,
min.node.size = 1,
num.threads = NULL,
status = NULL,
save.ranger = FALSE,
create.forest = TRUE,
forest = NULL,
save.memory = FALSE,
case.weights = NULL,
variables,
candidates,
p.t = 0.01,
select.rel = TRUE,
method = "janitza"
)
}
\arguments{
\item{x}{data.frame of predictor variables with variables in
columns and samples in rows (Note: missing values are not allowed)}
\item{y}{vector with values of phenotype variable (Note: will be converted to factor if
classification mode is used). For survival forests this is the time variable.}
\item{num.trees}{number of trees. Default is 500.}
\item{type}{mode of prediction ("regression", "classification" or "survival"). Default is "regression".}
\item{s}{predefined number of surrogate splits (it may happen that the actual number of surrogate splits differs in individual nodes). Default is 1 \% of no. of variables.}
\item{mtry}{number of variables to possibly split at in each node. Default is no. of variables^(3/4) ("^3/4") as recommended by Ishwaran (2011). Also possible are "sqrt" and "0.5" to use the square root or half of the no. of variables.}
\item{min.node.size}{minimal node size. Default is 1.}
\item{num.threads}{number of threads used for determination of relations. Default is number of CPUs available.}
\item{status}{status variable, only applicable to survival data. Use 1 for event and 0 for censoring.}
\item{save.ranger}{set TRUE if ranger object should be saved. Default is that ranger object is not saved (FALSE).}
\item{create.forest}{set FALSE if you want to analyze an existing forest. Default is TRUE.}
\item{forest}{the random forest that should be analyzed if create.forest is set to FALSE. (x and y still have to be given to obtain variable names)}
\item{save.memory}{Use memory saving (but slower) splitting mode. No effect for survival and GWAS data. Warning: This option slows down the tree growing, use only if you encounter memory problems. (This parameter is transferred to ranger)}
\item{case.weights}{Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.}
\item{variables}{variable names (strings) for which related variables should be searched (have to be contained in allvariables)}
\item{candidates}{vector of variable names (strings) that are candidates to be related to the variables (have to be contained in allvariables)}
\item{p.t}{p-value threshold for selection of related variables. Default is 0.01.}
\item{select.rel}{set FALSE if only relations should be calculated and no related variables should be selected. Default is TRUE.}
\item{method}{Method to compute p-values. Use "janitza" for the method by Janitza et al. (2016) or "permutation" to utilize permuted relations.}
}
\value{
a list containing:
\itemize{
\item variables: the variables to which relations are investigated.
\item surr.res: a matrix with the mutual forest impact values with variables in rows and candidates in columns.
\item surr.perm: a matrix with the mutual forest impact values of the permuted variables with variables in rows and candidates in columns.
\item p.rel: a list with the obtained p-values for the relation analysis of each variable.
\item var.rel: a list with vectors of related variables for each variable.
\item ranger: ranger objects.
\item method: Method to compute p-values: "janitza" or "permutation".
\item p.t: p-value threshold for selection of related variables.
}
}
\description{
This function corrects the mean adjusted agreement by a permutation approach and generates the relation parameter mutual forest impact. Subsequently, p-values are determined and related variables are selected.
}
\examples{
# read data
data("SMD_example_data")
x = SMD_example_data[,2:ncol(SMD_example_data)]
y = SMD_example_data[,1]
\donttest{
# calculate variable relations
set.seed(42)
res = var.relations.mfi(x = x, y = y, s = 10, num.trees = 100,
                        variables = c("X1","X7"), candidates = colnames(x)[1:100])
res$var.rel[[1]]
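
# A hedged sketch of inspecting the other elements of the returned list.
# Element names are taken from the Value section; indexing p.rel with [[1]]
# is an assumption that mirrors the var.rel access above.
res$surr.res     # mutual forest impact values: variables in rows, candidates in columns
res$p.rel[[1]]   # p-values for the relations of the first variable ("X1")

# The same analysis with p-values based on permuted relations instead of the
# default "janitza" method (sketch only; results may differ between methods)
set.seed(42)
res.perm = var.relations.mfi(x = x, y = y, s = 10, num.trees = 100,
                             variables = c("X1","X7"), candidates = colnames(x)[1:100],
                             method = "permutation")
res.perm$var.rel[[1]]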
}
}