Commit dfad067f (verified) authored by Gärber, Florian

refactor: `addSurrogates`

parent e07d7049
Type: Package
Package: RFSurrogates
Title: Surrogate Minimal Depth Variable Importance
Version: 0.3.3.9002
Version: 0.3.3.9003
Authors@R: c(
person("Stephan", "Seifert", , "stephan.seifert@uni-hamburg.de", role = c("aut", "cre"),
comment = c(ORCID = "0000-0003-2567-5728")),
......
......@@ -6,6 +6,10 @@
* Add `num.threads` param (passed to `mc.cores` in `parallel::mclapply()`). It defaults to 1 for backward compatibility.
* Add `add_layer` param to include the effect of `addLayer` within the same loop. Defaults to `FALSE` for backward compatibility.
* (Internal) `getsingletree()`: Add `add_layer` param to enable adding layers within the same loop.
* `addSurrogates()`:
  * Clarified default value for `num.threads` to be `parallel::detectCores()` by adding it as a default to the parameter.
  * Added assertion that `RF` is a `ranger` object.
  * Added assertion that `RF$num.trees` and `length(trees)` are equal. This is not considered a breaking change since these values should always be equal when the function is used correctly.
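
The two assertions listed above can be tried in isolation. The following is a minimal sketch, not the package's code: `check_forest_inputs()` is a hypothetical helper, and `mock_rf` is a hand-built stand-in that only carries the `"ranger"` class and a `num.trees` field, so no fitted forest is needed.

```r
# Hypothetical helper that mirrors the two checks described in the changelog.
check_forest_inputs <- function(RF, trees) {
  if (!inherits(RF, "ranger")) {
    stop("`RF` must be a ranger object.")
  }
  if (RF$num.trees != length(trees)) {
    stop("Number of trees in ranger model `RF` does not match number of extracted trees in `trees`.")
  }
  invisible(TRUE)
}

# Assumption: a mock object with the right class and field is enough to exercise the checks.
mock_rf <- structure(list(num.trees = 2L), class = "ranger")
mock_trees <- list(list(), list())

# Passes: class matches and both sides report two trees.
ok <- check_forest_inputs(mock_rf, mock_trees)

# Fails the class check: a plain list is not a ranger object.
wrong_class <- tryCatch(
  check_forest_inputs(list(num.trees = 2L), mock_trees),
  error = function(e) conditionMessage(e)
)

# Fails the length check: only one extracted tree for a two-tree forest.
wrong_length <- tryCatch(
  check_forest_inputs(mock_rf, mock_trees[1]),
  error = function(e) conditionMessage(e)
)
```

Callers that previously passed a truncated `trees` list would now get an error instead of a silently shortened result, which is presumably why the changelog argues the check is not a breaking change for correct usage.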
# RFSurrogates 0.3.3
......
#' Add surrogate information that was created by getTreeranger
#' Add surrogate information to a tree list.
#'
#' This function adds surrogate variables and adjusted agreement values to a forest that was created by getTreeranger.
#' This function adds surrogate variables and adjusted agreement values to a forest that was created by [getTreeranger].
#'
#' @param RF random forest object created by ranger (with keep.inbag=TRUE).
#' @param trees list of trees created by getTreeranger.
#' @param s Predefined number of surrogate splits (it may happen that the actual number of surrogate splits differes in individual nodes). Default is 1 \% of no. of variables.
#' @param RF A [ranger::ranger] object which was created with `keep.inbag = TRUE`.
#' @param trees List of trees created by [getTreeranger].
#' @param s Predefined number of surrogate splits (it may happen that the actual number of surrogate splits differs in individual nodes).
#' @param Xdata data without the dependent variable.
#' @param num.threads number of threads used for parallel execution. Default is number of CPUs available.
#' @return a list with trees containing of lists of nodes with the elements:
#' \itemize{
#' \item nodeID: ID of the respective node (important for left and right daughters in the next columns)
#' \item leftdaughter: ID of the left daughter of this node
#' \item rightdaughter: ID of the right daughter of this node
#' \item splitvariable: ID of the split variable
#' \item splitpoint: splitpoint of the split variable
#' \item status: "0" for terminal and "1" for non-terminal
#' \item layer: layer information (0 means root node, 1 means 1 layer below root, etc)
#' \item surrogate_i: numbered surrogate variables (number depending on s)
#' \item adj_i: adjusted agreement of variable i
#' }
#' @param num.threads (Default: [parallel::detectCores()]) Number of threads to spawn for parallelization.
#'
#' @returns A list of trees.
#' A list of trees consisting of lists of nodes with the elements:
#' * `nodeID`: ID of the respective node (important for left and right daughters in the next columns)
#' * `leftdaughter`: ID of the left daughter of this node
#' * `rightdaughter`: ID of the right daughter of this node
#' * `splitvariable`: ID of the split variable
#' * `splitpoint`: splitpoint of the split variable
#' * `status`: `0` for terminal and `1` for non-terminal
#' * `layer`: layer information (`0` means root node, `1` means 1 layer below root, etc)
#' * `surrogate_i`: numbered surrogate variables (number depending on s)
#' * `adj_i`: adjusted agreement of variable i
#'
#' @export
addSurrogates <- function(RF, trees, s, Xdata, num.threads) {
num.trees <- length(trees)
ncat <- sapply(sapply(Xdata, levels), length) # determine number of categories (o for continuous variables)
names(ncat) <- colnames(Xdata)
addSurrogates <- function(RF, trees, s, Xdata, num.threads = parallel::detectCores()) {
if (!inherits(RF, "ranger")) {
stop("`RF` must be a ranger object.")
}
if (is.null(num.threads)) {
num.threads <- parallel::detectCores()
num.trees <- RF$num.trees
if (num.trees != length(trees)) {
stop("Number of trees in ranger model `RF` does not match number of extracted trees in `trees`.")
}
ncat <- sapply(sapply(Xdata, levels), length) # determine number of categories (0 for continuous variables)
names(ncat) <- colnames(Xdata)
if (any(ncat > 0)) {
Xdata[, which(ncat > 0)] <- sapply(Xdata[, which(ncat > 0)], unclass)
}
......
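
To make the `ncat` step in the hunk above easier to follow, here is a small sketch of counting factor levels per column and replacing factor columns with their integer codes. It is an illustration under assumptions, not the commit's code: the toy `Xdata` is made up, and `vapply()` with `nlevels()` is swapped in for the nested `sapply()` call because it always returns one integer per column.

```r
# Toy predictor data (made up for illustration): one numeric and two factor columns.
Xdata <- data.frame(
  x1 = c(0.5, 1.2, 3.4),
  x2 = factor(c("a", "b", "a")),
  x3 = factor(c("u", "v", "w"))
)

# Number of categories per column; nlevels() is 0 for non-factor columns.
ncat <- vapply(Xdata, nlevels, integer(1))

# Replace each factor column by its underlying integer codes.
is_cat <- ncat > 0
if (any(is_cat)) {
  Xdata[is_cat] <- lapply(Xdata[is_cat], function(col) as.integer(unclass(col)))
}
```

Note that the comparison sits inside `any()`, so the condition tests whether any column has at least one level rather than coercing the counts to logical first.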
......@@ -2,35 +2,36 @@
% Please edit documentation in R/addSurrogates.R
\name{addSurrogates}
\alias{addSurrogates}
\title{Add surrogate information that was created by getTreeranger}
\title{Add surrogate information to a tree list.}
\usage{
addSurrogates(RF, trees, s, Xdata, num.threads)
addSurrogates(RF, trees, s, Xdata, num.threads = parallel::detectCores())
}
\arguments{
\item{RF}{random forest object created by ranger (with keep.inbag=TRUE).}
\item{RF}{A \link[ranger:ranger]{ranger::ranger} object which was created with \code{keep.inbag = TRUE}.}
\item{trees}{list of trees created by getTreeranger.}
\item{trees}{List of trees created by \link{getTreeranger}.}
\item{s}{Predefined number of surrogate splits (it may happen that the actual number of surrogate splits differes in individual nodes). Default is 1 \% of no. of variables.}
\item{s}{Predefined number of surrogate splits (it may happen that the actual number of surrogate splits differs in individual nodes).}
\item{Xdata}{data without the dependent variable.}
\item{num.threads}{number of threads used for parallel execution. Default is number of CPUs available.}
\item{num.threads}{(Default: \code{\link[parallel:detectCores]{parallel::detectCores()}}) Number of threads to spawn for parallelization.}
}
\value{
a list with trees containing of lists of nodes with the elements:
A list of trees.
A list of trees consisting of lists of nodes with the elements:
\itemize{
\item nodeID: ID of the respective node (important for left and right daughters in the next columns)
\item leftdaughter: ID of the left daughter of this node
\item rightdaughter: ID of the right daughter of this node
\item splitvariable: ID of the split variable
\item splitpoint: splitpoint of the split variable
\item status: "0" for terminal and "1" for non-terminal
\item layer: layer information (0 means root node, 1 means 1 layer below root, etc)
\item surrogate_i: numbered surrogate variables (number depending on s)
\item adj_i: adjusted agreement of variable i
\item \code{nodeID}: ID of the respective node (important for left and right daughters in the next columns)
\item \code{leftdaughter}: ID of the left daughter of this node
\item \code{rightdaughter}: ID of the right daughter of this node
\item \code{splitvariable}: ID of the split variable
\item \code{splitpoint}: splitpoint of the split variable
\item \code{status}: \code{0} for terminal and \code{1} for non-terminal
\item \code{layer}: layer information (\code{0} means root node, \code{1} means 1 layer below root, etc)
\item \code{surrogate_i}: numbered surrogate variables (number depending on s)
\item \code{adj_i}: adjusted agreement of variable i
}
}
\description{
This function adds surrogate variables and adjusted agreement values to a forest that was created by getTreeranger.
This function adds surrogate variables and adjusted agreement values to a forest that was created by \link{getTreeranger}.
}