
INTRODUCTION

The Merriam-Webster dictionary defines a model as a system of postulates, data, and inferences presented as a mathematical description of an entity or state of affairs, or, more loosely, as an example for imitation or emulation.

Tree-based models are the subject of this work, and it is useful to begin by noting that tree-based classification models are a type of supervised machine learning algorithm that uses a series of conditional statements to partition training data into subsets. Each successive split adds some complexity to the model, which can be used to make predictions. The resulting model can be visualized as a roadmap of logical tests that describes the data set. Decision trees are popular for small-to-medium-sized data sets because they are easy to implement and even easier to interpret (K.C. Lee, 2020).

Background and related work


Tree-Based Model

Tree-based models use a decision tree to represent how different input variables can be used to predict a target value. Machine learning uses tree-based models for both classification and regression problems, such as predicting the type of an animal or the value of a home. The input variables are repeatedly segmented into subsets to build the decision tree, and each branch is tested for prediction accuracy and evaluated for efficiency and effectiveness. Splitting the variables in a different order can reduce the number of layers and calculations required to produce an accurate prediction. In a successful decision tree, the most important variables (those most influential on the prediction) appear at the top of the tree hierarchy, while irrelevant features are dropped from the hierarchy.
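
To make the roadmap analogy concrete, the following minimal Python sketch (with variable names and thresholds invented purely for illustration) shows how such a tree reduces to a chain of conditional statements:

def predict_wine_quality(alcohol: float, sulphates: float) -> float:
    """Toy decision tree: each if/else is one split on a single input variable."""
    if alcohol <= 10.5:            # root split
        if sulphates <= 0.6:       # split in the left subtree
            return 5.0             # leaf prediction
        return 5.5
    return 6.3                     # right subtree leaf

print(predict_wine_quality(alcohol=10.6, sulphates=0.63))   # -> 6.3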

Decision tree

Decision tree learning is a supervised learning approach used in statistics, data mining, and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

Tree models where the target variable can take a discrete set of values are called
classification trees; in these tree structures, leaves represent class labels and
branches represent conjunctions of features that lead to those class labels. Decision
trees where the target variable can take continuous values (typically real numbers)
are called regression trees.

Decision trees are among the most popular machine learning algorithms given their
intelligibility and simplicity.

In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data (but the resulting classification tree can also serve as an input for decision making).

Tree-based methods are regarded as among the most powerful supervised learning approaches for classification and regression. These algorithms provide predictive models with fast performance, high accuracy, and easy interpretation. In contrast with linear models, they capture nonlinear relationships well and can adapt to a wide variety of problems in the machine learning area.
Figure 1. Example of a decision tree based on the winequalityred data set. The goal is to predict the quality rating of a (red) wine using chemical properties of the wine. The shaded nodes and dashed edges indicate how an observation with citric.acid = 0.22, density = 0.993, sulphates = 0.63, alcohol = 10.6, fixed.acidity = 4.9 is mapped to a prediction (value of 5.508).

Key issues in Tree-based model solution

Determining how deep to grow the decision tree, handling continuous attributes,
choosing an appropriate attribute selection measure, and handling training data
with missing attribute values, handling attributes with different costs, and
improving computational efficiency are all practical issues in learning decision
trees.

Overfitting the Data:

The algorithm grows each branch of the tree just deep enough to classify the training instances perfectly. In practice, when there is noise in the data or when the number of training instances is too small to provide a representative sample of the underlying target function, this can cause problems. In either case, this basic technique may yield trees that overfit the training samples.
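
As a brief illustration of how overfitting is typically controlled in practice, the following sketch (assuming scikit-learn, which the present study does not itself use) contrasts a fully grown tree with a depth-limited one:

# Minimal sketch, assuming scikit-learn is installed; synthetic data only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)            # grown until leaves are pure
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                random_state=0).fit(X_tr, y_tr)          # depth- and leaf-size-limited

print("full tree   train/test accuracy:", full.score(X_tr, y_tr), full.score(X_te, y_te))
print("pruned tree train/test accuracy:", pruned.score(X_tr, y_tr), pruned.score(X_te, y_te))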
Related work:

Tree-based models became popular in machine learning with the introduction of two algorithms, ID3 (Iterative Dichotomiser 3) and CART (Classification and Regression Trees). Tree-based models gained popularity due to their interpretability, but were generally found to be less accurate than other models such as linear and logistic regression.

A number of ideas were consequently proposed for improving the accuracy of tree-based models, all based on constructing an ensemble of tree-based models. Katie Gross (2020) proposed the idea of bootstrap aggregation, or bagging, where one builds a collection of predictive models, each trained with a bootstrapped sample of the original training set; the predictions of each model are then aggregated into a single prediction (by majority vote for classification and by averaging for regression). The motivation for bagging is that it reduces the prediction error for predictive models that are unstable or highly sensitive to the training data (such as CART); indeed, Katie Gross (2020) showed that bagged regression trees can attain significantly lower out-of-sample prediction error than ordinary regression trees. Later, Yuncheng Wu et al. (2020) proposed the random forest model, where one builds a collection of bagged CART trees for which the subset of features considered for splitting at each node of each tree is randomly sampled from the set of all features (the so-called random subspace method). Concurrently, other research has considered the idea of boosting (Momoko and Kazuhisa, 2022), wherein one iteratively builds a weighted collection of basic predictive models (such as CART trees), with the goal of reducing the prediction error with each iteration.

Tree-based models occupy a central place in machine learning because they generally work very well in practice. In a systematic comparison of 179 different prediction methods on a broad set of benchmark data sets, Yazan et al. (2022) found that random forests achieved best or near-best performance across all of these data sets. Boosted trees have been similarly successful: on the data science competition website Kaggle, one popular implementation of boosted trees, XGBoost, was used in more than half of the winning solutions in the year 2015 (Mohammad et al., 2020). Robust, open-source software implementations of many tree-based models exist. For boosted trees, the R package gbm (Diyuan et al., 2022) and XGBoost are widely used; for random forests, the R package randomForest (Mounira et al., 2020) is extremely popular. There has also been a significant concurrent effort to develop a theoretical foundation for tree-based model methods; we briefly survey some of this work for random forests. For random forests, the original paper of Yuncheng Wu et al. (2020) developed an upper bound on the generalization error of a random forest. Later research studied the consistency of simplified versions of the random forest model (e.g., Suliman et al., 2021) as well as the original model (e.g., K.C. Lee, 2020). For an overview of recent theoretical advances in random forests, the reader is referred to Priya (2022). At the same time, there is an increasing number of papers originating in operations research where a predictive model is used to represent the effect of a decision, and one solves an optimization problem to find the best decision with respect to this predictive model. Aside from the papers already mentioned in clinical trials and promotion planning, we mention two other examples. In pricing, data on historical prices and the demand observed at those prices is often used to build a predictive model of demand as a function of price and to then determine the optimal price (e.g., Amirhossein et al., 2020). In assortment optimization, Mišić (2020) considers a two-step approach, where the first step involves estimating a ranking-based choice model from historical data, and the second involves solving a mixed-integer optimization (MIO) model to optimize the revenue predicted under that model. In the research literature where predictive models are used to understand and subsequently optimize decisions, the closest paper conceptually to this one is the paper of Ferreira et al. (2021).

To solve the problem, the paper builds a random forest model of the demand for each style included in a sale as a function of the style's price and the average price of the other styles. The paper then formulates an MIO problem to determine the optimal prices for the styles to be included in the sale, where the revenue is based on the price and the demand (as predicted by the random forest) of each style. The MIO formulation does not explicitly model the random forest prediction mechanism; instead, one computes the random forest prediction for each style at each of its possible prices and at each possible average price of the other styles, and these predictions enter the MIO model as coefficients in the objective function. (The predictions could just as easily have come from a different form of predictive model, without changing the structure of the optimization problem.) In contrast, our MIO formulation explicitly represents the structure of each tree in the model, allowing the prediction of each tree-based model to be determined through the variables and constraints of the MIO. Although the modeling approach of Ferreira et al. (2021) is feasible for their pricing problem, it is difficult to extend this approach when there are many independent variables, as one would need to enumerate all possible combinations of values for them and compute the tree ensemble's prediction for each combination of values. To the best of our knowledge, our paper is the first to conceptualize the problem of how to optimize an objective function that is given by a tree-based model. Methodologically, the present paper is most related to the paper of Bertsimas and Mišić (2020), which considers the problem of designing a product line or assortment to optimize revenue under a ranking-based choice model. The ranking-based model considered in Bertsimas and Mišić (2020) can be understood as a type of tree ensemble model; as such, the MIO formulation of Bertsimas and Mišić (2020) can be regarded as a special case of the more general formulation that we analyze here. Some of the theoretical results found in Bertsimas and Mišić (2020), specifically those on the structure of the Benders cuts, are generalized in the present paper to tree-based models. Despite this similarity, the goals of the two papers are different. Bertsimas and Mišić (2020) consider an optimization approach specifically for product line decisions, whereas in the present paper, our methodology can be applied to any tree-based model, thus spanning a significantly broader range of application domains.

Research method

This research is developed on the foundation of a literature review of the tools, technologies, methods, and techniques used in analysing tree-based models.

Research question

The study tries to answer the following research questions:

Research question 1: What are the methods used in optimizing tree-based models?

Research question 2: What are the tools used in optimizing tree-based models?

Research question 3: What are the techniques used in optimizing tree-based models?

Research question 4: What are the gaps in tree-based models?

Research question 1: What are the methods used in optimizing tree-based models?

In order to answer this research question, we first provide some background on tree-based models. We are given the task of predicting a dependent variable Y using the independent variables X_1, ..., X_n; for convenience, we use X to denote the vector of independent variables. We let 𝒳_i denote the domain of independent variable i and let 𝒳 = ∏_{i=1}^{n} 𝒳_i denote the domain of X. An independent variable i may be a numeric variable or a categorical variable.

A decision tree is a model for predicting the dependent variable Y using the independent variable X by checking a collection of splits. A split is a condition or query on a single independent variable that is either true or false. More precisely, for a numeric variable i, a split is a query of the form

Is X_i ≤ a?

for some a ∈ ℝ. For a categorical variable i, a split is a query of the form

Is X_i ∈ A?

where A ⊆ 𝒳_i is a set of levels of the categorical variable. The splits are arranged in the form of a tree, with each split node having two child nodes. The left child corresponds to the split condition being true, while the right child corresponds to the condition being false. To make a prediction for an observation with independent variable X, we start at the root of the tree and check whether X satisfies the split condition; if it is true, we move to the left child, and if it is false, we move to the right child. At the new node, we check the split again and move to the corresponding child node. This process continues until we reach a leaf of the tree. The prediction that we make is the value corresponding to the leaf we have reached.

In this paper, we focus on predictive models that are ensembles or collections of decision trees. We assume that there are T trees, where each tree is indexed from 1 to T. Each tree t has a weight λ_t, and its prediction is denoted by the function f_t; for the independent variable X, the prediction of tree t is f_t(X). For an observation with independent variable X, the prediction of the ensemble of trees is given by

∑_{t=1}^{T} λ_t f_t(X).
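
As an illustration of this prediction mechanism, the following minimal Python sketch (the dictionary-based node representation, variable names, and values are invented for this example and are not the paper's implementation) traverses each tree and forms the weighted ensemble prediction:

# Minimal sketch of decision-tree traversal and weighted ensemble prediction
# for numeric splits; the node layout is invented purely for illustration.

def predict_tree(node, X):
    """Walk from the root to a leaf, going left when the split condition is true."""
    while "prediction" not in node:                       # stop when we reach a leaf
        node = node["left"] if X[node["var"]] <= node["threshold"] else node["right"]
    return node["prediction"]

def predict_ensemble(trees, weights, X):
    """Weighted sum over the T trees: sum_t lambda_t * f_t(X)."""
    return sum(w * predict_tree(t, X) for t, w in zip(trees, weights))

# Toy two-tree ensemble over the variables "alcohol" and "sulphates".
tree1 = {"var": "alcohol", "threshold": 10.5,
         "left": {"prediction": 5.0}, "right": {"prediction": 6.0}}
tree2 = {"var": "sulphates", "threshold": 0.6,
         "left": {"prediction": 5.2}, "right": {"prediction": 5.8}}
print(predict_ensemble([tree1, tree2], [0.5, 0.5],
                       {"alcohol": 10.6, "sulphates": 0.63}))   # -> 5.9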

The optimization problem that we would like to solve is to find the value of the independent variable X that maximizes the ensemble prediction:

maximize_{X ∈ 𝒳} ∑_{t=1}^{T} λ_t f_t(X)    (1)

We shall make two key assumptions about the tree-based model ∑_{t=1}^{T} λ_t f_t(X) and our tree-based model optimization problem (1):

1. First, we assume that we are only making a single, one-time decision and that the tree-based model is fixed. Extending our approach to the multistage setting is an interesting direction for future research.
2. Second, we assume that the tree-based model is an accurate representation of the outcome when we make the decision X. In practice, some care must be taken here because, depending on how the tree-based model is estimated and the nature of the data, the prediction ∑_{t=1}^{T} λ_t f_t(X) may not necessarily be an accurate estimate of the causal effect of setting the independent variables to X. This issue has been the focus of some recent work in prescriptive analytics (see Bertsimas and Kallus 2016, Kallus 2016). Our goal in this paper is to address only the question of optimization, namely how to efficiently and scalably optimize a tree-based model function ∑_{t=1}^{T} λ_t f_t(X), which is independent of such statistical questions. As such, we will assume that the tree-based model we are given at the outset is beyond suspicion.

Problem (1) is very general, and one question we may have is whether it is
theoretically tractable or not. Our first theoretical result answers this question in the
negative.

Proposition 1. The tree-based model optimization problem (1) is NP-Hard.

Optimization model

We now present an MIO formulation of (1). Before we present the model, we will require some additional notation. We let N denote the set of numeric variables and C denote the set of categorical variables; we have that N ∪ C = {1, ..., n}.

For each numeric variable i ∈ N, let A_i denote the set of unique split points, that is, the set of values a such that X_i ≤ a is a split condition in some tree in the model {f_t}_{t=1}^{T}. Let K_i = |A_i| be the number of unique split points. Let a_{i,j} denote the jth smallest split point of variable i, so that a_{i,1} < a_{i,2} < · · · < a_{i,K_i}.

For each categorical variable i ∈ C, recall that 𝒳_i is the set of possible values of i. For convenience, we use K_i in this case to denote the size of 𝒳_i (i.e., K_i = |𝒳_i|) and use the values 1, 2, ..., K_i to denote the possible levels of variable i.

Let leaves(t) be the set of leaves or terminal nodes of tree t. Let splits(t) denote the set of splits of tree t (non-terminal nodes). Recall that the left branch of a split corresponds to "yes" or "true" to the split query, and the right branch corresponds to "no" or "false". Therefore, for each split s in splits(t), we let left(s) be the set of leaves that are accessible from the left branch (all of the leaves for which the condition of split s must be true), and right(s) be the set of leaves that are accessible from the right branch (all of the leaves for which the condition of split s must be false). For each split s, we let V(s) ∈ {1, ..., n} denote the variable that participates in split s, and let C(s) denote the set of values of variable V(s) that participate in the split query of s. Specifically, if V(s) is numeric, then C(s) = {j} for some j ∈ {1, ..., K_{V(s)}}, which corresponds to the split query X_{V(s)} ≤ a_{V(s),j}. If V(s) is categorical, then C(s) ⊆ {1, ..., K_{V(s)}}, which corresponds to the query X_{V(s)} ∈ C(s). For each leaf ℓ ∈ leaves(t), we use p_{t,ℓ} to denote the prediction that tree t makes when an observation reaches leaf ℓ.
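
As an illustration of how the sets A_i could be assembled in practice, the following sketch (using the same invented dictionary representation of trees as above, not the paper's code) collects and sorts the unique split points of each numeric variable:

# Sketch: collect the unique split points A_i for each numeric variable across
# all trees, indexed so that a_{i,1} < ... < a_{i,K_i}.

def collect_split_points(trees):
    points = {}                                    # variable -> set of thresholds
    def visit(node):
        if "prediction" in node:                   # leaf node: no split
            return
        points.setdefault(node["var"], set()).add(node["threshold"])
        visit(node["left"]); visit(node["right"])
    for t in trees:
        visit(t)
    return {var: sorted(vals) for var, vals in points.items()}   # K_i = len of each list

trees = [{"var": "alcohol", "threshold": 10.5,
          "left": {"prediction": 5.0},
          "right": {"var": "alcohol", "threshold": 11.2,
                    "left": {"prediction": 5.5}, "right": {"prediction": 6.3}}}]
print(collect_split_points(trees))    # {'alcohol': [10.5, 11.2]}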

We now define the decision variables of the problem. There are two sets of decision variables. The first set is used to specify the independent variable value X. For each categorical independent variable i ∈ C and each category/level j ∈ 𝒳_i, we let x_{i,j} be 1 if independent variable i is set to level j, and 0 otherwise. For each numeric independent variable i ∈ N and each j ∈ {1, ..., K_i}, we let x_{i,j} be 1 if independent variable i is set to a value less than or equal to the jth split point, and 0 otherwise. Mathematically,

x_{i,j} = I{X_i = j},        ∀ i ∈ C, j ∈ {1, ..., K_i},
x_{i,j} = I{X_i ≤ a_{i,j}},   ∀ i ∈ N, j ∈ {1, ..., K_i}.

We use x to denote the vector of x_{i,j} values. The second set of decision variables is used to specify the prediction of each tree t. For each tree t and each leaf ℓ ∈ leaves(t), we let y_{t,ℓ} be a binary decision variable that is 1 if the observation encoded by x belongs to (falls into) leaf ℓ of tree t, and 0 otherwise.
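
For concreteness, a small sketch of this indicator encoding (with illustrative values only) is given below:

# Sketch of the indicator encoding x_{i,j} defined above; values are illustrative.

def encode_numeric(value, split_points):
    """x_{i,j} = 1 iff X_i <= a_{i,j}, over the sorted split points of a numeric variable."""
    return [1 if value <= a else 0 for a in split_points]

def encode_categorical(level, levels):
    """x_{i,j} = 1 iff X_i equals level j of a categorical variable."""
    return [1 if level == j else 0 for j in levels]

print(encode_numeric(10.6, [10.5, 11.2]))            # -> [0, 1]
print(encode_categorical("oak", ["steel", "oak"]))    # -> [0, 1]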

Research question 2: What are the tools used in optimizing tree-based models?

We specifically focus on random forest models. Unless otherwise stated, all random forests are estimated in R using the randomForest package (Liaw and Wiener 2002) with the default parameters. All linear and mixed-integer optimization models are formulated in the Julia programming language (Bezanson et al. 2012), using the JuMP package (Julia for Mathematical Programming; see Lubin and Dunning 2015), and solved using Gurobi 6.5 (Gurobi Optimization, Inc. 2015). All experiments were executed on a late 2013 Apple MacBook Pro Retina laptop, with a quad-core 2.6 GHz Intel i7 processor and 16 GB of memory.
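
For readers without the R/Julia stack used in these experiments, a rough Python analogue of the estimation step (using scikit-learn's RandomForestRegressor and synthetic data, which are assumptions of this sketch and not the paper's actual setup) would look as follows:

# Rough analogue only: the paper's experiments use R's randomForest package.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 11))                                 # e.g. 11 predictors, as in winequalityred
y = 3 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)   # 500 trees, as in the paper
print(rf.predict(X[:1]))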

Data set         Source                                   Num. Vars   Num. Obs.   Description
Winequalityred   Cortez et al. (2020)                     11          1599        Predict quality of (red) wine
Concrete         Yeh (2019)                               8           1030        Predict strength of concrete
Permeability     Kansy et al. (2020)                      1069        165         Predict permeability of compound
Solubility       Tetko et al. (2021), Huuskonen (2020)    228         951         Predict solubility of compound

Table 1 Summary of real data sets used in numerical experiments. Note: * = accessed via UCI Machine Learning Repository (Lichman 2019); ** = accessed via Applied Predictive Modeling package in R (Kuhn and Johnson 2020).

Research question 3: What are the techniques used in optimizing tree-based models?

As part of our first experiment, we consider solving the unconstrained tree ensemble problem for each data set. For each data set, we consider optimizing the default random forest model estimated in R, which uses 500 trees (the ntree parameter in randomForest is set to 500). For each data set, we also consider solving the tree ensemble problem using only the first T trees of the complete forest, where T ranges over {10, 50, 100, 200}. For each data set and each value of T, we solve the MIO formulation (2), as well as its LO relaxation.

We compare our MIO formulation against two other approaches:

1. Local search: We solve the tree ensemble problem (1) using a local search heuristic. At a high level, it starts from a randomly chosen initial solution and iteratively improves the solution, one independent variable at a time, until a local optimum is reached. The heuristic is repeated from ten starting points, of which we retain only the best (highest objective value) solution. We test this approach to establish the value of our MIO-based approach, which obtains a globally optimal solution, as opposed to a locally optimal one. (A sketch of this heuristic is given after this list.)

2. Standard linearization MIO: We solve the standard linearization MIO and its relaxation in order to obtain a relative sense of the strength of our formulation. Because this formulation is much harder to solve, we impose a 30-minute time limit on the solution time of the integer formulation.
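
A minimal sketch of the local search heuristic described in item 1 is given below; the black-box objective and the finite candidate lists per variable are assumptions made for illustration, standing in for the tree ensemble prediction and the discretized independent variables:

# Coordinate-wise local search with random restarts (illustrative sketch).
import random

def local_search(objective, candidates, n_starts=10, seed=0):
    """Improve one variable at a time until no single-variable change helps."""
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    for _ in range(n_starts):
        x = [rng.choice(vals) for vals in candidates]          # random starting point
        improved = True
        while improved:
            improved = False
            for i, vals in enumerate(candidates):              # one independent variable at a time
                current = objective(x)
                for v in vals:
                    trial = x[:i] + [v] + x[i + 1:]
                    if objective(trial) > current:
                        x, current, improved = trial, objective(trial), True
        if objective(x) > best_val:                            # keep the best of the ten restarts
            best_x, best_val = x, objective(x)
    return best_x, best_val

# Toy objective standing in for the tree ensemble prediction.
print(local_search(lambda x: -(x[0] - 2) ** 2 - (x[1] - 1) ** 2,
                   [[0, 1, 2, 3], [0, 1, 2]]))                 # -> ([2, 1], 0)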

We consider several metrics:

• N_Levels: the number of levels (i.e., the dimension of x), defined as N_Levels = ∑_{i=1}^{n} K_i.

• N_Leaves: the number of leaves (i.e., the dimension of y), defined as N_Leaves = ∑_{t=1}^{T} |leaves(t)|.

• T_MIO: the time (in seconds) to solve our MIO.

• T_StdLin,MIO: the time (in seconds) to solve the standard linearization MIO.

• T_LS: the time (in seconds) to run the local search procedure (the value reported is the total over ten starting points).

• G_LS: the gap of the local search solution; if Z_LS is the objective value of the local search solution and Z* is the optimal objective value, then

G_LS = 100% × (Z* − Z_LS)/Z*.

• G_LO: the gap of the LO relaxation; if Z_LO is the objective value of the LO relaxation and Z* is the optimal integer objective as before, then

G_LO = 100% × (Z_LO − Z*)/Z*.

• G_StdLin,LO: the gap of the LO relaxation of the standard linearization MIO; if Z_StdLin,LO is the optimal value of that relaxation, then

G_StdLin,LO = 100% × (Z_StdLin,LO − Z*)/Z*.

• G_StdLin,MIO: the optimality gap of the standard linearization MIO; if Z_StdLin,UB and Z_StdLin,LB are the best upper and lower bounds, respectively, of the problem upon termination, then

G_StdLin,MIO = 100% × (Z_StdLin,UB − Z_StdLin,LB)/Z_StdLin,UB.
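
For example (with illustrative numbers only), if the optimal value is Z* = 20 and the best local search solution achieves Z_LS = 18, then G_LS = 100% × (20 − 18)/20 = 10%.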

Data set         T     N_Levels   N_Leaves   T_MIO (s)   T_StdLin,MIO (s)   T_LS (s)
solubility       10    1253       3157       0.1         215.2              0.2
                 50    2844       15933      0.8         1800.3             1.8
                 100   4129       31720      1.7         1801.8             8.8
                 200   6016       63704      4.5         1877.8             33.7
                 500   9646       159639     177.9       1800.3             147.0
permeability     10    2138       604        0.0         122.6              1.0
                 50    2138       3056       0.1         1800.0             1.9
                 100   2138       6108       0.2         1800.3             3.1
                 200   2138       12214      0.5         1800.0             6.1
                 500   2138       30443      2.7         1800.0             19.1
winequalityred   10    1370       3246       1.8         1800.1             0.0
                 50    2490       16296      18.5        1800.1             0.6
                 100   3000       32659      51.6        1800.1             2.5
                 200   3495       65199      216.0       1800.2             11.4
                 500   3981       162936     1159.7      1971.8             34.6
concrete         10    1924       2843       0.2         1800.8             0.1
                 50    5614       14547      22.7        1800.1             1.3
                 100   7851       29120      67.8        1800.1             4.3
                 200   10459      58242      183.8       1800.2             20.2
                 500   13988      145262     846.9       1809.4             81.6
Table 2 Solution times for tree ensemble optimization experiment.

Research question 4: What are the gaps in tree-based models?

Table 2 shows solution times and problem size metrics, while Table 3 shows the gap metrics. From these two tables, we can draw several conclusions. First, the time required to solve our MIO formulation is very reasonable; in the most extreme case (winequalityred, T = 500), it can be solved to full optimality in about 20 minutes. (Note that no time limit was imposed; all values of T_MIO correspond to the time required to solve to full optimality.) In contrast, the standard linearization problem was solved to full optimality in only two out of twenty cases within the 30-minute time limit. In addition, for those instances where the solver reached the time limit, the optimality gap of the final integer solution, G_StdLin,MIO, was quite poor, ranging from 50% to over 100%.

Second, the integrality gap G_LO is quite small, on the order of a few percent in most cases. This suggests that the LO relaxation of our formulation is quite tight. In contrast, the LO relaxation bound of the standard linearization is weaker, as predicted by Proposition 2, and strikingly so. The weakness of that relaxation explains why the corresponding integer problem cannot be solved to a low optimality gap within the 30-minute time limit. These results, together with the results above on the MIO solution times and final optimality gaps, show the advantages of our formulation over the standard linearization formulation.

Third, although there are many cases where the local search solution performs quite well, there are also many where it can be quite suboptimal, even when repeated from ten starting points. Moreover, while the local search time T_LS is generally smaller than the MIO time T_MIO, in some cases it is not substantially lower (for example, solubility with T = 500), and the additional time required by the MIO formulation may therefore be justified by the guarantee of provable optimality.

Data set         T     G_LO   G_StdLin,LO   G_StdLin,MIO   G_LS
Solubility       10    0.0    485.5         0.0            18.6
                 50    0.0    498.0         50.1           9.5
                 100   0.0    481.2         70.5           0.3
                 200   0.0    477.5         77.7           0.2
                 500   0.0    501.3         103.2          0.2
Permeability     10    0.0    589.5         0.0            6.1
                 50    0.0    619.4         71.9           3.5
                 100   0.0    614.1         75.0           1.8
                 200   0.0    613.0         80.0           0.1
                 500   0.0    610.4         85.9           0.0
winequalityred   10    1.5    11581.3       89.8           1.2
                 50    3.4    11873.6       98.3           2.3
                 100   4.3    12014.9       98.8           0.6
                 200   4.3    12000.6       99.0           1.2
                 500   4.5    12031.8       99.2           1.4
Concrete         10    0.0    6210.6        72.5           0.0
                 50    1.8    6657.1        95.0           0.0
                 100   2.6    6706.6        98.3           0.0
                 200   1.6    6622.2        98.5           0.0
                 500   2.2    6652.6        98.8           0.0
Table 3 Gaps for tree ensemble optimization experiment.

Discussion

Finally, we note that there is a growing literature on the use of mixed-integer optimization for estimating tree-based models and other forms of statistical models. For example, Bertsimas and Dunn (2020) consider an exact MIO approach to constructing CART trees, while Bertsimas and King (2021) consider an MIO approach to model selection in linear regression. While the present paper is related to this previous work in that it also leverages the technology of MIO, its goal is different. The above papers focus on the estimation of trees and other statistical models, whereas our paper is focused on optimization, namely, how to determine the optimal decision with respect to a given, fixed tree-based model.

Conclusion

In this paper, we developed a modern optimization approach to the problem of finding the decision that optimizes the prediction of a tree ensemble model. At the heart of our approach is a mixed-integer optimization formulation that models the action of each tree in the ensemble. We showed that this formulation is stronger than a general alternative formulation, that one can construct a hierarchy of approximations to the formulation with bounded approximation quality through depth-based truncation, and that one can exploit the structure of the formulation to derive efficient solution methods based on Benders decomposition and split constraint generation. We demonstrated the utility of our approach using real data sets, including two case studies in drug design and customized pricing. Given the prevalence of tree ensemble models, we believe that this methodology will become an important asset in the modern business analytics toolbox and is an exciting starting point for future research at the intersection of optimization and machine learning.
REFERENCES

1. K. C. Lee (2020). The evolution of tree-based classification models.
2. Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen, Beng Chin Ooi (2020). Privacy Preserving Vertical Federated Learning for Tree-based Models.
3. Imen Dhief, Sameer Alam, Chan Chea Mean, Nimrod Lilith (2021). A Tree-based Machine Learning Model for Go-around Detection and Prediction.
4. Yunchuan Kong, Tianwei Yu. forgeNet: a graph deep neural network model using tree-based ensemble classifiers for feature graph construction.
5. Zhiyao Xie, Guan-Qi Fang, Yu-Hung Huang, Haoxing Ren, Yanqing Zhang, Brucek Khailany, Shao-Yun Fang, Jiang Hu, Yiran Chen, Erick Carvajal Barboza. FIST: A Feature-Importance Sampling and Tree-Based Method for Automatic Design Flow Parameter Tuning.
6. Tiyasha Tiyasha, Tran Minh Tung, Suraj Kumar Bhagat, Mou Leong Tan, Ali H. Jawad, Wan Hanna Melini Wan Mohtar, Zaher Mundher Yaseen (2021). Functionalization of Remote Sensing and On-site Data for Simulating Surface Water Dissolved Oxygen: Development of Hybrid Tree-Based Artificial Intelligence Models.
7. Ömer Ekmekcioğlu, Eyyup Ensar Başakın, Mehmet Özger (2020). Tree-based nonlinear ensemble technique to predict energy dissipation in stepped spillways.
8. A. S. Aribowo, H. Basiron, N. S. Herman, S. Khomsah. An Evaluation of Preprocessing Steps and Tree-based Ensemble Machine Learning for Analysing Sentiment on Indonesian YouTube Comments.
9. Diyuan Li, Zida Liu, Danial Jahed Armaghani, Peng Xiao, Jian Zhou (2022). Novel Ensemble Tree Solution for Rockburst Prediction Using Deep Forest.
10. Amirhossein Ahmadi, Mojtaba Nabipour, Behnam Mohammadi-Ivatloo, Ali Moradi Amani, Seungmin Rho, Md. Jalil Piran (2020). Long-Term Wind Power Forecasting Using Tree-Based Learning Algorithms.
11. Petrică C. Pop. The generalized minimum spanning tree problem: An overview of formulations, solution procedures and latest advances.
12. Mounira Ali, Abdelaziz Talha, El madjid Berkouk (2020). New M5P model tree-based control for doubly fed induction generator in wind energy conversion system.
13. Suliman Mohamed Fati, Amgad Muneer, Nur Arifin Akbar, Shakirah Mohd Taib (2021). A Continuous Cuffless Blood Pressure Estimation Using Tree-Based Pipeline Optimization Tool.
14. Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang (2021). TreeCaps: Tree-Based Capsule Networks for Source Code Processing.
15. Sanghamitra Dutta, Jason Long, Saumitra Mishra, Cecilia Tilli, Daniele Magazzeni (2022). Robust Counterfactual Explanations for Tree-Based Ensembles.
16. Mohamed Amine Ferrag, Leandros Maglaras, Ahmed Ahmim, Makhlouf Derdour, Helge Janicke (2020). RDTIDS: Rules and Decision Tree-Based Intrusion Detection System for Internet-of-Things Networks.
17. Momoko Hayamizu, Kazuhisa Makino (2020). Ranking top-k trees in tree-based phylogenetic networks.
18. S. Neelakandan, D. Paulraj (2020). A gradient boosted decision tree-based sentiment classification of twitter data.
19. Nagaraj P, Deepalakshmi P, Romany F. Mansour, Ahmed Almazroa (2021). Artificial Flora Algorithm-Based Feature Selection with Gradient Boosted Tree Model for Diabetes Classification.
20. Mohammad Taghi Sattari, Halit Apaydin, Shahaboddin Shamshirband (2020). Performance Evaluation of Deep Learning-Based Gated Recurrent Units (GRUs) and Tree-Based Models for Estimating ETo by Using Limited Meteorological Variables.
21. Yufei Xia, Mengyi Niu (2020). A novel tree-based dynamic heterogeneous ensemble method for credit scoring.
22. Deepak Puthal, Mukesh Prasad (2022). Decision tree-based user-centric security solution for critical IoT infrastructure.
23. Carlo Marcelo Revoredo da Silva, Vinicius Cardoso Garcia (2022). Piracema.io: A rules-based tree model for phishing prediction.
24. Panagiotis G. Asteris, Fariz Iskandar Mohd Rizal, Mohammadreza Koopialipoor, Panayiotis C. Roussis, Maria Ferentinou, Danial Jahed Armaghani, Behrouz Gordan (2022). Slope Stability Classification under Seismic Conditions Using Several Tree-Based Intelligent Techniques.
