Decision Trees Compared to Regression and Neural Networks
Methods for analyzing and modeling data can be divided into two groups: “supervised learning” and “unsupervised learning.” Supervised learning requires input data that has both predictor (independent) variables and a target (dependent) variable whose value is to be estimated. By various means, the process “learns” how to model (predict) the value of the target variable based on the predictor variables. Decision trees, regression analysis and neural networks are examples of supervised learning. If the goal of an analysis is to predict the value of some variable, then supervised learning is recommended approach.
Unsupervised learning does not identify a target (dependent) variable, but rather treats all of the variables equally. In this case, the goal is not to predict the value of a variable but rather to look for patterns, groupings or other ways to characterize the data that may lead to understanding of the way the data interrelates. Cluster analysis, correlation, factor analysis (principle components analysis) and statistical measures are examples of unsupervised learning.
Linear Regression
One of the simplest and most popular modeling methods is linear regression. Linear regression fits a straight line (known linear function) to a set of data values. The form of the function fitted by linear regression is:
y = a0 + a1*x1 + a2*x2 + …
Where a0, a1, etc. are parameters whose values are determined so the function best fits the data. Linear regression is a popular modeling technique, and there are many programs available to perform linear regression. However, linear regression is appropriate only if the data can be modeled by a straight line function, which is often not the case. Also, linear regression cannot easily handle categorical variables nor is it easy to look for interactions between variables.
Nonlinear Regression
Nonlinear regression extends linear regression to fit general (nonlinear) functions of the form:
y = f(x1,x2,…,a1,a2,…)
Here are few examples of functions that can be modeled using nonlinear regression:
y = a0 + a1*exp(x1)
y = a0 + a1*sin(x1)
As with linear regression, nonlinear regression is not well suited for categorical variables or variables with interactions. The other challenge involved in using nonlinear regression analysis is that the form (model) of the function must be specified. For engineering and scientific problems, the function model may be dictated by theory, but for marketing, behavioral and medical problems, it can be very difficult to develop an appropriate nonlinear model. The program recommended for linear or nonlinear regression analysis is NLREG
Logistic Regression
Logistic regression is a variant of nonlinear regression that is appropriate when the target (dependent) variable has only two possible values (e.g., live/die, buy/don’t-buy, infected/not-infected). Logistic regression fits an S-shaped logistic function to the data. As with general nonlinear regression, logistic regression cannot easily handle categorical variables nor is it good for detecting interactions between variables. Classification trees are well suited to modeling target variables with binary values, but – unlike logistic regression – they also can model variables with more than two discrete values, and they handle variable interactions.
Neural Networks
Neural networks (also called “multilayered perceptron”) provide models of data relationships through highly interconnected, simulated “neurons” that accept inputs, apply weighting coefficients and feed their output to other “neurons” which continue the process through the network to the eventual output. Some neurons may send feedback to earlier neurons in the network. Neural networks are “trained” to deliver the desired result by an iterative (and often lengthy) process where the weights applied to each input at each neuron are adjusted to optimize the desired output.
Neural networks are often compared to decision trees because both methods can model data that has nonlinear relationships between variables, and both can handle interactions between variables. However, neural networks have a number of drawbacks compared to decision trees.
Binary categorical input data for neural networks can be handled by using 0/1 (off/on) inputs, but categorical variables with multiple classes (for example, marital status or the state in which a person resides) are awkward to handle. Classifying a result into multiple categories usually is done by setting arbitrary value thresholds for discriminating one category from another. It would be difficult to devise a neural network to classify the location of residence into the 50 U.S. states. Classification trees, on the other hand, handle this type of problem naturally.
Neural networks do not present an easily-understandable model. When looking at a decision tree, it is easy to see that some initial variable divides the data into two categories and then other variables split the resulting child groups. This information is very useful to the researcher who is trying to understand the underlying nature of the data being analyzed.
A neural network is more of a “black box” that delivers results without an explanation of how the results were derived. Thus, it is difficult or impossible to explain how decisions were made based on the output of the network.
If a challenge is made to a decision based on a neural network, it is very difficult to explain and justify to non-technical people how decisions were made. In contrast, a decision tree is easily explained, and the process by which a particular decision “flows” through the decision tree can be readily shown.
It is difficult to incorporate a neural network model into a computer system without using a dedicated “interpreter” for the model. So if the goal is to produce a program that can be distributed with a built-in predictive model, it is usually necessary to send along some additional module or library just for the neural network interpretation. In contrast, once a decision tree model has been built, it can be converted to if…then…else statements that can be implemented easily in most computer languages without requiring a separate interpreter.