fda {mda}    R Documentation

Flexible Discriminant Analysis

Usage

fda(formula, data, weights, theta, dimension, eps, method, keep.fitted, ...)

Arguments

formula of the form y~x, describing the response and the predictors. The formula can be more complicated, such as y~log(x)+z (type ?formula for more details). The response should be a factor or category representing the response variable, or any vector that can be coerced to one (such as a logical variable).
data data frame containing the variables in the formula (optional).
weights an optional vector of observation weights.
theta an optional matrix of class scores, typically with fewer than J-1 columns.
dimension the dimension of the solution, no greater than J-1, where J is the number of classes. Default is J-1.
eps a threshold for small singular values for excluding discriminant variables; default is .Machine$double.eps.
method regression method used in optimal scaling. Default is linear regression via the function polyreg, resulting in linear discriminant analysis. Other possibilities are mars and bruto. For penalized discriminant analysis, gen.ridge is appropriate.
keep.fitted a logical variable, which determines whether the (sometimes large) component "fitted.values" of the "fit" component of the returned fda object should be kept. The default is TRUE if n * dimension < 1000.
... additional arguments to method().

Value

an object of class "fda". Use predict to extract discriminant variables, posterior probabilities or predicted class memberships. Other extractor functions are coef, confusion and plot.

The object has the following components:
percent.explained the percent between-group variance explained by each dimension (relative to the total explained).
values optimal scaling regression sum-of-squares for each dimension (see reference). The usual discriminant analysis eigenvalues are given by values/(1-values), which are used to define percent.explained.
means class means in the discriminant space. These are also scaled versions of the final theta's or class scores, and can be used in a subsequent call to fda() (this only makes sense if some columns of theta are omitted; see the references).
theta.mod (internal) a class scoring matrix which allows predict to work properly.
dimension dimension of discriminant space
prior class proportions for the training data
fit fit object returned by "method"
call the call that created this object (allowing it to be update()-able)
confusion confusion matrix when classifying the training data
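
As an illustration of how values relates to percent.explained, the following sketch converts optimal scaling sums-of-squares into discriminant eigenvalues via values/(1-values) and then into percentages. The numbers in values are hypothetical, and the cumulative form is an assumption suggested by the printed example output, where the last dimension reads 100.00.

```r
# Hypothetical optimal scaling sums-of-squares for a two-dimensional fit
# (illustrative numbers, not taken from a real fda object).
values <- c(0.95, 0.20)

# Usual discriminant analysis eigenvalues, as described above.
eigenvalues <- values / (1 - values)

# Cumulative percent of between-group variance explained
# (assumption: cumulative, so the last entry is 100).
percent_explained <- 100 * cumsum(eigenvalues) / sum(eigenvalues)

round(percent_explained, 2)
```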

The method functions are required to take arguments x and y, both of which can be matrices, and should produce a matrix of fitted.values the same size as y. They can take an additional weights argument, and should all have a ... argument for safety's sake. Any arguments to method() can be passed on via the ... argument of fda(). The default method polyreg() has a degree argument which allows polynomial regression of the required total degree. See the documentation for predict.fda() for further requirements of method.
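
The interface described above can be sketched with a small weighted least-squares fitter written in base R. The name wls_method and its class are hypothetical and not part of mda. The sketch covers only the requirements stated here (arguments x, y, weights and ..., and a fitted.values matrix the same size as y); a method meant for real use with fda() would also have to meet the further requirements in the predict.fda() documentation.

```r
# Hypothetical method function (not part of mda): weighted least squares
# with an intercept, accepting matrix x and matrix y.
wls_method <- function(x, y, weights = rep(1, nrow(x)), ...) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  X <- cbind(1, x)                     # add an intercept column
  w <- sqrt(weights)                   # row scaling for weighted least squares
  beta <- qr.coef(qr(X * w), y * w)    # one coefficient column per column of y
  structure(
    list(coefficients = beta, fitted.values = X %*% beta),
    class = "wls_method"
  )
}

# Simulated inputs, for illustration only.
set.seed(1)
x <- matrix(rnorm(40), nrow = 20, ncol = 2)
y <- matrix(rnorm(60), nrow = 20, ncol = 3)
wts <- runif(20)

fit <- wls_method(x, y, weights = wts)
dim(fit$fitted.values)   # same dimensions as y
```

The ... argument is accepted and ignored, so extra arguments passed down from a caller do not cause an error.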

Note

This software is not well tested; we would like to hear of any bugs.

Author(s)

Trevor Hastie and Robert Tibshirani

References

"Flexible Discriminant Analysis by Optimal Scoring" by Hastie, Tibshirani and Buja, 1994, JASA, 1255-1270.

"Penalized Discriminant Analysis" by Hastie, Buja and Tibshirani, Annals of Statistics, 1995 (in press).

See Also

predict.fda, mars, bruto, polyreg, softmax, confusion

Examples

data(iris)
irisfit <- fda(Species ~ ., data = iris)
irisfit
## fda(formula = Species ~ ., data = iris)
##
## Dimension: 2 
##
## Percent Between-Group Variance Explained:
##     v1     v2 
##  99.12 100.00 
##
## Degrees of Freedom (per dimension): 5 
##
## Training Misclassification Error: 0.02 ( N = 150 )

confusion(irisfit, iris)
##            Setosa Versicolor Virginica 
##     Setosa     50          0         0
## Versicolor      0         48         1
##  Virginica      0          2        49
## attr(, "error"):
## [1] 0.02

plot(irisfit)

coef(irisfit)
##           [,1]        [,2]
## [1,] -2.126479 -6.72910343
## [2,] -0.837798  0.02434685
## [3,] -1.550052  2.18649663
## [4,]  2.223560 -0.94138258
## [5,]  2.838994  2.86801283

marsfit <- fda(Species ~ ., data = iris, method = mars)
marsfit2 <- update(marsfit, degree = 2)
marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2]) 
## this refits the model, using the fitted means (scaled theta's)
## from marsfit to start the iterations
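
The Value section notes that predict extracts discriminant variables, posterior probabilities or predicted class memberships. A minimal sketch of those three uses follows, assuming the mda package is installed and that predict.fda accepts the type values "class", "posterior" and "variates"; see the predict.fda documentation for the authoritative list.

```r
library(mda)

irisfit <- fda(Species ~ ., data = iris)

# Predicted class memberships for the training data
pred_class <- predict(irisfit, iris, type = "class")

# Posterior class probabilities, one column per class
pred_post <- predict(irisfit, iris, type = "posterior")

# Discriminant variables (coordinates in the discriminant space)
pred_var <- predict(irisfit, iris, type = "variates")

# Training misclassification rate, compared as character strings
mean(as.character(pred_class) != as.character(iris$Species))
```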

