function [B,stats] = lasso(X,Y,varargin)
%LASSO Perform lasso or elastic net regularization for linear regression.
% [B,STATS] = lasso(X,Y,...) Performs L1-constrained linear least
% squares fits (lasso) or L1- and L2-constrained fits (elastic net)
% relating the predictors in X to the responses in Y. The default is a
% lasso fit, or constraint on the L1-norm of the coefficients B.
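%
% For reference (a paraphrase of the references listed at the end of this
% help, not text from the original), the fit for a given penalty Lambda
% minimizes, over the intercept b0 and coefficient vector b, an objective
% of roughly the form
%
% (1/(2*N)) * sum((Y - b0 - X*b).^2)
% + Lambda * sum(Alpha*abs(b) + ((1-Alpha)/2)*b.^2)
%
% where 'Lambda' and 'Alpha' are the optional parameters described below;
% Alpha=1 gives the pure L1 (lasso) penalty and smaller Alpha mixes in an
% L2 (ridge-like) term.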
%
% Positional parameters:
%
% X A numeric matrix (dimension, say, NxP)
% Y A numeric vector of length N
%
% Optional input parameters:
%
% 'Weights' Observation weights. Must be a vector of non-negative
% values, of the same length as columns of X. At least
% two values must be positive. (default ones(N,1) or
% equivalently (1/N)*ones(N,1)).
% 'Alpha' Elastic net mixing value, or the relative balance
% between L2 and L1 penalty (default 1, range (0,1]).
% Alpha=1 ==> lasso, otherwise elastic net.
% Alpha near zero ==> nearly ridge regression.
% 'NumLambda' The number of lambda values to use, if the parameter
% 'Lambda' is not supplied (default 100). Ignored
% if 'Lambda' is supplied. LASSO may return fewer
% fits than specified by 'NumLambda' if the residual
% error of the fits drops below a threshold percentage
% of the variance of Y.
% 'LambdaRatio' Ratio between the minimum value and maximum value of
% lambda to generate, if the parameter "Lambda" is not
% supplied. Legal range is [0,1). Default is 0.0001.
% If 'LambdaRatio' is zero, LASSO will generate its
% default sequence of lambda values but replace the
% smallest value in this sequence with the value zero.
% 'LambdaRatio' is ignored if 'Lambda' is supplied.
% 'Lambda' Lambda values. Will be returned in return argument
% STATS in ascending order. The default is to have LASSO
% generate a sequence of lambda values, based on 'NumLambda'
% and 'LambdaRatio'. LASSO will generate a sequence, based
% on the values in X and Y, such that the largest LAMBDA
% value is just sufficient to produce all zero coefficients B.
% You may supply a vector of real, non-negative values of
% lambda for LASSO to use, in place of its default sequence.
% If you supply a value for 'Lambda', 'NumLambda' and
% 'LambdaRatio' are ignored.
% 'DFmax' Maximum number of non-zero coefficients in the model.
% Can be useful with large numbers of predictors.
% Results only for lambda values that satisfy this
% degree of sparseness will be returned. Default is
% to not limit the number of non-zero coefficients.
% 'Standardize' Whether to scale X prior to fitting the model
% sequence. This affects whether the regularization is
% applied to the coefficients on the standardized
% scale or the original scale. The results are always
% presented on the original data scale. Default is
% TRUE, do scale X.
% Note: X and Y are always centered.
% 'RelTol' Convergence threshold for coordinate descent algorithm.
% The coordinate descent iterations will terminate
% when the relative change in the size of the
% estimated coefficients B drops below this threshold.
% Default: 1e-4. Legal range is (0,1).
% 'CV' If present, indicates the method used to compute MSE.
% When 'CV' is a positive integer K, LASSO uses K-fold
% cross-validation. Set 'CV' to a cross-validation
% partition, created using CVPARTITION, to use other
% forms of cross-validation. You cannot use a
% 'Leaveout' partition with LASSO.
% When 'CV' is 'resubstitution', LASSO uses X and Y
% both to fit the model and to estimate the mean
% squared errors, without cross-validation.
% The default is 'resubstitution'.
% 'MCReps' A positive integer indicating the number of Monte-Carlo
% repetitions for cross-validation. The default value is 1.
% If 'CV' is 'resubstitution' or a cvpartition of type
% 'resubstitution', 'MCReps' must be 1. If 'CV' is a
% cvpartition of type 'holdout', then 'MCReps' must be
% greater than one.
% 'MaxIter' Maximum number of iterations allowed. Default is 1e5.
% 'PredictorNames' A cell array of names for the predictor variables,
% in the order in which they appear in X.
% Default: {}
% 'Options' A structure that contains options specifying whether to
% conduct cross-validation evaluations in parallel, and
% options specifying how to use random numbers when computing
% cross validation partitions. This argument can be created
% by a call to STATSET. CROSSVAL uses the following fields:
% 'UseParallel'
% 'UseSubstreams'
% 'Streams'
% For information on these fields see PARALLELSTATS.
% NOTE: If supplied, 'Streams' must be of length one.
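%
% As an illustration only (not one of the original examples), a call that
% combines several of the options above might look like the following; the
% Alpha, NumLambda, and CV values here are arbitrary choices for the
% sketch, and parallel execution requires Parallel Computing Toolbox,
% without which the cross-validation simply runs serially:
%
% opts = statset('UseParallel',true);
% [B,S] = lasso(X,Y,'Alpha',0.75,'NumLambda',50,'Standardize',true, ...
% 'CV',5,'Options',opts);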
%
% Return values:
% B The fitted coefficients for each model.
% B will have dimension PxL, where
% P = size(X,2) is the number of predictors, and
% L = length(lambda).
% STATS STATS is a struct that contains information about the
% sequence of model fits corresponding to the columns
% of B. STATS contains the following fields:
%
% 'Intercept' The intercept term for each model. Dimension 1xL.
% 'Lambda' The sequence of lambda penalties used, in ascending order.
% Dimension 1xL.
% 'Alpha' The elastic net mixing value that was used.
% 'DF' The number of nonzero coefficients in B for each
% value of lambda. Dimension 1xL.
% 'MSE' The mean squared error of the fitted model for each
% value of lambda. If cross-validation was performed,
% the values for 'MSE' represent Mean Prediction
% Squared Error for each value of lambda, as calculated
% by cross-validation. Otherwise, 'MSE' is the mean
% sum of squared residuals obtained from the model
% with B and STATS.Intercept.
%
% If cross-validation was performed, STATS also includes the following
% fields:
%
% 'SE' The standard error of MSE for each lambda, as
% calculated during cross-validation. Dimension 1xL.
% 'LambdaMinMSE' The lambda value with minimum MSE. Scalar.
% 'Lambda1SE' The largest lambda such that MSE is within
% one standard error of the minimum. Scalar.
% 'IndexMinMSE' The index of Lambda with value LambdaMinMSE.
% 'Index1SE' The index of Lambda with value Lambda1SE.
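%
% A sketch (not from the original help) of how these fields fit together,
% assuming B and STATS come from a cross-validated call such as
% lasso(X,Y,'CV',10):
%
% j = STATS.Index1SE; % column of B chosen by the one-standard-error rule
% bhat = B(:,j); % P-by-1 coefficients at STATS.Lambda1SE
% yhat = STATS.Intercept(j) + X*bhat; % fitted values, original data scale
% nnz(bhat) % equals STATS.DF(j)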
%
% Examples:
%
% % (1) Run the lasso on data obtained from the 1985 Auto Imports Database
% % of the UCI repository.
% % http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names
% load imports-85;
% Description % display the dataset description stored in imports-85.mat
%
% % Extract Price as the response variable and extract non-categorical
% % variables related to auto construction and performance
% %
% X = X(~any(isnan(X(:,1:16)),2),:);
% Y = X(:,16);
% Y = log(Y);
% X = X(:,3:15);
% predictorNames = {'wheel-base' 'length' 'width' 'height' ...
% 'curb-weight' 'engine-size' 'bore' 'stroke' 'compression-ratio' ...
% 'horsepower' 'peak-rpm' 'city-mpg' 'highway-mpg'};
%
% % Compute the default sequence of lasso fits.
% [B,S] = lasso(X,Y,'CV',10,'PredictorNames',predictorNames);
%
% % Display a trace plot of the lasso fits.
% axTrace = lassoPlot(B,S);
% % Display the sequence of cross-validated predictive MSEs.
% axCV = lassoPlot(B,S,'PlotType','CV');
% % Look at the kind of fit information returned by lasso.
% S
%
% % What variables are in the model corresponding to minimum
% % cross-validated MSE, and in the sparsest model within one
% % standard error of that minimum.
% minMSEModel = S.PredictorNames(B(:,S.IndexMinMSE)~=0)
% sparseModel = S.PredictorNames(B(:,S.Index1SE)~=0)
%
% % Fit the sparse model and examine residuals.
% Xplus = [ones(size(X,1),1) X];
% fitSparse = Xplus * [S.Intercept(S.Index1SE); B(:,S.Index1SE)];
% corr(fitSparse,Y-fitSparse)
% figure
% plot(fitSparse,Y-fitSparse,'o')
%
% % Consider a slightly richer model. A model with 6 variables may be a
% % reasonable alternative. Find the index for a corresponding fit.
% df6index = min(find(S.DF==6));
% fitDF6 = Xplus * [S.Intercept(df6index); B(:,df6index)];
% corr(fitDF6,Y-fitDF6)
% plot(fitDF6,Y-fitDF6,'o')
%
% % (2) Run lasso on some random data with 250 predictors
% %
% n = 1000; p = 250;
% X = randn(n,p);
% beta = randn(p,1); beta0 = randn;
% Y = beta0 + X*beta + randn(n,1);
% lambda = 0:.01:.5;
% [B,S] = lasso(X,Y,'Lambda',lambda);
% lassoPlot(B,S);
%
% % compare against OLS
% %
% figure
% bls = [ones(size(X,1),1) X] \ Y;
% plot(bls,[S.Intercept; B],'.');
%
% % Run the same lasso fit but restricting the number of
% % non-zero coefficients in the fitted model.
% %
% [B2,S2] = lasso(X,Y,'Lambda',lambda,'DFmax',12);
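%
% % As a sanity check (a sketch, not part of the original example): when no
% % 'CV' option is given, the reported MSE is just the mean squared residual,
% % so for the first fit above it can be reproduced directly from B and S:
% k = 10; % an arbitrary lambda index
% resid = Y - S.Intercept(k) - X*B(:,k);
% [S.MSE(k) mean(resid.^2)] % these should agree, up to convergence tolerance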
%
% See also lassoPlot, ridge, parallelstats.
%
% References:
% [1] Tibshirani, R. (1996) Regression shrinkage and selection
% via the lasso. Journal of the Royal Statistical Society,
% Series B, Vol 58, No. 1, pp. 267-288.
% [2] Zou, H. and T. Hastie. (2005) Regularization and variable
% selection via the elastic net. Journal of the Royal Statistical
% Society, Series B, Vol. 67, No. 2, pp. 301-320.
% [3] Friedman, J., R. Tibshirani, and T. Hastie. (2010) Regularization
% paths for generalized linear models via coordinate descent.
% Journal of Statistical Software, Vol 33, No. 1,
% http://www.jstatsoft.org/v33/i01.
% [4] Hastie, T., R. Tibshirani, and J. Friedman. (2008) The Elements
% of Statistical Learning, 2nd edition, Springer, New York.
%
% Copyright 2011-2016 The MathWorks, Inc.
% --------------------------------------
% Sanity check the positional parameters
% --------------------------------------