


locfitraw: Locfit helper function to call from MATLAB
Usage: [x,y,e]=locfitraw( data ) {most basic usage, all defaults}
Additional arguments are passed as name-value pairs, e.g.:
[x,y,e]=locfitraw( data, 'alpha',[0.7,1.5] , 'family','rate' , 'ev','grid' , 'mg',100 );
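A minimal sketch of a first call, assuming the prerequisites described in the requirements section are installed. The sample data and variable names are illustrative only, and the guard simply skips the call when locfitraw or the Matlab-R link function putRdata is not on the path.

```matlab
% Illustrative data (an assumption, not from the source): 500 standard normal draws.
data = randn(500, 1);

% Only attempt the fit when locfitraw and the Matlab-R link are available.
if exist('locfitraw', 'file') && exist('putRdata', 'file')
    % No 'y' is given, so a density family is used by default.
    [x, y, e] = locfitraw(data, 'ev', 'grid', 'mg', 100);
    plot(x, y);   % y is on whatever scale the underlying R fit reports
else
    disp('locfitraw prerequisites not found; skipping the fit.');
end
```

The first argument is passed positionally; everything else follows as 'name',value pairs.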
====================================================================
Argument types:
The first set of arguments ('x', 'y', 'weights', 'cens', and
'base') specify the regression variables and associated
quantities.
Another set ('scale', 'alpha', 'deg', 'kern', 'kt', 'acri' and
'basis') control the amount of smoothing: bandwidth, smoothing
weights and the local model.
'deriv' and 'dc' relate to derivative (or local slope) estimation.
'family' and 'link' specify the likelihood family.
'xlim' and 'renorm' may be used in density estimation.
'ev', 'flim', 'mg' and 'cut' control the set of evaluation points.
'maxk', 'itype', 'mint', 'maxit' and 'debug' control the Locfit
algorithms, and will be rarely used.
'geth' and 'sty' are used by other functions calling 'locfit.raw',
and should not be used directly.
=========================================================================
Arguments in detail:
x: Vector (or matrix) of the independent variable(s).
******************************
NOTE: The first argument is placed in the first function slot without a name...
All other arguments require 'name',value notation
******************************
y: Response variable for regression models. For density
families, 'y' can be omitted.
weights: Prior weights for observations (reciprocal of variance, or
sample size).
cens: Censoring indicators for hazard rate or censored regression.
The coding is '1' (or 'TRUE') for a censored observation, and
'0' (or 'FALSE') for uncensored observations.
base: Baseline parameter estimate. If provided, the local
regression model is fitted as Y_i = b_i + m(x_i) + epsilon_i,
with Locfit estimating the m(x) term. For regression models,
this effectively subtracts b_i from Y_i. The advantage of the
'base' formulation is that it extends to likelihood
regression models.
scale: A scale to apply to each variable. This is especially
important for multivariate fitting, where variables may be
measured in non-comparable units. It is also used to specify
the frequency for 'ang' terms. If 'scale=F' (the default) no
scaling is performed. If 'scale=T', marginal standard
deviations are used. Alternatively, a numeric vector can
provide scales for the individual variables.
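The effect of scaling by marginal standard deviations can be illustrated in plain MATLAB. This is only a sketch of the idea behind 'scale=T'; Locfit applies its scaling internally, and the matrix X below is made-up data.

```matlab
% Two variables measured in very different units (made-up values).
X = [1 100; 2 300; 3 200; 4 400];

% Marginal standard deviations, one per column.
s = std(X, 0, 1);

% Divide each column by its standard deviation so distances are comparable.
Xs = bsxfun(@rdivide, X, s);

disp(std(Xs, 0, 1));   % each column now has unit standard deviation
```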
alpha: Smoothing parameter. A single number (e.g. 'alpha=0.7') is
interpreted as a nearest neighbor fraction. With two
components (e.g. 'alpha=c(0.7,1.2)' in R, passed from MATLAB as
'alpha',[0.7,1.2]), the first component is a nearest neighbor
fraction, and the second component is a fixed bandwidth.
A third component is the penalty term in
locally adaptive smoothing.
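A rough sketch of how a two-component alpha could translate into a bandwidth at a single fitting point. It assumes that the nearest neighbor bandwidth is the distance to the k-th closest observation with k = floor(n*alpha(1)), and that the larger of the nearest neighbor and fixed bandwidths is used. Locfit's exact rounding and interpolation may differ, and all names here are illustrative.

```matlab
xs    = [0 1 2 3 4 10]';   % made-up observations
alpha = [0.5, 1.5];        % [nearest neighbor fraction, fixed bandwidth]
x0    = 2;                 % fitting point

n = numel(xs);
k = max(1, floor(n * alpha(1)));   % number of neighbors (assumed rounding)

d   = sort(abs(xs - x0));          % distances from x0, ascending
hnn = d(k);                        % nearest neighbor bandwidth
h   = max(hnn, alpha(2));          % assumed rule: take the larger of the two

fprintf('k = %d, hnn = %g, h = %g\n', k, hnn, h);
```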
deg: Degree of local polynomial. Default: 2 (local quadratic).
Degrees 0 to 3 are supported by almost all parts of the
Locfit code. Higher degrees may work in some cases.
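To make "degree of local polynomial" concrete, here is a self-contained sketch of a local quadratic fit at one point by weighted least squares. It is not Locfit's implementation; the tricube weights, the bandwidth h and the noiseless test data are assumptions chosen so that the result can be checked by hand.

```matlab
x  = linspace(0, 1, 21)';
y  = 1 + 2*x + 3*x.^2;     % noiseless quadratic, so a degree-2 fit is exact
x0 = 0.4;                  % fitting point
h  = 0.3;                  % bandwidth (assumed)
deg = 2;                   % local quadratic

% Tricube weights on the scaled distance from x0.
u = (x - x0) / h;
w = (abs(u) < 1) .* (1 - abs(u).^3).^3;

% Design matrix of centered powers: 1, (x-x0), (x-x0)^2.
X = zeros(numel(x), deg + 1);
for p = 0:deg
    X(:, p + 1) = (x - x0).^p;
end

% Weighted least squares via square-root weights.
sw   = sqrt(w);
beta = bsxfun(@times, X, sw) \ (y .* sw);

fprintf('fit at x0 = %.4f, local slope = %.4f\n', beta(1), beta(2));
```

The first coefficient is the fitted value at x0 and the second is the local slope, which is the quantity the 'deriv' argument refers to.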
kern: Weight function, default = '"tcub"'. Other choices are
'"rect"', '"trwt"', '"tria"', '"epan"', '"bisq"' and
'"gauss"'. Choices may be restricted when derivatives are
required; e.g. for confidence bands and some bandwidth
selectors.
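For orientation, a few of these weight functions written as MATLAB anonymous functions. The sketch assumes the names abbreviate the usual tricube, Epanechnikov and bisquare kernels and ignores any normalizing constants Locfit may apply.

```matlab
% Unnormalized kernels on the scaled distance u; zero outside |u| < 1.
tcub = @(u) (abs(u) < 1) .* (1 - abs(u).^3).^3;   % tricube
epan = @(u) (abs(u) < 1) .* (1 - u.^2);           % Epanechnikov
bisq = @(u) (abs(u) < 1) .* (1 - u.^2).^2;        % bisquare

u = [-1.5 -0.5 0 0.5 1];
w = tcub(u);
disp([u; w]);
```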
kt: Kernel type, '"sph"' (default); '"prod"'. In multivariate
problems, '"prod"' uses a simplified product model which
speeds up computations.
acri: Criterion for adaptive bandwidth selection.
basis: User-specified basis functions. See 'lfbas' for more details
on this argument.
deriv: Derivative estimation. If 'deriv=1', the returned fit will be
estimating the derivative (or more correctly, an estimate of
the local slope). If 'deriv=c(1,1)' the second order
derivative is estimated. 'deriv=2' is for the partial
derivative, with respect to the second variable, in
multivariate settings.
dc: Derivative adjustment.
family: Local likelihood family; '"gaussian"'; '"binomial"';
'"poisson"'; '"gamma"' and '"geom"'. Density and rate
estimation families are '"dens"', '"rate"' and '"hazard"'
(hazard rate). If the family is preceded by a ''q'' (for
example, 'family="qbinomial"'), quasi-likelihood variance
estimates are used. Otherwise, the residual variance ('rv')
is fixed at 1. The default family is '"qgauss"' if a response
'y' is provided; '"density"' if no response is provided.
link: Link function for local likelihood fitting. Depending on the
family, choices may be '"ident"', '"log"', '"logit"',
'"inverse"', '"sqrt"' and '"arcsin"'.
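If a fit is reported on the link scale, the inverse link maps it back to the scale of the mean. The sketch below lists inverse links for the names above, assuming their standard definitions (in particular, that '"arcsin"' means asin(sqrt(mu))). Whether locfitraw's y output is on the link scale is not stated in this help text, so check that before applying any of these.

```matlab
% Inverse link functions (standard definitions, assumed to match Locfit's).
invlink.ident   = @(eta) eta;
invlink.log     = @(eta) exp(eta);
invlink.logit   = @(eta) 1 ./ (1 + exp(-eta));
invlink.inverse = @(eta) 1 ./ eta;
invlink.sqrt    = @(eta) eta.^2;
invlink.arcsin  = @(eta) sin(eta).^2;

eta = [-2 0 2];
mu  = invlink.logit(eta);   % probabilities corresponding to logit-scale values
disp(mu);
```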
xlim: For density estimation, Locfit allows the density to be
supported on a bounded interval (or rectangle, in more than
one dimension). The format should be 'c(ll,ur)' where 'll' is
a vector of the lower bounds and 'ur' the upper bounds.
Bounds such as [0,infty) are not supported, but can be
effectively implemented by specifying a very large upper
bound.
renorm: Local likelihood density estimates may not integrate exactly
to 1. If 'renorm=T', the integral will be estimated
numerically and the estimate rescaled. Presently this is
implemented only in one dimension.
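The same idea can be applied by hand to a one-dimensional estimate returned on a grid. This sketch uses the trapezoidal rule, which is not necessarily the rule Locfit uses, and a made-up curve that deliberately integrates to about 1.05.

```matlab
x = linspace(-5, 5, 201)';

% Stand-in for a density estimate that integrates to roughly 1.05.
f = 1.05 * exp(-x.^2 / 2) / sqrt(2*pi);

area  = trapz(x, f);   % numerical estimate of the integral
fnorm = f / area;      % rescaled estimate

fprintf('before: %.4f, after: %.4f\n', area, trapz(x, fnorm));
```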
ev: Evaluation Structure, default = '"tree"'. Also available are
'"phull"', '"data"', '"grid"', '"kdtree"', '"kdcenter"' and
'"crossval"'. 'ev="none"' gives no evaluation points,
effectively producing the global parametric fit. A vector or
matrix of evaluation points can also be provided.
flim: A vector of lower and upper bounds for the evaluation
structure, specified as 'c(ll,ur)'. This should not be
confused with 'xlim'. It defaults to the data range.
mg: For the '"grid"' evaluation structure, 'mg' specifies the
number of points on each margin. Default 10. Can be either a
single number or vector.
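A one-dimensional picture of what 'flim' and 'mg' describe together, assuming the grid is equally spaced and includes both bounds. The values are illustrative.

```matlab
flim = [0, 5];   % lower and upper bound of the evaluation structure
mg   = 10;       % number of grid points on the margin

gridpts = linspace(flim(1), flim(2), mg)';
disp(gridpts');
```

In more than one dimension, a vector 'mg' would give one count per margin.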
cut: Refinement parameter for adaptive partitions. Default 0.8;
smaller values result in more refined partitions.
maxk: Controls space assignment for evaluation structures. For the
adaptive evaluation structures, it is impossible to be sure
in advance how many vertices will be generated. If you get
warnings about 'Insufficient vertex space', Locfit's default
assignment can be increased by increasing 'maxk'. The default
is 'maxk=100'.
itype: Integration type for density estimation. Available methods
include '"prod"', '"mult"' and '"mlin"'; and '"haz"' for
hazard rate estimation problems. The available integration
methods depend on model specification (e.g. dimension, degree
of fit). By default, the best available method is used.
mint: Points for numerical integration rules. Default 20.
maxit: Maximum iterations for local likelihood estimation. Default
20.
debug: If greater than 0, prints some debugging information.
geth: Don't use!
sty: Style for special terms ('left', 'ang', etc.). Do not try to
set this directly; call 'locfit' instead.
==========================================================================
Requires Windows, since R-(D)COM is Windows-specific.
I am working on a platform-independent replacement.
Requires that Matlab-R link Matlab package be installed from
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=5051&objectType=file
file MATLAB_RLINK.zip
Requires that R be installed first; see http://r-project.org
file rw1091.exe
Requires that R locfit package be installed first
From within R in menu do "Packages" then "Install from CRAN"
Requires that R-(D)COM be installed first from
http://lib.stat.cmu.edu/R/CRAN/contrib/extra/dcom/
(get latest EXE file approx 3 MB)
file RSrv135.exe
The above packages should come bundled with this software for convenience
with the exception of locfit which is easiest to install from within R
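A small pre-flight check along the lines of these requirements, as a sketch. It only tests what MATLAB itself can see (the platform and whether the Matlab-R link functions putRdata and openR are on the path); it cannot confirm that R, R-(D)COM or the locfit package are installed.

```matlab
onWindows = ispc;                            % R-(D)COM is Windows-specific
hasRLink  = exist('putRdata', 'file') ~= 0;  % Matlab-R link on the path?
hasOpenR  = exist('openR', 'file') ~= 0;

if onWindows && hasRLink && hasOpenR
    disp('Matlab-R link found; locfitraw can try to connect to R.');
else
    fprintf('Missing prerequisites: windows=%d, putRdata=%d, openR=%d\n', ...
            onWindows, hasRLink, hasOpenR);
end
```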
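Return values:
x: Evaluation points of the fit, sorted in ascending order.
y: Fitted coefficients ('coef') at those points.
e: The 'nlx' values reported by the R call knots(fit,...) at those points.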

function [x,y,e]=locfitraw(varargin)
% locfitraw locfit helper function to call from matlab
%
% Usage: [x,y,e]=locfitraw( data ) {most basic usage, all defaults}
%
% Additional arguments are attached as name-value pairs, e.g.:
% [x,y,e]=locfitraw( data, 'alpha',[0.7,1.5] , 'family','rate' , 'ev','grid' , 'mg',100 );
%
% The remaining help text (argument reference and installation
% requirements) is identical to the description above.

% Default outputs, so that the early returns below leave them defined
x = []; y = []; e = [];

% Check for toolboxes
if ~exist('putRdata','file')
    fprintf('You need to install Matlab-R Link first (do: "help locfitraw" for info)\nThen Install R-(D)COM\nThen install R\nThen install locfit from within R\nOnly works on Windows\n');
    return
end

% Connect to R only if not done so already, never disconnect
global RCONNECTED;
if isempty( RCONNECTED )
    % Try the open command
    [status,msg] = openR;
    if status ~= 1
        disp(['Problem connecting to R: ' msg]);
        return
    end
    evalR('library("locfit")') % attach locfit library
    RCONNECTED = 1;
end

% Minimal input validation: data first, then complete name-value pairs
if nargin < 1
    error( 'At least one input argument required' );
end
if mod(nargin,2)==0
    error( 'Argument count must be odd' );
end

% Send the data to R as a column vector (a matrix input is flattened by (:))
putRdata( 'xdata', varargin{1}(:) );
args = '';

% Translate each name-value pair into an R argument
n = 2;
while n < length(varargin)
    if ischar(varargin{n+1})
        % Character values are passed as quoted R strings
        args = sprintf( '%s,%s="%s"',args, varargin{n}, varargin{n+1} );
    else
        % Other values are copied to R as <name>val and referenced by name
        putRdata( sprintf('%sval',varargin{n}), varargin{n+1} );
        args = sprintf( '%s,%s=%sval',args, varargin{n}, varargin{n} );
    end
    n=n+2;
end

% Run the fit in R and extract the evaluation points, coefficients and nlx
command=sprintf( 'fit<-locfit.raw( xdata %s )', args );
evalR( command );
evalR( 'out<-knots(fit,what=c("x","coef","nlx"))' );
%evalR( 'plot(fit)' );

% Fetch the result and sort all outputs by evaluation point
out = getRdata( 'out' );
[x,ind]=sort(out(:,1),1);
y=out(ind,2);
e=out(ind,3);
%aic=getRdata('-2*$fit$dp$lk+2*$fit$dp$df1');
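The translation of name-value pairs into the R call can be traced without R installed. The sketch below repeats the string-building logic of the listing on the usage example, with the putRdata and evalR calls left out, so it shows only which command text would be sent to R. The cell array name pairs is illustrative.

```matlab
% Name-value pairs from the usage example.
pairs = {'alpha',[0.7,1.5], 'family','rate', 'ev','grid', 'mg',100};

args = '';
for n = 1:2:numel(pairs)
    name  = pairs{n};
    value = pairs{n+1};
    if ischar(value)
        % Character values become quoted R strings.
        args = sprintf('%s,%s="%s"', args, name, value);
    else
        % Other values would be copied to R as <name>val and referenced by name.
        args = sprintf('%s,%s=%sval', args, name, name);
    end
end

command = sprintf('fit<-locfit.raw( xdata %s )', args);
disp(command);
```

Numeric arguments therefore appear in the R command as variables such as alphaval and mgval rather than as literal numbers.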