Friday, September 26, 17:30, room 40.035 (Sala Albert Calsamiglia, Roger Lluria building).
Trevor Hastie
(Department of Statistics, Stanford University)
Least Angle Regression, Forward Stagewise and the Lasso
(Joint work with Brad Efron, Iain Johnstone and Rob Tibshirani)
After Trevor Hastie's seminar we invite all those present to drinks and snacks.
Abstract:
Least Angle Regression (LARS) is a new model selection algorithm. It
is a useful and less greedy version of traditional forward selection
methods. Three main properties of LARS are derived.
(1) A simple modification of the LARS algorithm implements the Lasso,
an attractive alternative to OLS that constrains the sum of the
absolute regression coefficients. The LARS modification calculates all
possible Lasso estimates for a given problem in an order of magnitude
less computer time than previous methods.
(2) A different LARS modification efficiently implements epsilon
Forward Stagewise linear regression, another promising new model
selection method; this connection explains the similar numerical
results previously observed for the Lasso and Stagewise, and helps us
understand the properties of both methods, which are seen as
constrained versions of the simpler LARS algorithm.
(3) A simple approximation for the degrees of freedom of a LARS
estimate is available, from which we derive a Cp estimate of
prediction error; this allows a principled choice among the range of
possible LARS estimates.
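For orientation (the formula is not quoted in the abstract; this is the standard Cp-type criterion that such a degrees-of-freedom approximation leads to):

$$ C_p(\hat{\mu}) \;=\; \frac{\|y-\hat{\mu}\|^2}{\bar{\sigma}^2} \;-\; n \;+\; 2\,\widehat{df}, $$

where $\bar{\sigma}^2$ estimates the noise variance and, for the estimate after k LARS steps, $\widehat{df} \approx k$.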
LARS and its variants are computationally efficient. We provide R and
Splus software which enables one to fit the entire coefficient path
for LAR, Lasso or Forward Stagewise at the cost of a single least
squares fit.
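As a minimal illustration of path fitting (in Python via scikit-learn's lars_path rather than the authors' R/Splus software; the interface below is an independent implementation, not theirs):

```python
# Fit the entire Lasso coefficient path with a LARS-type algorithm.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# method="lasso" applies the Lasso modification of LARS described above;
# method="lar" gives the unmodified Least Angle Regression path.
alphas, active, coefs = lars_path(X, y, method="lasso")

# coefs holds one column per breakpoint of the piecewise-linear path,
# computed at roughly the cost of a single least squares fit.
print(coefs.shape)   # (n_features, n_breakpoints)
print(active)        # indices of active variables along the path
```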
There are strong connections between the epsilon forward stagewise
regression and the boosting technique popular in machine learning.
These connections offer new explanations for the success of boosting.
Thursday, November 6, 12:00, room 20.237.
Leila Mohammadi
(University of Leiden)
On the statistical theory of classification
Abstract:
This lecture presents some theory for statistical learning problems in
the nonparametric setting.
Suppose we are given n i.i.d. copies of a random variable (X,Y), where X is
an instance and Y is a label, -1 or 1. We define a classifier h as a
function with values -1 and 1, and we let H denote a class of classifiers.
When X is one-dimensional and H is a parametric class, such as the
classifiers with K thresholds, we estimate the parameters by the minimizer
of the classification error in the sample, and we show that cube-root
asymptotic results hold under some conditions (see also Mohammadi and
van de Geer (2003)). We obtain the asymptotic distributions of the
estimators.
If one of the thresholds is at the border of the space of X,
then the asymptotic result is different and convergence is quicker.
We also consider the case that X is multidimensional
and show that similar results hold when the classifiers are 1 on halfspaces.
In a simple case, we show that the rate of convergence of the empirical risk
minimizer is optimal.
We also propose some algorithms to find the empirical risk
minimizers in the one-dimensional case.
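A hedged sketch of the one-dimensional, single-threshold case (illustrative only, not the speaker's code; the data-generating process is invented):

```python
# Empirical risk minimization for a one-threshold classifier
# h_t(x) = sign(x - t) on one-dimensional data.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
y = np.where(x > 0.2, 1, -1)          # true threshold at 0.2
flip = rng.random(n) < 0.10           # 10% label noise
y[flip] = -y[flip]

# It suffices to search over thresholds at the sample points themselves.
candidates = np.sort(x)
errors = [np.mean(np.where(x > t, 1, -1) != y) for t in candidates]
t_hat = candidates[int(np.argmin(errors))]

# Under the conditions of the talk, such estimators converge at the
# cube-root rate n^(-1/3), with a non-normal limiting distribution.
print("estimated threshold:", t_hat)
```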
Thursday, November 13, 12:00, room 20.237.
Nicolas Vayatis (Université Paris 6)
The price of convexity in statistical learning
(joint work with Gilles Blanchard and Gábor Lugosi)
Abstract:
In the last ten years two approaches have been considered for the
classification of high-dimensional data: large-margin algorithms (known as
Support Vector Machines) and combination algorithms (or boosting methods).
Both were initially developed on the basis of heuristics and geometric
arguments inside the machine learning community and turned out to be very
powerful methods for dealing with practical problems. The key factor for
their applicability was convexity since both were formulated in terms of convex
optimization algorithms. In this talk, I will review some of the
theoretical results explaining the efficiency of these methods from a
statistical point of view and attempt to track the impact of using convex
cost functions on the rates of convergence for a regularized boosting
method. The case of additive models in high dimensions will also be
discussed.
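A small illustration of the cost functions at issue (standard examples, not taken from the talk): the 0-1 loss of a classification is replaced by a convex function of the margin m = y f(x), such as the hinge loss of SVMs or the exponential loss of boosting.

```python
# Compare the 0-1 loss with two convex surrogate costs on a grid of margins.
import numpy as np

for m in np.linspace(-2.0, 2.0, 9):
    zero_one = float(m <= 0)          # the loss one actually cares about
    hinge = max(0.0, 1.0 - m)         # SVM cost (convex upper bound)
    expo = float(np.exp(-m))          # boosting cost (convex upper bound)
    print(f"m={m:+.1f}  0-1={zero_one:.0f}  hinge={hinge:.2f}  exp={expo:.2f}")
```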
Thursday, November 20, 12:00, room 20.237.
Michael Wolf (UPF)
Stepwise Multiple Testing as Formalized Data Snooping
Abstract:
It is common in econometric applications that several hypothesis tests are
carried out at the same time. The problem then becomes how to decide which
hypotheses to reject, accounting for the multitude of tests. In this paper,
we suggest a stepwise multiple testing procedure which asymptotically
controls the familywise error rate at a desired level. Compared to related
single-step methods, our procedure is more powerful in the sense that it
often will reject more false hypotheses. In addition, we advocate the use of
studentization when it is feasible. Unlike some stepwise methods, our method
implicitly captures the joint dependence structure of the test statistics,
which results in increased ability to detect alternative hypotheses. We prove
our method asymptotically controls the familywise error rate under minimal
assumptions. We present our methodology in the context of comparing several
strategies to a common benchmark and deciding which strategies actually beat
the benchmark. However, our ideas can easily be extended and/or modified to
other contexts, such as making inference for the individual regression
coefficients in a multiple regression framework. Some simulation studies show the
improvements of our methods over previous proposals. We also provide an
application to a set of real data.
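A schematic sketch of a bootstrap step-down procedure in this spirit (a simplification for illustration, not the paper's exact algorithm; in particular it omits the studentization the paper advocates):

```python
# Step-down multiple testing with a bootstrapped max statistic.
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 8
# Toy data: columns are strategy-minus-benchmark return differences;
# the first two strategies genuinely beat the benchmark.
means = [0.3, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
data = rng.normal(loc=means, scale=1.0, size=(n, k))

stats = np.sqrt(n) * data.mean(axis=0)   # non-studentized test statistics
alpha, B = 0.05, 1000
active = list(range(k))                  # hypotheses not yet rejected
rejected = []

while active:
    # Bootstrap the max statistic over the remaining hypotheses (centred).
    boot_max = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        boot = np.sqrt(n) * (data[idx][:, active].mean(axis=0)
                             - data[:, active].mean(axis=0))
        boot_max[b] = boot.max()
    crit = np.quantile(boot_max, 1 - alpha)
    newly = [j for j in active if stats[j] > crit]
    if not newly:                        # no further rejections: stop
        break
    rejected += newly                    # reject, then re-test the rest
    active = [j for j in active if j not in newly]

print("rejected hypotheses:", sorted(rejected))
```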
Thursday, November 27, 12:00, room 20.237.
Javier Martínez Moguerza (Universidad Rey Juan Carlos, Madrid)
and
Alberto Muñoz
(Statistics Department, Universidad Carlos III de Madrid)
Estimation of high density regions using Support Neighbour Machines
Abstract:
We investigate the problem of estimating high density regions from
univariate or multivariate data samples. To be more precise, we estimate the
minimum volume sets known in the literature as density contour clusters.
This problem arises in outlier detection, scenario selection in stochastic
programming or cluster analysis, and is strongly related to One-Class
Support Vector Machines (SVM). In this paper we propose a new method to
solve this problem, the Support Neighbour Machine (SNM). We show its
properties, introduce a new class of kernels, and demonstrate an important
relation of the One-Class SVM with density estimation. Finally, numerical
results illustrating the advantage of the new method are shown.
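A hedged sketch of the One-Class SVM connection (the SNM itself is not in standard libraries; this only illustrates the minimum-volume-set problem on invented data):

```python
# Estimate a high-density region with a One-Class SVM.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# A dense cluster plus diffuse background points in two dimensions.
X = np.vstack([rng.normal(0.0, 0.5, size=(300, 2)),
               rng.uniform(-4.0, 4.0, size=(60, 2))])

# nu bounds the fraction of points left outside the estimated region, so the
# fit approximates a density contour cluster holding roughly 90% of the mass.
ocs = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X)
inside = ocs.predict(X) == 1    # +1 inside the region, -1 flagged outside
print(f"{inside.mean():.0%} of points inside the estimated region")
```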
Thursday, December 4, 18:00, room 20.137.
Pedro Delicado
(Departament d'Estadística i Investigació Operativa,
Universitat Politècnica de Catalunya)
Remarks on local likelihood density estimation
Abstract:
Local likelihood is recognized as a very successful and intuitively
appealing method for nonparametric regression, but the density estimation
version of local likelihood is rarely used, despite having theoretical
properties comparable to those in regression. We believe that the main
reason is the lack of straightforward motivation for the formulas recently
put forward in the literature on local likelihood density estimation. An
alternative approach to obtaining these formulas, based on truncation
arguments, is considered in this paper. Moreover, it is established that
apparently different approaches to local likelihood density estimation are
equivalent when the parametric model they are based on is closed under
multiplication by constants.
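For reference (the abstract does not display the formulas; this is the standard local log-likelihood for density estimation in the literature referred to, e.g. Hjort and Jones (1996) and Loader (1996)):

$$ L_x(\theta) \;=\; \sum_{i=1}^{n} K_h(X_i - x)\,\log f(X_i;\theta) \;-\; n \int K_h(u - x)\, f(u;\theta)\, du, $$

where $K_h$ is a kernel with bandwidth $h$ and $f(\cdot;\theta)$ is the local parametric model; it is the motivation of the second, integral term that the truncation argument aims to make transparent.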
Thursday, March 11, 12:00, room 20.237.
Nicholas Longford (De Montfort University, Leicester, UK)
Pattern of change and stability of household income in
European countries in the 1990s
(with M.G. Pittau, University of Rome 'la Sapienza', Rome, Italy)
Abstract:
The talk explores the patterns of change in the annual household
income in the countries of the European Community during the years
1994-1999. The income is modelled by mixtures of log-normal
distributions, and the mixture components are interpreted as
representing one subpopulation with steady increments and others
with various levels of volatility. The method is extended to models
for a combination of log-normal and categorical variables. An index
of income stability is defined for the countries. Graphical summaries
of the results are emphasized. The advantages of mixtures and kernel
smoothing are critically evaluated. Current and planned research in other
applications of mixture models
will be outlined.
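A hedged sketch of the modelling idea (invented toy data, not the authors' code): a mixture of log-normals for income is simply a Gaussian mixture on the log scale.

```python
# Fit a two-component log-normal mixture to toy incomes.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# A stable subpopulation plus a more volatile one.
income = np.concatenate([rng.lognormal(mean=9.5, sigma=0.3, size=700),
                         rng.lognormal(mean=10.5, sigma=0.8, size=300)])

gm = GaussianMixture(n_components=2, random_state=0)
gm.fit(np.log(income).reshape(-1, 1))   # log income is a Gaussian mixture

print("mixing weights:", gm.weights_.round(2))
print("log-scale means:", gm.means_.ravel().round(2))
print("log-scale sds:", np.sqrt(gm.covariances_.ravel()).round(2))
```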
Thursday, April 29, 12:00, room 20.237.
Heinz Neudecker (University of Amsterdam)
On Best Affine Unbiased Covariance-Preserving Prediction of
Factor Scores
Abstract:
This paper gives a generalization of results presented by ten Berge,
Krijnen, Wansbeek & Shapiro (BKWS).
They examined procedures and results as proposed by Anderson & Rubin,
McDonald, Green and Krijnen, Wansbeek & ten Berge (KWB). We shall
consider the same matter, under weaker rank assumptions. We allow some
moments, viz. the variance matrix of the observable scores vector, $\Omega$,
and that of the unique factors, $\Psi$, to be singular. We
require $T^\prime \Psi T > 0$, where $T \Lambda T^\prime$ is a Schur
decomposition of $\Omega$. As usual the variance matrix of the common
factors, $\Phi$, and the loadings matrix $A$ will have full column rank.
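For orientation (standard factor-analysis relations in the abstract's notation, not quoted from the paper): the model $x = Af + e$ with $\mathrm{var}(f)=\Phi$ and $\mathrm{var}(e)=\Psi$ gives

$$ \Omega \;=\; A\,\Phi\,A^\prime + \Psi, $$

and when $\Omega$ is nonsingular the classical regression predictor of the factor scores is $\hat{f} = \Phi A^\prime \Omega^{-1} x$; it is this construction that has to be re-examined once $\Omega$ and $\Psi$ are allowed to be singular.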
Thursday, May 13, 17:30, room 20.137.
Lúcia P. Barroso
(Statistics Department, Universidade de São Paulo, Brasil)
Data mining in large networks: cohesion surfaces over a multidimensionally
scaled base
Abstract:
With the advance of informatics and telecommunications, the
volume of available information about every aspect of human existence and
in every scientific area has increased considerably. In this talk, a
graphical representation of large networks based on the use of cohesion
surfaces over a multidimensionally scaled thematic base is proposed as a
tool for collaborative filtering. For its development classic
multidimensional scaling and Procrustes Analysis are combined in an
iterative algorithm, which consolidates partial solutions into an overall
continuous representation. Tested on a set of book-lending transactions at
the Karl A. Boedecker Library (Fundação Getúlio Vargas in Brasil), the
algorithm produces an output that is thematically interpretable and
consistent, with a stress measure smaller than that of the classic MDS solution.
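A hedged sketch of the two ingredients the algorithm combines (a toy version of MDS plus Procrustes alignment on an overlap, not the authors' iterative consolidation scheme):

```python
# Align two partial MDS solutions via Procrustes analysis.
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))                 # hidden "thematic base"
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

# Two partial MDS solutions over overlapping subsets of the items.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
sol_a = mds.fit_transform(D[:20, :20])            # items 0..19
sol_b = mds.fit_transform(D[10:, 10:])            # items 10..29

# Consolidate by aligning the solutions on their 10 shared items.
_, _, disparity = procrustes(sol_a[10:20], sol_b[0:10])
print(f"Procrustes disparity on the overlap: {disparity:.3f}")
```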
Thursday, May 20, 17:30, room 20.137.
Stephen Fienberg (Carnegie Mellon University)
Bayesian Mixed Membership Models for Soft Classification
Abstract:
The paper describes and applies a fully Bayesian approach to mixed
membership modeling for soft classification. Our model structure has
assumptions on four levels: population, subject, latent variable, and
sampling scheme. Population level assumptions describe the general
structure of the population that is common to all subjects. Subject level
assumptions specify the distribution of observable responses given
individual membership scores. Membership scores are usually unknown and
hence we can also view them as latent variables, treating them as either
fixed or random in the model. Finally, the last level of assumptions
specifies the number of distinct observed characteristics (attributes) and
the number of replications for each characteristic. We illustrate the
flexibility and utility of the general model through two applications:
(i) the first focuses on data from the National Long Term Care Survey,
where we explore disability classifications; (ii) the second analyses the
structure of articles in The Proceedings of the National Academy of
Sciences based on semantic decompositions of abstracts and bibliographies.
In the first application we carry out estimation using a Markov chain
Monte Carlo implementation for sampling from the posterior distribution;
in the second application, because of the size and complexity of the data
base, we use a variational approximation to the posterior. We also include
a guide to other applications of this form of mixed membership modeling in
genetics and machine learning.
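A hedged illustration (a toy corpus, not the paper's data or code): latent Dirichlet allocation, fitted by a variational approximation, is the text-analysis member of this mixed membership family, and each document receives membership scores over components rather than a single hard label.

```python
# Soft classification of documents via a mixed membership (LDA) model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["gene expression microarray protein",
        "protein folding structure gene",
        "stock market inflation forecast",
        "inflation interest rate market"]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Rows are documents, columns are components: the membership scores.
print(lda.transform(counts).round(2))
```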
Thursday, June 3, 12:00, room 20.237.
Peter Hansen (Stanford University)
Model confidence sets for forecasting models
Abstract:
The paper introduces the model confidence set (MCS) and applies
it to the selection of forecasting models. An MCS is a set of models that
is constructed such that it will contain the best forecasting model,
given a level of confidence. Thus, an MCS is analogous to a confidence
interval for a parameter. The MCS acknowledges the limitations of the
data, such that uninformative data yield an MCS with many models, whereas
informative data yield an MCS with only a few models. We revisit the
empirical application in Stock and Watson (1999) and apply the MCS
procedure to their set of inflation forecasts. Although the MCS contains
only a few models in the first subsample, there is little information in
the second post-1984 subsample, which results in a large MCS. Yet, the
random walk forecast is not contained in the MCS for either of the
samples. This shows that the random walk forecast is inferior to principal
component-based inflation forecasts.
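A schematic sketch of the MCS idea (a deliberate simplification, not the paper's exact procedure): test equal predictive ability across the surviving models with a bootstrapped range statistic, eliminating the worst model until the test no longer rejects.

```python
# Build a toy model confidence set from forecast losses.
import numpy as np

rng = np.random.default_rng(2)
n, m = 300, 5
# Invented forecast losses: model 0 is genuinely worst.
losses = rng.normal(loc=[1.3, 1.0, 1.0, 1.05, 1.0], scale=1.0, size=(n, m))

alpha, B = 0.10, 1000
mcs = list(range(m))
while len(mcs) > 1:
    L = losses[:, mcs]
    d = L.mean(axis=0) - L.mean()        # deviations from the common mean
    t_range = np.sqrt(n) * (d.max() - d.min())
    boot = np.empty(B)                   # bootstrap the range statistic
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        db = L[idx].mean(axis=0) - L[idx].mean() - d
        boot[b] = np.sqrt(n) * (db.max() - db.min())
    if t_range <= np.quantile(boot, 1 - alpha):
        break                            # models indistinguishable: stop
    mcs.remove(mcs[int(np.argmax(d))])   # eliminate the worst model
print("model confidence set:", mcs)
```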
Thursday, June 10, 12:00, room 20.047.
Karim Abadir (University of York)
The memory of financial and macro markets
(joint work with Gabriel Talmain).
Abstract:
Macroeconomists have often come up with strong predictions about
relationships that should hold between some aggregate economic variables.
However, the data may seem to tell a different story. The discrepancy
between economic theory and empirical investigation is reflected in a number
of well-known "paradoxes" or "puzzles". Despite the lukewarm empirical
support, macroeconomists are wary of giving up the theoretical predictions,
often feeling that these discrepancies must be due to inadequate statistical
analysis. Even predictions that seem so intuitive, such as the Uncovered
Interest Parity (UIP) theorem, and that should survive in a wide range of
economic environments, have been strongly rejected.
In this talk, we show that these puzzles are often caused by the very
high persistence of macroeconomic and financial series obscuring the true
relationship between variables. We believe that these "paradoxes" can arise
because current econometric methods do not take explicit account of the form
of long-memory and nonlinear dynamics that we have uncovered in a previous
paper, Abadir and Talmain (2002, Review of Economic Studies), and that we
find frequently in macro data. In this paper, we devise a simple new method
to disentangle the long-run co-movements of variables from the effects of
persistence, and we apply it to two well-known puzzles. After correcting for
the effect of persistence, we find that the data support the predictions of
economic theory and that the puzzles evaporate.