Statistics Seminars 2006-2007,
Department of Economics, Pompeu Fabra University
Schedule.
Thursday, November 2, 15:00,
room 40.273.
Gerard Biau
(Université Montpellier II)
On the performance of clustering in Hilbert spaces
(Joint work with Luc Devroye and Gábor Lugosi.)
Abstract:
Based on n randomly drawn vectors in a separable Hilbert space,
one may construct a k-means clustering scheme by minimizing an empirical
squared error. We investigate the risk of such a clustering
scheme, defined as the expected squared distance of a random
vector X from the set of cluster centers.
Our main result states that, for an almost surely bounded X,
the expected excess clustering risk is O(\sqrt{1/n}).
Since clustering in high- (or even infinite-) dimensional spaces may lead to
severe computational problems, we examine the
properties of a dimension reduction strategy for clustering based on
Johnson-Lindenstrauss-type random projections.
Our results reflect a tradeoff between accuracy and computational
complexity when one uses k-means clustering after random
projection of the data to a low-dimensional space.
We argue that random
projections work better than other simplistic dimension reduction schemes.
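As a rough illustration of the projection-then-cluster strategy described
above, the following sketch projects the data with a Gaussian
Johnson-Lindenstrauss-type matrix and then runs plain Lloyd-style k-means
in the reduced space. All names are illustrative; this is a sketch of the
general idea, not the authors' code.

```python
# Minimal sketch, assuming a Gaussian Johnson-Lindenstrauss-type projection
# and plain Lloyd-style k-means; names are illustrative, not the authors' code.
import numpy as np

def random_projection_kmeans(X, k, d, n_iter=50, seed=None):
    """Project n points in R^D down to R^d, then cluster there."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    R = rng.standard_normal((D, d)) / np.sqrt(d)   # JL-type random projection
    Y = X @ R                                      # projected data, n x d
    centers = Y[rng.choice(n, k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # assignment step: nearest center in the projected space
        labels = ((Y[:, None, :] - centers) ** 2).sum(-1).argmin(1)
        # update step: recompute each non-empty cluster's mean
        for j in range(k):
            if (labels == j).any():
                centers[j] = Y[labels == j].mean(0)
    return labels, centers
```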
Thursday, November 9, 15:00,
room 40.273.
Kees van Montfort
(Department of
Econometrics, Free University Amsterdam and University Nyenrode
Amsterdam)
Multivariate nonlinear time series
modelling of exposure and risk in road safety research (with
F. Bijleveld, J. Commandeur and S.J. Koopman)
Abstract:
We consider a
multivariate time series model for the analysis of traffic volumes and
road casualties inside and outside urban areas. The model consists of
dynamic unobserved factors for exposure and risk that are nonlinearly
related. The multivariate nature of the model is due to the inclusion
of different time series for inside and outside urban areas. The
analysis is based on the extended Kalman filter. The model parameters
are estimated by quasi-maximum likelihood. The latent factors are
estimated by extended smoothing methods. We present a case study of
annual time series of numbers of fatal accidents and numbers of
kilometers driven by motor vehicles in the Netherlands between 1961
and 2000. The analysis accounts for missing entries in the
disaggregated numbers of kilometers driven, although the aggregate
numbers are observed throughout. It is concluded that the salient
features of the observed time series are captured by the model in a
satisfactory way.
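For readers unfamiliar with the extended Kalman filter the analysis relies
on, here is a textbook predict/update step for a generic nonlinear
state-space model. It is a minimal sketch under standard assumptions, not
the authors' exposure/risk model, and all names are illustrative.

```python
# Textbook extended Kalman filter step for x_t = f(x_{t-1}) + w_t,
# y_t = h(x_t) + v_t, with w_t ~ N(0, Q) and v_t ~ N(0, R).
# A generic sketch only, not the authors' exposure/risk model.
import numpy as np

def ekf_step(x, P, y, f, h, F_jac, H_jac, Q, R):
    """One predict/update cycle; F_jac, H_jac return Jacobians of f, h."""
    # predict through the nonlinear transition, linearized for the covariance
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q
    # update: linearize the observation equation around the prediction
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```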
Thursday, November 16, 15:00,
room 40.273.
Ula Nur
(London School of Hygiene and Tropical Medicine, London, UK)
Handling missing data in the analysis of alcohol consumption
in the UK Women's Cohort Study
Abstract:
Missing values are a problem in large-scale surveys with extensive
questionnaires. The analysis of the complete records may yield
inferences substantially different from those obtained had no data
been missing. Three approaches to handling missing dietary information
on alcohol consumption are compared. Ignoring nonresponse by analyzing
only the complete cases produces biased estimates (lower means). Imputing zero
(an extreme value), as is customary at present, underestimates the actual alcohol
consumption. The mean alcohol nutrient intake computed by the three
methods increased from 7.5 g/week in the complete-case analysis to 8.6 g/week
under zero imputation and to 11.3 g/week after applying multiple imputation.
Multiple imputation uses most of the information in the incomplete records,
leading to a substantially higher estimate of the mean alcohol nutrient
intake, while preserving power and taking into account
the uncertainty due to the missing data.
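The contrast between the three strategies can be mimicked on simulated
data. The sketch below is illustrative only: the covariate, effect sizes,
and the crude regression-based imputation are our assumptions, not the
study's actual procedure.

```python
# Toy comparison on simulated data with values missing at random.
# The covariate, effect sizes, and regression-based imputation are
# illustrative assumptions, not the study's actual procedure.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(30, 70, n)
alcohol = 2.0 + 0.15 * age + rng.normal(0, 3, n)  # "true" intake, g/week
miss = rng.random(n) < 0.3                        # 30% nonresponse
obs = ~miss

cc_mean = alcohol[obs].mean()                     # 1) complete-case analysis
zero_mean = np.where(miss, 0.0, alcohol).mean()   # 2) zero imputation

# 3) crude multiple imputation: draw missing values from a regression
#    of intake on age, then pool the means over the imputed data sets
beta = np.polyfit(age[obs], alcohol[obs], 1)
resid_sd = np.std(alcohol[obs] - np.polyval(beta, age[obs]))
mi_means = []
for _ in range(20):
    completed = alcohol.copy()
    completed[miss] = (np.polyval(beta, age[miss])
                       + rng.normal(0, resid_sd, miss.sum()))
    mi_means.append(completed.mean())
mi_mean = np.mean(mi_means)                       # pooled estimate
print(cc_mean, zero_mean, mi_mean)
```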
Thursday, November 23, 15:00,
room 40.273.
David Delgado Gómez
(Computational Imaging Lab, Universitat Pompeu Fabra)
Independent histogram pursuit for segmentation of skin lesions
Abstract:
Exploratory data analysis (EDA) follows the philosophy that data should
first be explored without assumptions about probabilistic models, error
distributions, number of groups, relationships between the variables, and
the like, for the purpose of discovering what they can tell us about the
phenomena we are investigating. The goal of EDA is to explore the data to
reveal patterns and features that will help the analyst to better
understand, analyze, and model the data.
Principal component analysis, projection pursuit, and independent component
analysis are some of the most frequently used EDA techniques. They have
been applied to many data classification problems. The linear
combinations that these techniques yield project the data into a lower
dimensional subspace where the data is easier to classify and interpret.
However, it is difficult to incorporate prior knowledge about the data
into these techniques. In this talk, the histogram pursuit algorithm is
presented. It can be regarded as a histogram-based projection pursuit
that includes prior information about the number of classes embedded in
the data. The algorithm can be extended to find several projections in a
way similar to independent component analysis. It can be easily modified
to produce non-linear projections based on Hastie's principal curves. The
performance of the proposed technique is assessed on several
computer-generated and real databases.
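The histogram pursuit algorithm itself is the speaker's; as a hedged sketch
of the general idea of histogram-based projection pursuit, the following
code searches random unit directions for one whose projected histogram
shows clear gaps. The gap-counting index is a deliberately crude stand-in
for the actual criterion.

```python
# Generic histogram-based projection pursuit over random directions.
# The gap-counting index below is a deliberately crude stand-in for the
# speaker's actual criterion.
import numpy as np

def histogram_index(z, bins=30):
    """Score a 1-D projection: empty interior bins suggest separated groups."""
    counts, _ = np.histogram(z, bins=bins)
    return int((counts[1:-1] == 0).sum())

def pursue_direction(X, n_dirs=2000, seed=None):
    """Search random unit directions for the one maximizing the index."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, -1
    for _ in range(n_dirs):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        score = histogram_index(X @ w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```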
Thursday, November 30, 15:00,
room 40.273.
Susanne Rässler
(Institut für Arbeitsmarkt- und Berufsforschung, Nuremberg,
and Universität Erlangen-Nürnberg, Germany)
Extensive evaluation of the German active labour market policy
Abstract: During the past few years the German labour market was subject to an
intensive process of restructuring. One part of this process is the so-called
Hartz reforms, which aim to improve the efficiency of the Federal Employment
Agency (BA). Important topics of these reforms were the restructuring of the
system of financial support for the unemployed and the reorganization of the
BA. Furthermore, the reforms mandated an evaluation of the huge variety of
instruments of active labour market policy (ALMP) provided and financed by the BA.
To accomplish the aim of measuring and improving the effectiveness of ALMP,
the BA, together with the Institute for Employment Research (IAB) and Harvard
University, established the project TrEffeR (Treatment Effect and
PRediction). This project comprises two components: first, the retrospective
evaluation of instruments of ALMP, whose goal is to identify successful
policies, successful in a global sense as well as with regard to specific
regions or population groups; and second, a targeting system designed to
assist case workers in local agencies in allocating unemployed people to the
instruments most suitable for their specific needs.
For evaluating the effects of labour market policies the outcomes of those
receiving some kind of training or support have to be compared with the
outcomes they would have obtained had they not been supported. Since any
individual either receives treatment or does not, only one outcome can be
observed per person, and a surrogate for the potential outcome under the
counterfactual situation has to be found. A common way of solving this
problem is to calculate the average treatment effect, i.e., to compare a group
of those benefiting from ALMP instruments with corresponding persons in a
control group of non-treated individuals who are comparable in important
characteristics. Since average effects for groups of treated people are not
sufficient for the BA's day-to-day business, TrEffeR tries to refine this
approach by calculating individual effects of ALMP.
To estimate the potential outcomes and treatment effects TrEffeR uses
propensity score matching algorithms combined with a subsequent regression
adjustment. Compared to using regression procedures directly, this has the
advantage of being "blind to the answer" in the matching step and yields
robust results. Despite its technical complexity, the results are intuitive and
transparent to the user.
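A minimal sketch of the two-step strategy described above, matching on an
estimated propensity score and then adjusting by regression on the matched
sample, might look as follows. This is illustrative code using
scikit-learn, not the TrEffeR implementation, and the 1-nearest-neighbour
matching rule is our simplification.

```python
# Illustrative propensity-score matching with regression adjustment
# (not the TrEffeR implementation; the 1-nearest-neighbour matching rule
# is our simplification).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def matched_effect(X, treated, y):
    """X: covariates; treated: 0/1 indicator; y: outcome."""
    # step 1: estimate propensity scores -- "blind to the answer" y
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    # match each treated unit to the control nearest in propensity score
    matches = c_idx[np.abs(ps[t_idx][:, None] - ps[c_idx]).argmin(1)]
    # step 2: regression adjustment on the matched sample
    Xm = np.vstack([X[t_idx], X[matches]])
    tm = np.r_[np.ones(len(t_idx)), np.zeros(len(matches))]
    ym = np.r_[y[t_idx], y[matches]]
    fit = LinearRegression().fit(np.column_stack([tm, Xm]), ym)
    return fit.coef_[0]  # adjusted effect of treatment on the treated
```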
Friday, March 2, 16:30,
room 40.273.
Omar Besbes
(Columbia University)
Blind Nonparametric Revenue Management
Abstract:
In most revenue management studies one assumes knowledge of how
consumers react to prices, or alternatively, that the demand function
is known (the demand function maps prices into instantaneous demand
rate). The main focus of the talk is on the implications of removing
this assumption for the performance and design of pricing strategies.
In the first part of the talk, we present an
empirical example that highlights some of the shortcomings of
parametric approaches and illustrates the need for nonparametric
modeling of the demand function. In the second part of the talk, we
move on to see how such ideas come into play in a dynamic pricing
problem. For that purpose we consider a prototypical revenue
management problem where the decision maker observes realized demand
over time, but is otherwise "blind" to the underlying demand
function. Few structural assumptions are made with regard to the
demand function; in particular, it need not admit any parametric
representation. We introduce a general method for solving such blind
revenue management problems that is based on learning the demand
function "on the fly" and optimizing prices based on that. The
analysis, which involves the classical trade off between exploration
and exploitation, leads to several qualitative and operational
insights with regard to dynamic optimization problems under
uncertainty in general, and the practice of price testing in the
particular context of blind revenue management problems.
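A stylized "explore then exploit" sketch conveys the flavour of the
problem: test a grid of prices, estimate each price's revenue rate
nonparametrically from realized demand, then commit to the empirically best
price. The code below is illustrative only; the price grid, exploration
fraction, and commit-once structure are our assumptions, and the talk's
method is more refined.

```python
# Stylized "explore then exploit" pricing; the price grid, exploration
# fraction, and commit-once structure are illustrative assumptions.
import numpy as np

def blind_pricing(demand, prices, horizon, explore_frac=0.2):
    """demand(p): realized demand at price p (unknown to the seller)."""
    n_explore = int(explore_frac * horizon)
    revenue = 0.0
    rev_sum = np.zeros(len(prices))
    counts = np.zeros(len(prices))
    # exploration: cycle through the candidate prices, recording revenue
    for t in range(n_explore):
        i = t % len(prices)
        r = prices[i] * demand(prices[i])
        revenue += r
        rev_sum[i] += r
        counts[i] += 1
    # exploitation: commit to the empirically best price
    best = int(np.argmax(rev_sum / np.maximum(counts, 1)))
    for _ in range(horizon - n_explore):
        revenue += prices[best] * demand(prices[best])
    return revenue
```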
Wednesday, April 11, 11:00,
room 20.137.
Rosario Romera
(Universidad Carlos III de Madrid)
Robust Partial Least Squares (PLS) Estimation with Applications
Abstract: Partial least squares regression (PLS) is a linear
regression technique developed to relate many regressors to one or
several response variables. Robust methods are introduced to reduce or
remove the effect of outlying data points. In this paper we show that
if the sample covariance matrix is properly robustified, further
robustification of the linear regression steps of the PLS algorithm
becomes unnecessary. The robust estimate of the covariance matrix is
computed by searching for outliers in univariate projections of the
data on a combination of random directions (Stahel-Donoho) and
specific directions obtained by maximizing and minimizing the kurtosis
coefficient of the projected data, as proposed by Peña and Prieto
(2006). It is shown that this procedure is fast to apply and provides
better results than other procedures proposed in the literature. Its
performance is illustrated by Monte Carlo simulation and by an example in
which the algorithm is able to reveal features of the data that were
undetected by previous methods.
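A simplified version of the projection-based robustification might look as
follows. For brevity this sketch uses only Stahel-Donoho-type random
directions and simple trimming, omitting the kurtosis-based specific
directions that the paper adds.

```python
# Simplified robustified covariance: Stahel-Donoho-type outlyingness over
# random projections, then trimming. The paper's kurtosis-based specific
# directions (Pena and Prieto, 2006) are omitted here for brevity.
import numpy as np

def robust_covariance(X, n_dirs=500, trim=0.1, seed=None):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    out = np.zeros(n)
    for _ in range(n_dirs):
        w = rng.standard_normal(p)
        w /= np.linalg.norm(w)
        z = X @ w
        med = np.median(z)
        mad = np.median(np.abs(z - med)) + 1e-12
        out = np.maximum(out, np.abs(z - med) / mad)  # worst-case projection
    keep = out <= np.quantile(out, 1 - trim)          # drop flagged points
    return np.cov(X[keep], rowvar=False)
```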
Thursday, April 12, 17:00,
room 40.273.
Thomas Archibald
(Management School, University of Edinburgh)
Modelling the transshipment decision in retail networks
Abstract:
In multi-location inventory systems, transshipments are often used
to improve customer service and reduce cost. Determining
optimal transshipment policies for such systems involves a complex
optimisation problem that is only tractable for systems with few
locations. Consequently simple heuristic transshipment policies are
often applied in practice. This paper develops an approximate
solution method which applies decomposition to reduce a Markov
decision process model of a multi-location inventory system into a
number of models involving only two locations. The value functions
from the subproblems are used to estimate the fair charge for the
inventory provided in a transshipment. This estimate of the fair
charge is used as the decision criterion in a heuristic transshipment
policy for the multi-location system. A numerical study shows that
the proposed heuristic can deliver considerable cost savings compared
to the simple heuristics often used in practice.
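The fair-charge decision rule can be conveyed in a few lines. In the sketch
below, a unit is transshipped from location j to location i only when i's
marginal value gain exceeds j's fair charge for the unit. The value
functions V[k] would come from solving the two-location subproblems; here
they are simply supplied by the caller, so this is illustrative only.

```python
# Stylized fair-charge rule: ship one unit from location j to location i
# only if i's marginal value gain exceeds j's fair charge for the unit.
# V[k](s) approximates the value of location k holding s units and would
# come from the two-location subproblems; here it is supplied by the caller.
def worth_transshipping(i, j, stock, V):
    if stock[j] == 0:
        return False
    gain_i = V[i](stock[i] + 1) - V[i](stock[i])    # value of receiving
    charge_j = V[j](stock[j]) - V[j](stock[j] - 1)  # fair charge for giving up
    return gain_i > charge_j
```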
Monday, May 7, 10:00,
room 20.287.
Garud Iyengar
(Columbia University)
Adword Auction Models
Abstract: In this talk we will discuss models for auctions relevant
to pricing advertising slots on search engines such as Google and
Yahoo!. We begin with a general problem formulation which allows
the privately known valuation per click to be a function of both the
advertiser and the slot. We present a compact characterization of the
set of all deterministic incentive compatible direct mechanisms for
this model. This characterization allows us to conclude that in this model
there exist incentive-compatible mechanisms that are not affine
maximizers. Next, we focus on two interesting special cases:
slot-independent valuation and slot-independent valuation up to a
privately known slot and zero thereafter. For both of these special
cases, we characterize revenue maximizing and efficiency maximizing
mechanisms and show that, for a market with n bidders and m advertising
slots, these mechanisms can be computed in O(n^2 m^2) time. We will
conclude by presenting the results of a numerical study comparing the
proposed optimal mechanisms with rank-based mechanisms and a new
mechanism that we call the customized rank-based mechanism. Joint
work with Anuj Kumar.
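For context, here is a minimal rank-based mechanism of the kind the
proposed optimal mechanisms are benchmarked against: bidders are ordered by
bid, slot s goes to the s-th highest bidder, and each winner pays the
next-highest bid (a generalized-second-price-style payment). The details
are illustrative, not the talk's customized variant.

```python
# Minimal rank-based allocation with next-price payments (generalized
# second-price style); a simple benchmark, not the talk's customized variant.
def rank_based_auction(bids, m):
    """bids: dict bidder -> bid per click; m: number of slots."""
    order = sorted(bids, key=bids.get, reverse=True)
    allocation, payments = {}, {}
    for s in range(min(m, len(order))):
        winner = order[s]
        allocation[winner] = s  # slot index, 0 = top slot
        payments[winner] = bids[order[s + 1]] if s + 1 < len(order) else 0.0
    return allocation, payments
```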
Monday, May 21, 17:00,
room 20.137.
Christopher Kirkbride
(Lancaster University)
Allocation models and heuristics for the outsourcing of repairs for a dynamic warranty population
Abstract: We consider a scenario in which a large equipment
manufacturer wishes to outsource the work involved in repairing
purchased goods while under warranty. Several external service vendors
are available for this work. We develop models and analyses to support
decisions concerning how responsibility for the warranty population
should be divided between them. These also allow the manufacturer to
resolve related questions concerning, for example, whether the service
capacities of the contracted vendors are sufficient to deliver an
effective post-sales service. Static allocation models yield
information concerning the proportions of the warranty population for
which the vendors should be responsible overall. Dynamic allocation
models enable consideration of how such overall workloads might be
delivered to the vendors over time in a way which avoids excessive
variability in the repair burden. We apply dynamic programming policy
improvement to develop an effective dynamic allocation heuristic. This
is evaluated numerically and is also used as a yardstick to assess two
simple allocation heuristics suggested by static models. A dynamic
greedy allocation heuristic is found to perform well. Dividing the
workload equally among vendors with different service capacities can
lead to serious losses.
Co-authors are K.D. Glazebrook (Lancaster) and L. Ding (Edinburgh).
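As a hedged illustration of dynamic, load-aware allocation (not the
dynamic-programming policy-improvement heuristic of the talk, which is more
sophisticated), a greedy rule might send each arriving repair to the vendor
with the smallest workload relative to its service capacity:

```python
# Toy greedy dynamic allocation: route each arriving repair to the vendor
# with the smallest workload relative to its service capacity. Illustrative
# only; the talk's heuristic comes from dynamic programming policy improvement.
def greedy_assign(workloads, capacities):
    """Return the index of the vendor to receive the next repair job."""
    ratios = [w / c for w, c in zip(workloads, capacities)]
    return min(range(len(ratios)), key=ratios.__getitem__)
```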