Statistics Seminars 2006-2007,
Department of Economics, Pompeu Fabra University
Schedule.
Thursday, November 2, 15:00,
room 40.273.
Gerard Biau
(Université Montpellier II)
On the performance of clustering in Hilbert spaces
(Joint work with Luc Devroye and Gábor Lugosi.)
Abstract:
Based on n randomly drawn vectors in a separable Hilbert space,
one may construct a k-means clustering scheme by minimizing an empirical
squared error. We investigate the risk of such a clustering
scheme, defined as the expected squared distance of a random
vector X from the set of cluster centers.
Our main result states that, for an almost surely bounded X,
the expected excess clustering risk is O(\sqrt{1/n}).
Since clustering in high- (or even infinite-) dimensional spaces may lead to
severe computational problems, we examine the
properties of a dimension reduction strategy for clustering based on
Johnson-Lindenstrauss-type random projections.
Our results reflect a tradeoff between accuracy and computational
complexity when one uses k-means clustering after random
projection of the data to a low-dimensional space.
We argue that random
projections work better than other simplistic dimension reduction schemes.
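As a rough illustration of the projection-then-cluster strategy described
above, the following sketch projects the data with a Gaussian
Johnson-Lindenstrauss-type matrix and then runs plain Lloyd-style k-means
in the reduced space. All names are illustrative; this is a sketch of the
general idea, not the authors' code.

```python
# Minimal sketch, assuming a Gaussian Johnson-Lindenstrauss-type projection
# and plain Lloyd-style k-means; names are illustrative, not the authors' code.
import numpy as np

def random_projection_kmeans(X, k, d, n_iter=50, seed=None):
    """Project n points in R^D down to R^d, then cluster there."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    R = rng.standard_normal((D, d)) / np.sqrt(d)   # JL-type random projection
    Y = X @ R                                      # projected data, n x d
    centers = Y[rng.choice(n, k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # assignment step: nearest center in the projected space
        labels = ((Y[:, None, :] - centers) ** 2).sum(-1).argmin(1)
        # update step: recompute each non-empty cluster's mean
        for j in range(k):
            if (labels == j).any():
                centers[j] = Y[labels == j].mean(0)
    return labels, centers
```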
Thursday, November 9, 15:00,
room 40.273.
Kees van Montfort
(Department of
Econometrics, Free University Amsterdam and University Nyenrode
Amsterdam)
Multivariate nonlinear time series
modelling of exposure and risk in road safety research (with
F. Bijleveld, J. Commandeur and S.J. Koopman)
Abstract:
We consider a
multivariate time series model for the analysis of traffic volumes and
road casualties inside and outside urban areas. The model consists of
dynamic unobserved factors for exposure and risk that are nonlinearly
related. The multivariate nature of the model is due to the inclusion
of different time series for inside and outside urban areas. The
analysis is based on the extended Kalman filter. The model parameters
are estimated by quasi-maximum likelihood. The latent factors are
estimated by extended smoothing methods. We present a case study of
annual time series of numbers of fatal accidents and numbers of
kilometers driven by motor vehicles in the Netherlands between 1961
and 2000. The analysis accounts for missing entries in the
disaggregated numbers of kilometers driven, although the aggregate
numbers are observed throughout. It is concluded that the salient
features of the observed time series are captured by the model in a
satisfactory way.
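For readers unfamiliar with the extended Kalman filter the analysis relies
on, here is a textbook predict/update step for a generic nonlinear
state-space model. It is a minimal sketch under standard assumptions, not
the authors' exposure/risk model, and all names are illustrative.

```python
# Textbook extended Kalman filter step for x_t = f(x_{t-1}) + w_t,
# y_t = h(x_t) + v_t, with w_t ~ N(0, Q) and v_t ~ N(0, R).
# A generic sketch only, not the authors' exposure/risk model.
import numpy as np

def ekf_step(x, P, y, f, h, F_jac, H_jac, Q, R):
    """One predict/update cycle; F_jac, H_jac return Jacobians of f, h."""
    # predict through the nonlinear transition, linearized for the covariance
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q
    # update: linearize the observation equation around the prediction
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```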
Thursday, November 16, 15:00,
room 40.273.
Ula Nur
(London School of Hygiene and Tropical Medicine, London, UK)
Handling missing data in the analysis of alcohol consumption
in the UK Women's Cohort Study
Abstract:
Missing values are a problem in large-scale surveys with extensive
questionnaires. The analysis of the complete records may yield
inferences substantially different from those obtained had no data
been missing. Three approaches to handling missing dietary information
on alcohol consumption are compared. Ignoring nonresponse by analyzing
only the complete cases produces biased estimates (lower means). Imputing zero
(an extreme value), as is customary at present, underestimates the actual alcohol
consumption. The mean alcohol nutrient intake computed by the three
methods increased from 7.5 g/week in the complete-case analysis to 8.6 g/week
under zero imputation and to 11.3 g/week after applying multiple imputation.
Multiple imputation uses most of the information in the incomplete records,
leading to a substantially higher estimate of the mean alcohol nutrient
intake, while preserving power and taking into account
the uncertainty due to the missing data.
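The contrast between the three strategies can be mimicked on simulated
data. The sketch below is illustrative only: the covariate, effect sizes,
and the crude regression-based imputation are our assumptions, not the
study's actual procedure.

```python
# Toy comparison on simulated data with values missing at random.
# The covariate, effect sizes, and regression-based imputation are
# illustrative assumptions, not the study's actual procedure.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(30, 70, n)
alcohol = 2.0 + 0.15 * age + rng.normal(0, 3, n)  # "true" intake, g/week
miss = rng.random(n) < 0.3                        # 30% nonresponse
obs = ~miss

cc_mean = alcohol[obs].mean()                     # 1) complete-case analysis
zero_mean = np.where(miss, 0.0, alcohol).mean()   # 2) zero imputation

# 3) crude multiple imputation: draw missing values from a regression
#    of intake on age, then pool the means over the imputed data sets
beta = np.polyfit(age[obs], alcohol[obs], 1)
resid_sd = np.std(alcohol[obs] - np.polyval(beta, age[obs]))
mi_means = []
for _ in range(20):
    completed = alcohol.copy()
    completed[miss] = (np.polyval(beta, age[miss])
                       + rng.normal(0, resid_sd, miss.sum()))
    mi_means.append(completed.mean())
mi_mean = np.mean(mi_means)                       # pooled estimate
print(cc_mean, zero_mean, mi_mean)
```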
Thursday, November 23, 15:00,
room 40.273.
David Delgado Gómez
(Computational Imaging Lab, Universitat Pompeu Fabra)
Independent histogram pursuit for segmentation of skin lesions
Abstract:
Exploratory data analysis (EDA) follows the philosophy that data should
first be explored without assumptions about probabilistic models, error
distributions, number of groups, relationships between the variables, and
the like, for the purpose of discovering what they can tell us about the
phenomena we are investigating. The goal of EDA is to explore the data to
reveal patterns and features that will help the analyst to better
understand, analyze, and model the data.
Principal component analysis, projection pursuit, and independent component
analysis are some of the most frequently used EDA techniques. They have
been applied to many data classification problems. The linear
combinations that these techniques yield project the data into a lower
dimensional subspace where the data is easier to classify and interpret.
However, it is difficult to incorporate prior knowledge about the data
into these techniques. In this talk, the histogram pursuit algorithm is
presented. It can be regarded as a histogram-based projection pursuit
that includes prior information about the number of classes embedded in
the data. The algorithm can be extended to find several projections in a
way similar to independent component analysis. It can be easily modified
to produce non-linear projections based on Hastie's principal curves. The
performance of the proposed technique is assessed on several
computer-generated and real databases.
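The histogram pursuit algorithm itself is the speaker's; as a hedged sketch
of the general idea of histogram-based projection pursuit, the following
code searches random unit directions for one whose projected histogram
shows clear gaps. The gap-counting index is a deliberately crude stand-in
for the actual criterion.

```python
# Generic histogram-based projection pursuit over random directions.
# The gap-counting index below is a deliberately crude stand-in for the
# speaker's actual criterion.
import numpy as np

def histogram_index(z, bins=30):
    """Score a 1-D projection: empty interior bins suggest separated groups."""
    counts, _ = np.histogram(z, bins=bins)
    return int((counts[1:-1] == 0).sum())

def pursue_direction(X, n_dirs=2000, seed=None):
    """Search random unit directions for the one maximizing the index."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, -1
    for _ in range(n_dirs):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        score = histogram_index(X @ w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```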
Thursday, November 30, 15:00,
room 40.273.
Susanne Rässler
(Institut für Arbeitsmarkt- und Berufsforschung, Nuremberg,
and Universität Erlangen-Nürnberg, Germany)
Extensive evaluation of the German active labour market policy
Abstract: During the past few years the German labour market was subject to an
intensive process of restructuring. One part of this process is the so-called
Hartz reforms, which aim to improve the efficiency of the Federal Employment
Agency (BA). Important topics of these reforms were the restructuring of the
system of financial support for the unemployed and the reorganization of the
BA. Furthermore, the reforms mandated an evaluation of the huge variety of
instruments of active labour market policy (ALMP) provided and financed by the BA.
To accomplish the aim of measuring and improving the effectiveness of ALMP,
the BA, together with the Institute for Employment Research (IAB) and Harvard
University, established the project TrEffeR (Treatment Effect and
PRediction). This project comprises two components: first, the retrospective
evaluation of instruments of ALMP, whose goal is to identify successful
policies, successful in a global sense as well as with regard to specific
regions or population groups; and second, a targeting system designed to
assist case workers in local agencies in allocating unemployed people to the
instruments most suitable for their specific needs.
For evaluating the effects of labour market policies the outcomes of those
receiving some kind of training or support have to be compared with the
outcomes they would have obtained had they not been supported. Since any
individual either receives treatment or does not, only one outcome can be
observed per person, and a surrogate for the potential outcome under the
counterfactual situation has to be found. A common way of solving this
problem is to calculate the average treatment effect, i.e., to compare a group
of those benefiting from ALMP instruments with corresponding persons in a
control group of non-treated individuals who are comparable in important
characteristics. Since average effects for groups of treated people are not
sufficient for the BA's day-to-day business, TrEffeR tries to refine this
approach by calculating individual effects of ALMP.
To estimate the potential outcomes and treatment effects TrEffeR uses
propensity score matching algorithms combined with a subsequent regression
adjustment. Compared to using regression procedures directly, this has the
advantage of being "blind to the answer" in the matching step and yields
robust results. Despite its technical complexity, the results are intuitive and
transparent to the user.
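A minimal sketch of the two-step strategy described above, matching on an
estimated propensity score and then adjusting by regression on the matched
sample, might look as follows. This is illustrative code using
scikit-learn, not the TrEffeR implementation, and the 1-nearest-neighbour
matching rule is our simplification.

```python
# Illustrative propensity-score matching with regression adjustment
# (not the TrEffeR implementation; the 1-nearest-neighbour matching rule
# is our simplification).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def matched_effect(X, treated, y):
    """X: covariates; treated: 0/1 indicator; y: outcome."""
    # step 1: estimate propensity scores -- "blind to the answer" y
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    # match each treated unit to the control nearest in propensity score
    matches = c_idx[np.abs(ps[t_idx][:, None] - ps[c_idx]).argmin(1)]
    # step 2: regression adjustment on the matched sample
    Xm = np.vstack([X[t_idx], X[matches]])
    tm = np.r_[np.ones(len(t_idx)), np.zeros(len(matches))]
    ym = np.r_[y[t_idx], y[matches]]
    fit = LinearRegression().fit(np.column_stack([tm, Xm]), ym)
    return fit.coef_[0]  # adjusted effect of treatment on the treated
```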
Friday, March 2, 16:30,
room 40.273.
Omar Besbes
(Columbia University)
Blind Nonparametric Revenue Management
Abstract:
In most revenue management studies one assumes knowledge of how
consumers react to prices, or alternatively, that the demand function
is known (the demand function maps prices into instantaneous demand
rate). The main focus of the talk is on the implications of removing
this assumption for the performance and design of pricing strategies.
In the first part of the talk, we present an
empirical example that highlights some of the shortcomings of
parametric approaches and illustrates the need for nonparametric
modeling of the demand function. In the second part of the talk, we
move on to see how such ideas come into play in a dynamic pricing
problem. For that purpose we consider a prototypical revenue
management problem where the decision maker observes realized demand
over time, but is otherwise "blind" to the underlying demand
function. Few structural assumptions are made with regard to the
demand function; in particular, it need not admit any parametric
representation. We introduce a general method for solving such blind
revenue management problems that is based on learning the demand
function "on the fly" and optimizing prices based on that. The
analysis, which involves the classical trade off between exploration
and exploitation, leads to several qualitative and operational
insights with regard to dynamic optimization problems under
uncertainty in general, and the practice of price testing in the
particular context of blind revenue management problems.
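A stylized "explore then exploit" sketch conveys the flavour of the
problem: test a grid of prices, estimate each price's revenue rate
nonparametrically from realized demand, then commit to the empirically best
price. The code below is illustrative only; the price grid, exploration
fraction, and commit-once structure are our assumptions, and the talk's
method is more refined.

```python
# Stylized "explore then exploit" pricing; the price grid, exploration
# fraction, and commit-once structure are illustrative assumptions.
import numpy as np

def blind_pricing(demand, prices, horizon, explore_frac=0.2):
    """demand(p): realized demand at price p (unknown to the seller)."""
    n_explore = int(explore_frac * horizon)
    revenue = 0.0
    rev_sum = np.zeros(len(prices))
    counts = np.zeros(len(prices))
    # exploration: cycle through the candidate prices, recording revenue
    for t in range(n_explore):
        i = t % len(prices)
        r = prices[i] * demand(prices[i])
        revenue += r
        rev_sum[i] += r
        counts[i] += 1
    # exploitation: commit to the empirically best price
    best = int(np.argmax(rev_sum / np.maximum(counts, 1)))
    for _ in range(horizon - n_explore):
        revenue += prices[best] * demand(prices[best])
    return revenue
```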
Wednesday, April 11, 11:00,
room 20.137.
Rosario Romera
(Universidad Carlos III de Madrid)
Robust Partial Least Squares (PLS) Estimation with Applications
Abstract: Partial least squares regression (PLS) is a linear
regression technique developed to relate many regressors to one or
several response variables. Robust methods are introduced to reduce or
remove the effect of outlying data points. In this paper we show that
if the sample covariance matrix is properly robustified, further
robustification of the linear regression steps of the PLS algorithm
becomes unnecessary. The robust estimate of the covariance matrix is
computed by searching for outliers in univariate projections of the
data on a combination of random directions (Stahel-Donoho) and
specific directions obtained by maximizing and minimizing the kurtosis
coefficient of the projected data, as proposed by Peña and Prieto
(2006). It is shown that this procedure is fast to apply and provides
better results than other procedures proposed in the literature. Its
performance is illustrated by Monte Carlo simulation and by an example in
which the algorithm is able to reveal features of the data that were
undetected by previous methods.
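A simplified version of the projection-based robustification might look as
follows. For brevity this sketch uses only Stahel-Donoho-type random
directions and simple trimming, omitting the kurtosis-based specific
directions that the paper adds.

```python
# Simplified robustified covariance: Stahel-Donoho-type outlyingness over
# random projections, then trimming. The paper's kurtosis-based specific
# directions (Pena and Prieto, 2006) are omitted here for brevity.
import numpy as np

def robust_covariance(X, n_dirs=500, trim=0.1, seed=None):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    out = np.zeros(n)
    for _ in range(n_dirs):
        w = rng.standard_normal(p)
        w /= np.linalg.norm(w)
        z = X @ w
        med = np.median(z)
        mad = np.median(np.abs(z - med)) + 1e-12
        out = np.maximum(out, np.abs(z - med) / mad)  # worst-case projection
    keep = out <= np.quantile(out, 1 - trim)          # drop flagged points
    return np.cov(X[keep], rowvar=False)
```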
Thursday, April 12, 17:00,
room 40.273.
Thomas Archibald
(Management School, University of Edinburgh)
Modelling the transshipment decision in retail networks
Abstract:
In multi-location inventory systems, transshipments are often used
to improve customer service and reduce cost. Determining
optimal transshipment policies for such systems involves a complex
optimisation problem that is only tractable for systems with few
locations. Consequently simple heuristic transshipment policies are
often applied in practice. This paper develops an approximate
solution method which applies decomposition to reduce a Markov
decision process model of a multi-location inventory system into a
number of models involving only two locations. The value functions
from the subproblems are used to estimate the fair charge for the
inventory provided in a transshipment. This estimate of the fair
charge is used as the decision criterion in a heuristic transshipment
policy for the multi-location system. A numerical study shows that
the proposed heuristic can deliver considerable cost savings compared
to the simple heuristics often used in practice.
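The fair-charge decision rule can be conveyed in a few lines. In the sketch
below, a unit is transshipped from location j to location i only when i's
marginal value gain exceeds j's fair charge for the unit. The value
functions V[k] would come from solving the two-location subproblems; here
they are simply supplied by the caller, so this is illustrative only.

```python
# Stylized fair-charge rule: ship one unit from location j to location i
# only if i's marginal value gain exceeds j's fair charge for the unit.
# V[k](s) approximates the value of location k holding s units and would
# come from the two-location subproblems; here it is supplied by the caller.
def worth_transshipping(i, j, stock, V):
    if stock[j] == 0:
        return False
    gain_i = V[i](stock[i] + 1) - V[i](stock[i])    # value of receiving
    charge_j = V[j](stock[j]) - V[j](stock[j] - 1)  # fair charge for giving up
    return gain_i > charge_j
```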
Monday, May 7, 10:00,
room 20.287.
Garud Iyengar
(Columbia University)
Adword Auction Models
Abstract: In this talk we will discuss models for auctions relevant
to pricing advertising slots on search engines such as Google and
Yahoo!. We begin with a general problem formulation which allows
the privately known valuation per click to be a function of both the
advertiser and the slot. We present a compact characterization of the
set of all deterministic incentive compatible direct mechanisms for
this model. This characterization allows us to conclude that in this model
there exist incentive-compatible mechanisms that are not affine
maximizers. Next, we focus on two interesting special cases:
slot-independent valuation and slot-independent valuation up to a
privately known slot and zero thereafter. For both of these special
cases, we characterize revenue maximizing and efficiency maximizing
mechanisms and show that, for a market with n bidders and m advertising
slots, these mechanisms can be computed in O(n^2 m^2) time. We will
conclude by presenting the results of a numerical study comparing the
proposed optimal mechanisms with rank-based mechanisms and a new
mechanism that we call the customized rank-based mechanism. Joint
work with Anuj Kumar.
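For context, here is a minimal rank-based mechanism of the kind the
proposed optimal mechanisms are benchmarked against: bidders are ordered by
bid, slot s goes to the s-th highest bidder, and each winner pays the
next-highest bid (a generalized-second-price-style payment). The details
are illustrative, not the talk's customized variant.

```python
# Minimal rank-based allocation with next-price payments (generalized
# second-price style); a simple benchmark, not the talk's customized variant.
def rank_based_auction(bids, m):
    """bids: dict bidder -> bid per click; m: number of slots."""
    order = sorted(bids, key=bids.get, reverse=True)
    allocation, payments = {}, {}
    for s in range(min(m, len(order))):
        winner = order[s]
        allocation[winner] = s  # slot index, 0 = top slot
        payments[winner] = bids[order[s + 1]] if s + 1 < len(order) else 0.0
    return allocation, payments
```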
Monday, May 21, 17:00,
room 20.137.
Christopher Kirkbride
(Lancaster University)
Allocation models and heuristics for the outsourcing of repairs for a dynamic warranty population
Abstract: We consider a scenario in which a large equipment
manufacturer wishes to outsource the work involved in repairing
purchased goods while under warranty. Several external service vendors
are available for this work. We develop models and analyses to support
decisions concerning how responsibility for the warranty population
should be divided between them. These also allow the manufacturer to
resolve related questions concerning, for example, whether the service
capacities of the contracted vendors are sufficient to deliver an
effective post-sales service. Static allocation models yield
information concerning the proportions of the warranty population for
which the vendors should be responsible overall. Dynamic allocation
models enable consideration of how such overall workloads might be
delivered to the vendors over time in a way which avoids excessive
variability in the repair burden. We apply dynamic programming policy
improvement to develop an effective dynamic allocation heuristic. This
is evaluated numerically and is also used as a yardstick to assess two
simple allocation heuristics suggested by static models. A dynamic
greedy allocation heuristic is found to perform well. Dividing the
workload equally among vendors with different service capacities can
lead to serious losses.
Co-authors are K.D. Glazebrook (Lancaster) and L. Ding (Edinburgh).
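As a hedged illustration of dynamic, load-aware allocation (not the
dynamic-programming policy-improvement heuristic of the talk, which is more
sophisticated), a greedy rule might send each arriving repair to the vendor
with the smallest workload relative to its service capacity:

```python
# Toy greedy dynamic allocation: route each arriving repair to the vendor
# with the smallest workload relative to its service capacity. Illustrative
# only; the talk's heuristic comes from dynamic programming policy improvement.
def greedy_assign(workloads, capacities):
    """Return the index of the vendor to receive the next repair job."""
    ratios = [w / c for w, c in zip(workloads, capacities)]
    return min(range(len(ratios)), key=ratios.__getitem__)
```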