A new, alternative quantile regression estimator is developed and shown to be root-n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and is asymptotically equivalent to the usual quantile regression estimator; it is nevertheless a genuinely different estimator. It accommodates both linear and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice, but whether it has a theoretical justification remains an open question.
Quantile regression, non-linear quantile regression, estimating functions, minimax estimation, empirical process theory
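For context, the “usual” quantile regression estimator referred to above minimizes the Koenker–Bassett check (pinball) loss. A minimal intercept-only sketch, where the data and quantile level are illustrative assumptions:

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def fit_constant_quantile(y, tau, grid=None):
    """Minimize the total check loss over a constant model y ~ b.
    For an intercept-only model the minimizer is the sample tau-quantile,
    so it suffices to search over the data points themselves."""
    if grid is None:
        grid = np.sort(y)  # the optimum is attained at a data point
    losses = [check_loss(y - b, tau).sum() for b in grid]
    return grid[int(np.argmin(losses))]

rng = np.random.default_rng(0)
y = rng.normal(size=501)
b_hat = fit_constant_quantile(y, tau=0.5)
# for tau = 0.5 the check-loss minimizer is the sample median
print(np.isclose(b_hat, np.median(y)))
```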
Oracle Efficient Variable Selection in Random and Fixed Effects Panel Data Models
This paper generalizes the results for the Bridge estimator of Huang et al. (2008) to linear random and fixed effects panel data models that are allowed to grow in both dimensions. In particular, we show that the Bridge estimator is oracle efficient: it can correctly distinguish between relevant and irrelevant variables, and the asymptotic distribution of the estimators of the coefficients of the relevant variables is the same as if only these had been included in the model, i.e. as if an oracle had revealed the true model prior to estimation. In the case of more explanatory variables than observations, we prove that the Marginal Bridge estimator can asymptotically correctly distinguish between relevant and irrelevant explanatory variables. We do this without restricting the dependence between covariates and without assuming sub-Gaussianity of the error terms, thereby generalizing the results of Huang et al. (2008). Furthermore, the number of relevant variables is allowed to be larger than the sample size.
Dominique Guegan (CES – Centre d’économie de la Sorbonne – CNRS : UMR8174 – Université Panthéon-Sorbonne – Paris I, EEP-PSE – Ecole d’Économie de Paris – Paris School of Economics – Ecole d’Économie de Paris)
Patrick Rakotomarolahy (CES – Centre d’économie de la Sorbonne – CNRS : UMR8174 – Université Panthéon-Sorbonne – Paris I)
An empirical forecast-accuracy comparison of the non-parametric multivariate nearest neighbor method with parametric VAR modelling is conducted on euro area GDP. Using both methods to nowcast and forecast GDP, through the estimation of economic indicators plugged into bridge equations, we obtain more accurate forecasts with the nearest neighbor method. We also prove the asymptotic normality of the multivariate k-nearest neighbor regression estimator for dependent time series, providing confidence intervals for point forecasts in time series.
Forecast – Economic indicators – GDP – Euro area – VAR – Multivariate k-nearest neighbor regression – Asymptotic normality
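The nearest neighbor forecasting idea above can be sketched for a univariate series: embed the series into delay vectors, find the k past vectors closest to the most recent one, and average their successors. The tuning choices here (k, embedding dimension, Euclidean distance) are illustrative assumptions, not the paper’s exact settings:

```python
import numpy as np

def knn_forecast(series, k=3, dim=2):
    """One-step-ahead k-nearest-neighbor forecast: embed the series into
    delay vectors of length dim, find the k historical vectors closest
    (in Euclidean distance) to the latest one, and average their successors."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    cand = np.array([x[t:t + dim] for t in range(n - dim)])  # vectors with a known successor
    succ = x[dim:]                                           # succ[t] follows cand[t]
    target = x[n - dim:]                                     # most recent delay vector
    d = np.linalg.norm(cand - target, axis=1)
    nearest = np.argsort(d)[:k]
    return succ[nearest].mean()

# on a deterministic alternating series the forecast continues the pattern
print(knn_forecast([0, 1, 0, 1, 0, 1, 0], k=2, dim=2))  # → 1.0
```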
On Properties of Separating Information Maximum Likelihood Estimation of Realized Volatility and Covariance with Micro-Market Noise
Naoto Kunitomo (Faculty of Economics, University of Tokyo)
Seisho Sato (Institute of Statistical Mathematics)
For estimating realized volatility and covariance from high-frequency data, we introduced the Separating Information Maximum Likelihood (SIML) method in Kunitomo and Sato (2008a, 2008b, 2010a, 2010b) for cases where micro-market noise may be present. The resulting estimator is simple and can be represented as a specific quadratic form of returns. We show that the SIML estimator has reasonable asymptotic properties: it is consistent and asymptotically normal (with stable convergence in the general case) when the sample size is large, under general conditions that include some non-Gaussian processes and some volatility models. Based on simulations, we find that the SIML estimator has reasonable finite sample properties and thus should be useful in practice. The SIML estimator is asymptotically robust in the sense that it remains consistent when the noise terms are weakly dependent and endogenously correlated with the efficient market price process. We also apply our method to an analysis of Nikkei-225 Futures, which have been the major stock index futures in the Japanese financial sector.
Quantile-Based Inference for Elliptical Distributions
We estimate the parameters of an elliptical distribution by means of a multivariate extension of the Method of Simulated Quantiles (MSQ) of Dominicy and Veredas (2010). The multivariate extension entails the challenge of constructing a function of quantiles that is informative about the covariation parameters. The interquantile range of a projection of pairwise random variables onto the 45 degree line is very informative about the covariation of the two random variables. MSQ provides the asymptotic theory for the estimators, and a Monte Carlo study reveals good finite sample properties of the estimators. An empirical application to 22 worldwide financial market returns illustrates the usefulness of the method.
Quantiles; Elliptical Distribution; Heavy Tails
Detecting Structural Breaks using Hidden Markov Models
Christos Ntantamis (School of Economics and Management, University of Aarhus and CREATES)
Testing for structural breaks and identifying their location is essential for econometric modeling. In this paper, a Hidden Markov Model (HMM) approach is used in order to perform these tasks. Breaks are defined as the data points where the underlying Markov Chain switches from one state to another. The estimation of the HMM is conducted using a variant of the Iterative Conditional Expectation-Generalized Mixture (ICE-GEMI) algorithm proposed by Delignon et al. (1997), which permits analysis of the conditional distributions of economic data and allows for different functional forms across regimes. The locations of the breaks are subsequently obtained by assigning states to data points according to the Maximum Posterior Mode (MPM) algorithm. The Integrated Classification Likelihood-Bayesian Information Criterion (ICL-BIC) allows for the determination of the number of regimes by taking into account the classification of the data points to their corresponding regimes. The performance of the overall procedure, denoted IMI by the initials of the component algorithms, is validated by two sets of simulations: one in which only the parameters are permitted to differ across regimes, and one that also permits differences in the functional forms. The IMI method performs well in both sets. Moreover, when it is compared to the Bai and Perron (1998) method, its performance is superior in assessing the number of breaks and their respective locations. Finally, the methodology is applied to the detection of breaks in the monetary policy of the United States, the different functional forms being variants of the Taylor (1993) rule.
Researchers in economics and other disciplines are often interested in the causal effect of a binary treatment on outcomes. Econometric methods used to estimate such effects are divided into one of two strands depending on whether they require the conditional independence assumption (i.e., independence of potential outcomes and treatment assignment conditional on a set of observable covariates). When this assumption holds, researchers now have a wide array of estimation techniques from which to choose. However, very little is known about their performance – both in absolute and relative terms – when measurement error is present. In this study, the performance of several estimators that require the conditional independence assumption, as well as some that do not, is evaluated in a Monte Carlo study. In all cases, the data-generating process is such that conditional independence holds with the ‘real’ data. However, measurement error is then introduced. Specifically, three types of measurement error are considered: (i) errors in treatment assignment, (ii) errors in the outcome, and (iii) errors in the vector of covariates. Recommendations for researchers are provided.
treatment effects, propensity score, unconfoundedness, selection on observables, measurement error
Banded and Tapered Estimates for Autocovariance Matrices and the Linear Process Bootstrap
We address the problem of estimating the autocovariance matrix of a stationary process. Under short range dependence assumptions, convergence rates are established for a gradually tapered version of the sample autocovariance matrix and for its inverse. The proposed estimator is formed by leaving the main diagonals of the sample autocovariance matrix intact while gradually down-weighting off-diagonal entries towards zero. In addition we show the same convergence rates hold for a positive definite version of the estimator, and we introduce a new approach for selecting the banding parameter. The new matrix estimator is shown to perform well theoretically and in simulation studies. As an application we introduce a new resampling scheme for stationary processes termed the linear process bootstrap (LPB). The LPB is shown to be asymptotically valid for the sample mean and related statistics. The effectiveness of the proposed methods is demonstrated in a simulation study.
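A minimal sketch of a gradually tapered autocovariance matrix estimate, assuming a trapezoidal taper that keeps lags up to l intact and decays linearly to zero by lag 2l; the specific taper shape and banding parameter are illustrative, not necessarily those of the paper:

```python
import numpy as np

def tapered_autocov_matrix(x, l):
    """Tapered estimate of the n x n autocovariance matrix of a stationary
    series: sample autocovariances gamma_hat(k) are kept intact for |k| <= l
    and down-weighted linearly to zero for l < |k| <= 2l (a trapezoidal
    taper; the banding parameter l is an illustrative choice here)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = np.array([xc[:n - k] @ xc[k:] / n for k in range(n)])  # gamma_hat(k)

    def w(k):  # trapezoidal taper weight
        if k <= l:
            return 1.0
        if k <= 2 * l:
            return 2.0 - k / l
        return 0.0

    weights = np.array([w(k) for k in range(n)])
    i = np.arange(n)
    lag = np.abs(i[:, None] - i[None, :])
    return (weights * gamma)[lag]  # Toeplitz matrix built by lag indexing

rng = np.random.default_rng(1)
S = tapered_autocov_matrix(rng.normal(size=200), l=5)
print(S.shape, np.allclose(S, S.T))
```

Entries beyond lag 2l are exactly zero, so the estimate is banded; the positive definite correction and data-driven choice of l discussed in the abstract are not sketched here.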
Empirical Likelihood (EL) and other methods that operate within the Empirical Estimating Equations (E3) approach to estimation and inference are challenged by the Empty Set Problem (ESP). ESP concerns the possibility that a model set, which is data-dependent, may be empty for some data sets. To avoid ESP we return from E3 back to the Estimating Equations, and explore the Bayesian infinite-dimensional Maximum A Posteriori Probability (MAP) method. The Bayesian MAP with Dirichlet prior motivates a Revised EL (ReEL) method. ReEL i) avoids ESP as well as the convex hull restriction, ii) attains the same basic asymptotic properties as EL, and iii) has computational complexity comparable to that of EL.
empirical estimating equations, generalized minimum contrast, empirical likelihood, generalized empirical likelihood, empty set problem, convex hull restriction, estimating equations, maximum a posteriori probability
The PCSE Estimator is Good — Just Not as Good as You Think
W. Robert Reed (University of Canterbury)
This paper investigates the properties of the Panel-Corrected Standard Error (PCSE) estimator. The PCSE estimator is commonly used when working with time-series cross-sectional (TSCS) data. In an influential paper, Beck and Katz (1995) (henceforth BK) demonstrated that FGLS produces coefficient standard errors that are severely underestimated. They report Monte Carlo experiments in which the PCSE estimator produces accurate standard error estimates at no, or little, loss in efficiency compared to FGLS. Our study further investigates the properties of the PCSE estimator. We first reproduce the main experimental results of BK using their Monte Carlo framework. We then show that the PCSE estimator does not perform as well when tested in data environments that better resemble “practical research situations.” When (i) the explanatory variable(s) are characterized by substantial persistence, (ii) there is serial correlation in the errors, and (iii) the time span of the data series is relatively short, coverage rates for the PCSE estimator frequently fall between 80 and 90 percent. Further, we find many “practical research situations” where the PCSE estimator compares poorly with FGLS on efficiency grounds.
Panel data estimation; Monte Carlo analysis; FGLS; Parks; PCSE; finite sample
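The PCSE covariance has a standard sandwich form. A minimal sketch under the textbook Beck–Katz setup with no serial correlation in the errors (the simulated design below is illustrative, not one of the paper’s experiments):

```python
import numpy as np

def pcse(X, y, n_units, T):
    """OLS with panel-corrected standard errors (Beck-Katz style). Data are
    stacked unit-major: rows 0..T-1 belong to unit 1, and so on.
    Sigma_hat[i, j] = e_i'e_j / T estimates the contemporaneous cross-unit
    error covariance; serial correlation is assumed away in this sketch."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = (y - X @ beta).reshape(n_units, T)  # residuals by unit
    sigma = e @ e.T / T                     # N x N contemporaneous covariance
    omega = np.kron(sigma, np.eye(T))       # block error covariance
    bread = np.linalg.inv(X.T @ X)
    cov = bread @ X.T @ omega @ X @ bread   # sandwich covariance of beta_hat
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
N, T = 5, 40
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=N * T)
beta, se = pcse(X, y, N, T)
print(beta, se)
```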
Identifying All Distinct Sample P-P Plots, with an Application to the Exact Finite Sample Distribution of the L1-FCvM Test Statistic
Jeroen Hinloopen (University of Amsterdam)
Rien Wagenvoort (European Investment Bank, Luxemburg)
P-p plots contain all the information needed for scale-invariant comparisons. Indeed, Empirical Distribution Function (EDF) tests translate sample p-p plots into a single number. In this paper we characterize the set of all distinct p-p plots for two balanced samples of size n absent ties. Distributions of EDF test statistics are embedded in this set. It is thus used to derive the exact finite sample distribution of the L1-version of the Fisz-Cramér-von Mises test. Comparing this distribution with the (known) limiting distribution shows that the latter can always be used for hypothesis testing: although for finite samples the critical percentiles of the limiting distribution differ from the exact values, this will not lead to differences in the rejection of the underlying hypothesis.
Sample p-p plot; EDF test; finite sample distribution; limiting distribution
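A sample p-p plot pairs the two empirical distribution functions evaluated on the pooled sample. The sketch below also computes a plain L1 distance between the EDFs as a stand-in for the L1-FCvM statistic; the exact scaling used in the paper may differ:

```python
import numpy as np

def sample_pp_plot(x, y):
    """Sample p-p plot of two samples: at each pooled data point t, the pair
    (F_x(t), F_y(t)) of (right-continuous) EDF values."""
    pooled = np.sort(np.concatenate([x, y]))
    Fx = np.searchsorted(np.sort(x), pooled, side='right') / len(x)
    Fy = np.searchsorted(np.sort(y), pooled, side='right') / len(y)
    return Fx, Fy

def l1_edf_distance(x, y):
    """Unnormalized L1 discrepancy sum |F_x - F_y| over the pooled sample;
    the L1-FCvM statistic is a (suitably scaled) distance of this kind,
    though its exact normalization is not reproduced here."""
    Fx, Fy = sample_pp_plot(x, y)
    return np.abs(Fx - Fy).sum()

x = np.array([1.0, 3.0, 5.0])
y = np.array([2.0, 4.0, 6.0])
print(l1_edf_distance(x, y))
```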
The asymptotic distribution of maximum likelihood estimators is derived for a class of exponential generalized autoregressive conditional heteroskedasticity (EGARCH) models. The result carries over to models for duration and realised volatility that use an exponential link function. A key feature of the model formulation is that the dynamics are driven by the score.
Duration models; gamma distribution; general error distribution; heteroskedasticity; leverage; score; Student’s t.
Fixed-b asymptotics for the studentized mean from time series with short, long or negative memory
This paper considers the problem of distribution estimation for the studentized sample mean in the context of Long Memory and Negative Memory time series dynamics, adopting the fixed-bandwidth approach now popular in the econometrics literature. The distribution theory complements the Short Memory results of Kiefer and Vogelsang (2005). In particular, our results highlight the dependence on the employed kernel, whether or not the taper is nonzero at the boundary, and most importantly whether or not the process has short memory. We also demonstrate that small-bandwidth approaches fail when long memory or negative memory is present since the limiting distribution is either a point mass at zero or degenerate. Extensive numerical work provides approximations to the quantiles of the asymptotic distribution for a range of tapers and memory parameters; these quantiles can be used in practice for the construction of confidence intervals and hypothesis tests for the mean of the time series.
In this thesis some new, nonparametric methods are introduced to explore uni- or bivariate data. First, we introduce the “shorth plot”, a graphical method to depict the main features of one-dimensional probability distributions. Second, we define the “Half-Half plot”, a useful tool for analyzing regression data. Furthermore, a test for spherical symmetry in an empirical likelihood framework is presented. For all methods the asymptotic behavior is derived. The good performance of the methods is demonstrated through simulated and real data examples.
Methods, like Maximum Empirical Likelihood (MEL), that operate within the Empirical Estimating Equations (E3) approach to estimation and inference are challenged by the Empty Set Problem (ESP). We propose to return from E3 back to the Estimating Equations, and to use the Maximum Likelihood method. In the discrete case the Maximum Likelihood with Estimating Equations (MLEE) method avoids ESP. In the continuous case, how to make MLEE operational is an open question. Instead, we propose a Patched Empirical Likelihood, and demonstrate that it avoids ESP. The methods enjoy, in general, the same asymptotic properties as MEL.
maximum likelihood, estimating equations, empirical likelihood
We study the empirical behaviour of semi-parametric log-periodogram estimation for long memory models when the true process exhibits a change in persistence. Simulation results confirm theoretical arguments which suggest that evidence for long memory is likely to be found. A recently proposed test by Sibbertsen and Kruse (2009) is shown to exhibit noticeable power to discriminate between long memory and a structural change in autoregressive parameters.
Long memory; changing persistence; structural break; semi-parametric estimation
Cliometrics and Time Series Econometrics: Some Theory and Applications
The paper discusses a range of modern time series methods that have become popular in the past 20 years and considers their usefulness for cliometrics research, both in theory and via a range of applications. Issues such as spurious regression, unit roots, cointegration, persistence, causality, and structural time series methods, including time-varying parameter models, are introduced, as are the estimation and testing implications they involve. Applications include a discussion of the timing and potential causes of the British Industrial Revolution, income ‘convergence’, and the long run behaviour of English Real Wages 1264–1913. Finally, some new and potentially useful developments are discussed, including mildly explosive processes, graphical modelling, and long memory.
Time series; cointegration; unit roots; persistence; causality; cliometrics; convergence; long memory; graphical modelling; British Industrial Revolution
Endogenous Treatment Effects for Count Data Models with Sample Selection or Endogenous Participation
In this paper we propose a method to estimate models in which an endogenous dichotomous treatment affects a count outcome in the presence of either sample selection or endogenous participation, using maximum simulated likelihood. We allow the treatment to have an effect on both the sample selection or participation rule and the main outcome. Applications of this model are frequent in many fields of economics, such as health, labor, and population economics. We show the performance of the model using data from Kenkel and Terza (2001), who investigate the effect of physician advice on the amount of alcohol consumption. Our estimates suggest that in these data (i) neglecting treatment endogeneity leads to a perversely signed effect of physician advice on drinking intensity, and (ii) neglecting endogenous participation leads to an upward biased estimator of the treatment effect of physician advice on drinking intensity.
The problem of prediction is revisited with a view towards going beyond the typical nonparametric setting and reaching a fully model-free environment for predictive inference, i.e., point predictors and predictive intervals. A basic principle of model-free prediction is laid out based on the notion of transforming a given set-up into one that is easier to work with, namely i.i.d. or Gaussian. As an application, the problem of nonparametric regression is addressed in detail; the model-free predictors are worked out, and shown to be applicable under minimal assumptions. Interestingly, model-free prediction in regression is a totally automatic technique that does not necessitate the search for an optimal data transformation before model fitting. The resulting model-free predictive distributions and intervals are compared to their corresponding model-based analogs, and the use of cross-validation is extensively discussed. As an aside, improved prediction intervals in linear regression are also obtained.
In this paper we consider regression models with forecast feedback. Agents’ expectations are formed via the recursive estimation of the parameters in an auxiliary model. The learning scheme employed by the agents belongs to the class of stochastic approximation algorithms whose gain sequence is decreasing to zero. Our focus is on the estimation of the parameters in the resulting actual law of motion. For a special case we show that the ordinary least squares estimator is consistent.
Adaptive learning; forecast feedback; stochastic approximation; linear regression with stochastic regressors; consistency
Testing Homogeneity in Demand Systems Nonparametrically: Theory and Evidence
Berthold R. Haag (HypoVereinsbank)
Stefan Hoderlein (Boston College)
Sonya Mihaleva (Brown University)
Homogeneity of degree zero has often been rejected in empirical studies that employ parametric models. This paper proposes a test for homogeneity that does not depend on the correct specification of the functional form of the empirical model. The test statistic we propose is based on kernel regression and extends nonparametric specification tests to systems of equations with weakly dependent data. We discuss a number of practically important issues and further extensions. In particular, we focus on a novel bootstrap version of the test statistic. Moreover, we show that the same test also allows one to assess the validity of functional form assumptions. When we apply the test to British household data, we find homogeneity generally well accepted. In contrast, we reject homogeneity with a standard almost ideal parametric demand system. Using our test for functional form, however, we find that it is precisely this functional form assumption that is rejected. Our findings indicate that the rejections of homogeneity obtained thus far are due to misspecification of the functional form and not to incorrectness of the homogeneity assumption.
Homogeneity, Nonparametric, Bootstrap, Specification Test, System of Equations
Recession Forecasting with Dynamic Probit Models under Real Time Conditions
Christian R. Proano (IMK at the Hans Boeckler Foundation)
In this paper a dynamic probit model for recession forecasting in pseudo-real time is set up using a large set of macroeconomic and financial monthly indicators for Germany. Using different initial sets of explanatory variables, alternative dynamic probit specifications are obtained through an automated general-to-specific lag selection procedure; these are then pooled in order to decrease the volatility of the estimated recession probabilities and increase their forecasting accuracy. As shown in the paper, this procedure not only features good in-sample forecast statistics, but also good out-of-sample performance, as pseudo-real-time evaluation exercises show.
In this paper we propose a new method of single imputation, reconstruction, and estimation of non-reported, incorrect, or excluded values in both the target and the auxiliary variables, where the former is on a ratio or interval scale and the latter are heterogeneous in measurement scale. Our technique is a variation of the popular nearest neighbor hot deck imputation (NNHDI), where “nearest” is defined in terms of a global distance obtained as a convex combination of the partial distance matrices computed for the various types of variables. In particular, we address the problem of properly weighting the partial distance matrices in order to reflect their significance, reliability, and statistical adequacy. Performance of several weighting schemes is compared under a variety of settings in coordination with imputation of the least power mean. We have demonstrated, through analysis of simulated and actual data sets, the appropriateness of this approach. Our main contribution has been to show that mixed data may optimally be combined to allow accurate reconstruction of missing values in the target variable even in the absence of some data in the other fields of the record.
hot-deck imputation, nearest neighbor, general distance coefficient, least power mean.
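The global-distance construction described above can be sketched for one numeric and one categorical auxiliary variable, using Gower-style partial distances and an illustrative (not optimized) weight:

```python
import numpy as np

def global_distance(num, cat, w_num=0.5):
    """Convex combination of partial distance matrices for mixed data:
    range-normalized absolute differences for the numeric column and simple
    mismatch for the categorical one (Gower-style partial distances; the
    weight w_num is an illustrative, not an optimal, choice)."""
    num = np.asarray(num, dtype=float)
    d_num = np.abs(num[:, None] - num[None, :]) / (num.max() - num.min())
    d_cat = (np.asarray(cat)[:, None] != np.asarray(cat)[None, :]).astype(float)
    return w_num * d_num + (1 - w_num) * d_cat

def nn_hot_deck_impute(target, num, cat):
    """Fill missing entries (NaN) of the target variable with the value of
    the nearest complete record under the global distance (the NNHDI idea)."""
    target = np.asarray(target, dtype=float).copy()
    D = global_distance(num, cat)
    donors = np.where(~np.isnan(target))[0]
    for i in np.where(np.isnan(target))[0]:
        target[i] = target[donors[np.argmin(D[i, donors])]]
    return target

num = [1.0, 1.1, 5.0, 5.2]
cat = ['a', 'a', 'b', 'b']
y = [10.0, np.nan, 40.0, 41.0]
print(nn_hot_deck_impute(y, num, cat))  # record 1 borrows from its neighbor, record 0
```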
A Heap of Trouble? Accounting for Mismatch Bias in Retrospectively Collected Data on Smoking
When event data are retrospectively reported, more temporally distal events tend to get “heaped” on even multiples of reporting units. Heaping may introduce a type of attenuation bias because it causes researchers to mismatch time-varying right-hand side variables. We develop a model-based approach to estimate the extent of heaping in the data, and how it affects regression parameter estimates. We use smoking cessation data as a motivating example to describe our approach, but the method more generally facilitates the use of retrospective data from the multitude of cross-sectional and longitudinal studies worldwide that already have and potentially could collect event data.
In this paper we consider general rank minimization problems with rank appearing in either the objective function or the constraint. We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems in which each subproblem is solved by a block coordinate descent method. Under some suitable assumptions, we show that any accumulation point of the sequence generated by our method when applied to the rank constrained minimization problem is a stationary point of a nonlinear reformulation of the problem. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.
Estimating Subjective Probabilities
Glenn W. Harrison
E. Elisabet Rutström
Subjective probabilities play a role in many economic decisions. There is a large theoretical literature on the elicitation of subjective probabilities, and an equally large empirical literature. However, there is a gulf between the two. The theoretical literature proposes a range of procedures that can be used to recover subjective probabilities, but stresses the need to make strong auxiliary assumptions or “calibrating adjustments” to elicited reports in order to recover the latent probability. With some notable exceptions, the empirical literature seems intent on either making those strong assumptions or ignoring the need for calibration. We illustrate how the joint estimation of risk attitudes and subjective probabilities using structural maximum likelihood methods can provide the calibration adjustments that theory calls for. This allows the observer to make inferences about the latent subjective probability, calibrating for virtually any well-specified model of choice under uncertainty. We demonstrate our procedures with experiments in which we elicit subjective probabilities. We calibrate the estimates of subjective beliefs assuming that choices are made consistently with expected utility theory or rank-dependent utility theory. Inferred subjective probabilities are significantly different when calibrated according to either theory, thus showing the importance of undertaking such exercises. Our findings also have implications for the interpretation of probabilities inferred from prediction markets.
Estimation of Poverty Transition Matrices with Noisy Data
This paper investigates potential measurement error biases in estimated poverty transition matrices. Transition matrices based on survey expenditure data are compared to transition matrices based on measurement-error-free simulated expenditure. The simulation model uses estimates that correct for measurement error in expenditure. This dynamic model needs error-free initial conditions that cannot be derived from these estimates. [Working Paper No. 270]
The problem of modeling housing prices has attracted considerable attention due to its importance in terms of households’ wealth and in terms of public revenues through taxation. One of the main concerns raised in both the theoretical and the empirical literature is the existence of spatial association between prices, which can be attributed, among other things, to unobserved neighborhood effects. In this paper, a model of spatial association for housing markets is introduced. Spatial association is treated in the context of spatial heterogeneity, which is explicitly modeled in both a global and a local framework. The global form of heterogeneity is incorporated in a Hedonic Price Index model that encompasses a nonlinear function of the geographical coordinates of each dwelling. The local form of heterogeneity is subsequently modeled as a Finite Mixture Model for the residuals of the Hedonic Index. The identified mixtures are considered as the different spatial housing submarkets. The main advantage of the approach is that submarkets are recovered from the housing price data rather than imposed by administrative or geographical criteria. The Finite Mixture Model is estimated using the Figueiredo and Jain (2002) approach, due to its ability to endogenously identify the number of submarkets and its computational efficiency, which permits the consideration of large datasets. The different submarkets are subsequently identified using the Maximum Posterior Mode algorithm. The overall ability of the model to identify spatial heterogeneity is validated through a set of simulations. The model was applied to Los Angeles county housing prices data for the year 2002. The results suggest that the statistically identified number of submarkets, after taking into account the dwellings’ structural characteristics, is considerably smaller than the number imposed by either geographical or administrative boundaries.
Value at Risk Computation in a Non-Stationary Setting
Dominique Guegan (CES – Centre d’économie de la Sorbonne – CNRS : UMR8174 – Université Panthéon-Sorbonne – Paris I, EEP-PSE – Ecole d’Économie de Paris – Paris School of Economics – Ecole d’Économie de Paris)
This chapter recalls the main tools useful for computing the Value at Risk associated with an m-dimensional portfolio. It then explains the limitations of these tools as soon as non-stationarities are observed in time series. Indeed, specific behaviours of financial assets, like volatility, jumps, explosions, and pseudo-seasonalities, provoke non-stationarities which affect the distribution function of the portfolio. Thus, a new way of computing VaR is proposed which allows the potential non-invariance of the m-dimensional portfolio distribution function to be avoided.
Non-stationarity – Value-at-Risk – Dynamic copula – Meta-distribution – POT method
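As a stationary baseline for the chapter’s discussion, historical-simulation VaR is simply an empirical quantile of losses; the chapter’s point is that such unconditional quantiles mislead when the return distribution is not invariant over time. A minimal sketch with illustrative parameters:

```python
import numpy as np

def historical_var(returns, alpha=0.99):
    """Historical-simulation Value at Risk: the alpha-quantile of the loss
    distribution (losses = negative returns). This is the stationary
    baseline; under non-stationarity the unconditional quantile no longer
    reflects current risk."""
    losses = -np.asarray(returns, dtype=float)
    return np.quantile(losses, alpha)

rng = np.random.default_rng(3)
r = rng.normal(0.0, 0.01, size=10_000)
var99 = historical_var(r, 0.99)
print(var99)  # for N(0, 0.01) returns this is near 0.01 * 2.326
```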
An application of local linear regression with asymmetric kernels to regression discontinuity designs
To match the NBER business cycle features it is necessary to employ Generalised dynamic categorical (GDC) models that impose certain phase restrictions and permit multiple indexes. Theory suggests additional shape restrictions in the form of monotonicity and boundedness of certain transition probabilities. Maximum likelihood and constraint weighted bootstrap estimators are developed to impose these restrictions. In the application these estimators generate improved estimates of how the probability of recession varies with the yield spread.
Extreme Value Theory as a Theoretical Background for Power Law Behavior
Power law behavior has been recognized to be a pervasive feature of many phenomena in natural and social sciences. While immense research efforts have been devoted to the analysis of behavioral mechanisms responsible for the ubiquity of power-law scaling, the strong theoretical foundation of power laws as a very general type of limiting behavior of large realizations of stochastic processes is less well known. In this chapter, we briefly present some of the key results of extreme value theory, which provide a statistical justification for the emergence of power laws as limiting behavior for extreme fluctuations. The remarkable generality of the theory allows one to abstract from the details of the system under investigation, and therefore allows its application in many diverse fields. Moreover, this theory offers new powerful techniques for the estimation of the Pareto index, detailed in the second part of this chapter.
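Among the Pareto index estimation techniques the chapter refers to, the classical Hill estimator is the standard example; a minimal sketch, where the choice of k (the number of upper order statistics) is the usual bias/variance trade-off and is illustrative here:

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the tail (Pareto) index alpha from the k largest
    order statistics: alpha_hat = 1 / mean(log X_(i) - log X_(k+1)),
    i = 1..k, with X_(1) >= X_(2) >= ... the descending order statistics."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    logs = np.log(xs[:k]) - np.log(xs[k])
    return 1.0 / logs.mean()

# exact Pareto(alpha = 2) sample via inverse-transform sampling:
# if U ~ Uniform(0, 1) then U**(-1/alpha) is Pareto with tail index alpha
rng = np.random.default_rng(4)
u = rng.uniform(size=100_000)
x = u ** (-1.0 / 2.0)
print(hill_estimator(x, k=2000))  # should be close to 2
```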