JEL Classification: C1; C51.
1. Introduction
The existence of nonlinear relationships among economic variables is often controversial. The issue becomes more complicated when applied researchers find nonlinearities that economic theory does not predict, or estimate linear models that do not fit the data properly. Here, we argue that, from a statistical point of view, such controversies can be avoided by using conditional probability densities.
In other words, we show how to use a statistical procedure to derive econometric models that capture genuine nonlinear relationships. To do so, we assume suitable conditional probability distributions that give rise to nonlinear regression functions (Spanos, 1986). We exemplify this approach by deriving a population regression model that can be used to analyze a nonlinear relationship between economic variables. Specifically, we use a Gumbel conditional distribution as the basis to derive the nonlinear regression curves that might describe this type of phenomenon. The Gumbel regression model allows us to illustrate two facts. First, we show how a nonlinear relationship might be well represented by a bivariate exponential distribution and its associated regression model (Gumbel, 1960), which has a nonlinear conditional mean and a heteroskedastic conditional variance. Second, we show that a nonlinear model like the Gumbel regression exhibits changing partial effects of the explanatory variables over the entire distribution of the explained variable, which is not the case in a normal-linear model. These facts might help to settle controversial economic arguments when the empirical data exhibit nonlinearities.
This paper is structured as follows. The second section briefly discusses the general statistical approach to derive linear or nonlinear econometric models in a stochastic setting. In the third section, we exemplify the statistical approach by discussing the specification, estimation, and validation of the Gumbel regression model. In the last section, we make some remarks on the implications of the approach.
2. Deriving Nonlinear Regression Models
In the context of a modern approach to econometrics, any linear or nonlinear model can be specified by making assumptions on two components: 1) the population regression model, and 2) the sampling model (Wooldridge, 2010). The first assumption refers to the functional form of the conditional mean that describes the stochastic relationship between y and x. The second assumption refers to the probabilistic behavior of the sample. Here, we only deal with the derivation of the population regression model that gives rise to the nonlinear relationships among a set of economic variables. For the sake of simplicity, we therefore assume an independent and identically distributed (iid) sample in the rest of the paper.
Let us define the population regression model by assuming that any stochastic variable can be decomposed, by definition, into two parts: a conditional expectation
Equation (2) implies that the error term is a random variable with zero mean that is uncorrelated with each of the explanatory variables and with any function of them. It is worth mentioning that these two equations also imply a set of statistical assumptions that can be tested when working with real data.
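The decomposition described above is conventionally written as follows. This is the standard textbook form and is only assumed here to correspond to equations (1) and (2); the symbol u for the error term is likewise an assumption.

```latex
\begin{align}
y &= E(y \mid x) + u, \tag{1} \\
E(u \mid x) &= 0. \tag{2}
\end{align}
```

By the law of iterated expectations, E(u | x) = 0 implies E(u) = 0 and E[g(x)u] = 0 for any function g of the explanatory variables, which is the zero-mean and zero-correlation property stated in the text.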
Under the assumption of a random sample, equation (1) implies that the applied econometrician needs to propose a specific functional form for the conditional mean
To fill this gap, we propose to assume a conditional distribution, based on the empirical distribution of the data at hand, and to derive its regression function, rather than assuming an arbitrary functional form for the conditional mean in equation (1) (Spanos, 1986). To illustrate the workings of this approach, we first derive the typical normal-linear econometric model, based not only on equations (1) and (2) but also on a normal conditional density that gives rise to the conditional mean that defines equation (1). So, let us assume that the data are described by a conditional normal distribution of y given x and that the variances of the normal random variables y and x are constant
Then, we proceed to compute the conditional expectation of y given x based on such conditional density function
which gives rise to the following genuine linear conditional mean:
with the following statistical parameterization
Once we have mathematically obtained the conditional mean in equation (5), we are in a position to specify the normal-linear-homoskedastic regression model by using equations (1) and (2)
where e is a normal, independent, and identically distributed (NIID) error.
Sometimes, we are interested in a particular function called the partial effect (i.e., the marginal effect), which shows the response of the conditional mean to a unit change in one of the explanatory variables. Equation (8) below shows that, in the normal-linear regression model, the effect of x on y is constant
Note that, when the probabilistic features of the data are not compatible with the normality assumption, the conditional mean will not necessarily be a linear function. This implies that the partial effect in equation (8) will not necessarily be constant in models with different distributional assumptions. Thus, an advantage of assuming conditional densities, rather than functional forms for the mean, is that we can use densities with genuinely nonlinear means, which we can choose by assessing the empirical features of the data at hand.
In the next section, we show how this setup can be used to derive other econometric models by incorporating different conditional distributions, where the mean or average causal effect of the explanatory variable on the explained variable will not be linear. Specifically, we change the assumption of normality not only for the marginal densities of y and x, but also for the conditional density, so that we are able to obtain a valid nonlinear population regression model.
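As a worked illustration of the normal case, the standard bivariate normal results can be summarized as follows. The notation (the means, the variances and covariance, and the coefficients β₀, β₁, σ²) is assumed here for illustration and may differ from the parameterization used in the numbered equations.

```latex
\begin{align*}
E(y \mid x) &= \beta_0 + \beta_1 x, &
\beta_1 &= \frac{\sigma_{xy}}{\sigma_{xx}}, &
\beta_0 &= \mu_y - \beta_1 \mu_x, \\
\operatorname{Var}(y \mid x) &= \sigma^2 = \sigma_{yy} - \frac{\sigma_{xy}^2}{\sigma_{xx}}, &
\frac{\partial E(y \mid x)}{\partial x} &= \beta_1 .
\end{align*}
```

Both the slope and the conditional variance are free of x, which is why the normal-linear model has a constant partial effect and is homoskedastic.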
3. The Gumbel Regression Model
Here, we exemplify the workings of the previous approach by specifying the Gumbel regression model, although we can easily use other conditional densities to specify other population regression models. To do this, we first assume that a Gumbel joint density is a good representation of the joint stochastic behavior of y and x. Then, we derive the conditional density of y given x and derive its nonlinear conditional expectation by integration. Finally, we embed our derived conditional expectation in our econometric setup, given by equations (1) and (2), to end up with a proper nonlinear econometric model with heterogeneous partial effects. In what follows, we describe this statistical procedure step by step.
A. Observational Data and Model Specification
A preliminary step in specifying a conditional model that accounts for nonlinear relationships among a set of random variables is to discuss the statistical properties of those variables. That is, in selecting an econometric model we should take into account not only theoretical issues, but also all the systematic statistical information in the data (Spanos, 1986). In fact, a brief analysis of different types of graphs might reveal the empirical distribution that could be a good assumption for the data at hand. Kernel estimates of the univariate empirical densities (Silverman, 1998) can also be useful to assess departures from the normality assumption. We can get more information about the underlying joint density of the data by looking at the kernel estimate of the empirical joint distribution and the probability contour plot with the potential empirical regression curves. In other words, we can anticipate the presence of a potential nonlinear conditional distribution and its associated regression function by using a set of graphical tools.
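The graphical check described above can be sketched numerically. The following is a minimal illustration, assuming a Gaussian kernel with a rule-of-thumb bandwidth; the function names (`silverman_bandwidth`, `kde`, `max_gap_from_normal`) are hypothetical helpers, not part of the paper. It compares a kernel density estimate with the normal density that has the same mean and standard deviation.

```python
import math
import statistics


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)


def kde(sample, grid, bandwidth=None):
    """Evaluate a Gaussian-kernel density estimate at each grid point."""
    h = bandwidth or silverman_bandwidth(sample)
    scale = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((g - s) / h) ** 2) for s in sample)
        for g in grid
    ]


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and std. deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def max_gap_from_normal(sample, points=200):
    """Largest absolute gap between the KDE and the moment-matched normal density."""
    mu, sd = statistics.fmean(sample), statistics.stdev(sample)
    lo, hi = min(sample) - sd, max(sample) + sd
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    dens = kde(sample, grid)
    return max(abs(d - normal_pdf(g, mu, sd)) for d, g in zip(dens, grid))
```

A large gap for one of the variables is informal evidence against normality; it complements, and does not replace, the plots and formal tests discussed in the text.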
B. Model Specification
Let us assume that a good empirical representation of the distribution that governs the joint behavior of the data at hand is a Gumbel distribution (Gumbel, 1960; Castillo, 2005). Thus, in what follows we specify the Gumbel regression model, which implies a nonlinear regression curve with nonconstant partial effects (Kotz et al., 2000).
We start with the bivariate Gumbel distribution function, which is defined for positive values of the involved random variables:
where δ is the parameter that describes the probabilistic dependence between the two random variables y and x, and is restricted to take values between 0 and 1. The joint probability density function of the Gumbel model can be derived by differentiating equation (9)
Then, the conditional density function of y given x can be derived by dividing the joint density by the marginal density of x
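For reference, the Type I bivariate exponential distribution of Gumbel (1960), as it is commonly written with standard exponential marginals, has the distribution function, joint density, and conditional density shown here. That these expressions coincide with the numbered equations of this section is an assumption; it is consistent with the positivity restriction, the range of δ, and the limiting correlation of -0.4036 reported in the text.

```latex
\begin{align*}
F(x, y; \delta) &= 1 - e^{-x} - e^{-y} + e^{-(x + y + \delta x y)},
  \qquad x > 0,\; y > 0,\; 0 \le \delta \le 1, \\
f(x, y; \delta) &= \frac{\partial^2 F(x, y; \delta)}{\partial x\, \partial y}
  = \left[(1 + \delta x)(1 + \delta y) - \delta\right] e^{-(x + y + \delta x y)}, \\
f(y \mid x; \delta) &= \frac{f(x, y; \delta)}{f_x(x)}
  = \left[(1 + \delta x)(1 + \delta y) - \delta\right] e^{-(1 + \delta x) y},
  \qquad f_x(x) = e^{-x}.
\end{align*}
```

Setting δ = 0 makes the joint density factor into the product of the two exponential marginals, so δ = 0 corresponds to independence.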
The conditional expectation of y given x is given by:
Similarly, we could also get the conditional expectation of x given y,
It is worth mentioning that the distribution
for
Note that the analytical form of the marginal distribution in equation (17) includes the so-called error function:
Equation (13) suggests that the mean or average causal effect of the explanatory variable on the dependent variable is not linear. Moreover, the conditional variance in equation (16) is heteroskedastic. Besides, the negative marginal effect in equation (17) is heterogeneous and decreasing. Therefore, we can see that the model in equation (13) is completely different from the model in equation (6), since the Gumbel regression model does not imply a constant effect of the explanatory variable over the entire density of the dependent variable. The economic meaning of a negative nonlinear relationship, in this context, is that the mean or average causal effect of x on y is negative and decreasing. That is, the average value of y does not change at a constant rate as x changes, which means that we can have heterogeneous (changing) partial effects. Even more important is the fact that the suggested model implies that there might be a decreasing tradeoff between y and x, which is clearly associated with the nonlinear nature of the Gumbel model.
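These properties can be checked numerically. The sketch below assumes the commonly used form of Gumbel's Type I bivariate exponential model with standard exponential marginals, for which integrating the conditional density gives E(y|x) = (1 + δ + δx)/(1 + δx)², Var(y|x) = [(1 + δ + δx)² - 2δ²]/(1 + δx)⁴, and a partial effect of -δ(1 + 2δ + δx)/(1 + δx)³. These closed forms are derived under that assumption and may differ in notation from equations (13), (16), and (17); all function names are hypothetical.

```python
import math


def cond_density(y, x, delta):
    """Assumed conditional density f(y | x) of the Gumbel Type I model."""
    a = 1.0 + delta * x
    return (a * (1.0 + delta * y) - delta) * math.exp(-a * y)


def cond_mean(x, delta):
    """Closed-form conditional mean, nonlinear in x."""
    a = 1.0 + delta * x
    return (a + delta) / a ** 2


def cond_var(x, delta):
    """Closed-form conditional variance, which depends on x (heteroskedastic)."""
    a = 1.0 + delta * x
    return ((a + delta) ** 2 - 2.0 * delta ** 2) / a ** 4


def partial_effect(x, delta):
    """Derivative of the conditional mean with respect to x."""
    a = 1.0 + delta * x
    return -delta * (a + 2.0 * delta) / a ** 3


def simpson(func, lo, hi, n=4000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def numeric_moments(x, delta, upper=60.0):
    """Conditional mean and variance by numerical integration of the density."""
    m1 = simpson(lambda y: y * cond_density(y, x, delta), 0.0, upper)
    m2 = simpson(lambda y: y * y * cond_density(y, x, delta), 0.0, upper)
    return m1, m2 - m1 ** 2
```

For example, with δ = 0.5 the partial effect is -1.0 at x = 0 and -0.1875 at x = 2, so the effect is negative everywhere and shrinks in magnitude as x grows, while with δ = 0 the conditional mean is constant at 1.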
C. Estimation Method
In the previous section, we proposed a specification of a nonlinear and heteroskedastic regression model derived from a Gumbel distribution. Now, we need to estimate the value of the dependence parameter δ.
As discussed above, the Gumbel regression model takes the form:
Where e
From these equations, we can see that the conditional mean of the Gumbel model is not linear in its conditioning variable x or in the unknown parameter δ.
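Because the model is nonlinear in δ, estimation has to be done numerically. The sketch below illustrates one possible approach, conditional maximum likelihood by grid search; this technique is chosen here for illustration and is not confirmed as the estimator used in the paper. It assumes the commonly used Type I density f(y|x) = [(1 + δx)(1 + δy) - δ] exp(-(1 + δx)y), and the simulator relies on the fact that this density is a mixture of an exponential and a two-stage gamma distribution with rate 1 + δx. All names are hypothetical.

```python
import math
import random


def draw_sample(n, delta, seed=0):
    """Simulate (x, y) pairs: x ~ Exp(1), then y | x from the assumed density."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.expovariate(1.0)
        a = 1.0 + delta * x
        y = rng.expovariate(a)          # Exp(a) component
        if rng.random() < delta / a:    # with prob. delta/a use Gamma(2, rate a)
            y += rng.expovariate(a)
        data.append((x, y))
    return data


def cond_loglik(delta, data):
    """Conditional log-likelihood of delta given observed (x, y) pairs."""
    total = 0.0
    for x, y in data:
        a = 1.0 + delta * x
        total += math.log(a * (1.0 + delta * y) - delta) - a * y
    return total


def fit_delta(data, steps=100):
    """Grid search over delta in [0, 1]; returns the grid point with the highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda d: cond_loglik(d, data))
```

A call such as `fit_delta(draw_sample(3000, 0.6, seed=1))` returns an estimate on a grid of width 0.01; a finer grid or a one-dimensional optimizer could replace the search if more precision is needed.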
On the other hand, a moment estimator of δ (Hosking, 1985) can also be obtained as the solution of the following equation:
Thus, the parameter δ is closely related to the classical correlation coefficient, which is given by:
where E is the well-known exponential integral. So, the correlation parameter is given by:
When
When δ = 1, the association parameter ρ is equal to -0.4036 and reaches its lower limit. So, this model is only suitable for representing a joint density of two Gumbel distributed variables whose correlation parameter takes values in the interval [-0.4036, 0].
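The link between δ and ρ can be illustrated numerically. The sketch below assumes the expression commonly given for Gumbel's Type I bivariate exponential model, ρ(δ) = -1 + ∫₀^∞ e^(-y)/(1 + δy) dy, which is equivalent to an exponential-integral formula; whether it matches the displayed equations term by term is an assumption. Inverting ρ(δ) at a sample correlation by bisection gives a simple moment-type estimate of δ. The function names are hypothetical.

```python
import math


def rho(delta, upper=60.0, n=6000):
    """Correlation implied by delta, using Simpson's rule on [0, upper]."""
    def f(y):
        return math.exp(-y) / (1.0 + delta * y)

    h = upper / n
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return -1.0 + total * h / 3.0


def delta_from_correlation(r, tol=1e-8):
    """Solve rho(delta) = r for delta in [0, 1] by bisection."""
    if r > 1e-6 or r < rho(1.0) - 1e-6:
        raise ValueError("correlation outside the range attainable by the model")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho(mid) > r:   # rho decreases in delta, so a larger delta is needed
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating `rho(1.0)` reproduces the lower limit of about -0.4036 quoted in the text, and `rho(0.0)` is zero up to integration error, which corresponds to independence.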
D. Misspecification Tests
In order to ensure the statistical validity of our model assumptions in relation to the real data, we can define some potential misspecification tests for the Gumbel regression model. The set of tests we discuss will allow us to ensure that there are no departures from the underlying assumptions of the Gumbel model while working with real data (Spanos, 2006; Wang, 2005).
The potential misspecification tests that can be applied to the regression model can be based on the following F-type tests:
a) Additional nonlinearity in the conditional mean
To test for the presence of additional nonlinearities in the conditional mean we can test if
where ŷ is the vector of fitted values from the Gumbel model. Furthermore, we can also expect that
b) Trend in the conditional mean
To test for the presence of a linear trend in the conditional mean, we can test if
where ŷ is the vector of fitted values from the Gumbel model. Furthermore, we can also expect that
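As an illustration of how such F-type tests can be computed, the sketch below uses one common construction: an auxiliary regression of the model residuals on a constant and a single added regressor, either the squared fitted values (additional nonlinearity) or a time index (trend). This construction is an assumption made for illustration and is not necessarily the auxiliary regression used in the paper; the function names are hypothetical.

```python
def added_regressor_f(resid, z):
    """F statistic for H0: gamma = 0 in resid = c + gamma * z + error."""
    n = len(resid)
    r_bar = sum(resid) / n
    z_bar = sum(z) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_rz = sum((ri - r_bar) * (zi - z_bar) for ri, zi in zip(resid, z))
    rss_restricted = sum((ri - r_bar) ** 2 for ri in resid)    # constant only
    rss_unrestricted = rss_restricted - s_rz ** 2 / s_zz       # constant and z
    # One restriction, n - 2 residual degrees of freedom.
    return (rss_restricted - rss_unrestricted) / (rss_unrestricted / (n - 2))


def nonlinearity_f(resid, fitted):
    """Added regressor: squared fitted values."""
    return added_regressor_f(resid, [f * f for f in fitted])


def trend_f(resid):
    """Added regressor: linear time index 1, 2, ..., n."""
    return added_regressor_f(resid, list(range(1, len(resid) + 1)))
```

Under the null hypothesis and the usual regularity conditions, each statistic is compared with an F distribution with 1 and n - 2 degrees of freedom; large values point to a departure from the assumed conditional mean.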
4. Concluding Remarks
Here, we propose to use conditional probability densities as the basis to derive the nonlinear conditional means that give rise to reliable econometric models, rather than assuming the standard functional forms of the mean suggested in econometrics textbooks. We show a procedure to derive econometric models that capture genuine nonlinear relationships by using empirically suitable conditional probability distributions that give rise to different regression functions. We illustrate this approach by deriving a regression model that might be useful to analyze a nonlinear relationship. Specifically, we use a Gumbel conditional distribution as the basis to derive the nonlinear regression curve that can be suitable to analyze highly volatile economic data. We show that a nonlinear model like the Gumbel regression exhibits changing partial effects of the explanatory variables over the entire distribution of the explained variable, which is not the case for a normal-linear model.