Why path analysis

Path analysis was slow to catch on in the world of biology, but in the second half of the 20th century found an avid following among social scientists and economists. Social and life course epidemiologists subsequently adopted the method as an effective way to distinguish direct from indirect effects and to test the strength of hypothesized patterns of causal relationships.

Path analysis is based on a closed system of nested relationships among variables that are represented statistically by a series of structured linear regression equations.

As such, path analysis is bound by the same set of assumptions as linear regression, as well as some additional restrictions that describe the allowable pattern of relations among variables. Variables are either exogenous, meaning their variance is not dependent on any other variable in the model, or endogenous, meaning their variance is determined by other variables in the model.

Exogenous variables may or may not be correlated with other exogenous variables. The pattern of relationships among variables is described by a path diagram, a type of directed graph. Variables are linked by straight arrows that indicate the directions of the causal relationships between them. Straight arrows may only point in one direction, as it is assumed that a variable cannot be both a cause and an effect of another variable (i.e., reciprocal or feedback relationships are not allowed).

Curved, double-headed arrows indicate correlation between exogenous variables. In addition to the arrows between variables in the model, there are arrows pointing toward each endogenous variable from points outside the model, indicating variance contributed by error and by any unmeasured variables. In the figure, the equation describing the correlation between variables 1 and 3 is r13 = p31 + (r12)(p32).

A note on notation: the first number in the path (or standardized beta) coefficient subscript represents the dependent variable (the head of the arrow) and the second number represents the independent variable (the tail of the arrow) in a causal relationship.

The correlation transmitted along any particular pathway equals the product of the coefficients along the different segments of that pathway. Similarly, the equation describing the correlation between variables 2 and 3 is r23 = p32 + (r12)(p31). Because the number of observed correlations equals the number of coefficients to be estimated, this is called a just-identified model. Once the path and correlation coefficients have been filled in, the utility of path analysis becomes clear. Path analysis is always theory-driven; the same data can describe many different causal patterns, so it is essential to have an a priori idea of the causal relationships among the variables under consideration.
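Returning to the two correlation equations above: because they are linear in the two unknown paths, a just-identified model can be solved directly from the observed correlations. A minimal sketch in Python follows; the correlation values are hypothetical, chosen only to illustrate the algebra.

```python
import numpy as np

# Hypothetical observed correlations for the just-identified model above:
# correlated exogenous variables 1 and 2, each with a direct path to 3.
r12, r13, r23 = 0.30, 0.50, 0.40

# The decomposition equations
#   r13 = p31 + r12 * p32
#   r23 = p32 + r12 * p31
# are linear in the unknown paths p31 and p32, so they can be solved exactly.
A = np.array([[1.0, r12],
              [r12, 1.0]])
b = np.array([r13, r23])
p31, p32 = np.linalg.solve(A, b)

print(f"p31 = {p31:.3f}, p32 = {p32:.3f}")
```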

That being said, path analysis can be used to refine a causal hypothesis. If, for example, a path coefficient is very small and the standardized beta is not statistically significant, it may make sense to eliminate that pathway.

The trimmed model can then be tested against the observed data; failure to reject the null hypothesis indicates that the trimmed model still fits the data. Path coefficients come from a series of multiple regressions rather than from a single regression. Or, if you like, regression is the simplest form of path analysis, where we have 1 DV and k IVs, all of which are freely intercorrelated, so that no relations among the IVs are analyzed.
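As a concrete sketch of that series of regressions, the snippet below simulates standardized data for the mediated model taken up in the next paragraphs (variable 1 affecting 3 both directly and through variable 2) and recovers the paths from two ordinary least-squares fits. The generating coefficients and the sample size are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated data for a mediated model: 1 -> 2 -> 3 plus a direct path 1 -> 3.
# The generating values (0.6, 0.3, 0.5) are hypothetical.
x1 = rng.standard_normal(n)
x2 = 0.6 * x1 + np.sqrt(1 - 0.6**2) * rng.standard_normal(n)
x3 = 0.3 * x1 + 0.5 * x2 + 0.7 * rng.standard_normal(n)

# Standardize, so the regression weights below are standardized betas (paths).
z = lambda v: (v - v.mean()) / v.std()
z1, z2, z3 = z(x1), z(x2), z(x3)

# The "series of multiple regressions":
# regressing 2 on 1 gives p21; regressing 3 on 1 and 2 gives p31 and p32.
p21 = np.linalg.lstsq(z1[:, None], z2, rcond=None)[0][0]
p31, p32 = np.linalg.lstsq(np.column_stack([z1, z2]), z3, rcond=None)[0]

print(f"p21 ~ {p21:.2f}, p31 ~ {p31:.2f}, p32 ~ {p32:.2f}")
```

With a large simulated sample, the recovered betas land close to the generating values, which is the sense in which the path coefficients come from the regressions rather than from any single equation.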

(The figure shows path diagrams for three models: A, the correlated cause model; B, the mediated model; and C, the independent cause model.)

In the correlated cause model (A), part of the correlation between 1 and 3 is due to the direct effect of 1 on 3 through p31. Part of the correlation will be due to the correlation of 1 with 2, because 2 also affects 3; that is, (r12)(p32). However, we will leave that part unanalyzed, because 1 and 2 are exogenous and therefore the correlation between them is unanalyzed.

In the mediated model (B), only variable 1 is exogenous. We can now decompose all the correlations into direct and indirect effects. In this model, 1 affects 3 directly (p31) but also indirectly through 2 (via p21 and p32). The correlation between 1 and 3 can therefore be decomposed into two parts, direct and indirect effects: r13 = p31 + (p21)(p32).

Some people call the sum of direct and indirect effects the total effect. Now, in model B there will also be a correlation between 2 and 3 (r23). This correlation will reflect the direct effect of 2 on 3 (p32), but it will also reflect the influence of variable 1 on both. If a third variable causes the correlation between two variables, their relation is said to be spurious.

If the path from 2 to 3 were zero, the entire correlation between 2 and 3 would be spurious, because all of it would be due to variable 1. However, in the current example, only part of the correlation between 2 and 3 is spurious. The spurious part is r23 - p32, or (p31)(p21). In model C, the two IVs are independent. In such a case, each path coefficient is equal to the observed correlation. Because r12 is due to a single path that indicates a direct effect, r12 is composed solely of DE, a direct effect.
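A small numeric sketch of these tracing-rule decompositions for the mediated model (B), using hypothetical path values:

```python
# Hypothetical path coefficients for the mediated model B: 1 -> 2 -> 3 and 1 -> 3.
p21, p31, p32 = 0.6, 0.3, 0.5

r12 = p21                     # direct effect only
r13_direct = p31              # 1 -> 3
r13_indirect = p21 * p32      # 1 -> 2 -> 3
r13 = r13_direct + r13_indirect

r23_direct = p32              # 2 -> 3
r23_spurious = p21 * p31      # 2 <- 1 -> 3: common cause 1
r23 = r23_direct + r23_spurious

print(f"r13 = {r13:.2f} = direct {r13_direct:.2f} + indirect {r13_indirect:.2f}")
print(f"r23 = {r23:.2f} = direct {r23_direct:.2f} + spurious {r23_spurious:.2f}")
```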

What is the point of this decomposition? The point is to better understand the correlations that we observe. How much is due to direct effects, indirect effects, and third variables? It may help us to better understand theoretical processes, to gain leverage in the business of change, and so on. The paths from 1 and 2 to 3 are the betas from the regression of 3 on 1 and 2. Because the correlations are decomposed into the four kinds of effects (direct, indirect, spurious, and unanalyzed), we can build up correlations from path models.

For example, for the mediated model, notice that the path diagram implies a set of equations that allows us to estimate each of the paths. But also notice (this is a new concept) that the path diagram implies a set of equations that would let us estimate a correlation matrix in the absence of data, if we knew the path coefficients. In the case of the path diagram we just drew, the implied correlations are r12 = p21, r13 = p31 + (p21)(p32), and r23 = p32 + (p21)(p31). Our dependent variable is 3. Our theory says that 3 is strongly predicted by the IVs. Further, most of the effects of variable 1 are explained through the mediating effects of 2.

Our predicted correlations follow from plugging the hypothesized path coefficients into these equations. Notice that we can now collect data, compute a correlation matrix, and compare it to what we predicted based on our theory. This is, to some of us at least, enormously exciting, because we can make quantitative, point predictions and then compare them to actual data.
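A brief sketch of that comparison: the hypothesized paths below express the theory just stated (a weak direct path from 1 to 3, with most of 1's effect running through 2), the predicted correlations are built from the tracing rules, and the observed matrix stands in for one computed from real data. All of the numbers are hypothetical.

```python
import numpy as np

# Hypothesized paths for the mediated model: weak direct 1 -> 3, strong 1 -> 2 -> 3.
p21, p31, p32 = 0.6, 0.1, 0.5

# Model-implied (predicted) correlations.
r12 = p21
r13 = p31 + p21 * p32
r23 = p32 + p21 * p31
predicted = np.array([[1.0, r12, r13],
                      [r12, 1.0, r23],
                      [r13, r23, 1.0]])

# Hypothetical observed correlation matrix, standing in for one computed
# from collected data with np.corrcoef(data, rowvar=False).
observed = np.array([[1.00, 0.20, 0.45],
                     [0.20, 1.00, 0.10],
                     [0.45, 0.10, 1.00]])

print("predicted:\n", predicted.round(2))
print("observed - predicted:\n", (observed - predicted).round(2))
```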

This is analogous to cross-validation. In cross-validation, we predict values of Y given a previously estimated set of regression coefficients and then compare the predicted values to the actual values. In path analysis, we can generate values of correlations based on a theory and then compare them to actual values.

We could actually generate an R-square based on the predicted and actual values of r in the off-diagonal of the matrix. Suppose we collected data, computed the correlation matrix, and laid the actual correlations out next to the ones our theory predicted.

Suppose further that the correspondence is not very close. To compute r, the correlation between the off-diagonal entries, we list the predicted and the actual correlations as two columns. If we compute the correlation between these two columns, we find it to be negative. However, such an r is not the customary means of evaluating predicted correlations against observed correlations. The problem with such a method of evaluation is that it takes no account of differences in means between the predicted and actual correlations.
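A sketch of that computation with hypothetical stand-in values (not the values from the original table), alongside the root-mean-square residual between predicted and observed correlations, a discrepancy measure that does take level differences into account:

```python
import numpy as np

# Hypothetical off-diagonal correlations, in the order r12, r13, r23.
predicted = np.array([0.60, 0.40, 0.56])
observed  = np.array([0.20, 0.45, 0.10])

# The correlation between the two columns, as described above, and its square.
r = np.corrcoef(predicted, observed)[0, 1]

# The root-mean-square residual does reflect differences in level
# between predicted and observed correlations.
rmsr = np.sqrt(np.mean((observed - predicted) ** 2))

print(f"r between predicted and observed: {r:.2f} (R-square {r**2:.2f})")
print(f"root-mean-square residual: {rmsr:.3f}")
```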

While path analysis is useful for evaluating causal hypotheses, this method cannot determine the direction of causality. It clarifies correlation and indicates the strength of a causal hypothesis, but does not prove direction of causation.

In order to fully understand the direction of causality, researchers can consider conducting experimental studies in which participants are randomly assigned to a treatment and control group.
