Response Surface Analysis Using STATA
In my published work, I have conducted response surface analyses using SYSTAT. However, SYSTAT is less popular than STATA, and people who ask me questions about response surface methodology often use STATA for their research. Moreover, STATA has many useful features, and its syntax is relatively straightforward. This page provides guidelines for conducting response surface analyses using STATA, focusing on the following quadratic polynomial regression equation.
(1) $Z = b_0 + b_1 X + b_2 Y + b_3 X^2 + b_4 XY + b_5 Y^2 + e$
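Estimating Eq. 1 requires that the scale-centered measures and the three quadratic terms exist as variables in the data set. A minimal sketch of how they might be created follows, assuming hypothetical raw measures named Xraw and Yraw on a 1-to-7 scale whose midpoint is 4 (the names, the scale, and the midpoint are illustrative assumptions, not part of the example data):

```stata
* Xraw and Yraw are hypothetical raw measures on an assumed 1-7 scale
* Center each measure at the assumed scale midpoint of 4
generate X = Xraw - 4
generate Y = Yraw - 4

* Form the three quadratic terms from the centered measures
generate X2 = X^2
generate XY = X*Y
generate Y2 = Y^2
```

The variable names X, Y, X2, XY, and Y2 match those used in the syntax on this page, so the commands that follow can be run without renaming.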
Eq. 1 can be estimated using the regress command. The syntax is:
regress Z X Y X2 XY Y2
where Z is the dependent variable, X and Y are scale-centered component measures, and X2, XY, and Y2 are the three quadratic terms formed from X and Y. This syntax will produce coefficient estimates and their associated t-tests as well as an F-test for the overall model (i.e., the test of the $R^2$ from the regression equation).
Now, suppose we wanted to test the shape of the surface corresponding to the quadratic regression equation along the Y = X line. As explained elsewhere (Edwards, 2002; Edwards & Parry, 1993), the shape of the surface along this line can be found by substituting Y = X into Eq. 1 and simplifying:
(2) $Z = b_0 + b_1 X + b_2 X + b_3 X^2 + b_4 XX + b_5 X^2 + e$
$= b_0 + (b_1 + b_2) X + (b_3 + b_4 + b_5) X^2 + e$
The linear combinations of coefficients that precede $X$ and $X^2$ can be tested with this postestimation command:
test (X+Y=0) (X2+XY+Y2=0)
In this syntax, X, Y, X2, XY, and Y2 refer to the coefficients on the corresponding variables, i.e., b1, b2, b3, b4, and b5. The resulting test has two numerator degrees of freedom and evaluates whether the shape of the surface along the Y = X line has no slope or curvature, meaning the surface is flat along this line.
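The joint test does not indicate whether the slope, the curvature, or both differ from zero. One way to examine each separately, sketched here on the assumption that the regress command shown earlier has just been run with the same variable names, is to test each linear combination on its own:

```stata
* Slope of the surface along the Y = X line (b1 + b2)
test X + Y = 0

* Curvature of the surface along the Y = X line (b3 + b4 + b5)
test X2 + XY + Y2 = 0
```

Each of these tests has one numerator degree of freedom.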
A similar approach can be used to test the shape of the surface along the Y = -X line. The shape of the surface along this line can be derived by substituting Y = -X into Eq. 1 and simplifying:
(3) $Z = b_0 + b_1 X - b_2 X + b_3 X^2 - b_4 XX + b_5 X^2 + e$
$= b_0 + (b_1 - b_2) X + (b_3 - b_4 + b_5) X^2 + e$
The corresponding postestimation command is:
test (X-Y=0) (X2-XY+Y2=0)
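The test command reports a test statistic and its p-value but not the estimated slope and curvature themselves. If those estimates are wanted, Stata's lincom command can be run after regress. The sketch below assumes the same variable names used in the syntax on this page:

```stata
* Y = X line: slope (b1 + b2) and curvature (b3 + b4 + b5)
lincom X + Y
lincom X2 + XY + Y2

* Y = -X line: slope (b1 - b2) and curvature (b3 - b4 + b5)
lincom X - Y
lincom X2 - XY + Y2
```

For each combination, lincom reports the estimate along with its standard error, t statistic, and confidence interval.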
The procedures described above are appropriate only when testing linear combinations of coefficients. To test nonlinear combinations of coefficients, such as those involved in testing the locations of the stationary point and principal axes, the bootstrap should be used (Edwards, 2002), which can be implemented using the bootstrap command in STATA. To illustrate, consider the following syntax, which applies to the quadratic equation for travel shown in Table 11.3 of Edwards (2002):
set seed 54321
bootstrap, rep(10000) saving(bootcoef): reg Z X Y X2 XY Y2
The first line sets the seed used to draw the bootstrap samples at 54321. By setting a seed, the same bootstrap samples will be drawn each time you run the analysis, such that you can repeat your analysis at a later time. You can choose any number for the seed. In the second line, rep(10000) specifies that 10,000 bootstrap samples should be drawn, and saving(bootcoef) saves the bootstrap coefficients in a STATA data file named bootcoef.dta.
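To check what the bootstrap saved, the file can be opened and inspected. The sketch below assumes the file was written to the current working directory and that STATA named the saved coefficients with its default _b_ prefix (for example, _b_X and _b_cons). Running describe confirms the actual names:

```stata
* Load the saved bootstrap coefficients (this replaces the data in memory)
use bootcoef, clear

* List the saved variables; names such as _b_X are assumed, so verify them here
describe

* Summarize the bootstrap distribution of each coefficient
summarize
```

If bootcoef.dta already exists, rerunning the bootstrap will require saving(bootcoef, replace) so that the earlier file can be overwritten.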
To illustrate the application of the bootstrap, I applied the preceding syntax to the data corresponding to the example in Table 11.3 of Edwards (2002) in which X and Y are scale-centered measures of actual and desired travel and Z is anticipated job satisfaction. The syntax is as follows:
set seed 54321
bootstrap, rep(10000) saving(tvlboot): reg sat tvlca tvlcd tvlca2 tvlcad tvlcd2
In this syntax, sat is anticipated job satisfaction, tvlca and tvlcd are scale-centered measures of actual and desired travel, tvlca2 and tvlcd2 are the squares of actual and desired travel, and tvlcad is the product of actual and desired travel.
After these commands are executed, the file tvlboot.dta will contain 10,000 sets of coefficients for the quadratic equation. The coefficients will appear as rows of data in the file, which can be read as a data file by STATA. The data file that was generated when I executed these commands is here, and coefficients are saved as an Excel file here. These coefficients can be used to construct confidence intervals for nonlinear combinations of regression coefficients involved in response surface analysis. This procedure is demonstrated by the Excel file given here, which explains the structure of the file in the READ ME tab and uses the coefficients from tvlboot.dta in the STATA tab.
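The same kind of calculation can also be sketched within STATA. Setting the partial derivatives of Eq. 1 with respect to X and Y to zero gives the stationary point X0 = (b2b4 - 2b1b5) / (4b3b5 - b4^2) and Y0 = (b1b4 - 2b2b3) / (4b3b5 - b4^2). The example below computes these quantities for each bootstrap sample and takes the 2.5th and 97.5th percentiles. It is an illustration under stated assumptions, not a reproduction of the Excel file: the variable names assume STATA's default _b_ prefix (verify with describe), and the interval is a plain percentile interval, which may differ from the interval construction used in the Excel file and the sources it cites.

```stata
* Load the bootstrap coefficients saved by the travel example
use tvlboot, clear

* Assumed variable names: _b_tvlca (b1), _b_tvlcd (b2), _b_tvlca2 (b3),
* _b_tvlcad (b4), _b_tvlcd2 (b5); confirm with describe before running
generate denom = 4*_b_tvlca2*_b_tvlcd2 - _b_tvlcad^2

* Stationary point coordinates for each bootstrap sample
generate x0 = (_b_tvlcd*_b_tvlcad - 2*_b_tvlca*_b_tvlcd2) / denom
generate y0 = (_b_tvlca*_b_tvlcad - 2*_b_tvlcd*_b_tvlca2) / denom

* Plain percentile interval (2.5th and 97.5th percentiles) for X0
_pctile x0, percentiles(2.5 97.5)
display "X0 interval: " r(r1) " to " r(r2)

* Plain percentile interval for Y0
_pctile y0, percentiles(2.5 97.5)
display "Y0 interval: " r(r1) " to " r(r2)
```

Bootstrap samples in which the denominator is close to zero will produce extreme values of x0 and y0, so it is worth inspecting the distribution of denom before interpreting the intervals.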
You can adapt this syntax and the associated files for your own purposes. Before you proceed, I strongly recommend that you read the sources cited in the READ ME tab of the Excel file and carefully study the structure of the file as described in the READ ME tab. Taking these steps should help you become self-sufficient with this procedure.
Edwards, J. R. (2002). Alternatives to difference scores: Polynomial regression analysis and response surface methodology. In F. Drasgow & N. W. Schmitt (Eds.), Advances in measurement and data analysis (pp. 350-400). San Francisco: Jossey-Bass.
Edwards, J. R., & Parry, M. E. (1993). On the use of polynomial regression equations as an alternative to difference scores in organizational research. Academy of Management Journal, 36, 1577-1613.