Response Surface Analysis Using STATA

In my published work, I have conducted response surface analyses using SYSTAT.  However, SYSTAT is less popular than STATA, and people who ask me questions about response surface methodology often use STATA for their research.  Moreover, STATA has many useful features, and its syntax is relatively straightforward.  This page provides guidelines for conducting response surface analyses using STATA, focusing on the following quadratic polynomial regression equation:

(1)      Z = b0 + b1X + b2Y + b3X^2 + b4XY + b5Y^2 + e

Eq. 1 can be estimated using the regress command.  The syntax is:

regress  Z X Y X2 XY Y2

where Z is the dependent variable, X and Y are scale-centered component measures, and X2, XY, and Y2 are the three quadratic terms formed from X and Y (the square of X, the product of X and Y, and the square of Y).  This syntax will produce coefficient estimates and their associated t-tests as well as an F-test for the overall model (i.e., the test of the R^2 from the regression equation).  Now, suppose we wanted to test the shape of the surface corresponding to the quadratic regression equation along the Y = X line.  As explained elsewhere (Edwards, 2002; Edwards & Parry, 1993), the shape of the surface along this line can be found by substituting Y = X into Eq. 1 and simplifying:

(2)     Z = b0 + b1X + b2X + b3X^2 + b4XX + b5X^2 + e

= b0 + (b1 + b2)X + (b3 + b4 + b5)X^2 + e

The linear combinations of coefficients that precede X and X^2 can be tested with this postestimation command:

test (X+Y=0) (X2+XY+Y2=0)

In this syntax, X, Y, X2, XY, and Y2 refer to the coefficients on the corresponding variables, i.e., b1, b2, b3, b4, and b5.  The resulting test has two numerator degrees of freedom and evaluates whether the shape of the surface along the Y = X line has no slope or curvature, meaning the surface is flat along this line.
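The joint test above does not indicate whether the slope, the curvature, or both differ from zero.  As a minimal sketch, the following do-file builds the quadratic terms, estimates Eq. 1, runs the joint test, and then uses lincom to estimate the slope (b1 + b2) and curvature (b3 + b4 + b5) along the Y = X line separately.  The data are simulated purely for illustration; the sample size, seed, and generating values are my assumptions and are not taken from any published example.

```stata
* Simulated data for illustration only; replace with your own variables.
clear
set seed 12345
set obs 500

* Hypothetical scale-centered measures ranging from -3 to 3
generate X = round(6*runiform()) - 3
generate Y = round(6*runiform()) - 3

* Hypothetical outcome: slope of 0.6 and no curvature along Y = X
generate Z = 4 + 0.3*X + 0.3*Y - 0.2*(X - Y)^2 + rnormal()

* Form the three quadratic terms
generate X2 = X^2
generate XY = X*Y
generate Y2 = Y^2

* Estimate Eq. 1
regress Z X Y X2 XY Y2

* Joint test that the surface is flat along the Y = X line
test (X + Y = 0) (X2 + XY + Y2 = 0)

* Separate estimates of the slope and curvature along the Y = X line
lincom X + Y
lincom X2 + XY + Y2
```

Each lincom command reports the estimate of the linear combination along with its standard error, t-test, and confidence interval.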

A similar approach can be used to test the shape of the surface along the Y = -X line.  The shape of the surface along this line can be derived by substituting Y = -X into Eq. 1 and simplifying:

(3)     Z = b0 + b1X - b2X + b3X^2 - b4XX + b5X^2 + e

= b0 + (b1 - b2)X + (b3 - b4 + b5)X^2 + e

The corresponding postestimation command is:

test (X-Y=0) (X2-XY+Y2=0)
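To examine the slope (b1 - b2) and curvature (b3 - b4 + b5) along the Y = -X line one at a time, lincom can be run after the regression.  The sketch below is self-contained and again relies on simulated data; the seed, sample size, and generating values are assumptions made only so that the commands can be run as written.

```stata
* Simulated data for illustration only; replace with your own variables.
clear
set seed 2468
set obs 500
generate X = round(6*runiform()) - 3
generate Y = round(6*runiform()) - 3
generate Z = 4 + 0.3*X + 0.3*Y - 0.2*(X - Y)^2 + rnormal()
generate X2 = X^2
generate XY = X*Y
generate Y2 = Y^2

regress Z X Y X2 XY Y2

* Slope along the Y = -X line (b1 - b2)
lincom X - Y

* Curvature along the Y = -X line (b3 - b4 + b5)
lincom X2 - XY + Y2
```

With these generating values, the surface has no slope and downward curvature along the Y = -X line, so the first lincom estimate should be near zero and the second should be negative.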

The procedures described above are appropriate only when testing linear combinations of coefficients.  To test nonlinear combinations of coefficients, such as those involved in testing the locations of the stationary point and principal axes, the bootstrap should be used (Edwards, 2002), which can be implemented using the bootstrap command in STATA.  To illustrate, consider the following syntax, which applies to the quadratic equation for travel shown in Table 11.3 of Edwards (2002):

set seed 54321

bootstrap, rep(10000) saving(bootcoef): reg Z X Y X2 XY Y2

The first line sets the seed used to draw the bootstrap samples at 54321.  Setting a seed ensures that the same bootstrap samples are drawn each time you run the analysis, so you can reproduce your results at a later time.  You can choose any number for the seed.  In the second line, rep(10000) specifies that 10,000 bootstrap samples should be drawn, and saving(bootcoef) saves the bootstrap coefficients in a STATA data file named bootcoef.dta.

To illustrate the application of the bootstrap, I applied the preceding syntax to the data corresponding to the example in Table 11.3 of Edwards (2002) in which X and Y are scale-centered measures of actual and desired travel and Z is anticipated job satisfaction.  The syntax is as follows:

set seed 54321

bootstrap, rep(10000) saving(tvlboot): reg sat tvlca tvlcd tvlca2 tvlcad tvlcd2

In this syntax, sat is anticipated job satisfaction, tvlca and tvlcd are scale-centered measures of actual and desired travel, tvlca2 and tvlcd2 are the squares of actual and desired travel, and tvlcad is the product of actual and desired travel.

After these commands are executed, the file tvlboot.dta will contain 10,000 sets of coefficients for the quadratic equation.  The coefficients will appear as rows of data in the file, which can be read as a data file by STATA.  The data file that was generated when I executed these commands is here, and coefficients are saved as an Excel file here.  These coefficients can be used to construct confidence intervals for nonlinear combinations of regression coefficients involved in response surface analysis.  This procedure is demonstrated by the Excel file given here, which explains the structure of the file in the READ ME tab and uses the coefficients from tvlboot.dta in the STATA tab.
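For readers who prefer to stay within STATA, the same kind of calculation can be sketched directly on the saved coefficients.  The example below computes the stationary point (X0, Y0) and the first principal axis for each bootstrap sample, using the formulas given by Edwards and Parry (1993): X0 = (b2b4 - 2b1b5)/(4b3b5 - b4^2), Y0 = (b1b4 - 2b2b3)/(4b3b5 - b4^2), p11 = (b5 - b3 + sqrt((b3 - b5)^2 + b4^2))/b4, and p10 = Y0 - p11X0.  It assumes that the coefficients in tvlboot.dta are stored under names of the form _b_tvlca, which should be confirmed with describe, and it reports simple percentile limits, which may differ from the intervals computed in the Excel file.  The names den, x0, y0, p10, and p11 are mine.

```stata
* Assumes tvlboot.dta was created by the bootstrap command shown earlier.
* The names _b_tvlca, ..., _b_tvlcd2 are the names STATA is assumed to give
* the saved coefficients; check the output of describe before proceeding.
use tvlboot, clear
describe

* Stationary point (X0, Y0) for each bootstrap sample
generate double den = 4*_b_tvlca2*_b_tvlcd2 - _b_tvlcad^2
generate double x0  = (_b_tvlcd*_b_tvlcad - 2*_b_tvlca*_b_tvlcd2) / den
generate double y0  = (_b_tvlca*_b_tvlcad - 2*_b_tvlcd*_b_tvlca2) / den

* First principal axis, Y = p10 + p11*X
generate double p11 = (_b_tvlcd2 - _b_tvlca2 + sqrt((_b_tvlca2 - _b_tvlcd2)^2 + _b_tvlcad^2)) / _b_tvlcad
generate double p10 = y0 - p11*x0

* Percentile-based 95% confidence limits from the bootstrap distribution
foreach v of varlist x0 y0 p10 p11 {
    _pctile `v', percentiles(2.5 97.5)
    display "`v': " %9.4f r(r1) " to " %9.4f r(r2)
}
```

The second principal axis can be handled the same way by changing the sign in front of the square root when computing its slope.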

You can adapt this syntax and the associated files for your own purposes.  Before you proceed, I strongly recommend that you read the sources cited in the READ ME tab of the Excel file and carefully study the structure of the file as described in the READ ME tab.  Taking these steps should help you become self-sufficient with this procedure.

Edwards, J. R.  (2002).  Alternatives to difference scores: Polynomial regression analysis and response surface methodology. In F. Drasgow & N. W. Schmitt (Eds.), Advances in measurement and data analysis (pp. 350-400).  San Francisco: Jossey-Bass.

Edwards, J. R., & Parry, M. E.  (1993).  On the use of polynomial regression equations as an alternative to difference scores in organizational research. Academy of Management Journal, 36, 1577-1613.