You have seen the response-function contours through the stationary point, and you have seen the 3D response-function visualizations. You may recall that I have two competing models: one is fit only to the experimental data from the response-surface design, and the other is fit to all of that data plus the screening results. Is either model any good? Which one is better?
First, a quick review of the model:
q = b_{0}T^{2} + b_{1}t^{2} + b_{2}r^{2} + b_{3}Tt + b_{4}tr + b_{5}Tr + b_{6}T + b_{7}t + b_{8}r + b_{9}
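The fit itself is ordinary least squares on a full quadratic design matrix. A minimal sketch in Python (the original analysis appears to use R-style model summaries; the factor values and responses below are made up purely for illustration, not the real experiment):

```python
import numpy as np

def quadratic_design(T, t, r):
    """Columns of the full quadratic model: T^2, t^2, r^2, Tt, tr, Tr, T, t, r, 1."""
    return np.column_stack([T**2, t**2, r**2, T*t, t*r, T*r,
                            T, t, r, np.ones_like(T)])

# Hypothetical coded factor settings and responses -- not the real data.
rng = np.random.default_rng(0)
T = rng.uniform(-1, 1, 20)
t = rng.uniform(-1, 1, 20)
r = rng.uniform(-1, 1, 20)
q = 3.5 + 0.4*T + 0.7*T*t + 0.6*T**2 + rng.normal(0, 0.3, 20)

X = quadratic_design(T, t, r)
# b[0]..b[9] correspond to b_{0}..b_{9} in the equation above.
b, *_ = np.linalg.lstsq(X, q, rcond=None)
print(b.round(3))
```

The coefficient ordering in `b` matches the equation term by term, which makes it easy to read interaction and curvature effects straight off the vector.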
Fitting produces numerical estimates for all of those coefficients, b_{0}, b_{1}, and so on. Consider the significance of the individual coefficients in the following table: the more asterisks in the significance column, the more strongly that coefficient is supported by the data. Notice that the all-data model has a much worse overall fit, with a residual standard error of 0.749, compared to 0.358 for the RSO data alone.
                        All Data                        RSO Data Only
Residual Std Error      0.749                           0.358

              Estimate  Std. Error  Signif.   Estimate  Std. Error  Signif.
(Intercept)   3.344     0.3963      ***       3.6667    0.2064      ***
temp          0.355     0.2369                0.4375    0.1264      *
time          0.235     0.2369                0.0375    0.1264
cwrat         0.4       0.2369                0.275     0.1264      .
temp:time     0.5       0.2901                0.75      0.1788      **
temp:cwrat    0.02      0.2901                0.375     0.1788      .
time:cwrat    0.09      0.2901                0.075     0.1788
temp^2        0.401     0.3592                0.6833    0.1861      *
time^2        0.201     0.3592                0.4833    0.1861      *
cwrat^2       0.174     0.3592                0.1083    0.1861

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The two graphs below show the quantile-normal plots of the residuals from each model. The all-data model looks more systematically erroneous than the RSO-data-only model. That systematic error suggests that the choice of model is not particularly good. Furthermore, it suggests that the F-statistics should be regarded with hesitation, since F-statistics are computed under the assumption of normally distributed, independent errors.
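A quantile-normal check like the ones above can be reproduced with scipy. A sketch on hypothetical residuals (these are synthetic, not the residuals from either model); for well-behaved residuals the ordered values track the theoretical normal quantiles in a straight line, while the systematic bow described above shows up as curvature:

```python
import numpy as np
from scipy import stats

# Hypothetical residuals, drawn normal on purpose -- not the fitted models'.
rng = np.random.default_rng(1)
residuals = rng.normal(0, 0.358, 30)

# osm: theoretical normal quantiles; osr: ordered residuals.
(osm, osr), (slope, intercept, r_corr) = stats.probplot(residuals, dist="norm")

# r_corr near 1 means the points lie close to a straight line,
# i.e. little evidence against normality.
print(round(r_corr, 3))
```

Plotting `osr` against `osm` gives the same picture as the quantile-normal graphs; the correlation coefficient is a quick numerical summary of how straight that plot is.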
With hesitation, then, let's examine the F-statistic for each model. The RSO-data model produces an F-statistic of 6.43, well above the Box, Hunter, and Hunter criterion that the F-statistic exceed 4. The all-data model, on the other hand, manages a miserly F-statistic of 1.06.
                          All Data   RSO Data Only
Degrees of Freedom        9          5
Residual Standard Error   0.749      0.358
F-statistic               1.06       6.43
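The overall F-statistic a regression summary reports is the ratio of the model mean square to the residual mean square. A sketch of the arithmetic with made-up sums of squares and degrees of freedom (illustrative values only, not recovered from the fits above):

```python
from scipy import stats

def overall_f(ss_model, ss_resid, df_model, df_resid):
    """F = (SS_model / df_model) / (SS_resid / df_resid), with its p-value."""
    f = (ss_model / df_model) / (ss_resid / df_resid)
    p = stats.f.sf(f, df_model, df_resid)  # upper-tail probability
    return f, p

# Hypothetical sums of squares and degrees of freedom.
f, p = overall_f(ss_model=7.4, ss_resid=0.64, df_model=9, df_resid=5)
print(round(f, 2), round(p, 4))
```

This is also where the normality caveat bites: the p-value from `stats.f.sf` is only meaningful if the residuals are independent and roughly normal, which the quantile-normal plots call into question for the all-data model.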
In conclusion, I get a better fit to the indicated model using only some of the data. This is, frankly, an indictment of the model: anyone can carefully select data to produce a good fit, and the fact that I have done so may be considered a round condemnation of my objectivity. I may reconsider the model, dropping the least important terms while adding either a three-way interaction or a two-way interaction involving squared terms. Perhaps more creative mathematics will be required.