Design of Experiments with Minitab
The following is an excerpt from Dewar - Characterization and Evaluation of Aged 20Cr32Ni1Nb Stainless Steels, and references a stainless steel used in that thesis. The content of this article can also be applied to the general case, and should not be limited to the presented example.
In a factorial experiment the goal is to investigate all possible combinations of the factors and their levels. For compositional analysis of a 2032Nb alloy, seven elements, or factors, need to be considered, along with their compositional ranges listed in ASTM A351, grade CT15C, and displayed in Table 1. For this experiment three levels are chosen for each factor: a low, mid, and high composition. The low and high levels are defined as the compositional range limits in Table 1, and the mid level is the median value of each range. To determine which individual factors are the most significant (the main effects), and which interactions are the most significant, an analysis of variance (ANOVA) test must be carried out on the factorial design. “Factorial experiments are the only way to discover interactions between variables”, which is very important for alloy composition optimization, as interactions between components have been shown to be vital to the resulting microstructure (e.g. Nb∕(C + 6∕7N) = 7.7).
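As a minimal sketch of how the treatments of such a design can be enumerated, the Python snippet below builds every combination of low, mid, and high levels. The composition limits are hypothetical placeholders (Table 1 is not reproduced here), chosen only so that the all-mid treatment matches the 1 wt% Nb, 1 wt% Si, 0.1 wt% C example used later; three factors are shown instead of seven to keep the output small.

```python
from itertools import product

# Hypothetical composition limits in wt% (placeholders, NOT the ASTM A351 values).
ranges = {
    "Nb": (0.5, 1.5),
    "Si": (0.5, 1.5),
    "C": (0.05, 0.15),
}

def three_levels(low, high):
    """Return the low, mid, and high levels; mid is the midpoint of the range."""
    return (low, (low + high) / 2, high)

def full_factorial(ranges):
    """Build every treatment (one level per factor) of a 3^k design."""
    names = list(ranges)
    levels = [three_levels(*ranges[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*levels)]

design = full_factorial(ranges)
print(len(design))   # 27 treatments for three factors
print(design[13])    # the all-mid treatment at the center of the cube
```

With seven factors the same function would return 3**7 = 2187 treatments.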
The most common factorial design is the $2^k$ factorial design, where only two levels (low/high) are considered for each factor. However, this assumes that the response variable, or phase fraction, and the individual compositional elements have an approximately linear relationship, which cannot be assumed in the present study. To compensate for non-linearity, a third level was added at the median value of each factor's range. $3^k$ experiments can be quite large, and where the data is accumulated manually this option is usually too cumbersome. Several alternatives to a $3^k$ factorial experiment have been developed, such as adding a center point to a $2^k$ design, or blocking a $3^k$ design. Adding a center point to a $2^k$ design is used to test a curvature hypothesis, to see whether a main factor or interaction has significant curvature. More information about this can be found in Montgomery and Runger, 2006. Blocking is typically used to split up and group certain treatments when the experiments were not all performed under homogeneous conditions. Blocking could also be used to reduce a $3^k$ experiment to multiple $2^k$ experiments, where the low/high, low/med, and med/high levels are run independently.
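To give a feel for the sizes involved, the short Python sketch below counts the runs for the alternatives mentioned above with seven factors. The function and variable names are illustrative only, and a single replicate is assumed.

```python
def runs_full(levels, k, replicates=1):
    """Number of runs in a full factorial with `levels` levels per factor."""
    return replicates * levels ** k

k = 7  # seven compositional factors, as in the text

two_level = runs_full(2, k)      # 2^k design
three_level = runs_full(3, k)    # 3^k design
with_center = two_level + 1      # 2^k design plus a single center point
pairwise = 3 * two_level         # low/high, low/med, med/high run as separate 2^k designs

print(two_level, with_center, three_level, pairwise)
```

Note that the three pairwise designs share corner points with each other, so when they are cut out of an existing $3^k$ data set no additional runs are needed.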
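A $3^3$ factorial matrix, illustrated in Figure 1, visually represents a $3^3$ model where each treatment of the design can be calculated from the formula:

$$y_{ijk} = \mu + \tau_i + \beta_j + \psi_k + (\tau\beta)_{ij} + (\tau\psi)_{ik} + (\beta\psi)_{jk} + (\tau\beta\psi)_{ijk} + \epsilon_{ijk}$$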
where μ is the contribution from the overall mean effect, τ, β, and ψ are the effects of the three individual factors (together with their interactions), and ϵ is the random error component. The indices i, j, and k represent the levels of each factor (low, medium, and high). In Figure 1 each node of the cube lattice represents the response variable of the associated treatment. For example, the node in the center of the cube (Nb1Si1C1) represents the response, or the phase fraction of a specific phase, when the composition of the alloy is 1wt%Nb, 1wt%Si, and 0.1wt%C. To test whether any of the single elements (main effects), or any of the interactions between the components, significantly affect the response variable, a set of hypotheses must be tested with the resulting ANOVA model:
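- $H_0: \tau_i = 0 \;\forall i$ (no main effect of factor A)
$H_1$: at least one $\tau_i \neq 0$
- $H_0: \beta_j = 0 \;\forall j$ (no main effect of factor B)
$H_1$: at least one $\beta_j \neq 0$
- $H_0: \psi_k = 0 \;\forall k$ (no main effect of factor C)
$H_1$: at least one $\psi_k \neq 0$
- $H_0: (\tau\beta)_{ij} = 0 \;\forall i,j$ (no two-way interaction)
$H_1$: at least one $(\tau\beta)_{ij} \neq 0$
- $H_0: (\tau\beta\psi)_{ijk} = 0 \;\forall i,j,k$ (no three-way interaction)
$H_1$: at least one $(\tau\beta\psi)_{ijk} \neq 0$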
If any of the null hypotheses is rejected, then that factor or interaction has a significant effect on the response variable. To reject or accept a hypothesis, the total sum of squares must first be decomposed, $SS_T = SS_A + SS_B + SS_C + SS_{AB} + SS_{AC} + SS_{BC} + SS_E$, where $SS_E$ represents the error term. The sum of squares for any effect or interaction is calculated as,

$$SS = \frac{(\text{contrast})^2}{n\,2^k}$$

where the contrast is calculated from the effect,

$$\text{effect} = \frac{\text{contrast}}{n\,2^{k-1}}$$

where n is the number of replicates, and k is the number of factors. The contrast of the main effects can be solved for by:

$$\text{contrast}_Y = \sum_{i}\sum_{j}\sum_{k} c_i\, y_{ijk}$$
where Y is the effect of a single factor, $c_i$ is the coefficient of factor Y at level i (e.g. Nb1 = 0.5, Nb2 = 1.0), and $y_{ijk}$ is the treatment for levels i = 0,1,2, j = 0,1,2, and k = 0,1,2 (e.g. the phase fraction at NbiSijCk, Figure 1). Eq. 34 is visually represented in Figure 2a. Figures 2b and 2c represent the contrasts of two-factor and three-factor interactions, assuming a linear relationship between the levels and the response variable. Solving for the contrasts of a two-factor or three-factor interaction in a $3^k$ design is much more complicated, and is outside the scope of this work. However, more information on $3^k$ formulations can be found in Spliid, 2002.
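The relationship between contrast, effect, and sum of squares can be sketched numerically. The Python example below assumes the standard two-level ($2^k$) textbook forms, effect = contrast / (n·2^(k-1)) and SS = contrast² / (n·2^k), with coded levels of -1 (low) and +1 (high). The treatment totals are invented for illustration and are not taken from the thesis data.

```python
def contrast(totals, factor_index):
    """Contrast of one main effect: sum of (coded level) * (treatment total)."""
    return sum(levels[factor_index] * total for levels, total in totals.items())

def effect(contrast_value, n, k):
    """Average change in the response when the factor moves from low to high."""
    return contrast_value / (n * 2 ** (k - 1))

def sum_of_squares(contrast_value, n, k):
    """Sum of squares attributed to the effect in a 2^k design."""
    return contrast_value ** 2 / (n * 2 ** k)

# Hypothetical treatment totals for a 2^2 design with n = 2 replicates.
# Keys are the coded levels (A, B), with -1 = low and +1 = high.
n, k = 2, 2
totals = {(-1, -1): 10.0, (1, -1): 18.0, (-1, 1): 12.0, (1, 1): 24.0}

c_a = contrast(totals, 0)
c_b = contrast(totals, 1)
print(c_a, effect(c_a, n, k), sum_of_squares(c_a, n, k))
print(c_b, effect(c_b, n, k), sum_of_squares(c_b, n, k))
```

Here factor A has the larger contrast, so it also has the larger effect and sum of squares.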
Mean squares for each main effect and interaction are calculated as:

$$MS_x = \frac{SS_x}{2^a}$$

where x is the factor or interaction, and a = 1 for a main effect, a = 2 for a two-factor interaction, and a = 3 for a three-factor interaction. Finally, an F-distribution test can be conducted to reject or accept the null hypothesis, where:

$$f_0 = \frac{MS_x}{MS_E}$$

If $f_0 > f_{\alpha,\nu_1,\nu_2}$, where α is the level of significance (0.05 in this case, i.e. 95% confidence), and $\nu_1$ and $\nu_2$ are the degrees of freedom of x and $\epsilon_{ijk}$, the null hypothesis is rejected and factor x is significant.
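The decomposition and F statistic can be illustrated with a small balanced example. The Python sketch below uses two three-level factors with n = 2 replicates rather than the three-factor model above, purely to keep it short; the response values are made up, and the replicates are an assumption (deterministic simulation data would have none, leaving no degrees of freedom for the error term).

```python
from itertools import product

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def anova_two_factor(data, levels=3):
    """Balanced two-factor ANOVA. data[(i, j)] is a list of n replicate responses."""
    n = len(data[(0, 0)])
    idx = range(levels)
    grand = mean(y for vals in data.values() for y in vals)
    mean_a = [mean(y for j in idx for y in data[(i, j)]) for i in idx]
    mean_b = [mean(y for i in idx for y in data[(i, j)]) for j in idx]
    cell = {key: mean(vals) for key, vals in data.items()}

    ss = {
        "A": levels * n * sum((m - grand) ** 2 for m in mean_a),
        "B": levels * n * sum((m - grand) ** 2 for m in mean_b),
        "AB": n * sum((cell[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
                      for i, j in product(idx, idx)),
        "E": sum((y - cell[key]) ** 2 for key, vals in data.items() for y in vals),
    }
    ss_total = sum((y - grand) ** 2 for vals in data.values() for y in vals)
    df = {"A": levels - 1, "B": levels - 1, "AB": (levels - 1) ** 2,
          "E": levels * levels * (n - 1)}
    ms = {term: ss[term] / df[term] for term in ss}       # mean squares
    f0 = {term: ms[term] / ms["E"] for term in ("A", "B", "AB")}
    return ss, ss_total, df, f0

# Hypothetical responses: a strong A effect, weaker B effect, small interaction.
data = {}
for i, j in product(range(3), range(3)):
    centre = 10.0 + 2.0 * i + 0.5 * j + 0.3 * i * j
    data[(i, j)] = [centre - 0.1, centre + 0.1]

ss, ss_total, df, f0 = anova_two_factor(data)
print(ss, ss_total)
print(f0)
```

Each f0 value would then be compared with the tabulated critical value for its degrees of freedom, for example via `scipy.stats.f.ppf(0.95, df_effect, df_error)` if SciPy is available.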
After all of the significant factors and interactions have been identified in the factorial design, the data can be fit to a regression model

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon$$

where Y is the response variable, $x_k$ are the independent variables, or factors, $\beta_k$ are the unknown parameter coefficients, and ϵ is the error term. Determining the β coefficients allows us to describe the relationship between the independent variables and the response variable (dependent variable) through an approximate function. Since multiple phases compose the 2032Nb system, a set of regression functions will need to be determined to appropriately describe the system. After these functions have been approximated, a model for optimizing the system can be drawn up, either through statistical or linear programming methods.
Using a linear regression model to fit the factorial design data assumes that the response variable varies linearly with the factors. This assumption neglects any significant interactions that were uncovered in the factorial design, as they would be regarded as polynomial terms. However, a linear model may still be used by treating these polynomial terms as new variables. For example, if the significant terms in the regression model were Si, Nb, and Si × Nb, the regression function would be Y = β0 + β1Si + β2Nb + β3NbSi + ϵ. If we let x3 = NbSi, the equation changes from containing two independent variables to containing three, and can now be treated as a linear function.
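The substitution x3 = NbSi can be sketched as an ordinary least-squares fit with an extra column. The Python example below is a pure-Python illustration that solves the normal equations directly; the Si and Nb levels and the response are invented, and the helper names are hypothetical. In practice a statistics package would be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_least_squares(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical 3x3 grid of Si and Nb levels (wt%) and a made-up, noise-free response.
levels = [0.5, 1.0, 1.5]
points = [(si, nb) for si in levels for nb in levels]
y = [1.0 + 2.0 * si - 1.0 * nb + 0.5 * si * nb for si, nb in points]

# Design matrix columns: intercept, Si, Nb, and the new variable x3 = Nb * Si.
rows = [[1.0, si, nb, nb * si] for si, nb in points]
b0, b1, b2, b3 = fit_least_squares(rows, y)
print(b0, b1, b2, b3)
```

Because the interaction enters as its own column, the model stays linear in the coefficients even though it is not linear in Si and Nb.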
The linear coefficients can be solved for using the Least Squares Method described in the next section. The regression function should then be tested for significance, and for how well it fits the original data. The easiest way to assess the fit of the approximated function is to calculate the coefficient of multiple determination, or R² value. R² is calculated as

$$R^2 = 1 - \frac{SS_E}{SS_T}$$
The R² value is the fraction of the variability in the original data that the model accounts for. For example, if R² = 0.95, 95% of the variability of the response data is accounted for by the regression model. A shortcoming of the R² value is that it never decreases as more terms are added to the model, whether or not those terms are significant. This is compensated for in the adjusted R² value, but for the purposes of this study only the regular R² value will be reported.
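A minimal Python sketch of both quantities follows, assuming the usual definitions R² = 1 − SS_E/SS_T and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of regressors excluding the intercept. The observed and fitted values are invented for illustration.

```python
def r_squared(y, fitted):
    """Coefficient of multiple determination, 1 - SS_E / SS_T."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_error / ss_total

def adjusted_r_squared(r2, n_obs, n_terms):
    """Adjusted R^2; n_terms counts the regressors, excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)

# Hypothetical observed responses and fitted values.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, fitted)
print(r2)
print(adjusted_r_squared(r2, len(y), 1))
print(adjusted_r_squared(r2, len(y), 2))
```

For the same R², the adjusted value drops as the number of terms grows, which is the penalty the plain R² lacks.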
$3^k$ ANOVA and regression are performed on the compiled data using Minitab software. This provides statistical data, and confidence, as to which elements or interactions of elements play a significant role in the precipitation of a phase. Regression modeling is used to provide a mathematical basis for optimizing an alloy based on the equilibrium microstructure over a matrix of alloy compositions. The visual representation of the data described in section 9 provides a quick and easily comprehensible analysis of how single elements affect the stability of a phase. From these graphs, however, it is hard to distinguish with any degree of confidence how significant each of these factors is, or whether they are significant at all. ANOVA and regression can provide this kind of insight, and a certainty that analyzing graphs cannot give. After regression has been performed on each individual phase, linear programming or other optimization techniques (response surfaces, and the method of steepest ascent) can be employed to characterize the best alloy that meets certain microstructure conditions (e.g. minimum G-phase, maximum solubility of NbC). The following sections describe how to use Minitab to output the ANOVA and regression data.
If a $3^k$ ANOVA is computed with the imported data it will most likely fail, because some of the ThermoCalc simulations end in an error and do not complete. To test this, we can count the rows in the compiled Excel sheet and compare them with the total number of expected combinations, in this case $3^7 = 2187$, whereas only 2159 rows of data were compiled. Moreover, rank deficiency can occur during ANOVA with empty data, meaning the matrix calculations cannot be performed. Lastly, since not all phases are present in each chemistry, running an ANOVA on the $3^k$ experiment can also throw an error. An alternative is to perform three $2^k$ ANOVAs (low/high levels, low/medium levels, medium/high levels), determine the significant effects, and then perform regression with the significant effects. Setting a 95% confidence level (α = 0.05), if P-value < α the null hypothesis can be rejected (as described in section 10.1), and the current factor, or interaction of factors, can be said to play a significant role in the precipitation of the phase, i.
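The row-count check and the split into three two-level data sets can be sketched in a few lines of Python. The example below is illustrative only: it uses three factors with symbolic level labels instead of the seven compositional columns, and the "failed" runs are fabricated to mimic simulations that did not complete.

```python
from itertools import product

LEVELS = ("low", "mid", "high")

def missing_treatments(rows, factors):
    """Return the treatments of the full 3^k design that are absent from rows."""
    expected = set(product(LEVELS, repeat=len(factors)))
    found = {tuple(row[f] for f in factors) for row in rows}
    return sorted(expected - found)

def two_level_subset(rows, factors, pair):
    """Keep only the rows whose factors all sit at one of the two levels in pair."""
    return [row for row in rows if all(row[f] in pair for f in factors)]

factors = ["Nb", "Si", "C"]
rows = [dict(zip(factors, combo)) for combo in product(LEVELS, repeat=3)]
# Pretend the runs with high Nb and high C failed and were never written out.
rows = [r for r in rows if not (r["Nb"] == "high" and r["C"] == "high")]

missing = missing_treatments(rows, factors)
print(len(rows), len(missing))
for pair in (("low", "high"), ("low", "mid"), ("mid", "high")):
    print(pair, len(two_level_subset(rows, factors, pair)))
```

A complete two-level subset of three factors has 8 rows, so any pair that returns fewer rows contains missing treatments and may trigger the rank-deficiency problem described above.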
- Once Minitab is open, import the text data by clicking File → Open Worksheet, selecting the text, CSV, or Excel option in the Files of type selection box, and then selecting one of the files with the compiled data used to make the matrix plots.
- Select Stat → DOE → Factorial → Analyze Factorial Design.
- In the Define Custom Factorial Design window, select all the elements as factors. Leave the selection as a 2-level factorial.
- Click the Low/High button to change the low and high values that will be used in the $2^k$ ANOVA. These will need to be changed depending on which set of values you are dealing with (i.e. low/high, low/medium, medium/high). Once finished, select OK, and OK again.
- For the responses, choose the Y-axis columns of the data.
- Click the Graphs button and select the Histogram and Residuals Versus Fits options. In the storage options choose Fits, Residuals, Effects, Coefficients, and Factorial. Select OK.
- Once the ANOVA has finished, select the Show Sessions Folder button shown in Figure 3.
- For each section that needs to be saved, right-click the section in the project manager and choose Append.
- Select the Show ReportPad button shown in Figure 5.
- In the projects panel, right-click the ReportPad file and click the Save Report As... button. This file can now be opened in an editing program like Word or Excel for further processing and
- For the Low/Medium and Medium/High ANOVA sets, perform steps 1-10 again, this time changing the Low/High values to represent the current interval of values.
- Once all the reports have been exported, open them or copy the data into an Excel document. Select the first column of each data block and click the Text to Columns button in the Data tab. With all the data in the block highlighted, click the Filter button, and then order the P column from smallest to largest. Any terms with a P-value greater than 0.05 are not significant and can be discarded. Also look at the significance of the interaction groups (2-Way Interactions, 3-Way Interactions): if their P-value is greater than 0.05, the effects in that group are most likely not significant.
- For each report (Low/High, Low/Medium, and Medium/High) compare the top 10 common significant factors for each phase (or factors where P-Value = 0), and make note of them (e.g. highlight the cells). These common factors will be included in the regression for each phase.
- Significant interactions of elements determined by ANOVA will need to be added as columns in the data. To multiply columns together, first right-click where you want the column, choose Insert Columns, and name the column appropriately. Next, choose Calc → Calculator and select the newly created column as the column in which to store the data. Input the expression for that column; for example, for a Nb×C interaction type Nb*C. Repeat this for the remaining significant interactions.
- For the regression output select Stat → Regression → Regression from the main menu.
- Choose your response variable. Note that only one variable can be chosen at a time, so this procedure will need to be repeated for all of the response variables.
- In the Graphs section again choose the Histogram and the Normal plot options. In the storage options select Coefficients, Fits, and MSE. Then press OK to run the regression. This will perform linear regression fitting on the given data. If the R² value is insufficient, look through the factorial analysis to determine the next most significant effect(s) and input them as described in step 1.
- Repeat steps 1-4 for all the remaining response variables.
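The spreadsheet post-processing in the steps above (sorting by P-value, keeping terms common to all three reports, and adding interaction columns) can also be sketched in Python. This is an illustrative alternative under stated assumptions, not part of the Minitab workflow: the P-values, term names, and helper functions below are all hypothetical.

```python
ALPHA = 0.05

def significant_terms(report, alpha=ALPHA):
    """Return terms with p < alpha, ordered from smallest to largest p-value."""
    kept = [(term, p) for term, p in report.items() if p < alpha]
    return [term for term, p in sorted(kept, key=lambda item: item[1])]

def common_terms(reports, alpha=ALPHA):
    """Terms that are significant in every report."""
    sets = [set(significant_terms(r, alpha)) for r in reports.values()]
    return set.intersection(*sets)

def add_interaction(rows, a, b):
    """Add a product column named 'a*b' to each row, like Nb*C in the calculator step."""
    for row in rows:
        row[f"{a}*{b}"] = row[a] * row[b]
    return rows

# Hypothetical p-values for one phase from the three two-level ANOVA reports.
reports = {
    "low/high": {"Nb": 0.000, "C": 0.001, "Si": 0.200, "Nb*C": 0.010},
    "low/mid": {"Nb": 0.002, "C": 0.030, "Si": 0.040, "Nb*C": 0.020},
    "mid/high": {"Nb": 0.001, "C": 0.004, "Si": 0.600, "Nb*C": 0.300},
}

print(significant_terms(reports["low/high"]))
print(sorted(common_terms(reports)))
print(add_interaction([{"Nb": 1.0, "C": 0.1}], "Nb", "C"))
```

Only the terms returned by `common_terms` would be carried into the regression for that phase, with any interaction among them added as its own column first.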
Nishimoto, K., Saida, K., Inui, M., Takahashi, M. Changes in microstructure of HP-modified heat-resisting cast alloys with long term aging. Repair weld cracking of long term exposed HP-modified heat-resisting cast alloys (Report 2). Quarterly Journal of the Japan Welding Society 2000;18(3):449–458.
Nishimoto, K., Saida, K., Inui, M., Takahashi, M. Mechanism of hot cracking in HAZ of repair weldments. Repair weld cracking of long term exposed HP-modified heat-resisting cast alloys (Report 3). Quarterly Journal of the Japan Welding Society 2000;18(4):590–599.
Erneman, J., Schwind, M., Liu, P., Nilsson, J.O., Andrén, H.O., Ågren, J. Precipitation reactions caused by nitrogen uptake during service at high temperatures of a niobium stabilised austenitic stainless steel. Acta Mater 2004;52(14):4337–4350.
Sourmail, T. Literature review: precipitation in creep resistant austenitic stainless steels. Mater Sci Technol 2001;17(January):1–14. URL http://www.thomas-sourmail.org/papers_html/precipitation_review/precipitation_review.pdf.
Xiao, B., Xing, J.D., Feng, J., Zhou, C.T., Li, Y.F., Su, W., et al. A comparative study of Cr7C3, Fe3C and Fe2B in cast iron both from ab initio calculations and experiments. Journal of Physics D: Applied Physics 2009;42(11):115415.
Holman, K.L., Morosan, E., Casey, P.A., Li, L., Ong, N.P., Klimczuk, T., et al. Crystal structure and physical properties of Mg6Cu16Si7-type M6Ni16Si7, for M = Mg, Sc, Ti, Nb, and Ta. Mater Res Bull 2008;43(1):9–15.
Minitab. Technical support document: rank deficiency; 2010. http://www.minitab.com/support/documentation/Answers/RankDeficiency.pdf.
- constituent: Element, or species, that occupies a specific sublattice of a specific phase. A phase can also be considered as a constituent of the total system.
- end member: The final chemical formula of a stable or metastable phase whose sublattice(s) are occupied by single constituents. For example M23C6 is an end member of (Cr,Ni,Fe,Nb)23C6.
- factor: The independent variable of a factorial design.
- main effect: How much the change in an individual factor affects the change in the response variable of a factorial design.
- replicate: Independent repetition of a treatment in a factorial experiment.
- response variable: The dependent variable of a factorial experiment, or a regression model.
- ThermoCalc: A computational thermodynamics program that can calculate equilibrium phase diagrams for multicomponent systems, as well as Scheil simulations, and various thermodynamic properties (Cp, ΔHm, ΔGm, etc.).
- treatment: A specific level of a factor in a factorial design.
- $\beta_i$: the effect of the ith level of factor ‘B’
- $\epsilon$: random error component
- $\forall$: for all instances of ...
- $\in$: in a set ...
- $\mu$: overall mean effect
- $\mu_i$: chemical potential of component or end-member i
- $\psi_i$: the effect of the ith level of factor ‘C’
- $\tau_i$: the effect of the ith level of factor ‘A’
- $G$: total Gibbs energy; $G = \sum_\alpha m^\alpha \cdot G_m^\alpha$
- $G_i^\alpha$: partial Gibbs energy of component i in phase α; $G_i^\alpha = \left(\partial G^\alpha / \partial N_i\right)_{T,P,N_j}$
- $G_m$: integral molar Gibbs energy of a phase
- $I_i$: constituent array of order i
- $L_I$: interaction parameter of compound I
- $m^\alpha$: fraction of a phase
- $N_i$: moles of component i
- $R$: gas constant, 8.314 J mol⁻¹ K⁻¹
- $R^2$: coefficient of multiple determination
- $S_m$: molar entropy of a phase
- $T$: temperature (K)
- $x_i$: total mole fraction of component i; $x_i = \sum_\alpha m^\alpha \cdot x_i^\alpha$
- $x_i^\alpha$: mole fraction of component i in phase α