
14.2: Verifying the Method


    After developing and optimizing a method, the next step is to determine how well it works in the hands of a single analyst. Three steps make up this process: determining single-operator characteristics, completing a blind analysis of standards, and determining the method’s ruggedness. If another standard method is available, then we can analyze the same sample using both the standard method and the new method, and compare the results. If the result for any single test is unacceptable, then the method is not a suitable standard method.

    14.2.1 Single Operator Characteristics

    The first step in verifying a method is to determine the precision, accuracy, and detection limit when a single analyst uses the method to analyze a standard sample. The detection limit is determined by analyzing an appropriate reagent blank. Precision is determined by analyzing replicate portions of the sample, preferably more than ten. Accuracy is evaluated using a t-test comparing the experimental results to the known amount of analyte in the standard. Precision and accuracy are evaluated for several different concentrations of analyte, including at least one concentration near the detection limit, and for each different sample matrix. Including different concentrations of analyte helps identify constant sources of determinate error and establishes the range of concentrations for which the method is applicable.

    Note

    See Chapter 4.8 for a discussion of detection limits. Pay particular attention to the difference between a detection limit, a limit of identification, and a limit of quantitation.

    See Section 4.6.1 for a review of the t-test.

    See Chapter 4.2 for a review of constant determinate errors. Figure 4.3 illustrates how we can detect a constant determinate error by analyzing samples containing different amounts of analyte.
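
    The sketch below (Python, using NumPy and SciPy) illustrates how these single-operator characteristics might be evaluated; the replicate results, the standard's known concentration, and the 95% confidence level are hypothetical choices, not values from this chapter.

```python
# A sketch of evaluating single-operator accuracy and precision against a standard.
# The replicate results and the known concentration are hypothetical values.
import numpy as np
from scipy import stats

known = 5.00  # known concentration of analyte in the standard (assumed units)
replicates = np.array([4.94, 5.02, 4.98, 5.07, 4.91, 5.01, 4.96, 5.03, 4.99, 4.95])

x_bar = replicates.mean()
s = replicates.std(ddof=1)          # precision as the sample standard deviation
n = len(replicates)

# t-test of the experimental mean against the known concentration
t_exp = abs(x_bar - known) * np.sqrt(n) / s
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)   # two-tailed critical value, 95% confidence

print(f"mean = {x_bar:.3f}, s = {s:.3f}")
print(f"t_exp = {t_exp:.2f} vs t(0.05, {n - 1}) = {t_crit:.2f}")
print("no evidence of a determinate error" if t_exp < t_crit else "possible determinate error")
```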

    14.2.2 Blind Analysis of Standard Samples

    Single-operator characteristics are determined by analyzing a standard sample whose concentration of analyte is known to the analyst. The second step in verifying a method is a blind analysis of standard samples. Although the concentration of analyte in the standard is known to a supervisor, the information is withheld from the analyst. After analyzing the standard sample several times, the analyst reports the analyte's average concentration to the test's supervisor. To be accepted, the experimental mean must be within three standard deviations (as determined from the single-operator characteristics) of the analyte's known concentration.

    Note

    An even more stringent requirement is that the experimental mean be within two standard deviations of the analyte's known concentration.
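
    A minimal sketch of this acceptance test, assuming hypothetical values for the known concentration, the single-operator standard deviation, and the analyst's reported mean:

```python
# A sketch of the acceptance check for a blind analysis. All three values are
# hypothetical; s_single comes from the single-operator characteristics.
known = 5.00       # concentration known to the supervisor but not to the analyst
s_single = 0.04    # single-operator standard deviation
x_bar = 4.93       # mean concentration reported by the analyst

deviation = abs(x_bar - known) / s_single
print(f"experimental mean is {deviation:.1f} standard deviations from the known value")
print("accepted" if deviation <= 3 else "not accepted")   # use 2 for the stricter criterion
```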

    14.2.3 Ruggedness Testing

    An optimized method may produce excellent results in the laboratory that develops it, but poor results in other laboratories. This is not particularly surprising because a method typically is optimized by a single analyst using the same reagents, equipment, and instrumentation for each trial. Any variability introduced by different analysts, reagents, equipment, and instrumentation is not included in the single-operator characteristics. Other, less obvious factors also may affect an analysis, including environmental factors such as the temperature or the relative humidity in the laboratory; if the procedure does not require that they be controlled, they may contribute to variability. Finally, the analyst who optimizes the method usually takes particular care to perform the analysis in exactly the same way during every trial, which may minimize the run-to-run variability.

    An important step in developing a standard method is to determine which factors have a pronounced effect on the quality of the results. Once we identify these factors, we can write into the procedure instructions that specify how these factors must be controlled. A procedure that, when carefully followed, produces results of high quality in different laboratories is considered rugged. The method by which the critical factors are discovered is called ruggedness testing.6

    Note

    For example, if temperature is a concern, we might specify that it be held at 25 ± 2 °C.

    Ruggedness testing usually is performed by the laboratory that develops the standard method. After identifying potential factors, we evaluate their effects by performing the analysis at two levels for each factor. Normally one level is that specified in the procedure, and the other is a level likely to be encountered when other laboratories use the procedure.

    This approach to ruggedness testing can be time consuming. If there are seven potential factors, for example, a \(2^7\) factorial design can evaluate each factor's first-order effect. Unfortunately, this requires a total of 128 trials—too many trials to be a practical solution. A simpler experimental design is shown in Table 14.5, in which the two factor levels are identified by upper case and lower case letters. This design, which is similar to a \(2^3\) factorial design, is called a fractional factorial design. Because it includes only eight runs, the design provides information about only the seven first-order factor effects. It does not provide sufficient information to evaluate higher-order effects or interactions between factors, both of which are probably less important than the first-order effects.

    Note

    Why does this model estimate the seven first-order factor effects and not seven of the 21 possible first-order interactions? With eight experiments, we can only choose to calculate seven parameters (plus the average response). The calculation of \(E_\ce{D}\), for example, also gives the value for \(E_\ce{AB}\). You can convince yourself of this by replacing each upper case letter with a +1 and each lower case letter with a –1 and noting that A × B = D. We choose to report the first-order factor effects because they are likely to be more important than interactions between factors.
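
    As a quick check of the note above, the sketch below (Python with NumPy) codes the runs of Table 14.5 as +1 for an upper case level and -1 for a lower case level and confirms that the product of columns A and B reproduces column D:

```python
# Verifying the aliasing noted above: with upper case = +1 and lower case = -1,
# the column for factor D in Table 14.5 equals A x B, so the eight runs cannot
# distinguish E_D from E_AB.
import numpy as np

# +1/-1 coding of Table 14.5 (rows are runs 1-8, columns are factors A-G)
design = np.array([
    [+1, +1, +1, +1, +1, +1, +1],
    [+1, +1, -1, +1, -1, -1, -1],
    [+1, -1, +1, -1, +1, -1, -1],
    [+1, -1, -1, -1, -1, +1, +1],
    [-1, +1, +1, -1, -1, +1, -1],
    [-1, +1, -1, -1, +1, -1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, -1, -1, +1, +1, +1, -1],
])
A, B, C, D, E, F, G = design.T
print(np.array_equal(A * B, D))    # True: the A x B interaction is confounded with D
```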

    Table 14.5 Experimental Design for a Ruggedness Test Involving Seven Factors

    | run | A | B | C | D | E | F | G | response |
    |-----|---|---|---|---|---|---|---|----------|
    | 1   | A | B | C | D | E | F | G | R1 |
    | 2   | A | B | c | D | e | f | g | R2 |
    | 3   | A | b | C | d | E | f | g | R3 |
    | 4   | A | b | c | d | e | F | G | R4 |
    | 5   | a | B | C | d | e | F | g | R5 |
    | 6   | a | B | c | d | E | f | G | R6 |
    | 7   | a | b | C | D | e | f | G | R7 |
    | 8   | a | b | c | D | E | F | g | R8 |

    The experimental design in Table 14.5 is balanced in that each of a factor's two levels is paired an equal number of times with the upper case and lower case levels for every other factor. To determine the effect, \(E_\ce{f}\), of changing a factor's level, we subtract the average response when the factor is at its lower case level from the average response when it is at its upper case level.

    \[E_\ce{f}=\dfrac{(\sum R_i)_{\textrm{upper case}}}{4}-\dfrac{(\sum R_i)_{\textrm{lower case}}}{4}\tag{14.16}\]

    Because the design is balanced, the levels for the remaining factors appear an equal number of times in both summation terms, canceling their effect on \(E_\ce{f}\). For example, to determine the effect of factor A, \(E_\ce{A}\), we subtract the average response for runs 5–8 from the average response for runs 1–4. Factor B does not affect \(E_\ce{A}\) because its upper case levels in runs 1 and 2 are canceled by the upper case levels in runs 5 and 6, and its lower case levels in runs 3 and 4 are canceled by the lower case levels in runs 7 and 8. After calculating each factor's effect we rank them from largest to smallest without regard to sign, identifying those factors whose effects are substantially larger than the others.

    Note

    To see that this design is balanced, look closely at the last four runs. Factor A is present at its level a for all four of these runs. For each of the remaining factors, two levels are upper case and two levels are lower case. Runs 5–8 provide information about the effect of a on the response, but do not provide information about the effect of any other factor. Runs 1, 2, 5, and 6 provide information about the effect of B, but not of the remaining factors. Try a few other examples to convince yourself that this relationship is general.

    We also can use this experimental design to estimate the method’s expected standard deviation due to the effects of small changes in uncontrolled or poorly controlled factors.7

    \[s=\sqrt {\frac{2}{7}\sum E_\ce{f}^2}\tag{14.17}\]

    If this standard deviation is unacceptably large, then the procedure is modified to bring under greater control those factors having the greatest effect on the response.
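
    A minimal sketch of equations 14.16 and 14.17 in Python is shown below; the function names `factor_effects` and `expected_std` are illustrative, and the design is assumed to be supplied as an 8 × 7 array of +1 (upper case) and -1 (lower case) values, as in the coding sketch that follows the earlier note.

```python
# Helper functions implementing equations 14.16 and 14.17. `design` is an 8 x 7
# array of +1 (upper case) / -1 (lower case) values and `responses` holds the
# corresponding eight responses.
import numpy as np

def factor_effects(design, responses):
    """Return E_f for each factor: mean response at +1 minus mean response at -1."""
    design = np.asarray(design)
    responses = np.asarray(responses, dtype=float)
    return np.array([responses[col == +1].mean() - responses[col == -1].mean()
                     for col in design.T])

def expected_std(effects):
    """Equation 14.17: expected standard deviation from the seven factor effects."""
    effects = np.asarray(effects, dtype=float)
    return np.sqrt(2 / 7 * np.sum(effects ** 2))
```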

    Example 14.5

    The concentration of trace metals in sediment samples collected from rivers and lakes can be determined by extracting with acid and analyzing the extract by atomic absorption spectrophotometry. One procedure calls for an overnight extraction using dilute HCl or HNO3. The samples are placed in plastic bottles with 25 mL of acid and shaken at a moderate speed at ambient temperature. To determine the method's ruggedness, the effect of the following factors was studied using the experimental design in Table 14.5.

    | factor                 | upper case level | lower case level |
    |------------------------|------------------|------------------|
    | A: extraction time     | A = 24 h         | a = 12 h         |
    | B: shaking speed       | B = medium       | b = high         |
    | C: acid type           | C = HCl          | c = HNO3         |
    | D: acid concentration  | D = 0.1 M        | d = 0.05 M       |
    | E: volume of acid      | E = 25 mL        | e = 35 mL        |
    | F: type of container   | F = plastic      | f = glass        |
    | G: temperature         | G = ambient      | g = 25 °C        |

    Eight replicates of a standard sample containing a known amount of analyte were carried through the procedure. The analyte's percent recovery in each of the eight samples is shown here.

    R1 = 98.9   R2 = 99.0   R3 = 97.5   R4 = 97.7
    R5 = 97.4   R6 = 97.3   R7 = 98.6   R8 = 98.6

    Determine which factors appear to have a significant effect on the response and estimate the method’s expected standard deviation.

    Solution

    To calculate the effect of changing each factor's level we use equation 14.16 and substitute in appropriate values. For example, \(E_\ce{A}\) is

    \[E_\ce{A} = \dfrac{98.9 + 99.0 + 97.5 + 97.7}{4} - \dfrac{97.4 + 97.3 + 98.6 + 98.6}{4} = 0.30\]

    and \(E_\ce{G}\) is

    \[E_\ce{G} = \dfrac{98.9 + 97.7 + 97.3 + 98.6}{4} - \dfrac{99.0 + 97.5 + 97.4 + 98.6}{4} = 0.00\]

    Completing the remaining calculations and ordering the factors by the absolute values of their effects

    | factor   | \(E_\ce{f}\) |
    |----------|--------------|
    | Factor D | 1.30         |
    | Factor A | 0.30         |
    | Factor E | -0.10        |
    | Factor B | 0.05         |
    | Factor C | -0.05        |
    | Factor F | 0.05         |
    | Factor G | 0.00         |

    shows us that the concentration of acid (Factor D) has a substantial effect on the response, with a concentration of 0.05 M providing a much lower percent recovery. The extraction time (Factor A) also appears significant, but its effect is not as important as the acid's concentration. All other factors appear insignificant. The method's estimated standard deviation, from equation 14.17, is

    \[s = \sqrt{\dfrac{2}{7}\left\{(1.30)^2 + (0.30)^2 + (-0.10)^2 + (0.05)^2 + (-0.05)^2 + (0.05)^2 + (0.00)^2\right\}} = 0.72\]

    which, for an average recovery of 98.1%, gives a relative standard deviation of approximately 0.7%. If we control the acid's concentration so that its effect approaches that for factors B, C, and F, then the estimated standard deviation decreases to 0.18, which corresponds to a relative standard deviation of approximately 0.2%.
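
    For reference, the following self-contained sketch reproduces these results from the eight recoveries and the +1/-1 coding of Table 14.5.

```python
# Reproducing Example 14.5: the design matrix encodes Table 14.5 with +1 for an
# upper case level and -1 for a lower case level; R holds the eight recoveries.
import numpy as np

design = np.array([
    [+1, +1, +1, +1, +1, +1, +1],   # run 1: A B C D E F G
    [+1, +1, -1, +1, -1, -1, -1],   # run 2: A B c D e f g
    [+1, -1, +1, -1, +1, -1, -1],   # run 3: A b C d E f g
    [+1, -1, -1, -1, -1, +1, +1],   # run 4: A b c d e F G
    [-1, +1, +1, -1, -1, +1, -1],   # run 5: a B C d e F g
    [-1, +1, -1, -1, +1, -1, +1],   # run 6: a B c d E f G
    [-1, -1, +1, +1, -1, -1, +1],   # run 7: a b C D e f G
    [-1, -1, -1, +1, +1, +1, -1],   # run 8: a b c D E F g
])
R = np.array([98.9, 99.0, 97.5, 97.7, 97.4, 97.3, 98.6, 98.6])

# Equation 14.16 applied to each column, then equation 14.17 for s
effects = np.array([R[col == +1].mean() - R[col == -1].mean() for col in design.T])
for name, E in zip("ABCDEFG", effects):
    print(f"E_{name} = {E:+.2f}")                          # E_D = +1.30, E_A = +0.30, ...
print(f"s = {np.sqrt(2 / 7 * np.sum(effects ** 2)):.2f}")  # prints s = 0.72
```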

    14.2.4 Equivalency Testing

    If an approved standard method is available, then the new method should be evaluated by comparing results to those obtained with the standard method. Normally this comparison is made at a minimum of three concentrations of analyte to evaluate the new method over a wide dynamic range. Alternatively, we can plot the results obtained with the new method against those obtained with the approved standard method. A slope of 1.00 and a y-intercept of 0.0 provide evidence that the two methods are equivalent.
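
    A minimal sketch of this comparison, using hypothetical paired results and `scipy.stats.linregress`; in practice we judge the slope and y-intercept against their confidence intervals rather than requiring exact values of 1.00 and 0.0.

```python
# A sketch of an equivalency comparison by linear regression. The paired results
# are hypothetical; linregress returns the slope, intercept, and their standard errors.
import numpy as np
from scipy import stats

standard_method = np.array([1.02, 4.98, 10.05, 24.9, 50.1])   # approved standard method
new_method      = np.array([1.05, 5.03,  9.98, 25.2, 49.8])   # method being verified

fit = stats.linregress(standard_method, new_method)
print(f"slope     = {fit.slope:.3f} +/- {fit.stderr:.3f}")
print(f"intercept = {fit.intercept:.3f} +/- {fit.intercept_stderr:.3f}")
# Equivalency is supported when the confidence intervals include 1.00 and 0.0.
```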


    This page titled 14.2: Verifying the Method is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by David Harvey.