The variation that exists among individuals within a species is termed interspecific variation.

Journal Article

Anthony R. Ives,

1

Department of Zoology, University of Wisconsin-Madison

Madison, Wisconsin 53706, USA

E-mail:

Search for other works by this author on:

Peter E. Midford,

2

Department of Zoology, Southern Illinois University Carbondale

Carbondale, Illinois 62901, USA

Search for other works by this author on:

Theodore Garland, Jr.

3

Department of Biology, University of California

Riverside, California 92521, USA

E-mail:

Search for other works by this author on:

Received:

27 October 2005

Revision received:

26 February 2006

Accepted:

16 October 2006

  • PDF
  • Split View
    • Article contents
    • Figures & tables
    • Video
    • Audio
    • Supplementary Data
  • Cite

    Cite

    Anthony R. Ives, Peter E. Midford, Theodore Garland, Jr., Within-Species Variation and Measurement Error in Phylogenetic Comparative Methods, Systematic Biology, Volume 56, Issue 2, April 2007, Pages 252–270, https://doi.org/10.1080/10635150701313830

    Close

    • Email
    • Twitter
    • Facebook
    • More

Close

Navbar Search Filter Microsite Search Term Search

Abstract

Most phylogenetically based statistical methods for the analysis of quantitative or continuously varying phenotypic traits assume that variation within species is absent or at least negligible, which is unrealistic for many traits. Within-species variation has several components. Differences among populations of the same species may represent either phylogenetic divergence or direct effects of environmental factors that differ among populations (phenotypic plasticity). Within-population variation also contributes to within-species variation and includes sampling variation, instrument-related error, low repeatability caused by fluctuations in behavioral or physiological state, variation related to age, sex, season, or time of day, and individual variation within such categories. Here we develop techniques for analyzing phylogenetically correlated data to include within-species variation, or “measurement error” as it is often termed in the statistical literature. We derive methods for (i) univariate analyses, including measurement of “phylogenetic signal,” (ii) correlation and principal components analysis for multiple traits, (iii) multiple regression, and (iv) inference of “functional relations,” such as reduced major axis (RMA) regression. The methods are capable of incorporating measurement error that differs for each data point (mean value for a species or population), but they can be modified for special cases in which less is known about measurement error (e.g., when one is willing to assume something about the ratio of measurement error in two traits). We show that failure to incorporate measurement error can lead to both biased and imprecise (unduly uncertain) parameter estimates. Even previous methods that are thought to account for measurement error, such as conventional RMA regression, can be improved by explicitly incorporating measurement error and phylogenetic correlation. We illustrate these methods with examples and simulations and provide Matlab programs.

Most existing phylogenetically based statistical methods, as commonly applied, assume that within-species variation is absent or negligible (see reviews by Martins and Hansen, 1996; Rohlf, 2001, 2006; Garland et al., 2005). There are two practical reasons for this. First, many published comparative data sets do not include anything like estimates of standard errors associated with the mean values for species (or populations). Second, although standard statistical methods are available for incorporating measurement error and other sources of variation (Judge et al., 1985; Fuller, 1987), they are not commonly applied (Harmon and Losos, 2005), and they have rarely been considered in the context of phylogenetic statistics in which trait values are correlated among related species (but see, for example, Harvey and Pagel, 1991: chapter 6; Martins and Lamont, 1998; Housworth et al., 2004). A related issue is how to incorporate estimates of error in the phylogenetic topology and branch lengths used for analyses (Purvis and Garland, 1993; Garland and Díaz-Uriarte, 1999; Housworth and Martins, 2001; Huelsenbeck and Rannala, 2003). Here, however, we focus on variation in trait values rather than uncertainties in phylogenies, and throughout we assume the phylogenies are known without error.

Our goal is to provide methods for incorporating within-species variation into phylogenetically based statistical methods for continuous-valued traits. Outside of biological comparative studies, the statistical literature typically refers to the problem that we address as one of “measurement error” (Fuller, 1987). Measurement error refers to any type of variation between an observed value and the “true” value of interest, such as the mean value of a trait for a species or for a given population within a species. Thus, estimates of means for whole species will be affected by differences among populations, by how many populations are sampled to compute a composite mean, and by how many and what kind of individuals are sampled from each of those populations (Pagel and Harvey, 1988b; Harvey and Pagel, 1991). Measurement error also occurs within populations, with sources including sampling variation, instrument-related error, low repeatability caused by fluctuations in behavioral or physiological state, variation related to age, sex, season, or time of day, and true individual variation within such categories.

We argue that estimates of standard errors associated with mean values for species (or populations for phylogenetic comparisons of intraspecific trait variation, e.g., Ashton, 2004), as are now commonly reported in the empirical comparative method literature, provide a convenient, useful, and statistically justified way to capture the numerous sources of measurement error. Furthermore, accounting for measurement error in this way can improve parameter estimates and tests of statistical significance for problems involving phylogenetically correlated data. Historically, many comparative studies relied largely on previously published sources for their data. As such, they rarely reported more than a mean value for each taxon under consideration. More recently, however, comparative studies are often conducted de novo, such that new data are reported and standard errors (or standard deviations and sample sizes) are becoming more commonly available. These standard errors incorporate at least part of the total measurement error. Although estimating the total measurement error (e.g., the variation among all populations of a species) is unrealistic, incorporating the measurement error associated with the observations that are actually made to determine a species mean may provide substantial improvement to statistical methods.

In addition to addressing the case when standard errors are available for each “tip” associated with a phylogenetic tree, our techniques can also be used when less information is available. For example, in allometric studies that aim to obtain functional relations among traits, “general structural relation” models are often employed, with reduced major axis (RMA) regression a special case that is most frequently used (Rayner, 1985). General structural relation models use very little information about measurement error; for instance, RMA regression assumes simply that the ratio of within-species variance for traits is equal to the ratio of total variance of the trait values. We provide statistical models for general functional relations that incorporate phylogenetic correlation and measurement error in the most general form, with phylogenetic counterparts to general structural relation models and RMA regression as special cases.

The source of measurement error, as we have broadly defined it, will affect its statistical properties. For instance, the measurement error for one trait might be uncorrelated to the measurement error for another trait if measurement error is caused by instrument-related error and each trait is measured with a different instrument. In contrast, if measurement error is caused by among-population variation within species, then measurement errors for different traits could be correlated; for example, a functional relation between body mass and leg length observed among species might also occur among populations within each species, causing within-species variation (measurement error) of the two traits to be correlated. Although it is rare for researchers to report correlations in measurement errors among traits, we nonetheless assume that this correlation can be nonzero so that the techniques we develop can be applied under the most general circumstances.

In the literature on phylogenetically based statistical methods (“comparative methods”), several studies have explicitly considered measurement error. Lynch (1991) used a mixed models approach to phylogenetic analyses that explicitly separates components of variation due to heritable and nonheritable sources; nonheritable sources of variation include measurement error. This approach forms the basis of our work and also other previous analyses of measurement error. Christman et al. (1997), Felsenstein (2004), and Housworth et al. (2004) incorporated measurement error to estimate correlations between traits by treating individual organisms as the units of study, grafting the data from individuals onto a phylogenetic tree, with each species represented by a hard polytomy of individuals. The length of the tip branchlets (i.e., the within-species variance) is estimated simultaneously with other parameters of the overall statistical model. Our general approach is closely related, although we separate the estimation of the measurement error (i.e., the standard errors of the species values) from the estimation of parameters in the model. Although our approach does not account for the uncertainty in the estimates of the standard errors of the within-species variance, the estimates of the standard errors are unbiased, and the availability of data on species (or population) means plus standard errors is generally much greater than the availability of raw data on the measurements of all individuals (e.g., whenever data come partly from published sources). Furthermore, our approach is both easier to apply and more flexible, allowing researchers to provide known information about measurement error associated with each tip value. Thus, even if raw data on the measurement of individuals are available, it will be easier to summarize this information as standard errors and use the procedures we derive. Finally, our methods can be modified for the case when even less is known about measurement error—for example, when the standard errors averaged among species are known even though the species-specific standard errors are not.

We derive a suite of methods incorporating measurement error into frequently used statistical tests. First, we incorporate measurement error into univariate models that aim to estimate ancestral traits (e.g., Bonine et al., 2005) and quantify the magnitude of phylogenetic signal; that is, the amount of variation among species that can be attributed to phylogenetic relatedness (Blomberg et al., 2003). Second, we develop methods for calculating correlation coefficients between traits while accounting for both phylogenetic relatedness and measurement error. An extension of the correlation analysis leads to a phylogenetic version of principal components analysis (PCA) that summarizes the correlations among multiple traits. Third, we incorporate measurement error into phylogenetic regression, when there is a single dependent variable and one or more independent variables. Fourth, we derive measurement error methods for functional relation models, in which the mathematical relationship between variables is calculated without assuming that one (dependent) variable is driven by other variables, as is the case in regression. Functional relation models produce as special cases phylogenetic versions of RMA regression and other types of general structural models that include measurement error.

Each of these four problems can be analyzed using different statistical estimation approaches. Throughout this article, we consider three approaches: estimated generalized least squares (EGLS; Judge et al., 1985), maximum likelihood (ML), and restricted maximum likelihood (REML); these are described in more detail in  Appendix 1. These three estimation approaches have different advantages and disadvantages. Rather than perform exhaustive comparisons among estimation methods, instead we illustrate the statistical characteristics of each method by applying them to real data and performing selective simulation studies. Our philosophy is that, when in doubt, it is best to use multiple estimation methods, and if they give different results, perform post hoc diagnostics to select the best (e.g., least biased and most precise). All data analyses and simulations were performed using programs written in Matlab that are available from TG upon request.

Analyses

Univariate Analyses and Phylogenetic Signal

The problem of finding the best estimator of the expectation of a random variable when there is phylogenetic correlation and measurement error is given by the statistical model

The variation that exists among individuals within a species is termed interspecific variation.

where X* is a N × 1 dimensional vector containing the true values of a trait in a sample of N species (tips), a is a scalar giving the expected value of the trait, ε is a N × 1 vector of zero-mean error terms depicting the evolutionary variance of the trait among species, X is a N × 1 vector containing the observed values of the trait, and η is the N × 1 vector of errors associated with measurement. Note that for notational convenience we have written X* = a + ε as the sum of a scalar and a vector to represent X* = a1 + ε where 1 is the N × 1 vector of ones.

Because closely related species will likely have similar values of trait x, values of ε will be correlated among species. Thus, we assume the covariance matrix for ε is given by E{εε'} = σ2C, where σ2 scales the overall phylogenetically inherited variance (sometimes referred to as the rate of evolution; Garland et al., 1999; Garland and Ives, 2000), and C gives the correlation structure created by phylogenetic relatedness. The most common assumption in phylogenetic analyses is that evolution proceeds like a “Brownian motion” process; through time, the value of a trait changes in small increments in random directions, like a random walk in continuous time (Felsenstein, 1985). Under this assumption, ε has a multivariate normal distribution in which the element cij of C is proportional to the length of the shared branches, from root to the last common ancestor, between species i and j (Felsenstein, 1985; Hansen and Martins, 1996; Martins and Hansen, 1997; Garland and Ives, 2000). Other models of evolutionary change are possible, such as including a nonphylogenetic component of evolutionary change (Lynch, 1991; Freckleton et al., 2002; Housworth et al., 2004) or assuming evolution follows an Ornstein-Uhlenbeck process (Hansen and Martins, 1996; Blomberg et al., 2003); each of these will lead to a different translation of branch lengths into the covariance matrix C, but the model given by Equation 1 can be applied regardless of how C is selected.

The measurement error term η similarly has a covariance matrix σm2M. If measurement errors are uncorrelated among species, M is a diagonal matrix, and the variance due to measurement error of trait x for species i is σm2mii, where mii is the ith diagonal element of M. It is possible that measurement errors are correlated among species, as might be the case if trait values for a given clade were all measured by a single researcher using the same technique that differed from the techniques used for other clades. In this case, correlation among measurement errors can be incorporated into off-diagonal elements of M. Although we do not consider correlated measurement errors in detail, nonzero off-diagonal elements of M can be used in all of the methods we derive. Finally, although we will typically assume that ε and η have multivariate normal distributions, for some of the statistical procedures described below, ε and η need not be restricted to being normally distributed.

Consider first the case of no measurement error. Equation 1 can be reformulated as a phylogenetic regression problem in which the error terms are correlated, and hence can be analyzed using either independent contrasts or, as we will do here, generalized least squares (Hansen and Martins, 1996; Garland and Ives, 2000; Rohlf, 2001). Because C is a covariance matrix (and hence real, symmetric, and nonsingular), there exists another matrix D such that DCD' = I, where the apostrophe denotes transpose and I is the N × N identity matrix. Matrix D can be used to transform values of trait x by letting Z = DX, U = D1 (the N × 1 vector of 1's), and α = . From Equation 1 (with η = 0), this gives

The variation that exists among individuals within a species is termed interspecific variation.

The covariance matrix of α is E{αα′} = E{Dε ()′} = E{Dε ε′D'} = DE{εε′}D' = D(σ2C) D' = σ2I. Thus, no covariance terms appear in the covariance matrix of α, so the error terms αι are uncorrelated. Equation 2 can, therefore, be analyzed as a standard least-squares regression problem with independent errors. Specifically, the generalized least-squares (GLS) estimator of a is

The variation that exists among individuals within a species is termed interspecific variation.

The corresponding estimate of σ2 is the mean squared error,

The variation that exists among individuals within a species is termed interspecific variation.

What advantages does the phylogenetic mean value of trait x,

The variation that exists among individuals within a species is termed interspecific variation.
⁠, have over the sample mean,
The variation that exists among individuals within a species is termed interspecific variation.
=
The variation that exists among individuals within a species is termed interspecific variation.
∑i = 1Nxi? The expectations of both
The variation that exists among individuals within a species is termed interspecific variation.
and
The variation that exists among individuals within a species is termed interspecific variation.
are a, so both estimates are unbiased. Nonetheless,
The variation that exists among individuals within a species is termed interspecific variation.
has lower variance than
The variation that exists among individuals within a species is termed interspecific variation.
; in fact,
The variation that exists among individuals within a species is termed interspecific variation.
is the minimum-variance estimator of a (Judge et al., 1985).

When there is measurement error, the expression for the observed trait values x from Equation 1 can be written

The variation that exists among individuals within a species is termed interspecific variation.

where the observed (total) error term, ε + η, has covariance matrix σψ2 Ψ = σ2C + σm2M. Consider first the case in which the measurement error is known for each species, so the covariance matrix σm2M is known. (Conventional notation separates the covariance matrix into two components, σm2 and M, and assuming the measurement error is known implies both components are known.) Because the covariance matrix C is determined by the phylogeny, the only parameters that must be estimated are a and σ2. However, unlike the case without measurement error, a simple expression like Equation 3 cannot be derived for the estimate of a, because the matrix σψ2 Ψ = σ2C + σm2M now contains a parameter, σ2, that does not occur as a simple multiplicative term scaling the overall magnitude of the covariance matrix Ψ.

The estimation problem presented by Equation 5 is referred to in the statistical literature as a “measurement error known” problem (Fuller, 1987), because we assume that σm2 has been estimated independently (as reported by the standard errors of mean values for species). For nonphylogenetic analyses, corrective steps for known measurement error are fairly straightforward (Fuller, 1987). Unfortunately, these corrective steps cannot be applied when there is phylogenetic correlation (as incorrectly done by Irschick et al., 1996), and the methods we provide below are needed. However, other measurement error problems can be solved rather simply when there is phylogenetic correlation (CI). Specifically, if instead of knowing the measurement error variance σm2 we know the ratio of measurement error variance to true variance σm2/σ2, it is possible to calculate the phylogenetic mean by replacing C in Equation 3 with Ψ = C + (σm2/σ2) M and treat the problem in the usual GLS or independent contrasts fashion. Because this simple case has been addressed elsewhere (Pagel and Harvey, 1988a, 1988b; Harvey and Pagel, 1991), we do not consider it further.

Estimation

In Equation 5, two parameters are unknown: the mean value a of trait x for all species (or, equivalently, the hypothetical ancestral value at base of tree) and the phylogenetic variance σ2 (or, equivalently, the rate of evolution). These parameters can be estimated using an iterated version of estimated generalized least-squares (EGLS), maximum likelihood (ML), and restricted maximum likelihood (REML). To obtain ML and REML estimates, it is necessary to specify the form of the distribution of error terms ε and η; a natural assumption, and the one we use here, is that ε and η are normally distributed. Because the covariance matrix Ψ contains the parameter σ2 that must be estimated, for all three methods the confidence intervals calculated for

The variation that exists among individuals within a species is termed interspecific variation.
are approximations. Note that the difficulties in estimation when there is measurement error disappear when there is no measurement error, in which case GLS and ML estimates are the same, and provided ε is normally distributed, the estimates of
The variation that exists among individuals within a species is termed interspecific variation.
are t-distributed.

 Appendix 1 gives a full account of these methods as applied in this article. Also, univariate EGLS estimation can be implemented using independent contrasts, as done in the MS DOS program PD_SE.EXE (available from TG) and used by Bonine et al. (2005).

Example

As an example, we analyzed data from Martins and Lamont (1998) on display duration for nine species of lizards. We chose this example because it is a real comparative data set, is small enough to depict our results graphically, and has large enough standard errors for some species that the effects of incorporating measurement error are clearly apparent. For each species, Martins and Lamont (1998) provide the standard error of the measure of the trait, which we use to compute the matrix σm2M under the assumption that measurements are independent among species. For comparison, we computed parameter estimates assuming (i) no phylogenetic correlation among species (C = I; equivalent to assuming a “star phylogeny”) and no measurement error (M = 0), for which the estimate of a is simply the sample mean; (ii) no phylogenetic correlation but measurement error, with the measurement error variance differing among points (species); (iii) phylogenetic correlation (using as the “true” tree, Fig. 1a) and no measurement error, which gives the standard phylogenetic case analyzed by independent contrasts or GLS; and (iv) phylogenetic correlation and measurement error. For each set of assumptions, we computed 95% confidence intervals of the estimates using three approaches. First, for EGLS we used the standard GLS formulae ignoring that we estimated a parameter in the covariance matrix Ψ and the uncertainty associated with this estimate (Neter et al., 1989). Second, for ML we derived approximate confidence intervals from the log-likelihood function (Judge et al., 1985); this is a standard procedure used in ML estimation. Third, for all three estimation methods we used parametric bootstrapping under the assumption that both measurement and true errors are normally distributed. Parametric bootstrapping (Efron and Tibshirani, 1993) is a simulation procedure in which parameters are first estimated (by whatever method is being used), the statistical model with its estimated parameters is used to simulate data sets, and the parameters are estimated from the simulated data. After repeating this many (e.g., 2000) times, the resulting set of estimates approximates the distribution of the estimator (see  Appendix 1 for details). The term “parametric bootstrapping” is potentially confusing, because unlike standard (nonparametric) bootstrapping, the residuals obtained from the true data are not resampled to create new data sets but are instead simulated. Parametric bootstrapping is necessary in our case, because we do not know the actual measurements for each sample used to give species values; therefore, the measurement error must be simulated from a random number generator. Although it might be less confusing to refer to parametric bootstrapping more simply as “simulation” to obtain confidence intervals, this then introduces confusion when we perform simulations to explore the statistical properties of the estimation methods. A particular advantage of parametric bootstrapping is that not only does it give confidence intervals, it also identifies bias; if, for example, the mean of the bootstrapped estimates is lower than the true estimate, then this identifies that the estimator is downward biased.

Figure 1

The variation that exists among individuals within a species is termed interspecific variation.

For the univariate case using data from Martins and Lamont (1998), the effects of measurement error can be visualized by constructing a tree that corresponds to the covariance structure of the data combining both phylogenetic covariance and measurement error variance (see text). The measurement error variance lengthens the terminal branch segments of the tree, with the length of the tip extension giving the measurement error variance relative to the variance of the evolutionary process. (a) The phylogenetic tree from which the covariance matrix σ2C is calculated. (b) The phylogenetic tree with the variance associated with measurement error for total display duration graphed onto the tips of the tree, thereby giving a graphical representation of the covariance matrix σ2C + σm2M. For comparison, (c) is like (b) but with measure error for another trait, headbob duration. By increasing the expected within-species variances without changing among-species covariances, measurement error decreases the among-species correlations in the observed data. The table gives trait values and standard errors of the measurements for both traits and estimates of the phylogenetic mean a are given for each tree.

All three estimation methods incorporating measurement error gave similar estimates of a and σ2 when phylogenetic correlation was not included (i.e., C = I, case ii). However, when assuming Brownian motion evolution along the true phylogeny (i.e., CI, case iv), the ML estimates of both parameters a and σ2 differed sizably from the estimates obtained from EGLS and REML (Table 1). The ML estimate of σ2 appears to be strongly downward biased; the ML estimate of σ2 is 0.049 and the mean of the bootstrapped estimator is 0.032. Bias of ML estimates of variances is a common observation found in many types of statistical problems, and the relative lack of bias of REML estimates is a frequent justification for preferring REML over ML (Patterson and Thompson, 1971; Cooper and Thompson, 1977; Smyth and Verbyla, 1996). Unfortunately, there is no good way to predict a priori the magnitude of bias; in this particular example, strong bias in the ML estimator only occurred for cases when phylogenetic correlation was incorporated.

Table 1.

Parameter estimates of the phylogenetic mean a and variance σ2 and the measure of phylogenetic signal K* for data on total display duration of nine species of iguanas (from Martins and Lamont, 1998: fig. 1).

PhylogenyMethodPhylogenetic mean aBootstrap estimateσ2Bootstrap estimateK*
I (star)GLSc (no m.e.)  2.821 (2.08, 3.56)  2.83 (2.22, 3.46)0.93 (0.22, 1.81)  0.94 (0.25, 2.05) 
  EGLS2.95 (2.32, 3.57)2.95 (2.41, 3.47)  0.29 (0.070, 0.57)0.30 (0, 1.12)   
  ML2.95 (2.40, 3.50)2.96 (2.48, 3.43)  0.22 (0, 0.75)0.19 (0, 0.80)   
  REML2.94  2.95 (2.42, 3.46)  0.28  0.29 (0, 1.04) 
C (true)GLS (no m.e.)  2.521 (0.10, 4.94)  2.53 (0.49, 4.63)  1.92 (0.46, 3.74)  1.91 (0.52, 4.17)  0.32 (P < 0.05)
  EGLS  2.76 (1.61, 3.91)2.77 (1.73, 3.74)  0.35 (0.084, 0.68)  0.40 (0, 1.77)  0.53 (P > 0.4) 
  ML  2.94 (2.16, 3.72)  2.93 (2.44, 3.41)  0.049 (0, 0.44)  0.032 (0, 0.20)  4.5 (P > 0.5) 
  REML  2.76  2.75 (1.80, 3.72)  0.32  0.32 (0, 1.10)  0.57 (P > 0.4) 

PhylogenyMethodPhylogenetic mean aBootstrap estimateσ2Bootstrap estimateK*
I (star)GLSc (no m.e.)  2.821 (2.08, 3.56)  2.83 (2.22, 3.46)0.93 (0.22, 1.81)  0.94 (0.25, 2.05) 
  EGLS2.95 (2.32, 3.57)2.95 (2.41, 3.47)  0.29 (0.070, 0.57)0.30 (0, 1.12)   
  ML2.95 (2.40, 3.50)2.96 (2.48, 3.43)  0.22 (0, 0.75)0.19 (0, 0.80)   
  REML2.94  2.95 (2.42, 3.46)  0.28  0.29 (0, 1.04) 
C (true)GLS (no m.e.)  2.521 (0.10, 4.94)  2.53 (0.49, 4.63)  1.92 (0.46, 3.74)  1.91 (0.52, 4.17)  0.32 (P < 0.05)
  EGLS  2.76 (1.61, 3.91)2.77 (1.73, 3.74)  0.35 (0.084, 0.68)  0.40 (0, 1.77)  0.53 (P > 0.4) 
  ML  2.94 (2.16, 3.72)  2.93 (2.44, 3.41)  0.049 (0, 0.44)  0.032 (0, 0.20)  4.5 (P > 0.5) 
  REML  2.76  2.75 (1.80, 3.72)  0.32  0.32 (0, 1.10)  0.57 (P > 0.4) 

a

Star phylogeny assuming no phylogenetic relatedness; covariance matrix is the identity matrix I.

b

True phylogeny with covariance matrix C.

c

Generalized least squares assuming no measurement error.

d

Estimated generalized least squares incorporating measurement error.

e

Maximum likelihood incorporating measurement error.

f

Restricted maximum likelihood incorporating measurement error.

1

Also implemented in the MS DOS program PD_SE.EXE, as used in Bonine et al. (2005).

2

Approximate 95% confidence interval obtained from GLS.

3

Approximate 95% confidence interval obtained from ML.

4

Approximate 95% confidence interval obtained from parametric bootstrapping.

5

Probability of rejecting the null hypothesis that K* equals 1 (Brownian motion evolution along specified phylogeny).

Table 1.

Parameter estimates of the phylogenetic mean a and variance σ2 and the measure of phylogenetic signal K* for data on total display duration of nine species of iguanas (from Martins and Lamont, 1998: fig. 1).

PhylogenyMethodPhylogenetic mean aBootstrap estimateσ2Bootstrap estimateK*
I (star)GLSc (no m.e.)  2.821 (2.08, 3.56)  2.83 (2.22, 3.46)0.93 (0.22, 1.81)  0.94 (0.25, 2.05) 
  EGLS2.95 (2.32, 3.57)2.95 (2.41, 3.47)  0.29 (0.070, 0.57)0.30 (0, 1.12)   
  ML2.95 (2.40, 3.50)2.96 (2.48, 3.43)  0.22 (0, 0.75)0.19 (0, 0.80)   
  REML2.94  2.95 (2.42, 3.46)  0.28  0.29 (0, 1.04) 
C (true)GLS (no m.e.)  2.521 (0.10, 4.94)  2.53 (0.49, 4.63)  1.92 (0.46, 3.74)  1.91 (0.52, 4.17)  0.32 (P < 0.05)
  EGLS  2.76 (1.61, 3.91)2.77 (1.73, 3.74)  0.35 (0.084, 0.68)  0.40 (0, 1.77)  0.53 (P > 0.4) 
  ML  2.94 (2.16, 3.72)  2.93 (2.44, 3.41)  0.049 (0, 0.44)  0.032 (0, 0.20)  4.5 (P > 0.5) 
  REML  2.76  2.75 (1.80, 3.72)  0.32  0.32 (0, 1.10)  0.57 (P > 0.4) 

PhylogenyMethodPhylogenetic mean aBootstrap estimateσ2Bootstrap estimateK*
I (star)GLSc (no m.e.)  2.821 (2.08, 3.56)  2.83 (2.22, 3.46)0.93 (0.22, 1.81)  0.94 (0.25, 2.05) 
  EGLS2.95 (2.32, 3.57)2.95 (2.41, 3.47)  0.29 (0.070, 0.57)0.30 (0, 1.12)   
  ML2.95 (2.40, 3.50)2.96 (2.48, 3.43)  0.22 (0, 0.75)0.19 (0, 0.80)   
  REML2.94  2.95 (2.42, 3.46)  0.28  0.29 (0, 1.04) 
C (true)GLS (no m.e.)  2.521 (0.10, 4.94)  2.53 (0.49, 4.63)  1.92 (0.46, 3.74)  1.91 (0.52, 4.17)  0.32 (P < 0.05)
  EGLS  2.76 (1.61, 3.91)2.77 (1.73, 3.74)  0.35 (0.084, 0.68)  0.40 (0, 1.77)  0.53 (P > 0.4) 
  ML  2.94 (2.16, 3.72)  2.93 (2.44, 3.41)  0.049 (0, 0.44)  0.032 (0, 0.20)  4.5 (P > 0.5) 
  REML  2.76  2.75 (1.80, 3.72)  0.32  0.32 (0, 1.10)  0.57 (P > 0.4) 

a

Star phylogeny assuming no phylogenetic relatedness; covariance matrix is the identity matrix I.

b

True phylogeny with covariance matrix C.

c

Generalized least squares assuming no measurement error.

d

Estimated generalized least squares incorporating measurement error.

e

Maximum likelihood incorporating measurement error.

f

Restricted maximum likelihood incorporating measurement error.

1

Also implemented in the MS DOS program PD_SE.EXE, as used in Bonine et al. (2005).

2

Approximate 95% confidence interval obtained from GLS.

3

Approximate 95% confidence interval obtained from ML.

4

Approximate 95% confidence interval obtained from parametric bootstrapping.

5

Probability of rejecting the null hypothesis that K* equals 1 (Brownian motion evolution along specified phylogeny).

Comparing cases ii and iv, accounting for measurement error results in markedly lower estimates of σ2. This occurs because part of the variability of the data is attributed to measurement error, leaving less true variability among species. The effects of measurement error can be visualized by constructing a tree that gives the covariance structure of the data combining both phylogenetic covariance and measurement error variance; this is done using the EGLS estimates σ2 in Figure 1 for both total display duration and, for comparison, headbob duration (for another example, see Bonine et al., 2005). The effect of measurement error is to lengthen the terminal branch segments of the tree beyond the strict phylogenetic tree, with the length of the tip extension giving the measurement error variance. By increasing within-species variances without changing among-species covariances, measurement error decreases the among-species correlations in the observed data.

Accounting for measurement error will increase estimates of the strength of phylogenetic signal in data sets. Blomberg et al. (2003) derived a measure, K*, of the strength of phylogenetic signal. The measure K* depends on the ratio of the rate of evolution (measured by σ2) required to explain the variability in a trait among species under the assumption of no phylogenetic correlation (C = I) to the rate of evolution required under the assumption that C is given by the working phylogeny. This ratio computed for the data is then compared to the theoretical expectation of the ratio to give K*. A value of K* = 1 implies that the observed pattern of covariances in the data is consistent with that expected from the working phylogeny (specified by the covariance matrix C), whereas values of K* less than one imply that the strength of phylogenetic correlation is lower than expected from the phylogeny. Thus, values of K* less than 1 imply weaker phylogenetic signal. When measurement error exists, K* should be calculated after removing the variance caused by measurement error. Thus, K* depends on the estimated variance σ2 of the “true” values X* rather than the variance associated with the observed values X, which also depends on σm2M. (Note that Blomberg et al. [2003] also derive a measure K that is closely correlated to K*. For technical reasons we will not discuss here, in measurement error problems K* is a more appropriate measure of phylogenetic signal. See also Rohlf, 2006.)

The estimate of K* for lizard display duration is statistically significantly less than 1 when assuming no measurement error (Table 1). In contrast, the value of K* estimated, while accounting for measurement error is not statistically different from 1 (Table 1). Thus, accounting for measurement error reveals the underlying phylogenetic signal. Note that the ML estimate of K* is greater than 1, although this is due to the same bias that produced the low ML estimate of σ2.

Simulation

In the example above, we know neither the true value of a nor the true phylogenetic correlation, making it impossible to study the statistical properties of the parameter estimators. To examine these properties, we simulated data using the phylogeny from Martins and Lamont (1998; see Fig. 1). We assumed that the trait evolves in a Brownian motion fashion with σ2 = 0.35 (the EGLS estimate from the data for total display duration). To simulate measurement error, we assumed that the standard deviation of a measurement on a single animal is twice the reported standard error of total duration as reported in Martins and Lamont (1998). This gives high measurement error and hence a strong test of the estimation methods incorporating measurement error. To vary measurement error, we assumed that data from n = 2k (k = 0, 1, …, 6) individuals were obtained for each species; increasing the sample size n decreases measurement error because the standard error of the measurement error is proportional to 1/✓ n. For each simulated data set, we calculated the estimates of a, σ 2, and the measure of phylogenetic signal K*. We used only EGLS estimation; REML estimation gave similar results, and ML estimation showed considerable bias, particularly in the estimates of K*.

The simulations show that accounting for measurement error has little effect on the estimate of a, although the confidence intervals decrease (Fig. 2). In contrast, the estimate of σ2 is greatly improved when measurement error is incorporated into the analysis. Nonetheless, when measurement error is large, even accounting for measurement error does not overcome an upwards bias in the estimate of σ2. Similarly, when measurement error is accounted for, the estimate of K* is less biased around its true value of 1 and has confidence intervals that are relatively insensitive to the strength of measurement error. In contrast, when measurement error is ignored, estimates of K* are markedly low when there is large measurement error. We should point out, however, that the measurement error used in the simulations was very high; with a sample size of one, the average standard error of the measurement error was 1.4, which is more than twice the standard deviation of the true among-species error, 0.59. The measurement error reported by Martins and Lamont (1998) corresponds to our simulated sample size of 22 = 4, and the bias in both σ2 and K* above this is minimal for the estimation methods incorporating measurement error.

Figure 2

The variation that exists among individuals within a species is termed interspecific variation.

Simulation of the univariate case to provide estimates of (a) a, (b) σ, and (c) the measure of phylogenetic signal K*. Solid lines give the EGLS estimates accounting for measurement error, and the corresponding 95% bounds of the estimate are given by the shaded region. Dashed lines give the estimate and 95% bounds of the estimate obtained without accounting for measurement error (GLS). We assume the 9-tip phylogeny presented by Martins and Lamont (1998). Trait x evolves according to Brownian motion evolution, with a = 0 and σ2 = 0.35. Measurement error for measurements on single individuals is assumed to have standard error equal to 2 times the standard error provided by Martins and Lamont (1998) for total display duration. For each simulated sample size n = 2k (k = 0, 1, …, 6), 2000 data sets were simulated, and estimates for each parameter were computed.

Correlation between Traits

When measurement error exists, the correlation coefficient between two traits x and y can be calculated from the statistical model

The variation that exists among individuals within a species is termed interspecific variation.

where as before X* and Y* represent the true values of traits x and y among species with the true variation given by εx and εy, and X and Y are the values observed with measurement error ηx and ηy.

The joint covariance matrix for εx and εy is

The variation that exists among individuals within a species is termed interspecific variation.

where ε is the 2N × 1 vector of error terms created by stacking εx on top of εy, and Cxy = Dx− 1(Dy′)− 1 where Dx− 1 and Dy− 1 are the Cholesky decompositions of Cx and Cy such that DxCxDx = DyCyDy = I. In this formulation (and in our Matlab code), the matrices Cx and Cy can differ, and therefore trees with different branch lengths (or even different trees) can be used for different traits. For measurement errors

The variation that exists among individuals within a species is termed interspecific variation.

where η is the 2N × 1 vector created by stacking ηx on top of ηy, σmx2Mx and σmy2My are matrices containing the measurement error variances, and rm < eqid25 > σmyMxy is the matrix containing covariances in measurement errors between traits for each species. If measurement errors for each trait are independent among species, then Mx, My, and Mxy will be diagonal matrices (i.e., all off-diagonal elements will be zero). If measurement errors for the two traits within species are correlated (e.g., the measurements of traits x and y for a given species tend to err either high or low in unison), then this correlation is given by rmMxy.

As in the univariate case, the observed values of traits x and y can be expressed in terms of both ε and η as

The variation that exists among individuals within a species is termed interspecific variation.

where W is the 2N × 1 vector created by stacking X and Y, and A is the 2N × 1 vector whose first N elements are ax and second N elements are ay. The resulting covariance matrix E(WA)(WA)′ = σ2 Ψ is

The variation that exists among individuals within a species is termed interspecific variation.

The case for more than two variables is similar: W is created by stacking vectors of trait values, and σ2 Ψ is constructed with diagonal blocks σ x2Cx+ σmx2Mx and off-diagonal blocks rσxσ yCxy+ rmσmxσmyMxy for any pair of traits x and y.

Estimation

When estimates are available for the standard errors of the trait values for each species, these give the values of σmx2Mx, σmy2My, and rm < eqid26 > σmyMxy. Furthermore, the phylogeny and associated assumption about evolutionary change give Cx and Cy. Therefore, the only parameters that must be estimated are ax, ay, σx2, σy2, and r for the case of bivariate correlation. As with the univariate case, multiple methods can be used to estimate the parameters for the model given by Equations 9 and 10. Here, we illustrate EGLS and REML ( Appendix 1), although we also provide Matlab programs for ML. EGLS has the advantage that it can be formally applied when the true variation and/or measurement error variation are not normally distributed. As we show below, REML has the advantage of having almost no bias, compared to a slight bias shown by EGLS. Furthermore, when calculating the correlation between multiple pairs of traits, REML (and ML) uses data from all of the traits in estimating each pairwise correlation; this leads to the best estimates when performing multivariate analyses such as PCA. Although there are multiple methods for obtaining confidence intervals for the estimates (see “Univariate Analyses” above), we restricted attention to parametric bootstrapping; for small sample sizes typical of many phylogenetic studies, estimators of the correlation coefficients are often biased, and therefore parametric bootstrapping is often the most robust approach for obtaining confidence intervals.

Example

We analyzed data from Bauwens et al. (1995) on the body mass, hind-limb length, and sprint speed of 13 species of lizards using phylogeny A from their figure 2. Their table 1 provides means and standard errors for these traits on the arithmetic scale. We chose this example because it includes traits that might be subjected to a number of different statistical analyses (correlation, regression, and functional relation models) and because it is of a size (13 species, n = 4 to 20 individuals measured per species), which is not atypical of “small” comparative studies (e.g., see compilations in Ricklefs and Starck, 1996; Freckleton et al., 2002; Blomberg et al., 2003). We log-transformed all traits, which reduced skew in the distribution of trait values (analyses not presented). When log-transforming values that are measured with variation, both the mean and variance of the log-transformed data depend on the variance of the measurement error; thus, we assumed that a given trait value for a given species was log-normally distributed and performed the log-transformation accordingly ( Appendix 2). Finally, we assumed that measurement errors are not correlated among traits, so rm = 0 in Equation 10.

In this example, as is likely to be common (e.g., Martins and Lamont, 1998; Bonine et al., 2005), the sample sizes for some species values were small (n = 4). When sample sizes are small, the standard errors themselves are imprecise estimates of the measurement error. In practice, this issue is often inconsequential, because the estimates of measurement error, while imprecise, are nonetheless unbiased. A possible approach when there are small sample sizes, or if some species are represented by a single individual (e.g., Langerhans et al., 2006), is to compute the average per sample measurement error and from this calculate the measurement error for each species based on its corresponding sample size ( Appendix 3). For the analyses below, we used both the standard errors provided in Bauwens et al. (1995) and the measurement error obtained by averaging across species; both procedures gave quantitatively very close results and so we present only the results using the standard errors for each species.

For a bivariate example, we computed estimates of r between body mass and sprint speed using GLS (i.e., with no measurement error), EGLS, and REML assuming either no phylogenetic relatedness among species (a star phylogeny, Cx = Cy = I) or phylogenetic relatedness given by the true phylogeny under Brownian motion evolution (Table 2). For the star phylogeny, EGLS and REML estimates of r were similar and did not differ greatly from the GLS estimate. However, for the true phylogeny, the EGLS estimate (0.025) was similar to the GLS estimate (0.022), and both were much lower than the REML estimate (0.341). The mean of the REML bootstrapped estimates of r (0.327) was lower than the REML estimate, suggesting that if anything, the REML estimate is biased downwards. This suggests that the EGLS estimate (0.025) is even more severely biased than the REML estimate. Despite the large difference between the EGLS and REML estimates, the confidence intervals for both are large, and in neither case is the estimate of r statistically different from zero.

Table 2.

GLS, EGLS, and REML estimates of the correlation coefficient (r) between log body size and log sprint speed from Bauwens et al. (1995).

PhylogenyGLSGLS bootstrapEGLSEGLS bootstrapREMLREML bootstrap
I (star)  0.466  0.4541 (−0.11, 0.81)0.478  0.4651 (−0.15, 0.85)0.497  0.4861 (−0.098, 0.85)
C (true)  0.022  0.0171 (−0.57, 0.56)0.025  0.0331 (−0.60, 0.65)0.341  0.3311 (−0.28, 0.81)

PhylogenyGLSGLS bootstrapEGLSEGLS bootstrapREMLREML bootstrap
I (star)  0.466  0.4541 (−0.11, 0.81)0.478  0.4651 (−0.15, 0.85)0.497  0.4861 (−0.098, 0.85)
C (true)  0.022  0.0171 (−0.57, 0.56)0.025  0.0331 (−0.60, 0.65)0.341  0.3311 (−0.28, 0.81)

1

Mean of the parametric bootstrap distribution of r.

2

95% parametric bootstrap confidence intervals from 2000 replication data sets.

Table 2.

GLS, EGLS, and REML estimates of the correlation coefficient (r) between log body size and log sprint speed from Bauwens et al. (1995).

PhylogenyGLSGLS bootstrapEGLSEGLS bootstrapREMLREML bootstrap
I (star)  0.466  0.4541 (−0.11, 0.81)0.478  0.4651 (−0.15, 0.85)0.497  0.4861 (−0.098, 0.85)
C (true)  0.022  0.0171 (−0.57, 0.56)0.025  0.0331 (−0.60, 0.65)0.341  0.3311 (−0.28, 0.81)

PhylogenyGLSGLS bootstrapEGLSEGLS bootstrapREMLREML bootstrap
I (star)  0.466  0.4541 (−0.11, 0.81)0.478  0.4651 (−0.15, 0.85)0.497  0.4861 (−0.098, 0.85)
C (true)  0.022  0.0171 (−0.57, 0.56)0.025  0.0331 (−0.60, 0.65)0.341  0.3311 (−0.28, 0.81)

1

Mean of the parametric bootstrap distribution of r.

2

95% parametric bootstrap confidence intervals from 2000 replication data sets.

To investigate correlations between multiple pairs of traits, we estimated r for the three pairs of traits: body mass, sprint speed, and hind-limb length using GLS, EGLS, and REML. To implement REML, we estimated correlations in both a pairwise fashion (pairwise REML) and simultaneously for all three traits (joint REML). Joint REML is the correct REML procedure, because REML estimation is based on the likelihood of the entire data set. (Our Matlab program automatically implements joint REML.) Thus, information about the correlation between traits x and y, and between traits y and z, is used in the estimation of the correlation between traits x and z. Stated another way, the estimates of the pairwise correlations among traits are not independent. This differs from the case without measurement error, where estimates of pairwise correlations are independent. Researchers might be tempted to calculate correlation coefficients separately in a pairwise fashion, particularly when large numbers of pairwise correlations are desired. We calculated pairwise REML estimates, even though this is not a correct procedure, to illustrate the problems that this can cause.

The estimated correlations for all three pairs of traits (body mass, sprint speed, hind-limb length) using pairwise REML tended to be larger than the GLS and EGLS estimates (Table 3). The joint REML estimates are slightly less high. The three-species correlation matrices obtained from both EGLS and pairwise REML are not valid, because they are not positive definite. The requirement that correlation matrices be positive definite is equivalent to the requirement that correlation coefficients be between −1 and +1; just as it makes no sense for correlation coefficients to be greater than +1, it makes no sense for a correlation matrix not to be positive definite. The failure of the correlation matrices obtained from EGLS and pairwise REML to be positive definite is caused by the low estimated correlation between body size and sprint speed, rxy. Because traits x (log body size) and z (log hind-limb length) are highly correlated, and traits y (log sprint speed) and z are highly correlated, traits x and y must also be highly correlated for the correlation matrix to be positive definite, but this condition is not met for EGLS and pairwise REML. In contrast, the correlation matrices obtained from GLS (for which pairwise estimates of correlation coefficients are independent) and joint REML (which estimates all correlation coefficients simultaneously) are positive definite. This particular data set is prone to the problem of estimated correlation matrices not being positive definite, because the sample size is small and the correlations between traits are high. Nonetheless, this problem is likely to arise frequently in similar data sets.

Table 3.

Estimates of correlation coefficients and loadings on the first principal component (PC 1) for traits log body mass (x), log sprint speed (y), and log hind-limb length (z) provided by Bauwens et al. (1995) for 13 lizard species.

Loading

Methodrxyrxzryz%Variance PC 1xyz
GLS (no m.e.)  0.022  0.845  0.491  0.66  0.36  0.22  0.42 
EGLS  0.025  0.867  0.550       
Pairwise REML  0.341  0.899  0.799       
Joint REML  0.257  0.887  0.635  0.74  0.34  0.27  0.39 

Loading

Methodrxyrxzryz%Variance PC 1xyz
GLS (no m.e.)  0.022  0.845  0.491  0.66  0.36  0.22  0.42 
EGLS  0.025  0.867  0.550       
Pairwise REML  0.341  0.899  0.799       
Joint REML  0.257  0.887  0.635  0.74  0.34  0.27  0.39 

1

The covariance matrix obtained from pairwise analyses was not positive definite.

Table 3.

Estimates of correlation coefficients and loadings on the first principal component (PC 1) for traits log body mass (x), log sprint speed (y), and log hind-limb length (z) provided by Bauwens et al. (1995) for 13 lizard species.

Loading

Methodrxyrxzryz%Variance PC 1xyz
GLS (no m.e.)  0.022  0.845  0.491  0.66  0.36  0.22  0.42 
EGLS  0.025  0.867  0.550       
Pairwise REML  0.341  0.899  0.799       
Joint REML  0.257  0.887  0.635  0.74  0.34  0.27  0.39 

Loading

Methodrxyrxzryz%Variance PC 1xyz
GLS (no m.e.)  0.022  0.845  0.491  0.66  0.36  0.22  0.42 
EGLS  0.025  0.867  0.550       
Pairwise REML  0.341  0.899  0.799       
Joint REML  0.257  0.887  0.635  0.74  0.34  0.27  0.39 

1

The covariance matrix obtained from pairwise analyses was not positive definite.

Using the estimated correlation matrices, we performed a PCA (Sokal and Rohlf, 1981), calculating the first PC axis and corresponding loadings (Table 3). The high correlations obtained from the joint REML caused 74% of the correlation to be captured by the first principal components axis (PC1). In contrast, the PC1 using the GLS estimates was 66%. Thus, incorporating measurement error reveals a stronger correlation structure in the data. Because the correlation matrices obtained from EGLS and pairwise REML are not positive definite, the resulting PCAs are invalid.

Simulation

To investigate the properties of the estimators of r, we performed a simulation study based on the example. Specifically, we simulated data for 13 species having the phylogeny of the 13 species studied by Bauwens et al. (1995). We assumed that two traits x and y followed Brownian motion evolution up the phylogenetic tree with rates σx = 0.86, σy = 0.28, and r = 0.83. We assumed that the standard deviation of the measurement on a single animal is 4 times the standard error reported by Bauwens et al. (1995) for body mass and sprint speed, and that data from n = 2k (k = 0, 1, …, 6) individuals were obtained for each species. Thus, when k = 4 (n = 16), the measurement error variance is equal to that reported in Bauwens et al. (1995), and higher variance occurs for smaller sample sizes. For each of 2000 simulated data sets at each sample size n, we estimated parameters using both EGLS and REML.

After accounting for measurement error, REML estimates of r had only slight downward bias, with the approximate expectation ranging between 0.813 and 0.821 for a true value of r = 0.83 (Fig. 3). The EGLS estimates had greater downward bias for small sample sizes. In contrast, the GLS estimates that ignore measurement error had much greater downward bias. In addition to having less bias than the EGLS estimates, the REML estimates were also consistently more precise, with narrower 95% inclusion bounds. Note also that the distribution of the REML estimates is highly skewed; the upper 95% inclusion bound never exceeds 0.88, while the lower inclusion bound drops to almost zero. This is expected, as r is constrained to be less than or equal to one (see also Martins and Garland, 1991).

Figure 3

The variation that exists among individuals within a species is termed interspecific variation.

(a) EGLS and (b) REML estimates of the correlation coefficient r from simulated data sets based on Bauwens et al. (1995). Solid lines give estimates accounting for measurement error, and the corresponding 95% bounds of the estimate are given by the shaded region. Dashed lines give the estimate and 95% bounds of the estimate obtained without accounting for measurement error (GLS). We assume there are 13 species with phylogeny given by the phylogeny for the 13 lizards analyzed by Bauwens et al. (1995). Both traits evolve according to Brownian motion evolution, with σx = 0.86, σy = 0.28, and r = 0.83. Measurement error for measurements on single individuals is assumed to have standard error equal to 4 times the species standard errors provided by Bauwens et al. (1995). For each simulated sample size n = 2k (k = 0, 1, …, 6), 2000 data sets were simulated, and estimates for each parameter were computed.

Regression

A statistical model for regression with phylogenetic relatedness and measurement error is given by

The variation that exists among individuals within a species is termed interspecific variation.

where, as before, X* and Y* represent the true values of traits x and y among species, and X and Y are the values observed with measurement error ηx and ηy. Because values of the independent trait x will likely be phylogenetically correlated, we assume εx has covariance matrix Cx, and because residual variance in Y* given by εy is also likely to be phylogenetically correlated, we assume εy has covariance matrix Cy. Following standard regression, we assume that variations in X* and εy are independent. As before, we assume that measurement error may be correlated for traits x and y within species, leading to the measurement error covariance matrix given in Equation 8. If there is no measurement error (Mx = My = 0), then this problem reduces to phylogenetic regression that can be solved with GLS (e.g., as implemented in the Matlab REGRESSION.M program of Blomberg et al., 2003) or independent contrasts (Garland and Ives, 2000).

Just like the correlation model, the regression model can be written in terms of W given by Equation 9, leading to the covariance matrix

The variation that exists among individuals within a species is termed interspecific variation.

Thus, the regression model can be analyzed like the correlation model, with the covariance matrix for σ2 Ψ given in Equation 12 replacing that in Equation 10. More than one independent variable (multiple regression) can be incorporated in a similar manner and different branch lengths for different traits can be used, as done in our Matlab programs.

Estimation

Like univariate analyses and correlation, EGLS, ML, and REML can be used for estimation ( Appendix 1). Here we consider all three in analyzing an example, and study REML in more detail with a simulation.

Example

As in the example of correlation, we analyzed the data from Bauwens et al. (1995). Table 4 gives GLS, EGLS, ML, and REML estimates of the slope b1 for the regression of log hind-limb length on log body size. The estimates under the assumption of no phylogeny relatedness (Cx = Cy = I) are similar for all three methods incorporating measurement error. Furthermore, the parametric bootstrap confidence intervals are similar to the approximate confidence intervals obtained for EGLS and ML. The only differences among the statistical analyses is the relatively low estimate of b1 obtained for EGLS when the true phylogeny of the species is used.

Table 4.

Estimates of regression slope b1 for log hind-limb length regressed on log body mass for 13 species of lizards from Bauwens et al. (1995).

PhylogenyGLS (no m.e.)EGLSEGLS bootstrapMLML bootstrapREMLREML bootstrap
I (star)  0.305 (0.19, 0.42)0.307 (0.20, 0.41)0.307 (0.21, 0.41)0.310 (0.21, 0.41)0.312 (0.21, 0.42)0.309  0.310 (0.21, 0.41)
C (true)  0.224 (0.13, 0.32)0.232 (0.14, 0.33)0.231 (0.14, 0.32)0.263 (0.17, 0.36)0.265 (0.19, 0.35)0.260  0.261 (0.18, 0.35)

PhylogenyGLS (no m.e.)EGLSEGLS bootstrapMLML bootstrapREMLREML bootstrap
I (star)  0.305 (0.19, 0.42)0.307 (0.20, 0.41)0.307 (0.21, 0.41)0.310 (0.21, 0.41)0.312 (0.21, 0.42)0.309  0.310 (0.21, 0.41)
C (true)  0.224 (0.13, 0.32)0.232 (0.14, 0.33)0.231 (0.14, 0.32)0.263 (0.17, 0.36)0.265 (0.19, 0.35)0.260  0.261 (0.18, 0.35)

1

95% confidence interval from GLS.

2

95% confidence interval using the approximate standard error obtained from the GLS formulae.

3

95% confidence interval from parametric bootstrapping.

4

95% confidence interval from a t-distribution ML.

Table 4.

Estimates of regression slope b1 for log hind-limb length regressed on log body mass for 13 species of lizards from Bauwens et al. (1995).

PhylogenyGLS (no m.e.)EGLSEGLS bootstrapMLML bootstrapREMLREML bootstrap
I (star)  0.305 (0.19, 0.42)0.307 (0.20, 0.41)0.307 (0.21, 0.41)0.310 (0.21, 0.41)0.312 (0.21, 0.42)0.309  0.310 (0.21, 0.41)
C (true)  0.224 (0.13, 0.32)0.232 (0.14, 0.33)0.231 (0.14, 0.32)0.263 (0.17, 0.36)0.265 (0.19, 0.35)0.260  0.261 (0.18, 0.35)

PhylogenyGLS (no m.e.)EGLSEGLS bootstrapMLML bootstrapREMLREML bootstrap
I (star)  0.305 (0.19, 0.42)0.307 (0.20, 0.41)0.307 (0.21, 0.41)0.310 (0.21, 0.41)0.312 (0.21, 0.42)0.309  0.310 (0.21, 0.41)
C (true)  0.224 (0.13, 0.32)0.232 (0.14, 0.33)0.231 (0.14, 0.32)0.263 (0.17, 0.36)0.265 (0.19, 0.35)0.260  0.261 (0.18, 0.35)

1

95% confidence interval from GLS.

2

95% confidence interval using the approximate standard error obtained from the GLS formulae.

3

95% confidence interval from parametric bootstrapping.

4

95% confidence interval from a t-distribution ML.

In this example incorporating phylogenetic relatedness caused a large decrease in the estimates of b1, whereas measurement error had relatively little effect. Interestingly, the 95% confidence intervals obtained with phylogenetic information excluded the slope of 1/3 that would be expected for geometric similarity when EGLS was used, but not with ML and REML. In this case, selecting an estimation approach does make a difference in interpreting the results, at least if a confidence level of 95% is strictly adhered to. Unfortunately, in this case there is no ground to statistically prefer one estimation method over another, because all methods showed little bias. In rare situations like this, all we can recommend is to report the results cautiously.

Simulation

We designed simulations similar to previous simulations (Figs. 2 and 3) to investigate the effect of measurement error by varying the sample sizes of individuals measured per species. We also wanted to compare data sets with different numbers of species; increasing the number of species will not decrease the measurement error, but it should decrease the variance of the parameter estimates by providing more information about the relationship between the two traits. Thus, we wanted to compare the variance in the estimate of b1 when the number of individuals sampled from the same species is increased versus when the number of species sampled is increased. We only consider REML estimation, because EGLS and ML give similar results.

For the simulation, we assumed that there were either 13 or 49 species. For the 13-species case, we used the phylogeny for 13 lizards given by Bauwens et al. (1995). For the 49-species case, we used the phylogeny for 49 Carnivora and ungulates from Garland et al. (1993). We set the true value of b1 = 1/3 as would be expected if the dependent variable was the log of a linear dimension (e.g., leg length), the independent variable was the log of body mass, and species of different body size were geometrically similar. We set the other parameters equal to the REML estimates from the Bauwens et al. (1995) data using the full measurement error model with the true phylogeny (Table 4). For the 13-species phylogeny, we assumed that the standard deviation of measurement error for a single individual was 9 times greater than the standard error reported by Bauwens et al. (1995); we used such a large measurement error because the true measurement error did not have a strong effect on the analyses of the true data. For the 49-species phylogeny, we assigned standard errors to the species by randomly selecting from the 13 standard error values used in the 13-species simulation. We assumed that measurement errors between traits were independent.

In both 13- and 49-species cases, the REML estimate of b1 incorporating measurement error was at most slightly biased, whereas the GLS estimate of b1 without measurement error was badly biased when the number of individuals sampled per species was small (Fig. 4). The bias of the GLS estimates was nearly the same for both the 13- and 49-species data sets, illustrating that bias due to measurement error does not depend on the number of species sampled, only on the precision of the measurements for each species (and hence the number of individuals sampled per species). Nonetheless, the confidence intervals of the estimates of b1 become narrower with increasing numbers of species. This accentuates the statistical problems that can arise from bias. In the 49-species case, the true value of b1, 1/3, is excluded from the 95% inclusion interval of the estimates when sample sizes n are small, so the hypothesis that b1 = 1/3 would be rejected even though we know that the true value of b1 is 1/3! Harmon and Losos (2005) discuss more generally the effect of measurement error on type I and type II errors in phylogenetic analyses.

Figure 4

The variation that exists among individuals within a species is termed interspecific variation.

REML estimates of the slope of a regression of simulated data sets. In (a) simulated data consisted of 13 species having phylogeny given by Bauwens et al. (1995), and in (b) there are 49 species with the phylogeny given by Garland et al. (1993). Solid lines give estimates accounting for measurement error, and the corresponding 95% bounds of the estimate are given by the shaded region. Dashed lines give the estimate and 95% bounds of the estimate obtained without accounting for measurement error (GLS). Both traits evolve according to Brownian motion evolution, with b1 = 1/3, σx = 0.8, and σy = 0.1. In (a) the measurement error for measurements on a single individual is assumed to have standard error equal to 9 times the species standard errors provided by Bauwens et al. (1995), whereas in (b) measurement errors are selected at random from these 13 values. For each sample size n = 2k (k = 0, 1, …, 6), 2000 data sets were simulated. Comparable figures using EGLS and ML estimation were not qualitatively different.

Although increasing sample sizes n per species will reduce measurement error and therefore give more precise estimates of b1, precision of the estimates of b1 is also limited by the number of species in the data set. For these examples, increasing sample sizes per species provides only moderate improvement in the precision in the estimates of b1 once the sample size exceeds 4. However, the standard deviation of the estimates of b1 is decreased by roughly 50% when there are 49 species relative to 13 species. This is because increasing the number of species increases information about the relationship between traits x and y; this information is contained in the variance among species.

Functional Relation Models

Regression is the appropriate statistical model to apply when one variable is causally determined by another, or if one variable is to be predicted by the value of another variable. However, in many biological questions the goal is to understand how two traits are functionally related without assigning a direction of causality. For example, one might be interested in the relationship between tail length and leg length among a group of species. The problem of functional relations has been addressed with “general structured relations” models (Rayner, 1985), of which reduced major axis (RMA) regression is the most frequently used special case. Here we develop functional relation models that incorporate phylogenetic correlation and measurement error in a flexible way. We then present some of the special cases that are derived for the non-phylogenetic case by Rayner (1985). Our goal in presenting these cases is to show that the models derived by Rayner (1985) can be easily expanded to include phylogenies and measurement error.

Our functional relation model is

The variation that exists among individuals within a species is termed interspecific variation.

Variation in the true value of x among species, X*, is given by γx having covariance matrix σγ x2Cγ x. Variation in the observed values X and Y is divided into two components. The measurement errors are given by η x and ηx with covariance matrices σmx2Mx, σmy2My, and rmσmxσmyMxy, which we assume are known (see Equation 8). The observed values of X and Y also depend on unknown sources of variation given by εx and εy with covariance matrices σx2Cx, and σy2Cy, and cross-covariance matrix r σxσyCxy. The errors ε x and εy represent biological variation in X and Y that may have a phylogenetic component, and may additionally contain unknown measurement error not captured by ηx and ηy.

The model given by Equation 13 contains seven parameters: ax, b0, b1, σγ x2, σx2, σy2, and r. However, the information available from a data set allows only five parameters to be estimated. This can be explained heuristically by noting that a data set provides five pieces of information about the distribution of x and y: the means of x and y, their variances, and the covariance between them. This presents the problem of statistical identifiability, which appears frequently in measurement error models (Fuller, 1987). If no additional information is available, then the only solution to the identifiability problem is to make assumptions about the values of parameters or the mathematical relationships among them. For example, if it is assumed that there is no unknown variation in x (σx2 = r = 0), then Equation 13 reduces to the regression model of y on x given by Equation 11. Conversely, if there is no unknown variation in y (σy2 = r = 0), then Equation 13 reduces to a regression of x on y. Finally, if it is assumed that there is no variation in X* (σγ x2 = 0, and hence b1 = 0), then the model reduces to the correlation model given by Equation 6.

Additional special cases can be derived by recognizing that Equation 13 is a generalization of the general structural relation model of Rayner (1985); specifically, the general structural equation model is obtained when there is no phylogenetic correlation, Cγ x = Cx = Cy = I, and the measurement error is zero, Mx = My = Mxy = 0; a proof is given in  Appendix 4. RMA regression is then derived as a special case assuming that there is no correlation between εx and εy (r = 0), and the ratio of standard deviations of εx and ε y satisfies σy/σx = b1 (Rayner, 1985). A specific property of RMA regression is that there is no causal directionality between traits x and y, because the RMA regression of trait y on x is equivalent to the RMA regression of x on y. Specifically, the first two lines of Equation 13 could be written equivalently as Y* = ay+ γy and X* = β0+ β1Y*, where ay = b0+ b1ax, β0 = −b0/b1, and β1 = 1/b1. Furthermore, because σy/σx = b1 for RMA regression, σx/σy = 1/b1 = β1. Therefore, in RMA regression either x or y can be treated as the “dependent variable.”

Phylogenetic relatedness can be incorporated into RMA regression by removing the assumption that Cγx = Cx = Cy = I. In this case, the estimate of b1 (i.e., the RMA slope) in the absence of measurement error is

The variation that exists among individuals within a species is termed interspecific variation.

where

The variation that exists among individuals within a species is termed interspecific variation.
x and
The variation that exists among individuals within a species is termed interspecific variation.
y are the phylogenetically correct means of x and y given by Equation 3. In the phylogenetic version of the general structural relation model, all statistical tests and confidence intervals can be calculated by modifying standard formulae (Rayner, 1985). In particular, the (1 −α) % confidence intervals of the estimate of b1 are

The variation that exists among individuals within a species is termed interspecific variation.

where

The variation that exists among individuals within a species is termed interspecific variation.

Another special case can be derived by assuming that the relative magnitudes of variances in ε x and εy are known to be a constant k (k = σy/σx), and εx and εy are uncorrelated (r = 0). This might be a reasonable model when a researcher has limited information about variation in εx and εy—enough to assign relative but not absolute magnitudes of variance between traits, as was the case for Pagel and Harvey (1989). We refer to this case as VRF (variance ratio fixed) regression. With the assumptions that k = σy/σx and r = 0, parameters ax, b0, b1, σγx2, and σx2 can be estimated from Equation 13.

Measurement error can be incorporated into special cases of the general structural relation model (in particular RMA and VRF regression) by explicitly considering the covariance matrix for the observed values of traits x and y. Specifically, for the bivariate case parameters can be estimated from the covariance matrix σ2 Ψ for the vector W constructed by placing X on top of Y:

The variation that exists among individuals within a species is termed interspecific variation.

Different special cases are given by imposing different restrictions on the parameters; for example, for RMA regression, r = 0 and σy/σx = b1.

From a statistical perspective, the numerous different general structural relation models that can be derived as special cases from Equation 13 might all be considered equally valid. Thus, correlation, regression, and RMA regression are all similarly well defined statistically. Nonetheless, different statistical models are prone to different mistakes in interpretation. For example, if RMA regression is applied to two traits that are independent, the expectation for the estimated slope will be 1, even though there is no relationship between traits. In this case, the way to guard against misinterpreting the RMA slope is to pay attention to confidence intervals; if the traits are independent, the confidence intervals will be wide. Also, a correlation analysis will reveal lack of correlation between the traits. Here, we do not want to condone or condemn the use of RMA regression and other structural relation models, but we do believe in caution when interpreting their results.

Estimation

For the general case with measurement error (Equation 16), EGLS, ML, and REML estimation can be used. Here we restrict attention to ML estimation. An advantage of ML over EGLS estimation is that likelihoods can be used to compare different formulations of the model. Mixing and matching different assumptions, for example whether or not the error terms εx and ε y contain phylogenetic correlations, lead to multiple possible models, and ML can be used to sort models and find the best. It is also possible to use REML estimation, although interpreting likelihoods computed from REML is not as straightforward as ML, so we prefer ML estimation. For structural relation models, we found little bias in ML estimates, so this major limitation of ML estimation found for other problems (especially those involving estimates of variances and correlations) did not occur.

We provide Matlab programs for RMA and VRF regression.

Example

For the example, we analyzed the same log body mass and log hind-limb length data from the 13 lizard species in Bauwens et al. (1995) used in the regression example. We consider three pairs of models. First, we use RMA regression and its phylogenetic counterpart in which estimates of b1 and confidence intervals are given by Equations 14 and 15 when there is no measurement error (Mx = My = Mxy = 0). We refer to these models as rma(I) and rma(C), respectively. Second, we consider the nonphylogenetic and phylogenetic pair of VRF regressions in which k = σy/σx and r = 0, and there is no measurement error. For the value of k, we use the simple average standard errors of log-transformed traits x (0.0495) and y (0.0177) reported by Bauwens et al. (1995). This mimics the case in which a researcher has only rough information about the variation in traits and assumes that the total unexplained variation in traits x and y is proportional to their within-species variation estimated from the standard error. We refer to these models as vrf(I) and vrf(C) models, respectively. Third, we consider the pair of nonphylogenetic and phylogenetic models that make the assumptions b1 = σy/σx and r = 0 as in RMA regression, but Mx and My are derived from the standard errors reported by Bauwens et al. (1995). We refer to these as rmaM(I) and rmaM(C), respectively. We do not consider measurement error in the VRF regression model, because we have already used information about the measurement error to select a value of k.

The nonphylogenetic versions of all three models give very similar estimates of the functional relation slope b1 (Table 5). The similarity between the RMA and VRF models is due to the fact that for the standard errors reported in Bauwens et al. (1995), k = 0.36 which, by coincidence, is very close to the value of b1 = 0.347, which is assumed to equal k = σy/σx in RMA regression. The estimates of b1 for all phylogenetic versions of the models are lower than the nonphylogenetic versions. However, the lower log-likelihoods for the phylogenetic versions indicate that they fit the data more poorly than the nonphylogenetic models. Thus, on statistical grounds the estimates from the nonphylogenetic versions are preferred. To formally arbitrate between phylogenetic and nonphylogenetic models, the best approach is to introduce a parameter that explicitly governs the strength of phylogenetic correlation. For example, Blomberg et al. (2003) derive a transform from an Ornstein-Uhlenbeck process, which introduces a parameter d into the covariance matrix C that dictates the strength of phylogenetic correlation; incorporating this into the models (as done for the case of no measurement error in REGRESSIONv2.m) would allow tests of phylogenetic strength and selection of the best estimate of b1 (see also Grafen, 1989; Freckleton et al., 2002). In our experience with real data sets, it is often the case that even a slight distortion of the phylogenetic tree to make it somewhat more star-like yields substantially improved fits.

Table 5.

Estimates of functional relation slope b1 for log body mass and log hind-limb length for 13 species of lizards from Bauwens et al. (1995).

ModelAssumptionsPhylogenyML estimate of b1Bootstrap ML estimate of b1Log-likelihood
RMAr = 0  I (star)  0.347 (0.25, 0.45)1 (0.24, 0.51)0.349 (0.25, 0.46)8.64 
  σy/σx = b1  C (true)  0.265 (0.18, 0.35)1 (0.17, 0.42)0.267 (0.19, 0.36)3.72 
VRFr = 0  I (star)  0.346 (0.23, 0.46)0.349 (0.24, 0.49)8.64 
  σy/σx = k = 0.36  C (true)  0.251 (0.15, 0.35)0.253 (0.16, 0.36)3.72 
RMA with measurement error  r = 0  I (star)  0.349 (0.25, 0.45)0.352 (0.26, 0.47)8.69 
  σy/σx = b1  C (true)  0.292 (0.20, 0.38)0.295 (0.22, 0.39)5.13 

ModelAssumptionsPhylogenyML estimate of b1Bootstrap ML estimate of b1Log-likelihood
RMAr = 0  I (star)  0.347 (0.25, 0.45)1 (0.24, 0.51)0.349 (0.25, 0.46)8.64 
  σy/σx = b1  C (true)  0.265 (0.18, 0.35)1 (0.17, 0.42)0.267 (0.19, 0.36)3.72 
VRFr = 0  I (star)  0.346 (0.23, 0.46)0.349 (0.24, 0.49)8.64 
  σy/σx = k = 0.36  C (true)  0.251 (0.15, 0.35)0.253 (0.16, 0.36)3.72 
RMA with measurement error  r = 0  I (star)  0.349 (0.25, 0.45)0.352 (0.26, 0.47)8.69 
  σy/σx = b1  C (true)  0.292 (0.20, 0.38)0.295 (0.22, 0.39)5.13 

a

Reduced major axis regression.

b

Variance ratio fixed regression.

1

95% confidence interval from a t-distribution using the approximate standard error obtained from the information matrix in ML estimation.

2

95% confidence interval from parametric bootstrapping.

3

95% confidence interval from Equation 15.

Table 5.

Estimates of functional relation slope b1 for log body mass and log hind-limb length for 13 species of lizards from Bauwens et al. (1995).

ModelAssumptionsPhylogenyML estimate of b1Bootstrap ML estimate of b1Log-likelihood
RMAr = 0  I (star)  0.347 (0.25, 0.45)1 (0.24, 0.51)0.349 (0.25, 0.46)8.64 
  σy/σx = b1  C (true)  0.265 (0.18, 0.35)1 (0.17, 0.42)0.267 (0.19, 0.36)3.72 
VRFr = 0  I (star)  0.346 (0.23, 0.46)0.349 (0.24, 0.49)8.64 
  σy/σx = k = 0.36  C (true)  0.251 (0.15, 0.35)0.253 (0.16, 0.36)3.72 
RMA with measurement error  r = 0  I (star)  0.349 (0.25, 0.45)0.352 (0.26, 0.47)8.69 
  σy/σx = b1  C (true)  0.292 (0.20, 0.38)0.295 (0.22, 0.39)5.13 

ModelAssumptionsPhylogenyML estimate of b1Bootstrap ML estimate of b1Log-likelihood
RMAr = 0  I (star)  0.347 (0.25, 0.45)1 (0.24, 0.51)0.349 (0.25, 0.46)8.64 
  σy/σx = b1  C (true)  0.265 (0.18, 0.35)1 (0.17, 0.42)0.267 (0.19, 0.36)3.72 
VRFr = 0  I (star)  0.346 (0.23, 0.46)0.349 (0.24, 0.49)8.64 
  σy/σx = k = 0.36  C (true)  0.251 (0.15, 0.35)0.253 (0.16, 0.36)3.72 
RMA with measurement error  r = 0  I (star)  0.349 (0.25, 0.45)0.352 (0.26, 0.47)8.69 
  σy/σx = b1  C (true)  0.292 (0.20, 0.38)0.295 (0.22, 0.39)5.13 

a

Reduced major axis regression.

b

Variance ratio fixed regression.

1

95% confidence interval from a t-distribution using the approximate standard error obtained from the information matrix in ML estimation.

2

95% confidence interval from parametric bootstrapping.

3

95% confidence interval from Equation 15.

Note that the maximum log-likelihoods for the RMA and VRF models with the same phylogenetic assumptions are the same. This is a result of the identifiability problem of having seven parameters; different ways of constraining the model to give five parameters (the maximum that can be estimated) will all give the same maximum likelihoods. The ML approximate confidence intervals for b1 are close to the parametric bootstrap confidence intervals, demonstrating that the approximation is accurate. Finally, the ML approximate confidence intervals are better (using the bootstrap confidence intervals as the gold standard) than confidence intervals for RMA regression given by Equation 15.

Simulation

To explore the statistical properties of the estimators for the different models, we simulated data using the rmaM(C) model—the model with r = 0, b1 = σy/σx, and phylogeny and measurement errors given by Bauwens et al's. (1995) example of 13 lizards—using parameter values obtained from the fit of the model to the data (Table 5). For comparison, we also simulated the same model after increasing the standard deviations of the measurement errors by a factor of 4 (4× simulations). For 2000 simulated data sets, we fit the same six models as illustrated in the example (Table 5).

All estimators of b1 were unbiased in the 1× and 4× measurement error simulations (Figs. 5a and 5c). Heuristically, this can be explained by noting that the estimate of functional relation of y on x, b1, is the inverse of the functional relation of x on y, 1/b1 for the RMA and VRF models. If there were, for example, consistent downward bias in the estimate of the functional relationship, then the estimates of both b1 and 1/b1 would have to be downwards biased, which clearly is not simultaneously possible.

Figure 5

The variation that exists among individuals within a species is termed interspecific variation.

ML estimates of b1 in the functional relations model given by Equation 13 for simulated data when the standard deviations in measurement error are (a) those reported by Bauwens et al. (1995) and (c) four times these values. For each of 2000 simulated data sets, ML estimates of b1 were obtained for 6 model variants: rma(I) and rma(C), reduced major axis regression (b1 = σ x/σy, r = 0) with no phylogenetic correlation and phylogenetic correlation given by the lizard phylogeny of Bauwens (1995) under Brownian motion evolution; vrf(I) and vrf(C), the functional relations model having measurement error variance ratio fixed (k = σ x/σy, r = 0) with and without phylogenetic correlation; and rmaM(I) and rmaM(C), reduced major axis regression incorporating measurement error with and without phylogenetic correlation. Error bars give 95% inclusion intervals for estimates. Numerical convergence to the ML estimate did not occur for 1.2% of the data set estimation method combinations; nonconvergent cases were included in the 95% inclusion intervals but not in the mean estimates of b1. (b and d) The proportion of the 2000 simulated data sets corresponding to (a) and (c), respectively, in which a given model had the highest likelihood. Because the likelihoods of the rma and vrf models are the same, these are combined.

Despite absence of bias, there is considerable variability among models in the precision of the estimates of b1, as revealed by their 95% inclusion intervals. Not surprisingly, the greatest precision (smallest inclusion interval) was achieved by the model used to simulate the data, rmaM(C). The precision of the rma(C) model, however, was almost identical. The VRF models both had poor precision, particularly when there was large measurement error (Fig. 5c).

Surprisingly, the rmaM(C) model used to generate the data did not always fit the simulated data best (Figs. 5b, 5d); even in the high measurement error case (Fig. 5d), it was the best-fitting model for only a little over 50% of the simulated data sets, and in the low measurement error case (Fig. 5b) the rma(C) model was selected as the best-fitting model more frequently than the rmaM(C) model. Furthermore, the nonphylogenetic models were selected fairly frequently. In part this is due to the small sample size of 13 species; using simulation studies, Blomberg et al. (2003) showed that the reliable detection of phylogenetic signal (nonstar phylogenies) using univariate data sets requires at least 20 species. This example provides caution about the statistical ability to identify the correct statistical model from small data sets.

Discussion

We have developed phylogenetic models that incorporate measurement error to investigate four statistical problems: (i) univariate estimates of the mean, variance, and phylogenetic signal of a trait; (ii) correlation and principal component analysis; (iii) multiple regression; and (iv) bivariate functional relation models, of which reduced major axis regression is a special case. Measurement error is ubiquitous in comparative studies, as it encompasses a wide variety of sources of variation, including variation caused by instrumentation and techniques, variation among individuals within a given population, and variation among populations of the same species. We have treated measurement error as variability that is given by the standard error of mean trait values for species (or populations); when this information is available, it can be used to improve statistical tests in many cases.

When measurement error is large, it can obscure patterns of variation and covariation in data. In the univariate case, measurement error obscures the pattern of phylogenetic correlation in trait values among species, thereby leading to underestimates of the strength of phylogenetic signal measured for a trait. It can also adversely affect estimates of ancestral values. For example, in a study of muscle fiber type composition of 24 species of lizards, in which four individuals were measured in each species, Bonine et al. (2005) found that although the point estimates for ancestral values were virtually unaffected by incorporating standard errors, the confidence intervals about them were narrowed, which would enhance statistical power for testing hypotheses about particular nodes.

Measurement error similarly obscures covariation between traits, leading to underestimates of correlation coefficients. The consequences of ignoring measurement error are likely to be severe for univariate estimates of phylogenetic signal and bivariate correlations. This is because measurement error increases the variance among observed trait values for species, thus diminishing the observed correlation among trait values. A particular concern arises when comparing the strength of phylogenetic signal among numerous traits. For example, Blomberg et al. (2003) compared 121 traits corresponding to 35 phylogenetic trees, finding that behavioral traits exhibit lower phylogenetic signal than body size, morphological, life history, and physiological traits. Although this result matches the long-standing idea that behavioral traits are evolutionarily labile, it could also be caused by behavioral traits having higher measurement error, as Blomberg et al. (2003) duly noted. Not all of the data sets analyzed by Blomberg et al. (2003) included standard errors of the species trait values, but future analyses could test the hypothesis that behavioral traits show lower phylogenetic signal simply because they show greater within-species standard errors.

In regression, measurement error leads to bias, generally downward, in the estimates of slopes. The magnitude of this bias obviously depends on the magnitude of the measurement error. However, the consequences of this bias for statistical tests depends also on the number of species analyzed and hence the precision of the estimates of the slope. When the number of species is large, the slope estimates have lower standard errors. This heightens the danger of bias caused by measurement error, because the wrong estimate is “known” with greater precision. This argument also applies to the other statistical models we have investigated, leading to the somewhat counter-intuitive recommendation that it is more important to incorporate measurement error when analyzing large data sets (see also Harmon and Losos, 2005). This recommendation makes more sense, however, when realizing that increasing the number of species will increase information about regression slopes, correlations, etc., thereby reducing the variation in parameter estimates that is not associated with measurement error and hence making the variation due to measurement error relatively greater. The greater the proportion of variation in parameter estimates caused by measurement error, the greater is the impact of not accounting for measurement error.

RMA regression and other forms of bivariate general structural relation models are often recommended as replacements for simple regression when there is variation in both x and y variables; thus, they are recommended when measurement error exists. We prefer to think of the broad suite of functional relation models explicitly in terms of Equation 13, in which there is a linear relationship between traits x and y that is modified by variation in x and y, εx and ε y, that might or might not depend on phylogeny. Measurement error is then additional within-species variation (from any number of sources) that is known to the researcher (i.e., standard errors). The explicit accounting of sources of variation given by Equation 13 makes clearer the assumptions of RMA regression and other functional relation models. For the special cases of functional relation models in which the estimate of the slope b1 of y on x is the reciprocal of the slope 1/b1 of x on y (such as RMA regression), the estimates of the slope are unbiased even when measurement error is not explicitly accounted for. This is a reason to use RMA regression when a researcher knows that measurement error exists but does not know its magnitude. (Of course, one must always be cautious when the traits are actually independent, because RMA regression gives the nonsensical result that the slope is unity for independent traits.) Despite being unbiased, however, incorporating phylogeny and measurement error makes the estimates of the slope b1 more precise. Unfortunately, the increase in precision requires knowing which model best fits a data set, and when data sets contain few species, identifying the best-fitting model may be statistically difficult (Figs. 5b and 5d). Thus, our recommendation is to approach functional relation problems with a variety of plausible models, examine each, and determine the robustness of any conclusions to the choice of models.

It is important to note that RMA regression is a bivariate technique and that it is not actually a formal “measurement error method” (see also Rayner, 1985:417–418); for example, we analyzed RMA regression models with and without the formal inclusion of measurement error. This raises important issues when one wishes to estimate a functional relations slope but also needs to control statistically for nuisance variables and/or other factors. As an example of the former, some studies report whole body mass while others report fat-free body mass. As an example of the latter, whole clades may vary in general body shape, thus causing a “grade shift” (e.g., see Garland et al., 1993, 2005) in the relation between, say, leg length and body size. A bivariate slope fitted to such heterogeneous data will be inappropriate. Unfortunately, RMA regression for more than two traits has yet to be developed, although it could be derived from a generalization of Equation 13. Until such methods are developed, practitioners might adopt a strategy of using (multiple) regression to remove effects of other covariates and factors, saving residuals, and analyzing residuals with bivariate RMA regression.

Throughout this article we have performed statistical tests assuming that interspecific correlations in trait values are known with certainty, given either by a star phylogeny or by the true phylogeny (which we assume in known without error) under the assumption of Brownian motion character evolution. These two assumptions about interspecific correlations can be considered as extremes in a continuum of phylogenetic signal (Garland et al., 2005). Rather than pick one of the extremes, the analyses can be performed by explicitly incorporating the strength of phylogenetic signal as a parameter in the models (Grafen, 1989; Freckleton et al., 2002; Huey et al., 2006). In this case, the phylogenetic covariance matrix C (or Cx and Cy, etc.) contains parameters that must be estimated. For example, Hansen (1997), Blomberg et al. (2003), and Butler and King (2004) provide branch-length transforms that correspond to evolution occurring as an Ornstein-Uhlenbeck process, with a parameter measuring the strength of phylogenetic signal (although the exact parameterization of the OU process differs among those sources). This parameter can be incorporated into the statistical models by letting the phylogenetic correlation matrices C depend on this parameter (Huey et al., 2006; Ives and Godfray, 2006). The parameter can then be estimated simultaneously with all of the other parameters of the model. A particular advantage of this approach is that it overcomes the need to identify whether the model using the true phylogeny fits the data better than the model with a star phylogeny (e.g., Table 5); arbitration between these extremes is made while estimating the Ornstein-Uhlenbeck parameter simultaneously with the other parameters. In general, incorporating parameters that control the phylogenetic covariance matrix C is straightforward for all of the different problems we have considered in this article, and the estimation techniques can be applied with little modification, as we intend to provide in future versions of our Matlab programs.

Throughout our analyses, we have assumed that information on standard errors for mean species trait values are available. However, even more limited information can be incorporated into the analyses. For example, incorporating an estimate of the average among-species measurement error for a trait can be done by simply giving all species the same standard error for a trait, or by weighting measurement errors according to sample sizes ( Appendix 3). Furthermore, if measurement errors are assumed to be the same for all species, then measurement errors can themselves be estimated along with the other parameters in the model. This involves estimating the variance σm2 in the models used throughout this article. This approach is analogous to models that separate among-species variation into phylogenetic and nonphylogenetic components (Lynch, 1991; Freckleton et al., 2002; Housworth et al., 2004; Rochet et al., 2006); in this case, the nonphylogenetic component represents measurement error.

Throughout this article we have used EGLS, ML, and REML estimation methods. For most problems and data sets, we suspect that they will all give similar results, provided measurement error is not too large. For most of the simulations we analyzed, REML outperformed EGLS, which in turn outperformed ML estimation. This does not mean, however, that ML estimation should not be used. An advantage of ML estimation is that the likelihoods from different models can be compared using a likelihood-ratio test, or the likelihood can be used to compute the Akaike information criterion (AIC) for model comparisons (Burnham and Anderson, 1998). In general, we suggest multiple methods be used; TG will provide Matlab (MathWorks, 1996) code that performs all three estimation methods for most of the models we have presented.

Much of the work developing phylogenetically based comparative methods over the last two decades has been done largely in isolation from a large body of statistical literature dealing with correlated data (e.g., Ives and Zhu, 2005). Nonetheless, there is much to be gained by applying relatively off-the-shelf approaches to comparative problems. For example, Paradis and Claude (2002) recently described how generalized estimating equations can be used to apply phylogenetic comparative methods to noncontinuous data, opening up comparison of discrete traits to phylogenetic analyses. The models presented throughout this manuscript are applicable to a broad range of comparative problems, and they easily surrender to well-worn statistical approaches. Answers to a wide range of additional problems in comparative analyses probably await within the statistical literature.

Acknowledgments

We thank Matt Helmus, Todd Oakley, Roderic Page, Jun Zhu, and two anonymous reviewers for helpful comments on the manuscript. This work was supported by NSF DEB-0196384 to TG and ARI.

References

Comparing phylogenetic signal in intraspecific and interspecific body size datasets

,

J. Evol. Biol.

,

2004

, vol.

17

 

(pg.

1157

-

1161

)

,  ,  ,  . 

Evolution of sprint speed in lacertid lizards—Morphological, physiological, and behavioral covariation

,

Evolution

,

1995

, vol.

49

 

(pg.

848

-

863

)

,  ,  . 

Testing for phylogenetic signal in comparative data: Behavioral traits are more labile

,

Evolution

,

2003

, vol.

57

 

(pg.

171

-

745

)

,  ,  . 

Muscle fibre-type variation in lizards (Squamata) and phylogenetic reconstruction of hypothesized ancestral states

,

J. Exp. Biol.

,

2005

, vol.

208

 

(pg.

4529

-

4547

)

,  . ,

Model selection and inference: A practical information-theoretic approach

,

1998

New York

Springer

,  . 

Phylogenetic comparative analysis: A modeling approach for adaptive evolution

,

Am. Nat.

,

2004

, vol.

164

 

(pg.

683

-

695

)

,  ,  . 

A comparison of two models for estimating phylogenetic effect on trait variation

,

Evolution

,

1997

, vol.

51

 

(pg.

262

-

266

)

,  . 

Note on estimation of parameters of autoregressive-moving average process

,

Biometrika

,

1977

, vol.

64

 

(pg.

625

-

628

)

,  . ,

An introduction to the bootstrap

,

1993

New York

Chapman and Hall

Phylogenies and the comparative method

,

Am. Nat.

,

1985

, vol.

125

 

(pg.

1

-

15

)

,

Inferring phylogenies

,

2004

Sunderland, Massachusetts

Sinauer Associates

,  ,  . 

Phylogenetic analysis and comparative data: A test and review of evidence

,

Am. Nat.

,

2002

, vol.

160

 

(pg.

712

-

726

)

,

Measurement error models

,

1987

New York

John Wiley & Sons

,  ,  ,  . 

Phylogenetic analysis of covariance by computer simulation

,

Syst. Biol.

,

1993

, vol.

42

 

(pg.

265

-

292

)

,  . 

Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods

,

Am. Nat.

,

2000

, vol.

155

 

(pg.

346

-

364

)

,  ,  . 

An introduction to phylogenetically based statistical methods, with a new method for confidence intervals on ancestral values

,

Am. Zool.

,

1999

, vol.

39

 

(pg.

374

-

388

)

,  ,  . 

Phylogenetic approaches in comparative physiology

,

J. Exp. Biol.

,

2005

, vol.

208

 

(pg.

3015

-

3055

)

,  . 

Polytomies and phylogenetically independent contrasts: An examination of the bounded degrees of freedom approach

,

Syst. Biol.

,

1999

, vol.

48

 

(pg.

547

-

558

)

The phylogenetic regression

,

Trans. R. Soc. Lond. B Biol. Sci.

,

1989

, vol.

326

 

(pg.

119

-

157

)

Stabilizing selection and the comparative analysis of adaptation

,

Evolution

,

1997

, vol.

51

 

(pg.

1341

-

1351

)

,  . 

Translating between microevolutionary process and macroevolutionary patterns: The correlation structure of interspecific data

,

Evolution

,

1996

, vol.

50

 

(pg.

1404

-

1417

)

,  . 

The effect of intraspecific sample size on type I and type II error rates in comparative studies

,

Evolution

,

2005

, vol.

59

 

(pg.

2705

-

2710

)

,  . ,

The comparative method in evolutionary biology

,

1991

Oxford, UK

Oxford University Press

Bayesian inference for variance components using only error contrasts

,

Biometrika

,

1974

, vol.

61

 

(pg.

383

-

385

)

,  . 

Random sampling of constrained phylogenies: Conducting phylogenetic analyses when the phylogeny is partially known

,

Syst. Biol.

,

2001

, vol.

50

 

(pg.

628

-

639

)

,  ,  . 

The phylogenetic mixed model

,

Am. Nat.

,

2004

, vol.

163

 

(pg.

84

-

96

)

,  . 

Detecting correlation between characters in a comparative analysis with uncertain phylogeny

,

Evolution

,

2003

, vol.

57

 

(pg.

1237

-

1247

)

,  ,  ,  ,  ,  ,  ,  . 

Evolution of sexual size dimorphism in a Drosophila clade, the Dobscura group

,

Zoology

,

2006

, vol.

109

 

(pg.

318

-

330

)

,  ,  ,  ,  ,  . 

A comparative analysis of clinging ability among pad-bearing lizards

,

Biol. J. Linn. Soc.

,

1996

, vol.

59

 

(pg.

21

-

35

)

,  . 

Phylogenetic analysis of trophic associations

,

Am. Nat.

,

2006

, vol.

168

 

(pg.

E1

-

E14

)

,  . 

Statistics for correlated data: Phylogenies, space, and time

,

Ecol. Appl.

,

2005

, vol.

16

 

(pg.

20

-

32

)

,  ,  ,  ,  . ,

The theory and practice of econometrics

,

1985

2nd edition

New York

John Wiley and Sons

,  ,  . 

Shared and unique features of evolutionary diversification in Greater Antillean Anolis ecomorphs

,

Evolution

,

2006

, vol.

60

 

(pg.

362

-

369

)

Methods for the analysis of comparative data in evolutionary biology

,

Evolution

,

1991

, vol.

45

 

(pg.

1065

-

1080

)

,  . 

The statistical analysis of interspecific data: A review and evaluation of comparative methods

,

Phylogenies and the comparative method in animal behavior

,

1996

Oxford, UK

Oxford University Press

(pg.

22

-

75

)

,  . 

Phylogenies and the comparative method: A general approach to incorporating phylogenetic information into the analysis of interspecific data

,

Am. Nat.

,

1997

, vol.

149

 

(pg.

646

-

667

)

erratum 153:448

,  . 

Estimating ancestral states of a communicative display: A comparative study of Cyclura rock iguanas

,

Anim. Behav.

,

1998

, vol.

55

 

(pg.

1685

-

1706

)

MathWorks

,

Matlab, version 5.0

,

1996

Natick, MA

The MathWorks, Inc.

,  ,  . ,

Applied linear regression models

,

1989

Homewood, Illinois

Richard D. Irwin

,  . 

How mammals produce large-brained offspring

,

Evolution

,

1988

, vol.

42

 

(pg.

948

-

957

)

,  . 

The taxon-level problem in the evolution of mammalian brain size—Facts and artifacts

,

Am. Nat.

,

1988

, vol.

132

 

(pg.

344

-

359

)

,  . 

Taxonomic differences in the scaling of brain on body-weight among mammals

,

Science

,

1989

, vol.

244

 

(pg.

1589

-

1593

)

,  . 

Analysis of comparative data using generalized estimating equations

,

J. Theor. Biol.

,

2002

, vol.

218

 

(pg.

175

-

185

)

,  . 

Recovery of inter-block information when block sizes are unequal

,

Biometrika

,

1971

, vol.

58

 

(pg.

545

-

564

)

,  . 

Polytomies in comparative analyses of continuous characters

,

Syst. Biol.

,

1993

, vol.

42

 

(pg.

569

-

575

)

Linear relations in biomechanics: The statistics of scaling functions

,

J. Zool. Lond. A

,

1985

, vol.

206

 

(pg.

415

-

439

)

,  . 

Applications of phylogenetically independent contrasts: A mixed progress report

,

Oikos

,

1996

, vol.

77

 

(pg.

167

-

172

)

,  ,  ,  . 

Comparative analysis of phylogenetic and fishing effects in life history patterns of teleost fishes

,

Oikos

,

2006

, vol.

91

 

(pg.

255

-

270

)

Comparative methods for the analysis of continuous variables: Geometric interpretations

,

Evolution

,

2001

, vol.

55

 

(pg.

2143

-

2160

)

A comment on phylogenetic correction

,

Evolution

,

2006

, vol.

60

 

(pg.

1509

-

1515

)

,  . 

A conditional likelihood approach to residual maximum likelihood estimation in generalized linear models

,

J. R. Stat. Soc. Ser. B Methodol.

,

1996

, vol.

58

 

(pg.

565

-

572

)

,  . ,

Biometry

,

1981

2nd edition

New York

W. M. Freeman & Company

Appendix 1 Estimation Methods

This appendix summarizes the estimation techniques we use: iterative EGLS, ML, and REML. We also give an overview of our parametric bootstrapping approach. We have structured the models throughout the article so that the estimation techniques can be applied in a similar way for all models.

Iterative EGLS Estimation

Estimated generalized least-squares is an extension of GLS for the case in which unknown parameters occur in the covariance matrix (Judge et al., 1985:169–187). For the univariate case given in Equation 5, the problem is that Ψ (θ) = C + (σm2/σ2) M has an unknown parameter θ = (σm2/σ2) that rules out application of GLS. EGLS is a standard procedure, and our iterative approach is a simple extension in which EGLS is repeated to obtain successively better estimates.

The iterative EGLS is initiated by choosing a value of θ; zero is an obvious choice in most cases. Conditioned on this value of θ, GLS can be used to estimate the mean a and variance σ2 from Equations 3 and 4. Because σm2 is known, a new estimate of θ is σm2[(N − 1/N)

The variation that exists among individuals within a species is termed interspecific variation.
2m]−1; here, the estimate of σ2 is multiplied by (N − 1)/N to give the maximum likelihood estimate, rather than the unbiased estimate given by Equation 4. This new value of θ is used to update Ψ (θ) = C + θM, and the procedure is iterated until estimates of a and σ2 converge.

The multivariate models are slightly more complex, and they employ a method-of-moments approach. For the case of regression (Equation 12), let Ψ x (θx) = Cx+ θ xMx and Ψy (θxy, θy) = Cy+ θ xyCx+ θyMy, where θx = σmx2/σx2, θxy = b12 σ x2/σy2, and θy = σmy2/σy2. With initial values of θx, θxy, and θy to give Ψx (θx) and Ψ y(θxy, θy) estimates of ax and σx2, and ay and σy2 are computed from Equations 3 and 4. Using the method of moments, an estimate of b1 is given by

The variation that exists among individuals within a species is termed interspecific variation.

These estimates of σx2, σy2, and b1 are then used to update θx, θxy, and θy, and the procedure is iterated until the estimates converge.

Approximate confidence intervals for the parameter estimates can be obtained using standard GLS formulae, treating Ψ(θ) as known (i.e., ignoring the uncertainty arising from estimating parameters θ). In addition, parametric bootstrapping can be used, which is particularly convenient for iterative EGLS because EGLS takes relatively little computing time. The statistical properties of EGLS estimators can be complex, but in general the EGLS estimates of a and b1 will be consistent, meaning that for large enough sample size the probability distributions of the estimators converge to the true parameter values (Judge et al., 1985:175–177). In practice, we have found that the iterative EGLS estimates are often similar to ML and REML estimates, and the iterative EGLS outperforms ML and REML for some tasks, such as estimating the phylogenetic signal K*.

ML Estimation

Maximum likelihood estimation consists of using the matrix Ψ (θ), containing parameters θ that need to be estimated, to construct the likelihood function, and then maximizing the likelihood function to obtain those parameter values for which the observed data set is the most likely. The log-likelihood function is

The variation that exists among individuals within a species is termed interspecific variation.

For the univariate case, a = a is a scalar giving the phylogenetic mean, and Z is the N × 1 vector of ones. For correlation of p traits, a is the 1 × p vector containing the means of the traits, and Z is the pN × p matrix given by Ip× p⊗1N×1 where Ip× p is the p ×p identity matrix, 1N×1 is a N × 1 vector of ones, and ⊗ denotes the Kronecker (inner) product. For regression with p−1 independent variables, a is the 1 ×p vector containing the intercept and slopes, and Z is the N ×p matrix containing ones in the first column and the independent variables in the remaining columns. To speed the numerical maximization of the log-likelihood function, it is possible to concentrate the likelihood function by replacing a and σ2 by their GLS estimates conditioned on θ and maximizing over θ alone. To account for the known bias of ML estimates of variances, we multiplied the ML estimates of variances by N/(N –p) to obtain a less biased estimate. Finally, maximization must be restricted so that estimates of variances are greater than zero.

Special considerations are necessary for the case of correlation to guarantee that the covariance matrix σ2Ψ(θ) remains positive definite while the log-likelihood function is maximized. Rather than maximize over the correlation coefficients for the multivariate case, we instead maximized over the parameters qij defined such that the correlation matrix R(θ) = Q(θ)2, where Q(θ) is symmetric having values of 1 along the diagonal and values of qij in the off-diagonals. Because R(θ) is the square of a symmetric matrix, it is necessarily positive definite. After maximizing over values of qij constrained to be between −1 and 1, estimates of the correlation coefficients are given as the off-diagonal elements of R(θ).

An advantage of ML estimation is that it can be used to provide approximate confidence intervals for the parameter estimates. In particular, let I(ϕ) =

The variation that exists among individuals within a species is termed interspecific variation.
L(ϕ |X) be the observed information matrix calculated at the ML parameter estimates ϕ. Then the approximate covariance matrix for the parameter estimates ϕ is given by I(ϕ)− 1. From this, standard errors and asymptotic confidence intervals can be calculated (Judge et al. 1985:177–182). Unfortunately, this is only an asymptotic result, and the accuracy of the approximation for small samples is generally unknown.

REML Estimation

Restricted maximum likelihood estimation is a variant of ML estimation in which the likelihood function is partitioned into components, allowing estimation of variance parameters in the model independently from the parameters involving means (Patterson and Thompson, 1971; Cooper and Thompson, 1977; Smyth and Verbyla, 1996). The marginal log-likelihood function from which variance parameters are estimated is (Harville, 1974)

The variation that exists among individuals within a species is termed interspecific variation.

where

The variation that exists among individuals within a species is termed interspecific variation.
GLS is the GLS estimate of a, and the remaining terms are as defined for ML estimation.

In general, we found that REML estimates of the variance parameters were less biased than those obtained from ML and iterative EGLS. Estimates of the parameters involving means (e.g., a, b0, and b1), however, were very similar to those obtained from ML and iterative EGLS. Confidence intervals for parameters involving means can be approximated using the formulae for standard GLS with the fitted covariance matrix σ2 Ψ (θ) or asymptotically using the information matrix.

Parametric Bootstrapping

For all estimation approaches, we used parametric bootstrapping to obtain confidence intervals. Parametric bootstrapping is useful because it not only produces confidence intervals but also identifies bias in the estimates. We performed parametric bootstrapping for a given model by first estimating parameters and then using the model fitted with these estimates to simulate 2000 data sets. For each of the simulated data sets, we estimated the parameters of the model. The resulting 2000 sets of parameter estimates approximate the distribution of the parameter estimators under the assumption that the model (with its fitted parameter values) is correct. Thus, the 95% inclusion intervals for the 2000 estimates give the approximate 95% confidence intervals for the parameter estimates. Bias in the estimates is revealed if the mean of the 2000 simulated parameter estimates differs substantially from the parameter values estimated from the data and used to construct the simulated data sets.

Appendix 2 Log-transforming Data with Measurement Error

Most empirical comparative studies report means and standard errors for species on the arithmetic scale. However, for many problems, such as allometric analyses, data need to be log-transformed before analysis. When measurement error exists, log-transformation requires not only transforming the standard error of the measured values but also the mean of the measured values. Assuming that the observed values follow a log-normal distribution with mean m and variance v, the mean and variance of the log-transformed data are

The variation that exists among individuals within a species is termed interspecific variation.
= log (m) −
The variation that exists among individuals within a species is termed interspecific variation.
and σ2 = log (
The variation that exists among individuals within a species is termed interspecific variation.
+ 1). These relationship can be derived directly from the probability density function of a log-normal distribution.

Appendix 3 Measurement Error with Small Sample Sizes

When sample sizes used to estimate trait values for some or all species are small, measurement error can be obtained by averaging among species. Suppose there are N species with standard errors SEi (i = 1, …, N) taken from ni individuals. Assume that each observation on each individual is measured with the same error. Let

The variation that exists among individuals within a species is termed interspecific variation.
i =
The variation that exists among individuals within a species is termed interspecific variation.
SEi, and let
The variation that exists among individuals within a species is termed interspecific variation.
2 =
The variation that exists among individuals within a species is termed interspecific variation.
∑i = 1N
The variation that exists among individuals within a species is termed interspecific variation.
i2. Then the standard error for a species with ni individuals estimated by pooling species is
The variation that exists among individuals within a species is termed interspecific variation.
(ni) =
The variation that exists among individuals within a species is termed interspecific variation.
/
The variation that exists among individuals within a species is termed interspecific variation.
. For data sets in which one or more species are represented by a single individual (ni = 1), these species can be excluded from the calculations and then assigned standard errors
The variation that exists among individuals within a species is termed interspecific variation.
(1) =
The variation that exists among individuals within a species is termed interspecific variation.
.

As an example, suppose there are two species with standard errors 2 and 3, and sample sizes 16 and 9. Then

The variation that exists among individuals within a species is termed interspecific variation.
2 =
The variation that exists among individuals within a species is termed interspecific variation.
(
The variation that exists among individuals within a species is termed interspecific variation.
× 2 +
The variation that exists among individuals within a species is termed interspecific variation.
× 3) =
The variation that exists among individuals within a species is termed interspecific variation.
, and the standard errors estimated from pooling species are
The variation that exists among individuals within a species is termed interspecific variation.
(16) = 2.125 and
The variation that exists among individuals within a species is termed interspecific variation.
(9) = 2.833.

Appendix 4 Demonstration that Equation 13 is the Model of Rayner (1985)

In the general structural equation model of Rayner (1985), the slope b1 of the relationship between two traits x and y is the value that minimizes the residual sum-of-squares given by

The variation that exists among individuals within a species is termed interspecific variation.

where Sxx, Sxy, and Syy are the sums-of-squares of the values X and Y, and Sξ ξ, Sξ η, and Sη η are the variances and covariances of the error terms. Here we derive the same expression from Equation 13 for the case when there is no phylogenetic correlation (Cγ x = Cx = Cy = I) and measurement error is zero (Mx = My = Mxy = 0). In this case, Sξ ξ = σx2Sξ η = rσxσy, and Sη η = σy2.

Let Z be the N × 2 matrix whose first and second columns consist of X

The variation that exists among individuals within a species is termed interspecific variation.
and Y
The variation that exists among individuals within a species is termed interspecific variation.
. From Equation 13, the covariance matrix of X and Y is given by

The variation that exists among individuals within a species is termed interspecific variation.

From this, the residual sum-of-squares, after some algebra, is

The variation that exists among individuals within a species is termed interspecific variation.

The least squares estimates of b1 and σγ2 are those values that minimize the value of S2. Taking the derivatives of S2 with respect to b1 and σ γ2 and setting them equal to zero,

The variation that exists among individuals within a species is termed interspecific variation.

The value of b1 that minimizes the sum-of-squares given by Rayner (Equation A4) is

The variation that exists among individuals within a species is termed interspecific variation.

Thus, the least-squares estimate of b1 computed for the model given by Equation 13 in the text equals the least-squares estimate of b1 computed from Rayner's function (Equation A4).

© 2007 Society of Systematic Biologists

© 2007 Society of Systematic Biologists

Associate Editor: Todd Oakley

Todd Oakley

Associate Editor

Search for other works by this author on:


What does the term endothermic refer to?

Definition of endothermic 1 : characterized by or formed with absorption of heat. 2 : warm-blooded.

What is the term for traits that reflect specific evolutionary lineages which can be informative of evolutionary relationships?

Term. Ancestral (primitive) Definition. Traits that reflect specific evolutionary lineages and can be informative of evolutionary relationships.

What is a major underlying factor for the declining numbers of nonhuman primates?

Humanity's population expansion is the main cause for the extinction threat, with 5 billion humans living in countries with primates. Habitat loss due to logging, mining and agriculture; hunting; the illegal pet trade; and climate change are all top reasons for the decline, Garber said.