Mathematical Modeling of Circadian Rhythms
In all organisms studied so far, circadian rhythms that allow adaptation to a periodically changing environment originate from negative autoregulation of gene expression. Scheper et al. [
10] illustrated and analyzed the generation of a circadian rhythm as a process involving a reaction cascade containing a loop, as depicted in Fig.
1A. The reaction loop consists in the production of the effective protein from its mRNA and negative feedback from the effective protein on mRNA production. The protein production process involves translation and subsequent processing steps such as phosphorylation, dimerization, transport and nuclear entry. It is assumed that the protein production cascade and the negative feedback are nonlinear processes in the reaction loop (Fig.
1B), with a time delay between protein production and subsequent processing. These nonlinearities and the delay critically determine the free-running periodicity in the feedback loop.
Scheper et al. [
10] proposed a system of coupled differential equations to describe the circadian behavior of the intracellular oscillator:

where M and P are, respectively, the relative concentrations of mRNA and the effective protein measured at a particular time, rM is the scaled mRNA production rate constant, rP is the protein production rate constant, qM and qP are, respectively, the mRNA and protein degradation rate constants, n is the Hill coefficient, m is the nonlinear exponent in the protein production cascade, τ is the total duration of protein production from mRNA, and k is a scaling constant.
Equation 1 constructs an unperturbed (free-running) system of the intracellular circadian rhythm generator that is defined by seven parameters, Θu = (n, m, τ, rM, rP, qM, qP, k). The behavior of this system can be determined and predicted by changes in these parameter combinations. For a given QTL, differences in the parameter combinations among genotypes imply that this QTL is involved in the regulation of circadian rhythms. Statistical models will be developed to infer such genes from observed molecular markers such as single nucleotide polymorphisms (SNPs).
Statistical Modeling of Functional Mapping
Suppose a random sample of size N is drawn from a natural human population at Hardy-Weinberg equilibrium. In this sample, multiple SNP markers are genotyped, with the aim of identifying QTL that affect circadian rhythms. The relative concentrations of mRNA (M) and the effective protein (P) are measured in each subject at a series of time points (1, ..., T), during a daily light-dark cycle. Thus, there are two sets of serial measurements, expressed as [M(1), ..., M(T)] and [P(1), ..., P(T)]. According to the differential functions (1), these two variables, modeled in terms of their change rates, are expressed as differences between two adjacent times, symbolized by
y = [M(2) - M(1),..., M(T) - M(T - 1)]
= [y(1),..., y(T - 1)]
for the protein change and
z = [P(2) - P(1),..., P(T) - P(T - 1)]
= [z(1),..., z(T - 1)]
for the mRNA change.
Assume that a putative QTL with alleles A and a affecting circadian rhythms is segregated in the population. The frequencies of alleles A and a are q and 1 - q, respectively. For a particular genotype j of this QTL (j = 0 for aa, 1 for Aa and 2 for AA), the parameters describing circadian rhythms are denoted by Θuj = (nj, mj, τj, rMj, rPj, qMj, qPj, kj). Comparisons of these quantitative genetic parameters among the three different genotypes can determine whether and how this putative QTL affects circadian rhythms.
The time-dependent phenotypic changes in mRNA and protein traits for individual i measured at time t due to the QTL can be expressed by a bivariate linear statistical model

where ξij is an indicator variable for the possible genotypes of the QTL for individual i, defined as 1 if a particular QTL genotype j is indicated and 0 otherwise, uMj(t) and uPj(t) are the genotypic values of the QTL for mRNA and protein changes at time t, respectively, which can be determined using the differential functions expressed in equation (1), and
(t) and
(t) are the residual effects in individual i at time t, including the aggregate effect of polygenes and error effects.
The dynamic features of the residual errors of these two traits can be described by the antedependence model, originally proposed by Gabriel [12] and now used to model the structure of a covariance matrix [13]. This model states that an observation at a particular time t depends on the previous ones, the degree of dependence decaying with time lag. Assuming the 1st-order structured antedependence (SAD(1)) model, the relationship between the residual errors of the two traits y and z at time t for individual i can be modeled by

where
k and ψk are, respectively, the antedependence parameters caused by trait k itself and by the other trait, and
(t) and
(t) are the time-dependent innovation error terms, assumed to be bivariate normally distributed with mean zero and variance matrix

where
(t) and
(t) are termed time-dependent innovation variances. These variances can be described by a parametric function such as a polynomial of time [14], but are assumed to be constant in this study. ρ(t) is the correlation between the error terms of the two traits, specified by an exponential function of time t [14], but is assumed to be time-invariant for this study. It is reasonable to say that there is no correlation between the error terms of two traits at different time points, i.e. Corr(
(ty),
(tz)) = 0 (ty ≠ tz).
Based on the above conditions, the covariance matrix (Σ) of phenotypic values for traits y and z can be structured in terms of
y,
z, ψy, ψz and Σε(t) by a bivariate SAD(1) model [15,16]. Also, the closed forms for the determinant and inverse of Σ can be derived as given in [15,16]. We use a vector of parameters arrayed in Θv = (
y,
z, ψy, ψz, δy, δz, ρ) to model the structure of the covariance matrix involved in the function mapping model.
Likelihood
The likelihood of samples with 2(
T 1)-dimensional measurements,
.gif)
, for individual
i and marker information, M, in the human population affected by the QTL is formulated on the basis of the mixture model, expressed as

where the unknown parameters include two parts, ω = (ωj|i) and Θ = (Θuj, Θv). In the statistics, the parameters ω = (ωj|i) determine the proportions of different mixture normals, and actually reflect the segregation of the QTL in the population, which can be inferred on the basis of non-random association between the QTL and the markers. For a mapping population, N progeny can be classified into different groups on the basis of known marker genotypes. Thus, in each such marker genotype group, the mixture proportions of QTL genotypes (ωj|i) can be expressed as the conditional probability of QTL genotype j for subject i given its marker genotype.
Suppose that this QTL is genetically associated with a codominant SNP marker that has three genotypes, MM, Mm and mm. Let p and 1 - p be the allele frequencies of marker alleles M and m, respectively, and D be the coefficient of (gametic) linkage disequilibrium between the marker and QTL. According to linkage disequilibrium-based mapping theory [17], the detection of significant linkage disequilibrium between the marker and QTL implies that the QTL may be linked with and, therefore, can be genetically manipulated by the marker. The four haplotypes for the marker and QTL are MA, Ma, mA and ma, with respective frequencies expressed as p11 = pq + D, p10 = p(1 - q) - D, p01 = (1 - p)q - D and p00 = (1 - p)(1 - q) + D. Thus, the population genetic parameters p, q, D can be estimated by solving a group of regular equations if we can estimate the four haplotype frequencies. The conditional probabilities of QTL genotypes given marker genotypes in a natural population can be expressed in terms of the haplotype frequencies (see [18]).
In the mixture model (4),
is the unknown vector that determines the parametric family fj, described by a multivariate normal distribution with the genotype-specific mean vector

and the covariance matrix Σ. While the mean vector is determinedby genotype-specific parameters, Θuj = (nj, mj, τj, rMj, rPj, qMj, qPj, kj), j = (2,1,0) the covariance matrix is structured by common parameters, Θv = (
y,
z, ψy, ψz, δy, δz, ρ).
Algorithm
Wang and Wu [
18] proposed a closed form for the EM algorithm to obtain the maximum likelihood estimates (MLEs) of haplotype frequencies
p11,
p10,
p01 and
p00, and thus the allele frequencies of the marker (
p) and QTL (
q) and their linkage disequilibrium (
D). Genotype-specific mathematical parameters in
uj (5) for the two differential functions of circadian rhythms, and the parameters that specify the structure of the covariance matrix,
Σ, can be theoretically estimated by implementing the EM algorithm. But it would be difficult to derive the log-likelihood equations for these parameters by this approach because they are related in a complicated nonlinear way. The simplex algorithm, which relies only upon a target function, has proved powerful for estimating the MLEs of these parameters [
19] and will be used in this study. As discussed above, closed forms exist for the determinant and inverse and should be incorporated into the estimation process to increase computational efficiency.