# DNA methylation profile in beef cattle is influenced by additive genetics and age

### Population resource and data collection

All animal procedures implemented in this study were approved by the University of Nebraska-Lincoln Animal Care and Use Committee. All methods were performed in accordance with relevant guidelines and regulations. Methods are reported in the manuscript following the recommendations in the ARRIVE guidelines.

The blood obtained from 136 cows was centrifuged at 2500 × g for 10 min at room temperature. The buffy coat was collected and stored at −80 °C. All samples were collected from October 2 to November 27 of 2018 at the Eastern Nebraska Research and Extension Center at the University of Nebraska-Lincoln. Cows were chosen to provide variation in age and ranged from 217 to 3,192 days (0.6 to 8.7 years) of age at the time of sample collection. Animal were from an admixed population comprised of purebred Angus and composites of differing proportions of Angus, Simmental, and Red Angus.

All animals were genotyped with the medium-density Illumina Bovine SNP50 (~50 K SNPs) BeadChip (Illumina, San Diego, CA, USA). Genotype filtering included removing non-autosomal SNPs, SNPs with minor allele frequency < 0.02, and SNPs with Hardy–Weinberg equilibrium P-value > 10−5.

### DNA methylation

DNA was extracted from buffy coats using the DNeasy Blood and Tissue Kit (Qiagen, Cat No. 69506) and, subsequently, bisulfite-converted using the EZ DNA Methylation Kit (ZymoResearch, Irvine, CA, USA). DNAm data was obtained from bisulfite-treated samples using the mammalian array (HorvathMammalMethylChip40)31. DNAm level for each site was calculated as methylation β-value (β-value = intensity of the methylated allele/intensity of the unmethylated allele + intensity of the methylated allele + 100). The addition of 100 was used to stabilize β-values ​​when both intensities of the methylated and unmethylated alleles were small32. The SeSAMe pipeline33 was used to generate normalized β-values ​​and for quality control. The β-value has severe heteroscedasticity outside the intermediate methylation range; thus, a logit transformation of the β-values ​​(M-values) was used to approximate homoscedasticity. M-values ​​of 0 correspond to 50% of methylation, and positive and negative values ​​correspond to greater and lesser than 50% methylation level, respectively. M-values ​​were used to quantify DNAm level by region (ie, promoter, 5′ and 3′ UTR, exonic, intronic, and intergenic) and location related to CpG islands (ie, within or outside). DNA methylation load was calculated as the sum of all DNAm levels (M-value).

DNAm status of each site was determined by the distribution of the methylation β-values. For example, β-values ​​below, within, and above 2 standard deviations were classified as unmethylated (0), intermediately methylated (1), and methylated (2), respectively. mDNA status was used for prediction purposes.

### Statistical analyzes

#### Genetic parameters for DNAm level and DNAm load

Genetic parameters (ie, additive genetic and residual variances) for DNAm level (M-values) and DNAm load were estimated using the following animal model fitted in a Bayesian genomic best linear unbiased prediction (GBLUP) framework.

$${varvec{y}}={varvec{X}}{varvec{b}}+{varvec{Z}}{varvec{u}}+{varvec{e}}$$

where ({varvec{y}}) corresponds to the vector of phenotypes (DNAm level or DNAm load); ({varvec{X}}) corresponds to the design matrix linking the fixed effects to the phenotypes; ({varvec{b}}) corresponds to the vector of fixed effects including the intercept, the linear and quadratic (DNAm only) covariates of age, and covariates for proportion of each breed; ({varvec{Z}}) corresponds to the incidence matrix linking the random animal effect to the phenotypes; ({varvec{u}}) corresponds to the vector of random animal effects, where ({varvec{u}}) ~N(0, G ({sigma }_{u}^{2})), where G corresponds to the the genomic relationship matrix constructed following the first method of VanRaden3. 4 and ({sigma }_{u}^{2}) corresponds to the additive genetic variance; and ({varvec{e}}) corresponds to the vector of random residual effects associated with the phenotype, where ({varvec{e}}) ~N(0, Yo ({sigma }_{e}^{2})), where I corresponds to the identity matrix and ({sigma }_{e}^{2}) corresponds to the residual variance. The htwo was obtained as the ratio of additive genetic variance divided by phenotypic variance (additive genetic variance + residual variance).

Gibbs sampling was used to sample a posterior parameter distribution with a chain length of 20,000 iterations, burn-in of 2,000 samples, and a thinning interval of 100. Analyzes were performed using the BGLR packages35 in R software.

#### Age effect on mDNA load and age prediction

The effect of age on DNAm load was estimated by fitting an exponential regression of DNAm load on the age of animals (years). DNAm status was included as a variable to predict the age of animals using five Bayesian regression models: BRR29Bayes A29BayesB29BayesCπ36and Bayesian LASSO (BLASSO)37as follows:

$${varvec{y}}={varvec{X}}{varvec{b}}+sum_{i=1}^{k}{{varvec{m}}}_{{varvec {i}}}{boldsymbol{alpha }}_{{varvec{i}}}{{varvec{updelta}}}_{{varvec{i}}}+{varvec{e} }$$

where ({varvec{X}}) and ({varvec{e}}) have been previously described; ({varvec{y}}) corresponds to the vector of ages in years; ({varvec{b}}) corresponds to the vector of fixed effects, including the intercept and the linear covariates for each breed; ({{varvec{m}}}_{{varvec{i}}}) is the vector of mDNA status for site Yo (coded as 0, 1, and 2); ({boldsymbol{alpha }}_{{varvec{i}}}) i is the effect of DNAm status for site Yo for each of k sites; ({{varvec{updelta}}}_{{varvec{i}}}) is an indicator whether mDNA status site Yo was included (({{varvec{updelta}}}_{{varvec{i}}}) = 1) or excluded (({{varvec{updelta}}}_{{varvec{i}}}) = 0) from the model for a given iteration of gibbs sampling (BayesRR and BayesA, ({{varvec{updelta}}}_{{varvec{i}}}) = 1). In BayesRR and BayesCπ, the effect of DNAm status is assumed to follow a normal distribution. In BayesA and BayesB, the effect of DNAm status is assumed to follow a you-distribution with site-specific variances. In Bayes Lasso, the effect of DNAm status is assumed to follow a double-exponential distribution.

Gibbs sampling was used to sample a posterior parameter distribution with a chain length of 20,000 iterations, burn-in of 2,000 samples, and a thinning interval of 100. Bootstrapping (n= 400) was used to evaluate the performance of the models with 102 and 34 individuals as training and validation populations, respectively. The performance of the models was evaluated based on the correlation between true and predicted age, mean square error, and slope of the regression of predicted age on true age. Analyzes were performed in BGLR package35 in R software.