Breeding for Nitrogen use Efficiency: Lessons from Two Genomic Prediction Experiments Within a Broad-based Population of Upland Rice
Received: 23-Jul-2021 / Accepted Date: 02-Sep-2021 / Published Date: 09-Sep-2021 DOI: 10.4172/2375-4338.1000264
Abstract
In the context of subsistence farming in Madagascar, upland rice producers have limited access to mineral fertilizers and yields remain very low. Genetic improvement of yield through the nitrogen use efficiency (NUE) component is an avenue to be explored in breeding programs. A recent GWAS study carried out on an upland rice diversity panel allowed to detect genomic regions involved in NUE variability, nevertheless these regions explained only a moderate part of the total genetic diversity observed in the panel. We investigated the potential of genomic prediction for NUE in order to optimize our upland rice breeding program for this trait. We evaluated the predictive ability of genomic prediction using two validation experiments. The first consisted of a standard cross-validation with 5-fold subdivision of the diversity panel (DP) and the second consisted of an independent experiment involving a breeding population (BP) derived from the DP. The DP was structured into five genetic clusters of different sizes and with some degree of admixture, while the BP was composed of two main clusters. The best prediction ability for NUE was obtained in crossvalidation within the DP. The predictive ability in the independent validation experiment was weak (r = 0.25), about three times less than those obtained in the cross-validation. The low kinship between DP and BP, different genetic structures and slightly different LD patterns probably explains the low predictive ability of across populations genomic prediction. Practical implications for our rice breeding program are discussed.
Keywords: Genomic Prediction; Nitrogen use Efficiency; Upland Rice
Introduction
Rice (Oryza sativa L.) is one of the most important food crops worldwide [1]. Its improvement for higher yield in sustainable agriculture systems is also vital to meet energy and nutritional needs of the growing world population [2]. In Madagascar, upland rice cultivation is developing in the high plateau area where soil fertility is generally low [3]. The use of mineral fertilizer is uniformly low because of its high cost in a subsistence farming context [4]. As a consequence, animal manure is often the only source of fertilizer [5]. There is an urgent need to find alternatives to increase yields without massive use of mineral fertilizers. One option consists in improving the nitrogen use efficiency (NUE) [6].
NUE plays a central role in low-input systems [7]. NUE is usually defined as grain yield per unit of N supplied by soil and by fertilizer [8]. NUE and its components are complex quantitative traits and the study of their relationship with other simpler morphological or physiological traits could help understand the mechanisms involved in NUE and identify ways to improve selection efficiency. Genetic variability of both N uptake efficiency and N utilization efficiency has been demonstrated in a large number of crops conducted a genome-wide association study (GWAS) for 16 NUE-related agronomic traits and yield components using a diversity panel (DP) of 190 mainly japonica varieties and a set of 38,390 SNPs [9,10]. Few association signals were identified for NUE corresponding to two haplotype groups and one isolated SNP on chromosomes (Chrs.) 6, 7 and 11, explaining 9.5%, 9.6% and 10.4% of phenotypic variation, respectively.
In such a context, genomic selection (GS) should be considered for breeding rice varieties with improved NUE because it encompasses all marker information and can therefore better capture variation that arises from low effect QTLs [11,12] . Defined GS as a combination of genetic markers covering the entire genome with different statistical methods to maximize the efficiency of selection. The principle of GS is (i) to combine molecular markers and phenotypes of individuals within a calibration population where the effects of all markers are estimated simultaneously and then (ii) to predict the genomic estimated breeding value (GEBV) for individuals of the breeding population that are only genotyped [13,14]. The GEBV is then used to select the best candidates to generate new elite material. Several methods are available to build prediction models [15-17]. They differ in the assumptions about the markers’ effects and the variance of these effects on the observed trait variation. The general conclusion of several empirical studies is that there are no perfect statistical methods [18]. Beyond the statistical methods, the predictive ability of genomic predictions also depends on the characteristics of the target populations (markers density and their distribution in the genome, size and structure of the calibration population, relatedness between calibration and validation populations, respective populations’ LD, minor allele frequency) and traits (heritability and correlations between traits) [18-20].
GS is expected to accelerate genetic gain for traits such as yield potential and adaptation to climate change constraints and resource use efficiency including nitrogen [21]. GS studies in rice have mainly explored the cross-validation approach within diversity panels for different target traits in more or less advanced generations or lines (Table 1). To our knowledge, NUE has not yet been studied in GS; however [20]. studied the nitrogen index (NI), they defined as foliar nitrogen content, between 7 to 10 days after flowering. They used a diversity panel (DP) to calibrate the prediction equation and used it to predict 97 advanced lines from biparental crosses between 31 founders sampled among 284 DP accessions. The breeding population was advanced following a pedigree breeding process until F5 to F7. Reasonably high predictive abilities were obtained, i.e. 0.54 for panicle weight and 0.33 for nitrogen index. The same strategy was applied in [22]. The calibration population consisted of 228 accessions and the validation population consisted of 95 advanced lines (F5-F7). The predictive ability of GEBV between populations was quite high ranging from 0.43 (FL-As) to 0.48 (CG-As). Such results were obtained because the relationship between the two populations were sufficiently strong [20,23].
Plant material | Phenotypic traits | Number of markers SNPs | Applied statistical methods | Precision of GEBV | Main conclusions | References |
---|---|---|---|---|---|---|
278 in F2 | GY, TL, TGW and GP | 250 000 SNPs | GBLUP, LASSO and SSVS | GY: 0.09-0.13 TL: 0.21-0.23 TGW: 0.67-0.69 GP: 0.35-0.37 |
The GBLUP tends to perform better if the trait is largely polygenic, compared to the other two genomic prediction models | [22] |
354 in S3:4 | DFl, PH, PW and GY | 8,336 | RR-BLUP, GBLUP, BRR, Lasso and BL | DFl: 0.18 - 0.40 PH: 0.46 - 0.62 PW:0.20-0.45 GY: 0.16 – 0.46 |
The predictive ability of GEBV is affected by the correlation between the calibration and validation population, trait inheritance, prediction models, LD, MAF, and the composition of the calibration population | [18] |
284 in F5-F7 | DFl, PW and NI | 43,686 | GBLUP, RKHS and BayesB | DFl: 0.51-0.65 PW: 0.54-0.62 NI: 0.52-0.57 |
The predictive ability of the prediction varies considerably depending on the composition of the calibration population, the trait, LD and the frequency of minor alleles (MAF) | [20] |
228 in F5-F7 | FL-As, CG-As/FL-As and CG-As | 22,370 | GBLUP, BayesA and RKHS | FL-As:0.45 CG-As/FL-As:0.33 CG-As:0.53 |
Excluding the most redundant SNP markers based on LD information had a limited effect on predictive ability. The level of predictive ability was similar to levels reported in the literature. | [23] |
1495 in F1 | GY, GN, TGW, PH, GL and GW | 102 795 SNPs | GBLUP | GY: 0.54 GN: 0.62 TGW: 0.54 PH: 0.58 GL: 0.92 GW: 0.87 |
The accuracy is better when the two populations are linked. The accuracy of the GEBV is of the same order of magnitude as the heritability of traits | [24] |
Phenotypic traits: CGAs, dehulled grain; DFl, Flowering date; FL-As, Flag Leaf; GL, Grain Length; GN,Grain Number; GP; Grains per Panicle; GW, Grain Width; GY, Grain Yield; NI, Nitrogen Index; PB, Primary Branches per panicle; PH, Plant Height; PL, Panicle Length; PN, Panicles per Plant; PW, Panicle Weight; SB, Secondary Branch;SP,Seeds per Panicle; TGW, Thousand Grain Weight; TL,Tillering; ZN, grain zinc concentration.
Table 1: Research in Genomic Selection of rice.
Here, we report an empirical study that evaluates the predictive ability of GS to predict NUE, using an upland rice diversity panel to calibrate the model, and advanced lines from a synthetic multi-parent population to validate the model. We explored the potential of GS using both cross-validation and across populations genomic prediction experiments. We sampled 184 accessions from the working collection of the upland rice breeding program, hereafter called diversity panel (DP) used by and 198 advanced breeding lines, from a synthetic population, hereafter referred to as the breeding population (BP) [10]. We (1) evaluated the predictive ability of genomic prediction for yield and NUE through cross-validation within the DP, (2) analyzed the phenotypic characteristics (genetic variance, heritability, correlations between traits) of the BP, in comparison with the DP and (3) assessed the ability of the DP to predict the breeding value in the BP. We finally discussed our results in the lights of some well-known breeding practices.
Materials and Methods
Plant material
The plant material is composed of two rice (Oryza sativa) populations of diverse geographic origins (tropical and temperate japonica subspecies).
The first population corresponds to a subsample of the diversity panel (DP) studied by it includes 184 accessions and represents the working collection of the upland rice breeding program of the National Centre of Applied Research and Rural Development (FOFIFA) and French Agricultural Research Centre for International Development (CIRAD) [10]. Eighty accessions of the DP are of Malagasy origin and 104 are of diverse geographical origins, including Brazil, Colombia, Ivory Coast, China and other Asian countries. DP accessions constitute the basis of the bi-parental crosses that form the starting point of the pedigree breeding scheme within the breeding program. The DP is representative of the global diversity of tropical japonica subspecies [10].
The second population, corresponds to a breeding population (BP) composed of 198 S4:6 lines representing a sample of advanced lines derived from a synthetic population. The population was developed by crossing a sample of an existing synthetic population (PCT 11) segregating for a recessive genic male sterility (ms) gene and twelve founders from the DP used as males. First, the 12 founders were crossed to male sterile plants of PCT 11 in isolated plots [18]. Seeds that were fully filled were harvested in male sterile plants to obtain F1 seeds, with 100% heterozygous at the ms locus.
At the F2 generation, equal numbers of seeds from each cross were bulked together. The bulked seeds were then sown in one field to allow fertile plants to pollinate male sterile ones through random mating across all progenies. Again, seeds were harvested on male sterile plants. After two cycles of random mating, four generations of single-seed progeny (SSD) were run to obtain several hundred S3:4 plants, that constituted the starting point to (1) produce S4:5 and S4:6 progenies for the phenotyping and (2) sample a single S5:6 seed for each S3:4 line for the genotyping.
Experimental site and phenotyping
Field phenotyping was conducted at Ivory in the Mid-West region of Madagascar located at 19° 33'27''S, 46° 24'43''E at 960 m altitude, with red ferralitic soils, especially deficient in nitrogen (N). The DP was phenotyped during two successive rainy seasons, 2014-2015 and 2015-2016, for 16 NUE-related agronomic traits and yield components as described in [10]. The BP was phenotyped for the same traits in 2016-2017 and 2017-2018.
The field experiments consisted in four augmented alpha-lattice designs with two complete replications each. Replications were subdivided into a number of blocks depending on the population size and the dimensions of the available field. The two checks varieties (Nerica 4 and WAB 880-1-32-1-1-P2-HB-1) were replicated in each block. In the DP phenotyping experiments, two nitrogen treatments were applied within each block, half of which was under high nitrogen (HN) treatment (120 kg/hm2 N) and half under low nitrogen (LN) treatment (0 kg/hm2 N). In year-1, the size of each subplot was 1.8 m × 2.4 m (108 hills or holes made for sowing seeds) for the first replication and 1.4 m × 2.0 m (70 hills) for the second replication. In year-2, the size of each subplot was 1.8 m × 1.6 m (72 hills) in the first replication and1.2 m × 1.6 m (48 hills) in the second replication [10]. In the BP phenotyping experiments, only the LN treatment (0 kg/hm2 N) was applied. In year-1, the size of each subplot was 1.6 m x 2.0 m (80 hills) for the two repetitions. In year-2, the subplot size was 1.6 m x 2.0 m (80 hills) for the first replication and 1.4 m x 2.0 m (70 hills) for the second replication. Field preparation started with ox plowing followed by hand surfacing of the soil. In the context of subsistence farming, we considered only the LN condition in the analysis because it represents the farmers' current fertilization practices for upland rice. Four to six rice seeds were sown per hill with 20 cm × 20 cm spacing between hills. In the DP, right before sowing, the same base dressing of cattle manure (5 000 kg/hm2), triple superphosphate (69 kg/hm2 P2O5), potassium sulfate (62.4 kg/hm2 K2O) and dolomite (500 kg/hm2) was applied to each hill in all the plots [10]. In the BP, right before sowing, the same base dressing was applied except for triple superphosphate (150 kg/hm2 P2O5) and potassium sulfate (130 kg/hm2 K2O).
The target traits for both DP and BP were time (in days) from sowing to flowering (DFl), thousand grain weight (TGW), grain yield (GY), grain nitrogen content (GNC) and nitrogen use efficiency (NUE). The majority of these traits (DFl, TGW and GNC) are already registered in the Crop Ontology. DFl was recorded as the number of days after sowing, when 50% of the plants in the plot has flowered. TGW was calculated from the weight of 200 harvested grains after 72 hours of oven drying at 60°C. GY was recorded by weighing dried grains in grams of each plot and extrapolating in kg/ha. GNC was measured in grinded grains, using near infrared spectrometry (NIRS) and a calibration equation developed by [10]. NUE was estimated as GY/nitrogen supply as proposed by [8].
Analysis of phenotypic data
For each population, the two-year experiments were analyzed jointly. The whole phenotypic data processing was based on the PROC MIXED procedure (SAS 9.4; SAS Institute Inc, Cary), including a model-based diagnostic analysis to identify and, if necessary, eliminate influential individuals (plots), as described by and [18,20,24].
The same mixed linear model was run on the cleaned data in order to extract BLUPs (Best Linear Unbased Prediction) and different variance components. The same linear mixed model was run for the DP and BP. The model was:
Ycijkl= μ + Al + Rjl + Tc + (arb)jkl+ g(T)ci + gy(T)cil+ ecijklwhere:
Ycijkl observed phenotypic value of genotype i in block k, nested in repetition j of year l,
μ experiment mean
Al effect of year l
Rjl effect of repetition j nested in year l
Tc effect of check c
(arb)jkl effect of block k nested in repetition j of year l, with (arb)jkl~ (0, σ2bl)
g(T)ci effect of genotype i nested in check c, with g(T)ci~ (0, σ2 g). For plots receiving either check (Nerica 4 or WAB), g(T)ci= 0, or g(T)ci ≠ 0 otherwise
gy(T)cil interaction between genotype i and year l nested in check c, with gy(T)cil~ (0, σ2 gy). As for g(T)ci, for plots receiving either check, gy(T)cil = 0, orgy(T)cil ≠ 0 otherwise.
ecijkl residual, with ecijkl~(0, σ2 e)
The variance components allowed to estimate the broad sense heritability (H²) and its standard error (SE) using the formula and the SAS script from as follows [25]:
with ny, the harmonic number of years and np, the harmonic number of plots across years respectively.
Genotyping and genotypic data
Genotypic information for the DP was retrieved from [10]. The BP population was genotyped specifically to fulfill the objective of the present study. Seeds of each S4:6 progeny originating from each S4 line of the BP were grown at the Cirad greenhouse (Montpellier, France). DNA were extracted from 30 mg of young leaves of a single S5:6 plant for each S3:4 line three weeks after seedling following the protocol describe in [26]. Two hundred ng of the extracted DNA was used with Apek I enzyme to obtain a DNA fragment library to perform genotyping by sequencing (GBS) compatible with Illumina Inc_sequencers of latest generation (HiSeq™ 2000). The GBS methodology consists in digesting genomic DNAs with a restriction enzyme, ligating adapters comprising barcodes, pooling all these fragments, amplifying by PCR this pool, and estimating the quality of bank before sequencing it [27]. The complete sequence of Nipponbare rice (Os-Nipponbare-Reference-IRGSP-1.0) was used as a reference for alignment with Bowtie2 using default parameters.
The SNP calling was done with Tassel GBS pipeline v5.2.37 without any filtering. The molecular information is stored and managed through a web-based tool, Gigwa-Genotype investigator for genomewide analyses [28,29]. A first filtering was applied to data from the two populations simultaneously to obtain SNPs, with minor allele frequency (MAF) higher than 1% and maximum missing data below 20%. Missing data were imputed separately per population using BEAGLE 5.0 R40B then filtered for heterozygosity (Ho ≤ 30%), minor allele frequency (MAF ≥ 1%) and imputation rate (≤ 20%). A total of 87 089 SNPs to the two populations was finally obtained [30].
Genetic characterization of the populations
Populations were characterized individually using 87,089 SNPs: First, minor Allele Frequency (MAF), observed heterozygosity (Ho) and the kinship between/within population were calculated with Tassel software version 5.0. Later, diversity tree was drawn on Darwin version 6.0.14 [31]. To provide a visualization of the structure among and within populations. Next, we used Fst to determine the degree of differentiation between DP and BP. The package hierfstat proposed by was used [32]. A filter on MAF >5% on the 87,089 SNPs was performed and allowed the selection of 46,055 SNPs. Then, the removal of strictly redundant SNPs allowed to refine the SNP set to 29,847 SNPs to reduce the size of the genotypic matrix and the computation time without information loss. Afterwards, the Landscape and Ecological Association Studies (LEA) package was used to analyze the structure within and among populations [33]. We arbitrary used the probability threshold of 0.6 to assign individuals to the different genetic groups. An individual was assigned to a cluster if the associated probability is greater than 0.6, otherwise it was classified as admixed and placed in cluster 9. Finally, linkage disequilibrium (LD) was evaluated as proposed by and calculated with LDcorSV package in the R environment [34,35]. LD was calculated using the 29,847 SNPs dataset.
Genomic prediction methods
Two incidence matrices, of size 46,055 SNPs and 29,847 SNPs, were used for genomic predictions. Two kernel-based methods were used to predict GEBV as they are computationally more efficient than their primal formulation counterparts in the context of “small-n-large-p” problem [16]. One was the Genomic Best Linear Unbiased Predictor method (GBLUP) that utilizes the well-known genomic relationships matrix G, as kernel. The GBLUP method was implemented using the Expectation–Maximization convergence algorithm and the genomic matrix G = XX′, X being the centered genotype matrix containing values of −1, 0 and +1, with N × P dimension, where N is the number of entries and P the number of markers as used by [20,36,37]. The second method was the Reproducing Kernel Hilbert Space method (RKHS), which was first proposed by and uses a Gaussian kernel to connect the genotypes and phenotypes and is defined as K = k(xi,xj) = exp(- ||xi - xj||²/h), where xi and xj are the genotypic vectors of individuals i and j respectively, h is a bandwidth parameter which controls how fast the relationship between two genotypes decays as the distance between the corresponding pairs of marker vectors increases [38]. The choice of the bandwidth parameter can be optimized by applying a cross-validation or a Bayesian approach treating h as a random variable. K matrix is able to approximate any arbitrary function including the multiplicative linear function used to model epistatic inheritance [16,39]. The genomic prediction model can be described by the following equation:
Y = Xβ + Zg + ε
where Y is an n×1 vector of observation, X is the incidence matrix for the fixed effects, β is the vector of non-genetic fixed effects, the part Xβ includes only the global mean μ and when the structure is taken into account in the model one adds μ plus the structure matrix, Z is the genotypic matrix of size N (individuals) × P (markers), g is the vector of random regression coefficients of accessions and ε is the vector of residuals.
Assessment of genomic prediction ability
Cross-validation experiments: Firstly, cross-validation experiments were performed with 5-fold subdivision of the DP, with 147 accessions in the calibration set and 37 accessions in the validation set. Each cross-validation experiment was replicated 100 times using 100 independent partitions of the accessions into the training set and validation set. For each independent partition, the correlation between the predicted and the observed phenotype was calculated, so as to obtain 100 correlations for each cross-validation experiment. The predictive ability of each cross-validation experiment was computed as the mean value of the 100 correlations. Once a model is calibrated, it is used to predict GEBV of the validating set. The same partition was used for the 5 phenotypic traits considered, the 2 prediction models and the 2 SNP datasets.
Secondly, the predictive ability was specifically estimated for the 12 founders of the BP, using the remaining 172 accessions as the calibration set.
Finally, the cross-validation experiment was also performed within the BP and its predictive ability estimated and compared with the one of DP.
Prediction across-populations: In this experiment, the prediction models were trained with the phenotypic and genotypic data of the 184 DP accessions and then served to predict GEBV for the BP accessions; the correlation between the predicted and the observed phenotypes of the 198 BP accessions was calculated, and represents the predictive ability of the prediction experiment.
Analysis of factors affecting the predictive ability of GEBVs
We were interested in identifying factors and combinations of factors that influence significantly the variation of predictive ability. In the case of the cross-validation experiment, the analysis distinguished two categories of effects: (a) effects due to the controlled factors and (b) effects due to the unpredictable part of the process. The analysis is based on the following ANOVA models:
where:
PAmse individual value of predictive ability,
μ overall mean
MSms scenario effect which combines all effects
Ms effect of prediction method, s {GBLUP, RKHS}
Sm effect of SNP number m, mÎ {29 847, 46 055}
MS’ms interaction between prediction method and SNP number
emse residual, emse~(0, ), e Î {1, 2, 3, …, 100}
First, the model (1) decomposed the overall variation into two components: between-scenario (MSms) and within-scenario (emse). The scenario was the only explanatory factor and the model was run using conventional single factor ANOVA. Four scenarios were defined for each trait (2 prediction methods × 2 SNP datasets). If scenarios differed significantly, then the model (2) decomposed further the overall variation, still using ANOVA. The model (2) decomposed the variation into all possible effects. Because the two-way interaction could represent significant sources of variation, the F tests for main effects and associated p-values could be inflated. To prevent this inflation, the main effects were also tested using the interaction mean square as the error term when the interaction effect was significant.
Results
Phenotypic characteristics of the two populations
The variance components and associated statistics are shown for the five traits in the DP and BP in Table 2. There is a highly significant genetic variance in each population and each trait. The genetic variance in BP for GY is only 12.5 % that of DP. It reaches 40 to 60% for TGW, GNC and NUE. The year effect is highly significant for most traits except NUE in DP and except GNC in BP. The interaction between genotype and year (G × Y) is significant for all traits in DP. It is significant only for DFl in BP.
Population | Component | DFl | TGW | GY | GNC | NUE |
---|---|---|---|---|---|---|
Diversity panel (DP) |
Year1 | 381.55*** | 7.60** | 39.41*** | 31.35*** | 0.47NS |
Repetition(Year)1 | 2.17NS | 1.16NS | 4.07** | 18.82*** | 3.51* | |
Check1 | 0.08 NS | 0.26 NS | 3.42* | 0.10NS | 3.28* | |
Genotype2 | 27.2382*** | 15.8094*** | 347768*** | 0.0099*** | 28.8524*** | |
Block(year*Rep)2 | 0.5707* | 0.08819NS | 146954*** | 0.0063*** | 12.4641*** | |
Genotype*Year2 | 5.9885*** | 1.3474*** | 103766* | 0.0017* | 13.6166** | |
H2 | 0.836 | 0.925 | 0.461 | 0.471 | 0.437 | |
± SE | 0.023 | 0.011 | 0.063 | 0.055 | 0.064 | |
Breeding population (BP) |
Year1 | 111.48*** | 304.35*** | 548.49*** | 3.63NS | 25.38*** |
Repetition(Year)1 | 4.33* | 9.80*** | 13.61*** | 9.25*** | 2.69NS | |
Check1 | 0.02NS | 0.27NS | 6.95** | 0.23NS | 8.56*** | |
Genotype2 | 37.3015*** | 9.7258*** | 44524*** | 0.0062*** | 12.0119*** | |
Block(year*Rep)2 | 2.3134*** | 0.1159NS | 44462*** | 0.0060*** | 11.2422*** | |
Genotype*Year2 | 7.6253*** | 0NS | 8771.42NS | 0.0031NS | 0.6083NS | |
H2 | 0.826 | 0.911 | 0.333 | 0.327 | 0.333 | |
± SE | 0.023 | 0.011 | 0.060 | 0.065 | 0.058 |
1F value and test of the fixed effect
2Estimate and Wald test of the random effect
H2 = Broad sense heritability of the family means
SE = Standard Error of H²
Significance level : NS = Not significant, * = Significant at p-value between 0.05 and 0.01, ** = Significant at p-value between 0.01 and 0.001,*** = Significant at p-value < 0.001
Table 2:Variance components and broad sense heritability for five traits in DP and BP populations. The phenotyping was conducted for two years (2014-2015 and 2015-2016 for DP, 2016-2017 and 2017-2018 for BP).
H2 is high for DFl and TGW and moderate for GNC, GY and NUE in both populations. In terms of correlation between traits, the two populations harbor comparable correlation structure according to Mantel test (correlation of 0.93 with p-value < 0.05, Table 3). GY and GNC traits were weakly correlated, with r = -0.16 (p-value = 0.025) and-0.11 (p-value = 0.145), in DP and BP respectively. Conversely, correlations between GY and NUE were highly significant (r = 0.96 in BP to 0.98 in DP with p-value < 0.0001). However, DFl, TGW and GNC were slightly and rather negatively correlated with the other traits in both populations.
(a) Variables | DFl | TGW | GY | GNC | NUE |
DFl | 1 | -0.191 | -0.121 | -0.170 | -0.155 |
TGW | -0.191 | 1 | 0.064 | -0.082 | 0.068 |
GY | -0.121 | 0.064 | 1 | -0.108 | 0.983 |
GNC | -0.170 | -0.082 | -0.108 | 1 | -0.100 |
NUE | -0.155 | 0.068 | 0.983 | -0.100 | 1 |
(b) Variables | DFl | TGW | GY | GNC | NUE |
DFl | 1 | -0.302 | -0.235 | 0.117 | -0.229 |
TGW | -0.302 | 1 | 0.228 | -0.078 | 0.218 |
GY | -0.235 | 0.228 | 1 | -0.160 | 0.965 |
GNC | 0.117 | -0.078 | -0.160 | 1 | -0.195 |
NUE | -0.229 | 0.218 | 0.965 | -0.195 | 1 |
Table 3:Genetic correlation between traits in (a) diversity panel and (b) breeding population.
Values in bold different from 0 with a significance level alpha = 0.05.
Genotypic population characteristics
Figure 1 shows 382 genotypes consisting of 184 DP accessions including 12 founders of the BP, and 198 BP accessions. Genotypes from the same origins tended to group together. The majority of the accessions (126) belong to the tropical japonica group, 16 to the temperate japonica group, and 42 accessions are admixed. The founders were distributed over the diversity of the DP. The BP is disseminated within the DP.
Figure 1: Neighbor joining tree of the 382 accessions based on 87 089 single nucleotide polymorphisms (SNPs).
Unweighted neighbor-joining tree was based on simple matching distances constructed from the genotypic information of 184 accessions from the DP (pink), 12
founders of the BP (black) and 198 genotypes from the BP (green).
The genetic differentiation between DP and BP (estimated by Fst) was evaluated at only 0.01, meaning that the differentiation between the two populations is almost null. Most of the genetic diversity was found in the populations. They formed together six genetic clusters using LEA package. Moreover, we artificially assigned most accessions to cluster 9 when any associated probability exceeded 0.6; such accessions were considered highly admixed. This cluster was composed of 96 DP and 115 BP accessions (Figure 2). The DP was under-represented in cluster 1 and fairly evenly distributed among clusters 2 to 6. In contrast, the BP population was better represented in clusters 1, 2 and 5 and underrepresented in clusters 3, 4 and 6. Four of the six clusters appeared to be more population specific (clusters 3, 4 and 6, specific to DP and cluster 1 specific to BP). However, they represented only 43.2% of the total accessions (165/382) (Table 4).
Populations | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 9 | Total |
---|---|---|---|---|---|---|---|---|
BP | 39 | 3 | 41 | 115 | 198 | |||
DP | 15 | 13 | 10 | 29 | 21 | 96 | 172 | |
Parents | 2 | 1 | 2 | 1 | 6 | 12 | ||
Nerica 3 | Sucupira | B22 | IRAT 13 | Exp 206 | ||||
Nerica 10 | FOFIFA 116 | FOFIFA 154 | ||||||
IAC 1205 | ||||||||
SEBOTA 330 | ||||||||
WAB 878 6-12-1-1-P1-HB | ||||||||
PRIMAVERA |
Table 4: Population structure given by the LEA package with the numbers of accessions belonging to the different clusters.
Distribution of the 12 founders of the BP among the six clusters and the admixed ensemble is indicated just below the table.
The proportion of monomorphic loci calculated on 87 089 SNPs was 2.05% in DP, and 2.45% in BP. It should be noted that the monomorphic rate among the 12 founders (without PCT 11, not genotyped) reaches 34.70%. The analysis of variance showed that DP and BP differed very highly significantly (p value < 0.0001) in both MAF (10.2% for DP and 9.7% for BP) and Ho (1.8% for DP and 3.2% for BP).
Regarding relationship coefficients, the distribution was flatter and more spread out around zero within DP, meaning that some paired accessions were unrelated, while others were highly related (min=-0.597 and max=4.915). In contrast, the relationship coefficients associated with the BP were more concentrated around zero, meaning that a higher proportion of the population was unrelated with another member of the population (min=-0.416, maxi=3.754). When considering paired accessions from the two populations, the same kind of distribution was observed as within the BP population, which means a weak mean relationship between the two populations (Figure S1).
Finally, the average linkage disequilibrium (LD) in the 12 chromosomes for distances between 0 to 25 kb was 0.59 for the DP and 0.57 for the BP (Figure 3). The r2 reached half its initial value at a distance of 275-300 kb for both populations. The r² was equal to 0.2, for a distance of 750 kb in the DP and 600 kb in the BP. The r² reached 0.1, for pairwise distances of 2,200 kb in DP and 1,400 kb in BP. The LD decreases more slowly in DP than in BP.
Predictive ability of genomic prediction in cross-validation experiment
Each data point on the figure represents the average predictive ability (AvPA) of a scenario while each vertical bar represents the corresponding standard deviation obtained over the 100 replications.
Overall, AvPA ranged from 0.46 to 0.76. There were some differences of AvPA between traits (Figure 4).
Differences between scenarios were significant only for NUE (Table 5), for which the prediction model effect was significant at p-value < 0.0001 and the interaction between model and SNP number was significant at p-value = 0.022. The interaction was then used as error term to test the main effects; this resulted in no significant difference between the two genomic prediction models anymore. For all traits, a low fraction of total variation was explained by the linear model (R² between 0.3% and 5.6%), meaning that most variation was driven by the random sampling of the cross-validation process.
Default F test | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Trait | R² | CV | RMSE | Mean | Source | DF | SS | MS | F value | p-value | |
FL | 0.003 | 27.841 | 0.168 | 0.604 | Scenarios | 3 | 0.036 | 0.012 | 0.420 | 0.737 | NS |
Error | 396 | 11.213 | 0.028 | ||||||||
Corrected Total | 399 | 11.249 | |||||||||
Model | 1 | 0.023 | 0.023 | 0.820 | 0.365 | NS | |||||
SNP number | 1 | 0.012 | 0.012 | 0.430 | 0.513 | NS | |||||
Model*SNP number | 1 | 0.000 | 0.000 | 0.020 | 0.902 | NS | |||||
GNC | 0.019 | 26.080 | 0.173 | 0.664 | Scenarios | 3 | 0.226 | 0.075 | 2.510 | 0.059 | NS |
Error | 396 | 11.887 | 0.030 | ||||||||
Corrected Total | 399 | 12.113 | |||||||||
Model | 1 | 0.167 | 0.167 | 5.550 | 0.019 | * | |||||
SNP number | 1 | 0.057 | 0.057 | 1.880 | 0.171 | NS | |||||
Model*SNP number | 1 | 0.002 | 0.002 | 0.080 | 0.775 | NS | |||||
GY | 0.003 | 20.866 | 0.125 | 0.599 | Scenarios | 3 | 0.016 | 0.005 | 0.340 | 0.794 | NS |
Error | 396 | 6.181 | 0.016 | ||||||||
Corrected Total | 399 | 6.197 | |||||||||
Model | 1 | 0.003 | 0.003 | 0.160 | 0.689 | NS | |||||
SNP number | 1 | 0.005 | 0.005 | 0.320 | 0.573 | NS | |||||
Model*SNP number | 1 | 0.009 | 0.009 | 0.550 | 0.459 | NS | |||||
NUE | 0.056 | 22.502 | 0.125 | 0.555 | Scenarios | 3 | 0.364 | 0.121 | 7.790 | <.0001 | *** |
Error | 396 | 6.173 | 0.016 | ||||||||
Corrected Total | 399 | 6.538 | |||||||||
Model | 1 | 0.277 | 0.277 | 17.780 | <.0001 | *** | |||||
SNP number | 1 | 0.005 | 0.005 | 0.330 | 0.564 | NS | |||||
Model*SNP number | 1 | 0.082 | 0.082 | 5.250 | 0.022 | * | |||||
TGW | 0.012 | 16.451 | 0.164 | 0.997 | Scenarios | 3 | 0.128 | 0.043 | 1.590 | 0.192 | NS |
Error | 396 | 10.652 | 0.027 | ||||||||
Corrected Total | 399 | 10.780 | |||||||||
Model | 1 | 0.047 | 0.047 | 1.750 | 0.187 | NS | |||||
SNP number | 1 | 0.002 | 0.002 | 0.080 | 0.774 | NS | |||||
Model*SNP number | 1 | 0.079 | 0.079 | 2.930 | 0.088 | NS |
Table 5: ANOVA of factors affecting variation of the AvPA in the cross validation experiment within the diversity population (DP) considering 20 scenarios.
Sources of variation were: models (GBLUP and RKHS) and SNP numbers (29 847 and 46 055). R2coefficient of determination.
Positive relationships (0.46 for RHKS and 0.54 for GBLUP) were observed between average predictive ability and trait heritability (Figure S2). This correlation is higher with the GBLUP model than with the RKHS model.
When the GEBV of the 12 founder accessions was predicted using model trained with the remaining 172 accessions of the DP (Figure 5), high predictive ability, ranging from 0.59 to 0.86 were obtained. The GBLUP model produced higher accuracy than the RKHS for GY, NUE and TGW. The opposite was observed for DFl and GNC.
On the other hand, the cross-validation experiment within BP showed very low AvPA for all traits (Figure S3).
Predictive ability of genomic prediction across-populations using DP as calibration set
Four scenarios were considered in this non-replicated genomic prediction experiment, by combining two genomic prediction models and two SNP matrices (Figure 6). The breeding population prediction ability (BPPA) was defined as the prediction ability when BP was to be predicted using the calibration developed on the DP. BPPA varied from 0.06 to 0.27. The RKHS model resulted in slightly higher BPPA than GBLUP. The across populations prediction was much less accurate than cross validation within the DP.
Discussion
Reducing fertilizer inputs and improving nitrogen use efficiency (NUE) are the main objectives of plant nutrition research [40]. These two challenges contribute to preserving the environment and improving sustainable and productive agriculture [41]. There have been continuous improvements of NUE over the years along with increases in crop yield as in wheat, maize, barley or rice [7,42-45].
GWAS results explain only a small proportion of the genetic variability available for NUE [10]. In this study, we explored the potential of genomic selection to improve NUE prediction with the aim to include this strategy in rice breeding programs targeting this trait.
Three research questions will be discussed in this section for the improvement of NUE and Yield related traits in upland rice: (i) the potential of genomic prediction to select good candidates for initiating new breeding populations, (ii) the phenotypic and genotypic characteristics of the breeding population (BP) that directly descended from a subset of DP and (iii) the predictive ability of the BP obtained from the diversity panel (DP).
Predictive ability of genomic prediction in the diversity panel
The diversity panel considered in our study is a broad-based panel. The vast majority of varieties belong to the japonica subspecies. Most of them belong to the tropical and temperate japonica groups and are already adapted to local conditions [10]. Moreover, the moderate to high prediction abilities showed that phenotypic BLUPs associated with inbred lines are relevant to predict GEBV. It would therefore be interesting to use the calibrations we have already developed to identify new NUE donors to generate new populations to feed the breeding program. Although some of them have not yet been field evaluated, the available genome-wide molecular profiles make it possible to predict their phenotypes.
The first five genotypes with good GY in the DP areG3158 (EXP 304), G3156 (EXP 302), G3229 (SCRID090 72-3-1-3-5-1--), G3228 (SCRID090 60-1-1-2-4-1-2), and G3203 (NERICA 3). Four of these lines were derived from the FOFIFA/CIRAD breeding program and one from Africa rice. While the lines that exhibited good performance for NUE areG3158 (EXP 304), G3203 (NERICA 3), G3223 (PCT 4\SA\4\1>330-2-3-2-M 5-4-3-1-5), G3156 (EXP 302), and G3229 (SCRID090 72-3-1-3-5-1). Three of these lines were from the FOFIFA/ CIRAD breeding program, while PCT4 line was from CIAT. The genetic correlation between the two traits (GY and NUE) is strong (0.98), so it is not surprising that the best genotypes mentioned above are almost the same.
Our results from cross-validation with DP are consistent with those found by other authors working on different self-pollinated species such as rice, wheat, sorghum and cross-pollinated maize [20,46-50]. We found sometimes better precisions compared to those found in other papers on DFl, TWG and GY traits.
The predictive ability of the GEBV followed the same trend as the trait heritability, as stated by [18]. Both genomic prediction models are well suited to use relationships between accessions to construct efficient prediction models, as shown by several authors [51-53].
Phenotypic and genotypic comparison of the two populations
Phenotypic variation was largely influenced by accession, year and accession × year (G × Y) interaction, within each population. We found high genetic variance in the two populations, for almost all traits. Since the populations were evaluated with a time lag, neither year, nor G × Y interaction were shared between them. As a consequence, none could be used further in the genomic modeling step. The phenotypic BLUPs associated with the genotypes were the only effects to be modeled by the markers.
We recall that DP is a working collection which is an artificial population because it contains elite lines from several selection programs and historical lines from the Malagasy program. The BP was developed from DP by selecting a subsample of accessions (12 out of 184) and recombining them during 2 generations of random mating (using a thirteenth founder, PCT 11, as female) followed by 4 generations of SSD for generation advancement. The 12 founders were retained primarily for their adaptation to midlands conditions. This sampling first reduced the genetic diversity, with a nearly 17- fold higher rate of monomorphic loci (34.70%) than in the entire DP (2.05%). Fortunately, the PCT 11 founder, which has contributed half of the BP genome, was a broad-based synthetic population PCT 11 reexpanded the BP genetic base and reduced the monomorphic rate to as low as 2.45% [18]. This may explain why the two populations had rather similar phenotypic characteristics: broad sense H² and genetic correlations between traits.
Both populations showed very weak genetic differentiation (Fst = 0.01). Fst was calculated on markers sampled across the genome while the vast majority of markers sampled were neutral. This parameter probably reflects the absence of significant drift or intentional selection after separation of the two populations. However, the structure and kinship analyses showed significant differences between them: DP was more structured than BP, and the distribution of kinship coefficients in DP was flatter and less centered around zero than in BP. These parameters are known to impact the LD patterns and predictive ability [54].
Across-population prediction and practical implications for the rice breeding programs
When the DP was used as a calibration population to predict GEBV of the BP, the prediction abilities ranged from 0.06 to 0.27, meaning that the mean prediction ability for each trait was drastically reduced compared to cross-validation within DP. This could be explained by the low kinship between DP and BP, their different genetic structures and slightly different LD patterns have contributed a theoretical basis to the importance of relationships between populations in the predictive ability of genomic prediction. Many experimental results also showed that the prediction ability of GS should be based primarily on close relationships between the calibration and validation data sets [55-58].
In our study, the synthetic PCT 11 population alone contributed half of the genome because it is the only female parent, whereas the 12 male parents contributed only 4.2% each. PCT 11 was not genotyped in the study even though six inbred lines derived from it were included in the DP and genotyped (Table S1). In addition, phenotyping was performed on S4:5 and S4:6 progenies directly derived from the respective S3:4 reference lines, while genotyping was performed two generations later, by sampling a single plant per S4:6 progeny. These complex procedures reduced the genetic relationships between genotyped plants and phenotyped progenies within BP, as well as between BP and DP.
In upland rice, good performances of across populations genomic prediction could be obtained in the context of either (1) biparental crosses and pedigree selection to predict advanced F5-F7 lines [20,23] or (2) synthetic populations with a broad genetic base and early generations selection of the target population [18]. Simulation and empirical studies of GS showed sufficient accuracies to generate rapid gains in early generations selection, because all individuals to be predicted are linked with a sufficient number of close ancestors [59]. Beyond those early generations selection, allele frequency changes, recombination, and inbreeding make analytical prediction of gain impossible [60-63].
Conclusion
BP has a large genetic base that can be valuable for a sustainable genetic improvement of NUE. It is hoped that genomic prediction models can be as promising as in previous case studies if the genotypic and phenotypic data are referred to the same genetic unit (S3:4 seeds). We propose to genotype again the S6 generation, using a bulk of S4:6 progeny to infer the S3:4 genotype. Then, prediction equations can be reconstructed to predict the GEBV of S3:4 plants, based on phenotypes of their respective S4:5 and S4:6 progenies. This will allow to select and recombine the best BP accessions to develop a new population. The ms alleles that still segregate should make seed management and handling easier and should be used routinely to optimize recombination within the population. Such strategy is already used in recurrent selection programs. Considering the whole breeding process, GS has great potential to accelerate the genetic gain per unit of time and cost through early phenotyping and selection within segregating units before actual phenotypes are measured. In this perspective, prediction models could be calibrated as soon as S0:2 are produced.
For the time being, breeders can already use the results of crossvalidation on the upland rice diversity panel in the upland rice breeding program. In the DP, we have detected genotypes that have high levels of NUE, which addresses the perspective proposed by in the genomewide association study on nitrogen use efficiency.
Acknowledgements
Firstly, the authors are grateful to the field staff of the National Center for Applied Research on Rural Development in Madagascar who contributed to the data collection.Secondly, the authors thank the researchers at the French Agricultural Research Centre for International Development for their support in designing the structure of this article, in particular Gilles TROUCHE and Nourollah AHMADI. Finally, the authors acknowledge the financial support of the FondationAgropolis (France) through the GeneRice project, grant 1605-019 and Rice-CRP, grant C19386.
References
- Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829. PMID:11290733
- Frichot E, François O (2015) LEA : An R package for landscape and ecological association studies. Methods in Ecology and Evolution 6: 925–929.
- Spindel J, Begum H, Akdemir D, Virk P, Collard B, et al. (2015) Genomic Selection and Association Mapping in Rice (Oryza sativa): Effect of Trait Genetic Architecture, Training Population Composition, Marker Number and Statistical Model on Accuracy of Rice Genomic Selection in Elite, Tropical Rice Breeding Lines. PLOS Genetics 11: e1004982.
- Sapkota S, Boyles R, Cooper E, Brenton Z, Myers M, et al. (2020) Impact of sorghum racial structure and diversity on genomic prediction of grain yield components. Crop Sci 60: 132–148.
- Bernal-Vasquez AM, Möhring J, Schmidt M, Schönleben M, Schön CC, et al. (2014) The importance of phenotypic data analysis for genomic prediction - a case study comparing different spatial models in rye. BMC Genomics 15, 646.
- Ma P, Huang J, Gong W, Li X, Gao H, et al. (2018) The impact of genomic relatedness between populations on the genomic estimated breeding values. J Animal Sci and Biotech 9.
- Lorenz A, Nice L (2017) Training Population Design and Resource Allocation for Genomic Selection in Plant Breeding, in: Varshney, R.K., Roorkiwal, M., Sorrells, M.E. (Eds.), Genomic Selection for Crop Improvement. Springer International Publishing, Cham, Pp: 7–22.
- Auinger HJ, Schönleben M, Lehermeier C, Schmidt M, Korzun V, et al. (2016) Model training across multiple breeding cycles significantly improves genomic prediction accuracy in rye (Secale cereale L.). Theoretical and Applied Genetics 129: 2043–2053.
- Jannink JL (2010) Dynamics of long-term genomic selection. Genetics Selection Evolution 42, 35.
- Bassi FM, Bentley AR, Charmet G, Ortiz R, Crossa J (2016) Breeding schemes for the implementation of genomic selection in wheat ( Triticum spp.) . Plant Sci 242: 23–36.
- DoVale JC, Carvalho HF, Sabadin F, Fritsche-Neto R (2021) Reduction of genotyping marker density for genomic selection is not an affordable approach to long-term breeding in cross-pollinated crops (preprint). Genetics.
- Xu Y, Ma K, Zhao Y, Wang X, Zhou K, et al. (2021) Genomic selection: A breakthrough technology in rice breeding. The Crop Jrnl  9: 669–677.
Citation: Rakotomalala J, Brocke KV, Frouin J, Pot D, Rabekijana R, et al. (2021) Breeding for Nitrogen use Efficiency: Lessons from Two Genomic Prediction Experiments Within a Broad-based Population of Upland Rice. J Rice Res 9: 264. DOI: 10.4172/2375-4338.1000264
Copyright: © 2021 Rakotomalala J, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Share This Article
Recommended Conferences
Madrid, Spain
Recommended Journals
ºÚÁÏÍø Journals
Article Tools
Article Usage
- Total views: 1616
- [From(publication date): 0-2021 - Nov 25, 2024]
- Breakdown by view type
- HTML page views: 1161
- PDF downloads: 455