Population genomics of Rhizobium leguminosarum – 2
I am continuing my commentary on some of the issues raised by our new paper:
Kumar N, Lad G, Giuntini E, Kaye ME, Udomwong P, Shamsani NJ, Young JPW, Bailly X. 2015 Bacterial genospecies that are not ecologically coherent: population genomics of Rhizobium leguminosarum. Open Biol. 5: 140133.
Gene content and gene transfer
The 72 strains that we sequenced are all unique. A small part of their individuality stems from allelic variation in core genes. Although core genes do not appear to recombine between genospecies very often, they certainly experience a lot of recombination within each genospecies. Nitin Kumar demonstrated this in two ways: by showing that most core genes have phylogenies that differ significantly from the consensus, and by using the ClonalFrame software to quantify the effect of recombination on core genes.
A more important part of the individuality of strains is conferred by the accessory genome: almost every strain had a unique set of genes, differing from its nearest relative by at least one cluster of five or more adjacent genes. All these strains were collected from one square metre, and sometimes even from separate nodules on the same plant. This implies that accessory genes are gained and lost very often. A nodule is most often founded by a single rhizobial lineage. When the nodule senesces and releases its bacteria, we assume that they are still more or less clonal (has anybody tested that?). By the time they form nodules of their own, though, these bacteria are likely to have shed some of their genes, or gained new ones from a donor, so that they have clearly diverged from each other.
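To make the "cluster of five or more adjacent genes" criterion concrete, here is a minimal Python sketch. It is not the analysis used in the paper: the function name, the gene labels and the toy data are all hypothetical, and it assumes that each strain has already been reduced to an ordered list of gene-family identifiers.

```python
from itertools import groupby


def differing_clusters(gene_order_a, genes_b, min_run=5):
    """Return runs of at least min_run adjacent genes in strain A that strain B lacks.

    gene_order_a: gene-family identifiers in their order along strain A's replicon.
    genes_b: set of gene-family identifiers present in strain B.
    (Hypothetical inputs; a real analysis would derive them from annotated assemblies.)
    """
    # Flag every gene of strain A as missing (True) or present (False) in strain B.
    flags = [(gene, gene not in genes_b) for gene in gene_order_a]
    runs = []
    # groupby collects consecutive genes that share the same flag.
    for missing, group in groupby(flags, key=lambda item: item[1]):
        genes = [gene for gene, _ in group]
        if missing and len(genes) >= min_run:
            runs.append(genes)
    return runs


# Toy data (invented): strain B lacks five adjacent genes, g4 to g8.
order_a = [f"g{i}" for i in range(1, 13)]
genes_b = {"g1", "g2", "g3", "g9", "g10", "g11", "g12"}

print(differing_clusters(order_a, genes_b))
```

Running the comparison in both directions (A against B, then B against A) would pick up clusters that either strain has and the other lacks.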
A reference genome like that of 3841 is of limited use when exploring the accessory gene pool of a population. Nitin looked at all the contigs that could be assembled from the 72 genomes but had no similarity to sequences in 3841. He found 13,252 putative complete genes in addition to those that were in 3841 – more than twice the typical total number of genes in any strain! Across the whole population, the accessory genome is much larger than the core genome.
A few years ago, the concept of a species “pangenome” was popular. This comprised all the core and accessory genes found in a bacterial species. As long as only a few strains were sequenced, this was manageable, but as more and more genomes became available, the number of accessory genes in most species just seemed to grow without limit – an “open pangenome”. Every new genome contributed new genes, just as we are seeing in R. leguminosarum. A species seems to sample very widely from the pool of genes available to bacteria in general. The species pangenome concept does not seem very useful if it just means “all the genes there are in bacteria”.
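The distinction between core and accessory genes, and the way a pangenome keeps growing as genomes are added, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each strain is represented as a set of gene-family identifiers, "core" means present in every strain, and the strain names, gene labels and function names are invented for the example.

```python
def core_and_accessory(strains):
    """Split the pooled genes into core (in every strain) and accessory (the rest)."""
    gene_sets = list(strains.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, pan - core


def accumulation_curve(strains, order):
    """Pangenome size after adding each strain in the given order."""
    seen = set()
    curve = []
    for name in order:
        seen |= strains[name]  # add any genes not yet seen
        curve.append(len(seen))
    return curve


# Toy data (invented): three strains sharing two core genes.
strains = {
    "A": {"c1", "c2", "a1"},
    "B": {"c1", "c2", "a2", "a3"},
    "C": {"c1", "c2", "a1", "a4"},
}

core, accessory = core_and_accessory(strains)
print(sorted(core), sorted(accessory))
print(accumulation_curve(strains, ["A", "B", "C"]))  # [3, 5, 6]
```

In an "open" pangenome the curve keeps rising as strains are added; in a closed one it would flatten out. The shape depends on the order in which strains are added, so in practice one would average over many random orders.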