rhizobium

January 18, 2021

Rhizobium leguminosarum 23

Our paper is published!

Our paper “Defining the Rhizobium leguminosarum species complex” was published today in Genes. You can download the PDF and the supplementary files from https://www.mdpi.com/2073-4425/12/1/111. You can refer to it (please do!) as Young, J.P.W.; Moeskjær, S.; Afonin, A.; Rahi, P.; Maluk, M.; James, E.K.; Cavassim, M.I.A.; Rashid, M.H.; Aserse, A.A.; Perry, B.J.; Wang, E.T.; Velázquez, E.; Andronov, E.E.; Tampakaki, A.; Flores Félix, J.D.; Rivas González, R.; Youseif, S.H.; Lepetit, M.; Boivin, S.; Jorrin, B.; Kenicer, G.J.; Peix, Á.; Hynes, M.F.; Ramírez-Bahena, M.H.; Gulati, A.; Tian, C.-F. Defining the Rhizobium leguminosarum Species Complex. Genes 2021, 12, 111.

Many thanks to all my coauthors for making this such an interesting project. Now I will have to find something else to do.

December 28, 2020

Rhizobium leguminosarum 22

The manuscript has been submitted

The manuscript “Defining the Rhizobium leguminosarum species complex” was submitted to Genes on 11 December. It is available as a preprint at https://www.preprints.org/manuscript/202012.0297/v1, with the DOI 10.20944/preprints202012.0297.v1. Many thanks to all my coauthors, without whose efforts this project would not have been possible. Now we have to wait for reviews, but in the meantime your comments will be very welcome.

October 15, 2020

Rhizobium leguminosarum 21

16S: the full story

The 16S ribosomal RNA sequences of the type strains of Rhizobium laguerreae, R. sophorae, R. ruizarguesonis and R. indicum are all identical to that of the type strain of R. leguminosarum. Even the type strain of the sister taxon R. anhuiense has the same sequence. From this, it would be reasonable to guess that all members of the Rlc had this sequence, but the truth is very different. In fact, I found 18 distinct 16S sequences among the available genomes – though these certainly do not correspond to the 18 genospecies. That does not include a further 5 variants that were only found in a single strain and differed by a single nucleotide from a common variant, which I discounted on the grounds that they might be sequencing errors. There were also three genome assemblies that had no 16S sequence, and three more in which it was incomplete – clearly these are errors in the assembly, since 16S is essential.
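The filtering step described above sets aside variants seen in only one genome that differ by a single nucleotide from a common variant. It can be sketched in Python. This is a minimal sketch, not necessarily how the counts above were made. It assumes the 16S copies have already been extracted and aligned to the same length, and it takes 'common' to mean found in more than one genome. The function names are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def count_variants(seqs: list[str]) -> tuple[Counter, list[str]]:
    """Tally distinct aligned 16S sequences, setting aside probable errors.

    Assumption: a variant is discounted when it occurs in only one genome and
    differs by one nucleotide from a variant found in more than one genome.
    """
    counts = Counter(seqs)
    common = [s for s, n in counts.items() if n > 1]
    discounted = [
        s for s, n in counts.items()
        if n == 1 and any(hamming(s, c) == 1 for c in common)
    ]
    kept = Counter({s: n for s, n in counts.items() if s not in discounted})
    return kept, discounted


if __name__ == "__main__":
    # Toy data: AAAT looks like a one-off error in AAAA; GGGG is genuinely distinct
    kept, discounted = count_variants(["AAAA", "AAAA", "AAAT", "CCCC", "CCCC", "GGGG"])
    print(len(kept), discounted)  # 3 ['AAAT']
```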

The ‘type’ sequence is certainly the predominant one, found in 286 of the 440 genomes, but there are three places in the 16S that have significant levels of polymorphism within the Rlc. Kumar et al. (2015, http://dx.doi.org/10.1098/rsob.140133) found a single polymorphic site in their sample (position 1069 in their numbering, 1151 in my alignment, which includes the IVS). They found this was T in gsA and gsB, C or A in gsC, A in gsD, C in gsE. With a much larger set of genomes, this remains broadly true, though the picture is less clear-cut and the fourth possible nucleotide, G, is also found. The type strains have the C variant. This nucleotide is in a loop, so is not paired in the 16S rRNA secondary structure. The second polymorphism is in a stem, so involves a complementary pair of nucleotides at positions 1023 and 1036 in the alignment. These are T and A in the type sequence, but C and G in all members of gsR (R. laguerreae) except, ironically, the type strain FB206. The C:G variant is also common in other F-clade genospecies, as well as in all gsM strains and one gsL.
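Here is a sketch of how polymorphic columns, and the covarying pair of stem positions, could be pulled out of a 16S alignment. It assumes the alignment is held as a Python dict mapping genome name to aligned sequence and has already been made, for example with the IVS included as in the numbering above. The names are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def polymorphic_sites(alignment: dict[str, str], min_minor: int = 1) -> dict[int, Counter]:
    """Return 1-based alignment columns where more than one nucleotide occurs.

    Gaps and ambiguity codes are ignored. A site is reported only if at least
    min_minor genomes carry something other than the majority nucleotide.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    sites = {}
    for i in range(length):
        column = Counter(s[i].upper() for s in seqs if s[i].upper() in NUCLEOTIDES)
        if len(column) > 1:
            minor = sum(column.values()) - max(column.values())
            if minor >= min_minor:
                sites[i + 1] = column
    return sites


def paired_states(alignment: dict[str, str], pos_a: int, pos_b: int) -> Counter:
    """Count nucleotide combinations at two 1-based columns, e.g. both sides of a stem pair."""
    return Counter((s[pos_a - 1], s[pos_b - 1]) for s in alignment.values())
```

If the covariation described above holds, paired_states on the real alignment at columns 1023 and 1036 should be dominated by ('T', 'A') and ('C', 'G') combinations, rather than mixtures that would break the stem.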

The third polymorphism is the long intervening sequence that I discussed in the last post. After publishing that post, I located the reference that had slipped my mind. It is a nice paper from Raúl Rivas’s group in Salamanca, published last year (Flores-Félix et al. 2019, https://doi.org/10.1016/j.syapm.2018.10.009). They found the extra sequence in a number of strains, including three of the eleven genomes that I have just rediscovered it in, and have a very nice discussion of this. If I understand the paper correctly, they found that the IVS is excised in the RNA and the molecule is rejoined – it does not remain split as I imagined. The paper also refers to the literature on IVS in rRNA genes, and reminded me that the first published report in rhizobia (in what is now R. leucaenae) was by Anne Willems and Dave Collins back in 1993 (https://doi.org/10.1099/00207713-43-2-305). I decided that I did not have enough material to write a paper about the IVS I had found in R. leguminosarum in 1991, so I just submitted the sequence to GenBank in 1994 (accession U09271). The 11 genomes that have the IVS are all in the F-clade, but they are not a monophyletic group. Two of the strains have a single nucleotide variant within the IVS, but these strains are not neighbours.

The variation I have just described accounts for 9 of the 18 variants I claimed at the start. The other 9 involve a variety of other locations in the sequence, but occur only in one or two strains each.

I have tried to capture the 16S variation by adding to the phylogeny. Maybe the result is rather complex, but I hope it is more informative than just showing 18 arbitrary symbols for the variants.

Next week, I plan to start writing all this up as a manuscript, so I may not have new analyses to share with you. If anyone wants to try their own analyses (whether or not for potential inclusion in the manuscript), I can provide a link to a folder with all 440 genome sequences.

October 8, 2020

Rhizobium leguminosarum 20

A 16S flashback

In November 1991, Helen Downer and I were sequencing 16S genes of rhizobia. We used a recently-invented process called PCR (Saiki et al. 1988 http://dx.doi.org/10.1126/science.239.4839.487) and primers Y1 and Y2 that I had designed to amplify the first part of the gene (Young et al. 1991 http://dx.doi.org/10.1128/jb.173.7.2271-2277.1991). Then we sequenced the products by hand using big gels, X-ray film and 32P radioisotope. The PCR product was normally 308-312 bp, but we were intrigued by one pea-nodulating strain, SP18, that gave a much longer product. When we sequenced it, we found that the extra DNA was in a region that was normally conserved. The first stem-loop in the secondary structure of Rhizobium 16S rRNA usually looks like this (taken from my 1991 lab book):

The CCCC….GGGG stem is found in most Rhizobium and in Sinorhizobium. The GCAA loop is even more conserved in most Alphaproteobacteria, but instead of GCAA, strain SP18 had:

TCCTTCAAGCAAGCTTGAAG-ATTTTTATCCTTGGAAAGGAAGATCAAGAAGAGCTTCTAAGAAGCTTTCTTGATGGA

A few months later, I left the John Innes Centre for the University of York and got involved in new projects, so I never published this strange sequence. Last week, I started to look at conservation of the 16S sequence in the 429 Rlc genomes, and was prompted to dig out my old lab records because I saw a similar ‘extra’ sequence in a few genomes. In fact, it is not just similar but identical, apart from an additional ‘G’ where I have shown ‘-’ in the SP18 sequence (almost certainly, this was an error in our manual sequence, which was based on a single read). There are 11 genomes with the extra sequence; they are all in genospecies O, P and Q, but not all genomes in these genospecies have it.

The first 16 bases of this long ‘loop’ sequence are complementary to the last 16 (except a couple of ‘bulges’), so would be expected to extend the stem structure, but what kind of secondary structure would be adopted by the rest of the sequence is unclear. This is what I got when I sent the sequence to an RNA structure prediction site (http://rna.urmc.rochester.edu/RNAstructureWeb/):
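A quick, naive way to check whether the two ends of the inserted sequence are complementary is to compare the first k bases with the last k bases read backwards. This sketch allows G:T (G:U in RNA) wobble pairs. The function is hypothetical and deliberately ignores bulges, so a single bulge knocks the rest of the comparison out of register. It is no substitute for a proper structure prediction such as the one above.

```python
# Watson-Crick pairs plus the G:T wobble (G:U in the RNA), written in DNA letters
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def stem_pairing(seq: str, k: int) -> list[bool]:
    """Naive check of whether the first k bases can pair with the last k.

    Base i from the 5' end is compared with base i from the 3' end, i.e. the two
    arms are aligned antiparallel with no bulges or internal loops allowed.
    Gap characters are removed and U is treated as T.
    """
    s = seq.upper().replace("-", "").replace("U", "T")
    if 2 * k > len(s):
        raise ValueError("the two arms would overlap")
    left, right = s[:k], s[::-1][:k]
    return [(a, b) in PAIRS for a, b in zip(left, right)]
```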

The red part at the bottom is the conserved stem shown in the previous figure; the rest of the structure is speculative.

I am hoping that you, my readers, can help me here. I think I have seen publications fairly recently that have described similar ‘long’ sequences in this location of 16S, but I cannot remember where. Can someone point us to relevant papers? I also have a suspicion that the 16S rRNA may be cleaved within this sequence and exist as two disconnected strands within the ribosome, but I can’t remember whether someone else showed that or it was our own unpublished observation of an unexpected pattern of rRNA bands in nucleic acid preps.

All this is something of a digression. I just wanted to record the 16S sequences of all the strains because this is something that taxonomists like to look at, and I thought the result was going to be boring and uninformative. It turns out that there is more 16S sequence variation than I expected. There are also a few genome assemblies with broken 16S sequences or no 16S at all (!), and it is taking me a while to sort those out, so the ‘boring’ consideration of 16S variation will have to wait until the next post.

September 30, 2020

Rhizobium leguminosarum 19

Some more information

Many thanks to everyone who responded so quickly to my request for information on the country and host of origin, and especially to Marta Maluk who not only dealt with her own JHI strains but with many others as well. We now have a fairly complete list, and the few remaining gaps are not too important. The Google Sheet is still here, but if you have some changes to suggest, please let me know directly, because I have already downloaded the current state of the spreadsheet and may not notice any further changes on the Google Sheet. My main aim was to get a sense of whether some genospecies were confined to certain regions or hosts. For those genospecies with many strains, this is not generally true, apart from gsA, which only includes clover symbionts so far, though from various locations.

I have searched the genomes for matches to NodD, NodA and NodC sequences representing the three symbiovars viciae, trifolii and phaseoli. This is a useful complement to documentation of host of origin. There are a few isolates that appear to have lost their symbiosis genes in cultivation between isolation and genome sequencing. This is something that has been observed before – it seems that not all symbiosis plasmids are fully stable in culture.
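The post does not say which search tool or cut-offs were used, so the following is only a hypothetical sketch of the final assignment step. It assumes we have the best percent identity of a genome to reference NodA, NodC and NodD proteins from each symbiovar. It calls the symbiovar that wins for every gene that has a hit. The 80% identity cut-off is an arbitrary placeholder. A genome with no hits at all is reported separately, which is how strains that appear to have lost their symbiosis genes in culture would show up.

```python
SYMBIOVARS = ("viciae", "trifolii", "phaseoli")
GENES = ("nodA", "nodC", "nodD")


def assign_symbiovar(hits: dict[tuple[str, str], float], min_identity: float = 80.0) -> str:
    """Call a symbiovar from best-hit percent identities (hypothetical input format).

    hits maps (gene, symbiovar) -> best percent identity of the genome to that
    symbiovar's reference protein; a missing key means no hit was found.
    min_identity is an arbitrary placeholder cut-off.
    """
    calls = set()
    for gene in GENES:
        scores = {sv: hits.get((gene, sv), 0.0) for sv in SYMBIOVARS}
        best = max(scores, key=scores.get)
        if scores[best] >= min_identity:
            calls.add(best)
    if not calls:
        return "none detected"  # e.g. a strain that may have lost its symbiosis genes
    if len(calls) == 1:
        return calls.pop()
    return "ambiguous: " + "/".join(sorted(calls))
```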

Here is the phylogeny with the addition of the symbiovar data from this nod-gene search, and the strain names have been added, too.

I have checked the species assigned to all those strains that are included in the GTDB (http://gtdb.ecogenomic.org/). Some of the more recent accessions are not there yet, but there is good agreement for those that are. GTDB divides the Rlc into ten species plus two single-strain ‘species’. It lumps together some closely related genospecies, and some unique strains, that are borderline cases but that I have argued should be kept separate; for example, it places the whole F-clade in s__Rhizobium laguerreae. There are no direct conflicts between the two schemes, though. Here is the equivalence table.

Genospecies	GTDB_species
anhuiense	s__Rhizobium anhuiense
L	s__Rhizobium leguminosarum_D
M	s__Rhizobium leguminosarum_I
C	s__Rhizobium leguminosarum_C
D + CC278f + Norway	s__Rhizobium leguminosarum_K
E	s__Rhizobium leguminosarum
H	s__Rhizobium leguminosarum_J
A	s__Rhizobium leguminosarum_E
WYCCWR10014	s__Rhizobium sp001657485
Tri-43	s__Rhizobium leguminosarum_M
G	not represented
S	not represented
I	not represented
Q, WSM1689, CCBAU10279, R, P, O, N	s__Rhizobium laguerreae
Vaf12	s__Rhizobium sp005860925
K, J, B	s__Rhizobium leguminosarum_L
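The 'no direct conflicts' check can be made explicit. In this hypothetical sketch, each strain contributes a (strain, genospecies, GTDB species) row, with None for strains not yet in GTDB. Lumping is allowed, as when several genospecies fall in one GTDB species (the F-clade, for example). A conflict would be a single genospecies split across two GTDB species.

```python
from collections import defaultdict


def find_conflicts(rows):
    """rows: iterable of (strain, genospecies, gtdb_species or None).

    Returns genospecies whose strains fall in more than one GTDB species.
    """
    gs_to_gtdb = defaultdict(set)
    for _strain, gs, gtdb in rows:
        if gtdb is not None:  # strain not (yet) in GTDB
            gs_to_gtdb[gs].add(gtdb)
    return {gs: sorted(sp) for gs, sp in gs_to_gtdb.items() if len(sp) > 1}


def equivalence_table(rows):
    """Map each GTDB species to the sorted genospecies it contains."""
    table = defaultdict(set)
    for _strain, gs, gtdb in rows:
        if gtdb is not None:
            table[gtdb].add(gs)
    return {sp: sorted(gss) for sp, gss in table.items()}
```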

Their taxonomy includes three further species that sound as though they ought to be in the Rlc but are actually more distant. Their s__Rhizobium leguminosarum_G covers WSM2297, which is somewhere close to R. hidalgonense. Their s__Rhizobium leguminosarum_A is for OV483, which is so far away that it is not even in the leguminosarum-etli clade. Their s__Rhizobium sophorae is actually R. sophoriradices – an unfortunate mistake that arose because the first version of the R. sophorae genome was not from the right strain.

I can also bring you, hot off the press, my summary figure of the ANI evidence for the 18 genospecies. I have included some of these individual plots in earlier posts, but now we have all 18 plots, in glorious colour. Each plot shows, in rank order, the ANI values for all 440 strains against the reference strain for that genospecies. Larger symbols indicate strains that belong to the genospecies in question, and the colours match the genospecies throughout. It took a few hours of battling with the intricacies of Seaborn FacetGrid to get to this point, but I think the result is pretty.
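For anyone who wants to make similar plots, here is a minimal sketch of the approach. It is not the code used for the figure above. It assumes the ANI values are in a dict keyed by (strain, reference strain) and that each genospecies has one reference strain. The data-preparation function is plain Python. The plotting function uses pandas and Seaborn's FacetGrid.map_dataframe, with one panel per genospecies. The column names, panel layout and point sizes are my own choices for illustration, and palette must map every genospecies label to a colour.

```python
def rank_rows(ani, genospecies, references):
    """Long-form rows for one ANI rank plot per genospecies.

    ani: dict (strain, reference strain) -> ANI in %.
    genospecies: dict strain -> genospecies label.
    references: dict genospecies label -> its reference strain.
    """
    rows = []
    for panel, ref in references.items():
        values = sorted(
            ((ani[(s, ref)], s) for s in genospecies if (s, ref) in ani),
            reverse=True,
        )
        for rank, (value, strain) in enumerate(values, start=1):
            gs = genospecies[strain]
            rows.append({
                "panel": panel, "rank": rank, "ani": value, "strain": strain,
                "genospecies": gs,
                "role": "member" if gs == panel else "non-member",
            })
    return rows


def plot_rank_panels(rows, palette, col_wrap=6):
    """One scatter panel per genospecies, larger points for members (needs pandas and seaborn)."""
    import pandas as pd
    import seaborn as sns

    df = pd.DataFrame(rows)
    grid = sns.FacetGrid(df, col="panel", col_wrap=col_wrap, sharex=False, height=2.5)
    # A palette dict keeps each genospecies the same colour in every panel
    grid.map_dataframe(
        sns.scatterplot, x="rank", y="ani", hue="genospecies", palette=palette,
        size="role", sizes={"member": 30, "non-member": 8}, linewidth=0, legend=False,
    )
    grid.set_titles("{col_name}")
    return grid
```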

By the way, the figures in this blog are PNG files that you can download and save (using the right-click menu) so that you can take a closer look at them.

That’s all for now.

September 24, 2020

Rhizobium leguminosarum 18

No comment

Last week, I asked my reader(s) for comments on what I had done so far, and ideas for further analysis. So far, I have received no response. Zero. It seems that nobody else is interested in defining the Rlc, and all my readers have deserted me. It may be a single-author publication, after all.

There are some small tasks that I will need help with, such as providing the country of origin and isolation host for every strain – something that the people who submitted the genomes are best placed to do. I have created a Google Sheet here that you can add the information to. If that doesn’t work, or demands that you create a Google account, just let me know and I will email you the file. Suggestions for more sophisticated analyses are also welcome.

Meanwhile, I have refined the list of genomes to incorporate the new ones and eliminate duplicates and erroneous genomes that do not correspond to the strain. That leaves 440 genomes altogether: 429 are Rlc and 11 are in the R. anhuiense outgroup. I have repeated the analyses using this final set. I used the colours defined by Cavassim et al. 2020 (https://doi.org/10.1099/mgen.0.000351) for genospecies A to E, and chose colours for the 13 new genospecies. I worked out how to get the ANI plot in the same order as the phylogeny, and to add keys for the genospecies colours. Here are the results.

Fig: Phylogeny based on 120 core genes.

Fig: Pairwise ANI values for all genomes in the Rlc and R. anhuiense, showing genospecies assignment.

Fig: ANI values, as in previous figure, but showing values > 96% in black, 95-96% in grey.

September 15, 2020

Rhizobium leguminosarum 17

Questions for you

So far, I have identified the Rhizobium leguminosarum species complex (Rlc) as a clearly-defined cluster with over 400 genomes that can be split into 18 putative genospecies plus 7 single strains that have no close relatives. I used a phylogeny of 120 core genes made with FastTree, and Average Nucleotide Identity values based on whole genomes calculated with fastANI. What else should we do to make a convincing and useful description of the Rlc? The aim is to define a set of well-supported genospecies that others can readily assign new strains to, and to set clear criteria for defining additional genospecies in the future.

Should we make a phylogeny using a different phylogenetic method, or a different set of core genes? If so, which?
Should we calculate pairwise genome similarity using a different metric, or different software to calculate ANI?
Should we look at all the non-core genes, to identify sets of genospecies-specific genes?
Should we look at recombination rates, to see whether these are higher within than between species? If so, how?
Should we look at plasmid distributions?
Does “species complex” convey the right level of divergence to describe the Rlc? How is the term “species complex” used for other groups of species, and how closely related are the species within them?
What about the single strains with no close relatives? Are they just the first known members of additional genospecies, or are they some kind of short-lived ‘hybrid’ between species, or are they genomes that were not well assembled for some reason? How can we tell?
What other questions do we need to answer?

The results so far are based on the genomes available from NCBI on 25 July 2020. I have kept an eye on new releases, and there have been an additional 30 genomes labelled “R. leguminosarum”. I have checked them by fastANI, and 19 are in the Rlc, in genospecies A, B, C and E, so I will add them to the final analyses. The other 11 are outside the Rlc, so we can add them to the list of mislabelled strains and forget about them. Here is the list.

R._leguminosarum_DSM_106839_GCF_014202125.1.fna	E
R._leguminosarum_DSM_30141_GCF_014138565.1.fna	E
R._leguminosarum_RCAM0610_GCA_014189555.1.fna	E
R._leguminosarum_RCAM0626_GCA_014189575.1.fna	C
R._leguminosarum_RCAM1365_GCA_014189635.1.fna	A
R._leguminosarum_RCAM2802_GCA_014189655.1.fna	C
R._leguminosarum_SEMIA_4011_GCF_014205785.1.fna	not in Rlc
R._leguminosarum_SEMIA_4016_GCF_014200035.1.fna	not in Rlc
R._leguminosarum_SEMIA_4022_GCF_014200055.1.fna	not in Rlc
R._leguminosarum_SEMIA_4024_GCF_014200075.1.fna	not in Rlc
R._leguminosarum_SEMIA_4025_GCF_014207035.1.fna	not in Rlc
R._leguminosarum_SEMIA_415_GCF_014197955.1.fna	not in Rlc
R._leguminosarum_SEMIA_416_GCF_014197975.1.fna	E
R._leguminosarum_SEMIA_421_GCF_014198005.1.fna	not in Rlc
R._leguminosarum_SEMIA_422_GCF_014198335.1.fna	not in Rlc
R._leguminosarum_SEMIA_430_GCF_014198015.1.fna	not in Rlc
R._leguminosarum_SEMIA_445_GCF_014198115.1.fna	E
R._leguminosarum_SEMIA_449_GCF_014198095.1.fna	E
R._leguminosarum_SEMIA_459_GCF_014198415.1.fna	E
R._leguminosarum_SEMIA_460_GCF_014138515.1.fna	E
R._leguminosarum_SEMIA_463_GCF_014198545.1.fna	E
R._leguminosarum_SEMIA_475_GCF_014198665.1.fna	B
R._leguminosarum_SEMIA_481_GCF_014198655.1.fna	E
R._leguminosarum_SEMIA_482_GCF_014198705.1.fna	not in Rlc
R._leguminosarum_SEMIA_483_GCF_014198695.1.fna	E
R._leguminosarum_SEMIA_485_GCF_014198735.1.fna	E
R._leguminosarum_SEMIA_488_GCF_014206965.1.fna	E
R._leguminosarum_SEMIA_491_GCF_014198795.1.fna	not in Rlc
R._leguminosarum_SEMIA_498_GCF_014198195.1.fna	E
R._leguminosarum_SEMIA_499_GCF_014198835.1.fna	E
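The rule applied to these new genomes can be written as a small function. It takes the genospecies whose reference strain has the highest ANI with the new genome, and accepts it if that ANI is at least 96%. The post does not give a cut-off for calling a genome 'not in Rlc', so the 90% used here is a hypothetical placeholder. An empty input stands for a genome with no reported value against any reference, since fastANI does not report pairs with very low ANI.

```python
def assign_genospecies(query_ani: dict[str, float],
                       threshold: float = 96.0,
                       rlc_floor: float = 90.0) -> str:
    """query_ani maps genospecies -> ANI (%) of the new genome to that genospecies' reference.

    rlc_floor is a hypothetical placeholder, not a value taken from the analysis.
    """
    if not query_ani:
        return "not in Rlc"
    best_gs = max(query_ani, key=query_ani.get)
    best = query_ani[best_gs]
    if best >= threshold:
        return best_gs
    if best >= rlc_floor:
        return "unassigned"  # inside the complex, but no genospecies reaches the threshold
    return "not in Rlc"
```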

There is also a new R. laguerreae, but it is just another version of the type strain under a different name. There is a strain, R. sp. WYCCWR11317, that is a new member of gsS. There is a corrected UPM1135. If anybody knows of other new accessions within the Rlc, or is aware of important new genomes that are just about to be made public, please let me know.

I hope I still have some readers out there to answer these questions, because this is a project that is important for the whole community of researchers who study R. leguminosarum and its relatives, and I would like to create a publication that will have wide support. I look forward to being overwhelmed by all your comments!

September 11, 2020

Rhizobium leguminosarum 16

Average Nucleotide Identity

So far, I have only shown ANI values using selected reference strains. Now that we have some potential genospecies that look reasonable in the phylogeny, it is time to see how well they are supported by ANI. I set fastANI running on my trusty iMac to calculate all the ANIs between the 424 distinct genomes in the Rlc. Almost 12 hours later, it came back with 179776 numbers (424 × 424, every genome against every genome, including itself). Here they are:
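An all-against-all run like this can be done with fastANI's list options, for example: fastANI --ql genomes.txt --rl genomes.txt -o all_vs_all.tsv, where genomes.txt lists one genome file per line (the file names are placeholders). fastANI writes one tab-separated line per pair: query, reference, ANI, number of bidirectional fragment mappings and total query fragments. This sketch turns that output into a matrix, with NaN for pairs that fastANI did not report. It also includes an optional average of the two directions, since A against B and B against A can differ slightly.

```python
import math


def parse_fastani(lines):
    """Parse fastANI tab-separated output into {(query, reference): ANI}.

    Columns: query, reference, ANI, bidirectional fragment mappings, total query fragments.
    """
    ani = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            ani[(fields[0], fields[1])] = float(fields[2])
    return ani


def read_fastani(path):
    with open(path) as handle:
        return parse_fastani(handle)


def symmetrise(ani):
    """Average A-vs-B and B-vs-A where both exist; keep one-directional values as they are."""
    return {
        (q, r): (v + ani[(r, q)]) / 2 if (r, q) in ani else v
        for (q, r), v in ani.items()
    }


def to_matrix(ani, order):
    """Square matrix (list of rows) in the given genome order; unreported pairs become NaN."""
    return [[ani.get((q, r), math.nan) for r in order] for q in order]
```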

I was not able to get the strains in the same order as in the phylogeny presented last time, but the order is based on the same phylogeny (it is the order of strains in the Newick file that describes the phylogeny). Yellow and red colours indicate ANI > 96%, blue colours are ANI < 95%, while values in the range 95-96% are greenish. You can see that the genospecies that we defined in the phylogeny stand out as orange-red squares in this ANI plot. I have marked the larger ones – the rest are there, but too small to label at this scale. You can see that strains generally have low (blue) ANI with members of other genospecies. You can also see that the F-clade looks a little less well resolved, as it also did in the phylogeny. Rhizobium anhuiense is very clearly an outgroup, dark blue with all the Rlc strains.
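Since the order came from the Newick file, here is a rough sketch of how to recover that order from a Newick string and rearrange a matrix to match. It assumes simple, unquoted tip labels with no spaces, and the function names are hypothetical. The display order in a tree viewer such as iTOL can still differ if the viewer rotates or ladderises nodes.

```python
import re


def newick_tip_order(newick: str) -> list[str]:
    """Leaf labels in the order they appear in a Newick string.

    A leaf label follows '(' or ','; a name after ')' labels an internal node
    (often a support value) and is skipped. Quoted labels are not handled.
    """
    return re.findall(r"[(,]\s*([^(),:;\s\[\]]+)", newick)


def reorder(matrix, labels, order):
    """Rearrange a square matrix (list of rows, indexed by labels) into a new order."""
    index = {name: i for i, name in enumerate(labels)}
    idx = [index[name] for name in order]
    return [[matrix[i][j] for j in idx] for i in idx]
```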

We found that members of a genospecies generally had ANI of 96% or above with the representative strain. Here is the same set of ANI values, but shown with a threshold at 96%:

Now the genospecies are very clear – they are solid red squares on a clean background. Two strains just above gsB are an exception: they exceed 96% ANI with about half the gsB strains. These are WSM1455 and WSM1481 (gsJ), which we had already noted as very close to gsB (Rhizobium leguminosarum 11). There are also two strains in the F-clade that have ANI>96% to all members of both gsQ and gsR, although these genospecies are otherwise distinct. We saw this issue earlier, in Rhizobium leguminosarum 13: “we have already assigned SPF2A11 and HP3 to gsQ. In fact, they have ANI > 96 to the reference strains of both clade Q and clade R”. The phylogeny places them in gsQ, but it would be good to know how robust this is, and why these strains have such high ANI with two sets of strains that are otherwise distinct.
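Finding strains like these can be automated. For each strain, count what fraction of the members of every other genospecies it shares at least 96% ANI with. This sketch assumes a dict of pairwise ANI values keyed by (strain, other strain) and a dict of current genospecies assignments; both names are hypothetical.

```python
from collections import defaultdict


def cross_links(ani, genospecies, threshold=96.0):
    """For each strain, the fraction of each *other* genospecies it reaches threshold ANI with.

    Only non-zero fractions are reported, so well-separated strains do not appear.
    """
    members = defaultdict(list)
    for strain, gs in genospecies.items():
        members[gs].append(strain)
    report = {}
    for strain, own in genospecies.items():
        for gs, group in members.items():
            if gs == own:
                continue
            hits = sum(1 for other in group if ani.get((strain, other), 0.0) >= threshold)
            if hits:
                report.setdefault(strain, {})[gs] = hits / len(group)
    return report
```

A strain like WSM1455 would show up here with a fraction of around one half for gsB, while SPF2A11 or HP3 would show fractions of 1.0 for gsR.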

The threshold of 96% ANI was chosen because it represented natural gaps in the data, but it is at the top end of the range (95-96%) that taxonomists usually consider to be appropriate for separating species. What would it look like if we set a threshold of 95% instead? Here we are:

This looks a lot messier. There is partial overlap between gsD and gsE. Genospecies B has swallowed gsJ, gsK and gsI (R. indicum), but with some internal gaps. The F-clade has coalesced into a single group, but with a lot of missing internal points. There are a few more red dots scattered on the background. The other small genospecies are still very distinct. If we adopted this lower threshold, we could reduce the number of genospecies that we defined within the Rlc, though there would still be at least ten, and we would create numerous ambiguities and anomalies. It looks to me as though ANI>96% gives a much clearer picture that reflects some real structure in the data, and we just have to accept that there are 18+ genospecies in the Rlc.
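One way to see why the 95% picture gets messy is to treat every pair above the threshold as a link and follow chains of links (single linkage). We can then ask what fraction of pairs inside each resulting group actually clear the threshold. In this sketch, with hypothetical names, a low fraction corresponds to the 'internal gaps' visible in the plot.

```python
from collections import defaultdict
from itertools import combinations


def _value(ani, a, b):
    """ANI for a pair in either direction (0.0 if neither direction was reported)."""
    return max(ani.get((a, b), 0.0), ani.get((b, a), 0.0))


def single_linkage_groups(ani, strains, threshold):
    """Groups of strains joined, directly or through chains, by ANI >= threshold (union-find)."""
    parent = {s: s for s in strains}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps the trees shallow
            s = parent[s]
        return s

    for a, b in combinations(strains, 2):
        if _value(ani, a, b) >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    groups = defaultdict(list)
    for s in strains:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


def fraction_linked(ani, group, threshold):
    """Fraction of pairs within a group that actually reach the threshold."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 1.0
    return sum(_value(ani, a, b) >= threshold for a, b in pairs) / len(pairs)
```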

September 8, 2020

Rhizobium leguminosarum 15

Colouring in the phylogeny

Now it is time to put together all the fragments of the Rlc phylogeny that we have seen in recent posts, and see the whole picture. Here it is.

Each of the 18 coloured sections is one of the potential genospecies we have defined, and there are just 7 strains that do not fit into any of these. To orient you, the genospecies A-E are (moving clockwise round the circle):

gsC: light blue
gsD: light green
gsE: mid green
gsA: pink
gsB: orange

The F-clade (with 5 genospecies) is in shades of brown.

The outgroup, R. anhuiense, is dark grey.

The other genospecies can be identified by comparing this tree to those in the previous posts. It is the same phylogeny.

I made this tree with iTOL on the web (https://itol.embl.de/). It is the first time I have used this, but it seems potentially powerful. I have a lot to learn. I tried to export a legend for the colours, but this function did not seem to work, despite selecting the option. The colours were assigned in a hurry and are entirely arbitrary and certainly not the final choice.

The genospecies vary in the number of genomes that they cover, from 170 strains in gsC to just 2 in gsJ, gsP and gsS. Of course, this is influenced by sampling bias and may not reflect the relative sizes of the total populations of each species in the world. The 18 genospecies are clades on branches that are well supported and are generally fairly long relative to those within the genospecies, which is good as it means that they have well-defined boundaries. It is true, though, that the genospecies vary in apparent ‘depth’. Genospecies C starts closer to the common ancestor than other genospecies – one could argue that it should be split up to make it more comparable with the others, though the ANI values do not justify this. If we accept that differences in branch length on the phylogeny reflect differences in evolutionary rate, it appears that gsC is evolving relatively slowly, and the F-clade is faster, so a given ANI value reflects more evolutionary time for gsC than for the F-clade. Using ANI as a criterion means basing species on the amount of sequence divergence, rather than on the length of time needed to reach that divergence. I think it can be argued that this is a reasonable choice. On the other hand, the Genome Taxonomy Database (GTDB http://gtdb.ecogenomic.org/) normalises for differences in evolutionary rate and requires the boundaries of each taxonomic level to fall within a certain band of relative distance from the root of the tree to the branch tips. We will consider how GTDB divides up the Rlc in a future post.
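For readers curious about how GTDB's normalisation works, here is a rough sketch of relative evolutionary divergence (RED) on a rooted tree. The root has RED 0 and the tips have RED 1. Each node is placed between its parent and the tips in proportion to its branch length, relative to the average path length from the parent, through the node, to the tips below it. This is my own minimal illustration of that definition, not GTDB's software, and it ignores details such as how the tree is rooted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0  # branch length to the parent
    children: list[Node] = field(default_factory=list)


def tip_distances(node: Node) -> list[float]:
    """Path lengths from node to each descendant tip (a tip returns [0.0])."""
    if not node.children:
        return [0.0]
    return [c.length + d for c in node.children for d in tip_distances(c)]


def relative_evolutionary_divergence(root: Node) -> dict[str, float]:
    """RED(child) = p + (d / u) * (1 - p), with p = RED(parent), d = branch length,
    u = d + mean distance from the child to its descendant tips."""
    red = {root.name: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        p = red[node.name]
        for child in node.children:
            dists = tip_distances(child)
            u = child.length + sum(dists) / len(dists)
            # A zero-length path leaves the child at its parent's RED
            red[child.name] = p if u == 0 else p + (child.length / u) * (1 - p)
            stack.append(child)
    return red
```

On such a scale, a clade whose tips sit on long branches (like the F-clade) and one with short branches (like gsC) can start at comparable RED values even if their raw distances from the root differ.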

Eighteen genospecies is a lot for people to get used to. Of course, we could amalgamate some with their neighbours to create a smaller number of genospecies, but this would create units in which some pairwise distances are greater than is usually considered appropriate for members of the same species. We will consider the ANI metrics in the next post.

September 4, 2020

Rhizobium leguminosarum 14

The F-clade was the last large group that we needed to consider, but there are a number of small clades left over that we need to look at before we have finished our dissection of the Rlc.

Genospecies I – Rhizobium indicum

Here is the sister group to the gsB+gsJ+gsK+F-clade grouping:

There are five strains here, but two of them have two genome versions that I have included just to check that they agree.

Strain JKLM12A2 has recently been designated the type strain of a new species, R. indicum (Rahi et al. 2020 https://doi.org/10.1016/j.syapm.2020.126127), with JKLM13E as another member of the species. It is clear that the three WYCCWR strains in this clade also belong to this species, because their minimum ANI with JKLM12A2 is 97.91%. The next highest ANI values in the Rlc are Vaf12 at 95.15 (unique strain) and JHI2450 at 95.14 (gsK), both related to gsB (see Rhizobium leguminosarum 11), so R. indicum is well separated from other genospecies.

Genospecies G – Rhizobium sophorae

This is the next group out – the sister clade to gsI and its sister:

CCBAU03386 is the type strain of R. sophorae (Jiao et al. 2015 https://doi.org/10.1099/ijs.0.068916-0), and the other two CCBAU strains definitely belong to this species (ANI at least 98.66). On the other hand, the ANI values for WYCCWR11279 (95.55) and WYCCWR11290 (95.50) are considerably below the level of 96% or above that we have consistently seen as a natural breakpoint in genospecies with more representatives, so I think we should consider these two strains as a separate genospecies, gsS. That leaves Tri-43, which has no close affinity to anything else. Its highest ANI is 94.82 with WYCCWR11146, in R. indicum.

WYCCWR 10014

I have looked through the entire phylogeny of the Rlc, and I can only find one strain that we have not yet discussed. WYCCWR10014 is all by itself as the sister taxon to the whole clade that we have just been considering (gsB+gsJ+gsK+F-clade+gsI+gsG+gsS). Its highest ANI is with strains in gsB and gsK, but these do not exceed 94.47.

So here is the cast list for today’s production:

gsI = R. indicum

R._sp._JKLM12A2_JKLM12A2_GCF_005862305.2.fna

R._sp._WYCCWR_11152_WYCCWR_11152_GCF_013087615.1.fna

R._sp._WYCCWR_11128_WYCCWR_11128_GCF_013416305.1.fna

R._sp._JKLM13E_JKLM13E_GCF_005860925.2.fna

R._sp._WYCCWR_11146_WYCCWR_11146_GCA_013591995.1.fna

gsG = R. sophorae

R._sophorae_CCBAU_03386_GCF_013087515.1.fna

R._leguminosarum_CCBAU33195_GCF_012276535.1.fna

R._leguminosarum_CCBAU11080_GCF_012276545.1.fna

gsS

R._sp._WYCCWR_11279_WYCCWR_11279_GCF_013087625.1.fna

R._sp._WYCCWR_11290_WYCCWR_11290_GCF_013426945.1.fna

unique

R._leguminosarum_Tri-43_GCF_004123835.1.fna

R._sp._WYCCWR10014_WYCCWR10014_GCF_001657485.1.fna

Unless something has slipped through the net, I think that completes the first stage of the analysis – we have considered all the genomes that fall within the Rlc and classified them into groups that we are tentatively considering as possible genospecies. Next, I’ll try to present an overview of the findings.

Peter Young