MSA 2025 – The Lentinus Project

Preliminary GWAS Results

msa-poster-2025 Download

BSLMM: Associated Genes

The Bayesian sparse linear mixed model (BSLMM) produced the fewest significantly associated genes of the three models, mostly in the same regions identified by the linear model. This short list includes genes involved in carbon metabolism and protein synthesis.

PCA Correlation	SNP ID	SNP Scaff.	SNP Pos.	Effect	Downstream Gene Scaff.	Downstream Gene Type	Downstream Gene Start	Downstream Gene Stop	Downstream Gene Strand	Downstream Gene ID	Downstream Gene Name	Upstream Gene Scaff.	Upstream Gene Type	Upstream Gene Start	Upstream Gene Stop	Upstream Gene Strand	Upstream Gene ID	Upstream Gene Name
aga	Chr1047:20020\T,\C	5	20020	-0.01993098	5	gene	18386	20564	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA480625	mannanase
aga	Chr1047:362588\A,\G	5	362588	-0.0141599863274	5	gene	362276	364189	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA456986	homoserine O-acetyltransferase
aga	Chr1047:779150\T,\G	5	779150	-0.02569876	5	gene	778480	779139	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA289929	hypothetical protein	5	gene	779329	781713	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA610287	hypothetical protein
aga	Chr1047:837099\G,\A	5	837099	-0.02058652	5	gene	835929	838479	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA531717	hypothetical protein
aga	Chr1047:959366\G,\A	5	959366	-0.0183342358056	5	gene	956932	959245	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA610344	hypothetical protein
aga	Chr1047:1082394\A,\G	5	1082394	-0.02622976	5	gene	1082045	1083561	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA551180	phosphopantothenate-cysteine ligase
aga	Chr1047:1134386\A,\G	5	1134386	-0.01627176	5	gene	1134353	1135371	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA531833	hypothetical protein
aga	Chr1047:1167402\T,\C	5	1167402	-0.0193230869364	5	gene	1163998	1167020	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA531846	hypothetical protein	5	gene	1167451	1168064	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA457574	UPF0041-domain-containing protein
aga	Chr1047:1317050\A,\C	5	1317050	-0.02262773	5	gene	1316530	1317709	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA568174	hypothetical protein
aga	Chr1047:1350097\T,\C	5	1350097	-0.01868771
aga	Chr1047:1423286\G,\T	5	1423286	-0.02649434	5	gene	1420120	1423186	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA610474	hypothetical protein	5	gene	1423453	1424456	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA610475	hypothetical protein
aga	Chr1047:1546379\T,\C	5	1546379	-0.0205141	5	gene	1546158	1550440	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA531964	RNA polymerase II-associated protein
aga	Chr1056:290190\T,\C	14	290190	-0.0183155	14	gene	281852	293292	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA613564	hypothetical protein
aga	Chr1056:301085\C,\T	14	301085	-0.028432977118								14	gene	301299	302875	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA464361	NAD(P)-binding protein
aga	Chr1056:372057\A,\G	14	372057	-0.02773642	14	gene	370578	372984	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA535672	S-adenosyl-L-methionine-dependent methyltransferase
aga	Chr1056:379782\G,\T	14	379782	-0.04061589
aga	Chr1056:577692\G,\A	14	577692	-0.02634238	14	gene	575623	579266	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA535743	hypothetical protein
aga	Chr1056:657424\G,\A	14	657424	-0.02265925
aga	Chr1177:1001\T,\C	135	1001	-0.02463647								135	gene	1352	3889	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA527722	hypothetical protein
sec	Chr1047:581962\C,\T	5	581962	0.02195912	5	gene	581677	582232	+	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA567930	hypothetical protein
sec	Chr1047:1533358\C,\T	5	1533358	0.0174058
sec	Chr1101:124187\A,\G	59	124187	0.01755269	59	gene	123327	124894	–	rna-gnl\|WGS:QKKD\|L226DRAFT_mRNA473141	hypothetical protein

GWAS with Updated Genome

Past work on Lentinus tigrinus used the publicly available Lenti7 genome, as do the results in the poster above. The four scaffolds associated with the agaricoid/secotioid morphology look as though they fit together, but without a better genome assembly we cannot be sure. Our lab recently produced a new Lenti6 genome assembly for Lentinus tigrinus from Illumina data and newly generated Nanopore data, both from Lenti6. We mapped the GWAS results onto this new genome to produce the following plots:

Manhattan plot: linear model SNP significance mapped onto Lenti6 genome

Manhattan plot: Bayesian sparse linear mixed model SNP effect mapped onto Lenti6 genome

Lenti6 BSLMM: Associated Genes

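Six Bayesian sparse linear mixed model runs together identified 26 SNP associations (25 distinct SNPs, since position 834604 on scaffold 13 was recovered in both run 0 and run 3). All but one lie on scaffold 13 of the new Lenti6 genome, and none were identified in the PCA analysis. The only two functionally annotated genes near these SNPs were hexokinase and vacuolar protein sorting-associated protein 1; all others are annotated as hypothetical proteins.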

Run	SNP ID	SNP Scaff.	SNP Pos.	Effect	Downstream Gene Scaff.	Downstream Gene Type	Downstream Gene Start	Downstream Gene Stop	Downstream Gene Strand	Downstream Gene ID	Downstream Gene Name	Upstream Gene Scaff.	Upstream Gene Type	Upstream Gene Start	Upstream Gene Stop	Upstream Gene Strand	Upstream Gene ID	Upstream Gene Name
0	Chr104:1032336\T,\A	4	1032336	0.0698733181312	4	gene	1032631	1033578	–	FUN_005221-T1	hypothetical protein
5	Chr1013:688090\C,\T	13	688090	-0.2489215	13	gene	687692	689188	–	FUN_010699-T1	hypothetical protein
5	Chr1013:696605\G,\A	13	696605	-0.2383887	13	gene	693661	696437	+	FUN_010701-T1	hypothetical protein	13	gene	699007	700773	–	FUN_010702-T1	vacuolar protein sorting-associated protein 1
2	Chr1013:696631\T,\G	13	696631	0.3346923	13	gene	693661	696437	+	FUN_010701-T1	hypothetical protein	13	gene	699007	700773	–	FUN_010702-T1	vacuolar protein sorting-associated protein 1
4	Chr1013:696677\C,\G	13	696677	0.203313200385	13	gene	693661	696437	+	FUN_010701-T1	hypothetical protein	13	gene	699007	700773	–	FUN_010702-T1	vacuolar protein sorting-associated protein 1
4	Chr1013:702665\T,\C	13	702665	0.23511962228	13	gene	702068	702784	–	FUN_010703-T1	hypothetical protein
1	Chr1013:708962\G,\C	13	708962	0.1137285
3	Chr1013:712112\T,\C	13	712112	0.2080436
3	Chr1013:759975\C,\G	13	759975	-0.2796142								13	gene	760036	761416	–	FUN_010719-T1	hypothetical protein
0	Chr1013:767076\A,\C	13	767076	0.487800357066	13	gene	766613	768561	+	FUN_010722-T1	hypothetical protein
1	Chr1013:770510\G,\A	13	770510	0.1488966	13	gene	768993	771310	–	FUN_010723-T1	hypothetical protein
1	Chr1013:773674\G,\A	13	773674	0.130354093192	13	gene	773584	774931	–	FUN_010724-T1	hypothetical protein
2	Chr1013:823573\G,\C	13	823573	-0.2246705	13	gene	822010	824549	+	FUN_010745-T1	hypothetical protein
5	Chr1013:828374\C,\G	13	828374	-0.1447514	13	gene	827711	828484	–	FUN_010747-T1	hypothetical protein
5	Chr1013:828701\T,\C	13	828701	-0.1704084	13	gene	827711	828484	–	FUN_010747-T1	hypothetical protein	13	gene	828798	829330	–	FUN_010748-T1	hypothetical protein
4	Chr1013:829242\C,\T	13	829242	0.194681302877	13	gene	828798	829330	–	FUN_010748-T1	hypothetical protein
0	Chr1013:834604\A,\C	13	834604	0.202942721124	13	gene	834394	836452	–	FUN_010751-T1	hypothetical protein
3	Chr1013:834604\A,\C	13	834604	0.1226208	13	gene	834394	836452	–	FUN_010751-T1	hypothetical protein
0	Chr1013:842534\G,\A	13	842534	-0.28680617856
4	Chr1013:848994\A,\G	13	848994	-0.252565932983	13	gene	846915	848958	+	FUN_010756-T1	hexokinase
1	Chr1013:854661\A,\C	13	854661	-0.143584635222								13	gene	855217	856061	+	FUN_010758-T1	hypothetical protein
1	Chr1013:854869\A,\C	13	854869	0.1346482								13	gene	855217	856061	+	FUN_010758-T1	hypothetical protein
3	Chr1013:862356\G,\A	13	862356	0.1544015	13	gene	862003	862757	–	FUN_010762-T1	hypothetical protein
5	Chr1013:862514\T,\C	13	862514	0.1712555	13	gene	862003	862757	–	FUN_010762-T1	hypothetical protein
1	Chr1013:865973\T,\C	13	865973	-0.1337842	13	gene	865503	867116	+	FUN_010764-T1	hypothetical protein
3	Chr1013:870821\T,\C	13	870821	-0.1515588	13	gene	871143	874676	+	FUN_010766-T1	hypothetical protein
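
The counts in the summary before the table can be checked with a short tally. This is a minimal sketch in Python: the (run, scaffold, position) triples are transcribed by hand from the table above, and the variable names are illustrative only.

```python
from collections import Counter

# (run, scaffold, position) for each row of the Lenti6 BSLMM table above.
hits = [
    (0, 4, 1032336), (5, 13, 688090), (5, 13, 696605), (2, 13, 696631),
    (4, 13, 696677), (4, 13, 702665), (1, 13, 708962), (3, 13, 712112),
    (3, 13, 759975), (0, 13, 767076), (1, 13, 770510), (1, 13, 773674),
    (2, 13, 823573), (5, 13, 828374), (5, 13, 828701), (4, 13, 829242),
    (0, 13, 834604), (3, 13, 834604), (0, 13, 842534), (4, 13, 848994),
    (1, 13, 854661), (1, 13, 854869), (3, 13, 862356), (5, 13, 862514),
    (1, 13, 865973), (3, 13, 870821),
]

# Number of table rows contributed by each run and by each scaffold.
per_run = Counter(run for run, _, _ in hits)
per_scaffold = Counter(scaffold for _, scaffold, _ in hits)

# A SNP is identified by (scaffold, position); the same SNP can recur across runs.
per_snp = Counter((scaffold, pos) for _, scaffold, pos in hits)
distinct = set(per_snp)
repeated = sorted(snp for snp, n in per_snp.items() if n > 1)

print(len(hits), len(distinct))       # 26 25
print(dict(sorted(per_run.items())))  # {0: 4, 1: 6, 2: 2, 3: 5, 4: 4, 5: 5}
print(dict(per_scaffold))             # {4: 1, 13: 25}
print(repeated)                       # [(13, 834604)]
```

The tally shows that the 26 rows correspond to 25 distinct positions, because one scaffold 13 SNP appears in two separate runs.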

Methods Details

The specimen SI.Lt.038 (aga/sec) was collected from the Ipswich River (Topsfield, MA, USA) in June 2023. Spores were isolated from that specimen and grown out as monokaryons. The monokaryons were then test-crossed to the tester strain FP.102501.T_SSI.5 (sec), and mushrooms were grown from the resulting dikaryons. Hymenophore morphology of these test crosses was used to genotype the SI.Lt.038 single-spore isolates (SSIs). Two SSIs, SI.Lt.038_SSI.4 (aga) and SI.Lt.038_SSI.6 (sec), were crossed to produce the GWAS parent. From this parent, 150 SSIs were produced. These GWAS SSIs were then test-crossed to FP.102501.T_SSI.5 and genotyped by fruiting.

DNA was extracted using a standard SDS extraction protocol and cleaned using a Zymo DNA Clean & Concentrator-5 kit. Illumina sequencing was performed by Novogene; Nanopore sequencing was performed in-house on an R10.4.1 flow cell using the Ligation Sequencing Kit v14. Reads were aligned and SNPs were called using the standard GATK4 variant calling pipeline.

A new Lenti6 genome was produced by hybrid assembly of Illumina and Nanopore data. Reads were trimmed and quality-controlled using fastp. Nanopore long reads were basecalled using Dorado with the super-accurate (SUP) model. Hybrid assembly was then performed using MaSuRCA with the CABOG assembler. After assembly, the genome was polished with POLCA and homozygous scaffolds were collapsed using Redundans.

The genome-wide association was conducted with vcf2gwas, a pipeline that uses GEMMA for the statistical analysis. Samples GWAS_SSI.060, 096, and 007 were excluded from the analysis until their hymenophore morphologies can be re-checked. The pipeline was run with either the Lenti7 genome and annotations available on NCBI or the newly produced Lenti6 genome and annotations. We supplied principal components 2-6 as covariates, which together covered 49% of the variation. Principal component 1 was not included because it correlated strongly with the phenotype of interest.

To identify significantly associated SNPs, GEMMA was run with the linear model, the linear mixed model, and the Bayesian sparse linear mixed model. For the linear model and linear mixed model, a SNP was considered significant if its p-value was lower than the lowest p-value of any SNP not identified in the PCA analysis. For the Bayesian sparse linear mixed model, a SNP was considered significant if its effect size was higher than the highest, or lower than the lowest, effect size of any SNP not identified by the PCA analysis. In cases where the most extreme effects came from SNPs not identified by PCA, the significant SNPs were instead taken to be those with effects more extreme than the most extreme PCA-identified SNPs. The linear mixed model produced results similar to those of the linear model, except without intermediately significant SNPs, so for simplicity those results are not shown here. At least six runs of the Bayesian sparse linear mixed model were conducted; the ones shown here are the most representative. Results were visualized using ggplot2 in R version 4.4.1.
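
The significance rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the `Snp` record, its `in_pca` flag, and the function names are hypothetical, and the fallback for the Bayesian model is interpreted here as applying only when no SNP falls outside the range of the non-PCA SNPs. The toy values are invented for illustration and are not results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    snp_id: str
    value: float   # p-value for the linear models, effect size for the BSLMM
    in_pca: bool   # True if the SNP was also identified by the PCA analysis


def significant_by_pvalue(snps):
    """Linear / linear mixed model: p-value below the lowest non-PCA p-value."""
    background = [s.value for s in snps if not s.in_pca]
    if not background:
        raise ValueError("need at least one SNP not identified by PCA")
    cutoff = min(background)
    return [s for s in snps if s.value < cutoff]


def _outside_range(snps, reference):
    """SNPs whose value lies strictly outside [min(reference), max(reference)]."""
    lo, hi = min(reference), max(reference)
    return [s for s in snps if s.value < lo or s.value > hi]


def significant_by_effect(snps):
    """BSLMM: effect beyond the non-PCA extremes, else beyond the PCA extremes."""
    background = [s.value for s in snps if not s.in_pca]
    pca = [s.value for s in snps if s.in_pca]
    if not background:
        raise ValueError("need at least one SNP not identified by PCA")
    hits = _outside_range(snps, background)
    if hits or not pca:
        return hits
    # Assumed fallback: the non-PCA SNPs hold the extremes, so compare
    # every SNP against the most extreme PCA-identified effects instead.
    return _outside_range(snps, pca)


# Toy effect sizes (invented for illustration only).
toy = [
    Snp("snp_a", -0.04, True),
    Snp("snp_b", -0.02, True),
    Snp("snp_c", -0.01, False),
    Snp("snp_d", 0.005, False),
]
print([s.snp_id for s in significant_by_effect(toy)])  # ['snp_a', 'snp_b']
```

Because both thresholds are set relative to the SNPs not identified by PCA, the cutoff adapts to each run instead of relying on a fixed genome-wide value.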