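Abstract:
Magnolia officinalis is a well-known traditional medicinal plant of the genus Magnolia in the family Magnoliaceae. It is widely cultivated in China, and its bark, root bark, branch bark, leaves, flowers and fruits can all be used as medicine or food. To obtain whole-genome sequence information for M. officinalis, this study used leaf DNA as the material, built a whole-genome database with PacBio Sequel third-generation sequencing technology, and applied bioinformatic methods to assemble the resulting nucleotide sequences, annotate their functions and analyze their evolution. The results were as follows: (1) After filtering of the raw sequencing data, 140.91 Gb of third-generation data were obtained, with a read N50 of about 13 784 bp. Assembly yielded an M. officinalis genome of 1.68 Gb, with a contig N50 of about 222 069 bp and a single-copy gene completeness of 81.0%. (2) By comparing the assembled sequences against functional databases such as NR, KOG and KEGG, 98.40% of the genes received functional annotation. KOG annotation showed that the protein functions of M. officinalis were concentrated in general function prediction; posttranslational modification, protein turnover and chaperones; and signal transduction mechanisms. GO functional classification showed that the genes were concentrated in cellular components and biological processes. KEGG analysis showed that genes involved in metabolic pathways predominated. (3) Comparison with the genomes of grape, Arabidopsis thaliana, rice, poplar, ginkgo, Amborella trichopoda, tea plant and Cinnamomum kanehirae showed that 20 801 of the 23 424 M. officinalis genes could be classified into 12 129 families, of which 515 gene families were unique to M. officinalis. M. officinalis was closely related to C. kanehirae (Lauraceae), and the two species diverged about 122.5 million years ago (mya). This study is the first to use third-generation sequencing technology to resolve the whole genome of M. officinalis, which will facilitate its further development and utilization and lays a foundation for whole-genome research on other medicinal plants.
Key words: Magnolia officinalis, genome, third-generation sequencing technology, gene annotation, medicinal plant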
DOI: 10.11931/guihaia.gxzw201912037
CLC number: Q943.2
Article ID: 1000-3142(2021)08-1251-12
Fund project: Supported by Sichuan Provincial Administration of Traditional Chinese Medicine Program (2018QN001, 2016ZY008); Innovation Team of Sichuan Science and Technology Department (2017TD0001).

Genomic sequencing analysis of Magnolia officinalis based on Pacbio's third-generation sequencing technology
YIN Yanpeng, DING Qiaojiao, LUO Jiawei, LIN Xinna, ZHANG Min,
PENG Cheng, GAO Jihai*

Key Laboratory of Distinctive Chinese Medicine Resources in Southwest China, Pharmacy College,
Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China

Abstract:
Magnolia officinalis is a well-known traditional medicinal plant belonging to the genus Magnolia L. of the family Magnoliaceae, and it is widely cultivated in China. Its bark, root bark, branch bark, leaves, flowers and fruits can be used as medicine or food. However, little whole-genome information is available for this species. To obtain the whole-genome sequence of M. officinalis, leaf DNA was used as the material, and PacBio Sequel third-generation sequencing technology was used to establish its nucleotide sequence database. Genome assembly, functional annotation and evolutionary analysis were then carried out with bioinformatic methods. The results were as follows: (1) After filtering of the raw sequencing data, 140.91 Gb of third-generation data were obtained, with a read N50 of about 13 784 bp. The assembled M. officinalis genome was 1.68 Gb in size, with a contig N50 of about 222 069 bp and a single-copy gene completeness of 81.0%. (2) After comparison of the assembled sequences with functional databases such as NR, KOG and KEGG, 98.40% of the genes received functional annotation. KOG annotation showed that the protein functions of M. officinalis were concentrated in the categories general function prediction only; posttranslational modification, protein turnover, chaperones; and signal transduction mechanisms. GO functional classification indicated that the genes of M. officinalis were concentrated in cellular components and biological processes. KEGG analysis found that most M. officinalis genes were involved in metabolic pathways. (3) In a comparative genomic analysis, the M. officinalis genome was compared with the genomes of Vitis vinifera, Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Ginkgo biloba, Amborella trichopoda, Camellia sinensis and Cinnamomum kanehirae. Of the 23 424 genes in M. officinalis, 20 801 could be classified into 12 129 families, and 515 gene families were unique to M. officinalis. The phylogenetic tree constructed from the genomes of the selected reference species indicated that M. officinalis (Magnoliaceae) was closely related to Cinnamomum kanehirae (Lauraceae), and that the two species diverged about 122.5 mya. This study is the first to use third-generation sequencing technology to analyze the whole genome of M. officinalis, which is conducive to its further development and utilization and also provides information for whole-genome studies of other medicinal plants.
Key words: Magnolia officinalis, genome, third-generation sequencing technology, gene annotation, medicinal plant
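
The abstract reports a read N50 and a contig N50 without defining the statistic. As a minimal sketch, assuming the conventional definition (the length L such that sequences of length L or longer contain at least half of all bases), N50 could be computed as below. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the study; the study's own pipeline is not described in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Assumes the conventional definition: sort lengths from longest to
    shortest and return the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150, half of the 300 bp total

# Figures quoted in the abstract, combined by simple arithmetic.
depth = 140.91 / 1.68              # data volume / genome size, roughly 84-fold
clustered_fraction = 20801 / 23424  # share of genes assigned to gene families
print(f"{depth:.1f}x, {clustered_fraction:.1%}")
```

The same function applies to read lengths and to contig lengths; only the input list differs.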