Metagenomic, single cell and metatranscriptomic analysis

Starting Month: M7

Ending Month: M36

Objectives

UniquELT will determine the functional mechanisms operating in the sediment of the Etoliko lagoon using advanced tools. Single-cell genomics, metagenomics and metatranscriptomics will be deployed to predict (a) the metabolic role of the most dominant microbes, (b) the assembly of the sediment communities, (c) growth characteristics, (d) functional roles, (e) the emergence of unique genetic traits and (f) the sediment virome.

<aside> 💡 Based on the results from WP1 and WP2, at least five (5) depths will be selected for further analysis via metagenomic, single cell and metatranscriptomic sequencing. Libraries will be constructed using standard Illumina protocols.

</aside>

Description of Work & Tasks

Task 3.1 – Generation of Single-Cell Amplified Genomes M7-M26.

Task 3.2 – Metagenomic definition of the functional diversity of the Etoliko lagoon M13-M36.

Task Leader: Assoc. Prof. George Tsiamis, University of Patras

Single-cell sorting will be performed at the Bigelow Laboratory Single Cell Genomics Center (see attached support letter) using its established protocols [40], while whole-genome amplification and 16S rRNA gene PCR screening of single cells will be performed at the Laboratory of Molecular Genetics and Microbiology, University of Patras.

The experimental plan for single-cell genomics is as follows. Fluorescence-activated cell sorting will be performed with an Influx Cell Sorter (BD Biosciences) into 384-well plates containing 0.9 μl of UV-treated TE. The cells will be stained with SYBR Green I (Invitrogen) and illuminated by a 488 nm laser (Coherent Inc.). The sorting window will be based on cell size, determined by side scatter, and on green fluorescence (531/40 band-pass filter). On each plate, single cells will be sorted into 24 columns, including three columns of negative controls and one column of positive controls with 100 and 10 cells per well. Sort purity will be determined by depositing a predefined number of 2 μm fluorescent beads on glass slides and verifying the bead count by epifluorescence microscopy.

Sorted cells will be lysed for 10 min at room temperature using the alkaline solution from the REPLI-g UltraFast Mini Kit (Qiagen) according to the manufacturer's instructions. After neutralization, the samples will be amplified using the RepliPHI Phi29 reagents (Epicentre). Single Amplified Genomes (SAGs) will be sequenced on a HiSeq instrument with paired-end 2×250-bp chemistry. For each library, sequencing error rates will be quantified from the reads of the phiX174 phage genomic DNA that is added to all sequencing runs as an internal control as part of the standard Illumina sequencing protocol. Draft SAG sequences will be assessed for genome completeness based on a set of conserved single-copy genes.

For the phylogenetic analysis, we will scan the assemblies for homologs of a set of 38 marker genes and generate a series of maximum likelihood marker gene trees in order to detect robust relationships between phyla and explore possible superphyla formations. The marker gene tree results will be compared to 16S rRNA gene phylogenies. We will reconstruct the main metabolic features of the SAGs, including a scan for sugar- and amino acid-degrading enzymes, autotrophic pathways, enzymes involved in energy metabolism, and the environmental response. We also anticipate discovering novel and unusual metabolic features when screening these unique SAGs, as we have reported previously for other candidate phyla.
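
The completeness check on draft SAG sequences mentioned above can be illustrated with a minimal sketch. It assumes that marker-gene detection has already been run and has produced a plain list of marker names found in one assembly. The function name, the tiny marker set and the hit list are hypothetical placeholders, not the conserved single-copy gene set that the project will actually use.

```python
from collections import Counter


def estimate_completeness(marker_hits, marker_set):
    """Estimate completeness and redundancy of one draft genome.

    marker_hits: marker gene names detected in the assembly (repeats allowed).
    marker_set: the reference set of conserved single-copy marker genes.
    Returns (completeness_percent, redundancy_percent).
    """
    if not marker_set:
        raise ValueError("marker_set must not be empty")
    # Count only hits that belong to the reference marker set.
    counts = Counter(hit for hit in marker_hits if hit in marker_set)
    found = len(counts)
    # A single-copy marker seen more than once suggests contamination
    # or an assembly artefact.
    duplicated = sum(1 for n in counts.values() if n > 1)
    completeness = 100.0 * found / len(marker_set)
    redundancy = 100.0 * duplicated / len(marker_set)
    return completeness, redundancy


# Hypothetical example: four reference markers, two of them detected,
# one of those detected twice, plus one hit outside the reference set.
markers = {"rpoB", "recA", "gyrB", "rplB"}
hits = ["rpoB", "recA", "recA", "dnaK"]
completeness, redundancy = estimate_completeness(hits, markers)
print(f"completeness={completeness:.1f}% redundancy={redundancy:.1f}%")
```

Reporting redundancy next to completeness is a design choice of this sketch: duplicated single-copy markers are a simple first signal that a SAG may need the in silico decontamination foreseen in the milestones.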

For the metagenomic approach the Illumina platform will be used. Briefly, for the Illumina libraries, 1 μg of genomic DNA will be fragmented by nebulization and column-purified (Qiagen). The DNA fragments will undergo end repair, A-tailing and adapter ligation using the Illumina Genomic DNA Sample Preparation Kit (Illumina). The ligated products will be size-selected on a gel, and the purified products will be enriched with 12 cycles of PCR. The prepared libraries will be quantified using the KAPA Biosystems next-generation sequencing library qPCR kit on a Roche LightCycler 480 real-time PCR instrument. The libraries will then be prepared for sequencing on an Illumina HiSeq platform using a TruSeq paired-end cluster kit, v3, and Illumina's cBot instrument to generate a clustered flowcell. The flowcell will be sequenced on the Illumina HiSeq 2000 sequencer using Illumina TruSeq SBS sequencing kits, v3, following a 2×250 indexed high-output run recipe.

Metagenomic data will be analyzed using the IMG system (DOE-JGI) [66]. From the anticipated datasets, we will construct complete or partial prokaryotic and single eukaryotic genomes. Using Hidden Markov Models and publicly available datasets, we will functionally annotate the detected genes, recognize signal peptides and attempt to predict the localization of the enzymes. Furthermore, we will use the KEGG [64] and MetaCyc [65] databases to identify the metabolic potential and to reconstruct catabolism and respiration strategies. We will reconstruct the main metabolic features of the most dominant bacterial/archaeal strains, including a scan for sugar- and amino acid-degrading enzymes, autotrophic pathways, enzymes involved in energy metabolism, and the environmental response.

For the sediment virome, the viral sequence repository IMG/VR will be used as the reference for sequence mapping in the viral grouping step, which can also hint at the presence of low-abundance viruses in a target metagenome.

Task 3.3 – Metatranscriptomic definition of the functional diversity of the Etoliko lagoon M14-M36.

Task 3.4 – Communication / Dissemination activities M22-M36.

Task Leader: Assoc. Prof. George Tsiamis, University of Patras

For the metatranscriptomes, RNA will be isolated from the same samples as in T3.2 using the MOBIO RNA Power Isolation kit. mRNA will be enriched by subtractive hybridization, which has been found to be more effective in preserving the relative abundance of different transcripts [67]. Soil RNA extraction typically yields small amounts of mRNA; where this is the case, an additional linear amplification step will be introduced to obtain sufficient starting material for downstream applications. A random-primed cDNA library will then be prepared. Briefly, total RNA will first be treated with 5′-phosphate-dependent Terminator exonuclease (Epicentre) to enrich for full-length mRNA carrying 5′ cap or triphosphate structures. First-strand cDNA will then be synthesized using an N6 random primer and M-MLV-RNaseH reverse transcriptase, and second-strand cDNA synthesis will be performed according to the Gubler-Hoffman protocol.

During the bioinformatics analysis, sequences will be assigned a description by comparison with publicly available databases, such as the National Center for Biotechnology Information (NCBI) non-redundant (nr) database (http://www.ncbi.nlm.nih.gov/), the Integrated Microbial Genomes database (IMG/M; http://img.jgi.doe.gov/) and the Metagenomics Analysis Server (MG-RAST, http://metagenomics.anl.gov). Mapping reads to known sequences from T4.3 will also allow us to determine whether genes are up- or down-regulated; gene frequencies will be normalized by the gene abundances within a coupled genome/metagenome from the same nucleic acid extraction. To obtain more informative results, reads will also be mapped to custom databases generated from the metagenomic data of the previous subtask. Mapping will be performed using tools such as the Burrows-Wheeler Aligner (BWA, http://bio-bwa.sourceforge.net/) or the BLAST-Like Alignment Tool (BLAT).
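
The normalization step described above, in which transcript frequencies are scaled by gene abundances from the coupled metagenome, can be sketched as follows. This is an illustrative calculation under simple assumptions (raw mapped-read counts per gene, no correction for gene length or sequencing depth beyond relative fractions). The function names, gene names and counts are hypothetical and are not project data.

```python
import math


def dna_normalized_expression(rna_counts, dna_counts):
    """Return RNA fraction divided by DNA fraction for each gene.

    rna_counts: mapped metatranscriptome reads per gene.
    dna_counts: mapped metagenome reads per gene (same extraction).
    Genes with no metagenomic reads are skipped because their
    abundance cannot be used as a denominator.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries need at least one mapped read")
    ratios = {}
    for gene, dna in dna_counts.items():
        if dna == 0:
            continue
        rna_fraction = rna_counts.get(gene, 0) / rna_total
        dna_fraction = dna / dna_total
        ratios[gene] = rna_fraction / dna_fraction
    return ratios


def log2_fold_change(ratios_a, ratios_b):
    """log2 of the normalized expression in sample A relative to sample B."""
    changes = {}
    for gene in ratios_a.keys() & ratios_b.keys():
        a, b = ratios_a[gene], ratios_b[gene]
        if a > 0 and b > 0:  # log2 is undefined for zero expression
            changes[gene] = math.log2(a / b)
    return changes


# Hypothetical counts for two samples sharing the same gene abundances.
dna = {"nirS": 10, "nifH": 10, "amoA": 20}
rna_a = {"nirS": 30, "nifH": 10, "amoA": 60}
rna_b = {"nirS": 15, "nifH": 25, "amoA": 60}

ratios_a = dna_normalized_expression(rna_a, dna)
ratios_b = dna_normalized_expression(rna_b, dna)
changes = log2_fold_change(ratios_a, ratios_b)
for gene in sorted(changes):
    print(gene, round(changes[gene], 2))
```

A positive value indicates higher normalized expression in sample A than in sample B, and a negative value the reverse. Dividing by the DNA fraction separates a change in transcription from a change in how many gene copies are present in the community.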
Functional categorization of transcripts will be obtained using the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Clusters of Orthologous Groups (COGs), and the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) databases. These databases consist of groups of genes that have been assigned to different functional pathways (e.g. denitrification or nitrogen fixation) based on the similarity of protein orthologs from sequenced isolate microbial genomes. A BLAST search against these databases will assign transcripts with significant similarity to functional pathways. This approach will help us determine whether whole pathways, rather than single genes, are differentially expressed between treatments.
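
The pathway-level comparison described above can be sketched as a simple aggregation of transcript counts. The sketch assumes that the similarity search has already assigned each transcript to an orthologous group and that each group is linked to one or more pathways. All identifiers, mappings and counts below are hypothetical placeholders rather than real KEGG, COG or eggNOG entries.

```python
from collections import defaultdict


def pathway_totals(transcript_counts, transcript_to_group, group_to_pathways):
    """Sum transcript counts over the pathways their orthologous groups map to.

    Transcripts without an assigned group, and groups without a pathway,
    are ignored. A group linked to several pathways contributes its full
    count to each of them.
    """
    totals = defaultdict(float)
    for transcript, count in transcript_counts.items():
        group = transcript_to_group.get(transcript)
        if group is None:
            continue
        for pathway in group_to_pathways.get(group, ()):
            totals[pathway] += count
    return dict(totals)


# Hypothetical annotation tables.
transcript_to_group = {"t1": "OG_A", "t2": "OG_B", "t3": "OG_C"}
group_to_pathways = {
    "OG_A": ["denitrification"],
    "OG_B": ["denitrification"],
    "OG_C": ["nitrogen fixation"],
}

# Hypothetical counts for one treatment; t4 has no annotation.
counts = {"t1": 5, "t2": 7, "t3": 2, "t4": 9}
totals = pathway_totals(counts, transcript_to_group, group_to_pathways)
print(totals)
```

Running the same aggregation for each treatment gives pathway-level totals that can then be compared between treatments, instead of testing each gene in isolation.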

Scientific conferences and other external events: The main conferences in the field, at which the participation of UniquELT partners is expected, are: Bacterial Genetics and Ecology (BAGECO), International Society for Microbial Ecology (ISME), Ecology of Soil Microorganisms, FEMS Microbiology, etc. A standard presentation template will be prepared.

Scientific publications: open access to all the scientific publications will be ensured.

1-day workshop: a one-day training workshop on the use of omic technologies will be organised.

Milestones

Milestone 3.1 – In silico decontamination of Single Amplified Genomes (M26)

Milestone 3.2 – Metabolic profiling of the Etoliko lagoon (M34)

Deliverables

Deliverable 3.1 – Single amplified genomes (T3.1; M26)

Deliverable 3.2 – Metagenomic and metatranscriptomic libraries (T3.2; M18)

Deliverable 3.3 – Metabolic features of dominant microorganisms (T3.1 & T3.2; M36)

Deliverable 3.4 – Sediment virome (T3.2; M36)

Deliverable 3.5 – One publication in a peer-reviewed journal (T3.4; M36)

Deliverable 3.6 – Organization of a workshop on omic approaches (T3.4; M26)

Deliverable 3.7 – Participation in an international conference (T3.4; M26)