Bioinformatics Analysis

Vasso Makrantoni, Daniel Robertson, Adele L. Marston

Published: 2021-09-22 DOI: 10.17504/protocols.io.bn4bmgsn

Abstract

Many biological processes, such as gene transcription, DNA replication, DNA recombination and chromosome segregation, are mediated by protein–DNA interactions. Chromatin immunoprecipitation (ChIP) is a powerful method for investigating proteins within their native chromatin environment in the cell. Combined with recent advances in next-generation sequencing, ChIP can map the binding sites of a protein of interest across the entire genome. Here we describe a step-by-step protocol for ChIP followed by library preparation for ChIP-seq from yeast cells.


Steps

Bioinformatics Analysis

1.

Carry out all data processing on an Ubuntu 16.04 (xenial) operating system. Perform base calling with Illumina Real-Time Analysis (RTA2) software on the MiniSeq System. Use FastQC to assess the quality of the raw sequence data (FASTQ reads) and FastQ Screen to detect any unwanted contamination.
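FastQC reports, among other metrics, the per-base sequence quality. As a minimal sketch of what that metric means, assuming Phred+33-encoded FASTQ files with plain four-line records, the Python below decodes quality strings and averages the Phred score at each read position. It only illustrates the calculation and is not a replacement for FastQC or FastQ Screen.

```python
from typing import Iterable, Iterator, List


def phred33_scores(quality_line: str) -> List[int]:
    """Decode a Phred+33 FASTQ quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in quality_line]


def mean_quality_per_position(quality_lines: Iterable[str]) -> List[float]:
    """Mean Phred score at each read position (first base, second base, ...).

    Reads may differ in length, so each position is averaged only over the
    reads that actually reach it.
    """
    totals: List[int] = []
    counts: List[int] = []
    for line in quality_lines:
        for pos, score in enumerate(phred33_scores(line)):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += score
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


def read_fastq_qualities(path: str) -> Iterator[str]:
    """Yield the quality line of each record, assuming plain 4-line FASTQ records."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 3:
                yield line.rstrip("\n")
```

For example, mean_quality_per_position(read_fastq_qualities("sample_R1.fastq")) would profile one (hypothetically named) read file; a steady drop towards the 3′ end is what the quality trimming in the next step addresses.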

2.

Aggregate all quality control reports with MultiQC [17]. Trim the paired-end ChIP-seq reads with cutadapt: remove any adapter sequence from the 3′ end of reads using the standard Illumina adapter sequences, and quality-trim the 3′ end using a user-defined cutoff (Phred quality 10, Phred+33 encoding).
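The per-read trimming rules can be illustrated in Python. The quality-trimming function below follows the BWA-style 3′ algorithm that cutadapt documents for its -q option, and quality trimming is applied before adapter removal, mirroring cutadapt's documented processing order. The adapter search is a deliberately simplified exact-match stand-in for cutadapt's error-tolerant alignment, and the adapter sequence (AGATCGGAAGAGC, the common TruSeq prefix) is an assumption. The 30 bp minimum length from step 3 is included so the whole per-read decision sits in one place.

```python
from typing import Optional, Tuple

# Assumption: the common Illumina TruSeq adapter prefix; substitute the
# adapter sequences appropriate to your library kit.
ADAPTER = "AGATCGGAAGAGC"


def quality_trim_index(qualities: str, cutoff: int = 10, base: int = 33) -> int:
    """3' quality-trimming cut point (BWA-style algorithm, as used by cutadapt -q).

    Walk from the 3' end summing (cutoff - quality); cut where the running
    sum is largest, stopping once the sum drops below zero.
    """
    running, best, cut = 0, 0, len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def trim_adapter_exact(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter and everything after it (exact matches only).

    Simplification: cutadapt tolerates mismatches and indels; this sketch only
    finds a full exact adapter match, or an exact adapter prefix of at least
    `min_overlap` bases at the very 3' end of the read.
    """
    hit = seq.find(adapter)
    if hit != -1:
        return seq[:hit]
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def trim_read(seq: str, qual: str, adapter: str = ADAPTER,
              cutoff: int = 10, min_length: int = 30) -> Optional[Tuple[str, str]]:
    """Quality-trim, then adapter-trim; return None if the read ends up too short."""
    cut = quality_trim_index(qual, cutoff)
    seq, qual = seq[:cut], qual[:cut]
    seq = trim_adapter_exact(seq, adapter)
    qual = qual[:len(seq)]  # keep qualities aligned with the trimmed sequence
    if len(seq) < min_length:
        return None
    return seq, qual
```

For paired-end data the length filter is applied per pair: by default cutadapt discards the whole pair when either mate falls below the minimum length, so both mates stay in sync.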

3.

After adapter and quality trimming, discard any read shorter than the minimum length (30 bp). Map the reads to both the S. pombe calibration genome and the S. cerevisiae W303 experimental genome, retaining for each reference only those reads that map to it and not to the other genome.
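In effect, the sequential mapping described in the following steps is a set difference: a read counts for a genome only if it aligns there and not to the other reference. A minimal Python sketch of that bookkeeping, assuming you already have the sets of read IDs that aligned to each reference (the genome labels are hypothetical), is:

```python
from typing import Dict, Set


def genome_exclusive_reads(mapped: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each genome, keep the read IDs that mapped to it and to no other genome."""
    exclusive: Dict[str, Set[str]] = {}
    for genome, reads in mapped.items():
        elsewhere: Set[str] = set()
        for other, other_reads in mapped.items():
            if other != genome:
                elsewhere |= other_reads  # reads seen in any other reference
        exclusive[genome] = reads - elsewhere
    return exclusive
```

Reads that align to both references (for example, in conserved sequences) are dropped from both, so they cannot inflate either the experimental or the calibration count.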

4.

To obtain reads that map only to S. cerevisiae W303, first map the trimmed FASTQ reads to the S. pombe reference with the minimap2 alignment tool [18] using the “-ax sr” preset (short genomic reads). Select the reads that did not map to S. pombe with SAMtools [19] (include SAM flag 4: -f 4) and convert them back into FASTQ format (interleaved). Then map these S. pombe-unmapped reads to S. cerevisiae W303.
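What the -f 4 selection and FASTQ conversion do can be sketched in Python, which also shows what the relevant SAM flag bits mean: 0x4 marks an unmapped read, 0x10 a reverse-complemented record, and 0x40/0x80 the first/second read of a pair. This is a minimal illustration assuming uncompressed SAM text input; in practice samtools view and samtools fastq do this job. Like the samtools fastq defaults, it skips secondary and supplementary records and appends /1 and /2 to read names, so mates stay adjacent when the input keeps them together.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4
REVERSE = 0x10
READ1 = 0x40
READ2 = 0x80
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_sam_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for unmapped, primary reads in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines carry no reads
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not (flag & UNMAPPED) or (flag & SECONDARY_OR_SUPPLEMENTARY):
            continue  # keep only unmapped primary records (what view -f 4 selects)
        if flag & REVERSE:
            # SAM stores reverse-strand records reverse-complemented; restore the read
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & READ1:
            name += "/1"
        elif flag & READ2:
            name += "/2"
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```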

5.

From this W303 alignment, filter out unmapped reads with SAMtools (exclude SAM flag 4: -F 4). Then use BEDtools intersect [20] to remove reads falling in the section of chromosome XII that corresponds to the repetitive rDNA, as this region is saturated with reads.
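Removing unmapped records and discarding reads that overlap the rDNA locus can be sketched as a single filter over SAM text. This Python sketch converts each alignment's 1-based POS and CIGAR into a 0-based, half-open interval so that it can be compared with BED coordinates, and drops any read overlapping a listed region by at least 1 bp, which is the default overlap rule of bedtools intersect -v. The region list is a parameter you must supply; no rDNA coordinates are assumed here.

```python
import re
from typing import Iterable, Iterator, List, Tuple

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_CONSUMES_REFERENCE = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos: int, cigar: str) -> Tuple[int, int]:
    """0-based, half-open reference interval of an alignment (SAM POS is 1-based)."""
    start = pos - 1
    length = sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in _CONSUMES_REFERENCE)
    return start, start + length


def mapped_outside_regions(sam_lines: Iterable[str],
                           regions: List[Tuple[str, int, int]]) -> Iterator[str]:
    """Drop unmapped records and records overlapping any (chrom, start, end) BED region."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4:
            continue  # unmapped: the records samtools view -F 4 removes
        start, end = reference_span(pos, cigar)
        if any(chrom == c and start < e and s < end for c, s, e in regions):
            continue  # overlaps a region by at least 1 bp
        yield line
```

Half-open intervals make the boundary case unambiguous: a read ending exactly at a region's BED start does not overlap it.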

6.

To obtain reads that map only to S. pombe, perform the same process in reverse: map the original trimmed reads to S. cerevisiae W303, select the reads that did not map to W303 with SAMtools, map those reads to S. pombe, and filter out unmapped reads with SAMtools. For both genomes, also exclude reads mapping to mitochondrial DNA using SAMtools.
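Mitochondrial reads can be removed by reference name. The Python sketch below drops SAM records whose RNAME is a mitochondrial contig; the contig names are assumptions and must be checked against the FASTA headers of the W303 and S. pombe references actually used. The same filter applies to the output of both mapping directions.

```python
from typing import Iterable, Iterator, Set

# Hypothetical contig names: check the FASTA headers of your references.
# UCSC-style builds use "chrM"; other builds may use e.g. "MT" or "mitochondrial".
MITOCHONDRIAL_CONTIGS: Set[str] = {"chrM", "MT", "mitochondrial"}


def drop_mitochondrial(sam_lines: Iterable[str],
                       mito_contigs: Set[str] = MITOCHONDRIAL_CONTIGS) -> Iterator[str]:
    """Remove alignment records whose reference name (RNAME) is a mitochondrial contig."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header
            continue
        rname = line.split("\t", 3)[2]  # third SAM column is RNAME
        if rname not in mito_contigs:
            yield line
```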

7.

To visualize the mapped reads, create bedGraphs from the aligned Binary Alignment Map (BAM) files with BEDtools genomeCoverageBed, normalized to reads per million (RPM) using a scaling factor calculated by a custom script from SAMtools flagstat output, and convert them to bigWigs with UCSC wigToBigWig.
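RPM normalization divides coverage by the number of mapped reads in millions, that is, it multiplies coverage by 10^6 / N. The authors' custom script is not shown, so the Python below is a hypothetical equivalent: it reads N from the "mapped" line of samtools flagstat output and computes the factor that could be passed to genomeCoverageBed -scale. Note that this line counts alignments, so it can exceed the number of reads if secondary or supplementary alignments are present; recent SAMtools versions also print a separate "primary mapped" line.

```python
import re

# Matches e.g. "180 + 0 mapped (90.00% : N/A)" but not the "primary mapped" line.
_MAPPED_LINE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def mapped_reads_from_flagstat(flagstat_text: str) -> int:
    """Return the QC-passed count from the 'mapped' line of samtools flagstat output."""
    for line in flagstat_text.splitlines():
        match = _MAPPED_LINE.match(line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'mapped' line found in flagstat output")


def rpm_scale_factor(mapped_reads: int) -> float:
    """Factor converting raw coverage to reads per million mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped read count must be positive")
    return 1_000_000 / mapped_reads


def scale_bedgraph_line(line: str, factor: float) -> str:
    """Apply a scale factor to one bedGraph line (chrom, start, end, value)."""
    chrom, start, end, value = line.split()
    return f"{chrom}\t{start}\t{end}\t{float(value) * factor:g}"
```

In practice the factor would simply be handed to genomeCoverageBed -scale; scale_bedgraph_line only makes the arithmetic explicit.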

8.

For meiotic samples, in which SK1 strains were used, map the reads to the SK1 genome rather than to SacCer3 as described above.

9.

To generate calibrated ChIP bigWigs, use SAMtools flagstat to count, for each sample, the reads that map only to SacCer3 W303 and only to S. pombe. Use these counts to calculate the Occupancy Ratio (OR) as previously described [8]:

OR = (Wc × IPx) / (Wx × IPc)

where W = input, IP = ChIP, c = calibration genome (S. pombe) and x = experimental genome (SacCer3 W303); for example, Wc is the number of input reads mapping only to S. pombe.
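The OR calculation is simple arithmetic on four read counts. A minimal Python sketch, with argument names chosen here for illustration, is:

```python
def occupancy_ratio(wc: int, wx: int, ipc: int, ipx: int) -> float:
    """Occupancy Ratio OR = (Wc * IPx) / (Wx * IPc).

    wc, wx:   input reads mapping only to the calibration / experimental genome
    ipc, ipx: ChIP reads mapping only to the calibration / experimental genome
    """
    if wx == 0 or ipc == 0:
        raise ValueError("Wx and IPc must be non-zero")
    return (wc * ipx) / (wx * ipc)
```

Rearranged, OR = (IPx / IPc) / (Wx / Wc): the experimental-to-calibration read ratio in the ChIP, corrected by the same ratio in the input. If the ChIP merely reproduces the input proportions, OR is 1.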

10.

Use each OR value to calibrate the ChIP bedGraphs generated with BEDtools genomeCoverageBed, and convert them to bigWigs with UCSC wigToBigWig. These bigWigs can be viewed in a genome browser such as the Integrative Genomics Viewer (IGV) [21] or the Ensembl genome browser.
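One way to apply OR, assumed here rather than taken from the protocol, is to multiply the RPM-normalized ChIP coverage by OR, which corresponds to a genomeCoverageBed -scale value of OR × 10^6 / IPx. The Python sketch below builds and runs three commands for one sample: coverage with genomeCoverageBed (-ibam, -bg, -scale), a coordinate sort (commonly recommended before the UCSC bigWig converters), and wigToBigWig, which takes the bedGraph, a chrom.sizes file and the output name. All file names are hypothetical.

```python
import os
import subprocess
from typing import List


def calibrated_scale(or_value: float, ipx_reads: int) -> float:
    """Assumed calibration: RPM normalisation (1e6 / IPx) multiplied by OR."""
    return or_value * 1_000_000 / ipx_reads


def calibrated_bigwig_commands(bam: str, chrom_sizes: str, prefix: str,
                               or_value: float, ipx_reads: int) -> List[List[str]]:
    """Commands for one sample; the first one writes its bedGraph to stdout."""
    scale = calibrated_scale(or_value, ipx_reads)
    bedgraph = f"{prefix}.calibrated.bedGraph"
    return [
        ["genomeCoverageBed", "-ibam", bam, "-bg", "-scale", f"{scale:.6g}"],
        ["sort", "-k1,1", "-k2,2n", "-o", bedgraph, bedgraph],  # sort in place
        ["wigToBigWig", bedgraph, chrom_sizes, f"{prefix}.calibrated.bw"],
    ]


def run_calibrated_bigwig(bam: str, chrom_sizes: str, prefix: str,
                          or_value: float, ipx_reads: int) -> None:
    """Run the three steps, redirecting genomeCoverageBed output to the bedGraph."""
    coverage, sort_cmd, to_bigwig = calibrated_bigwig_commands(
        bam, chrom_sizes, prefix, or_value, ipx_reads)
    env = {**os.environ, "LC_ALL": "C"}  # byte-order sorting for the UCSC tools
    with open(f"{prefix}.calibrated.bedGraph", "w") as out:
        subprocess.run(coverage, stdout=out, check=True)
    subprocess.run(sort_cmd, check=True, env=env)
    subprocess.run(to_bigwig, check=True)
```

Keeping command construction separate from execution makes it easy to check the scale factors for every sample before running anything.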

Note
All bigWigs from our published analyses are deposited in the Gene Expression Omnibus (GEO) archive and raw reads in the Sequence Read Archive (SRA).
