Regulation of mitophagy by the NSL complex underlies genetic risk for Parkinson’s disease: Bioinformatic Prioritisation and Hit Validation
Karishma D’Sa, Sebastian Guelfi, David Zhang, Alan Pittman, Daniah Trabzuni, Demis A. Kia, Nicholas W Wood, John Hardy, Claudia Manzoni, Mina Ryten
High Content Screening
Weighted Protein-Protein Interaction Network Analysis (WPPINA)
Parkinson's disease
GWAS
Putamen
Substantia Nigra
Allele-specific expression (ASE)
ASAPCRN
Abstract
This protocol describes the Bioinformatic Prioritisation of PD GWAS candidates for High Content Screening, and Hit Validation by allele-specific expression (ASE) analysis.
Steps
Selection of genes for High Content Screening
WPPINA
WPPINA is reported in Ferrari et al., 2018, where the 2014 PD GWAS reported by Nalls et al., 2014 was analysed; candidate genes were selected from those prioritised and with a linkage disequilibrium (LD) r2 ≥ 0.8.
Apply the same pipeline to the 2017 PD GWAS reported by Chang et al., 2017, to update the list of candidate genes.
Briefly, create a protein-protein interaction network based on the Mendelian genes for PD (seeds) using data from databases within the IMEx consortium (https://www.imexconsortium.org/).
Topologically analyse the network to extract the core network (i.e. the most interconnected part of the network).
This core network should contain the proteins/genes that can connect >60% of the initial seeds and are therefore considered relevant for sustaining communal processes and pathways, shared by the seeds.
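The core-network selection above can be illustrated with a minimal sketch. It assumes that "connect" means a direct interaction with a seed, which is a simplification of the topological analysis in Ferrari et al., 2018. The function name, the edge-list input and the toy identifiers are hypothetical and are not part of the published pipeline.

```python
from collections import defaultdict


def core_network(edges, seeds, min_fraction=0.6):
    """Return {node: fraction of seeds it interacts with} for nodes above the cut-off.

    Assumption: a node "connects" a seed when the two interact directly.
    edges: iterable of (protein_a, protein_b) pairs, e.g. parsed from IMEx data.
    seeds: Mendelian PD genes used to build the network.
    """
    seeds = set(seeds)
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:  # ignore self-interactions
            continue
        neighbours[a].add(b)
        neighbours[b].add(a)

    core = {}
    for node, partners in neighbours.items():
        fraction = len(partners & seeds) / len(seeds)
        if fraction > min_fraction:  # strictly more than 60% of the seeds
            core[node] = fraction
    return core


# Toy data with placeholder identifiers (not real interactions).
toy_seeds = ["S1", "S2", "S3", "S4", "S5"]
toy_edges = [
    ("S1", "X"), ("S2", "X"), ("S3", "X"), ("S4", "X"),
    ("S1", "Y"), ("S2", "Y"), ("S1", "S2"),
]
toy_core = core_network(toy_edges, toy_seeds)
```

In this toy network only `X` is retained, because it interacts with four of the five seeds.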
Evaluate these processes with Gene Ontology Biological Processes enrichment analysis (http://geneontology.org/).
Use the top single nucleotide polymorphisms (SNPs) of the 2017 PD GWAS reported by Chang et al., 2017, to extract open reading frames (ORFs) in cis-haplotypes defined by LD r2 ≥ 0.8.
Match these ORFs to the core network to identify overlapping proteins/genes in relevant/shared pathways.
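The LD filtering and matching steps can be sketched as follows. The record layout `(top_snp, gene, r2)`, the function names and all identifiers in the toy data are hypothetical placeholders; how ORFs are assigned to a haplotype is assumed to have been done upstream.

```python
def orfs_in_ld(snp_gene_ld, r2_min=0.8):
    """Collect genes whose ORF lies in a cis-haplotype with LD r2 >= r2_min.

    snp_gene_ld: iterable of (top_snp, gene, r2) records (assumed input format).
    """
    return {gene for _snp, gene, r2 in snp_gene_ld if r2 >= r2_min}


def match_to_core(candidate_genes, core_genes):
    """Return the candidates that are also members of the core network."""
    return sorted(set(candidate_genes) & set(core_genes))


# Placeholder records, not real GWAS results.
toy_ld = [
    ("snp_1", "GENE_A", 0.95),
    ("snp_1", "GENE_B", 0.62),
    ("snp_2", "GENE_C", 0.80),
]
toy_core_genes = {"GENE_A", "GENE_D"}

candidates = orfs_in_ld(toy_ld)
overlap = match_to_core(candidates, toy_core_genes)
```

Here `GENE_B` is dropped by the LD threshold and `GENE_C` passes it but is not in the core network, so only `GENE_A` overlaps.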
Complex Prioritisation
Obtain the results of complex prioritisation (neurocentric prioritisation strategy) applied to the 2017 PD GWAS from the manuscript by Chang et al., 2017.
Coloc Analysis
Coloc analysis has been reported in the manuscript by Kia et al., 2019.
Calculate, for each gene, the posterior probability of the hypothesis that the two traits (regulation of expression of that gene and risk for PD) share a causal variant (PPH4).
Consider genes with PPH4 ≥ 0.75 to have strong evidence for colocalization.
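To make the PPH4 threshold concrete, the sketch below shows how per-SNP log approximate Bayes factors for the two traits can be combined into posterior probabilities for the five colocalisation hypotheses (H0 to H4), following the approach implemented in the coloc R package. It is an illustration, not the analysis code of Kia et al., 2019: the function names are hypothetical, the priors are the coloc package defaults and may differ from those used in the manuscript, and the log Bayes factors are toy values.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf_eqtl, labf_gwas, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return [PPH0, PPH1, PPH2, PPH3, PPH4] from per-SNP log Bayes factors.

    labf_eqtl / labf_gwas: log approximate Bayes factors for the same SNPs,
    in the same order. p1, p2, p12: prior probabilities that a SNP is
    associated with trait 1 only, trait 2 only, or both.
    """
    both = [x + y for x, y in zip(labf_eqtl, labf_gwas)]
    s1, s2, s12 = logsumexp(labf_eqtl), logsumexp(labf_gwas), logsumexp(both)
    log_h = [
        0.0,                                                     # H0: no association
        math.log(p1) + s1,                                       # H1: expression only
        math.log(p2) + s2,                                       # H2: PD risk only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                     # H4: shared variant
    ]
    norm = logsumexp(log_h)
    return [math.exp(h - norm) for h in log_h]


def colocalises(posteriors, pph4_min=0.75):
    """Apply the PPH4 >= 0.75 rule for strong evidence of colocalisation."""
    return posteriors[4] >= pph4_min


# Toy example: both traits peak at the same (second) SNP.
shared = coloc_posteriors([0.0, 12.0, 0.0], [0.0, 10.0, 0.0])
# Toy example: the traits peak at different SNPs.
distinct = coloc_posteriors([12.0, 0.0, 0.0], [0.0, 10.0, 0.0])
```

With these toy inputs the shared-peak region passes the PPH4 threshold, while the region with peaks at different SNPs does not.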
Obtain summary statistics from the most recent PD GWAS reported in the manuscript by Nalls et al., 2019, and use these for regional association plotting using LocusZoom (https://locuszoom.sph.umich.edu/).
ASEs (Sites of Allele-Specific Expression)
Identify sites of allele-specific expression (ASE) as described by Guelfi et al., 2020, by mapping RNA-seq data to personalised genomes.
This approach is specifically chosen because it aims to minimise the impact of mapping biases.
Use RNA-seq data generated from 49 putamen and 35 substantia nigra tissue samples from the UK Brain Expression Consortium (https://ukbec.wordpress.com) for this analysis.
Consider only sites located in non-overlapping genes, and analyse data from both tissues together to increase power.
Define ASE sites as those with a minimum false discovery rate (FDR) < 5% across samples.
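The FDR rule can be sketched as follows. The sketch assumes that a per-site p-value for allelic imbalance is already available for each sample and that FDR is obtained by Benjamini-Hochberg correction within each sample; neither detail is specified in this protocol, so see Guelfi et al., 2020 for the actual procedure. Function names, sample labels and site labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def ase_sites(pvalues_by_sample, fdr_max=0.05):
    """Return {site: minimum FDR across samples} for sites below fdr_max.

    pvalues_by_sample: {sample_id: {site_id: p-value}} (assumed input format).
    """
    min_fdr = {}
    for site_p in pvalues_by_sample.values():
        sites = list(site_p)
        fdr = benjamini_hochberg([site_p[s] for s in sites])
        for site, q in zip(sites, fdr):
            min_fdr[site] = min(q, min_fdr.get(site, 1.0))
    return {site: q for site, q in min_fdr.items() if q < fdr_max}


# Placeholder p-values, not real data.
toy_pvalues = {
    "sample_A": {"site_1": 0.001, "site_2": 0.8},
    "sample_B": {"site_1": 0.6, "site_2": 0.9},
}
toy_ase = ase_sites(toy_pvalues)
```

In the toy data `site_1` is called an ASE site because its FDR falls below 5% in one sample, even though it is not significant in the other.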
Generate plots using Gviz, with gene and transcript details obtained from Ensembl v92 (https://www.ensembl.org/index.html).