Gene expression variability in breast tumours

Project Description

In New Zealand, approximately 3300 women are diagnosed with breast cancer annually, and up to 30% show a pattern of heritability. Pathogenic variants in the genes BRCA1 and BRCA2 account for the majority of inherited disease. Women who carry a pathogenic variant have a significantly increased lifetime risk of both breast and ovarian cancer.

BRCA1 and BRCA2 lifetime risk

Gene expression profiles have been extensively used to classify breast tumours including the prediction of intrinsic subtypes and response to treatments. However, despite several attempts, there has been limited success in identifying gene expression profiles related to BRCA1 and BRCA2 pathogenic variant status.

Here, we explore gene expression variability in three independent familial breast cancer datasets with germline BRCA1 and BRCA2 status. Additionally, we explore one large meta-cohort with gene expression data on 2116 individuals with breast cancer.

App description

This app presents the data generated during my thesis on gene expression variability. Global gene expression is displayed in the 'Global Expression' tab as static images showing tumour differences in global statistics. Differential variability and expression can be viewed in the 'Differential Var' tab, where boxplots illustrate expression in tumour groups and a data table presents the statistics. The 'gene of interest' option in the sidebar allows users to view boxplots of specific genes; the data table, however, must be searched separately using its search function.

Data acquisition and normalisation

Global expression analysis

Gene expression analysis

Differential variability (Dvar) analysis of gene expression was performed on each of the four datasets for the given comparisons (e.g. BRCA1-associated breast tumours compared with BRCAx/Sporadic breast tumours). Dvar was calculated using the Brown-Forsythe test, a robust test similar to Levene's test, on each gene that passed filtering.

The absolute deviation from the group median is calculated as \(Z_{ij} = |y_{ij} - \tilde{y}_{i.}|\), where \(y_{ij}\) is the expression of sample \(j\) in group \(i\) and \(\tilde{y}_{i.}\) is the median of group \(i\). The Brown-Forsythe test statistic \(W\) is defined by: \[ W = \frac{(N - k) \sum_{i=1}^{k}N_{i}(Z_{i.}- Z_{..})^2}{(k - 1)\sum_{i=1}^{k}\sum_{j=1}^{N_{i}}(Z_{ij}-Z_{i.})^2} \] where \(N\) is the total number of samples, \(k\) is the number of tumour groups compared (2), \(N_i\) is the number of samples in group \(i\), \(Z_{i.}\) is the mean of the absolute deviations from the median for group \(i\), \(Z_{..}\) is the mean of the absolute deviations of all samples from their respective group medians, and \(Z_{ij}\) is the absolute deviation from the group median for sample \(j\) in group \(i\). Under the null hypothesis of equal variability, the resulting \(W\) statistic approximately follows the F-distribution with degrees of freedom \(df_1 = k-1\) and \(df_2 = N-k\).
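The W statistic defined above can be sketched in a few lines of Python for a single gene. This is an illustrative sketch only, not the code used for the thesis or the app: the function name `brown_forsythe_w` and the example values are hypothetical, and it uses only the standard library, so it returns W and the degrees of freedom without a p-value.

```python
from statistics import mean, median


def brown_forsythe_w(groups):
    """Return (W, df1, df2) for one gene.

    `groups` is a list of lists, one list of expression values per
    tumour group (hypothetical input layout, assumed for illustration).
    """
    k = len(groups)                        # number of tumour groups
    n_total = sum(len(g) for g in groups)  # N, total number of samples

    # Z_ij: absolute deviation of each sample from its own group median
    z = []
    for g in groups:
        group_median = median(g)
        z.append([abs(y - group_median) for y in g])

    z_group_means = [mean(zi) for zi in z]             # Z_i.
    z_grand_mean = sum(sum(zi) for zi in z) / n_total  # Z_..

    # Numerator sum: spread of the group means around the grand mean
    between = sum(
        len(zi) * (zbar - z_grand_mean) ** 2
        for zi, zbar in zip(z, z_group_means)
    )
    # Denominator sum: spread of each Z_ij around its own group mean
    within = sum(
        (zij - zbar) ** 2
        for zi, zbar in zip(z, z_group_means)
        for zij in zi
    )

    w = (n_total - k) * between / ((k - 1) * within)
    return w, k - 1, n_total - k


# Made-up expression values for two tumour groups
w, df1, df2 = brown_forsythe_w([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
print(w, df1, df2)
```

The sketch divides by zero if every sample sits exactly on its group median, so real data would need a guard for that case. If SciPy is available, `scipy.stats.levene(..., center="median")` computes the same statistic and also returns the p-value from the F-distribution.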

Similarly, a modified t-statistic was used to identify differentially expressed genes in the same tumour comparisons. p-values were adjusted using the Benjamini-Hochberg (BH) procedure. Boxplots of each probe for the respective gene of interest are displayed for each available dataset.

Probe level boxplots

Statistical analysis (Gene-level)

p-values were adjusted using the BH method.
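For readers unfamiliar with the BH adjustment, a minimal Python sketch follows. It is an illustration under stated assumptions, not the code used to produce the tables in this app: the function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-values decrease
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four genes
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone and capped at 1.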