We offer analysis of a growing list of high-throughput data types, including:
- Single-cell and bulk RNA/DNA sequencing: gene expression, single-nucleus RNA-seq, RiboMeth-seq, ribosome and polysome profiling, methyl-seq, ChIP-seq, ATAC-seq, metagenomics, and variant analysis
- Mass spectrometry: proteomics, including post-translational modifications (e.g., phosphoproteomics, ubiquitomics), metabolomics, and lipidomics
- Microarrays: gene expression arrays, SNP arrays, and protein arrays
For sequencing data, we first trim adapters, align reads, and quantify expression. Once we have an abundance table, we typically normalize the data, perform quality control, handle missing values (for mass spectrometry data), assess clustering with principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE), test analytes (e.g., genes) and pathways for association with phenotype or for differential abundance between groups, and produce visualizations such as heatmaps, volcano plots, violin plots, and interactive plots. We then write a report describing our methods and results.
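The normalization, PCA, and differential abundance steps above can be illustrated on a bulk count table with the short sketch below. It is a minimal example under stated assumptions, not our production pipeline: log2 counts per million stands in for normalization, an SVD of the centered data gives the PCA scores, a per-gene Welch t-test stands in for the differential abundance model, and Benjamini-Hochberg controls the false discovery rate. The function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def log_cpm(counts, prior=1.0):
    """Normalize a genes x samples count matrix to log2 counts per million.
    Simplified: the prior count is not scaled by library size."""
    lib_sizes = counts.sum(axis=0)
    return np.log2((counts + prior) / lib_sizes * 1e6)


def pca_scores(expr, n_components=2):
    """Return sample scores (samples x components) on the top principal components."""
    x = expr.T - expr.T.mean(axis=0)  # center each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def differential_abundance(expr, group):
    """Per-gene Welch t-test between two groups; returns log2 fold change, p, FDR."""
    group = np.asarray(group, dtype=bool)
    a, b = expr[:, group], expr[:, ~group]
    lfc = a.mean(axis=1) - b.mean(axis=1)
    _, p = stats.ttest_ind(a, b, axis=1, equal_var=False)
    return lfc, p, bh_adjust(p)


# Hypothetical example: 500 genes, 3 vs 3 samples, first 25 genes raised in group 1.
rng = np.random.default_rng(0)
counts = rng.poisson(100, size=(500, 6)).astype(float)
counts[:25, :3] *= 4
group = [True, True, True, False, False, False]

expr = log_cpm(counts)
scores = pca_scores(expr)
lfc, p, fdr = differential_abundance(expr, group)
print(scores.shape, int((fdr < 0.05).sum()))
```

With only three samples per group, a moderated test (one that borrows variance information across genes) is usually preferable to a plain t-test. The simple version is used here only to keep the example self-contained.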
This pipeline takes approximately 5 hours if we are given a normalized data set, approximately 10 hours for common raw data types (such as bulk RNA-seq and metabolomics), and approximately 20 hours for single-cell data. These time estimates do not depend on the number of samples in the data set.
Additional features:
- Sample size and power calculations for high-throughput studies
- Clustering and classification
- State-of-the-art reproducible workflows
- Analysis and meta-analysis of public data
- Network analysis
- Integration of multiple data types, including clinical covariates
- Causal inference testing, such as mediation analysis
- Metabolic flux inference, e.g. from Seahorse assays
- (For Joslin investigators) an in-house searchable gene expression database with more than 75 studies, and seminars on the free statistical software R
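For the sample size and power calculations listed above, a common starting point for comparing one analyte between two groups is the normal-approximation formula n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / δ². Here δ is the target difference on the log2 scale and σ is the common standard deviation. For many analytes, α is divided by the number of tests (Bonferroni). The sketch below makes those assumptions. The function name `n_per_group` and its parameters are hypothetical, and Bonferroni is a conservative stand-in for the FDR-based methods often used in high-throughput studies.

```python
import math
from statistics import NormalDist


def n_per_group(effect_log2fc, sd, alpha=0.05, power=0.8, n_tests=1):
    """Approximate samples per group for a two-sided, two-sample z-test.
    alpha is Bonferroni-adjusted across n_tests analytes (an assumption)."""
    z = NormalDist()  # standard normal
    alpha_per_test = alpha / n_tests
    z_alpha = z.inv_cdf(1 - alpha_per_test / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / effect_log2fc) ** 2
    return math.ceil(n)  # round up to whole samples


# Hypothetical study: detect a 2-fold change (log2FC = 1) with SD 0.5,
# for a single test and for 20,000 genes tested at once.
print(n_per_group(1.0, 0.5))
print(n_per_group(1.0, 0.5, n_tests=20000))
```

Correcting for thousands of tests raises the required sample size noticeably. For this reason, high-throughput power calculations usually start from an assumed number of truly changed analytes and a target FDR, not from a single test.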