Summary: this note introduces how to use easyTCGA.
# getmrnaexpr(): downloading datasets
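`getmrnaexpr(project)`: `project` can be any TCGA project. The function automatically downloads and organizes 6 expression matrices: counts, TPM and FPKM for both mRNA and lncRNA. They are extracted directly from the raw data on the GDC website without any modification, so they are **not** log-transformed. The matching clinical information is downloaded as well, and its sample order is exactly the same as the sample order of the expression matrices, so no further reorganization is needed.

All 6 expression matrices and the clinical information are saved automatically to the `output_mRNA_lncRNA_expr` folder in the current working directory, in both rdata and csv formats.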
``` r
library(easyTCGA)
getmrnaexpr("TCGA-BRCA")
```
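Because the clinical rows are stated to follow the expression columns, a cheap sanity check before any downstream analysis is to confirm that the two orders really match, and to reorder if they do not. The sketch below uses hypothetical toy objects; the object names and the `barcode` column are assumptions, and the files saved by `getmrnaexpr()` may use different names.

```r
# Toy stand-ins for an expression matrix and a clinical table
# (hypothetical names; not necessarily what getmrnaexpr() saves)
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("GENE_A", "GENE_B"), c("S1", "S2", "S3")))
clin <- data.frame(barcode = c("S3", "S1", "S2"),
                   age = c(60, 45, 52),
                   stringsAsFactors = FALSE)

# Reorder clinical rows to follow the expression columns;
# stop if any expression sample has no clinical record
align_clinical <- function(expr, clin, id_col = "barcode") {
  idx <- match(colnames(expr), clin[[id_col]])
  if (anyNA(idx)) stop("some expression samples have no clinical record")
  clin[idx, , drop = FALSE]
}

clin_aligned <- align_clinical(expr, clin)
identical(colnames(expr), clin_aligned$barcode)  # TRUE
```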
The other functions are used in the same way:
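`getmrnaexpr_xena` organizes TCGA gene expression data and clinical information downloaded from the XENA website (GDC hub only). Just provide the file names, for example `TCGA-ACC.htseq_counts.tsv.gz`, `TCGA-ACC.htseq_fpkm.tsv.gz`, `TCGA-ACC.GDC_phenotype.tsv.gz` and `TCGA-ACC.survival.tsv`. The mRNA and lncRNA expression matrices and the clinical information are saved automatically to the `output_mRNA_expr_xena` folder in the current working directory. ID conversion uses the v22 GTF annotation, consistent with XENA. (Using XENA expression data on its own offers no advantage over downloading the data directly from the GDC website.)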
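As a rough illustration of the kind of ID conversion described above, the sketch below strips the version suffix from Ensembl gene IDs (row names look like `ENSG00000000003.15`) and maps them to gene symbols through a lookup table. The two-row `gtf_map` table is a hypothetical stand-in for the gene_id/gene_name pairs of a GTF file; this is not easyTCGA's internal code.

```r
# Hypothetical toy lookup table standing in for gene_id/gene_name pairs
# parsed from a GTF annotation (not easyTCGA's internal data)
gtf_map <- data.frame(
  gene_id   = c("ENSG00000000003", "ENSG00000000005"),
  gene_name = c("TSPAN6", "TNMD"),
  stringsAsFactors = FALSE
)

# Drop the ".version" suffix so IDs from different annotation releases match
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Map versioned Ensembl IDs to symbols; IDs absent from the table become NA
ensembl_to_symbol <- function(ids, map) {
  map$gene_name[match(strip_version(ids), map$gene_id)]
}

ids <- c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000288675.1")
ensembl_to_symbol(ids, gtf_map)  # returns c("TSPAN6", "TNMD", NA)
```

Matching on unversioned IDs keeps the lookup robust when the expression file and the annotation come from slightly different releases.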
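`getmirnaexpr` only needs a valid TCGA project name to automatically download and organize 2 miRNA expression matrices: counts and RPM. Both matrices and the matching clinical information are saved automatically to the `output_miRNA_expr` folder in the current working directory, in both rdata and csv formats. The downloaded data are the latest release, consistent with the GDC TCGA website.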
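RPM (reads per million) rescales each sample's counts by its total read count so that samples with different sequencing depths become comparable. The sketch below computes it from a made-up count matrix, assuming the column sums are used as the per-sample total. GDC's own RPM values are computed against the total miRNA-mapped reads, which may differ from the column sums of a filtered matrix.

```r
# Toy miRNA count matrix: rows = miRNAs, columns = samples (made-up numbers)
counts <- matrix(
  c(500, 300, 200,
    100, 100, 800),
  nrow = 3,
  dimnames = list(c("hsa-mir-21", "hsa-let-7a-1", "hsa-mir-155"),
                  c("sample1", "sample2"))
)

# RPM = count / total reads in that sample * 1e6
counts_to_rpm <- function(x) {
  sweep(x, 2, colSums(x), "/") * 1e6
}

rpm <- counts_to_rpm(counts)
rpm
```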
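`getsnvmaf` only needs a valid TCGA project name to automatically download and organize the TCGA MAF file (masked somatic mutation) and the matching clinical information, which are saved automatically to the `output_snv` folder in the current working directory. The output can be read directly with `maftools::read.maf()` without any further reorganization.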
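`getcnv` only needs a valid TCGA project name to automatically download and organize copy number variation data. The data are saved to the `output_cnv` folder in the current working directory. The downloaded data are the latest release, consistent with the GDC TCGA website.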
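`getmethybeta` only needs a valid TCGA project name to automatically download and organize the beta-value matrix of 450K DNA methylation data, together with the matching clinical information; the samples match in both number and order, so no further reorganization is needed. Probe annotation (for example, the gene symbol corresponding to each probe) is organized automatically based on GRCh38. The data are saved to the `output_methy` folder in the current working directory. The downloaded data are the latest release, consistent with the GDC TCGA website.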
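`getpancancer_xena` organizes pan-cancer data and supports TCGA, GTEx, and combined TCGA+GTEx data. Just provide the corresponding expression matrix file and sample information file.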
# diff_analysis(): differential expression analysis
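`diff_analysis()` works seamlessly with `getmrnaexpr`, `getmirnaexpr` and `getmrnaexpr_xena`: their output can be used directly without any reorganization. By default, the tumor and normal groups are compared.

It supports counts, TPM, FPKM and GEO data. For counts, differential analysis is run automatically with 3 R packages: DESeq2, edgeR and limma. For other data types (TPM, FPKM and microarray expression data), the function automatically determines whether a log2(x + 0.1) transformation is needed and then performs differential analysis with limma and the Wilcoxon test.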
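The text does not spell out how `diff_analysis()` decides whether a log2(x + 0.1) transformation is needed. One widely used rule, borrowed from the GEO2R script, inspects the quantiles of the whole matrix: a very large 99th percentile, or a wide range with a positive lower quartile, suggests unlogged data. The sketch below applies that rule with the 0.1 pseudo-count as an assumption; the actual easyTCGA check may differ.

```r
# GEO2R-style heuristic for "does this matrix still need log transformation?"
# (an assumption for illustration, not easyTCGA's code)
needs_log <- function(x) {
  qx <- as.numeric(quantile(x, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

# Apply log2(x + pseudo) only when the heuristic says the data are unlogged
auto_log2 <- function(x, pseudo = 0.1) {
  if (needs_log(x)) log2(x + pseudo) else x
}

set.seed(1)
raw_tpm <- matrix(rexp(200, rate = 1 / 50), nrow = 20)  # right-skewed, unlogged
logged  <- log2(raw_tpm + 0.1)

needs_log(raw_tpm)
needs_log(logged)
```

Running the check twice is harmless: already-logged data fall below both thresholds and are returned unchanged, which avoids double log transformation.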
``` r
load("D://Rlearn/TCGA/output_mRNA_lncRNA_expr/TCGA-BRCA_expr.rdata")
data
```

```
class: RangedSummarizedExperiment
dim: 60660 1231
metadata(1): data_release
assays(6): unstranded stranded_first ... fpkm_unstrand fpkm_uq_unstrand
rownames(60660): ENSG00000000003.15 ENSG00000000005.6 ...
  ENSG00000288674.1 ENSG00000288675.1
rowData names(10): source type ... hgnc_id havana_gene
colnames(1231): TCGA-E9-A1RH-01A-21R-A169-07
  TCGA-C8-A26W-01A-11R-A16F-07 ... TCGA-HN-A2OB-01A-21R-A27Q-07
  TCGA-A8-A09M-01A-11R-A00Z-07
colData names(88): barcode patient ... paper_PARADIGM Clusters
  paper_Pan-Gyn Clusters
```
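The column names printed above are TCGA aliquot barcodes. In the TCGA barcode convention, characters 14-15 hold the sample-type code: 01-09 denote tumor samples and 10-19 normal samples, while other codes cover controls, cell lines and similar. A tumor/normal split like the default grouping in `diff_analysis()` can therefore be derived from the barcodes. The helper below is a hypothetical sketch, not an easyTCGA function, and the third barcode is a made-up placeholder.

```r
# Hypothetical helper: classify TCGA barcodes by their sample-type code
# (characters 14-15 of the barcode)
sample_group <- function(barcodes) {
  code <- as.integer(substr(barcodes, 14, 15))
  ifelse(code < 10, "tumor",
         ifelse(code < 20, "normal", "other"))
}

bc <- c("TCGA-E9-A1RH-01A-21R-A169-07",   # from the output above
        "TCGA-C8-A26W-01A-11R-A16F-07",   # from the output above
        "TCGA-XX-XXXX-11A-01R-XXXX-07")   # hypothetical normal-tissue barcode
sample_group(bc)  # "tumor" "tumor" "normal"
```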
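``` r
# Load the mRNA counts matrix saved by getmrnaexpr()
load("D://Rlearn/TCGA/output_mRNA_lncRNA_expr/TCGA-BRCA_mrna_expr_counts.rdata")

# Tumor vs normal differential analysis with DESeq2, edgeR and limma
diff <- diff_analysis(
  mrna_expr_counts,
  project = "TCGA-BRCA",
  save = FALSE
)
```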
```
=> Running DESeq2
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 2832 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
=> Running limma voom
=> Running edgeR
=> Analysis done.
```
View one of the results:
``` r
# Available results: deg_limma, deg_deseq2, deg_edger
head(diff[["deg_limma"]])
```

```
            logFC     AveExpr         t       P.Value     adj.P.Val        B genesymbol
VEGFD   -5.916531 -0.54965746 -59.02417  0.000000e+00  0.000000e+00 818.4105      VEGFD
CA4     -6.798634 -2.44840657 -49.63097 2.696866e-296 2.303258e-292 668.1375        CA4
CD300LG -6.486092  0.07079663 -48.60172 7.288653e-289 4.149916e-285 651.3416    CD300LG
PAMR1   -3.947614  2.51132211 -48.28787 1.376454e-286 5.877802e-283 646.1626      PAMR1
LYVE1   -4.736294  1.50268285 -47.36443 7.280409e-280 2.487133e-276 630.6820      LYVE1
SCARA5  -6.386457  0.44002261 -45.87139 6.502987e-269 1.851292e-265 605.4932     SCARA5
```
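To pull significant genes out of a result table such as `deg_limma`, a common approach is to filter on the absolute log fold change and the adjusted P value. The thresholds below (|logFC| > 1, adj.P.Val < 0.05) are conventional choices rather than easyTCGA defaults, and the toy table reuses the column names shown above with made-up values and gene names.

```r
# Toy result table with the same key columns as diff[["deg_limma"]] (made-up values)
deg <- data.frame(
  logFC      = c(-5.9, 2.3, 0.4, -1.8),
  adj.P.Val  = c(1e-50, 0.001, 1e-10, 0.2),
  genesymbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  stringsAsFactors = FALSE
)

# Keep genes passing both the fold-change and FDR thresholds, then label direction
filter_deg <- function(res, lfc = 1, fdr = 0.05) {
  sig <- res[abs(res$logFC) > lfc & res$adj.P.Val < fdr, , drop = FALSE]
  sig$direction <- ifelse(sig$logFC > 0, "up", "down")
  sig
}

sig <- filter_deg(deg)
sig$genesymbol  # "GENE_A" "GENE_B"
```

The same function can be applied to `deg_deseq2` or `deg_edger` after renaming their fold-change and adjusted-P columns, since those packages use different column names.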
# batch_survival(): survival analysis
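`batch_survival` automatically performs log-rank tests and univariate Cox analysis, by default based on the optimal cutoff (the cutoff with the smallest P value). It works seamlessly with `getmrnaexpr` and `getmirnaexpr`: their output can be used directly without any reorganization.

It supports 3 data types: counts, TPM and FPKM. Counts are transformed with `DESeq2::vst()`; TPM and FPKM are transformed with log2(x + 0.1).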
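The "optimal cutoff" idea can be sketched for a single gene: try each candidate split of the expression values, run a log-rank test for high vs low expression, keep the split with the smallest P value, and then fit a univariate Cox model on that split. The sketch uses the survival package on simulated data. The candidate range (10th to 90th percentile) and the binary high/low group in the Cox model are assumptions, and `batch_survival()` may search differently. Keep in mind that choosing the minimum P value over many cutoffs makes that P value optimistic.

```r
library(survival)

# Search candidate cutoffs of one gene's expression for the smallest log-rank P value
best_cutoff_logrank <- function(expr, time, status,
                                probs = seq(0.1, 0.9, by = 0.05)) {
  cands <- unique(quantile(expr, probs, names = FALSE))
  pvals <- vapply(cands, function(thr) {
    group <- factor(ifelse(expr > thr, "high", "low"), levels = c("low", "high"))
    fit <- survdiff(Surv(time, status) ~ group)
    # Chi-square statistic with (number of groups - 1) degrees of freedom
    pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
  }, numeric(1))
  best <- which.min(pvals)
  list(cutoff = cands[best], p.logrank = pvals[best])
}

# Simulated data: higher expression -> higher hazard
set.seed(42)
n <- 200
expr <- rnorm(n, mean = 5, sd = 2)
time <- rexp(n, rate = 0.05 * exp(0.4 * (expr - 5)))
status <- rbinom(n, 1, 0.8)  # about 80% events, the rest censored

res <- best_cutoff_logrank(expr, time, status)

# Univariate Cox model on the chosen high/low split
group <- factor(ifelse(expr > res$cutoff, "high", "low"), levels = c("low", "high"))
cox <- coxph(Surv(time, status) ~ group)
summary(cox)$coefficients  # coef, exp(coef) (hazard ratio), se(coef), z, Pr(>|z|)
```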
``` r
# Compute survival-associated genes
surv <- batch_survival(
  mrna_expr_counts,
  clin = clin_info
)

# View results (log-rank test)
head(surv$res.logrank)

# View results (Cox model)
head(surv$res.cox)
```
# Working environment
``` r
devtools::session_info()
```

```
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16 ucrt)
os Windows 11 x64 (build 22621)
system x86_64, mingw32
ui RTerm
language (EN)
collate Chinese (Simplified)_China.utf8
ctype Chinese (Simplified)_China.utf8
tz Asia/Hong_Kong
date 2024-01-04
pandoc 3.1.9 @ C:/Users/HANWAN~1/AppData/Local/Pandoc/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0)
AnnotationDbi 1.62.2 2023-07-02 [1] Bioconductor
Biobase 2.60.0 2023-04-25 [1] Bioconductor
BiocFileCache 2.8.0 2023-04-25 [1] Bioconductor
BiocGenerics 0.46.0 2023-04-25 [1] Bioconductor
BiocParallel 1.34.2 2023-05-22 [1] Bioconductor
biomaRt 2.56.1 2023-06-11 [1] Bioconductor
Biostrings 2.68.1 2023-05-16 [1] Bioconductor
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.1)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.0)
blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1)
callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.1)
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1)
codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1)
curl 5.0.1 2023-06-07 [1] CRAN (R 4.3.1)
data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.2)
DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1)
dbplyr 2.3.3 2023-07-07 [1] CRAN (R 4.3.1)
DelayedArray 0.26.7 2023-07-28 [1] Bioconductor
DESeq2 1.40.2 2023-06-25 [1] Bioconductor
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.2)
digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)
downloader 0.4 2015-07-09 [1] CRAN (R 4.3.2)
dplyr 1.1.3 2023-09-03 [1] CRAN (R 4.3.2)
easyTCGA * 0.0.1.8000 2023-11-06 [1] Github (ayueme/easyTCGA@0663870)
edgeR 3.42.4 2023-06-01 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1)
evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.1)
fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.1)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)
filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1)
fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1)
GenomeInfoDb 1.36.4 2023-10-02 [1] Bioconductor
GenomeInfoDbData 1.2.10 2023-10-18 [1] Bioconductor
GenomicRanges 1.52.1 2023-10-08 [1] Bioconductor
ggplot2 3.4.4 2023-10-12 [1] CRAN (R 4.3.2)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1)
gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.1)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1)
htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1)
httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.3.1)
httr 1.4.6 2023-05-08 [1] CRAN (R 4.3.1)
IRanges 2.34.1 2023-06-22 [1] Bioconductor
jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)
KEGGREST 1.40.1 2023-09-29 [1] Bioconductor
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2)
later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1)
lattice 0.21-8 2023-04-05 [2] CRAN (R 4.3.1)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1)
limma 3.56.2 2023-06-04 [1] Bioconductor
locfit 1.5-9.8 2023-06-11 [1] CRAN (R 4.3.2)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)
Matrix 1.6-1.1 2023-09-18 [1] CRAN (R 4.3.2)
MatrixGenerics 1.12.3 2023-07-30 [1] Bioconductor
matrixStats 1.0.0 2023-06-02 [1] CRAN (R 4.3.1)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)
mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.1)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.1)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)
pkgbuild 1.4.2 2023-06-26 [1] CRAN (R 4.3.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)
pkgload 1.3.2.1 2023-07-08 [1] CRAN (R 4.3.1)
plyr 1.8.8 2022-11-11 [1] CRAN (R 4.3.1)
png 0.1-8 2022-11-29 [1] CRAN (R 4.3.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.3.1)
processx 3.8.2 2023-06-30 [1] CRAN (R 4.3.1)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.1)
progress 1.2.2 2019-05-16 [1] CRAN (R 4.3.1)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.3.1)
ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.1)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.1)
Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)
RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1)
readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1)
remotes 2.4.2.1 2023-07-18 [1] CRAN (R 4.3.1)
rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1)
rmarkdown 2.23 2023-07-01 [1] CRAN (R 4.3.1)
RSQLite 2.3.1 2023-04-03 [1] CRAN (R 4.3.1)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)
rvest 1.0.3 2022-08-19 [1] CRAN (R 4.3.1)
S4Arrays 1.0.6 2023-08-30 [1] Bioconductor
S4Vectors 0.38.2 2023-09-22 [1] Bioconductor
scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)
shiny 1.7.4.1 2023-07-06 [1] CRAN (R 4.3.1)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.1)
SummarizedExperiment 1.30.2 2023-06-06 [1] Bioconductor
TCGAbiolinks 2.28.4 2023-10-05 [1] Bioconductor
TCGAbiolinksGUI.data 1.20.0 2023-04-27 [1] Bioconductor
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)
tidyr 1.3.0 2023-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.1)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.1)
usethis 2.2.2 2023-07-06 [1] CRAN (R 4.3.1)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.1)
vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.1)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.1)
xfun 0.39 2023-04-20 [1] CRAN (R 4.3.1)
XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1)
xml2 1.3.5 2023-07-06 [1] CRAN (R 4.3.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1)
XVector 0.40.0 2023-04-25 [1] Bioconductor
yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
zlibbioc 1.46.0 2023-04-25 [1] Bioconductor
[1] C:/Users/Han Wang/AppData/Local/R/win-library/4.3
[2] C:/Program Files/R/R-4.3.1/library
──────────────────────────────────────────────────────────────────────────────
```