There are currently two functions for the epigenetic data, one for getting the epigenetic data as is, and one for adding them to a MOAS-type data frame (either the full MOAS or a subsetted MOAS). They function much like the other add
and get
functions already in this package. These functions are at the moment in development, or rather, the data is in active cleaning procedures. Because of the lack of collection time for some of the samples, any analyses including these data must be interpreted with caution, and at the moment only be used as preliminary results. Updates on the state of the epigenetic cleaning/time matching will be posted in the #moas-updates
slack channel.
To be able to use these functions you must know where to files exist:
xlsx
file)MOAS/data-raw/DNA/gID_MOAS_match.tsv
on the lagringshotell)epigen_get
functionThe get function will retrieve a cleaned epiegenetics file for you, altering the genetic ID’s to proper CrossProject IDs. As opposed to the dbs-functions
, this will return multiple rows for any participant that we have epigenetic data on from several data collection waves. You may use the returned data frame from this function to merge with your own data, if you wish to do so your self.
epigen_get(file_path = "../Genetics/LCBC_EPIgenetic_Data/batch1_190919/DNAm_horvath/18092019epiage_lifebrain.xlsx",
match_path = "data-raw/DNA/gID_MOAS_match.tsv")
# A tibble: 1,335 x 11
CrossProject_ID Project_Name Project_Wave EpiGen_DNAmAge EpiGen_noMissingPe… EpiGen_meanMethByS… EpiGen_minMethByS… EpiGen_maxMethByS… EpiGen_predicted… EpiGen_meanXchro… Genetic_ID
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 1100133 MemP 3 23.0 1444 0.255 0.000121 0.996 male 0.293 110013301
2 1100241 MemP 3 21.6 1442 0.256 0.000626 0.995 female 0.452 110024101
3 1100268 MemP 3 42.8 1447 0.256 0.000001 0.995 female 0.455 110026801
4 1100326 MemP 3 40.1 1445 0.259 0.000642 0.992 male 0.293 110032601
5 1100334 MemP 3 43.6 1441 0.255 0.000001 0.997 female 0.444 110033401
6 1100741 MemP 3 28.3 1450 0.253 0.00200 0.992 male 0.277 110074101
7 1100760 MemP 3 42.0 1444 0.253 0.00143 0.992 male 0.288 110076001
8 1100768 MemP 3 40.3 1443 0.245 0.000683 0.996 female 0.430 110076801
9 1100211 MemP 3 64.2 1444 0.250 0.000741 0.997 female 0.453 110021101
10 1100322 MemP 3 22.6 1453 0.252 0.00127 0.992 male 0.274 110032201
# … with 1,325 more rows
This function is also convenient if you are looking to get extra information regarding the samples the data is based on, by toggling the debug
argument to TRUE
you will get extra information regarding the sample from the matching file. I recommend doing this if youare using the add
function and are seeing things you are not expecting (and let me know if there in unexpected behaviour of the function).
epigen_get("../Genetics/LCBC_EPIgenetic_Data/batch1_190919/DNAm_horvath/18092019epiage_lifebrain.xlsx",
match_path = "data-raw/DNA/gID_MOAS_match.tsv",
debug=TRUE)
# A tibble: 1,335 x 24
CrossProject_ID Project_Name Project_Wave FID EpiGen_DNAmAge EpiGen_noMissin… EpiGen_meanMeth… EpiGen_minMethB… EpiGen_maxMethB… EpiGen_predicte… EpiGen_meanXchr… EpiGen_debug_IID
<dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 1100133 MemP 3 1100… 23.0 1444 0.255 0.000121 0.996 male 0.293 110013301
2 1100241 MemP 3 1100… 21.6 1442 0.256 0.000626 0.995 female 0.452 110024101
3 1100268 MemP 3 1100… 42.8 1447 0.256 0.000001 0.995 female 0.455 110026801
4 1100326 MemP 3 1100… 40.1 1445 0.259 0.000642 0.992 male 0.293 110032601
5 1100334 MemP 3 1100… 43.6 1441 0.255 0.000001 0.997 female 0.444 110033401
6 1100741 MemP 3 1100… 28.3 1450 0.253 0.00200 0.992 male 0.277 110074101
7 1100760 MemP 3 1100… 42.0 1444 0.253 0.00143 0.992 male 0.288 110076001
8 1100768 MemP 3 1100… 40.3 1443 0.245 0.000683 0.996 female 0.430 110076801
9 1100211 MemP 3 1100… 64.2 1444 0.250 0.000741 0.997 female 0.453 110021101
10 1100322 MemP 3 1100… 22.6 1453 0.252 0.00127 0.992 male 0.274 110032201
# … with 1,325 more rows, and 12 more variables: Genetic_ID <chr>, EpiGen_debug_batch <dbl>, EpiGen_debug_trusted <dbl>, EpiGen_debug_for_gwas <dbl>, EpiGen_debug_for_ewas <dbl>,
# EpiGen_debug_duplicated <dbl>, EpiGen_debug_genetic_duplicate <chr>, EpiGen_debug_european <dbl>, EpiGen_debug_sample_type <chr>, EpiGen_debug_content <chr>,
# EpiGen_debug_date <chr>, EpiGen_debug_comment <chr>
epigen_add
functionGiven you already have a version of the MOAS at hand (either the entire data frame or a subset of it), you can use the epigen_add
function to directly add the epigenetic data to it. Simply explained, this function only really call on the get
funtion and runs a merging join
function to add only the epigenetic data that can be matched to the MOAS-like data provided. This means that if you have already subsetted the MOAS-rows to what you are intereted in, this function will not add any new rows of data. Any timepoint you have previously omitted from the MOAS will not be re-added, and the epigenetic data volumns will be added at end of the data frame.
MOAS %>%
select(1:8) %>% #reducing the data for the example, taking only first 8 columns
epigen_add(file_path="../Genetics/LCBC_EPIgenetic_Data/batch1_190919/DNAm_horvath/18092019epiage_lifebrain.xlsx",
match_path = "data-raw/DNA/gID_MOAS_match.tsv",)
# A tibble: 4,613 x 14
CrossProject_ID Birth_Date Sex Subject_Timepoi… Project_Name Project_Wave EpiGen_DNAmAge EpiGen_noMissin… EpiGen_meanMeth… EpiGen_minMethB… EpiGen_maxMethB… EpiGen_predicte…
<fct> <date> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1000401 1990-05-18 Fema… 1 NDev 1 NA NA NA NA NA NA
2 1000401 1990-05-18 Fema… 2 NDev 3 32.0 1448 0.241 0.000242 0.991 female
3 1000401 1990-05-18 Fema… 2 NDev 3 32.0 1448 0.241 0.000242 0.991 female
4 1000402 1995-12-07 Fema… 1 NDev 1 NA NA NA NA NA NA
5 1000403 1992-04-03 Fema… 1 NDev 1 NA NA NA NA NA NA
6 1000403 1992-04-03 Fema… 2 NDev 2 NA NA NA NA NA NA
7 1000403 1992-04-03 Fema… 3 NDev 3 37.4 1442 0.243 0.000232 0.996 female
8 1000403 1992-04-03 Fema… 3 NDev 3 37.4 1442 0.243 0.000232 0.996 female
9 1000404 1989-09-28 Fema… 1 NDev 1 NA NA NA NA NA NA
10 1000404 1989-09-28 Fema… 2 NDev 2 NA NA NA NA NA NA
# … with 4,603 more rows, and 2 more variables: EpiGen_meanXchromosome <dbl>, Genetic_ID <chr>
There are three necessary columns needed in the MOAS-like data for the merging to work correctly. It will fail if these are not all present
MOAS %>%
select(1:5) %>% # This will omit the Project_Wave column
epigen_add(file_path="../Genetics/LCBC_EPIgenetic_Data/batch1_190919/DNAm_horvath/18092019epiage_lifebrain.xlsx",
match_path = "data-raw/DNA/gID_MOAS_match.tsv",)
Error: One of 'CrossProject_ID', 'Project_Name', 'Project_Wave' is missing from the MOAS-like data. These are needed for merging.