This package contains several functions to help anyone working with PGSs add the PGSs they want to their data. While the MOAS data does contain PGS data, you might want to use PGSs other than those already provided in the MOAS, or you might want more accompanying data with the PGSs (such as the SNP counts). There are a couple of requisites for these functions to work:
AAmenarche/AAmenarche.S10.profile, AAmenarche/AAmenarche.S3.profile, etc.
Given these two things, the functions should be fairly simple to use.
Depending on your OS and setup, you may need to change the "path/to" prefix to something else to make this work. The remaining paths point to two important places: the first is the path containing the PGS subdirectories, and the second is the cleaned file that enables matching between the PGS data and the MOAS. A consequence of using the genetic_match_file is that the rows in the source PGS files are reduced to only those samples that we have verified as trustworthy.
The first key function is pgs_get, which creates a data.frame of the PGSs you have asked for.
pgs_get(
  pgs = c("AD", "AD_Jansen"),
  pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)
In this example, the user asks for the PGSs for AD and AD_Jansen, and gets back a data.frame with these PGSs. By default, the functions assume you want the significance levels S1, S7, and S11, but you may override this with the s_levels argument.
The calls above return data.frames containing PGS data alone, and nothing else. If you are confident in handling data, that may be enough for you to merge the PGSs with whatever data you are working with. If you are less confident, or just want a simple solution, you can use the two functions created specifically for easy MOAS merging. You need not use the entire MOAS for these two functions to work; you can start from MOAS data that has already been subsetted and add PGSs to that. The key requirement is that the data is MOAS-derived, as the functions assume a certain structure in the data.
pgs_add(
  MOAS,
  pgs = c("AD", "AD_Jansen"),
  pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv",
  s_levels = c("S1", "S7", "S11")
)
The pgs_add function takes all the same arguments as the pgs_get function, and additionally needs the MOAS-derived data. The output of this function is the entire data.frame you provided (MOAS-derived) with the PGS columns appended to it. The function is also written so that you can use it with the pipe operator (%>%) if you like.
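For those who prefer to merge the output of pgs_get by hand, a minimal sketch in base R might look like the following. The data here are toy stand-ins, and the column names (ID, age, AD_S1) are hypothetical placeholders chosen for illustration; the real key and PGS column names depend on your data and on what pgs_get returns.

```r
# Toy stand-ins for your own data and for a PGS data.frame.
# All column names below are hypothetical, not taken from the package.
my_data  <- data.frame(ID = c(1, 2, 3), age = c(34, 51, 67))
pgs_data <- data.frame(ID = c(1, 3), AD_S1 = c(0.12, -0.40))

# Left join: keep every row of my_data, and fill in PGS values where
# a matching ID exists. Rows without a match get NA in the PGS column.
merged <- merge(my_data, pgs_data, by = "ID", all.x = TRUE)
merged
```

Using all.x = TRUE keeps participants who lack genetic data, which is usually what you want when the PGS data has been reduced to verified samples only.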
_all functions
The two functions above have two companions whose names end with _all. The two _all functions make it easy to add/get all the available PGSs from the path specified.
pgs_get_all(
  pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)
Most users will not want all the significance levels returned (there are 12, and returning all of them is the default behaviour for the _all functions). You may specify which significance levels you want by providing a character vector to the s_levels argument.
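A character vector of significance levels can be typed out directly, or built with paste0 as sketched below. The sketch assumes the levels are named "S" followed by a number from 1 to 12, which matches the names used in this document (S1, S7, S11) but is otherwise an assumption; the variable names are made up for illustration.

```r
# Assumption: levels are named "S1", "S2", ..., "S12".
all_levels <- paste0("S", 1:12)

# A hand-picked subset, equivalent to c("S1", "S7", "S11").
my_levels <- paste0("S", c(1, 7, 11))
my_levels
```

The resulting vector can then be supplied as s_levels = my_levels in any of the functions that accept the s_levels argument.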
_single functions
There is a new pair of functions that lets you add/get a single PGS from its .profile file. This should enable you to grab and safely merge new PGS data as they come in, without having to wait for assistance before you can start working with the data.
Like the other functions, they come in both a get and an add variety. You’ll need to specify the entire path, all the way to the .profile file you want to add. This means the folder structure required by these functions is not as strict as for the other functions. The name used for the PGS will be the file name up to .profile.
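The naming rule can be illustrated with base R. This is only a sketch of the idea, assuming the name is simply the file name with the .profile extension stripped; the package may implement this differently internally, and the variable names are made up for illustration.

```r
# Placeholder path, as used elsewhere in this document.
pgs_file <- "path/to/Genetics/PGS/PGS_XX/PGS_XX.profile"

# Drop the directories, then strip the trailing ".profile".
pgs_name <- sub("\\.profile$", "", basename(pgs_file))
pgs_name
```

Only the trailing .profile is removed, so a file called AAmenarche.S10.profile would keep its significance level in the name (AAmenarche.S10).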
pgs_get_single(
  pgs_file = "path/to/Genetics/PGS/PGS_XX/PGS_XX.profile",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)
Likewise, if you want to add directly to the MOAS or a MOAS-like file:
pgs_add_single(
  MOAS,
  pgs_file = "path/to/Genetics/PGS/PGS_XX/PGS_XX.profile",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)
If you need to add several single PGSs, you can chain the operations to add them one after another.
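One way to express such a chain without writing each step out is to fold over a vector of file paths. The sketch below is self-contained and uses a hypothetical helper, add_each, together with a toy stand-in for the add function, so that it runs without the package or any PGS files; neither add_each nor toy_add is part of the package.

```r
# Hypothetical helper: apply an "add" function once per file,
# feeding the result of each step into the next (a left fold).
add_each <- function(data, pgs_files, add_fun) {
  Reduce(function(acc, file) add_fun(acc, file), pgs_files, data)
}

# Toy stand-in for an add function: appends an empty column
# named after the file. It does NOT read any PGS data.
toy_add <- function(data, file) {
  name <- sub("\\.profile$", "", basename(file))
  data[[name]] <- NA_real_
  data
}

toy <- data.frame(ID = 1:3)
result <- add_each(toy, c("a/PGS_A.profile", "b/PGS_B.profile"), toy_add)
names(result)
```

With the package loaded, add_fun could instead be a small wrapper such as function(d, f) pgs_add_single(d, pgs_file = f, genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"), which amounts to piping several pgs_add_single calls together.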
All the above functions also have a couple of extra arguments, for those interested in keeping certain information that is removed by default.
CNT columns
Some users want to keep the two CNT columns from the PGS data, as these may provide valuable information about the computed PGSs. To do this, pass include_cnt = TRUE to the function, and those columns will also be added.
Some might be debugging the data, or want extra information about the source genetic samples. In this case, use the include_genetics_debug = TRUE argument, which keeps all the columns from the genetics-MOAS matching file in the output.