Using the PGS functions

  library(MOAS)

There are several functions in this package that should aid anyone working with PGS’s to add the PGS they want to their data. While the MOAS-data does contain PGS data, you might want to use some other PGS’s than already provided in the MOAS, or you might want to have more accompanying data with the PGSs (like the SNP counts). There are a couple of requisites to having these functions work:

You need to be able to connect to the LCBC lagringshotell, and know the path to the lagringshotell within your system.
The PGS data needs to be ordered in a directory, where each PGS has its own sub-directory with all the files with the different significance levels for that PGS are stored within these subdirectories with naming conventions like these: AAmenarche/AAmenarche.S10.profile, AAmenarche//AAmenarche.S3.profile etc.

Given these two things, the functions should be fairly simple to use.

The two main functions

Depending on your OS and setup, you may need to change path/to to something else to make this work. The remaining paths point to two important places. The first being the path containing the PGS subdirectories, and the second to the cleaned file that enables matching between the PGS data and the MOAS. A consequence of using the genetic_match_file is also that the number of rows in the source PGS files are reduced to only those samples that we have verified as trustworty.

Getting PGS data

The first key function is pgs_get which will create a data.frame of the PGS’s you have asked for with the command.

pgs_get(
  pgs = c("AD", "AD_Jansen"),
  pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)

In this example, the user is asking for PGS for AD, and AD_Jansen, and will be given back a data.frame with these PGSs. The functions by default assumes you want the significance levels S1, S7, and S11, but you may overwrite this by using the s_levels argument.

pgs_get(
  pgs = c("AD", "AD_Jansen"),
  s_levels = c("S1", "S5", "S12", "S7")
  pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)

Adding PGS data to the MOAS

The two above codes will give data.frames with PGS data alone, and nothing else. If you are confident in how you handle data, the above might provide enough for you to be able to merge it with whatever data you have to work with. If you are less confident, or just want a simple solution, you can use the two functions specifically created for easy MOAS-merging. You need not use the entire MOAS for these two functions to work, you can be working with MOAS data that has already been subsetted, and you want to add PGS to that. The key feature must be that the data is MOAS-derived, as the functions assume a certain structure to the data.

pgs_add(
  MOAS, 
  pgs = c("AD", "AD_Jansen"),
  pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv",
  s_levels = c("S1", "S7", "S11")
)

The pgs_add function takes all the same arguments as the pgs_get function, with the addition of needing the MOAS-derived data. The output of this function, is the entire data.frame you provided (MOAS-derived) with the PGS columns appended to it. This function is also created such that you may use the pipe (%>%) operator on it if you like using it.

MOAS %>% 
  pgs_add(
    pgs = c("AD", "AD_Jansen"),
    pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
    genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv",
    s_levels = c("S1", "S7", "S11")
  )

The `_all` functions

The two functions above have two companions, that end with _all. The two _all-functions are made to easily add/get all the available PFS’s from the path specified.

pgs_get_all(
  pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)

In particular, most will not want all the significance levels outputted (there are 12, and this is the default behaviour for the _all functions). You may specify which significance levels you want by providing a character vector to the s_levels argument.

pgs_add_all(
  MOAS, 
  s_levels = c("S1", "S7", "S11"),
  pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)

The `_single` functions

There is a new function pair, that will allow you to add/get a single PGS from its .profile. This should enable you to grab and safely merge new PGS data as they come in, and not have to wait for assistance before you can start working with the data.

As the other functions, they come in both get and add variety. You’ll need to specify the entire path, all the way to the .profile file you want to add. This means the folder structure etc. for using this function is not very strict, as the other functions. The name used in the PGS file will be the file name up untill .profile.

pgs_get_single(
  pgs_file = "path/to/Genetics/PGS/PGS_XX/PGS_XX.profile",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)

likewise, if you want to add directly to the MOAS or a MOAS-like file

pgs_add_single(
  MOAS,
  pgs_file = "path/to/Genetics/PGS/PGS_XX/PGS_XX.profile",
  genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
)

If you need to add several single PGS’, you can make a chain of operations to add them.

MOAS %>% 
  pgs_add_single(
    pgs_file = "path/to/Genetics/PGS/PGS_XX/PGS_XX.profile",
    genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
  ) %>% 
    pgs_add_single(
    pgs_file = "path/to/Genetics/PGS/PGS_YY/PGS_YY.profile",
    genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv"
  )

Extra options to all functions

All the above functions also have a couple of extra arguments, for those interested in keeping certain information that is removed by default

Keeping the `CNT` columns

Some are also interested in keeping the two CNT columns from the PGS data, as these may provide valuable information about the PGS’s computed. To do this, you can provide the include_cnt = TRUE to the funciton, and those columns will also be added.

pgs_get_all(pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
            genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv",
            s_levels = c("S1", "S7", "S11"),
            include_cnt = TRUE)

Keeping information from the genetics-MOAS matching file

Some might be debugging the data, or want to find some extra information about the source genetic samples. In this case, one can use the include_genetics_debug = TRUE argument, which will keep all the columns from the genetics-MOAS matching file in the outputted data.

pgs_get_all(pgs_path = "path/to/Genetics/PGS/PGS_20190618/PGS_wAPOE/",
            genetic_match_file = "path/to/MOAS/data-raw/DNA/gID_MOAS_match.tsv",
            s_levels = c("S1", "S7", "S11"),
            include_genetics_debug = TRUE)

The two main functions

Getting PGS data

Adding PGS data to the MOAS

The _all functions

The _single functions

Extra options to all functions

Keeping the CNT columns

Keeping information from the genetics-MOAS matching file

Contents

The `_all` functions

The `_single` functions

Keeping the `CNT` columns