Background

The Zygocity questionnaire was developed by the Norwegian Institute of Public Health (FHI; Folkehelseinstituttet) for its twin registry studies. It is a series of questions probing how similar the twins are, in order to determine whether they are monozygotic or dizygotic.

Scoring

Classification

This section briefly describes the algorithm used to determine zygocity during recruitment in the 2000s.

| Name | Question about… | Used for |
|---|---|---|
| Drop | You and your twin were like two drops of water in childhood | Pairs and singles |
| Stranger | Strangers had trouble telling you apart when you were children | Pairs and singles |
| Eye | Similarity in eye color | Pairs |
| Voice | Similarity in voice | Singles |
| Dexter | Similarity in dexterity | Pairs and singles |
| Belief | What you yourself believe your zygocity to be | Pairs and singles |

“Single” twins here means twins who have responded alone, i.e. data are not available for both twins in the pair. Similarity questions not listed in the table above, e.g. whether family members had trouble telling the twins apart, are not used in the classification.
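The split between pairs and singles can be sketched with dplyr, assuming one row per twin, a pair identifier column, and that a twin counts as having responded when the Drop item is not missing. The column names `twinpair` and `drop` and the derived `mode` column are illustrative assumptions, not the package's own logic.

```r
library(dplyr)

# Hypothetical data: one row per twin; pair 2 has only one respondent
answers <- tibble(
  twinpair = c(1, 1, 2, 2),
  drop     = c(1, 2, 3, NA)
)

# Assumption: a twin has responded if the Drop item is not missing.
# Pairs with two respondents are scored as pairs, the rest as singles.
answers_mode <- answers %>%
  group_by(twinpair) %>%
  mutate(mode = if_else(sum(!is.na(drop)) == 2, "pair", "single")) %>%
  ungroup()

answers_mode
```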

Weights

When the overall zygocity score is calculated, each item is weighted, and the weights depend on whether one or both twins have responded to the questionnaire.

| Name | Question about… | Factor, single | Factor, pair |
|---|---|---|---|
| Drop | You and your twin were like two drops of water | 1.494 | 2.111 |
| Stranger | Strangers had trouble telling you apart | 0.647 | 0.691 |
| Eye | Similarity in eye color | | 0.394 |
| Voice | Similarity in voice | 0.347 | |
| Dexter | Similarity in dexterity | 0.458 | 0.366 |
| Belief | What you yourself believe | 0.417 | 0.481 |
| Constant | Constant term in the formula | 0.007 | -0.087 |

Coding

“Form value” is the value the answer option has in the data file. “Score value” is the value used in the algorithm when zygocity is calculated.

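| Variable | Answer option | Form value | Score value |
|---|---|---|---|
| Drop | Like two drops of water | 1 | 1 |
| | Like most siblings | 2 | -1 |
| | Don’t know | 3 | 0 |
| Stranger | Often | 1 | 1 |
| | Occasionally | 2 | 0 |
| | Never | 3 | -1 |
| | Don’t know | 4 | 0 |
| Belief | Monozygotic | 1 | 1 |
| | Dizygotic | 2 | -1 |
| | Don’t know | 3 | 0 |
| Eye, Voice & Dexter | Exactly the same | 1 | 1 |
| | Almost alike | 2 | 0 |
| | Different | 3 | -1 |
| | Don’t know | 4 | 0 |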

Form values are never used directly in the calculations; they are first recoded to score values (-1, 0 or 1), and it is these score values that enter the formulas below. For example, Drop takes the value 1 in the formula when the answer is that the twins were like two drops of water.
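As an illustration, the recoding in the coding table can be sketched as a lookup from form value to score value. `score_lookup` and `recode_zygo()` are hypothetical names for this sketch, not functions from the questionnaires package.

```r
# Hypothetical lookup tables (not part of the questionnaires package):
# names are form values, entries are score values, as in the coding table.
score_lookup <- list(
  drop     = c(`1` = 1, `2` = -1, `3` = 0),
  stranger = c(`1` = 1, `2` = 0, `3` = -1, `4` = 0),
  belief   = c(`1` = 1, `2` = -1, `3` = 0),
  eye      = c(`1` = 1, `2` = 0, `3` = -1, `4` = 0),
  voice    = c(`1` = 1, `2` = 0, `3` = -1, `4` = 0),
  dexter   = c(`1` = 1, `2` = 0, `3` = -1, `4` = 0)
)

# Recode a vector of form values for one item into score values.
# Missing answers (NA) and unknown form values both become NA.
recode_zygo <- function(form_value, item) {
  unname(score_lookup[[item]][as.character(form_value)])
}

recode_zygo(c(1, 2, 3, NA), "drop")
```

Indexing a named vector by the form value as a string keeps the mapping in one place and mirrors the table row by row.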

Equation

The higher the absolute value of the final score, the more certain the classification. Answers that reveal greater uncertainty about the similarity (e.g. a larger proportion of “almost” and “don’t know” answers) give a score closer to zero.
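A short worked example, using the single-respondent weights (the “Factor single” column of the weights table), illustrates this. A twin who gives the clearest “similar” answer to every item (all score values 1) gets

$$1.494 + 0.647 + 0.458 + 0.417 + 0.347 + 0.007 = 3.370$$

while a twin who still answers “Like two drops of water”, but answers “Occasionally” for Stranger, “Don’t know” for Belief and the middle option for Dexter and Voice (all score values 0), gets

$$1.494 + 0 + 0 + 0 + 0 + 0.007 = 1.501$$

Both scores are positive, but the second is closer to zero and therefore a less certain classification.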

Pair formula

For pairs where both twins have responded, the pair’s average score value is first calculated for each item, i.e. Drop = (Drop1 + Drop2) / 2, and so on, where Drop1 is the score value of twin 1’s response and Drop2 the score value of twin 2’s response in the same pair.

$$zygocity = \frac{drop_1 + drop_2}{2} \times 2.111 + \frac{stranger_1 + stranger_2}{2} \times 0.691 + \frac{dexter_1 + dexter_2}{2} \times 0.366 + \frac{belief_1 + belief_2}{2} \times 0.481 + \frac{eye_1 + eye_2}{2} \times 0.394 - 0.087$$

The sign of this “pair score” then determines zygocity, in the same way as for singles: a negative value means dizygotic (two-egg) twins, a positive value means monozygotic (one-egg) twins.
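A minimal R sketch of the pair formula, taking the weights and constant from the “Factor pair” column of the weights table. `pair_score()` is a hypothetical helper, not a questionnaires function, and it assumes both twins’ answers are already recoded to score values with no missing items.

```r
# Pair weights and constant: the "Factor pair" column of the weights table
pair_weights  <- c(drop = 2.111, stranger = 0.691, dexter = 0.366,
                   belief = 0.481, eye = 0.394)
pair_constant <- -0.087

# Hypothetical helper: twin1 and twin2 are named vectors of score values
# (-1, 0 or 1). Missing answers are not handled in this sketch.
pair_score <- function(twin1, twin2) {
  items <- names(pair_weights)
  avg <- (twin1[items] + twin2[items]) / 2  # item-wise pair averages
  sum(avg * pair_weights) + pair_constant
}

twin1 <- c(drop = 1, stranger = 1, dexter = 1, belief = 1, eye = 1)
twin2 <- c(drop = 1, stranger = 0, dexter = 1, belief = 1, eye = 0)
pair_score(twin1, twin2)
```

Here Stranger and Eye average to 0.5 because the twins answered differently, which lowers the score compared with two fully concordant answers.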

Single formula

If only one twin in the pair has responded, the following is calculated:

$$zygocity = drop_1 \times 1.494 + stranger_1 \times 0.647 + dexter_1 \times 0.458 + belief_1 \times 0.417 + voice_1 \times 0.347 + 0.007$$

The sign of this “single score” then determines zygocity: a negative value means dizygotic (two-egg) twins, a positive value means monozygotic (one-egg) twins.
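The single formula and the sign rule can be sketched the same way, with weights from the “Factor single” column of the weights table. `single_score()` and `classify_zygocity()` are hypothetical helpers; treating a score of exactly zero as unclassified is an assumption, since the text does not cover that case.

```r
# Single weights and constant: the "Factor single" column of the weights table
single_weights  <- c(drop = 1.494, stranger = 0.647, dexter = 0.458,
                     belief = 0.417, voice = 0.347)
single_constant <- 0.007

# Hypothetical helper: twin is a named vector of score values (-1, 0 or 1)
single_score <- function(twin) {
  sum(twin[names(single_weights)] * single_weights) + single_constant
}

# Sign rule: positive = monozygotic, negative = dizygotic.
# A score of exactly 0 is left unclassified (NA); this is an assumption.
classify_zygocity <- function(score) {
  ifelse(score > 0, "monozygotic",
         ifelse(score < 0, "dizygotic", NA_character_))
}

twin <- c(drop = -1, stranger = -1, dexter = 0, belief = -1, voice = 0)
score <- single_score(twin)
classify_zygocity(score)
```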

Data requirements

Column names

By default, the functions assume that columns are named zygocity_XX, where XX is the question number in the inventory, zero-padded to two digits (i.e. with a leading zero for numbers below 10, e.g. 09). If your columns are named differently, you need to pass their names to the functions using tidyselect helpers (see the tidyverse packages). The column names should follow a consistent pattern that is easy to select this way.
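For illustration, the default naming scheme and two tidyselect helpers that match it are sketched below. The data frame `zygo_raw` and its columns are hypothetical; `starts_with()` and `num_range()` are tidyselect helpers that dplyr re-exports.

```r
library(dplyr)

# Hypothetical data using the default zygocity_XX naming scheme
zygo_raw <- tibble(
  zygocity_01 = c(1, 2),
  zygocity_02 = c(1, 3),
  zygocity_09 = c(2, 2),
  zygocity_10 = c(1, 1),
  other_var   = c("a", "b")
)

# Zero-padded names for question numbers 1, 9 and 10
padded <- sprintf("zygocity_%02d", c(1L, 9L, 10L))

# Select all zygocity columns by their shared prefix
by_prefix <- names(select(zygo_raw, starts_with("zygocity_")))

# Or select specific question numbers, zero-padded to a width of 2
by_number <- names(select(zygo_raw, num_range("zygocity_", c(1L, 9L), width = 2)))
```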

Data values

The values in the columns should be the form values, i.e. the number of the answer option that was chosen (1, 2 or 3, and for some questions also 4).
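A quick check that a column only contains valid form values can be sketched from the coding table. `allowed_form_values` and `valid_form_values()` are hypothetical names for this sketch; missing answers are ignored by the check.

```r
# Hypothetical allowed form values per item, following the coding table
allowed_form_values <- list(
  drop = 1:3, belief = 1:3,
  stranger = 1:4, eye = 1:4, voice = 1:4, dexter = 1:4
)

# TRUE if every non-missing value is a valid form value for the item
valid_form_values <- function(x, item) {
  all(x[!is.na(x)] %in% allowed_form_values[[item]])
}

valid_form_values(c(1, 2, NA), "drop")
valid_form_values(c(1, 4), "drop")
```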

Using the zygo functions

Currently undocumented…

library(questionnaires)
library(dplyr)
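# Example data: one row per twin, two rows per twin pair (twinpair).
# Values are form values as coded in the data file; NA marks a twin who
# did not respond, so pairs 2 and 5 each have only one respondent.
zygo <- tibble(
  id = 1:10,
  twinpair = rep(1:5, each = 2),
  drop = c(1, 2, 3, NA, 2, 2, 1, 1, NA, 2),
  stranger = c(1, 2, 4, NA, 2, 3, 3, 1, NA, 2),
  dexterity = c(1, 1, 3, NA, 2, 2, 1, 2, NA, 1),
  voice = c(2, 2, 3, NA, 2, 2, 1, 1, NA, 1),
  eye = c(2, 2, 2, NA, 2, 2, 1, 1, NA, 2),
  belief = c(1, 1, 2, NA, 2, 2, 1, 1, NA, 2)
)

# twin_col identifies which rows belong to the same twin pair.
# With recode = FALSE the values are weighted as they are, without
# first being recoded to score values.
zygo_compute(zygo, 
             twin_col = twinpair, 
             cols = 3:6, 
             recode = FALSE)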
#> # A tibble: 10 × 8
#>    zygo_eye zygo_drop zygo_stranger zygo_dexterity zygo_voice zygo_belief
#>       <dbl>     <dbl>         <dbl>          <dbl>      <dbl>       <dbl>
#>  1    0.788      2.11         0.691          0.366     NA           0.481
#>  2    0.788      4.22         1.38           0.366     NA           0.481
#>  3   NA          4.48         2.59           1.37       1.04        0.834
#>  4   NA         NA           NA             NA         NA          NA    
#>  5    0.788      4.22         1.38           0.732     NA           0.962
#>  6    0.788      4.22         2.07           0.732     NA           0.962
#>  7    0.394      2.11         2.07           0.366     NA           0.481
#>  8    0.394      2.11         0.691          0.732     NA           0.481
#>  9   NA         NA           NA             NA         NA          NA    
#> 10   NA          2.99         1.29           0.458      0.347       0.834
#> # ℹ 2 more variables: zygo_score <dbl>, zygo_zygocity <chr>

References