Education variables are among the more complicated variables in the NOAS, as they have been asked about in several different ways and can be coded in many ways. The lifespan aspect of our data complicates this further: although all participants have completed mandatory schooling in Norway, they have not all attended school for the same number of years. While the education variables in the NOAS attempt to reflect the real educational levels of the participants, the changes in the Norwegian school system mean the variables need to be interpreted with care.
The categorical variables are convenient because they treat all participants the same, no matter which school system they went through. They are also gross simplifications. The numerical columns might be more precise, but participants are asked to count their years of education themselves, and they interpret this differently. So while they are more fine-grained, these columns also pose difficulties.
In Norway, a law for 9 years of mandatory schooling was implemented in 1969. Before this, most districts had a (largely) 7-year primary school (folkeskole). After 1969, the mandatory school (grunnskole) was split into two parts: primary school (barneskolen, years 1-6) and secondary school (ungdomsskolen, years 7-9).
In 1974, a school reform called “Mønsterplanen” (M74) was implemented to regulate education. It aimed to make schooling in the various districts more equal, but still left much of the choice of teaching material to the teachers. Certain subjects were made mandatory, while the curriculum itself was largely decided by the teachers. English was made mandatory for all, and arts and crafts were introduced.
In 1987 a new plan (M87) was implemented, which dictated more precisely which curriculum the schools should use. One of the primary reasons for the change was the view that growing up had become more difficult and that schools should compensate for some of this. Schools should now represent a protective environment and culture against new trends in society. Important elements in M87 were care, work on attitudes, and class environments.
Only 10 years later, a new primary school reform, L97, extended mandatory schooling to 10 years, and children started school 1 year earlier (at 6 years of age rather than 7).
NOAS variable | Explanation |
---|---|
edu_years | Years of education up to the highest completed level |
edu_total | Total years of education, rounded down to the nearest integer |
edu_coded_unknown | Coded education, unknown schema |
edu_coded4 | Coded education, 4 categories |
edu_coded10 | Coded education, 10 categories |
edu_desc | Free-text description of education |
The functions in this package are meant to make working with the NOAS
education variables, particularly the categorical ones, easier and more
transparent. All the functions start with edu, and their names should
indicate what they do.
There are several functions to help assess the different coding
schemas we have (currently two, with 4 and 9 categories). To inspect a
coding schema, call the edu_levels() function with either 4 or 9 as
argument.
library(questionnaires)
edu_levels(4)
#> Primary school (9 years)
#> 9
#> High school
#> 12
#> University/University college (< 4 years)
#> 16
#> University/University college (> 4 years)
#> 19
edu_levels(9)
#> Pre-school/No schooling
#> 0
#> Primary school (6 years)
#> 6
#> Secondary school (9 years)
#> 9
#> High school (12 years)
#> 12
#> High school diploma (13 years)
#> 13
#> High school addition (14 years)
#> 14
#> Lower level University/University college degree (16 years)
#> 16
#> Upper level University/University college (19 years)
#> 19
#> Ph.D. (21 years)
#> 21
These are the schemas that the remaining edu functions require to work
properly and apply the correct coding. The levels scheme is adaptable:
should we adopt a schema we do not use now, it needs to be specified,
and it should hopefully work fairly well with the other functions.
edu_levels() returns a named numeric vector, meaning that the vector
itself contains numbers, so you can do computations with the values
returned. To access the names, you need to ask for them explicitly
through the names() function.
names(edu_levels(4))
#> [1] "Primary school (9 years)"
#> [2] "High school"
#> [3] "University/University college (< 4 years)"
#> [4] "University/University college (> 4 years)"
edu_levels(9) %>%
names()
#> [1] "Pre-school/No schooling"
#> [2] "Primary school (6 years)"
#> [3] "Secondary school (9 years)"
#> [4] "High school (12 years)"
#> [5] "High school diploma (13 years)"
#> [6] "High school addition (14 years)"
#> [7] "Lower level University/University college degree (16 years)"
#> [8] "Upper level University/University college (19 years)"
#> [9] "Ph.D. (21 years)"
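To illustrate what a named numeric vector allows, here is a minimal base-R sketch. The vector scheme4 is hypothetical: it is written out by hand from the edu_levels(4) output above rather than taken from the package, so it only mirrors the shape of what edu_levels() returns.

```r
# Hypothetical stand-in for the 4-category schema, copied by hand
# from the edu_levels(4) output (not the package's own object).
scheme4 <- c(
  "Primary school (9 years)" = 9,
  "High school" = 12,
  "University/University college (< 4 years)" = 16,
  "University/University college (> 4 years)" = 19
)

# The values are ordinary numbers, so arithmetic works directly
range(scheme4)            # 9 19
diff(scheme4)             # years between adjacent categories

# Labels are only reachable through names()
names(scheme4)[2]         # "High school"

# Names can also be used to look up years from a label
scheme4[["High school"]]  # 12
```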
While the NOAS is pre-curated, these functions are also applied to
the NOAS data to ensure consistency across projects. The edu_factorise
functions help evaluate the data and create correct factors. They
require the input data to be either numeric or character, and can
handle data where the two are mixed (i.e. numbers mixed with labels).
edu_factorise(c("3", "High school", "University/University college (> 4 years)",
"University/University college (> 4 years)", "University/University college (> 4 years)",
"University/University college (> 4 years)"), levels = 4)
#> [1] University/University college (< 4 years)
#> [2] High school
#> [3] University/University college (> 4 years)
#> [4] University/University college (> 4 years)
#> [5] University/University college (> 4 years)
#> [6] University/University college (> 4 years)
#> 4 Levels: Primary school (9 years) ... University/University college (> 4 years)
#> Numeric levels: 9 12 16 19
edu_factorise(c(7,7,8,4,2,5), levels = 9)
#> [1] Lower level University/University college degree (16 years)
#> [2] Lower level University/University college degree (16 years)
#> [3] Upper level University/University college (19 years)
#> [4] High school (12 years)
#> [5] Primary school (6 years)
#> [6] High school diploma (13 years)
#> 9 Levels: Pre-school/No schooling ... Ph.D. (21 years)
#> Numeric levels: 0 6 9 12 13 14 16 19 21
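One way mixed input like this could be resolved is sketched below in base R, assuming that bare numbers are 1-based positions in the schema (which matches "3" becoming the third 4-category label above). factorise_sketch() is a hypothetical helper for illustration, not the package's implementation of edu_factorise().

```r
# Hypothetical sketch: factorise_sketch() is NOT edu_factorise().
scheme4 <- c(
  "Primary school (9 years)" = 9,
  "High school" = 12,
  "University/University college (< 4 years)" = 16,
  "University/University college (> 4 years)" = 19
)

factorise_sketch <- function(x, scheme) {
  x <- as.character(x)                  # numbers and labels become text
  labels <- names(scheme)
  out <- rep(NA_character_, length(x))
  is_label <- x %in% labels             # entries that already are labels
  out[is_label] <- x[is_label]
  # Assumption: remaining entries are 1-based positions in the schema
  idx <- which(!is_label)
  pos <- suppressWarnings(as.integer(x[idx]))
  ok <- !is.na(pos) & pos >= 1 & pos <= length(labels)
  out[idx[ok]] <- labels[pos[ok]]
  factor(out, levels = labels)          # keep the schema's ordering
}

f <- factorise_sketch(c("3", "High school", 1, NA), scheme4)
as.character(f)
```

Unrecognised entries simply become NA here; a real implementation may well warn or fail instead.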
The main edu_factorise() function takes two arguments: the vector (x)
and the levels of the coding scheme to apply (4 or 9). The returned
factor has the correct labels, and its numeric levels correspond to the
number of years generally needed to complete each education. There are
also specific functions for the two schemas, edu4_factorise() and
edu9_factorise(), if you want to call them directly. Many of the edu
functions follow this system: a main function that needs a levels
argument, and specialised functions for the two coding schemas. The
specialised functions always call the main one with a preset value for
levels.
edu4_factorise(c("3", "High school", "University/University college (> 4 years)",
"University/University college (> 4 years)", "University/University college (> 4 years)",
"University/University college (> 4 years)"))
#> [1] University/University college (< 4 years)
#> [2] High school
#> [3] University/University college (> 4 years)
#> [4] University/University college (> 4 years)
#> [5] University/University college (> 4 years)
#> [6] University/University college (> 4 years)
#> 4 Levels: Primary school (9 years) ... University/University college (> 4 years)
#> Numeric levels: 9 12 16 19
edu9_factorise(c(7,7,8,4,2,5))
#> [1] Lower level University/University college degree (16 years)
#> [2] Lower level University/University college degree (16 years)
#> [3] Upper level University/University college (19 years)
#> [4] High school (12 years)
#> [5] Primary school (6 years)
#> [6] High school diploma (13 years)
#> 9 Levels: Pre-school/No schooling ... Ph.D. (21 years)
#> Numeric levels: 0 6 9 12 13 14 16 19 21
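The main-function-plus-wrappers pattern can be sketched as follows. All names here (schema_sketch(), factorise_main(), factorise4(), factorise9()) are hypothetical, and the main function only handles numeric codes to keep the example short; the point is that each wrapper is just the main call with levels preset.

```r
# Hypothetical illustration of the "main function + preset wrappers" pattern.
schema_sketch <- function(levels) {
  switch(as.character(levels),
    "4" = c("Primary school (9 years)" = 9, "High school" = 12,
            "University/University college (< 4 years)" = 16,
            "University/University college (> 4 years)" = 19),
    "9" = c("Pre-school/No schooling" = 0, "Primary school (6 years)" = 6,
            "Secondary school (9 years)" = 9, "High school (12 years)" = 12,
            "High school diploma (13 years)" = 13,
            "High school addition (14 years)" = 14,
            "Lower level University/University college degree (16 years)" = 16,
            "Upper level University/University college (19 years)" = 19,
            "Ph.D. (21 years)" = 21),
    stop("levels must be 4 or 9")
  )
}

# Main function: needs an explicit levels argument.
# Simplification: only numeric codes (1-based positions) are handled.
factorise_main <- function(x, levels) {
  labels <- names(schema_sketch(levels))
  factor(labels[as.integer(x)], levels = labels)
}

# Specialised wrappers: the same call with levels preset
factorise4 <- function(x) factorise_main(x, levels = 4)
factorise9 <- function(x) factorise_main(x, levels = 9)

identical(factorise9(c(7, 7, 8)), factorise_main(c(7, 7, 8), levels = 9))
```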
Data coded according to a schema, entered as the levels of the factors
(as seen in edu_levels()), can be transformed into year equivalents with
the edu_to_years functions (such as edu4_to_years()). These functions do
not require the data to already be factors, but it is recommended to
double-check that turning the data into factors returns the expected
values before converting to years.
# Check that factors are as expected
c("3", "High school", "University/University college (> 4 years)",
"University/University college (> 4 years)", "University/University college (> 4 years)",
"University/University college (> 4 years)") %>%
edu4_factorise()
#> [1] University/University college (< 4 years)
#> [2] High school
#> [3] University/University college (> 4 years)
#> [4] University/University college (> 4 years)
#> [5] University/University college (> 4 years)
#> [6] University/University college (> 4 years)
#> 4 Levels: Primary school (9 years) ... University/University college (> 4 years)
#> Numeric levels: 9 12 16 19
# or call the main function directly, specifying the levels of the schema
c("3", "High school", "University/University college (> 4 years)",
"University/University college (> 4 years)", "University/University college (> 4 years)",
"University/University college (> 4 years)") %>%
edu_factorise(levels = 4)
#> [1] University/University college (< 4 years)
#> [2] High school
#> [3] University/University college (> 4 years)
#> [4] University/University college (> 4 years)
#> [5] University/University college (> 4 years)
#> [6] University/University college (> 4 years)
#> 4 Levels: Primary school (9 years) ... University/University college (> 4 years)
#> Numeric levels: 9 12 16 19
# convert to factor then to year
c("3", "High school", "University/University college (> 4 years)",
"University/University college (> 4 years)", "University/University college (> 4 years)",
"University/University college (> 4 years)") %>%
edu4_factorise() %>%
edu4_to_years()
#> [1] 16 12 19 19 19 19
# you can skip factorizing as a middle step,
# if you are certain of the schema being applied correctly
c("3", "High school", "University/University college (> 4 years)",
"University/University college (> 4 years)", "University/University college (> 4 years)",
"University/University college (> 4 years)") %>%
edu4_to_years()
#> [1] 16 12 19 19 19 19
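Part of the reason for checking factors before converting is that a factor's internal codes are not the years. A minimal base-R sketch, with a hypothetical to_years_sketch() helper that looks years up by label; it is not how edu4_to_years() is necessarily implemented.

```r
# Hypothetical sketch; to_years_sketch() is not the package's edu4_to_years().
scheme4 <- c(
  "Primary school (9 years)" = 9,
  "High school" = 12,
  "University/University college (< 4 years)" = 16,
  "University/University college (> 4 years)" = 19
)

to_years_sketch <- function(f, scheme) {
  # Look up by label: as.numeric(f) would return the factor's
  # internal codes (1, 2, 3, ...) instead of the years.
  unname(scheme[as.character(f)])
}

f <- factor(c("University/University college (< 4 years)", "High school"),
            levels = names(scheme4))
to_years_sketch(f, scheme4)  # 16 12
as.numeric(f)                # 3 2 (factor codes, not years)
```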
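Since edu_coded10 has many more categories than edu_coded4, and the
two have not been collected at the same time, you can also reduce the
detailed categories to the 4-category schema with edu9_reduce(), or with
the main function edu_reduce() specifying from and to. The categories are
reduced by the following heuristic, shown in years with edu_map() and as
labels with edu_map_chr():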
edu_map(from = 9, to = 4)
#> # A tibble: 8 × 2
#> from to
#> <dbl> <dbl>
#> 1 0 NA
#> 2 9 9
#> 3 12 12
#> 4 13 12
#> 5 14 12
#> 6 16 16
#> 7 19 19
#> 8 21 19
edu_map_chr(from = 9, to = 4)
#> # A tibble: 8 × 2
#> from to
#> <chr> <chr>
#> 1 Pre-school/No schooling NA
#> 2 Secondary school (9 years) Primary school (9…
#> 3 High school (12 years) High school
#> 4 High school diploma (13 years) High school
#> 5 High school addition (14 years) High school
#> 6 Lower level University/University college degree (16 years) University/Univer…
#> 7 Upper level University/University college (19 years) University/Univer…
#> 8 Ph.D. (21 years) University/Univer…
c(7,7,8,4,2,5) %>%
edu9_reduce(to = 4)
#> [1] University/University college (< 4 years)
#> [2] University/University college (< 4 years)
#> [3] University/University college (> 4 years)
#> [4] High school
#> [5] <NA>
#> [6] High school
#> 4 Levels: Primary school (9 years) ... University/University college (> 4 years)
#> Numeric levels: 9 12 16 19
c(7,7,8,4,2,5) %>%
edu_reduce(from = 9, to = 4)
#> [1] University/University college (< 4 years)
#> [2] University/University college (< 4 years)
#> [3] University/University college (> 4 years)
#> [4] High school
#> [5] <NA>
#> [6] High school
#> 4 Levels: Primary school (9 years) ... University/University college (> 4 years)
#> Numeric levels: 9 12 16 19
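The numeric mapping printed by edu_map() can be applied as a simple lookup table. This base-R sketch works on year values and uses a hypothetical reduce_years_sketch() helper; the package's edu9_reduce() additionally returns labelled factors. The input below is the year equivalent of the codes c(7, 7, 8, 4, 2, 5) used above.

```r
# Hypothetical sketch of the 9-to-4 reduction, built from the
# year mapping printed by edu_map(from = 9, to = 4).
reduce_map <- c("0" = NA, "9" = 9, "12" = 12, "13" = 12, "14" = 12,
                "16" = 16, "19" = 19, "21" = 19)

reduce_years_sketch <- function(years) {
  # Years missing from the map (e.g. 6, primary school) become NA
  unname(reduce_map[as.character(years)])
}

reduce_years_sketch(c(16, 16, 19, 12, 6, 13))  # 16 16 19 12 NA 12
```

The NA for 6 years agrees with the `<NA>` that edu9_reduce() returns for "Primary school (6 years)" above.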
Some participants will have provided education information several
times, likely using different coding schemas. This means we might have
better, finer information about a participant's education from later
data collections. Particularly when estimating actual years of
education, the more levels, the better the precision. In later data
collections we also ask participants directly to calculate the total
number of years they have been in full-time education. As there are
currently three main sources (edu4, edu10 and edu_years), we combine
them to get the best estimate we can. The edu_compute() function
requires a full dataset with at least three columns containing this
information. The columns do not need to have the same names as in the
NOAS, as you specify them yourself. The function checks the data in the
following order:

1. If edu_years is not NA, use this value.
2. Otherwise, if the detailed coding (the edu9 argument) is not NA, turn the factor into years.
3. Otherwise, if edu4 is not NA, turn the factor into years.
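The order of checks above amounts to a fallback chain. A minimal base-R sketch, assuming all three sources have already been converted to years; compute_years_sketch() is hypothetical and not the implementation of edu_compute().

```r
# Hypothetical sketch of the priority rule; inputs are already in years.
compute_years_sketch <- function(years, years9, years4) {
  out <- years                          # 1. self-reported years first
  use9 <- is.na(out) & !is.na(years9)   # 2. then the detailed schema
  out[use9] <- years9[use9]
  use4 <- is.na(out) & !is.na(years4)   # 3. finally the 4-category schema
  out[use4] <- years4[use4]
  out
}

compute_years_sketch(
  years  = c(NA, 12, NA, NA),
  years9 = c(16, 16, NA, NA),
  years4 = c(16, 12, 19, NA)
)  # 16 12 19 NA
```

Only rows missing all three sources stay NA.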
edu_data <- data.frame(
edu4 = c("3", "High school", 1, NA,
"University/University college (> 4 years)", NA,
"University/University college (< 4 years)"),
edu9 = c(7,7,8,NA,"Primary school (6 years)",5, 9),
edu_years = c(NA, 12, 9, NA, 19, 19, NA),
mother = c("3", "High school", 1, NA,
"University/University college (> 4 years)", "University/University college (> 4 years)",
"University/University college (< 4 years)"),
father = c(7,7,8,4,"Primary school (6 years)",5, 10),
stringsAsFactors = FALSE
)
edu_data
#> edu4 edu9 edu_years
#> 1 3 7 NA
#> 2 High school 7 12
#> 3 1 8 9
#> 4 <NA> <NA> NA
#> 5 University/University college (> 4 years) Primary school (6 years) 19
#> 6 <NA> 5 19
#> 7 University/University college (< 4 years) 9 NA
#> mother father
#> 1 3 7
#> 2 High school 7
#> 3 1 8
#> 4 <NA> 4
#> 5 University/University college (> 4 years) Primary school (6 years)
#> 6 University/University college (> 4 years) 5
#> 7 University/University college (< 4 years) 10
edu_compute(edu_data,
edu4 = edu4,
edu9 = edu9,
edu_years = edu_years)
#> New names:
#> • `edu_years` -> `edu_years...3`
#> • `edu_years` -> `edu_years...8`
#> edu4 edu9
#> 1 3 7
#> 2 High school 7
#> 3 1 8
#> 4 <NA> <NA>
#> 5 University/University college (> 4 years) Primary school (6 years)
#> 6 <NA> 5
#> 7 University/University college (< 4 years) 9
#> edu_years...3 mother
#> 1 NA 3
#> 2 12 High school
#> 3 9 1
#> 4 NA <NA>
#> 5 19 University/University college (> 4 years)
#> 6 19 University/University college (> 4 years)
#> 7 NA University/University college (< 4 years)
#> father
#> 1 7
#> 2 7
#> 3 8
#> 4 4
#> 5 Primary school (6 years)
#> 6 5
#> 7 10
#> edu_coded9
#> 1 Lower level University/University college degree (16 years)
#> 2 Lower level University/University college degree (16 years)
#> 3 Upper level University/University college (19 years)
#> 4 <NA>
#> 5 Primary school (6 years)
#> 6 High school diploma (13 years)
#> 7 Ph.D. (21 years)
#> edu_coded4 edu_years...8
#> 1 University/University college (< 4 years) 16
#> 2 High school 12
#> 3 Primary school (9 years) 9
#> 4 <NA> NA
#> 5 University/University college (> 4 years) 19
#> 6 <NA> 19
#> 7 University/University college (< 4 years) 21
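You will see that the computed years have been filled in from the other
two columns wherever edu_years was missing. Because the input data
already has a column called edu_years, the output here contains two,
renamed edu_years...3 (the original) and edu_years...8 (the computed
one). The function also adds the factorised edu_coded9 and edu_coded4
columns.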
Lastly, because some of our participants are children who are still
completing lower-level education, their own education cannot be used.
In these cases it is useful to have a variable that combines educational
information from participants and, for children, their parents, for
easier reporting and analysis. For this we use the information we have
from parents to fill in gaps in the pediatric data. The priority order
is participant, then mother, then father, so that only those missing all
three will have NA in the data. The columns holding this are prefixed
edu_Compiled, and there are three of them:

- edu_Compiled_Coded4 - 4-category coded education from participant, mother, or father
- edu_Compiled_Years - year approximation of edu_Compiled_Coded4
- edu_Compiled_Source - the source of the two above, one of “Participant”, “Mother”, or “Father”
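The participant, mother, father priority can be sketched as a fallback that also records where each value came from. compile_sketch() is hypothetical and works on plain year values, whereas edu_compile() works on the coded categories; it is an illustration of the rule, not the package's code.

```r
# Hypothetical sketch of the participant -> mother -> father fill.
compile_sketch <- function(participant, mother, father) {
  value <- participant
  source <- ifelse(is.na(participant), NA_character_, "Participant")
  from_mother <- is.na(value) & !is.na(mother)   # gaps filled by mother
  value[from_mother] <- mother[from_mother]
  source[from_mother] <- "Mother"
  from_father <- is.na(value) & !is.na(father)   # remaining gaps by father
  value[from_father] <- father[from_father]
  source[from_father] <- "Father"
  data.frame(value = value, source = source, stringsAsFactors = FALSE)
}

compile_sketch(
  participant = c(16, NA, NA, NA),
  mother      = c(19, 19, NA, NA),
  father      = c(12, 12, 19, NA)
)
```

The last row has no information from anyone, so both its value and its source stay NA.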
edu_compile(data = edu_data,
participant = edu4,
mother = edu4_factorise(mother),
father = father
)
#> edu4 edu9 edu_years
#> 1 3 7 NA
#> 2 High school 7 12
#> 3 1 8 9
#> 4 <NA> <NA> NA
#> 5 University/University college (> 4 years) Primary school (6 years) 19
#> 6 <NA> 5 19
#> 7 University/University college (< 4 years) 9 NA
#> mother father
#> 1 3 7
#> 2 High school 7
#> 3 1 8
#> 4 <NA> 4
#> 5 University/University college (> 4 years) Primary school (6 years)
#> 6 University/University college (> 4 years) 5
#> 7 University/University college (< 4 years) 10
#> edu_compiled_code4 edu_compiled_years
#> 1 University/University college (< 4 years) 16
#> 2 High school 12
#> 3 Primary school (9 years) 9
#> 4 University/University college (> 4 years) 19
#> 5 University/University college (> 4 years) 19
#> 6 University/University college (> 4 years) 19
#> 7 University/University college (< 4 years) 16
#> edu_compiled_source
#> 1 Participant
#> 2 Participant
#> 3 Participant
#> 4 Father
#> 5 Participant
#> 6 Mother
#> 7 Participant