diyar


Overview

Record linkage and deduplication of individual-level data, such as repeated spells in hospital or recurrent cases of infection, is a common task in epidemiological analysis and other fields of research.

The diyar package aims to provide a simple and flexible implementation of deterministic record linkage and episode grouping for the application of case definitions in epidemiological analysis.

Installation

# Install the latest CRAN release 
install.packages("diyar")

# Or, install the development version from GitHub
install.packages("devtools")
devtools::install_github("OlisaNsonwu/diyar")

Cheat sheet

Usage

There are two main aspects of the diyar package: multistage record grouping (record_group()) and episode grouping (fixed_episodes(), rolling_episodes() and episode_group()) for applying case definitions in epidemiological analysis. number_line objects are used for both.
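
As a rough illustration of what fixed-window episode grouping does, the base R sketch below assigns each record to the episode started by the most recent index record that is no more than case_length days earlier. fixed_window_sketch() is a hypothetical helper written for this README, not part of diyar, and treating a record exactly case_length days after the index as part of the same episode is an assumption rather than documented behaviour. The dates mirror the infections example used later in this section.

```r
# Hypothetical helper (not part of diyar): fixed-window episode grouping.
# Each record gets the position of the index record that opened its episode.
fixed_window_sketch <- function(dates, case_length) {
  stopifnot(all(diff(dates) >= 0))   # assumes dates are sorted ascending
  episode <- integer(length(dates))
  index <- NA                        # position of the current index record
  for (i in seq_along(dates)) {
    # Open a new episode when there is no index yet, or when this record
    # falls more than case_length days after the current index record.
    if (is.na(index) || as.numeric(dates[i] - dates[index]) > case_length) {
      index <- i
    }
    episode[i] <- index
  }
  episode
}

# Eleven dates, six days apart, starting on 2018-04-01
dates <- seq(as.Date("2018-04-01"), by = 6, length.out = 11)
episodes <- fixed_window_sketch(dates, case_length = 15)
print(episodes)
# episode index per record: 1 1 1 4 4 4 7 7 7 10 10
```

The window is always measured from the index record and is not extended by later records, which is why a new episode opens every third record here.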

library(diyar)

l <- as.Date("01/04/2019", "%d/%m/%Y")
r <- as.Date("30/04/2019", "%d/%m/%Y")
nl <- number_line(l, r)
nl
#> [1] "2019-04-01 -> 2019-04-30"
reverse_number_line(nl)
#> [1] "2019-04-30 <- 2019-04-01"
shift_number_line(nl, -2)
#> [1] "2019-03-30 -> 2019-04-28"
expand_number_line(nl, 2)
#> [1] "2019-03-30 -> 2019-05-02"
number_line_sequence(nl, by = 3)
#>  [1] "2019-04-01" "2019-04-04" "2019-04-07" "2019-04-10" "2019-04-13"
#>  [6] "2019-04-16" "2019-04-19" "2019-04-22" "2019-04-25" "2019-04-28"
#> [11] "2019-04-30"
data(infections)
db <- infections[c("date")]
db$date
#>  [1] "2018-04-01" "2018-04-07" "2018-04-13" "2018-04-19" "2018-04-25"
#>  [6] "2018-05-01" "2018-05-07" "2018-05-13" "2018-05-19" "2018-05-25"
#> [11] "2018-05-31"

# Fixed episodes
db$f_epid <- fixed_episodes(date = db$date, case_length = 15, 
                            display = FALSE, to_s4 = TRUE, group_stats = TRUE)
#> Episode grouping complete - 0 record(s) assinged a unique ID.

# Rolling episodes
db$r_epid <- rolling_episodes(date = db$date, case_length = 15, 
                              recurrence_length = 40, display = FALSE, to_s4 = TRUE, 
                              group_stats = TRUE)
#> Episode grouping complete - 0 record(s) assinged a unique ID.
db[c("f_epid","r_epid")]
#> # A tibble: 11 x 2
#>    f_epid                            r_epid                          
#>    <epid>                            <epid>                          
#>  1 E-01 2018-04-01 -> 2018-04-13 (C) E-1 2018-04-01 -> 2018-05-31 (C)
#>  2 E-01 2018-04-01 -> 2018-04-13 (D) E-1 2018-04-01 -> 2018-05-31 (D)
#>  3 E-01 2018-04-01 -> 2018-04-13 (D) E-1 2018-04-01 -> 2018-05-31 (D)
#>  4 E-04 2018-04-19 -> 2018-05-01 (C) E-1 2018-04-01 -> 2018-05-31 (R)
#>  5 E-04 2018-04-19 -> 2018-05-01 (D) E-1 2018-04-01 -> 2018-05-31 (D)
#>  6 E-04 2018-04-19 -> 2018-05-01 (D) E-1 2018-04-01 -> 2018-05-31 (D)
#>  7 E-07 2018-05-07 -> 2018-05-19 (C) E-1 2018-04-01 -> 2018-05-31 (D)
#>  8 E-07 2018-05-07 -> 2018-05-19 (D) E-1 2018-04-01 -> 2018-05-31 (D)
#>  9 E-07 2018-05-07 -> 2018-05-19 (D) E-1 2018-04-01 -> 2018-05-31 (D)
#> 10 E-10 2018-05-25 -> 2018-05-31 (C) E-1 2018-04-01 -> 2018-05-31 (R)
#> 11 E-10 2018-05-25 -> 2018-05-31 (D) E-1 2018-04-01 -> 2018-05-31 (D)

# Two stages of record grouping
data(staff_records)

staff_records$pids_a <- record_group(staff_records, sn = r_id, criteria = c(forename, surname),
                                     data_source = sex, display = FALSE, to_s4 = TRUE)
#> Record grouping complete - 1 record(s) assigned a group unique ID.
staff_records
#> # A tibble: 7 x 6
#>    r_id forename surname  sex   dataset    pids_a      
#>   <int> <chr>    <chr>    <chr> <chr>      <pid>       
#> 1     1 James    Green    M     Staff list P-1 (CRI 02)
#> 2     2 <NA>     Anderson M     Staff list P-2 (CRI 02)
#> 3     3 Jamey    Green    M     Pay slips  P-1 (CRI 02)
#> 4     4 ""       <NA>     F     Pay slips  P-4 (No Hit)
#> 5     5 Derrick  Anderson M     Staff list P-2 (CRI 02)
#> 6     6 Darrack  Anderson M     Pay slips  P-2 (CRI 02)
#> 7     7 Christie Green    F     Staff list P-1 (CRI 02)
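
The grouping in pids_a can be pictured as matching records on a single criterion while skipping missing or empty values. The base R sketch below does this for surname only. group_by_criterion() is a hypothetical helper, not diyar's implementation, and it ignores the forename stage and the data_source argument used in the record_group() call above.

```r
# Hypothetical helper (not part of diyar): single-criterion record grouping.
# Records sharing a usable value get the position of the first such record;
# records with a missing or empty value keep their own position as group ID.
group_by_criterion <- function(values) {
  usable <- !is.na(values) & values != ""
  ids <- seq_along(values)
  ids[usable] <- match(values[usable], values)
  ids
}

# Surnames copied from the staff_records output above
surname <- c("Green", "Anderson", "Green", NA, "Anderson", "Anderson", "Green")
group_ids <- group_by_criterion(surname)
print(group_ids)
# group ID per record: 1 2 1 4 2 2 1
```

The result lines up with the P-1, P-2 and P-4 identifiers shown above: no two forenames match exactly, so every link in this dataset comes from the surname criterion.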

Find out more here!

Bugs and issues

Please report any bugs or issues with using this package here.