---
title: "Writing to a REDCap Project"
author: Will Beasley [Biomedical & Behavior Methodology Core](https://www.ouhsc.edu/bbmc/team/), OUHSC Pediatrics;<br>Raymond Balise, University of Miami School of Medicine
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Writing to a REDCap Project}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r}
#| include = FALSE
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  tidy    = FALSE
)
```

Writing data _to_ REDCap is more difficult than reading data _from_ REDCap.
When you read, you receive data in the structure that REDCap provides.
You have some control over the columns, rows, and data types,
but there is not much you have to be concerned about.

In contrast, the structure of the dataset you send to the REDCap server must be precise.
You need to pass special variables so that the REDCap server understands the
hierarchical structure of the data points.
This vignette walks you through that process.

If you are new to REDCap and its API,
please first understand the concepts described in these two [vignettes](https://ouhscbbmc.github.io/REDCapR/articles/):

* [Typical REDCap Workflow for a Data Analyst](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html)
* [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html)

Part 1 - Intro
===================================

Strategy
----------------------------------

As described in the [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) vignette,
the best way to read and write data from projects with longitudinal/repeating elements
is to break up the "block matrix" dataset into individual datasets.
Each rectangle should have a coherent grain.

Following this strategy, we'll write to the REDCap server in two distinct steps:

1. Upload the patient-level instrument(s)
1. Upload each repeating instrument separately.

The actual upload phase is pretty straightforward:
it's just a call to `REDCapR::redcap_write()`.
Most of the vignette's code prepares the dataset so that the upload will run smoothly.
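
To make the "block matrix" idea concrete, here is a minimal base R sketch that splits one mixed rectangle into a patient-level table and a repeating-instrument table.
The column names and values (`name_first`, `weight_kg`, and the `*_toy` objects) are invented for illustration and are not part of the example project.

```r
# Toy "block matrix": patient-level and repeating rows share one rectangle.
# All names and values here are invented for illustration.
ds_block <- data.frame(
  record_id                = c(1L, 1L, 1L, 2L, 2L),
  redcap_repeat_instrument = c(NA, "daily", "daily", NA, "daily"),
  redcap_repeat_instance   = c(NA, 1L, 2L, NA, 1L),
  name_first               = c("Ann", NA, NA, "Bob", NA),
  weight_kg                = c(NA, 61.2, 60.8, NA, 80.1)
)

# Patient-level rows have no repeating instrument; keep only patient-level columns.
is_patient_row <- is.na(ds_block$redcap_repeat_instrument)
ds_patient_toy <- ds_block[is_patient_row, c("record_id", "name_first")]

# Rows of the repeating instrument; keep the keys plus that instrument's columns.
ds_daily_toy <- ds_block[
  !is_patient_row,
  c("record_id", "redcap_repeat_instrument", "redcap_repeat_instance", "weight_kg")
]

nrow(ds_patient_toy) # One row per patient
nrow(ds_daily_toy)   # One row per daily measurement
```

Each resulting table has a single grain, which is the shape the two upload steps expect.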

Pre-requisites
----------------------------------

See the [Typical REDCap Workflow for a Data Analyst](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html)
vignette and

1. [Verify REDCapR is installed](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#verify-redcapr-is-installed)
1. [Verify REDCap Access](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#verify-redcap-access)
1. [Review Codebook](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#review-codebook)

Retrieve Token
-------------------------

Please closely read the
[Retrieve Protected Token](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#part-2---retrieve-protected-token) section,
which has important security implications.
The current vignette imports a fake dataset into REDCap,
and we'll use a token stored in a local file.

```r
# retrieve-credential
path_credential <- system.file("misc/example.credentials", package = "REDCapR")
credential  <- REDCapR::retrieve_credential_local(
  path_credential = path_credential,
  project_id      = 3748
)

c(credential$redcap_uri, credential$token)
```

Datasets to Write to Server
-------------------------

To keep this vignette focused on writing/importing/uploading to the server,
we'll start with the data that needs to be written.
These example tables were prepared by [Raymond Balise](https://github.com/RaymondBalise)
for our 2023 [R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop,
"Using REDCap and R to Rapidly Produce Biomedical Publications".

There are two tables, each with a different [granularity](https://www.1keydata.com/datawarehousing/fact-table-granularity.html):

* `ds_patient`: each row represents one patient,
* `ds_daily`: each row represents one daily measurement per patient.

```r
# load-patient
ds_patient <-
  "test-data/vignette-repeating-write/data-patient.rds" |>
  system.file(package = "REDCapR") |>
  readr::read_rds()

ds_patient
```

```r
# load-repeating
ds_daily <-
  "test-data/vignette-repeating-write/data-daily.rds" |>
  system.file(package = "REDCapR") |>
  readr::read_rds()

ds_daily
```

Part 2 - Write Data: One row per patient
===================================

Besides the [`data.frame`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html)
to write to REDCap,
the only required arguments of the
[`REDCapR::redcap_write()`](https://ouhscbbmc.github.io/REDCapR/reference/redcap_write.html)
function are `redcap_uri` and `token`;
both are contained in the credential object created in the previous section.

As discussed in the [Troubleshooting vignette](https://ouhscbbmc.github.io/REDCapR/articles/TroubleshootingApiCalls.html#writing),
we recommend running these two preliminary checks before trying to write the
dataset to the server for the very first time.

Prep: Stoplight Fields
-------------------------

If the REDCap project isn't longitudinal and doesn't have arms,
uploading a patient-level data.frame to REDCap doesn't require adding variables.
However, we typically populate the `*_complete` variables to communicate the record's status.

If the row needs a human to add more values or inspect the existing values,
consider [marking the instrument](https://ouhscbbmc.github.io/REDCapR/reference/constant.html)
"incomplete" or "unverified";
the patient's instrument record will appear red or yellow in REDCap's Record Dashboard.
Otherwise consider marking the instrument "complete" so
it will appear green.

With this example project, the only patient-level instrument is "enrollment",
so the corresponding variable is `enrollment_complete`.

```r
# patient-complete
ds_patient <-
  ds_patient |>
  dplyr::mutate(
    enrollment_complete   = REDCapR::constant("form_complete"),
  )
```
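
If only some rows need human attention, the status can be assigned row by row instead of uniformly.
The sketch below is one possible approach in base R, using an invented three-row table and the assumption that "any missing value" is the trigger for review; your project's rule may differ.
The integers 0 and 2 are the values REDCap stores for "incomplete" and "complete".

```r
# Status codes REDCap stores in `*_complete` fields.  They correspond to
# REDCapR::constant("form_incomplete") and REDCapR::constant("form_complete").
form_incomplete <- 0L
form_complete   <- 2L

# Invented example rows; the second and third patients each miss a value.
ds_toy <- data.frame(
  id_code   = c(101L, 102L, 103L),
  date      = c("2023-01-05", NA, "2023-01-07"),
  is_mobile = c(TRUE, FALSE, NA)
)

# Flag a row as incomplete when any of its fields is missing.
has_missing <- !stats::complete.cases(ds_toy)
ds_toy$enrollment_complete <- ifelse(has_missing, form_incomplete, form_complete)

ds_toy
```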

Prep: `REDCapR::validate_for_write()`
-------------------------

`REDCapR::validate_for_write()` inspects a data frame to anticipate potential problems before writing with REDCap's API.
A tibble is returned, with one row per potential problem (and a suggestion for how to avoid it).
Ideally a 0-row tibble is returned.

```r
REDCapR::validate_for_write(ds_patient, convert_logical_to_integer = TRUE)
```

If you encounter problems that can be checked with automation,
please tell us in [an issue](https://github.com/OuhscBbmc/REDCapR/issues).
We'll work with you to incorporate the new check into `REDCapR::validate_for_write()`.

When a dataset's problems are caught before reaching the server,
the solutions are easier to identify and implement.
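
In a scripted pipeline, it can help to halt automatically when the validation table is not empty.
The helper below, `stop_if_problems()`, is hypothetical (it is not part of REDCapR) and only assumes that the validation result behaves like a data frame with one row per problem.

```r
# Hypothetical helper (not part of REDCapR): halt before uploading
# if a validation table contains any rows.
stop_if_problems <- function(ds_problems) {
  if (0L < nrow(ds_problems)) {
    print(ds_problems)
    stop(nrow(ds_problems), " potential problem(s) found; fix them before writing.")
  }
  invisible(TRUE)
}

# Intended usage (requires REDCapR and the `ds_patient` table from above):
# ds_patient |>
#   REDCapR::validate_for_write(convert_logical_to_integer = TRUE) |>
#   stop_if_problems()
```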

Prep: Write Small Subset First
-------------------------

If this is your first time with a complicated project, consider loading a small subset of rows and columns.
In this case, we start with only three columns and two rows.

```r
# patient-subset
ds_patient |>
  dplyr::select(              # First three columns
    id_code,
    date,
    is_mobile,
  ) |>
  dplyr::slice(1:2) |>        # First two rows
  REDCapR::redcap_write(
    ds_to_write = _,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token,
    convert_logical_to_integer = TRUE
  )
```

Prep: Recode Variables where Necessary
-------------------------

Some variables in the data.frame might be represented differently than in REDCap.

A common transformation is changing strings into the integers that underlie radio buttons.
Common approaches are [`dplyr::case_match()`](https://dplyr.tidyverse.org/reference/case_match.html) and
joining to lookup tables (if the mappings are expressed in a csv).
Here's an in-line example of `dplyr::case_match()`.

```r
ds_patient <-
  ds_patient |>
  dplyr::mutate(
    race =
      dplyr::case_match(
        race,
        "White"                       ~  1L,
        "Black or African American"   ~  2L,
        "Asian"                       ~  3L,
        "Native American"             ~  4L,
        "Pacific Islander"            ~  5L,
        "Multiracial"                 ~  6L,
        "Refused or don't know"       ~  7L
      )
  )
```

```{r codebook-race}
#| echo = FALSE,
#| out.extra = 'style = "fig.width=1200px"'
knitr::include_graphics("images/codebook-race.png")
```
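
The lookup-table alternative might look like the base R sketch below.
The inline csv text stands in for a mapping file you would maintain yourself; the column names `label` and `code` and the input vector `race_label` are invented for illustration.

```r
# Lookup table; in practice this might be read from a csv file you maintain.
# The codes mirror the `case_match()` example above.
lookup_csv <- "label,code
White,1
Black or African American,2
Asian,3
Native American,4
Pacific Islander,5
Multiracial,6
Refused or don't know,7"
ds_lookup <- read.csv(text = lookup_csv, stringsAsFactors = FALSE)

# Invented labels to recode; "Martian" has no match in the lookup.
race_label <- c("Asian", "White", "Martian", "Multiracial")

# `match()` acts like a left join on the label; unmatched labels become NA.
race_code <- ds_lookup$code[match(race_label, ds_lookup$label)]

# Surface unmatched labels before uploading, rather than silently sending NAs.
unmatched <- unique(race_label[is.na(race_code)])
if (0L < length(unmatched)) {
  message("Labels without a code: ", paste(unmatched, collapse = ", "))
}
```

Keeping the mapping in a table makes it easier to review against the codebook than a long chain of in-line conditions.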

Write Entire Patient-level Table
-------------------------

If the small subset works, we usually jump ahead and try all columns and rows.

If this larger table fails, split the difference between
(a) the smaller working example and
(b) the larger failing example.
See if this middle point (that has fewer rows and/or columns than the failing point)
succeeds or fails.
Then repeat.
This "bisection" or "binary search" [debugging technique](https://medium.com/codecastpublication/debugging-tools-and-techniques-binary-search-2da5bb4282c7) is helpful in many areas of programming and statistical modeling.

```r
# patient-entire
ds_patient |>
  REDCapR::redcap_write(
    ds_to_write = _,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token,
    convert_logical_to_integer = TRUE
  )
```
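
The bisection idea can be sketched in base R as follows.
Both functions are hypothetical: `try_write()` is a stand-in that pretends the server rejects negative weights, and `find_first_failing_row()` assumes that a block of leading rows fails whenever it contains an offending row.
In real use, `try_write()` could wrap `REDCapR::redcap_write()` in `tryCatch()`, ideally against a test project, because every successful attempt really writes data.

```r
# Hypothetical stand-in for an upload attempt: returns TRUE on success.
try_write <- function(ds) {
  !any(ds$weight_kg < 0, na.rm = TRUE) # Pretend negative weights are rejected
}

# Find the shortest block of leading rows that fails, assuming the full table fails.
find_first_failing_row <- function(ds, try_write) {
  low  <- 0L        # Longest prefix known to succeed (zero rows trivially succeeds)
  high <- nrow(ds)  # Shortest prefix known to fail
  while (1L < high - low) {
    mid <- (low + high) %/% 2L
    if (try_write(ds[seq_len(mid), , drop = FALSE])) {
      low <- mid
    } else {
      high <- mid
    }
  }
  high
}

# Invented data; the sixth row carries the offending value.
ds_toy <- data.frame(id_code = 1:8, weight_kg = c(70, 65, 81, 59, 77, -1, 68, 72))
find_first_failing_row(ds_toy, try_write)
```

The same halving can be applied to columns when the rows all look fine.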

Part 3 - Write Data: Repeating Instrument
===================================

Add Plumbing Variables
-------------------------

As stated in the vignette's intro,
the structure of the dataset uploaded to the server must be precise.
When uploading repeating instruments, there are several important columns:

1. `record_id`: typically indicates the patient's id.  (This field can be renamed for the project.)
1. `redcap_event_name`: If the project is longitudinal or has arms, this indicates the event.
  Otherwise, you don't need to add this variable.
1. `redcap_repeat_instrument`: Indicates the instrument/form that is repeating for these columns.
1. `redcap_repeat_instance`: Typically a sequential positive integer (*e.g.*, 1, 2, 3, ...) indicating the order.

The combination of these variables needs to be unique.
Please read the [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html)
vignette for details of these variables and their meanings.

You need to pass specific variables so that the REDCap server understands the hierarchical structure of the data points.

```r
# repeat-plumbing
ds_daily <-
  ds_daily |>
  dplyr::group_by(id_code) |>
  dplyr::mutate(
    redcap_repeat_instrument  = "daily",
    redcap_repeat_instance    = dplyr::row_number(da_date),
    daily_complete            = REDCapR::constant("form_complete"),
  ) |>
  dplyr::ungroup() |>
  dplyr::select(
    id_code,                        # Or `record_id`, if you didn't rename it
    # redcap_event_name,            # If the project is longitudinal or has arms
    redcap_repeat_instrument,       # The name of the repeating instrument/form
    redcap_repeat_instance,         # The sequence of the repeating instrument
    tidyselect::everything(),       # All columns not explicitly passed to `dplyr::select()`
    daily_complete,                 # Indicates incomplete, unverified, or complete
  )

# Check for potential problems.  (Remember zero rows are good.)
REDCapR::validate_for_write(ds_daily, convert_logical_to_integer = TRUE)

ds_daily
```
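
Because the combination of plumbing variables must be unique, a quick duplicate check before uploading can save a confusing error later.
This is a minimal base R sketch on an invented table; with the real data you would list whichever key columns your project uses (including `redcap_event_name` if it applies).

```r
# Invented rows; the last two collide on every key column.
ds_toy <- data.frame(
  id_code                  = c(1L, 1L, 2L, 2L),
  redcap_repeat_instrument = "daily",
  redcap_repeat_instance   = c(1L, 2L, 1L, 1L)
)
key_columns <- c("id_code", "redcap_repeat_instrument", "redcap_repeat_instance")

# Rows whose key combination already appeared earlier in the table.
is_duplicate <- duplicated(ds_toy[, key_columns])
ds_toy[is_duplicate, ]
```

As with the validation check, zero rows is the result you are hoping for.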

Writing Repeating Instrument Variables
-------------------------

```r
# daily-entire
ds_daily |>
  REDCapR::redcap_write(
    ds_to_write = _,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token,
    convert_logical_to_integer = TRUE
  )
```

Part 4 - Next Steps
===================================

More Complexity
-------------------------

This vignette required only two data.frames, but more complex projects sometimes need more.
For example, each repeating instrument should be its own data.frame and
writing step.  Arms and longitudinal events need to be considered too.

Batching
-------------------------

By default, `REDCapR::redcap_write()` uploads the dataset in batches of 100 at a time,
and combines the outcomes of the batches before returning.
This can be adjusted to improve performance;
the 'Details' section of `REDCapR::redcap_write()` discusses the trade offs.

I usually shoot for ~10 seconds per batch.
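
One rough way to move toward that target is to scale the batch size from a timed run.
The helper below is hypothetical, and it assumes upload time grows roughly linearly with the number of rows, which will not hold exactly on a busy server.
The result can be passed to the `batch_size` argument of `REDCapR::redcap_write()`.

```r
# Hypothetical helper: scale the batch size toward a target duration,
# assuming upload time grows roughly linearly with the number of rows.
suggest_batch_size <- function(batch_size, elapsed_seconds, target_seconds = 10) {
  as.integer(max(1, round(batch_size * target_seconds / elapsed_seconds)))
}

# If batches of 100 took about 4 seconds each, try about 250 next time.
suggest_batch_size(batch_size = 100L, elapsed_seconds = 4)
```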

Manual vs API
-------------------------

Manual downloading/uploading might make sense if you're doing the operation only once.
But when does it ever stop after the first time?

If you have trouble uploading, consider adding a few fake patients & measurements
and then downloading the csv.
It might reveal something you didn't anticipate.
But be aware that it will be in the block matrix format
(*i.e.*, everything jammed into one rectangle).

Notes
===================================

This vignette was originally designed for the
[2023 R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop,
_Using REDCap and R to Rapidly Produce Biomedical Publications Cleaning Medical Data_
with [Raymond R. Balise](https://github.com/RaymondBalise), Belén Hervera, Daniel Maya, Anna Calderon, Tyler Bartholomew, Stephan Kadauke, and João Pedro Carmezim Correia.
The entire workshop's slides are at <https://github.com/RaymondBalise/r_med_redcap_2023_public>.

This work was made possible in part by the NIH grant [U54GM104938](https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=U54GM104938&arg_ProgOfficeCode=127)
to the [Oklahoma Shared Clinical and Translational Resource](http://osctr.ouhsc.edu).

Session Information
==================================================================

For the sake of documentation and reproducibility, the current report was rendered in the following environment.  Click the line below to expand.

<details>
  <summary>Environment <span class="glyphicon glyphicon-plus-sign"></span></summary>
```{r session-info, echo=FALSE}
if (requireNamespace("sessioninfo", quietly = TRUE)) {
  sessioninfo::session_info()
} else {
  sessionInfo()
}
```
</details>