Split a code into multiple codes — recode

This function conditionally splits a code into multiple codes. Note that if you want to retain the original coding, you may want to use recode_addChildCodes() instead.

Usage

recode_split(
  input,
  codes,
  splitToCodes,
  filter = TRUE,
  output = NULL,
  filenameRegex = ".*",
  outputPrefix = "",
  outputSuffix = "_recoded",
  decisionLabel = NULL,
  justification = NULL,
  justificationFile = NULL,
  preventOverwriting = rock::opts$get("preventOverwriting"),
  encoding = rock::opts$get("encoding"),
  silent = rock::opts$get("silent")
)

Arguments

input: One of 1) a character string specifying the path to a file with a source; 2) an object with a loaded source as produced by a call to load_source(); 3) a character string specifying the path to a directory containing one or more sources; or 4) an object with a list of loaded sources as produced by a call to load_sources().
codes: A single character value with the code to split.
splitToCodes: A named list specifying when to split to which new code. Each element of this list is a filtering criterion that will be passed on to get_source_filter() to create the actual filter that will be applied. The name of each element is the code that will be applied to utterances matching that filter. When calling recode_split() for a single source, it is also possible to pass a filter (i.e. the result of a call to get_source_filter()) instead of a filtering criterion, which allows more fine-grained control. Note that these split filters and the corresponding codes are processed sequentially, in the order specified in splitToCodes. This means that once an utterance that was coded with codes has been matched by one of these 'split filters' (and so recoded with the corresponding 'split code', i.e. the name of that split filter in splitToCodes), it will not be recoded again, even if it also matches other split filters further down the list. Any utterances coded with the code to split up (i.e. the code specified in codes) that do not match any of the split filters in splitToCodes will not be recoded, and so remain coded with codes. To create a catch-all ('else') category, pass ".*" or TRUE as the last filter (see the example).
filter: Optionally, a filter to apply to specify a subset of the source(s) to process (see get_source_filter()).
output: If specified, the recoded source(s) will be written here.
filenameRegex: Only process files matching this regular expression.
outputPrefix, outputSuffix: The prefix and suffix to add to the filenames when writing the processed files to disk, in case multiple sources are passed as input.
decisionLabel: A description of the (recoding) decision that was taken.
justification: The justification for this action.
justificationFile: If specified, the justification is appended to this file. If not, it is saved to the justifier workspace (see justifier::workspace()), which can then be saved or displayed at the end of the R Markdown file or R script using justifier::save_workspace().
preventOverwriting: Whether to prevent overwriting existing files when writing the files to output.
encoding: The encoding to use.
silent: Whether to be quiet (TRUE) or chatty (FALSE), i.e. whether to suppress or show progress messages.
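
The sequential, first-match-wins processing described for splitToCodes can be illustrated with a minimal base R sketch. This is not how rock implements recode_split(): the helper split_code_sketch() is hypothetical, it works on a plain character vector instead of a loaded source, it assumes every filtering criterion is either TRUE or a regular expression matched against the full utterance text, and it borrows the code-matching regular expression from the progress messages shown in the example.

```r
### Hypothetical helper (not part of rock): a simplified illustration of
### first-match-wins splitting on a plain character vector
split_code_sketch <- function(utterances, code, splitToCodes) {

  ### Regular expression matching the code inside [[...]] or inside a
  ### code path; modeled on the expression in the example output
  codeRegex <- paste0("(\\[\\[|>)", code, "(\\]\\]|>)");

  ### Utterances coded with the code to split
  hasCode <- grepl(codeRegex, utterances, perl = TRUE);

  ### Utterances that have already been recoded by an earlier filter
  alreadyRecoded <- rep(FALSE, length(utterances));

  ### Process the split filters in the order in which they were specified
  for (newCode in names(splitToCodes)) {

    criterion <- splitToCodes[[newCode]];

    ### Assumption: TRUE matches everything; anything else is a regex
    matchesFilter <-
      if (isTRUE(criterion)) {
        rep(TRUE, length(utterances));
      } else {
        grepl(criterion, utterances, perl = TRUE);
      }

    ### Only recode utterances that have the code, match this filter,
    ### and were not recoded by a previous filter
    toRecode <- hasCode & matchesFilter & !alreadyRecoded;

    utterances[toRecode] <-
      gsub(codeRegex,
           paste0("\\1", newCode, "\\2"),
           utterances[toRecode],
           perl = TRUE);

    alreadyRecoded <- alreadyRecoded | toRecode;
  }

  return(utterances);
}

### Made-up utterances, used only for this illustration
sketchUtterances <- c(
  "printing and typesetting [[parentCode1>childCode1]]",
  "specimen book and more [[childCode1]]",
  "still in their infancy [[childCode1>grandchildCode2]]",
  "no code here and there"
);

sketchResult <-
  split_code_sketch(
    sketchUtterances,
    code = "childCode1",
    splitToCodes = list(
      and_REPLACED = " and ",
      book_REPLACED = "book",
      else_REPLACED = TRUE
    )
  );
```

In this sketch, the second utterance matches both " and " and "book", but because the " and " filter comes first, it is recoded with and_REPLACED only. The third utterance matches neither of those filters and falls through to the catch-all, and the fourth is left alone because it was never coded with childCode1.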

Value

Invisibly, the changed source(s) or source(s) object.

Examples

### Get path to example source
examplePath <-
  system.file("extdata", package="rock");

### Get a path to one example file
exampleFile <-
  file.path(examplePath, "example-1.rock");

### Load example source
loadedExample <- rock::load_source(exampleFile);

### Split a code into two codes, showing progress
recoded_source <-
  rock::recode_split(
    loadedExample,
    codes="childCode1",
    splitToCodes = list(
      and_REPLACED = " and ",
      book_REPLACED = "book",
      else_REPLACED = TRUE
    ),
    silent=FALSE
  );
#> Creating 3 source filters.
#> Splitting filtered/matching occurrences of code 'childCode1' into 'and_REPLACED', 'book_REPLACED' & 'else_REPLACED'.
#> Using regular expression '(\[\[|>)childCode1(\]\]|>)'.
#> 
#> Out of the 132 utterances in the provided source, 8 match both the general filter and the split filter for 'and_REPLACED' and have not yet been matched by a previous split filter. Of these, 2 have been coded with code 'childCode1' and will now be coded with code 'and_REPLACED'.
#> --------PRE: Lorem Ipsum is simply dummy text of the printing and typesetting industry. [[parentCode1>childCode1]]
#>        POST: Lorem Ipsum is simply dummy text of the printing and typesetting industry. [[parentCode1>and_REPLACED]]
#> --------PRE: by accident, sometimes on purpose (injected humour and the like). [[parentCode1>childCode1>grandchildCode3]]
#>        POST: by accident, sometimes on purpose (injected humour and the like). [[parentCode1>and_REPLACED>grandchildCode3]]
#> 
#> Out of the 132 utterances in the provided source, 2 match both the general filter and the split filter for 'book_REPLACED' and have not yet been matched by a previous split filter. Of these, 1 have been coded with code 'childCode1' and will now be coded with code 'book_REPLACED'.
#> --------PRE: ~specimen book. [[parentCode1>childCode2]] [[childCode1]] [[intensity||2]]
#>        POST: ~specimen book. [[parentCode1>childCode2]] [[book_REPLACED]] [[intensity||2]]
#> 
#> Out of the 132 utterances in the provided source, 132 match both the general filter and the split filter for 'else_REPLACED' and have not yet been matched by a previous split filter. Of these, 3 have been coded with code 'childCode1' and will now be coded with code 'else_REPLACED'.
#> --------PRE: using 'Content here, content here', making it look like readable English. [[parentCode1>childCode1>grandchildCode1]]
#>        POST: using 'Content here, content here', making it look like readable English. [[parentCode1>else_REPLACED>grandchildCode1]]
#> --------PRE: ~still in their infancy. [[parentCode1>childCode1>grandchildCode2]]
#>        POST: ~still in their infancy. [[parentCode1>else_REPLACED>grandchildCode2]]
#> --------PRE: accompanied by English versions from the 1914 translation by H. Rackham. [[childCode1>grandchildCode2]]
#>        POST: accompanied by English versions from the 1914 translation by H. Rackham. [[else_REPLACED>grandchildCode2]]
#> 
#> Split 6 instances of code 'childCode1' into 'and_REPLACED', 'book_REPLACED' & 'else_REPLACED'.
#>