These functions can be used to mask a set of utterances or one or more sources.

mask_source(
  input,
  output = NULL,
  proportionToMask = 1,
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  rlWarn = rock::opts$get(rlWarn),
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE,
  silent = rock::opts$get(silent)
)

mask_sources(
  input,
  output,
  proportionToMask = 1,
  outputPrefix = "",
  outputSuffix = "_masked",
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE,
  recursive = TRUE,
  filenameRegex = ".*",
  filenameReplacement = c("_PRIVATE_", "_public_"),
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

mask_utterances(
  input,
  proportionToMask = 1,
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE
)

Arguments

input

For mask_utterances, a character vector where each element is one utterance; for mask_source, either a character vector containing the text of the relevant source or a path to a file that contains the source text; for mask_sources, a path to a directory that contains the sources to mask.

output

For mask_source, if not NULL, this is the name (and path) of the file in which to save the processed source (if it is NULL, the result will be returned visibly). For mask_sources, output is mandatory and is the path to the directory in which to store the processed sources. This path will be created with a warning if it does not exist. The exception is the value "same": in that case, every file will be written to the same directory it was read from.

proportionToMask

The proportion of utterances to mask, from 0 (none) to 1 (all).
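
To make the meaning of a proportion between 0 and 1 concrete, the following is a minimal sketch in base R. It assumes that a random subset of utterances of the requested size is selected and masked; the variable names and the use of round() and sample() are assumptions made for illustration and are not taken from the package's code.

```r
# Hypothetical illustration; not the rock package's internal implementation.
set.seed(1)  # only to make this sketch reproducible

utterances <- c(
  "First utterance.",
  "Second utterance.",
  "Third utterance.",
  "Fourth utterance."
)
proportionToMask <- 0.5

# Assumption: the number of utterances to mask is the rounded proportion
nToMask <- round(proportionToMask * length(utterances))

# Assumption: the utterances to mask are drawn at random
maskIdx <- sample(seq_along(utterances), size = nToMask)

masked <- utterances
masked[maskIdx] <- gsub("[[:alnum:]]", "X", utterances[maskIdx], perl = TRUE)

# Number of utterances that were changed by masking
sum(masked != utterances)
```

With four utterances and a proportion of 0.5, two utterances end up masked in this sketch; which two depends on the random draw.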

preventOverwriting

Whether to prevent overwriting of output files.

encoding

The encoding of the source(s).

rlWarn

Whether to let readLines() warn, e.g. if files do not end with a newline character.

maskRegex

A regular expression (regex) specifying the characters to mask (i.e. replace with the masking character).

maskChar

The character that replaces each masked character.
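
How maskRegex and maskChar interact can be sketched with base R's gsub(). This is an illustration only, assuming that masking amounts to replacing every regex match with the masking character; it is not the package's internal code and does not reproduce the way the rock functions leave codes intact.

```r
# Illustration with base R only; not rock's internal implementation.
utterance <- "Lorem ipsum, 42!"

# Default settings: every alphanumeric character becomes "X"
maskedDefault <- gsub("[[:alnum:]]", "X", utterance, perl = TRUE)
maskedDefault
#> [1] "XXXXX XXXXX, XX!"

# A narrower regex and another masking character: only digits become "#"
maskedDigits <- gsub("[[:digit:]]", "#", utterance, perl = TRUE)
maskedDigits
#> [1] "Lorem ipsum, ##!"
```

Because the default regex matches only alphanumeric characters, whitespace and punctuation survive, so the shape of the utterance stays recognizable.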

perl

Whether the regular expression is a Perl-compatible regex (TRUE) or not (FALSE).

silent

Whether to suppress the warning about not editing the cleaned source.

outputPrefix, outputSuffix

The prefix and suffix to add to the filenames when writing the processed files to disk.

recursive

Whether to also search all subdirectories (TRUE) or only the specified directory (FALSE).

filenameRegex

A regular expression to match against located files; only files matching this regular expression are processed.

filenameReplacement

A character vector with two elements that represent, respectively, the pattern and replacement arguments of the gsub() function. In other words, the first element specifies a regular expression to search for in every processed filename, and the second element specifies the string that replaces any matches of the first element. Set to NULL to not perform any replacement on the output file name.
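
The effect of the default filenameReplacement on a single filename can be sketched with gsub() directly. The filename below is hypothetical, and the sketch covers only the replacement step; how outputPrefix and outputSuffix are combined with the result is not shown here.

```r
# Default value from the usage section above
filenameReplacement <- c("_PRIVATE_", "_public_")

# Hypothetical input filename, used only for illustration
inputFilename <- "interview-01_PRIVATE_.rock"

# First element is the pattern, second element is the replacement
outputFilename <- gsub(
  filenameReplacement[1],
  filenameReplacement[2],
  inputFilename
)
outputFilename
#> [1] "interview-01_public_.rock"
```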

Value

A character vector for mask_utterances and mask_source, or a list of character vectors for mask_sources.

Examples

### Mask text but not the codes
rock::mask_utterances(
  paste0(
    "Lorem ipsum dolor sit amet, consectetur adipiscing ",
    "elit. [[expAttitude_expectation_73dnt5z1>earplugsFeelUnpleasant]]"
  )
)
#> [1] "XXXXX XXXXX XXXXX XXX XXXX, XXXXXXXXXXX XXXXXXXXXX XXXX. [[expAttitude_expectation_73dnt5z1>earplugsFeelUnpleasant]]"
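
The example above covers mask_utterances only. A comparable call to mask_source might look like the sketch below, which uses only the arguments listed in the usage section. The source text is made up, the call is wrapped in a check for the rock package so that the snippet does nothing when the package is not installed, and no output is shown because it has not been verified here.

```r
### Mask a source passed as a character vector (illustrative sketch)
if (requireNamespace("rock", quietly = TRUE)) {
  # Made-up source text; each element is one line of the source
  sourceText <- c(
    "Lorem ipsum dolor sit amet.",
    "Consectetur adipiscing elit."
  )

  # With output = NULL (the default), the masked source is returned
  maskedSource <- rock::mask_source(
    input = sourceText,
    maskChar = "#",
    silent = TRUE
  )

  print(maskedSource)
}
```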