These functions can be used to mask a set of utterances or one or more sources.
Usage
mask_source(
  input,
  output = NULL,
  proportionToMask = 1,
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  rlWarn = rock::opts$get(rlWarn),
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE,
  silent = rock::opts$get(silent)
)

mask_sources(
  input,
  output,
  proportionToMask = 1,
  outputPrefix = "",
  outputSuffix = "_masked",
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE,
  recursive = TRUE,
  filenameRegex = ".*",
  filenameReplacement = c("_PRIVATE_", "_public_"),
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

mask_utterances(
  input,
  proportionToMask = 1,
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE
)
Arguments
- input
For mask_utterances, a character vector where each element is one utterance; for mask_source, either a character vector containing the text of the relevant source or a path to a file that contains the source text; for mask_sources, a path to a directory that contains the sources to mask.
- output
For mask_source, if not NULL, this is the name (and path) of the file in which to save the processed source (if it is NULL, the result will be returned visibly). For mask_sources, output is mandatory and is the path to the directory in which to store the processed sources. This path will be created with a warning if it does not exist. An exception is if "same" is specified: in that case, every file will be written to the same directory it was read from.
- proportionToMask
The proportion of utterances to mask, from 0 (none) to 1 (all).
- preventOverwriting
Whether to prevent overwriting of output files.
- encoding
The encoding of the source(s).
- rlWarn
Whether to let readLines() warn, e.g. if files do not end with a newline character.
- maskRegex
A regular expression (regex) specifying the characters to mask (i.e. replace with the masking character).
- maskChar
The character that replaces each masked character.
- perl
Whether the regular expression is a Perl regex or not.
- silent
Whether to suppress the warning about not editing the cleaned source.
- outputPrefix, outputSuffix
The prefix and suffix to add to the filenames when writing the processed files to disk.
- recursive
Whether to search all subdirectories as well (TRUE) or not.
- filenameRegex
A regular expression to match against located files; only files matching this regular expression are processed.
- filenameReplacement
A character vector with two elements that represent, respectively, the pattern and replacement arguments of the gsub() function. In other words, the first element specifies a regular expression to search for in every processed filename, and the second element specifies the text that replaces every match of the first. Set to NULL to not perform any replacement on the output file name.
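To make the maskRegex, maskChar, and filenameReplacement arguments more concrete, here is a minimal base-R sketch of the kind of substitution they describe. It only illustrates the idea with gsub(); it is not the package's internal code (which, as the example further down shows, also leaves codes untouched), and the utterance and file name are hypothetical.

```r
# Illustration with base R only; this is NOT how rock implements masking
# internally, just the substitution idea behind the arguments.

# maskRegex / maskChar: every character matching the regex is replaced
# by the masking character; punctuation and spaces are left alone.
utterance <- "Hello, world 42!"          # hypothetical utterance
masked <- gsub("[[:alnum:]]", "X", utterance, perl = TRUE)
masked
#> [1] "XXXXX, XXXXX XX!"

# filenameReplacement: element 1 is the gsub() pattern, element 2 the
# replacement applied to each output file name.
filenameReplacement <- c("_PRIVATE_", "_public_")
renamed <- gsub(
  filenameReplacement[1],
  filenameReplacement[2],
  "interview_PRIVATE_01.rock"            # hypothetical file name
)
renamed
#> [1] "interview_public_01.rock"
```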
Value
A character vector for mask_utterances and mask_source, or a list of character vectors for mask_sources.
Examples
### Mask text but not the codes
rock::mask_utterances(
  paste0(
    "Lorem ipsum dolor sit amet, consectetur adipiscing ",
    "elit. [[expAttitude_expectation_73dnt5z1>earplugsFeelUnpleasant]]"
  )
)
#> [1] "XXXXX XXXXX XXXXX XXX XXXX, XXXXXXXXXXX XXXXXXXXXX XXXX. [[expAttitude_expectation_73dnt5z1>earplugsFeelUnpleasant]]"
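
As a further sketch, the call below passes several utterances at once and uses a different masking character. The utterances are invented for illustration, and the call only runs when the rock package is installed. No output is shown because it has not been verified here.

```r
# Hypothetical utterances, made up for this illustration.
utterances <- c(
  "I never wear earplugs at concerts.",
  "They feel unpleasant after an hour. [[earplugsFeelUnpleasant]]"
)

if (requireNamespace("rock", quietly = TRUE)) {
  # Mask all utterances (proportionToMask = 1), replacing every
  # alphanumeric character with "*" instead of the default "X".
  masked <- rock::mask_utterances(
    utterances,
    proportionToMask = 1,
    maskRegex = "[[:alnum:]]",
    maskChar = "*"
  )
  print(masked)
}
```

Lowering proportionToMask below 1 would, per the argument description, mask only that proportion of the utterances.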