These functions parse one (parse_source) or more (parse_sources) sources and the
contained identifiers, sections, and codes.
Usage
parse_source(
text,
file,
utteranceLabelRegexes = NULL,
ignoreOddDelimiters = FALSE,
checkClassInstanceIds = rock::opts$get(checkClassInstanceIds),
postponeDeductiveTreeBuilding = FALSE,
filesWithYAML = NULL,
mergeAttributes = TRUE,
removeSectionBreakRows = rock::opts$get("removeSectionBreakRows"),
removeIdentifierRows = rock::opts$get("removeIdentifierRows"),
removeEmptyRows = rock::opts$get("removeEmptyRows"),
suppressDuplicateInstanceWarnings = rock::opts$get("suppressDuplicateInstanceWarnings"),
rlWarn = rock::opts$get("rlWarn"),
encoding = rock::opts$get("encoding"),
silent = rock::opts$get("silent")
)
# S3 method for class 'rock_parsedSource'
print(x, prefix = "### ", ...)
parse_sources(
path,
extension = "rock|dct",
regex = NULL,
recursive = TRUE,
removeSectionBreakRows = rock::opts$get("removeSectionBreakRows"),
removeIdentifierRows = rock::opts$get("removeIdentifierRows"),
removeEmptyRows = rock::opts$get("removeEmptyRows"),
suppressDuplicateInstanceWarnings = rock::opts$get("suppressDuplicateInstanceWarnings"),
filesWithYAML = NULL,
ignoreOddDelimiters = FALSE,
checkClassInstanceIds = rock::opts$get("checkClassInstanceIds"),
mergeInductiveTrees = FALSE,
encoding = rock::opts$get("encoding"),
silent = rock::opts$get("silent")
)
# S3 method for class 'rock_parsedSources'
print(x, prefix = "### ", ...)
# S3 method for class 'rock_parsedSources'
plot(x, ...)
Arguments
- text, file
As text or file, you can specify a file to read with encoding encoding, which will then be read using base::readLines(). If the argument is named text, whether it is the path to an existing file is checked first, and if it is, that file is read. If the argument is named file, and it does not point to an existing file, an error is produced (useful if calling from other functions). A text should be a character vector where every element is a line of the original source (as provided by base::readLines()); although if a character vector of one element and including at least one newline character (\n) is provided as text, it is split at the newline characters using base::strsplit(). Basically, this behavior means that the first argument can be either a character vector or the path to a file; and if you're specifying a file and you want to be certain that an error is thrown if it doesn't exist, make sure to name it file.
- utteranceLabelRegexes
Optionally, a list with two-element vectors to preprocess utterances before they are stored as labels (these 'utterance perl regular expression!
- ignoreOddDelimiters
If an odd number of YAML delimiters is encountered, whether this should result in an error (FALSE) or just be silently ignored (TRUE).
- checkClassInstanceIds
Whether to check for the occurrence of class instance identifiers specified in the attributes.
- postponeDeductiveTreeBuilding
Whether to immediately try to build the deductive tree(s) based on the information in this file (FALSE) or whether to skip that (TRUE). Skipping this is useful if the full tree information is distributed over multiple files (in which case you should probably call parse_sources instead of parse_source).
- filesWithYAML
Any additional files to process to look for YAML fragments.
- mergeAttributes
Whether to merge the data frame with the attributes into the qualitative data table (i.e., the data frame with the data fragments and codes).
- removeSectionBreakRows, removeIdentifierRows, removeEmptyRows
Whether to remove from the QDT, respectively: rows containing section breaks; rows containing only (class instance) identifiers; and empty rows.
- suppressDuplicateInstanceWarnings
Whether to suppress warnings about duplicate instances (as resulting from inconsistent specifications of attributes for class instances).
- rlWarn
Whether to let readLines() warn, e.g. if files do not end with a newline character.
- encoding
The encoding of the file to read (in file).
- silent
Whether to provide (FALSE) or suppress (TRUE) more detailed progress updates.
- x
The object to print.
- prefix
The prefix to use before the 'headings' of the printed result.
- ...
Any additional arguments are passed on to the default print method.
- path
The path containing the files to read.
- extension
The extension of the files to read; files with other extensions will be ignored. Multiple extensions can be separated by a pipe (|).
- regex
Instead of specifying an extension, it's also possible to specify a regular expression; only files matching this regular expression are read. If specified, regex takes precedence over extension.
- recursive
Whether to also process subdirectories (TRUE) or not (FALSE).
- mergeInductiveTrees
Merge multiple inductive code trees into one; this functionality is currently not yet implemented.
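The handling of the text and file arguments described above can be hard to picture from prose alone. The following is a minimal base R sketch of that documented behavior; it is not rock's actual implementation, and the helper name resolve_source_input is hypothetical and used for illustration only.

```r
# Hypothetical helper (not part of rock): mimics the documented handling
# of the `text` and `file` arguments using base R only.
resolve_source_input <- function(text = NULL, file = NULL, encoding = "UTF-8") {
  if (!is.null(file)) {
    # An explicitly named `file` must exist; otherwise an error is produced
    if (!file.exists(file)) {
      stop("File '", file, "' does not exist.")
    }
    return(readLines(file, encoding = encoding, warn = FALSE))
  }
  if (is.null(text)) {
    stop("Specify either `text` or `file`.")
  }
  # A single element that is the path to an existing file is read as a file
  if (length(text) == 1 && file.exists(text)) {
    return(readLines(text, encoding = encoding, warn = FALSE))
  }
  # A single element containing newline characters is split into lines
  if (length(text) == 1 && grepl("\n", text, fixed = TRUE)) {
    return(strsplit(text, "\n", fixed = TRUE)[[1]])
  }
  # Otherwise, every element is assumed to already be one line
  text
}

# Write a small temporary source to demonstrate both input styles
tmpSource <- tempfile(fileext = ".rock")
writeLines(c("Line one", "Line two"), tmpSource)

fromPath   <- resolve_source_input(tmpSource)
fromString <- resolve_source_input("Line one\nLine two")
```

Both calls yield the same two-element character vector. Passing a non-existent path as file stops with an error, whereas passing it as text simply treats it as a one-line source.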
Value
For rock::parse_source(), an object of class rock_parsedSource;
for rock::parse_sources(), an object of class rock_parsedSources. These
objects contain the original source(s) as well as the final data frame with
utterances and codes, as well as the code structures.
Examples
### Get path to example source
examplePath <-
system.file("extdata", package="rock");
### Get a path to one example file
exampleFile <-
file.path(examplePath, "example-1.rock");
### Parse single example source
parsedExample <- rock::parse_source(exampleFile);
### Show inductive code tree for the codes
### extracted with the regular expression specified with
### the name 'codes':
parsedExample$inductiveCodeTrees$codes;
#> levelName
#> 1 codes
#> 2 ¦--parentCode1
#> 3 ¦ ¦--childCode1
#> 4 ¦ ¦ ¦--grandchildCode1
#> 5 ¦ ¦ ¦--grandchildCode2
#> 6 ¦ ¦ °--grandchildCode3
#> 7 ¦ ¦--childCode2
#> 8 ¦ °--childCode3
#> 9 ¦--parentCode2
#> 10 ¦ ¦--childCode4
#> 11 ¦ ¦ ¦--grandchildCode4
#> 12 ¦ ¦ ¦--grandchildCode5
#> 13 ¦ ¦ °--grandchildCode6
#> 14 ¦ °--childCode5
#> 15 ¦ °--grandchildCode7
#> 16 °--someOtherCode
### If you want `rock` to be chatty, use:
parsedExample <- rock::parse_source(exampleFile,
silent=FALSE);
#> Read the contents of file '/tmp/RtmplEM5kw/temp_libpath4b901f8374f0/rock/extdata/example-1.rock' (133 lines read).
#> Identified 8 lines matching delimiterRegEx '^---$': 1, 7, 91, 102, 105, 110, 112 & 132.
#> Encountered YAML fragments. Parsing them for attributes.
#> Read 0 attributes specifications. Continuing with deductive code trees.
#> Looking for network configuration.
#> Read 11 aesthetic specifications.
#> Done parsing YAML fragments.
#> Found UIDS or class instance identifiers: commencing to process.
#> Found class instance identifiers (for classes with identifiers 'cid'): commencing to process.
#> Processed 1 class instance identifiers ('cid').
#>
#> Starting to process the inductive codes and building a code tree.
#> Building tree containing all 'true local roots'.
#> - Processing 'true local root' 'parentCode1'.
#> - Processing 'true local root' 'parentCode2'.
#> - Processing 'true local root' 'someOtherCode'.
#>
#> Processing subtrees of those 'true local roots'.
#>
#> - Processing subtree consisting of the node sequence 'parentCode1' & 'childCode1'.
#> - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#> - Processing node 'childCode1'.
#> - This parent node ('parentCode1') does not yet have a child with the name 'childCode1', so adding it to that parent node (and moving into the new parent node: 'parentCode1').
#>
#> - Processing subtree consisting of the node sequence 'parentCode1', 'childCode1' & 'grandchildCode1'.
#> - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#> - Processing node 'childCode1'.
#> - This parent node ('childCode1') already has a child with the name 'childCode1', so not adding anything at this point (moving into new parent node: 'childCode1').
#> - Processing node 'grandchildCode1'.
#> - This parent node ('childCode1') does not yet have a child with the name 'grandchildCode1', so adding it to that parent node (and moving into the new parent node: 'childCode1').
#>
#> - Processing subtree consisting of the node sequence 'parentCode1', 'childCode1' & 'grandchildCode2'.
#> - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#> - Processing node 'childCode1'.
#> - This parent node ('childCode1') already has a child with the name 'childCode1', so not adding anything at this point (moving into new parent node: 'childCode1').
#> - Processing node 'grandchildCode2'.
#> - This parent node ('childCode1') does not yet have a child with the name 'grandchildCode2', so adding it to that parent node (and moving into the new parent node: 'childCode1').
#>
#> - Processing subtree consisting of the node sequence 'parentCode1', 'childCode1' & 'grandchildCode3'.
#> - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#> - Processing node 'childCode1'.
#> - This parent node ('childCode1') already has a child with the name 'childCode1', so not adding anything at this point (moving into new parent node: 'childCode1').
#> - Processing node 'grandchildCode3'.
#> - This parent node ('childCode1') does not yet have a child with the name 'grandchildCode3', so adding it to that parent node (and moving into the new parent node: 'childCode1').
#>
#> - Processing subtree consisting of the node sequence 'parentCode1' & 'childCode2'.
#> - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#> - Processing node 'childCode2'.
#> - This parent node ('parentCode1') does not yet have a child with the name 'childCode2', so adding it to that parent node (and moving into the new parent node: 'parentCode1').
#>
#> - Processing subtree consisting of the node sequence 'parentCode1' & 'childCode3'.
#> - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#> - Processing node 'childCode3'.
#> - This parent node ('parentCode1') does not yet have a child with the name 'childCode3', so adding it to that parent node (and moving into the new parent node: 'parentCode1').
#>
#> - Processing subtree consisting of the node sequence 'parentCode2' & 'childCode4'.
#> - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#> - Processing node 'childCode4'.
#> - This parent node ('parentCode2') does not yet have a child with the name 'childCode4', so adding it to that parent node (and moving into the new parent node: 'parentCode2').
#>
#> - Processing subtree consisting of the node sequence 'parentCode2', 'childCode4' & 'grandchildCode4'.
#> - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#> - Processing node 'childCode4'.
#> - This parent node ('childCode4') already has a child with the name 'childCode4', so not adding anything at this point (moving into new parent node: 'childCode4').
#> - Processing node 'grandchildCode4'.
#> - This parent node ('childCode4') does not yet have a child with the name 'grandchildCode4', so adding it to that parent node (and moving into the new parent node: 'childCode4').
#>
#> - Processing subtree consisting of the node sequence 'parentCode2', 'childCode4' & 'grandchildCode5'.
#> - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#> - Processing node 'childCode4'.
#> - This parent node ('childCode4') already has a child with the name 'childCode4', so not adding anything at this point (moving into new parent node: 'childCode4').
#> - Processing node 'grandchildCode5'.
#> - This parent node ('childCode4') does not yet have a child with the name 'grandchildCode5', so adding it to that parent node (and moving into the new parent node: 'childCode4').
#>
#> - Processing subtree consisting of the node sequence 'parentCode2', 'childCode4' & 'grandchildCode6'.
#> - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#> - Processing node 'childCode4'.
#> - This parent node ('childCode4') already has a child with the name 'childCode4', so not adding anything at this point (moving into new parent node: 'childCode4').
#> - Processing node 'grandchildCode6'.
#> - This parent node ('childCode4') does not yet have a child with the name 'grandchildCode6', so adding it to that parent node (and moving into the new parent node: 'childCode4').
#>
#> - Processing subtree consisting of the node sequence 'parentCode2' & 'childCode5'.
#> - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#> - Processing node 'childCode5'.
#> - This parent node ('parentCode2') does not yet have a child with the name 'childCode5', so adding it to that parent node (and moving into the new parent node: 'parentCode2').
#>
#> - Processing subtree consisting of the node sequence 'someOtherCode'.
#> - This 'subtree' only consists of the parent/root code, so no further processing required.
#>
#> Processing subtrees of 'local roots that are branches', i.e. single codes that are descendants of other codes (without the full path to the root being specified in the code), or subtrees where the local root is in fact a descendant.
#>
#> - Processing subtree consisting of the node sequence 'childCode1'.
#> - This 'subtree' only consists of the parent/root code, so no further processing required.
#>
#> - Processing subtree consisting of the node sequence 'childCode1' & 'grandchildCode2'.
#> - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#> - This parent node already has a child with the name 'grandchildCode2', so not adding anything at this point (moving into new parent node: 'grandchildCode2').
#>
#> - Processing subtree consisting of the node sequence 'childCode5' & 'grandchildCode7'.
#> - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#> - This parent node does not yet have a child with the name 'grandchildCode7', so adding it to that parent node (and moving into the new parent node: 'childCode5').
#>
#> - Processing subtree consisting of the node sequence 'grandchildCode2'.
#> - This 'subtree' only consists of the parent/root code, so no further processing required.
#>
#> Done processing the inductive code tree.
#>
### Parse a selection of example sources in that directory
parsedExamples <-
rock::parse_sources(
examplePath,
regex = "(test|example)(.txt|.rock)"
);
### Show combined inductive code tree for the codes
### extracted with the regular expression specified with
### the name 'codes':
parsedExamples$inductiveCodeTrees$codes;
#> levelName
#> 1 codes
#> 2 ¦--Topic1
#> 3 ¦--Topic2
#> 4 ¦--att_ins_eval
#> 5 ¦--chairs
#> 6 ¦--inductFather
#> 7 ¦ ¦--inducChild3
#> 8 ¦ ¦--inducChild4
#> 9 ¦ °--inducChild5
#> 10 ¦--inductMother
#> 11 ¦ ¦--inducChild1
#> 12 ¦ °--inducChild2
#> 13 ¦--internet
#> 14 ¦--oaken_chests
#> 15 ¦--people
#> 16 ¦--tables
#> 17 ¦--behavior
#> 18 ¦ ¦--click
#> 19 ¦ ¦ °--link
#> 20 ¦ ¦ °--open_new_tab
#> 21 ¦ ¦ °--search_engine_hit
#> 22 ¦ ¦--process_query
#> 23 ¦ ¦--scroll
#> 24 ¦ ¦ °--down
#> 25 ¦ °--typing
#> 26 °--screen
#> 27 °--google
### Show a source coded with the Qualitative Network Approach
qnaExample <-
rock::parse_source(
file.path(
examplePath,
"network-example-1.rock"
)
);
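The parse_sources() example above selects files with the regex argument instead of extension. As a rough illustration of how those two arguments relate, the base R sketch below builds a file name pattern from a pipe-separated extension and lets regex take precedence when it is supplied. This is an assumption about the general approach, not rock's actual code; the helper select_source_files and the demo file names are hypothetical.

```r
# Hypothetical helper (not part of rock): selects files by extension or,
# if supplied, by a regular expression, which then takes precedence.
select_source_files <- function(path,
                                extension = "rock|dct",
                                regex = NULL,
                                recursive = TRUE) {
  if (is.null(regex)) {
    # Turn "rock|dct" into a pattern matching file names ending in
    # ".rock" or ".dct"
    regex <- paste0("\\.(", extension, ")$")
  }
  list.files(path, pattern = regex, recursive = recursive, full.names = TRUE)
}

# Create a throwaway directory with a few empty files to select from
demoDir <- tempfile("sources")
dir.create(file.path(demoDir, "sub"), recursive = TRUE)
invisible(file.create(
  file.path(demoDir, c("example-1.rock", "test.txt", "notes.md", "sub/codes.dct"))
))

byExtension  <- basename(select_source_files(demoDir))
byRegex      <- basename(select_source_files(demoDir,
                                             regex = "(test|example).*\\.(txt|rock)$"))
topLevelOnly <- basename(select_source_files(demoDir, recursive = FALSE))
```

Note that in a regular expression an unescaped dot matches any character, so a pattern such as "\\.(txt|rock)$" is stricter than "(.txt|.rock)" when only literal file extensions should match.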