Parsing sources — parse

These functions parse one (parse_source) or more (parse_sources) sources, along with the identifiers, sections, and codes they contain.

Usage

parse_source(
  text,
  file,
  utteranceLabelRegexes = NULL,
  ignoreOddDelimiters = FALSE,
  checkClassInstanceIds = rock::opts$get(checkClassInstanceIds),
  postponeDeductiveTreeBuilding = FALSE,
  filesWithYAML = NULL,
  removeSectionBreakRows = rock::opts$get("removeSectionBreakRows"),
  removeIdentifierRows = rock::opts$get("removeIdentifierRows"),
  removeEmptyRows = rock::opts$get("removeEmptyRows"),
  rlWarn = rock::opts$get("rlWarn"),
  encoding = rock::opts$get("encoding"),
  silent = rock::opts$get("silent")
)

# S3 method for rock_parsedSource
print(x, prefix = "### ", ...)

parse_sources(
  path,
  extension = "rock|dct",
  regex = NULL,
  recursive = TRUE,
  removeSectionBreakRows = rock::opts$get("removeSectionBreakRows"),
  removeIdentifierRows = rock::opts$get("removeIdentifierRows"),
  removeEmptyRows = rock::opts$get("removeEmptyRows"),
  ignoreOddDelimiters = FALSE,
  checkClassInstanceIds = rock::opts$get(checkClassInstanceIds),
  mergeInductiveTrees = FALSE,
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

# S3 method for rock_parsedSources
print(x, prefix = "### ", ...)

# S3 method for rock_parsedSources
plot(x, ...)

Arguments

text, file: The source to parse. As text, you can pass either a character vector or the path to a file; as file, you pass the path to a file. Files are read with base::readLines() using encoding encoding. If the argument is named text, it is first checked whether it is the path to an existing file, and if it is, that file is read. If the argument is named file and it does not point to an existing file, an error is produced (useful when calling from other functions). A text should be a character vector where every element is a line of the original source (as returned by base::readLines()); however, if a single-element character vector containing at least one newline character (\n) is provided as text, it is split at the newline characters using base::strsplit(). In short, the first argument can be either a character vector or the path to a file; if you are specifying a file and want to be certain that an error is thrown if it doesn't exist, name the argument file.
utteranceLabelRegexes: Optionally, a list of two-element vectors used to preprocess utterances before they are stored as labels (in each vector, the first element is a Perl regular expression and the second its replacement).
ignoreOddDelimiters: If an odd number of YAML delimiters is encountered, whether this should result in an error (FALSE) or just be silently ignored (TRUE).
checkClassInstanceIds: Whether to check for the occurrence of class instance identifiers specified in the attributes.
postponeDeductiveTreeBuilding: Whether to immediately try to build the deductive tree(s) based on the information in this file (FALSE) or to skip that step (TRUE). Skipping it is useful if the full tree information is distributed over multiple files (in which case you should probably call parse_sources instead of parse_source).
filesWithYAML: Any additional files to process to look for YAML fragments.
removeSectionBreakRows, removeIdentifierRows, removeEmptyRows: Whether to remove from the QDT, respectively: rows containing section breaks; rows containing only (class instance) identifiers; and empty rows.
rlWarn: Whether to let readLines() warn, e.g. if files do not end with a newline character.
encoding: The encoding of the file(s) to read.
silent: Whether to provide (FALSE) or suppress (TRUE) more detailed progress updates.
x: The object to print.
prefix: The prefix to use before the 'headings' of the printed result.
...: Any additional arguments are passed on to the default print method.
path: The path containing the files to read.
extension: The extension of the files to read; files with other extensions will be ignored. Multiple extensions can be separated by a pipe (|).
regex: Instead of specifying an extension, it's also possible to specify a regular expression; only files matching this regular expression are read. If specified, regex takes precedence over extension.
recursive: Whether to also process subdirectories (TRUE) or not (FALSE).
mergeInductiveTrees: Merge multiple inductive code trees into one; this functionality is currently not yet implemented.
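The text/file handling and the odd-delimiter check described above can be sketched in base R. The two helpers below, read_source_lines() and find_yaml_delimiters(), are hypothetical and not part of rock. The sketch assumes that YAML fragments are delimited by lines consisting of exactly three dashes (the delimiterRegEx '^---$' reported in the verbose output in the examples) and that "silently ignoring" an odd number of delimiters simply means not throwing an error; rock's internal handling may differ.

```r
## Hypothetical helpers (NOT part of rock): a base-R sketch of the
## documented text/file behaviour and of the odd-delimiter check.

## Return a source as a character vector with one element per line.
read_source_lines <- function(text, file, encoding = "UTF-8") {
  if (!missing(file)) {
    ## A named `file` must be an existing regular file; otherwise fail loudly.
    if (!utils::file_test("-f", file)) {
      stop("File '", file, "' does not exist.")
    }
    return(readLines(file, encoding = encoding, warn = FALSE))
  }
  ## A single string is first checked as a path to an existing file ...
  if (length(text) == 1 && utils::file_test("-f", text)) {
    return(readLines(text, encoding = encoding, warn = FALSE))
  }
  ## ... and otherwise split at newline characters, if it contains any.
  if (length(text) == 1 && grepl("\n", text, fixed = TRUE)) {
    return(strsplit(text, "\n", fixed = TRUE)[[1]])
  }
  ## Anything else is assumed to already be one element per line.
  text
}

## Return the line numbers of YAML delimiters ("---").
## ASSUMPTION: with ignoreOddDelimiters = TRUE, an odd count is not an error
## and the positions are returned unchanged.
find_yaml_delimiters <- function(lines, ignoreOddDelimiters = FALSE) {
  delimiters <- grep("^---$", lines)
  if (length(delimiters) %% 2 == 1 && !ignoreOddDelimiters) {
    stop("Encountered an odd number of YAML delimiters (",
         length(delimiters), ").")
  }
  delimiters
}
```

For example, read_source_lines("---\nkey: value\n---\nAn utterance.") returns four lines, and calling find_yaml_delimiters() on that result returns 1 and 3.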

Value

For rock::parse_source(), an object of class rock_parsedSource; for rock::parse_sources(), an object of class rock_parsedSources. These objects contain the original source(s), the final data frame with utterances and codes, and the code structures.

Examples

### Get the path to the directory with example sources
examplePath <-
  system.file("extdata", package="rock");

### Get a path to one example file
exampleFile <-
  file.path(examplePath, "example-1.rock");

### Parse single example source
parsedExample <- rock::parse_source(exampleFile);

### Show inductive code tree for the codes
### extracted with the regular expression specified with
### the name 'codes':
parsedExample$inductiveCodeTrees$codes;
#>                      levelName
#> 1  codes                      
#> 2   ¦--parentCode1            
#> 3   ¦   ¦--childCode1         
#> 4   ¦   ¦   ¦--grandchildCode1
#> 5   ¦   ¦   ¦--grandchildCode2
#> 6   ¦   ¦   °--grandchildCode3
#> 7   ¦   ¦--childCode2         
#> 8   ¦   °--childCode3         
#> 9   ¦--parentCode2            
#> 10  ¦   ¦--childCode4         
#> 11  ¦   ¦   ¦--grandchildCode4
#> 12  ¦   ¦   ¦--grandchildCode5
#> 13  ¦   ¦   °--grandchildCode6
#> 14  ¦   °--childCode5         
#> 15  ¦       °--grandchildCode7
#> 16  °--someOtherCode          

### If you want `rock` to be chatty, use:
parsedExample <- rock::parse_source(exampleFile,
                                    silent=FALSE);
#> Read the contents of file '/tmp/RtmpkiPh45/temp_libpath3924362ed76113/rock/extdata/example-1.rock' (133 lines read).
#> Identified 8 lines matching delimiterRegEx '^---$': 1, 7, 91, 102, 105, 110, 112 & 132.
#> Encountered YAML fragments. Parsing them for attributes.
#> Read 0 attributes specifications. Continuing with deductive code trees.
#> Looking for network configuration.
#> Read 11 aesthetic specifications.
#> Done parsing YAML fragments.
#> Found UIDS or class instance identifiers: commencing to process.
#> Found class instance identifiers (for classes with identifiers 'cid'): commencing to process.
#> Processed 1 class instance identifiers ('cid').
#> 
#> Starting to process the inductive codes and building a code tree.
#> Building tree containing all 'true local roots'.
#> - Processing 'true local root' 'parentCode1'.
#> - Processing 'true local root' 'parentCode2'.
#> - Processing 'true local root' 'someOtherCode'.
#> 
#> Processing subtrees of those 'true local roots'.
#> 
#> - Processing subtree consisting of the node sequence 'parentCode1' & 'childCode1'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#>     - Processing node 'childCode1'.
#>       - This parent node ('parentCode1') does not yet have a child with the name 'childCode1', so adding it to that parent node (and moving into the new parent node: 'parentCode1').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode1', 'childCode1' & 'grandchildCode1'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#>     - Processing node 'childCode1'.
#>       - This parent node ('childCode1') already has a child with the name 'childCode1', so not adding anything at this point (moving into new parent node: 'childCode1').
#>     - Processing node 'grandchildCode1'.
#>       - This parent node ('childCode1') does not yet have a child with the name 'grandchildCode1', so adding it to that parent node (and moving into the new parent node: 'childCode1').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode1', 'childCode1' & 'grandchildCode2'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#>     - Processing node 'childCode1'.
#>       - This parent node ('childCode1') already has a child with the name 'childCode1', so not adding anything at this point (moving into new parent node: 'childCode1').
#>     - Processing node 'grandchildCode2'.
#>       - This parent node ('childCode1') does not yet have a child with the name 'grandchildCode2', so adding it to that parent node (and moving into the new parent node: 'childCode1').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode1', 'childCode1' & 'grandchildCode3'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#>     - Processing node 'childCode1'.
#>       - This parent node ('childCode1') already has a child with the name 'childCode1', so not adding anything at this point (moving into new parent node: 'childCode1').
#>     - Processing node 'grandchildCode3'.
#>       - This parent node ('childCode1') does not yet have a child with the name 'grandchildCode3', so adding it to that parent node (and moving into the new parent node: 'childCode1').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode1' & 'childCode2'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#>     - Processing node 'childCode2'.
#>       - This parent node ('parentCode1') does not yet have a child with the name 'childCode2', so adding it to that parent node (and moving into the new parent node: 'parentCode1').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode1' & 'childCode3'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#>     - Processing node 'childCode3'.
#>       - This parent node ('parentCode1') does not yet have a child with the name 'childCode3', so adding it to that parent node (and moving into the new parent node: 'parentCode1').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode2' & 'childCode4'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#>     - Processing node 'childCode4'.
#>       - This parent node ('parentCode2') does not yet have a child with the name 'childCode4', so adding it to that parent node (and moving into the new parent node: 'parentCode2').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode2', 'childCode4' & 'grandchildCode4'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#>     - Processing node 'childCode4'.
#>       - This parent node ('childCode4') already has a child with the name 'childCode4', so not adding anything at this point (moving into new parent node: 'childCode4').
#>     - Processing node 'grandchildCode4'.
#>       - This parent node ('childCode4') does not yet have a child with the name 'grandchildCode4', so adding it to that parent node (and moving into the new parent node: 'childCode4').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode2', 'childCode4' & 'grandchildCode5'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#>     - Processing node 'childCode4'.
#>       - This parent node ('childCode4') already has a child with the name 'childCode4', so not adding anything at this point (moving into new parent node: 'childCode4').
#>     - Processing node 'grandchildCode5'.
#>       - This parent node ('childCode4') does not yet have a child with the name 'grandchildCode5', so adding it to that parent node (and moving into the new parent node: 'childCode4').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode2', 'childCode4' & 'grandchildCode6'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 2 child(ren). Processing child(ren).
#>     - Processing node 'childCode4'.
#>       - This parent node ('childCode4') already has a child with the name 'childCode4', so not adding anything at this point (moving into new parent node: 'childCode4').
#>     - Processing node 'grandchildCode6'.
#>       - This parent node ('childCode4') does not yet have a child with the name 'grandchildCode6', so adding it to that parent node (and moving into the new parent node: 'childCode4').
#> 
#> - Processing subtree consisting of the node sequence 'parentCode2' & 'childCode5'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#>     - Processing node 'childCode5'.
#>       - This parent node ('parentCode2') does not yet have a child with the name 'childCode5', so adding it to that parent node (and moving into the new parent node: 'parentCode2').
#> 
#> - Processing subtree consisting of the node sequence 'someOtherCode'.
#>   - This 'subtree' only consists of the parent/root code, so no further processing required.
#> 
#> Processing subtrees of 'local roots that are branches', i.e. single codes that are descendants of other codes (without the full path to the root being specified in the code), or subtrees where the local root is in fact a descendant.
#> 
#> - Processing subtree consisting of the node sequence 'childCode1'.
#>   - This 'subtree' only consists of the parent/root code, so no further processing required.
#> 
#> - Processing subtree consisting of the node sequence 'childCode1' & 'grandchildCode2'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#>       - This parent node already has a child with the name 'grandchildCode2', so not adding anything at this point (moving into new parent node: 'grandchildCode2').
#> 
#> - Processing subtree consisting of the node sequence 'childCode5' & 'grandchildCode7'.
#>   - This subtree doesn't only consist of the parent/root code, but contains 1 child(ren). Processing child(ren).
#>       - This parent node does not yet have a child with the name 'grandchildCode7', so adding it to that parent node (and moving into the new parent node: 'childCode5').
#> 
#> - Processing subtree consisting of the node sequence 'grandchildCode2'.
#>   - This 'subtree' only consists of the parent/root code, so no further processing required.
#> 
#> Done processing the inductive code tree.
#> 

### Parse a selection of example sources in that directory
parsedExamples <-
  rock::parse_sources(
    examplePath,
    regex = "(test|example)(.txt|.rock)"
  );

### Show combined inductive code tree for the codes
### extracted with the regular expression specified with
### the name 'codes':
parsedExamples$inductiveCodeTrees$codes;
#>                                levelName
#> 1  codes                                
#> 2   ¦--Topic1                           
#> 3   ¦--Topic2                           
#> 4   ¦--att_ins_eval                     
#> 5   ¦--chairs                           
#> 6   ¦--inductFather                     
#> 7   ¦   ¦--inducChild3                  
#> 8   ¦   ¦--inducChild4                  
#> 9   ¦   °--inducChild5                  
#> 10  ¦--inductMother                     
#> 11  ¦   ¦--inducChild1                  
#> 12  ¦   °--inducChild2                  
#> 13  ¦--internet                         
#> 14  ¦--oaken_chests                     
#> 15  ¦--people                           
#> 16  ¦--tables                           
#> 17  ¦--behavior                         
#> 18  ¦   ¦--click                        
#> 19  ¦   ¦   °--link                     
#> 20  ¦   ¦       °--open_new_tab         
#> 21  ¦   ¦           °--search_engine_hit
#> 22  ¦   ¦--process_query                
#> 23  ¦   ¦--scroll                       
#> 24  ¦   ¦   °--down                     
#> 25  ¦   °--typing                       
#> 26  °--screen                           
#> 27      °--google                       

### Parse a source coded with the Qualitative Network Approach
qnaExample <-
  rock::parse_source(
    file.path(
      examplePath,
      "network-example-1.rock"
    )
  );
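To check which files a parse_sources() call would pick up before parsing them, the selection step can be previewed with base::list.files(). The helper preview_source_files() below is hypothetical and not part of rock. It assumes, following the argument descriptions, that pipe-separated extensions are combined into one file-name suffix pattern and that a non-NULL regex replaces extension entirely; rock's own pattern construction may differ.

```r
## Hypothetical helper (NOT part of rock): previews which files would be
## selected from `path` under the assumptions stated above.
preview_source_files <- function(path, extension = "rock|dct", regex = NULL,
                                 recursive = TRUE) {
  if (is.null(regex)) {
    ## ASSUMPTION: "rock|dct" becomes the suffix pattern "\\.(rock|dct)$".
    regex <- paste0("\\.(", extension, ")$")
  }
  ## list.files() matches `pattern` against file names, not full paths;
  ## with recursive = TRUE, subdirectories are searched as well.
  sort(list.files(path, pattern = regex, recursive = recursive))
}
```

For instance, preview_source_files(examplePath, regex = "(test|example)(.txt|.rock)") should list the example files matched by the regular expression used in the parse_sources() example above.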