| VCorpus {tm} | R Documentation |
Data structures and operators for volatile corpora.
Corpus(x, readerControl = list(reader = x$DefaultReader, language = "en"))
VCorpus(x, readerControl = list(reader = x$DefaultReader, language = "en"))

## S3 method for class 'VCorpus'
DMetaData(x)

## S3 method for class 'Corpus'
CMetaData(x)
| x | A |
| readerControl | A list with the named components |
Volatile means that the corpus is kept entirely in memory, so all
changes affect only the corresponding R object. In contrast, a corpus
implementation with permanent semantics is also available
(see PCorpus).
The constructed corpus object inherits from a list and has two
attributes containing meta information:
CMetaData: Corpus Meta Data contains corpus-specific metadata in the
form of tag-value pairs and information about children in the form of
a binary tree. This information is useful for reconstructing metadata
after, e.g., merging corpora.
DMetaData: Document Meta Data of class data.frame contains
document-specific metadata for the corpus. This data frame typically
holds clustering or classification results, which are metadata for
documents but form a separate entity (e.g., with its own name, value
range, etc.).
An object of class VCorpus, which extends the classes
Corpus and list, containing a collection of text
documents.
Ingo Feinerer
reut21578 <- system.file("texts", "crude", package = "tm")
(r <- Corpus(DirSource(reut21578),
             readerControl = list(reader = readReut21578XMLasPlain)))
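
The class hierarchy and the two metadata attributes described above can be inspected on the same sample corpus. The following is a minimal sketch, assuming a tm release that still exports DMetaData and CMetaData as shown in the usage section and that ships the "crude" sample texts; no output is shown because it depends on the installed version.

```r
library(tm)

# Build a volatile corpus from the sample Reuters files shipped with tm.
# Assumption: the "texts/crude" directory and readReut21578XMLasPlain are
# available in the installed tm version.
reut21578 <- system.file("texts", "crude", package = "tm")
r <- VCorpus(DirSource(reut21578),
             readerControl = list(reader = readReut21578XMLasPlain))

# The object extends both Corpus and list.
inherits(r, "VCorpus")
inherits(r, "Corpus")
is.list(r)

# Number of documents held in memory.
length(r)

# Document-level metadata (a data.frame) and corpus-level metadata.
# Assumption: both accessors exist as documented in the usage section.
DMetaData(r)
CMetaData(r)
```

Because the corpus is volatile, any modification of r changes only this in-memory R object.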