| PCorpus {tm} | R Documentation |
Construct a permanent corpus.
PCorpus(x,
readerControl = list(reader = x$DefaultReader, language = "en"),
dbControl = list(dbName = "", dbType = "DB1"))
DBControl(x)
## S3 method for class 'PCorpus'
DMetaData(x)
| x | A |
| readerControl | A list with the named components |
| dbControl | A list with the named components |
Permanent means that documents are physically stored outside of R (e.g., in a database) and that R objects are only pointers to the external structures. Consequently, changes in the underlying external representation can affect multiple R objects simultaneously.
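The pointer semantics described above can be illustrated with a minimal base R sketch. It does not use tm or its database backend: the helpers make_handle, read_docs and write_docs are hypothetical names introduced only for this illustration, and a temporary RDS file stands in for the external database.

```r
# Hypothetical helpers (not part of tm): a "handle" stores only a file path,
# so the R object is a pointer to documents kept outside of R.
make_handle <- function(path) structure(list(path = path), class = "doc_handle")
read_docs   <- function(h) readRDS(h$path)
write_docs  <- function(h, docs) saveRDS(docs, h$path)

# External storage: a temporary file standing in for a database.
store <- tempfile(fileext = ".rds")
saveRDS(list(doc1 = "first text"), store)

a <- make_handle(store)
b <- a  # copying the R object copies the pointer, not the documents

# Modify the external representation through handle `a` ...
docs <- read_docs(a)
docs$doc1 <- "revised text"
write_docs(a, docs)

# ... and the change is visible through handle `b` as well.
changed_via_b <- read_docs(b)$doc1
print(changed_via_b)
```

Both handles see the revised text because neither holds the documents itself. This is the behaviour to keep in mind when several R objects refer to the same permanent corpus.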
The constructed corpus object inherits from a list and has
three attributes containing meta and database management
information:
CMetaData: Corpus Meta Data contains corpus-specific metadata in the form of tag-value pairs, plus information about children in the form of a binary tree. This information is useful for reconstructing metadata after, e.g., merging corpora.
DMetaData: Document Meta Data of class
data.frame contains document-specific metadata for the
corpus. This data frame typically holds clustering or
classification results, which are essentially metadata for documents
but form an entity of their own (e.g., with its own name, value range,
etc.).
DBControl: Database control field is a list with
two named components: dbName holds the path to the
permanent database storage, and dbType stores the database
type.
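As a rough illustration of the layout described above, the following base R sketch builds a mock object with the same three attributes. It is not how PCorpus() constructs its result: the values, the element names inside CMetaData (tags, children) and the document keys are placeholders assumed for the example.

```r
# Mock object only: mimics the described layout, not created by tm's PCorpus().
mock <- structure(
  list("doc-key-1", "doc-key-2"),            # placeholder pointers to documents
  CMetaData = list(
    tags     = list(creator = "example"),    # tag-value pairs (assumed names)
    children = NULL                          # no merged corpora in this mock
  ),
  DMetaData = data.frame(cluster = c(1L, 2L)),  # one row per document
  DBControl = list(dbName = "myDB.db", dbType = "DB1"),
  class = c("PCorpus", "Corpus", "list")
)

# Each attribute can be inspected with base R's attr().
db_info  <- attr(mock, "DBControl")
doc_meta <- attr(mock, "DMetaData")
print(db_info$dbName)
print(nrow(doc_meta))
```

On a real permanent corpus, the accessors listed in the usage section (DBControl(x), DMetaData(x)) would be the way to read these fields rather than attr().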
An object of class PCorpus, which extends the classes
Corpus and list, containing a permanent corpus.
Ingo Feinerer
txt <- system.file("texts", "txt", package = "tm")
## Not run: 
PCorpus(DirSource(txt),
        dbControl = list(dbName = "myDB.db", dbType = "DB1"))
## End(Not run)