| ReutersSource {tm} | R Documentation |
Construct a source for an input containing several Reuters-21578 XML documents.
ReutersSource(x, encoding = "unknown")
| x | Either a character string identifying the file, or a connection. |
| encoding | Encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8; it is not used to re-encode the input. |
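The encoding argument marks strings rather than converting them. The distinction can be seen with base R's Encoding() and enc2utf8(). This is a sketch of general base R behaviour, assumed to match what the argument description means. It does not show ReutersSource's internals.

```r
# Base R illustration (not tm internals): marking vs. re-encoding.
x <- "caf\xe9"            # "café" as Latin-1 bytes; encoding not yet declared
Encoding(x) <- "latin1"   # mark the string as Latin-1; its bytes are unchanged
nchar(x, type = "bytes")  # 4: marking did not alter the stored bytes
y <- enc2utf8(x)          # actual re-encoding of the string to UTF-8
nchar(y, type = "bytes")  # 5: "é" takes two bytes in UTF-8
```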
An object of class XMLSource, which extends the class
Source, representing a Reuters-21578 XML document.
Ingo Feinerer
Lewis, David (1997). Reuters-21578 Text Categorization Collection Distribution 1.0. http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html
Luz, Saturnino. XML-encoded version of Reuters-21578. http://modnlp.berlios.de/reuters21578.html
getSources to list available sources; Encoding for details on encodings in R.
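library(tm)
# Path to the sample Reuters-21578 XML file shipped with tm
reuters21578 <- system.file("texts", "reuters-21578.xml", package = "tm")
rs <- ReutersSource(reuters21578)
# Build a corpus from the source and inspect its first two documents
inspect(Corpus(rs)[1:2])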