ReutersSource {tm}    R Documentation

Reuters-21578 XML Source

Description

Construct a source for an input containing several Reuters-21578 XML documents.

Usage

ReutersSource(x, encoding = "unknown")

Arguments

x

Either a character string identifying the file, or a connection.

encoding

Encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8; it is not used to re-encode the input.

Value

An object of class XMLSource, which extends the class Source, representing an input containing Reuters-21578 XML documents.

Author(s)

Ingo Feinerer

References

Lewis, David (1997) Reuters-21578 Text Categorization Collection Distribution 1.0. http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html

Luz, Saturnino XML-encoded version of Reuters-21578. http://modnlp.berlios.de/reuters21578.html

See Also

getSources to list available sources; Encoding for details on encodings in R.

Examples

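# Locate the sample Reuters-21578 XML file shipped with tm
reuters21578 <- system.file("texts", "reuters-21578.xml", package = "tm")
# Build the source, then inspect the first two documents of the resulting corpus
rs <- ReutersSource(reuters21578)
inspect(Corpus(rs)[1:2])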
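
The encoding argument marks strings rather than re-encoding them. The following is a minimal sketch of that distinction using base R only (Encoding, iconv, charToRaw). It does not call ReutersSource, and the variable names and the sample word are illustrative assumptions, not part of tm.

```r
# "facade" with a c-cedilla, as Latin-1 bytes; 0xe7 is the Latin-1 code for the cedilla.
latin1_bytes <- as.raw(c(0x66, 0x61, 0xe7, 0x61, 0x64, 0x65))
x <- rawToChar(latin1_bytes)
Encoding(x)                      # no encoding declared yet

# Marking: declares the encoding and leaves the bytes untouched.
marked <- x
Encoding(marked) <- "latin1"
identical(charToRaw(marked), latin1_bytes)

# Re-encoding: iconv() converts the bytes themselves to UTF-8.
converted <- iconv(x, from = "latin1", to = "UTF-8")
charToRaw(converted)
```

If the input file is not actually in the encoding that is declared, marking alone will not repair it; converting the file beforehand (for example with iconv) would be one way to handle that case.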

[Package tm version 0.5-10 Index]