| readXML {tm} | R Documentation |
Return a function which reads in an XML document. The structure of the XML document can be described with a specification.
readXML(spec, doc)
spec |
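A named list of lists, each containing two components. The constructed
reader maps each list entry to the attribute or metadatum of the document
that corresponds to the entry's name. Valid names include those used in the
example below, such as Content, Author, and Heading.
Each list entry must consist of two components: the first must be a string describing the type of the second component, and the second is the specification entry. Valid combinations include those used in the example below:
"node" with a string holding an XPath expression (e.g., "/item/title"),
"function" with a function that is called with the parsed XML document as its argument,
"unevaluated" with a string that is used as given.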
|
doc |
An (empty) document of some subclass of TextDocument. |
Formally, readXML is a function generator, i.e., it returns a function (which reads in a text document) with a well-defined signature. The returned function can still access the arguments passed to readXML (e.g., the specification) via lexical scoping.
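The function-generator idea can be illustrated in base R alone. The following is only a minimal sketch: make_reader_sketch is a hypothetical name and not part of tm, and the returned function merely reports what it can see instead of parsing any XML.

```r
# Hypothetical generator (not part of tm): illustrates lexical scoping only.
make_reader_sketch <- function(spec) {
  force(spec)  # evaluate the argument now so the closure captures its value
  # The returned function has the fixed signature (elem, language, id),
  # yet it can still read 'spec' from the enclosing environment.
  function(elem, language, id) {
    list(id = id,
         language = language,
         fields = names(spec),
         content = elem$content)
  }
}

reader <- make_reader_sketch(list(Heading = list("node", "/item/title")))
info <- reader(elem = list(content = "<item/>"), language = "en", id = "1")
info$fields  # names taken from the captured specification
```

The specification is never passed to reader directly; it is found in the environment in which reader was created.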
A function with the signature elem, language, id:
elem: a list with the named component content, which
must hold the document to be read in.
language: a string giving the text's language.
id: a unique identification string for the returned text document.
The function returns doc augmented with the information parsed
from the XML file in elem$content, as described by spec.
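How a generated reader might augment doc can be sketched with a toy stand-in written in base R. Everything here is an assumption made for illustration: toy_reader is a hypothetical name, doc is a plain list rather than a text document, "function" entries receive the raw content string instead of a parsed XML document, "node" entries are not supported, and the way id and language are stored is invented.

```r
# Toy stand-in for a generated reader (hypothetical, base R only; not tm's code).
toy_reader <- function(spec, doc) {
  function(elem, language, id) {
    for (name in names(spec)) {
      type  <- spec[[name]][[1]]  # first component: type string
      entry <- spec[[name]][[2]]  # second component: specification entry
      doc[[name]] <- switch(type,
        unevaluated = entry,                 # stored as given
        "function"  = entry(elem$content),   # assumption: raw string, not parsed XML
        stop("type not covered by this sketch: ", type))
    }
    # Assumption: the field names used for id and language are invented here.
    doc$id <- id
    doc$language <- language
    doc
  }
}

read_toy <- toy_reader(
  spec = list(Origin = list("unevaluated", "Gmane Mailing List Archive"),
              Length = list("function", function(x) nchar(x))),
  doc = list())
out <- read_toy(elem = list(content = "<item/>"), language = "en", id = "doc-1")
str(out)
```

The point of the sketch is only the shape of the result: the document passed to the generator comes back with one additional entry per name in the specification.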
Ingo Feinerer
Vignette 'Extensions: How to Handle Custom File Formats',
XMLSource.
getReaders to list available reader functions.
## Build a reader for items of a Gmane mailing list archive
readGmane <-
  readXML(spec = list(Author = list("node", "/item/creator"),
                      Content = list("node", "/item/description"),
                      ## parse the date node into a date-time in GMT
                      DateTimeStamp = list("function", function(node)
                        strptime(sapply(XML::getNodeSet(node, "/item/date"), XML::xmlValue),
                                 format = "%Y-%m-%dT%H:%M:%S",
                                 tz = "GMT")),
                      ## "unevaluated" entries are taken as given
                      Description = list("unevaluated", ""),
                      Heading = list("node", "/item/title"),
                      ID = list("node", "/item/link"),
                      Origin = list("unevaluated", "Gmane Mailing List Archive")),
          doc = PlainTextDocument())