The Extensible Markup Language (XML) is a markup language much like HTML or SGML. This is recommended by the World Wide Web Consortium and available as an open standard.
XML is a portable, open source language that allows programmers to develop applications that can be read by other applications, regardless of operating system and/or developmental language.
XML is extremely useful for keeping track of small to medium amounts of data without requiring a SQL-based backbone.
There are two different flavors available for XML parsers −
SAX-like (Stream interfaces) − Here you register callbacks for events of interest and then let the parser proceed through the document. This is useful when your documents are large or you have memory limitations, it parses the file as it reads it from disk, and the entire file is never stored in memory.
DOM-like (Object tree interfaces) − This is World Wide Web Consortium recommendation wherein the entire file is read into memory and stored in a hierarchical (tree-based) form to represent all the features of an XML document.
SAX obviously can't process information as fast as DOM can when working with large files. On the other hand, using DOM exclusively can really kill your resources, especially if used on a lot of small files.
SAX is read-only, while DOM allows changes to the XML file. Since these two different APIs literally complement each other there is no reason why you can't use them both for large projects.
The most common way to manipulate XML is with the REXML library by Sean Russell. Since 2002, REXML has been part of the standard Ruby distribution.
REXML is a pure-Ruby XML processor conforming to the XML 1.0 standard. It is a non-validating processor, passing all of the OASIS non-validating conformance tests.
REXML parser has the following advantages over other available parsers −
For all our XML code examples, let's use a simple XML file as an input
War, Thriller
DVD
2003
PG
10
Talk about a US-Japan war
Anime, Science Fiction
DVD
1989
R
8
A schientific fiction
Anime, Action
DVD
4
PG
10
Vash the Stampede!
Comedy
VHS
PG
2
Viewable boredom
|
Let's first parse our XML data in tree fashion. We begin by requiring the rexml/document library; often we do an include REXML to import into the top-level namespace for convenience
#!/usr/bin/ruby -w
require 'rexml/document'
include REXML
xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)
# Now get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]
# This will output all the movie titles.
xmldoc.elements.each("collection/movie"){
|e| puts "Movie Title : " + e.attributes["title"]
}
# This will output all the movie types.
xmldoc.elements.each("collection/movie/type") {
|e| puts "Movie Type : " + e.text
}
# This will output all the movie description.
xmldoc.elements.each("collection/movie/description") {
|e| puts "Movie Description : " + e.text
}
|
Output
Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
|
To process the same data, movies.xml, file in a stream-oriented way we will define a listener class whose methods will be the target of callbacks from the parser
#!/usr/bin/ruby -w
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
class MyListener
include REXML::StreamListener
def tag_start(*args)
puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
end
def text(data)
return if data =~ /^\w*$/ # whitespace only
abbrev = data[0..40] + (data.length > 40 ? "..." : "")
puts " text : #{abbrev.inspect}"
end
end
list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)
|
Output
tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
text : "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
text : "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
text : "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
text : "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
text : "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
text : "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
text : "Viewable boredom"
|
An alternative way to view XML is XPath. This is a kind of pseudo-language that describes how to locate specific elements and attributes in an XML document, treating that document as a logical ordered tree.
REXML has XPath support via the XPath class. It assumes tree-based parsing (document object model) as we have seen above
#!/usr/bin/ruby -w
require 'rexml/document'
include REXML
xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)
# Info for the first movie found
movie = XPath.first(xmldoc, "//movie")
p movie
# Print out all the movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }
# Get an array of all of the movie formats.
names = XPath.match(xmldoc, "//format").map {|x| x.text }
p names
|
Output
...
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
|
There are two XSLT parsers available that Ruby can use. A brief description of each is given here.
This parser is written and maintained by Masayoshi Takahashi. This is written primarily for Linux OS and requires the following libraries −
You can find this module at Ruby-Sablotron.
XSLT4R is written by Michael Neumann and can be found at the RAA in the Library section under XML. XSLT4R uses a simple commandline interface, though it can alternatively be used within a third-party application to transform an XML document.
XSLT4R needs XMLScan to operate, which is included within the XSLT4R archive and which is also a 100 percent Ruby module. These modules can be installed using standard Ruby installation method (i.e., ruby install.rb).
XSLT4R has the following syntax
ruby xslt.rb stylesheet.xsl document.xml [arguments]
|
Ruby program to add two integer numbers
how to create an array with Array.[](*args) in Ruby ?
What are the various Ruby runtimes, and how are they different?
Ruby program to check whether the given number is prime or not
Ruby program to reverse a string
Ruby program to check whether the given number is palindrome
Ruby program to print Fibonacci series
How to Replace array elements in Ruby?
Ruby program to print an array
Ruby program to check whether the given number is Armstrong
Program to Print Triangle of Numbers in Ruby
How to add/remove elements to Array in Ruby?
How to shuffle an array in Ruby?
Creating Array with Array.new(size, obj) in Ruby
Ruby program to generate random numbers
Ruby program to Calculate the factorial of given number
What are #method_missing and #send? Why are they useful?
How to Sort Array in Ruby?
How to get index of array element in Ruby
How to Get Input with Gets in Ruby
How to create two dimensional array in ruby?