Java-Web:Exercises - Introduction to XML

From Juneday education

Work in progress

This page is a work in progress. Remove this section when the page is production-ready.

Introduction

This chapter contains exercises for the introduction to XML chapter. There isn't much to do with plain XML, so the exercises move straight on to parsing XML from Java.

If your application uses XML as a data transmission format (reading XML files from the file system or the internet), you will need to know the basics of how to parse XML documents, that is, read them and create Java objects from their contents. We'll use the DOM (Document Object Model) from the W3C to parse XML and create Java objects from it.

With the Java API, you get the org.w3c.dom package, which contains interfaces representing the parts of a document, such as Document, Element, Node and NodeList (the parser that builds the Document lives in the javax.xml.parsers package). The idea is to create a Document object from the XML file (or stream or text), so that we can traverse the document tree and use the data inside Java.
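As a minimal sketch of creating a Document from text rather than from a file, the class below (ParseFromString and parse are our own names, not part of the exercise) wraps a String in an org.xml.sax.InputSource and hands it to a DocumentBuilder. The wrapping is needed because DocumentBuilder.parse(String) treats its argument as a URI (for example a file name), not as XML text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseFromString {

  // Parses XML held in a String into a DOM Document.
  static Document parse(String xml)
      throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // InputSource wraps a character stream; parse(String) would instead
    // interpret its argument as a file name or URI.
    return builder.parse(new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse("<users><user><name>Kalle Anka</name></user></users>");
    // The root element of the document tree is <users>.
    System.out.println(doc.getDocumentElement().getNodeName()); // users
  }
}
```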

It is important that you read up on the DOM, so that you can more easily understand how parsing with the org.w3c.dom package works. The previous chapter has links and some lecture slides that explain the basics.

If you feel that you have a basic idea of how the document abstraction used by DOM works, you should be able to follow the instructions in the tasks of the exercise below.

Exercise - parse an XML file and print out the contents

The purpose of this exercise is to give you the basic tools and skills needed to receive and use an XML document in a Java application. There are many APIs for working with XML, but we are sticking to standard Java, using the API in the org.w3c.dom package.

What you learn here will be useful even if you later use a third-party API for parsing XML. The basic idea is very similar and built around the Document Object Model (DOM) described by the W3C.

Task 1 - Download the XML file to parse

As usual, create a fresh directory for the exercise and cd to that directory.

Download users.xml from our GitHub repo.

Optional step for task 1 - validate the file using xmllint

If you want to, you can install xmllint in order to check the XML file. In Cygwin, xmllint is part of the libxml2 package: start the Cygwin installer, find that package and install it. In Ubuntu, you can find it in the package libxml2-utils.

Ubuntu installation:

$ sudo apt-get install libxml2-utils

Use xmllint to verify that the XML file you downloaded is well-formed. If it is, xmllint prints the document to standard out, without any error messages on standard error.

$ xmllint users.xml 
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<users>
  <user>
    <name>Kalle Anka</name>
    <email>donald@email.dt</email>
    <username>donaldd</username>
  </user>
  <user>
    <name>Joakim von Anka</name>
    <email>scrooge@email.dt</email>
    <username>onkelscrooge</username>
  </user>
  <user>
    <name>Arne Anka</name>
    <email>arne@email.com</email>
    <username>arneanka</username>
  </user>
</users>

You can try to create a corrupt XML file in order to see what an error message looks like for a non-well-formed XML file:

$ cat bad.xml 
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<users>
  <user>
    <name>Kalle Anka</name>
    <email>donald@email.dt</email>
    <username>donaldd</username>
  </user>
</Users>
$ xmllint bad.xml 
bad.xml:8: parser error : Opening and ending tag mismatch: users line 2 and Users
</Users>
       ^

The above example shows that XML is case sensitive: the end tag of the root element <users> is misspelled as </Users>, with a capital U.
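The DOM parser in standard Java rejects documents that are not well-formed, just like xmllint does. A minimal sketch, with hypothetical class and method names, that parses a good and a bad document from strings and reports whether a SAXException was thrown. The custom ErrorHandler only keeps the default handler from printing its own "[Fatal Error]" line to standard error, which it may otherwise do before the exception is thrown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

  // Returns true if xml parses, false if the parser reports a fatal error.
  static boolean isWellFormed(String xml) throws Exception {
    DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // Quiet error handler: ignore warnings/errors, rethrow fatal errors.
    builder.setErrorHandler(new ErrorHandler() {
      public void warning(SAXParseException e) { }
      public void error(SAXParseException e) { }
      public void fatalError(SAXParseException e) throws SAXException {
        throw e;
      }
    });
    try {
      builder.parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(isWellFormed("<users><user/></users>")); // true
    // The end tag does not match the start tag: XML is case sensitive.
    System.out.println(isWellFormed("<users><user/></Users>")); // false
  }
}
```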

Task 2 - create a Java application called Parser

Create a class named Parser with a main method. Import the following packages:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import java.io.IOException;

Add a standard main method declaration.

Task 3 - add a try-catch to the main method

In the main method, add a try-catch catching all of the following exceptions:

  • IOException
  • SAXException
  • ParserConfigurationException

In the catch block, print an error message to standard error about not being able to parse the XML.

Hint:

try{

}catch(SAXException|IOException|ParserConfigurationException e){
  System.err.println("Error parsing XML: " + e.getMessage());
}

Task 4 - Create the document using a DocumentBuilder

Look at the PDF from the lecture, Parsing XML From Java.pdf, if you need hints for this task. Create and initialize the following variables as the first statements in the try block:

  • A reference to a DocumentBuilderFactory (using DocumentBuilderFactory.newInstance())
  • A reference to a DocumentBuilder (using the factory)
  • A reference to a Document (using the document builder to parse the xml file you downloaded)
  • A reference to a NodeList from the root element using the document and the method getDocumentElement().getChildNodes()

Now you have a reference to a list of all nodes inside the root element: the <user> elements, plus the whitespace text nodes (newlines and indentation) between them. Task 6 shows how to tell the two kinds apart.
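If you get stuck, here is one way the four Task 4 statements could look, sketched as a separate method (rootChildren and Task4Sketch are our own names) so it can be tried on its own. It assumes users.xml is in the directory you run java from, and it passes a java.io.File to the builder; the exercise itself puts the statements directly in the try block.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Task4Sketch {

  // The four Task 4 statements, wrapped in a method for reuse.
  static NodeList rootChildren(String fileName)
      throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // Relative file names are resolved against the working directory.
    Document document = builder.parse(new File(fileName));
    NodeList nodeList = document.getDocumentElement().getChildNodes();
    return nodeList;
  }

  public static void main(String[] args) {
    try {
      NodeList nodeList = rootChildren("users.xml");
      // Counts <user> elements AND the whitespace text nodes between them.
      System.out.println("Nodes under the root element: " + nodeList.getLength());
    } catch (SAXException | IOException | ParserConfigurationException e) {
      System.err.println("Error parsing XML: " + e.getMessage());
    }
  }
}
```

For a file indented like users.xml, with three <user> elements, the count should be 7: four whitespace text nodes surrounding the three elements.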

Task 5 - create a loop over all the user nodes

After the above (still inside the try block), create a for loop iterating over all the nodes in the node list, which include the <user> elements.

Use an int loop variable initialized to 0, and loop as long as the variable is less than the node list's length (obtained by nodeList.getLength() if the node list variable is called nodeList). Increment the loop variable by one after each iteration. You will use the loop variable as the index into the node list.

Hint:

for (int i = 0; i < nodeList.getLength(); i++) {

}

Task 6 - print stuff out

Use the loop variable inside the for loop to print the current child node's name, if it is a node of type ELEMENT_NODE.

To check the node type, you should use nodeList.item(i).getNodeType() == Node.ELEMENT_NODE as the condition for the if statement.

To get the node name, you should use nodeList.item(i).getNodeName().

Hint:

      for (int i = 0; i < nodeList.getLength(); i++) {
        if (nodeList.item(i).getNodeType() == Node.ELEMENT_NODE) {
          System.out.println(nodeList.item(i).getNodeName());
        }
      }

Verify that you get the following output:

user
user
user

What you have done now is iterate over all the children of the root element <users> of the document. For every node of type ELEMENT_NODE (that is, nodes representing XML elements, as opposed to the whitespace text nodes between them), you printed the name of the element, which was user.

Now, add a printout in the if statement to investigate how many children this user node has:

System.out.println(nodeList.item(i).getChildNodes().getLength());

It should print 7. Why?

You may think of it like this:

The first child (item(0)) is the whitespace following the <user> start tag, that is, the newline and indentation.

Its node type is text. If we are interested in the <name> element, we could get it as the second child (item(1)):

    <name>Kalle Anka</name>

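That element has one child, a text node, which is what we are after. The remaining children of <user> alternate in the same way: whitespace, <email>, whitespace, <username>, and finally the whitespace before </user>, making seven children in total.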
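To see the whitespace nodes for yourself, the sketch below (ChildNodes and describeChildren are hypothetical names) lists every child of a <user> element with its index, node type and node name. The XML is embedded as a string with the same indentation as users.xml, so the parser sees the same newlines and spaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodes {

  // Describes each child of parent as "index type name", one per line.
  static String describeChildren(Node parent) {
    StringBuilder sb = new StringBuilder();
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      String type = child.getNodeType() == Node.ELEMENT_NODE ? "element"
          : child.getNodeType() == Node.TEXT_NODE ? "text" : "other";
      sb.append(i).append(' ').append(type).append(' ')
        .append(child.getNodeName()).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<users>\n  <user>\n    <name>Kalle Anka</name>\n"
        + "    <email>donald@email.dt</email>\n"
        + "    <username>donaldd</username>\n  </user>\n</users>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    // item(0) of the root is whitespace, so the first <user> is item(1).
    Node user = doc.getDocumentElement().getChildNodes().item(1);
    System.out.print(describeChildren(user));
  }
}
```

Running it should print:

0 text #text
1 element name
2 text #text
3 element email
4 text #text
5 element username
6 text #text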

Why this meaningless exercise, you say? We are trying to give you a feel for where in the document we are during parsing. It might be surprising to learn that whitespace is treated as a node in its own right, but now you know.

As we'll soon see, this doesn't have to be a problem, since we can always find nodes by their names.
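Finding elements by name is done with Element.getElementsByTagName, which is used in the next task. Note that it searches all descendants, in document order, not only the direct children. A small sketch illustrating this, using a made-up <friend> element that does not exist in users.xml (TagNameSearch and namesIn are our own names):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagNameSearch {

  // Returns every <name> element below the root of xml, at any depth.
  static NodeList namesIn(String xml) throws Exception {
    Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    return root.getElementsByTagName("name");
  }

  public static void main(String[] args) throws Exception {
    // <friend> is hypothetical; it nests a second <name> one level deeper.
    NodeList names = namesIn("<user><name>Kalle Anka</name>"
        + "<friend><name>Arne Anka</name></friend></user>");
    System.out.println(names.getLength());              // 2
    System.out.println(names.item(0).getTextContent()); // Kalle Anka
    System.out.println(names.item(1).getTextContent()); // Arne Anka
  }
}
```

With the flat structure of users.xml this makes no difference, but in deeper documents item(0) is simply the first match in document order.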

Task 7 - Get the text contents from the data nodes of each user

Remove the printout from the if statement. Instead, still inside the if statement, declare a variable of type Element and assign it the current Node, cast to Element.

Now, use this element variable to find all of its descendant elements named "name". There is only one such element, but since XML allows several elements with the same name, the method for this, getElementsByTagName("name"), returns a node list. Since there's only one element named "name", we can be certain that the node we're looking for is the one at index 0.

We'll use this fact to get the text content of the <name> element as a String, by chaining some method calls together (assuming the element variable is called elem):

elem.getElementsByTagName("name").item(0).getTextContent();

Assign the result of the above expression to a String variable called name.

Do the same for the <email> and <username> elements (storing the results in variables called email and userName), and print the user data as a string to standard out.

Hint:

        Node node = nodeList.item(i); // the current child of <users>
        if (node.getNodeType() == Node.ELEMENT_NODE) {
          Element elem = (Element) node;
          String name = elem.getElementsByTagName("name").item(0).getTextContent();
          String email = elem.getElementsByTagName("email").item(0).getTextContent();
          String userName = elem.getElementsByTagName("username").item(0).getTextContent();
          System.out.println("name: " + name + " email: " + email + " userName: " + userName);
        }

There are several alternatives to the above syntax, including:

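          // long version: the value of the first child (the text node) of <name>
          // String name = elem.getElementsByTagName("name").item(0).getFirstChild().getNodeValue();

          // longer version: the same text node, reached via the child node list
          // String name = elem.getElementsByTagName("name").item(0).getChildNodes().item(0).getNodeValue();

          // prints how many <name> elements the user element contains (1 in users.xml)
          // System.out.println(elem.getElementsByTagName("name").getLength());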
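These alternatives give the same result for users.xml, but they are not equivalent in general. getTextContent() concatenates all text below the element (comments excluded), while getFirstChild().getNodeValue() returns only the value of the first child node. A small sketch showing the difference, using made-up XML with a comment inside <name> (TextAlternatives is our own name):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TextAlternatives {

  // Parses xml and returns its root element.
  static Element parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
  }

  public static void main(String[] args) throws Exception {
    // The comment splits the text of <name> into two separate text nodes.
    Element name = parse("<name>Kalle <!-- nickname: Donald -->Anka</name>");
    System.out.println(name.getTextContent());               // Kalle Anka
    System.out.println(name.getFirstChild().getNodeValue()); // "Kalle " (first text node only)
    System.out.println(name.getChildNodes().getLength());    // 3: text, comment, text
  }
}
```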

Task 8 - Run your application and verify that it works

Running the application should render output along the lines of:

name: Kalle Anka email: donald@email.dt userName: donaldd
name: Joakim von Anka email: scrooge@email.dt userName: onkelscrooge
name: Arne Anka email: arne@email.com userName: arneanka
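The chained expressions assume that every <user> contains all three elements. If one is missing, item(0) returns null and the chain fails with a NullPointerException. A sketch of a hypothetical helper, textOf, that returns null instead (the class and method names are our own):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextOfTag {

  // Text of the first descendant <tag> of elem, or null if there is none.
  static String textOf(Element elem, String tag) {
    Node first = elem.getElementsByTagName(tag).item(0); // null if absent
    return first == null ? null : first.getTextContent();
  }

  public static void main(String[] args) throws Exception {
    // This made-up user has no <email> element.
    String xml = "<user><name>Kalle Anka</name><username>donaldd</username></user>";
    Element elem = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    System.out.println("name: " + textOf(elem, "name")
        + " email: " + textOf(elem, "email")
        + " userName: " + textOf(elem, "username"));
    // prints: name: Kalle Anka email: null userName: donaldd
  }
}
```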

Example implementation of the parser:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import java.io.IOException;

public class Parser{
  public static void main(String[] args){
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document document = builder.parse("users.xml");
      NodeList nodeList = document.getDocumentElement().getChildNodes();
      for (int i = 0; i < nodeList.getLength(); i++) {
        /*
        if (nodeList.item(i).getNodeType() == Node.ELEMENT_NODE) {        
          System.out.println(nodeList.item(i).getNodeName());
        }
        */

        Node node = nodeList.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
          Element elem = (Element) node;
          String name     = elem.getElementsByTagName("name").item(0).getTextContent();
          String email    = elem.getElementsByTagName("email").item(0).getTextContent();
          String userName = elem.getElementsByTagName("username").item(0).getTextContent();
          System.out.println("name: " + name + " email: " + email + " userName: " + userName);
        }
      }
    }catch(SAXException|IOException|ParserConfigurationException e){
      System.err.println("Error parsing XML: " + e.getMessage());
    }    
  }
}
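The exercise prints the data, but as mentioned in the introduction, an application would usually turn each <user> into a Java object. A minimal sketch, assuming a hypothetical User class of our own; it parses from a string to stay self-contained (use newDocumentBuilder().parse("users.xml") to read the file instead). Asking the root for getElementsByTagName("user") is an alternative to the node-type check, since it returns only <user> elements and skips the whitespace nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UserParser {

  // Hypothetical value class holding the data of one <user> element.
  static final class User {
    final String name, email, userName;
    User(String name, String email, String userName) {
      this.name = name; this.email = email; this.userName = userName;
    }
    @Override public String toString() {
      return "name: " + name + " email: " + email + " userName: " + userName;
    }
  }

  // Parses XML with a <users> root into a list of User objects.
  static List<User> parseUsers(String xml) throws Exception {
    Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    List<User> users = new ArrayList<>();
    NodeList userNodes = root.getElementsByTagName("user"); // elements only
    for (int i = 0; i < userNodes.getLength(); i++) {
      Element elem = (Element) userNodes.item(i);
      users.add(new User(
          elem.getElementsByTagName("name").item(0).getTextContent(),
          elem.getElementsByTagName("email").item(0).getTextContent(),
          elem.getElementsByTagName("username").item(0).getTextContent()));
    }
    return users;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<users>\n"
        + "  <user><name>Kalle Anka</name><email>donald@email.dt</email>"
        + "<username>donaldd</username></user>\n"
        + "  <user><name>Arne Anka</name><email>arne@email.com</email>"
        + "<username>arneanka</username></user>\n"
        + "</users>";
    for (User u : parseUsers(xml)) {
      System.out.println(u);
    }
  }
}
```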

Links

Source code

External links

Navigation

Up next: Creating XML from Java!
