Exposing data over http FAQ

From Juneday education

This page gathers frequently asked questions regarding the Exposing data over http assignments (the assignment consists of three sub-assignments). The page serves as a resource for students, teachers and supervisors and tries to explain some of the most misunderstood terms and concepts from the assignments.

We have a lecture going through some of the topics here (Exposing data lab Terms and concepts.pdf - videos are coming!) too, but this page serves as the text version of that lecture.

Video lecture

A shorter recap of this page is available as a video lecture:

Web, HTML, HTTP

These closely related concepts are a bit confusing (just the fact that the acronyms look kind of similar sometimes) but it is important that you get some basic understanding of how they relate to each other and what the differences are. This section tries to create a clearer picture for you.

HTTP and Web server

A web server (like any server program) is a program which runs continuously. It is started with the hope that it won't crash or need to be restarted. A server offers some kind of service to other programs, so it should be "open for business 24/7".

The purpose of a web server is to offer files (or data) to many clients (clients here being other programs requesting files or data from the server). You can think of a server as a restaurant; the restaurant "serves" food to many "clients". The "client" requests some food, and the response of the restaurant is to serve the food.

A client to a web server is typically a web browser (Chrome, Firefox, Opera, Dillo, Lynx, w3m, Safari, Edge or Internet Exploder - sorry, we meant Explorer). However, as we will learn during the assignments, a client can be any program that knows how to request a resource from a web server.

So, how does the communication between a client and a web server take place? First of all, it takes place using the network (typically, the internet), and the rules are formalized in a so-called "protocol" called HTTP. You may recognize "HTTP", since that is often the first part of the "URL" (address) you see in your web browser's address field.

HTTP stands for HyperText Transfer Protocol, and specifies how a client should ask for some resource from a web server, e.g. a web page written in HTML markup, or some other file. It also specifies how the server's response to the request must look.

The data which is sent from the web server (if the request could be fulfilled), typically comes from a file on the same machine as the web server is running, or from a program which can create the data on-the-fly. Such data must be formatted in a standardized way, in order for the client program to be able to interpret and make sense of the data.

In the assignment, we use one such format called JSON, which is a standard for representing data about anything, in a format according to the syntax rules for JSON which are specified and openly available online.

The client can save such JSON data in a file on the computer where the client is running (perhaps your computer). If so, it is like fetching any file from the web server (a zip file, an image, a text document etc). But usually, JSON isn't used in this way. Rather, JSON is used simply as a carrier of data which is meant to be interpreted (parsed) by the client, so that the client can use the data for some purpose (perhaps visualize it, or use it for some calculations). In these cases, the client doesn't have to save the JSON in a file, but can use it directly for its application. This is very common in Android apps, like for instance the travel planner for the public transportation companies. Such a travel planner app fetches JSON with the data corresponding to the end user's destination and starting point and the route and times. The travel planner app can interpret (parse) the JSON and use it to present a route and travel plan for the end user.

The HTTP protocol also specifies information about the data the server sends in its response to a request. If, for instance, your web browser fetches the file located at the URL https://www.gu.se/digitalAssets/1314/1314204_google.html, it means that the server should send a file called 1314204_google.html to the client.

In that file (which is a text file) there's code in the HTML "markup language" (HTML actually stands for Hyper Text Markup Language). The HTML "language" is used primarily for web browsers. Code in HTML describes the structure and contents of a document (a web page) and what links and images the document contains. The aim of such a document is to ultimately be presented to a human being. But we humans are not good at reading code, so we use a web browser for that and we let the web browser "parse" (interpret) the code and present a pretty page with structure, images and links and all, to us. We are used to web pages and the way they look, and the browser does the job of presenting them for us, using the HTML code.

But, since a web browser can be used to fetch a variety of file formats, how does it know that it got HTML back from the server? Using the URL above, how can your browser know that the file from gu.se's web server contains HTML markup? You might guess that it looks at the file suffix .html but that is not the case. Any file with any name can contain HTML. And data sent from a web server doesn't even have to come from a file! It might come from some application that simply writes the HTML code as a stream of data to be sent by the web server to its client.

So, no, it has nothing to do with the file suffix. The answer is that HTTP stipulates that a response from a server to a client should include metadata about what content type the response has. Such metadata is sent as part of the response, in the first lines of the response, called the "HTTP headers". From this we learn that HTTP says that a response from a server to a client has two parts: the headers with metadata, and the actual data. This is lucky for us, because it allows our web browsers (and other clients) to know how to parse and present data from a web server.

Here's an example of what the headers of the response from gu.se's web server could look like:

200 OK
Connection: close
Date: Fri, 16 Feb 2018 14:44:28 GMT
Via: 1.1 varnish (Varnish/5.2)
Accept-Ranges: bytes
Age: 245
ETag: W/"f4b1c-756-48ea0c9cfbac9"
Server: nginx/1.12.2
Vary: Accept-Encoding
Content-Length: 1878
Content-Type: text/html; charset=UTF-8
Last-Modified: Wed, 25 Aug 2010 07:25:38 GMT
Client-Date: Fri, 16 Feb 2018 14:44:28 GMT
Client-Peer: 130.241.151.114:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: /C=NL/ST=Noord-Holland/L=Amsterdam/O=TERENA/CN=TERENA SSL CA 3
Client-SSL-Cert-Subject: /C=SE/L=G\xC3\xB6teborg/O=G\xC3\xB6teborgs universitet/CN=www.gu.se
Client-SSL-Cipher: ECDHE-RSA-AES128-SHA256
Client-SSL-Socket-Class: IO::Socket::SSL
Grace: none
Title: Cookie-information, Google
X-Backend: tcsession: tcsession
X-Varnish: 5385451 4218782

Look at the line in the headers saying Content-Type: text/html; charset=UTF-8. That line is the one that tells your web browser that it should expect HTML in the content of the response, so that your browser can present it to you as a nice looking web page. The example file is neither nice-looking nor advanced, but feel free to look at the code for the web page and compare that to what your browser is showing you! The source code of the web page (most browsers have a menu or right-click option for displaying that) is exactly what the web server at gu.se sent you. The page you see in your web browser is that code parsed and rendered as a web page (or web document).
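To make the two-part structure concrete, here is a minimal Java sketch which splits the text of a raw response into headers and data, and looks up the content type. It assumes the usual HTTP/1.1 layout, where a status line comes first and a blank line separates the headers from the data. The class and method names are made up for this illustration; a real client (your browser, or Java's own HTTP classes) does this work for you.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HttpResponseSketch {

    /** Returns the headers (everything before the first blank line) as a map. */
    static Map<String, String> headers(String rawResponse) {
        Map<String, String> result = new LinkedHashMap<>();
        String headerPart = rawResponse.split("\r\n\r\n", 2)[0];
        String[] lines = headerPart.split("\r\n");
        // lines[0] is the status line, e.g. "HTTP/1.1 200 OK"; headers start at index 1.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                // Header names are case-insensitive, so we store them in lower case.
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                result.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        return result;
    }

    /** Returns the actual data: everything after the first blank line. */
    static String body(String rawResponse) {
        String[] parts = rawResponse.split("\r\n\r\n", 2);
        return parts.length == 2 ? parts[1] : "";
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\n"
                   + "Content-Type: text/html; charset=UTF-8\r\n"
                   + "Content-Length: 13\r\n"
                   + "\r\n"
                   + "<html></html>";
        System.out.println(headers(raw).get("content-type")); // text/html; charset=UTF-8
        System.out.println(body(raw));                        // <html></html>
    }
}
```

A client would look at the value for content-type before deciding whether to render the data as HTML, show it as an image, or parse it as JSON.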

Going back to the assignments, when we request a URL from the web API (the servlet running inside Winstone), the web server part of Winstone is sending the equivalent header row about content type, but the type is specified as application/json instead. This is to make sure that no web browser tries to interpret the data as HTML (which wouldn't work, since JSON looks nothing like HTML).

Winstone

About Winstone, we think we could have been clearer about what that program is. Here's our attempt to fix the situation. Winstone is a Java application acting as a web server. The main task for Winstone thus is to run continuously and listen for clients requesting some resource using the HTTP protocol. When a client sends a request for a resource, e.g. a file, Winstone's task is to locate the file somewhere in its so-called "webroot" directory (or one of the sub-directories to the webroot directory). The request must contain a relative path to the file or resource, relative to the webroot directory (whatever that directory is called).

Let's take an example:

A client connects to Winstone and uses HTTP to request a file using this HTTP request:

GET /search.html

This means that Winstone should look in its webroot directory (which is specified when Winstone is started) for a file called search.html. The file name comes directly after the / (slash), so the file should be located directly in the webroot directory. The relative path was calculated by your browser by looking at the URL you entered in the browser's address field: http://localhost:8080/search.html .

So, the thing that comes after the address part of the URL (in this case after the port number 8080) is a slash. That slash is like an alias for "the webroot" in the file system known to the web server. You told Winstone what directory to use as webroot when you started it (or the script told Winstone if you started it using a script). The thing after the first slash is either a file in the webroot directory, or a relative path using sub-directories. There is one exception, however, and that is the WEB-INF directory in Winstone's webroot. That directory is special and not accessible for clients. It is kind of secret to them (or at least hidden and off-limits).

Let's say that you had a directory called images/ inside the webroot directory, and in that directory, there was a file called Henrik.jpg. Then you could have requested that file for viewing using your web browser. The URL you would have used would then have been http://localhost:8080/images/Henrik.jpg. Your browser would have translated that into an HTTP request containing: GET /images/Henrik.jpg - that is, a relative path (from the webroot) to the image file. Your web browser would have got the file and displayed it to you (perhaps an image showing Henrik in some awkward pose). But, how the heck could your browser display an image from another computer (actually, your computer since you are using localhost, but to your browser that is the same, it doesn't know that)? The answer is that Winstone was reading the image file, bit by bit (piece-by-piece at least), sending the pieces to your browser. Your browser would have got first the headers and then the data, and the headers would include a line similar to Content-Type: image/jpeg and a line saying how many bytes the image data was. So your web browser would know that this isn't an HTML page, but a type of image. So the browser isn't only capable of parsing and rendering HTML, it also knows how to display a variety of image types. It knows how to read the bytes of e.g. a JPEG image and display it to the user.
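The translation from the path in a request to a file below the webroot can be sketched in a few lines of Java. This is not Winstone's actual code, only our illustration of the idea; the class name, the method name and the extra check for paths trying to climb out of the webroot are assumptions made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class WebrootSketch {

    /**
     * Maps the path from a request such as "GET /images/Henrik.jpg"
     * to a file below the webroot, or to empty if the path is off-limits.
     */
    static Optional<Path> resolve(Path webroot, String requestPath) {
        Path root = webroot.toAbsolutePath().normalize();
        // Drop the leading slash: it stands for the webroot itself.
        String relative = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        Path target = root.resolve(relative).normalize();
        // Refuse anything that escapes the webroot, e.g. "/../secret.txt".
        if (!target.startsWith(root)) {
            return Optional.empty();
        }
        // WEB-INF is special and hidden from clients.
        if (target.startsWith(root.resolve("WEB-INF"))) {
            return Optional.empty();
        }
        return Optional.of(target);
    }

    public static void main(String[] args) {
        Path webroot = Paths.get("webroot"); // hypothetical directory name
        System.out.println(resolve(webroot, "/images/Henrik.jpg").isPresent()); // true
        System.out.println(resolve(webroot, "/WEB-INF/web.xml").isPresent());   // false
    }
}
```

Note that the sketch only calculates where the file should be. A real server would then check that the file exists, read it, and send it together with suitable headers.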

So far, we've seen that Winstone acts pretty much like any web server. But what makes Winstone special compared to plain old web servers, is that we can configure it to interpret some relative paths as special paths that are NOT paths related to files in the webroot, but actually paths to some application capable of creating data. In Winstone's case, such applications are written in Java and called Servlets. In the assignment, we have configured Winstone such that paths starting with /search are not paths to files, but paths to a certain Servlet (included in the assignment source code). Whenever Winstone detects such a path (starting with /search), it packages the complete HTTP request from the client as a Java object (of class HttpServletRequest) and calls the Servlet's doGet method with it as one of the arguments. The other argument is of class HttpServletResponse. The Servlet will then investigate the request object to figure out what to do, and use the response object to write its response. The response from the Servlet will be picked up by Winstone which will forward it to the client using HTTP again, just as if it were Winstone doing all the work by itself. The client will never know that a Servlet was involved.

So, when would one want to use Servlets? When it comes to serving files (HTML, images etc), there is no need for a Servlet - any old web server can do that! It is when we need to send data that isn't stored on the file system, we need some help by some kind of programming, for instance a Java Servlet.

A common example of when we need some programming is a web site with a search function. Let's say you enter "Sport" into the search field of Google, or for that matter into the search field of gp.se (or some other newspaper web site), you will get a new page with links to all the pages containing the word "Sport". Our wiki has such a search function (try it out!).

The resulting web page (written in HTML) with all the links to pages matching the search phrase, can't already exist on the file system somewhere under webroot. This is because the people behind the site can't possibly know what words people will search for. People search for all kinds of weird stuff (just ask Google, and you'd be depressed hearing the answer).

Since we can't have lists of links to every possible search phrase, we need some kind of logic in an application for detecting the search phrase and creating an HTML page on the fly with links to matching pages. Such logic can be implemented in a Java Servlet (or some other similar technology such as PHP, JSP, ASP, ASP.NET etc).

So, how would a Servlet, then, know what the search phrase was? When you entered the search phrase into the search box you did so on a web page written in HTML (using your browser's user interface). Look at the source code of the file search.html included in the webroot directory of your assignment, for an example of what the code for a search function could look like.

The HTML code with the form for searching contains instructions for your browser for where to send the search phrase and what to call the search phrase in the request to send to the server.

A simplified example of such a form in an HTML page could look like this:

<html>
<body>
<form action="/search" method="GET">
Search: <input type="text" name="search_word" />
<input type="submit" value="Search!">
</form>
</body>
</html>

The central parts of the code above are:

  • action (here having the value /search)
  • method (GET)
  • name (search_word)

If you enter "sport" into the search box, and click the button [Search!], then your web browser will transform that into the following URL, which is sent to the same server that the page with the form came from:

/search?search_word=sport

[Figure: HTML page with a search form]

That URL contains a "parameter", a so called "GET parameter". A parameter is a key-value pair. In this case, the key comes from the HTML code in the form, search_word, and the value comes from the text you entered into the search box, sport. Your browser was kind enough to do the work. Browsers are indeed capable and versatile creatures.

The part of the URL that comes after the ? contains zero or more "GET parameters". When such parameters are part of a request (which is forwarded by e.g. Winstone to a Servlet), an application can get the values from the parameters. In our fictional scenario, let's say Winstone forwards the complete request to a search Servlet. Winstone has then packaged the full request from your browser into the HttpServletRequest parameter. Included in that object is now the parameter with the key search_word and the value sport.
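In a Servlet you never have to split the URL yourself, since the request object already holds the parameters. Still, it may help to see roughly what has to happen behind the scenes. The sketch below is our own illustration (the class and method names are invented), and it relies on two facts about URLs that are not covered above: several parameters are separated by & and special characters are encoded (a space, for instance, is sent as +).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringSketch {

    /** Splits e.g. "/search?search_word=sport" into its GET parameters. */
    static Map<String, String> getParameters(String pathAndQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        int questionMark = pathAndQuery.indexOf('?');
        if (questionMark < 0) {
            return params; // no '?' means zero parameters
        }
        String query = pathAndQuery.substring(questionMark + 1);
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            // Each pair looks like key=value; split on the first '=' only.
            String[] keyValue = pair.split("=", 2);
            String key = decode(keyValue[0]);
            String value = keyValue.length == 2 ? decode(keyValue[1]) : "";
            params.put(key, value);
        }
        return params;
    }

    private static String decode(String text) {
        // Requires Java 10 or later for the Charset overload.
        return URLDecoder.decode(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(getParameters("/search?search_word=sport")); // {search_word=sport}
    }
}
```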

The Servlet, being written in Java, can now get the String representing the value for the parameter named search_word, the value being the String "sport". After that, the Servlet could look in all files in the webroot (except WEB-INF which is secret) for the string "sport" and create HTML links for those pages, and accumulate a full HTML page to write as the response. If for instance it finds a match in the page /sport/sportnytt.html, it would include an HTML link, e.g. <a href="/sport/sportnytt.html">Latest news from the world of sports</a> in its response web page, and so on for each matching HTML file it finds. The resulting HTML String can then be sent by the Servlet using its response object. The Servlet must, however, remember to set the content type to text/html before sending the response. This is so that your web browser will understand that the result of clicking "Search!" is a new HTML page.
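The page-building part of such a search function could look something like the sketch below. To keep it self-contained, we pretend that the pages have already been read into a Map from relative path to page text, and we leave out the Servlet parts (reading the files, setting the content type and writing to the response object). All names are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class SearchPageSketch {

    /**
     * Builds an HTML page with one link per page whose text contains the search word.
     * The map goes from a relative path (e.g. "/sport/sportnytt.html") to the page text.
     * A real implementation would also have to escape special characters in the HTML.
     */
    static String resultPage(String searchWord, Map<String, String> pages) {
        StringBuilder html = new StringBuilder("<html><body><ul>");
        String needle = searchWord.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> page : pages.entrySet()) {
            if (page.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                html.append("<li><a href=\"").append(page.getKey()).append("\">")
                    .append(page.getKey()).append("</a></li>");
            }
        }
        return html.append("</ul></body></html>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("/sport/sportnytt.html", "Latest news from the world of sports");
        pages.put("/culture/index.html", "Theatre reviews");
        // Prints a page containing a link to /sport/sportnytt.html only.
        System.out.println(resultPage("sport", pages));
    }
}
```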

To you, looking at your web page, it will look as if the "Search" button was a link to a new page on the same server with the relative path /search?search_word=sport, but readers of this Wiki know that this URL doesn't refer to a file, but to a request for HTML that was created on-the-fly dynamically by a Servlet!

We have a chapter with a small example of a Servlet search function: Search Servlet .

Servlet

Technically, a Servlet is an object of a class implementing the interface javax.servlet.Servlet, often indirectly by extending javax.servlet.http.HttpServlet. You can view a Servlet as a Java program which can be called by a web server capable of running Servlets (Winstone being one example).

An HttpServlet (which the authors of this Wiki claim is the most common form of Servlet) is then a program which can "talk" (or handle) the HTTP protocol as if it were a web server. HTTP is used in a client-server architecture where the clients often are web browsers and the server often is an HTTP server (commonly referred to as a "web server").

There are several Java methods related to HTTP in an HttpServlet. The ones we've focused on in the assignments are void doGet(HttpServletRequest req, HttpServletResponse resp) and void init() .

void doGet(HttpServletRequest req, HttpServletResponse resp) gets its arguments from e.g. Winstone whenever Winstone decides that a request from a client should be handled by a Servlet (and not represent a file in the file system). The request parameter contains all information from the original request the client sent to Winstone. The response parameter is an object with methods for setting headers (content type etc) and writing the actual response data (using a Java OutputStream). The doGet method is the method responsible for handling a request with the method type GET in HTTP (which is the most common type of request). GET is used when you enter a URL in your browser and hit enter. We'll discuss other HTTP methods below.

The init() method is a method run only once, when the Servlet is first called by Winstone. For lack of a better word, it is like a one-off constructor. Almost. In the assignment, we used init() to change the locale of the Servlet to English (because Rikard had written crap code for the creation of JSON which failed miserably on Swedish computers - since Swedes use a comma as the decimal point in texts). It was also in init() that we figured out what kind of ProductLine should be returned by the ProductLineFactory (a topic for Lab3!). In short, the init() method read an element in the settings file web.xml and set the value of product line to the system properties, to be read later by the factory. This means that we can change one line of code in web.xml to change between fake product line and a product line that uses the database. We don't have to recompile the Servlet, because of the init() method which reads the XML file and sets a property. And we don't need to recompile the factory, because it reads said property and decides which type of product line to return to the Servlet. We do have to restart Winstone, however, after changing the web.xml.

Anyway, the init() method is used for stuff like this. Global settings which should be set once (the first time the Servlet is called) and then be in effect the rest of the Servlet's lifetime.
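The interplay between a one-off init step, a system property and a factory can be sketched without any Servlet at all. Everything in the example below is made up for illustration: the property key, the values "fake" and "database", and the tiny ProductLine classes are our assumptions, not the code from the assignment. The static init method only plays the role of a Servlet's init().

```java
import java.util.Locale;

public class InitSketch {

    interface ProductLine { String name(); }

    static class FakeProductLine implements ProductLine {
        public String name() { return "fake"; }
    }

    static class DatabaseProductLine implements ProductLine {
        public String name() { return "database"; }
    }

    // Hypothetical property key; the assignment's real key may differ.
    static final String KEY = "ProductLine";

    /** Plays the role of init(): run once, before any request is handled. */
    static void init(String valueFromSettingsFile) {
        Locale.setDefault(Locale.ENGLISH);              // decimal point instead of decimal comma
        System.setProperty(KEY, valueFromSettingsFile); // to be read later by the factory
    }

    /** Plays the role of the factory: it decides by reading the property. */
    static ProductLine getProductLine() {
        String choice = System.getProperty(KEY, "fake");
        return "database".equals(choice) ? new DatabaseProductLine() : new FakeProductLine();
    }

    public static void main(String[] args) {
        init("database");
        System.out.println(getProductLine().name());     // database
        System.out.println(String.format("%.2f", 12.5)); // 12.50
    }
}
```

Changing the argument to init from "database" to "fake" switches the product line without touching the factory, which is the same idea as changing one line in web.xml.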

An HttpServlet can also implement the method POST of the HTTP protocol (used for sending larger quantities of data without using the GET parameters). We don't use this in the assignment, but if we did, we would override the doPost() method we have inherited.

So, Servlets are used when we need to generate an HTTP response on-the-fly, dynamically, instead of simply having the web server fetch a file from its webroot. Services on the internet use Servlets or similar technologies, which allow users of a service to interact with the service via HTML pages. Those pages are the user interface (or front-end) which sends requests to a web server which then forwards the requests to e.g. a Servlet (or similar tech), and the Servlet talks to the web server which sends a response via HTTP just as always when web servers are involved. So, a page with a service for translating text from Swedish to English, for instance, could use a form in HTML for sending data to the web server, which forwards the text (as part of the whole request) to a Servlet. The Servlet could then do the translation from Swedish to English (using some clever Java code) in the doGet() method, and respond with HTML with the translated text.

The example with the translation Servlet is another example of the need for programming on the server-side. Clearly, we can't have static (already existing) web pages on the server with all possible translations, since we have no idea of what text the users want us to translate.

localhost:8080 - what's that all about?

You are calling your web service using the address localhost:8080 . That address, localhost, is used when one wants to connect to a server program on one's own computer, without actually using the network. You may think of localhost as if your browser goes to your network card as usual but bounces back to the computer immediately (without passing Go and without collecting $200). An alternative to using the address localhost, is to use the IP number 127.0.0.1 (also referring to your own computer - it is the number you get when you try to resolve the address named localhost).

When it comes to the 8080 part, that is a so called port number (see Introduction to network for a further explanation). A network card on a computer is usually contacted using a so called IP number (a.k.a. IP address), so that other computers can contact it. If the computer runs a server (a program listening for connections from other programs using the network card), that server program is using a port on top of the IP number on which it "listens". The combo of IP number and port allows us to have more than one server program on the same IP number.

Winstone is started listening to port 8080, which is a de facto standard for HTTP servers run by a normal user. The standard port for HTTP servers is 80, but on Unix-like systems, port numbers below 1024 can normally only be listened on by applications started by the super-user (or administrator) account.

You may wonder why you never have to add :80 to standard URLs for standard HTTP servers then? That's because it's so standardized to use port 80, that your browser takes care of that part for you, so you don't have to know about ports in order to browse the internet (until you are tasked with our assignments!). But you can actually add :80 to your web URLs. Try it!
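Java's URL class knows about the standard port too, which gives us a small way to mimic what the browser does. The sketch below (class and method names are our own) returns the port written in the URL if there is one, and otherwise falls back on the standard port for the protocol. No network connection is made.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class PortSketch {

    /** Returns the port a client would connect to for the given URL. */
    static int effectivePort(String address) {
        try {
            URL url = URI.create(address).toURL();
            // getPort() is -1 when the URL names no port; getDefaultPort()
            // then supplies the standard port of the protocol (80 for http).
            return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://localhost:8080/search.html")); // 8080
        System.out.println(effectivePort("http://www.example.com/"));           // 80
    }
}
```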

So, 8080 usually means "a web server started by a normal computer user". For a list of other standard ports, see Introduction to network.

The fact that you are using localhost:8080 doesn't mean that your web API isn't accessible via your network card from other computers. However, the wireless networks at universities and other schools are often restricted so that computers on the same network aren't allowed to access one another. That's for security reasons (probably). We did, however, have a network workshop (when the course where the assignment was originally presented was given), where you connected to the teachers' network without such restrictions. Those who attended that workshop could verify that they could access each other's web APIs (if they knew each other's IP addresses).

When you develop a server program or a network service, it is quite common to work with localhost before you are done. That allows you to use your own computer to access the server you are working on, which simplifies stuff. So you can start Winstone in one terminal and access it from another terminal, using wget, curl, nc or some other network capable program, in order to communicate with Winstone (and your web API).

JSON

JSON is a standard for text representing data. It is a very clever standard, allowing us to have hierarchical data structures represented as text, e.g. objects with lists of other objects. The basic JSON format is for objects which have key-value pairs of data. A key is the name of some part of the data object, and the value is the value of this part of the object. A value can, as mentioned earlier, be a list of JSON objects. Let's look at an example instead (proudly borrowed from Oracle's tutorial):

{
  "firstName": "Duke",
  "lastName": "Java",
  "age": 18,
  "streetAddress": "100 Internet Dr",
  "city": "JavaTown",
  "state": "JA",
  "postalCode": "12345",
  "phoneNumbers": [
    { "Mobile": "111-111-1111" },
    { "Home": "222-222-2222" }
  ]
}

The JSON text above represents one object of data, a person it seems. The person object has a first name, last name, age, street address, city, state, postal code and a list of phoneNumbers. The phone numbers are represented by a JSON array of two objects, each having a key and a value ("Mobile" and "Home" are the keys, and the values are various phone numbers as strings).

JSON allows us to transfer data from one system to another (or one application to another). Let's say that the first application reads data from the database and creates some objects from the database. In order to send the data to another application, the first application converts its objects to JSON and sends a string of JSON to the second application. Now, the second application can read the string with JSON and "parse" (convert) the JSON to some kind of objects. Perhaps, the second application presents the data to an end-user, or perhaps it does something else with the data.

JSON is a neutral way of sending data, in the sense that reading JSON from an application means that we don't care how that application created the JSON, only that it can send us JSON. We don't even know what programming language was used to create the application sending us JSON, only that it can send us JSON (if we ask politely for it).

And the sending application doesn't need to know how the application requesting JSON works. It always sends JSON text whenever some other application asks for it. So the JSON data is neutral - it can be used and created by various unrelated applications.

Another advantage of JSON is that it is only text, and verbose enough for humans to understand what the data is for etc. Humans can understand the data and make sense of it. We'll pass on a discussion of the difference between data and information here; ask your nearest informatician for a definition. All we are saying is that data encoded in JSON carries enough information in the format itself for most humans to get information out of the data.

Parsing

To "parse" usually means to read data in a known format, check that the data really complies with the rules for that format and if so represent the data in some programming language using the datatypes of that language.

Looking at the JSON example above, we could write a Java program which reads the JSON text and checks that it is valid JSON. Then, if it were valid JSON, we could go on and create a Java object and use data which we extract from the JSON text. For instance, in Java, we could create a class Person with instance variables for name, address etc, and a List<Phone> for the phone numbers. Phone would of course also be a Java class written by us. This way, we would "parse" the JSON text into a Java object of class Person. The object would have Java types for its instance variables (String name, String address, List<Phone> phones, etc). Only after parsing the JSON can we use the data in our Java program, since Java doesn't have a datatype for JSON which we could use to represent a person. We want to use our own Person class, and in order to create a Person object we need data in Java types (String, Phone, List etc etc).

What we achieved when we parsed the JSON to a Person object, was to allow us to use data from the world outside of our program, in our program, and represent that data using Java data types (classes like Person, Phone and String). That's parsing for you.

Another case of parsing is something you already have used, when you had a String representing a price, but wanted a double value. In the class java.lang.Double, you'll find a static method double parseDouble(String) which accepts a String and returns a double (if the String was possible to convert to a double value!). If the method cannot convert the String to a double (because the String for instance had the value "henrik"), then a NumberFormatException will be thrown. As the method name indicates, the method "parses" the String and converts it to a primitive double value. Similar methods exist in the classes Integer, Byte, Short, Long, Boolean, Float etc (the so called wrapper classes).
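A small sketch of how a program could guard itself against strings that can't be parsed follows below. The class and method names are invented for the example. Note that a price written the Swedish way, with a decimal comma, is also rejected by parseDouble.

```java
import java.util.OptionalDouble;

public class PriceParsingSketch {

    /** Parses a price, or returns empty if the text isn't a valid double. */
    static OptionalDouble parsePrice(String text) {
        if (text == null) {
            return OptionalDouble.empty(); // parseDouble(null) would throw NullPointerException
        }
        try {
            return OptionalDouble.of(Double.parseDouble(text));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("12.50").isPresent());  // true
        System.out.println(parsePrice("henrik").isPresent()); // false
        System.out.println(parsePrice("12,50").isPresent());  // false
    }
}
```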

In the assignment, the Servlet is parsing the GET parameters of the request. Why is this also called parsing? Well, the parameters are simply strings of text from the outside world, which we need to interpret and convert to something of a Java class type, like a Predicate<Product>. The string of GET parameters represents the request for some products with certain properties (min_price and max_price, for instance). So the Servlet asks a parser to interpret the parameters and create a Predicate from them. The Servlet needs such a Predicate in order to call the getProductsFilteredBy(Predicate) method in the product line object. Only, the Servlet doesn't want to do the parsing by itself, so it makes use of an object which is an expert in doing exactly that.
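Here is one way such a parser could work, as a minimal sketch and not the parser from the assignment. The Product class is a simplified stand-in with only a name and a price, the parameters are assumed to arrive as a Map of strings, and treating the limits as inclusive is our own choice.

```java
import java.util.Map;
import java.util.function.Predicate;

public class PredicateParserSketch {

    /** Hypothetical, simplified stand-in for the assignment's Product class. */
    static class Product {
        final String name;
        final double price;
        Product(String name, double price) {
            this.name = name;
            this.price = price;
        }
    }

    /** Converts GET parameters such as min_price=10 into a Predicate. */
    static Predicate<Product> parse(Map<String, String> params) {
        Predicate<Product> predicate = p -> true; // no parameters: accept every product
        if (params.containsKey("min_price")) {
            double min = Double.parseDouble(params.get("min_price"));
            predicate = predicate.and(p -> p.price >= min);
        }
        if (params.containsKey("max_price")) {
            double max = Double.parseDouble(params.get("max_price"));
            predicate = predicate.and(p -> p.price <= max);
        }
        return predicate;
    }

    public static void main(String[] args) {
        Predicate<Product> filter = parse(Map.of("min_price", "10", "max_price", "20"));
        System.out.println(filter.test(new Product("Pilsner", 15.0))); // true
        System.out.println(filter.test(new Product("Whisky", 400.0))); // false
    }
}
```

The resulting Predicate is the kind of object that could be handed to a method like getProductsFilteredBy(Predicate).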

When the Servlet gets the list of filtered products, it converts those products to JSON and writes that back in its response. Note that going from Java (a List<Product> for instance), to JSON, is not called parsing. Parsing works in the other direction - from one format to a Java type. To go from a Java type to some external format could perhaps be called "exporting", "converting" or even "serializing".
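Going in this direction, from Java objects to JSON text, can be sketched by hand as below. The Product class and its two fields are simplified stand-ins of our own, and the escaping only handles quotes and backslashes, so for real use a JSON library would be a safer choice. Note the explicit Locale.ENGLISH, which makes sure the price gets a decimal point regardless of the computer's language settings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class JsonExportSketch {

    /** Hypothetical, simplified stand-in for the assignment's Product class. */
    static class Product {
        final String name;
        final double price;
        Product(String name, double price) {
            this.name = name;
            this.price = price;
        }
    }

    /** Converts one product to a JSON object. */
    static String productToJson(Product p) {
        return String.format(Locale.ENGLISH,
                "{\"name\": \"%s\", \"price\": %.2f}", escape(p.name), p.price);
    }

    /** Converts a list of products to a JSON array of objects. */
    static String listToJson(List<Product> products) {
        return products.stream()
                .map(JsonExportSketch::productToJson)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    // Minimal escaping: backslash first, then double quote.
    private static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product("Pilsner", 12.5), new Product("Cider", 19.0));
        // [{"name": "Pilsner", "price": 12.50}, {"name": "Cider", "price": 19.00}]
        System.out.println(listToJson(products));
    }
}
```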

You will do some JSON parsing as part of Lab 3, when your Swing GUI fetches JSON from the Web API and converts that to a List<Product>, which can be displayed in the GUI.

That's a classic example of parsing!

Double.parseDouble vs. Double.valueOf

In the class Double we actually find two static methods able to parse a String, parseDouble and valueOf. Since they are parsing, they interpret something (in this case a string of characters) to something of a Java type. The former, parseDouble, converts (parses) the string to a double (the primitive datatype), and the latter, valueOf, parses the string to an object of class Double (representing the same value as the text does). And both of them will throw a NumberFormatException if the parsing fails.

You can use either of them in order to convert the string to a double (of any kind), because Java, since version 5, will allow a Double object where there should be a primitive double value, and the other way around. This automatic conversion between primitive value and wrapper class object is called boxing/unboxing (or autoboxing), for those who like academic lingo.
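The following short sketch (with names of our own choosing) shows the two methods being mixed freely, with the compiler doing the conversions for us.

```java
public class BoxingSketch {

    /** Adds two prices given as text, mixing both parsing methods on purpose. */
    static double total(String first, String second) {
        Double boxed = Double.valueOf(first);         // an object of class Double
        double primitive = Double.parseDouble(second); // a primitive double
        return boxed + primitive;                      // boxed is unboxed automatically
    }

    public static void main(String[] args) {
        double a = Double.valueOf("9.90");     // Double unboxed to double
        Double b = Double.parseDouble("9.90"); // double boxed to Double
        System.out.println(a == b);            // true: b is unboxed before comparing
        System.out.println(total("1.5", "2.5")); // 4.0
    }
}
```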

Systembolaget's XML file

When introducing the three sub-assignments, we showed you that we, the teachers, had fetched data from Systembolaget and that we didn't quite approve of the way this had to be done, so we wanted to create a Web API of our own, which would work better than Systembolaget's offering of data. The data from Systembolaget came in the form of one humongous XML file with all the 20 000 or so products. The file is updated every day (which is good) but we disliked the idea of having to fetch all or nothing in order to make an application using their data.

We don't disapprove of using XML as the neutral format for coding the data, but rather of the fact that we as developers have to read all the data into our application, every time. What if we want only a selection of products?

But we can't ask Systembolaget for such a selection. We have to fetch everything and make the selection ourselves. This led to the idea of having our dear students create a web API which we could ask for a selection of products according to some selection criteria. This is perhaps what Systembolaget should do, but we liked the idea of letting our students make something which is more useful than what the data owners themselves offer.

So, we came up with the exposing data assignments. We also chose to let you create JSON instead of XML, because JSON is becoming more common in web APIs than XML (even if many APIs offer both formats).

Also, the course where the assignments are given is about databases and JDBC too. That was a problem. So we quickly decided to hide the XML file from you and put the data from the XML file into a database. So the assignment source code and files include a full database which you will use in Lab 3.

However, it seems (judging by the questions and reports we've got) that our telling you about the XML file made some of you confused. So we'll now repeat what we said briefly in the introduction - Forget about the XML file! We have already parsed the XML file for you and put the same data in a database for you.

Now, we could have kept the XML file and used that instead of a database and still made a web API. But that wouldn't fit in well with a course that also includes databases, SQL and JDBC, and that's why we took the XML file out of the equation. We apologize for confusing you by mentioning the XML file. It isn't really that important or interesting. Not in terms of solving or understanding the three assignments, anyway.

The use of Java interface in the lab

In the labs, there is a lot of use of Java interfaces. Both interfaces written by the teachers and included in the code you got from them, and interfaces from the standard Java API.

The first interface you ran into was in the Servlet for the Web API, which needed a ProductLine to ask for products satisfying the predicate created from the GET parameters. ProductLine is an interface declaring two abstract methods, getAllProducts() and getProductsFilteredBy(Predicate).

You can refresh your knowledge about Java interfaces by returning to the following chapters:

The reason for using an interface for ProductLine is to be able to hide the actual implementation class used for the two methods. By letting the Servlet have a variable of the interface type ProductLine, we can swap one implementing class for another at a later time, without having to change one line of code in the Servlet. This is because interfaces merely declare abstract methods (or default methods since Java 8) and cannot be instantiated (you cannot create new instances of the interface type, only of concrete implementing classes).

So the interface is only there to declare the two methods that classes must implement in order to qualify as a type of ProductLine. You wrote the class FakeProductLine, which is a class claiming to be a type of ProductLine, because the class was declared as:

public class FakeProductLine implements ProductLine

When a class SomeClass is declared to implement SomeInterface, we know since the introductory course that

  1. we can use SomeInterface as a type for a reference variable referring to an object of the class SomeClass
  2. all concrete classes implementing SomeInterface have all the methods declared in SomeInterface implemented and ready for us to call

Applied to the case of ProductLine and the Servlet, the above means that the Servlet can have a reference variable of type "reference to ProductLine", regardless of the actual type of the object the variable refers to. As long as the object is of some class implementing ProductLine, the methods getAllProducts() and getProductsFilteredBy(Predicate) will exist and be implemented in the object. It also means that the Servlet, when we get to the last assignment, Lab3, can keep its code and logic intact and exactly the same, even if the variable will refer to a different kind of ProductLine at runtime (when the servlet is called by Winstone). In fact, in the last assignment the variable will not refer to an object of type FakeProductLine. It will instead refer to an object of type SQLBasedProductLine.

What we bought ourselves by using an interface (and a so-called factory) was flexibility for future changes, letting us use different kinds of ProductLine implementations (from a fake one to one that talks to the database).
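The idea can be sketched as follows. The Product class here is a hypothetical stand-in with only a name and a price, the exact method signatures of ProductLine are assumed from how the methods are used on this page, and InterfaceDemo and countCheaperThan are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the lab's Product class.
class Product {
    private final String name;
    private final double price;

    Product(String name, double price) {
        this.name = name;
        this.price = price;
    }

    String name() { return name; }
    double price() { return price; }
}

// The two methods named in the text; the exact signatures are assumed.
interface ProductLine {
    List<Product> getAllProducts();
    List<Product> getProductsFilteredBy(Predicate<Product> predicate);
}

// A fake implementation with two hard-coded products.
class FakeProductLine implements ProductLine {
    private final List<Product> products = Arrays.asList(
        new Product("Cheap", 10.0), new Product("Pricey", 500.0));

    @Override
    public List<Product> getAllProducts() {
        return new ArrayList<>(products);
    }

    @Override
    public List<Product> getProductsFilteredBy(Predicate<Product> predicate) {
        List<Product> result = new ArrayList<>();
        for (Product p : products) {
            if (predicate.test(p)) {
                result.add(p);
            }
        }
        return result;
    }
}

class InterfaceDemo {
    // Client code only knows the interface type, not the implementing class.
    static int countCheaperThan(ProductLine line, double limit) {
        return line.getProductsFilteredBy(p -> p.price() < limit).size();
    }

    public static void main(String[] args) {
        ProductLine line = new FakeProductLine(); // any implementation works here
        System.out.println(countCheaperThan(line, 100.0)); // 1
    }
}
```

The method countCheaperThan plays the role of the Servlet: it would work unchanged with any other class implementing ProductLine.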

We wanted to show you that programming against an interface protects our code from future changes. But how can we change the type of the object the ProductLine variable refers to, without changing and re-compiling the Servlet code? This change is handled dynamically by the ProductLineFactory class. The Servlet assigns its variable some kind of ProductLine which it gets from the factory. Well, OK, but how does the factory know what type of ProductLine to offer its clients (clients as in code calling a method in the factory)? We'll look at that in the third assignment (Lab3) but if you really want to know, you can check the source code.

We are using a design pattern called Factory. You can read about some versions of that pattern in our Wiki in the Design_patterns_-_Factory chapter (there are videos, lecture slides and exercises there for the curious and ambitious student).
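One way a factory could decide what to hand out is sketched below. This is not the lab's ProductLineFactory (check the source code for that). The class SketchProductLineFactory, the class EmptyProductLine and the system property name "productline.type" are all assumptions made for the example, and ProductLine is reduced to one method returning strings to keep the sketch short.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified: one method only, and plain strings instead of Product objects.
interface ProductLine {
    List<String> getAllProducts();
}

class FakeProductLine implements ProductLine {
    @Override
    public List<String> getAllProducts() {
        return Arrays.asList("Fake product");
    }
}

// Hypothetical second implementation, only here to have something to swap to.
class EmptyProductLine implements ProductLine {
    @Override
    public List<String> getAllProducts() {
        return Collections.emptyList();
    }
}

class SketchProductLineFactory {
    // Hypothetical property name, e.g. set with -Dproductline.type=empty
    static final String PROPERTY = "productline.type";

    // Callers get "some kind of ProductLine" and never see the concrete class.
    static ProductLine getProductLine() {
        String type = System.getProperty(PROPERTY, "fake");
        if ("empty".equals(type)) {
            return new EmptyProductLine();
        }
        return new FakeProductLine();
    }

    public static void main(String[] args) {
        ProductLine line = getProductLine();
        System.out.println(line.getAllProducts());
    }
}
```

With a design like this, the choice of implementation is made when the program is started, so the calling code does not need to be changed or recompiled.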

The use of Predicate in the lab

Predicate is an interface declaring only one single abstract method (such interfaces are called "functional interfaces" - they can be used as a function). Interfaces (both functional and other types) can declare something called "default methods". Such methods are actually not abstract but have implementations directly in the interface. You can revisit the chapters about interfaces (see links above) if you want to know more about interfaces in general.

The only abstract method in the functional interface Predicate is declared as:

public abstract boolean test(T t)

The T is a generic type parameter, since the interface was declared as

public interface Predicate<T>

This means that we can create predicates for a particular class, pretty much like we can create a List for a specific class, e.g. List<Product> . So we can have a Predicate<Product> . So, what's the use for Predicate, then? Well, we can have an object of some class implementing Predicate<Product> which then would represent a claim about products. Such a claim can be that "a product's price is less than 20.00". For such a claim we know that it either holds (is true) or that it doesn't hold (is false). We can use the method test(Product p) to test a single product against the claim. Let's say we call the method with a product as an argument. If the product really has a price less than 20.00, the method will return true, otherwise false. So thanks to Predicate<T>, we can encapsulate a claim about objects of type T in an object of type Predicate<T>. This is pretty clever and useful. It allows us to have objects for testing properties about other objects.
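As a minimal sketch, the claim "a product's price is less than 20.00" could be written like this. Product is again a hypothetical stand-in for the lab's class, and PredicateDemo and CHEAPER_THAN_20 are names made up for the example.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for the lab's Product class.
class Product {
    private final String name;
    private final double price;

    Product(String name, double price) {
        this.name = name;
        this.price = price;
    }

    String name() { return name; }
    double price() { return price; }
}

class PredicateDemo {
    // The claim "a product's price is less than 20.00" as an object.
    // The lambda provides the implementation of test(Product p).
    static final Predicate<Product> CHEAPER_THAN_20 = p -> p.price() < 20.00;

    public static void main(String[] args) {
        Product cheap = new Product("Cheap", 15.0);
        Product pricey = new Product("Pricey", 25.0);

        System.out.println(CHEAPER_THAN_20.test(cheap));  // true
        System.out.println(CHEAPER_THAN_20.test(pricey)); // false

        // Default methods such as negate() build new claims from old ones.
        System.out.println(CHEAPER_THAN_20.negate().test(pricey)); // true
    }
}
```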

The Servlet needs to convert the GET parameters to a Predicate<Product>, because it is going to call the getProductsFilteredBy(Predicate<Product>) method. We have separate lectures about that, see the lab pages for links.

In short, the Servlet delegates the delicate task of converting the GET parameters to a Predicate to an object of class ParameterParser. Read about that class below.

The ParameterParser class

A great deal of Java code consists of methods creating objects and calling methods on these objects (sometimes referred to as "sending messages to objects") and these methods, in turn, call other methods on other objects etc.

At first, this might look odd and hard to follow. The reason it looks like this is that we are dealing with an object-oriented language, Java. In object-oriented languages, objects play a central part, and they have responsibilities and abilities. If you didn't write methods which call other methods, you would end up with one single method (like a five-kilometer-long main method). So, think about it, the alternative to methods calling methods isn't very attractive.

We could have, for instance, let the Servlet be the one and only class in the Web API with the doGet method the one and only method. But reading, maintaining, changing, extending that one and only method would have been a nightmare (on Elm Street - and nobody likes Freddy's surprises).

The strategy (or design if you prefer that) we used was to not let doGet do all the work. Instead, we break down the work the Servlet should do when acting as a Web API into abstractions - one abstraction for each sub-task. So, what are the tasks for the Servlet?

  1. Interpret the GET parameters (which products does the client want?)
    1. Use a ParameterParser object for this! It should be able to convert the parameters to a Predicate<Product>
  2. From the interpretation of e.g. min_price=100&max_price=150, fetch all products that match the criteria
    1. Use a ProductLine and use the getProductsFilteredBy(Predicate)
  3. Format the product list as e.g. JSON
    1. Use a formatter object for this! It should be able to take a list of products and give a String back (with e.g. JSON)
  4. Write the JSON (or similar) to the response to the client

The class ParameterParser has a constructor accepting a String array with all the GET parameters. The object has several methods, the most central of which is filter(), which returns a Predicate<Product> object encapsulating the filter expressed by all the GET parameters combined.
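To give a feeling for what such a class could do, here is a simplified sketch. It is not the lab's ParameterParser: the name SketchParameterParser, the Product stand-in, the choice to treat min_price and max_price as inclusive limits and the choice to silently skip unknown or malformed parameters are all assumptions made for the example.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for the lab's Product class.
class Product {
    private final String name;
    private final double price;

    Product(String name, double price) {
        this.name = name;
        this.price = price;
    }

    String name() { return name; }
    double price() { return price; }
}

// Simplified sketch, only understands min_price and max_price.
class SketchParameterParser {
    // With no usable parameters, every product is accepted.
    private Predicate<Product> filter = p -> true;

    // params is e.g. {"min_price=100", "max_price=150"}
    SketchParameterParser(String[] params) {
        for (String param : params) {
            String[] keyValue = param.split("=", 2);
            if (keyValue.length != 2) {
                continue; // skip malformed parameters (assumed behaviour)
            }
            try {
                double value = Double.parseDouble(keyValue[1]);
                if (keyValue[0].equals("min_price")) {
                    filter = filter.and(p -> p.price() >= value);
                } else if (keyValue[0].equals("max_price")) {
                    filter = filter.and(p -> p.price() <= value);
                }
            } catch (NumberFormatException e) {
                // skip values that are not numbers (assumed behaviour)
            }
        }
    }

    // One predicate combining all the parameters that were understood.
    Predicate<Product> filter() {
        return filter;
    }
}
```

A caller could then write new SketchParameterParser("min_price=100&max_price=150".split("&")).filter() and get back a predicate that accepts a product priced at 120 but rejects one priced at 99.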

So, the Servlet uses an object of class ParameterParser in order to get a predicate to use for asking the ProductLine object for all products satisfying the predicate. This replaces some 45 lines of code! So, let objects do the work!

In short, the doGet() method can now be expressed with a few abstractions:

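    // Tell the client that the response body is JSON encoded as UTF-8
    response.setContentType("application/json;charset="+UTF_8.name());
    PrintWriter out =
      new PrintWriter(new OutputStreamWriter(response.getOutputStream(),
                                             UTF_8), true);
    
    // 1. Interpret the GET parameters
    // Note: getQueryString() returns null if the URL has no query string
    ParameterParser paramParser = new ParameterParser(request.getQueryString().split("&"));
    
    Predicate<Product> filter = paramParser.filter();
    
    // 2. Fetch all products that match the criteria
    ProductLine productLine = ProductLineFactory.getProductLine();
    
    List<Product> products = productLine.getProductsFilteredBy(filter);
    
    // 3. Format the product list as JSON
    Formatter formatter = FormatterFactory.getFormatter();
    String json = formatter.format(products);
    
    // 4. Write the JSON to the response to the client
    out.println(json);
    out.close();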

Category:FAQ