Warm-up with some Java surprises

From Juneday education

Meta information about this chapter


Ideas and sources

Some of the ideas for the topics brought up on this page come from various "Java Puzzles" (or "Java Puzzlers") available online and in books. Some sources:

Some examples on this page are heavily inspired by others, some are variations on popular themes, and some come from our experience of teaching Java and from common misconceptions we've spotted in various places. This page is a living document and might grow or get updated over time.

Introduction

In this chapter we'll try to get the reader started, and we'll show some things about Java which might come as a surprise to the reader.

Purpose

The purpose of this chapter is to get the reader thinking about Java and about some rules and syntax which the reader perhaps didn't know. It is not meant to be a very serious chapter with important-to-know facts about Java. We would rather start with some interesting aspects of the Java language which might be fun to know for anyone who wishes to learn more about the language.

Goal

Get the reader started and thinking about the way the Java compiler and runtime system work.

Instructions to the teacher

Common problems

Not everything brought up in this chapter qualifies as something every Java student must know in order to be successful. But some of the topics here are quite interesting, and on some occasions necessary to know in order to understand the Java environment fully. Some of the topics are often left out of introductory courses and literature, so we think they are worth addressing in this book, if for no other reason than to fill some of the gaps left by other sources.

If the students don't fully grasp everything in this chapter, explain to them that this is not a central chapter with a lot of important facts, but rather a "did you know this about Java?" kind of chapter. Things like the pool where String objects are recycled to save some space aren't super-important to know, but they can be used as the basis for a discussion about references, immutability and comparing reference variables using the == operator, as opposed to using the equals method.

We think it might be helpful to view the examples in this chapter as fuel for discussions with the students, rather than important facts that they must understand and learn by heart.

Hopefully they will think it is fun to see the surprises (if they get surprised, that is, it is of course possible that some of them might already know the things shown in the chapter!).

Videos

All videos in this chapter:

See below for individual links to the videos. (TODO)

Unicode and Java

Printing unicode

We'll start with a simple example of how to use Unicode escape sequences to print a non-alphabetic character on standard out:

public class Unicode{
    public static void main(String[] args){
	System.out.println("Warning: \u2622");
    }
}

Running this example should print "Warning: ☢" to your terminal (if the terminal supports Unicode). A Unicode escape sequence starts with a backslash, followed by a u, followed by the four hexadecimal digits of the character's number in the Unicode table (2622 for ☢). The "translation" happens at compile time: Java source code is written in Unicode, and the compiler replaces every escape sequence with the character it stands for before it even starts to break the source code into tokens. This means that we could actually write a complete program using only Unicode escape sequences! It is not something we recommend, but it proves the point that the compiler has no problem understanding Unicode escape sequences. Of course, all the normal syntax rules apply, so for instance the \u2622 sequence can only occur inside a String literal (or a char literal, or a comment), because ☢ has no meaning in the Java language. We cannot use ☢ as a variable name, for instance.
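To see that the compiler really translates escape sequences everywhere in the source, and not only inside strings, here is a small sketch (the class EscapeDemo and its methods are our own names, made up for this illustration). It declares a variable whose name is written as an escape sequence, and checks that the string "\u2622" holds a single character.

```java
public class EscapeDemo{
  // The escape sequence below is the letter a, so this method declares
  // an ordinary local variable named a and then uses it by its plain name.
  static int escapedName(){
    int \u0061 = 42;
    return a;
  }

  // The escape is replaced by the character itself at compile time,
  // so the string holds one char, not six.
  static int warningLength(){
    return "\u2622".length();
  }

  public static void main(String[] args){
    System.out.println(escapedName());                      // 42
    System.out.println(warningLength());                    // 1
    System.out.println((int) "\u2622".charAt(0) == 0x2622); // true
  }
}
```

Writing identifiers like this is only a demonstration of the rule, not a style we suggest.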

Since we could write a complete program using only unicode escape sequences, you could actually type the following in and save it as Hello.java:

\u0070\u0075\u0062\u006c\u0069\u0063\u0020\u0063\u006c\u0061\u0073\u0073\u0020\u0048\u0065\u006c\u006c\u006f\u007b
\u0020\u0020\u0070\u0075\u0062\u006c\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006f\u0069\u0064\u0020\u006d\u0061\u0069\u006e\u0028\u0053\u0074\u0072\u0069\u006e\u0067\u005b\u005d\u0020\u0061\u0072\u0067\u0073\u0029\u007b
\u0020\u0020\u0020\u0020\u0053\u0079\u0073\u0074\u0065\u006d\u002e\u006f\u0075\u0074\u002e\u0070\u0072\u0069\u006e\u0074\u006c\u006e\u0028\u0022\u0048\u0065\u006c\u006c\u006f\u0022\u0029\u003b
\u0020\u0020\u007d
\u007d

The above is the following program written in unicode escape sequences:

public class Hello{
  public static void main(String[] args){
    System.out.println("Hello");
  }
}

For instance, "public class Hello" is written like this:

\u0070\u0075\u0062\u006c\u0069\u0063\u0020\u0063\u006c\u0061\u0073\u0073\u0020\u0048\u0065\u006c\u006c\u006f

Now, remember that the same syntax rules must apply even if we use this strange coding of the characters in a Java source code file. Can you figure out why the following program won't compile?

public class DoesNotCompile{
    public static void main(String[] args){
	System.out.println("Unicode radiation warning: \u2622");
	System.out.println("This is unicode too: \u000a");
    }
}

Hint: The compiler's error messages are:

$ javac DoesNotCompile.java 
DoesNotCompile.java:4: error: unclosed string literal
	System.out.println("This is unicode too: \u000a");
	                   ^
DoesNotCompile.java:4: error: unclosed string literal
	System.out.println("This is unicode too: \u000a");
	                                               ^
2 errors


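The program doesn't compile, because \u000a is the Unicode escape sequence for the newline (line feed) character. Since the compiler translates the escape sequence into a real newline before it even looks for String literals, the String literal is broken in two, and you are not allowed to have a newline inside a String literal before you close it.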

That also explains the "unclosed string literal" error message. The line which produces the error could just as well have been written like this:

System.out.println("This is unicode too: 
");

You know that writing like that won't work. You cannot break an unfinished string literal with a newline!
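The same early translation explains a related surprise: an escaped newline can even end a // comment, so that code which looks commented out is actually compiled and run. The following sketch (CommentTrick is our own name for it) shows this. We include it only to explain the rule, not as something to imitate.

```java
public class CommentTrick{
  static int run(){
    int counter = 0;
    // This comment ends at the escaped newline: \u000a counter++;
    return counter;
  }

  public static void main(String[] args){
    System.out.println(run()); // prints 1, not 0
  }
}
```

After the translation, counter++; is on a line of its own, outside the comment, so the method returns 1.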

A unicode translator - just for fun

Just for fun, we'll share with you a small program which translates a file to unicode escape sequences. If you run the program and redirect the output from it to a file, you will have a working source code written only in unicode.

You need to have the actual Hello.java file (from above) in the same directory, so that the program can read it. If you want to try it, we propose the following workflow:

$ ls
Convert.java Hello.java
$ javac Convert.java && java Convert > Hello2.java
$ mkdir test
$ mv Hello2.java test/Hello.java
$ cd test
$ javac Hello.java && java Hello

The above runs Convert and saves the output in a file called Hello2.java (because we don't want to overwrite the original Hello.java). Next, we create a directory called test and move Hello2.java to test with the new name Hello.java. Then we cd to test, and compile and run Hello.java. If you open the new Hello.java in the test directory, you'll see that it consists only of Unicode escape sequences.

Here's the Convert.java program:

import java.io.*;

public class Convert{
    public static void main(String[] args) throws Exception{
	File f = new File("Hello.java");
	//File f = new File("Convert.java"); // of course we can convert also this file ;-)
	BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(f), "UTF-8"));
	String s=null;
	while( (s=in.readLine()) != null ){
	    for(char c : s.toCharArray()){
		System.out.print(String.format("\\u%04x", (int)c) );
	    }
            System.out.println();
	}
    }
}

Here's the Hello.java program again, which you need to have in the same directory as Convert.java:

public class Hello{
  public static void main(String[] args){
    System.out.println("Hello");
  }
}

Disclaimer: We do not encourage you to declare that the main method throws Exception (or that it throws anything). That's just sloppy. We did it only to save space so that you could focus on how the code works, and not on the try-catch which we would have had to use if we followed our own advice and didn't declare that main throws Exception.

This is what the program would have looked like if we stopped doing stupid things, and used try-catch instead:

import java.io.IOException;
import java.io.File;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.FileInputStream;

public class BetterConvert{
  public static void main(String[] args){
    File f = new File("Hello.java");
    //File f = new File("BetterConvert.java");
    // try-with-resources closes the reader (and the underlying file stream) when we are done
    try(BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(f), "UTF-8"))){
      String s=null;
      while( (s=in.readLine()) != null ){
        for(char c : s.toCharArray()){
          System.out.print(String.format("\\u%04x", (int)c) );
        }
        System.out.println(); // Optional - to not have one long line for the whole program
      }
    }catch(IOException ioe){
      System.err.println("An IO exception occurred: " + ioe.getMessage());
    }catch(Exception e){
      System.err.println("An exception occurred: " + e);
    }
  }
}

Some notes on the System.out.println() statement: it prints a newline character after every source code line. This is optional for a program like Hello.java, which would work just as well without any newline characters. That's what the separators are for in the Java language. Any amount of white-space is allowed between statements and declarations (including no white-space). The exception is a // comment, which ends only at a line break, so a program containing such comments (like Convert.java itself) needs its newlines.

Java knows that a semicolon ; ends a statement, and that curly braces delimit a block of code. This is why it works to write a whole program on one single line. We do not recommend that coding style, however. But if you decide to write the whole program as Unicode escape sequences, we don't think doing so on one single line makes it much worse.
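To convince yourself that the conversion loses nothing, here is a small sketch (RoundTrip, escape and unescape are our own names) that converts a line to escape sequences, the same way Convert does, and back again. The unescape method is only a simplified imitation of the compiler's first step: it assumes that the input consists only of complete six-character escape sequences, and it ignores details of the real rules, such as a backslash preceded by an odd number of backslashes not starting an escape.

```java
public class RoundTrip{
  // Converts every char to a backslash, a u and four hex digits, like Convert does
  static String escape(String text){
    StringBuilder sb = new StringBuilder();
    for(char c : text.toCharArray()){
      sb.append(String.format("\\u%04x", (int)c));
    }
    return sb.toString();
  }

  // Simplified decoder: assumes the input is made up only of complete
  // six-character escape sequences, and turns each one back into a char
  static String unescape(String escaped){
    StringBuilder sb = new StringBuilder();
    for(int i = 0; i + 6 <= escaped.length(); i += 6){
      String hex = escaped.substring(i + 2, i + 6); // skip the backslash and the u
      sb.append((char) Integer.parseInt(hex, 16));
    }
    return sb.toString();
  }

  public static void main(String[] args){
    String line = "public class Hello{";
    String escaped = escape(line);
    System.out.println(escaped);
    System.out.println(unescape(escaped).equals(line)); // true
  }
}
```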

A strange loop?

Can you figure out (predict, not compile and run!) what the following program prints to the screen?

public class StrangeLoop{
    public static void main(String[] args){
	int j=0;
	for(int i = 0; i<100; i++){
	    j = j++;
	}
	// What will it print? Why?
	System.out.println("j: " + j);
    }
}

If you answered "0", good! Now, can you explain why j is still zero after doing one hundred assignments in the loop?

Let's break this mystery down. If j is assigned j++ a hundred times and still ends up at zero, then it must have been assigned zero every time. So, why is j assigned zero in the loop?

At the start of the main method, j is assigned zero. Then there is an assignment in the loop whose right-hand side evaluates to zero every time. This means that j++ evaluates to zero. Well, that might not be so surprising the first time, because we know that the postfix increment evaluates to the current value of the operand, and only then does the side effect of the increment take place.

But what really happens is this: j++ evaluates to zero (the original value of j). Then the increment takes place, and j becomes one. But this happens before the assignment is performed, and the assignment stores the value of the right-hand side, which we concluded was zero.

You could think of it like this:

  1. j is initialized to zero
  2. the right-hand side, j++, is evaluated to zero
  3. before the assignment is executed, j is incremented to one
  4. finally the actual assignment of zero to j is executed
  5. after the statement, j has been assigned zero again
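The steps above can be sketched in code by writing out the hidden temporary value explicitly. The class and method names below are our own, and expanded() is only a model of what j = j++ does, not literally the bytecode the compiler produces.

```java
public class StrangeLoopExplained{
  // The original puzzle
  static int original(int iterations){
    int j = 0;
    for(int i = 0; i < iterations; i++){
      j = j++;
    }
    return j;
  }

  // The same steps, with the temporary value made visible
  static int expanded(int iterations){
    int j = 0;
    for(int i = 0; i < iterations; i++){
      int temp = j;   // j++ evaluates to the old value of j
      j = j + 1;      // side effect: j is incremented
      j = temp;       // the assignment stores the old value back
    }
    return j;
  }

  // What was probably intended
  static int intended(int iterations){
    int j = 0;
    for(int i = 0; i < iterations; i++){
      j++;
    }
    return j;
  }

  public static void main(String[] args){
    System.out.println(original(100) + " " + expanded(100) + " " + intended(100)); // 0 0 100
  }
}
```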

Calling a method using a null reference

We know that trying to use a reference which is null as if it were a reference to an object will create a NullPointerException, right? The null value means "not any object", so surely we cannot call a method using such a reference variable.

This means that the following program would crash with a NullPointerException:

public class NullPointer {
    public static void main(String[] args) {
	execute(null);
    }
    public static void execute(Test obj) {
	obj.print();
    }
}
class Test {
    public static void print() {
	System.out.println("hello");
    }
}

Only, it doesn't crash:

$ javac NullPointer.java && java NullPointer
hello

Can you explain why?


Had print() been an instance method, the runtime system would have thrown a NullPointerException. But a closer look at the Test class shows that print() is a static method (a so-called class method). We know that static methods exist regardless of instances, as they belong to the class and can be called directly via the class name.

For this reason, the null reference is never used to try to find an object. When execute(Test obj) is called with a null literal, obj is indeed assigned null. But that doesn't matter, because it is enough for the compiler to look at the declared type of obj, which is "reference to Test", in order to find the static print() method.

Some words of advice. If you thought a NullPointerException would be thrown, you are probably not alone. The reason is that at a first glance, we assume (incorrectly) that print() is an instance method, so it can't work with a null reference.

Why do we assume that print() is an instance method? Because it is called via a reference variable and not via the class name. A better way to write the code above would have been Test.print(). Since Test is a class name, we would instantly have seen that print() must be a static method. We would also instantly have seen that no NullPointerException could be thrown, since we are accessing print() directly via the class name.

Our advice is to never use a reference variable to call a static method, even if it works fine (even with null references!). The reason is simply that you will give the wrong impression to people reading your code, which is not a good thing (unless you try to write a trick-question for readers of a book or so).
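The decision about which static method to call is made from the declared type of the expression, not from the object it refers to at run time. A sketch with a hypothetical pair of classes, Animal and Dog (our own names), makes this visible: a static method in a subclass only hides the one in the superclass, it does not override it.

```java
public class StaticByType{
  static class Animal{
    static String describe(){ return "Animal"; }
  }

  static class Dog extends Animal{
    // Hides Animal.describe(); static methods are not overridden
    static String describe(){ return "Dog"; }
  }

  static String viaNullDog(){
    Dog d = null;
    return d.describe();   // no NullPointerException: resolved from the declared type Dog
  }

  static String viaAnimalVariable(){
    Animal a = new Dog();
    return a.describe();   // declared type is Animal, so Animal.describe() is called
  }

  public static void main(String[] args){
    System.out.println(viaNullDog());         // Dog
    System.out.println(viaAnimalVariable());  // Animal
  }
}
```

Written as Dog.describe() and Animal.describe(), neither call would surprise anyone, which is another argument for the advice above.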

Constant confusion?

We've seen in introductory literature and introductory courses that strings are just plain objects, and a String variable is of type "reference to object of class java.lang.String". We have also learnt that the fact that String is a class in the java.lang package is convenient, because classes in that package don't require an import statement.

We've also learnt that we can create a String object using the normal syntax with the operator new like in String name = new String("Some name"); but for convenience, there is also the possibility to create a String object like this: String name = "This is my name"; .

Furthermore, we've learned that when checking if two String objects represent the same sequence of characters (the same "text"), we should use the equals() method, which String has inherited from Object and overridden. The operator == always compares values in Java, and with reference variables the only value is the reference (which is like an address in the JVM's heap memory). Using == between two String variables is risky, because it only compares the reference addresses, which means it will evaluate to true only if the two references really refer to the exact same String object in the heap memory.

But then, if using == between two references (of any type) always and only checks memory address equality, why does the following evaluate to true?

String first  = "Abba";
String second = "Abba";
// The boolean variable is assigned the value true below:
boolean areTheSame = (first == second); // true!

We are not using the equals() method, as we've been told to. But still, comparing the two references using == evaluates to true, so the addresses must be the same. Why is that?

The reason is that String objects created via the double quote convenience construct (String literals) are recycled. If we create one String object like this, "Abba" (without the new operator), any subsequent creation of a String also using "Abba" (without the new operator) will evaluate to the same memory address as the first "Abba". Only one String will be created, and it will be reused.
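A small sketch (StringPoolDemo and its methods are our own names) contrasts literals with the new operator. It also uses String.intern(), a standard String method that returns the pooled copy of a string, which the text above does not mention.

```java
public class StringPoolDemo{
  static boolean literalsShared(){
    String first  = "Abba";
    String second = "Abba";
    return first == second;              // both refer to the pooled object
  }

  static boolean newCreatesCopy(){
    String first = "Abba";
    String third = new String("Abba");   // new always creates a fresh object
    return first != third && first.equals(third);
  }

  static boolean internFindsPooled(){
    String third = new String("Abba");
    return third.intern() == "Abba";     // intern() returns the pooled instance
  }

  public static void main(String[] args){
    System.out.println(literalsShared());     // true
    System.out.println(newCreatesCopy());     // true
    System.out.println(internFindsPooled());  // true
  }
}
```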

This sounds very dangerous! What if we have the following code, won't all references be affected by the change we try to do?

String popBand = "Abba";
String someOtherBand = "Abba";
popBand.toUpperCase();
// What about someOtherBand? It references the same object!

Do you know what the value of someOtherBand would be, if the above was compiled and executed?


The variable someOtherBand would still refer to the same object, and the text of that object would still be "Abba".

Do you know what the contents (the text) of the variable popBand would be after the call to toUpperCase()?


The variable popBand would still refer to the same object, and the text of that object would still be "Abba". No instance methods called on a String object can change the text the object represents!

The reason for this outrage(?) is that String objects are said to be "immutable". Immutable objects are objects whose state (whose fields or instance variables) cannot ever change once they are constructed. They get their initial state (values) when they are constructed (via parameters passed to the constructor), and they stay the same until the object dies (is garbage collected).

This is actually a good thing, because it means that it is perfectly safe to let two reference variables refer to the same object. None of the reference variables can be used to change the contents (the state, or the instance variables) of the object, so the object can safely be re-used and shared throughout your entire system.
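Since toUpperCase() cannot change the object, it returns a new String instead, and the only way to "change" popBand is to make the variable refer to that new object. A minimal sketch (ImmutableBands and demo() are our own names):

```java
public class ImmutableBands{
  // Returns the values of loud, popBand and someOtherBand after the calls below
  static String[] demo(){
    String popBand = "Abba";
    String someOtherBand = popBand;        // two variables, one object
    String loud = popBand.toUpperCase();   // toUpperCase() returns a NEW String
    popBand = popBand.toUpperCase();       // re-points popBand; the "Abba" object is untouched
    return new String[]{ loud, popBand, someOtherBand };
  }

  public static void main(String[] args){
    System.out.println(String.join(", ", demo())); // ABBA, ABBA, Abba
  }
}
```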

This thing, with putting objects in a pool for reuse, is not unique to String. The same actually applies to some of the wrapper classes for the primitive types! But with some limits. The following creates two Integer reference variables, but only one Integer object (which both variables will refer to). Integer, just like String, is immutable, so this is perfectly safe:

Integer hundred = 100;
Integer cien    = 100;
// Now, it is true that (hundred == cien) - they refer to the same object!

As with String references, this behavior is guaranteed unless the operator new is used for creating the Integer objects. But Integer objects are only recycled up to a certain limit:

Integer first = 128;
Integer last  = 128;
System.out.println( (first==last) ); // Prints "false" (with the default JVM settings)

If you think it is strange that we can assign an int literal to a variable of type "reference to instance of Integer", then you should look up "autoboxing" and "unboxing". When the compiler sees an assignment of an int literal to an Integer reference variable, it actually replaces the int literal with a call to Integer.valueOf(the_literal_int_value). It is actually the static method Integer.valueOf() that caches Integer objects for reuse.

Byte, Short, Integer and Long all cache values between -128 and 127. Character caches autoboxed values between 0 and 127 (\u0000 - \u007F). Boolean's only two values are cached too:

    Integer i1 = 127;
    Integer i2 = 127;
    Short s1 = 127;
    Short s2 = 127;
    Boolean b1 = false;
    Boolean b2 = false;
    Character c1 = 'a';
    Character c2 = 'a';
    System.out.println(i1==i2); // true
    System.out.println(s1==s2); // true
    System.out.println(b1==b2); // true
    System.out.println(c1==c2); // true
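Because the caching depends on the value, the safe way to compare two Integer objects is equals(). A sketch (BoxCache and its methods are our own names) that spells out the autoboxing as a call to Integer.valueOf():

```java
public class BoxCache{
  // Autoboxing: the compiler turns "Integer a = value;" into Integer.valueOf(value)
  static boolean sameObject(int value){
    Integer a = value;
    Integer b = Integer.valueOf(value);
    return a == b;        // compares references
  }

  static boolean sameValue(int value){
    Integer a = value;
    Integer b = Integer.valueOf(value);
    return a.equals(b);   // compares the wrapped int values
  }

  public static void main(String[] args){
    System.out.println(sameObject(127));  // true: inside the guaranteed cache range
    System.out.println(sameObject(128));  // false with the default JVM settings
    System.out.println(sameValue(128));   // true: equals() always compares values
  }
}
```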

Is this important to know? Not very, but it might be interesting to think about the consequences! For instance, if the wrapper classes Boolean, Short and Integer, as well as String, had not been immutable, it wouldn't have been possible to cache values like Java does. If they had been mutable, anyone with a reference to a compile-time constant String could have changed the text value of the String, and the cache would have been corrupted.

Point of no return?

You have probably heard that when execution of a method reaches a return statement, execution leaves the method and no code beyond that point can execute. You have probably even heard that you are not allowed to have a statement after a return statement, because the compiler will give you an error about unreachable statement. All this is true in a way, but there is actually one exception to this rule.

Challenge: Write a simple method public static int five() which simply returns the int value 5 each time it is called. It can have only one return statement, and it should return 5. But you should also make the method print "And now this!" each time the method is called, but the println statement is not allowed to occur before the return statement.

In other words, write the method so that it prints the message even though the println statement occurs below the only return statement.


This actually works, with the help of the try-finally construct: the return statement goes in the try block and the println statement goes in the finally block. Note that the finally block is executed before the method returns, but the value to be returned (5) is evaluated before the finally block runs. The return statement must know the value to return, and it is then put on hold, so to speak, until the finally block completes.

public class AfterReturn{
  public static int five(){
    try{
      return 5;
    }finally{
      System.out.println("And now this!");
    }
  }

  public static void main(String[] args){
    System.out.println(five());
  }
}

/*
Test-run:
$ javac AfterReturn.java && java AfterReturn
And now this!
5
*/

Now, the challenge question was a little tricky. You cannot, in fact, have a statement below the return statement in the same block as the return statement. The trick was to put return in a try block and the println in the finally block. Note that this also shows a use of try without catch. Normally, releasing resources should be done in this way, in the finally block. Since try-finally is allowed without a catch block, we can do cleanup in one place even if there are no exceptions to handle.

Think for instance of a situation where you have two return statements in an if-else construct. If you need to release some resource or do some cleanup, you would have to duplicate code in both the if-branch and the else-branch. In theory you can use the finally instead like this (pseudo code):

int someMethod(){
  openResource();
  if(conditionUsingResource()){
    closeResource();
    return STATUS_OK;
  }else{
    closeResource();
    return STATUS_ERROR;
  }
}

If the condition check depends on the resource being open, you must close the resource in both of the branches of the IF-ELSE. Using try-finally, the above now becomes:

int someMethod(){
  openResource(); // if opening fails, there is nothing to close
  try{
    if(conditionUsingResource()){
      return STATUS_OK;
    }else{
      return STATUS_ERROR;
    }
  }finally{
    closeResource(); // runs in both cases, before the method returns
  }
}

Note that the return value is calculated before the finally block executes; the JVM then lets the finally block finish before using the return value when transferring control to the invoker of the method.
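A small sketch (FinallyOrder and its methods are our own names) makes the order visible. Since the value is saved before the finally block runs, changing a local variable in finally does not change what is returned. If the saved value is a reference, however, the object it refers to can still be changed.

```java
public class FinallyOrder{
  static int savedPrimitive(){
    int x = 1;
    try{
      return x;                  // the value 1 is saved here
    }finally{
      x = 99;                    // changes the variable, not the saved return value
    }
  }

  static StringBuilder savedReference(){
    StringBuilder sb = new StringBuilder("before");
    try{
      return sb;                 // the reference is saved here...
    }finally{
      sb.append(" and after");   // ...but the object it refers to can still change
    }
  }

  public static void main(String[] args){
    System.out.println(savedPrimitive());   // 1
    System.out.println(savedReference());   // before and after
  }
}
```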

Of course, nowadays, one would use try-with-resources instead when guaranteeing that a resource is closed:

public class AfterReturnResources{
  public static final int OK    = 0;
  public static final int ERROR = 1;
  public static int check(){
    try(Resource r = new Resource()){
        if(r.isOpen()){
          return OK;
        }else{
          return ERROR;
        }
    }finally{
      System.out.println("This runs after close() but before return!");
    }
  }

  public static void main(String[] args){
    System.out.println(check()==OK?"Open works" : "Open failed");
  }
}
class Resource implements AutoCloseable{
  boolean isOpen;
  public Resource(){ isOpen = true; }
  public boolean isOpen(){ return isOpen; }
  @Override
  public void close(){
    isOpen=false;
    System.out.println("Closing up");
  }
}

The code uses a try-with-resources statement which creates a Resource instance that is used in the IF-statement. The IF-statement checks if the resource was opened. If it was, it returns OK, otherwise ERROR. Try-with-resources works only with classes that implement AutoCloseable, but implementing AutoCloseable is as easy as implementing a close() method (the only method in the AutoCloseable interface).

The logic of the check() method now becomes:

  1. Create a Resource and let r refer to it. This will be closed after the try block completes.
  2. Check if r.isOpen()
    1. Return OK if it is
    2. Return ERROR if it isn't
  3. Close r
  4. Execute the explicit finally-block
  5. Transfer control to main, where the call to check() gets the value OK or ERROR
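The order of the steps can be checked with a sketch that records each event in a list. OrderLog, LoggingResource and the log list are our own, hypothetical names; the resource plays the same role as Resource above.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderLog{
  static final List<String> log = new ArrayList<>();

  // A hypothetical resource that records when it is opened and closed
  static class LoggingResource implements AutoCloseable{
    LoggingResource(){ log.add("open"); }
    boolean isOpen(){ return true; }
    @Override
    public void close(){ log.add("close"); }
  }

  static int check(){
    try(LoggingResource r = new LoggingResource()){
      log.add("body");
      return r.isOpen() ? 0 : 1;   // the return value is decided here
    }finally{
      log.add("finally");          // runs after close()
    }
  }

  public static void main(String[] args){
    int result = check();
    log.add("returned " + result);
    System.out.println(log); // [open, body, close, finally, returned 0]
  }
}
```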

Static in the attic

What does the following small program print?

public class Strange{

  static String name  = readName();
  static String title = "Doctor";

  static String readName(){
    return title + " " + Strange.class.getName();
  }
  static void printName(){
    System.out.println(name);
  }

  public static void main(String[] args){
    printName();
  }
}


This is what the program prints:

$ javac Strange.java && java Strange 
null Strange

The reason is that the static variables are initialized in the order they occur in the source code:

  static String name  = readName();
  static String title = "Doctor";

  static String readName(){
    return title + " " + Strange.class.getName();
  }

Since readName() is called before title has been initialized, title is still null when it is used by the method. These things can of course lead to NullPointerExceptions if one is not careful.

The same applies to instance variables:

public class Carter{
  String name = readName();
  String title= "Agent";

  String readName(){
    return title + " " + this.getClass().getName();
  }
  public static void main(String[] args){
    System.out.println(new Carter().name);
  }
}

/*
Test code:
$ javac Carter.java && java Carter
null Carter
*/

Here, the order of the explicit initializations of the instance variables in the source code is important too, because before the body of the constructor is run, any explicit initializations of the instance variables are performed, again in the order they occur in the source code file.

It doesn't help to put the initialization of title inside the constructor (and where in the source code file you put the constructor doesn't matter). Using an instance initializer block helps, but only if it occurs before the explicit initializations:

public class Carter{
  {title = "Agent";}
  String name = readName();
  String title= "Agent";

  String readName(){
    return title + " " + this.getClass().getName();
  }
  public static void main(String[] args){
    System.out.println(new Carter().name);
  }
}
/*
$ javac Carter.java && java Carter
Agent Carter
*/
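For comparison, the constructor approach that doesn't help can be sketched like this (TooLate is our own class name). The field initializers run before the body of the constructor, so title is still null when name is initialized.

```java
public class TooLate{
  String name = readName();   // runs first, while title is still null
  String title;

  TooLate(){
    title = "Agent";          // the constructor body runs after all field initializers
  }

  String readName(){
    return title + " " + getClass().getSimpleName();
  }

  public static void main(String[] args){
    TooLate t = new TooLate();
    System.out.println(t.name);   // null TooLate
    System.out.println(t.title);  // Agent
  }
}
```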

The same principle applies to static initializers. They are executed in the order they occur:

public class Carter{

  static{ title = "Agent"; }

  static String name = getName();
  static String title= "Agent";

  static String getName(){
    return title + " " + Carter.class.getName();
  }


  public static void main(String[] args){
    System.out.println(name);
  }
}
/*
$ javac Carter.java && java Carter
Agent Carter
*/

What might look strange is the assignment of the title variable in the static block before the title variable is even declared. The rule is that in a static block, a variable can be assigned before it is declared, but not used as a value. This will not compile:

public class Carter{

  static{ title = "Agent"; name = title; } // using title (as a value) before it's declared

  static String name = getName();
  static String title= "Agent";

  static String getName(){
    return title + " " + Carter.class.getName();
  }
  
 
  public static void main(String[] args){
    System.out.println(name);
  }
}
/*
$ javac Carter.java && java Carter
Carter.java:18: error: illegal forward reference
  static{ title = "Agent"; name = title; }
                                  ^
1 error
*/

But what if we make title a final static (a constant) variable? Will this work?

public class Carter{

  static String name = getName();
  static final String title= "Agent";

  static String getName(){
    return title + " " + Carter.class.getName();
  }
  
  public static void main(String[] args){
    System.out.println(name);
  }
}
/* It works :)
$ javac Carter.java && java Carter
Agent Carter
*/

This works because title is now a compile-time constant: a final variable of type String (or of a primitive type) initialized with a constant expression. The compiler replaces uses of such a constant with the constant value itself, and static constants like this are initialized before any normal class variables when the class is initialized. So it no longer matters that name is declared, and uses the variable title (via getName()), textually before title is initialized in the source code. The same happens with instance variables that are final and initialized with a constant expression:

public class Carter{
  String name = readName();
  final String title= "Agent";

  String readName(){
    // title is a compile-time constant, so the compiler uses "Agent" directly here
    return title + " " + this.getClass().getName();
  }
  public static void main(String[] args){
    System.out.println(new Carter().name);
  }
}
/* It works :)
$ javac Carter.java && java Carter
Agent Carter
*/

Summary:

  • Variables are explicitly initialized (using assignment at the place of declaration) in the order they appear in the source code
  • Initializer blocks can be put before a declaration in order to assign a variable
  • Initializer blocks can't use a variable declared later as a value (i.e. a block that appears before the declaration cannot read the variable by its simple name)
  • Compile-time constants (final variables of a primitive type or String, initialized with a constant expression) are available before "normal" variables, so for them the declaration order doesn't matter
    • It is safe to use such a constant in a method, since its value is always available
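Note that final on its own is not enough: the rule only applies to compile-time constants. A final variable initialized with a method call is an ordinary variable as far as initialization order is concerned. The following sketch (the class NotAConstant and the field SUFFIX are our own, made up for this illustration) shows the difference.

```java
public class NotAConstant{
  static String name = readName();

  static final String TITLE  = "Agent";                   // constant expression: a compile-time constant
  static final String SUFFIX = String.valueOf("Carter");  // final, but initialized by a method call

  static String readName(){
    // TITLE is replaced by "Agent" at compile time; SUFFIX is read from the field, still null
    return TITLE + " " + SUFFIX;
  }

  public static void main(String[] args){
    System.out.println(name);  // Agent null
  }
}
```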

Abuse of variable names

Will a class written like this ever compile?

package _;
class _ {
  _ _;
  _(_ _) {
    this._ = _;
  }
  _ _() {
    return _;
  }
}


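Yes, it will actually, if it is placed in a directory named _ (an underscore) and compiled with Java 8 or earlier! Since Java 9, a single underscore is a keyword and can no longer be used as an identifier, so newer compilers reject this class.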

Here's an example showing that it works:

$ cat _/_.java 
package _;
class _ {
  _ _;
  _(_ _) {
    this._ = _;
  }
  _ _() {
    return _;
  }
}
/*
Compile:
$ javac -Xlint:none _/_.java
$

(-Xlint:none disables all warnings)
*/

But don't ever do that. Underscore is not a good name for

  • a directory
  • a package
  • a class
  • a method
  • a variable

;-)

The class above is syntactically and semantically equivalent to:

package a;
class a{
  a a;
  a(a a){
    this.a = a;
  }
  a a(){
    return a;
  }
}
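This compiles because Java decides from the context whether a name refers to a type, a variable or a method, so the same identifier can be used for all of them at once. Here is a runnable sketch that nests the class a inside a class we've called Names (our own name) and uses it:

```java
public class Names{
  // Same shape as the class a above, nested so that the example fits in one file
  static class a{
    a a;                          // a field named a, of type a
    a(a a){ this.a = a; }         // a constructor with a parameter named a, of type a
    a a(){ return a; }            // a method named a, returning the field a
  }

  static boolean demo(){
    a first  = new a(null);
    a second = new a(first);
    // second.a() calls the method, second.a reads the field
    return second.a() == first && second.a == first;
  }

  public static void main(String[] args){
    System.out.println(demo()); // true
  }
}
```

The same advice applies as for the underscore version: this only shows what the compiler accepts, not how to name things.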
