Java:Language - Streams and Lambdas

From Juneday education
Jump to: navigation, search

Introduction

This page introduces the Java 8 streams api and lambda expressions.

Stream of objects - a pipe dream

Let's say you have a collection and want to operate on each element. For this you can use a pipeline, or so called stream.

A pipeline is a sequence of elements and consists of

  • A source (an array, a collection etc)
  • Optionally, some intermediate operations (filtering, sorting, crunching)
    • produce a new stream/pipeline
  • A terminal operation which produces a normal result again (not a stream)
    • a primitive value (int, double etc)
    • a collection
    • even void (nothing, but with a side-effect, like println)

Streams (pipelines) are similar to shell (Bash) pipes (where std out is the stream typically). This is a well-known pattern called pipes-and-filters.

Example of pipeline (stream)

Perhaps things will become more clear with some examples. Let’s sort the arguments to a main method and then print all of them:

import java.util.*;

public class ArgSorter {
  public static void main(String[] args) {
    Arrays.stream(args) // create a stream (source)
      .sorted() // intermediate op, creates a new stream
      .forEach(System.out::println); // terminal operation
  }
}
# compare to pipes and filters in Bash:
$ cat file.txt | sort | cat        # actually "| cat" is not needed

In the example above, the terminal operation is a void method which is applied to all elements in the stream. First we get the stream of the args array (using Arrays.stream(args)), then we apply the filter sorted(), and finally we terminate the stream, using forEach() which accepts a void method accepting all elements as arguments.

A pipeline for filtering

We can also filter all elements in the args array of some main method, using the filter which is actually called filter() and which accepts a predicate for testing all elements passed through it in the stream:

import java.util.*;

public class ArgFilterer {
  public static void main(String[] args) {
    Arrays.stream(args) // create a stream (source)
      .filter(s -> s.startsWith("i")) // intermediate op
      .forEach(System.out::println); // terminal operation
  }
}
# Bash:
$ cat file | grep '^i' | cat   # actually "| cat" is not needed in bash

The filter() method accepts a java.util.function.Predicate<T> object. A predicate is an object which has the same function-type as an instance of a class implementing the java.util.function.Predicate<T> interface. It might be an object of a class which simply implements the Predicate interface (which declares only one abstract non-static method). But it can be any object which has only one method matching the only method in Predicate.

Lambda expressions

What you saw in the code above might look odd to you: s -> s.startsWith("i"). That is a lambda expression. A lambda is like an anonymous function with its syntax reduced to only its parameters and the return expression.

The method in Predicate<T> declares only one abstract method, which is boolean test(T t). So any method which takes a T as the only argument and returns a boolean value, will work as a Predicate<T>. Why? Because the contract is that interfaces with only one abstract method, so called functional interfaces, don't need to be implemented in order for us to use such a method as the only abstract method in the interface. A lambda expression is like expressing an instance of the method.

So, while filter() above is declared to accept a Predicate<T>, we know that the interesting method of the predicate is one that accepts a T parameter and returns a boolean value.

Method references

What about the terminal operation forEach(System.out::println)? That too doesn't look familiar, perhaps. In this case, the forEach() terminal accepts a Consumer<T>. The forEach() itself is a void method, so the terminal of the stream in our examples don't result in any values. All elements are consumed by the Consumer. What is a Consumer, then? Well, any method which is void and accepts a T as the only parameter will fit. We know about System.out.println already. We know that out is a static instance variable in System, and a reference to a particular PrintStream connected to "system out". Now, there exists an overloaded version of println() in PrintStream (and thus via the out reference) which consumes a String, which is the type of the elements in our stream!

So, we can actually pass a reference to exactly the println of the System.out reference, using a method reference: System.out::println. This is the consumer we pass to forEach.

Anatomy of a lambda

Let's dissect the lambda expression s -> s.startsWith("i"). How is it that this works in place of an object of a class implementing Predicate?

The method in Predicate we want to represent is, again, public boolean test(T t). The lambda starts with a parameter variable s. This will be the symbol of the paramter to the test() method. Then there's the weird arrow: ->. After the arrow, we have s.startsWith("i). This is the expression to return from the method. As you see, the s is used in the expression! But, startsWith() is a method in String? Yes, but as our Stream is a Stream of Strings, the predicate will be Predicate<String>, so the argument to test() will, too, be a String.

Here are some syntax pointers for the start of lambda expressions:

  • () ->
    • a method which takes no arguments
  • a ->
    • a method which takes a single argument
  • (a1, a2) ->
    • a method which takes two arguments

In place of the method body, we only put the expression of the return statement, for instance a -> a * 2. Here, a is an argument of type int and the return expression is the argument times two.

So, a -> a * 2 is a lambda which can be used to represent a method (function) which takes an int argument and returns the argument times two.

Longer lambdas - lambdas with bodys

What about methods which do something more than return a value? If you need more lines for the method body, you can use curly braces for the method block (body), and use statements as usual, terminated by semicolon. But then you need a normal return statement too:

SomeStringInterface ssi = a -> {
    if (a == null) {
      return "";
    } else {
      return a.toLowerCase();
    }
};

Example of lambdas for functions

Let’s say we have an interface defining only one method:

public interface CaseFixer {
  // returns the String with its first letter in upper case
  public String ucFirst(String s);
}

Now, we can use a lambda to represent an object which has such a method:

s -> Character.toUpperCase(s.charAt(0)) + s.substring(1);

Note, that the lambda represents an instance of for instance that interface. It might very well also be used to represent another functional interface with the same function-type! For instance, the lambda works also for instances of:

public interface StringStyler {
  // returns the String with its first letter in upper case
  public String capitalize(String s);
}

Why? Because both interfaces are functional interfaces (only one abstract method), and both interfaces' methods have the same structure - they return a String and accept a String as the only argument!

Here's an example showing how you could use a lambda to represent an instance of some class implementing the CaseFixer interface above:

public class Uc {
  public static void main(String[] args) {
    CaseFixer caseFixer =
      s -> Character.toUpperCase(s.charAt(0)) + s.substring(1);
    for (String arg : args) {
      System.out.println(fixIt(arg, caseFixer));
    }
  }
                                // will accept lambda here
  static String fixIt(String s, CaseFixer cf) {
    return cf.ucFirst(s);
  }
}

The method fixIt(String, CaseFixer) can be called with a String and a lambda, as long as the lambda works fine as a replacement for an object of a class implementing the CaseFixer interface (i.e., having the ucFirst(String) method, which is called inside fixIt(). The lambda we pass it, s -> Character.toUpperCase(s.charAt(0)) + s.substring(1); works fine as a symbol for an object with that method! Because the lambda also accepts a String argument, and returns a String. Who cares if the lambda represents a method (function) without stating the name of the method? The important thing here, is that the lambda will work fine as a stand-in for the ucFirst() method, which is called inside the fixIt() method of the example.

We don't need to declare the lambda as of type CaseFixer for this to work. This would also work just as fine:

System.out.println(
  fixIt(arg, s -> Character.toUpperCase(s.charAt(0)) + s.substring(1))
);

One single statement. That's how nice and neat lambdas are. They make our code shorter, more expressive and often clearer, once you get used to the syntax.

Lambdas for Comparators

Another functional interface that you already have seen and used (we guess), is the Comparator<T> interface. It declares only one abstract method: int compare(T t1, T t2) and you know the drill! It should take two arguments, compare them and return a negative number if the first argument is "less than" the second, zero if they are "equal" and a positive number if the first argument is "greater than" the second.

Let's look at a simplified class Book:

public class Book {
  private String author;
  private String title;
  private int year;

  public Book(String author, String title, int year) {
    this.author = author;
    this.title = title;
    this.year = year;
  }

  public String author() { return author; }
  public String title() {return title; }
  public int year() { return year; }
  public String toString() { return author + " " + title + " " + year; }
}

Just play along and pretend that that's a reasonable class (we've made it short and rather anemic for the sake of brevity and simplicity). Now, the Book class isn't Comparable<Book>, so a Book can't compare itself with another Book. So, if we want to e.g. sort a list of Book references, we need a Comparator<Book>. Here's a class for such a comparator:

import java.util.Comparator;

public class BookYearComparator implements Comparator<Book> {
  public int compare(Book first, Book second) {
    return first.year() - second.year();
  }
}

Let's put it to use!

  public static void main(String[] args) {
    List<Book> books = someBooks();
    System.out.println(books);
    Collections.sort(books, new BookYearComparator());
    System.out.println(books);
  }

Looks familiar? It should!

But, Comparator<T> is a functional interface, so we are allowed to use lambdas here! That's great, since we don't have to create the full class above, just to get access to one single method (compare()). We don't even have to create an anonymous local class with only the method. We can simply do this:

  public static void main(String[] args) {
    List<Book> books = someBooks();
    System.out.println(books);
    Collections.sort(books, (b1, b2) -> b1.year() - b2.year());
    System.out.println(books);
  }

That's all we need, in order to sort a list of books with regards to their year.

Method references revisited

But, what about method references? We see above that Book defines a method year(). That's really what we want to use for comparison, isn't it? Now, Comparator has a static method which accepts a Function and returns a Comparator which uses that function as the key for comparisons. We can call that static method with a method reference as the argument:

  public static void main(String[] args) {
    List<Book> books = someBooks();
    System.out.println(books);
    Collections.sort(books, Comparator.comparing(Book::year));
    System.out.println(books);
  }

That was kind of nice, wasn't it? Here are some types of method references:

  • Reference to a static method: ContainingClass::staticMethodName
  • Reference to an instance method of a particular object: containingObject::instanceMethodName
  • Reference to an instance method of an arbitrary object of a particular type: ContainingType::methodName
  • Reference to a constructor: ClassName::new

In fact, the compiler will translate a method reference to a lambda. So, you can use method references (if you find a suitable method) in place of a lambda.

IntStream

There's a very handy stream for ints in java.util.stream.IntStream which we can use instead of loops over ints.

Here's a small example:

import java.util.stream.IntStream;

public class Infinity {
  public static void main(String[] args) {
    IntStream.iterate(0, i -> ++i)
      .filter(i -> i % 2 == 0)
      .limit(10)
      .forEach(System.out::println);
  }
}

We use the method iterate(int seed, IntUnaryOperator f) as one way to create an int stream. It works like this:

  • first argument is a starting point, like in our case 0
  • second argument is a IntUnaryOperator which works as an operator applied to an int

In our case, we start an int stream at 0, and gives the iterate() method the operator created by the lambda i -> ++i. That operator will take the starting point (in our example 0) and apply the operator to it. If 0 is the operand i then the result will be ++i (which will be evaluated to 1. Next iteration i will be 1 and ++i will be evaluated to 2, etc, etc. We have created an infinite stream of ints from 0 and up, each int increasing with one in every step.

Now, we can filter our int stream with the usual filter() stream method. Our filter takes a lambda i -> i % 2 == 0 which is a predicate lambda for "the int argument is an even number". Now we have an infinite stream of all even numbers. This will take some time to print to the terminal, so we use limit(10) as a new filter. This means that we stop after we've got a stream of the first 10 even numbers. Then we apply the terminal operation forEach(System.out::print) which will consume our 10 even numbers and pass them to println:

$ javac Infinity.java && java Infinity
0
2
4
6
8
10
12
14
16
18

This means that we can do some very interesting stuff with integers, without writing any loops. Let's say that we want to know what the first 20 numbers are, that are divisible by 3 or 7 or 11. How would we do that?

If we are not good at calculating stuff in our heads, we wouldn't know how long we would have to loop to find the first 20 such numbers. So a for loop doesn't seem to be a good idea. So, let's copy the scheme from the previous example! We create an infinite stream of such numbers, and limit it to the first 20! Let's see what that would look like in code:

import java.util.stream.IntStream;

public class Infinity {
  public static void main(String[] args) {
    IntStream.iterate(0, i -> ++i)
      .filter(i -> i % 3 == 0 || i % 7 == 0 || i % 11 == 0)
      .limit(20)
      .forEach(System.out::println);
  }
}

You are probably keen to see what those numbers are!

$ javac Infinity.java && java Infinity
0
3
6
7
9
11
12
14
15
18
21
22
24
27
28
30
33
35
36
39

OK, that was very enlightening. What about something more nerdy? What are the 21 first powers of two?

We know that computers work with binary numbers, right? What is a binary number? It's a combination of 1s and 0s. How many combinations are there of 20 ones and zeros? 220. 19 ones and zeros? 219. Etc. How large numbers are the various ones and zeros? Let's look at the first 21! That is 02, 12, 32...202.

import java.util.stream.IntStream;

public class PowersOfTwo {
  public static void main(String[] args) {
    IntStream.range(0, 20)
      .map(i -> (int)Math.pow(2, i))
      .forEach(System.out::println);
  }
}

And the numbers?

$ javac PowersOfTwo.java && java PowersOfTwo
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288

Here, we used another way to generate an int stream, range(start, end). And we applied a function to each number in the stream, using map(i -> (int)Math.pow(2, i)), which took a lambda for the function.

Finally, we'll look at a useful method of streams, called reduce(). The reduce() method takes two arguments:

  • the identity value for a binary operation (like plus, times etc)
  • a binary operation (a function with two arguments)

The identity for the addition operation is 0. The identity for multiplication is 1.

Let's go back to create an infinite stream of all values but from 1 and upwards. Now, lets say we want to know what the sum of the first 50 numbers are (1 + 2 + 3... + 50). How can we do that?

We take the stream and reduce it using the identity 0 and a function for adding the numbers:

import java.util.stream.IntStream;

public class Infinity {
  public static void main(String[] args) {
    int sum = IntStream.iterate(1, i -> ++i)
      .limit(50)
      .reduce(0, (a, b) -> a + b);
    System.out.println("The sum of all numbers from 1 to 50 is: " + sum);
  }
}
$ javac Infinity.java && java Infinity
The sum of all numbers from 1 to 50 is: 1275

Now, as a final demonstration, let's find out what the sum of the first 21 powers of two is! We've all wanted to know that, for so long, haven't we?

First figure out how you would calculate that without streams and lambda. That's a good exercise for loops, if you need to freshen that up. Here's what it could look like using streams and lambdas:

import java.util.stream.IntStream;

public class Infinity {
  public static void main(String[] args) {
    int sum = IntStream.range(0, 20)
      .map(i -> (int)Math.pow(2, i))
      .reduce(0, (a, b) -> a + b);
    System.out.println("The sum of all powers of two from 0 to 20 is: " + sum);
  }
}
$ javac Infinity.java && java Infinity
The sum of all powers of two from 0 to 20 is: 1048575

Would it shock you to learn that we can achieve such calculations on our original infinite stream?

We can do this:

  • create an infinite stream from 0 and up
  • map 2each_number
  • limit to e.g. 31
  • reduce using addition

This is the code using an infinite stream for calculating the sum of the first 31 (0..30) powers of two:

import java.util.stream.IntStream;

public class Infinity {
  public static void main(String[] args) {
    long sum = IntStream.iterate(0, i -> ++i)
      .map(i -> (int)Math.pow(2, i))
      .limit(31)
      .reduce(0, (a, b) -> a + b);
     System.out.println("The sum of all powers of two from 0 to 30 is:       " + sum);
     System.out.println("Which is exactly the same as the maximum int value: " + Integer.MAX_VALUE);
  }
}

Or, a version, using long values (mapToLong produces a LongStream):

import java.util.stream.IntStream;

public class Infinity {
  public static void main(String[] args) {
    long sum = IntStream.iterate(0, i -> ++i)
      .mapToLong(i -> i)
      .limit(31)
      .map(i -> (long)Math.pow(2, i))
      .sum();
     System.out.println("The sum of all powers of two from 0 to 30 is:       " + sum);
     System.out.println("Which is exactly the same as the maximum int value: " + Integer.MAX_VALUE);
  }
}
$ javac Infinity.java && java Infinity
The sum of all powers of two from 0 to 30 is:       2147483647
Which is exactly the same as the maximum int value: 2147483647

As it happens, the maximum int value is 231 -1. And, as it also happens, the sum of the n first powers of two is exactly 2n+1 -1.

Can you figure out why?

Expand using link to the right to see some explanation.

Let's start with a story. There's a story about the inventor of Chess who was to get paid for his invention. When asked how much he wanted, he asked to be paid in wheat. How much? He asked to be paid the amount of wheat grains which represents the sum of grains if one grain was placed on the first square of the Chess board, two grains on the second, four on the next, eight on the following, and so forth, each square with the double amount of grains as the one before. The story has it that the master who was to pay up laughed at such a small price!

Now, how many grains of wheat would the total be?

We can easily realize that the amount on each square represents a power of two, starting with:

  • square 1: 1 grain ( 20)
  • square 2: 2 grains (21)
  • square 3: 4 grains (22)
  • square 4: 8 grains (23)
  • ...
  • square 64: 263 grains

So, the total amount of grains amount to: Sum of 20 + 21 + 22 + ... + 263, which is a huge number.

But how huge?

Remember that such a sum, from 20 to 2n is the same as 2n+1 - 1.

So, 264 - 1. That's larger than what fits in a Java long variable, whose largest number is 263 - 1.

But, wait! We have java.math.BigInteger, for arbitrarily large integers!

So, how do we express 264 - 1, in Java?

Well, we could do Long.MAX_VALUE times 2, then add one. Why?
Long.MAX_VALUE == 263 - 1.
Long.MAX_VALUE * 2 == 264 -2.
Now, all we have to do, is add one, and we'll get 264 - 1.

Another story about the grains on the chess board, is that it is often given to students who are learning to program. Both because it is a good exercise for loops and integer overflow (the result doesn't fit inside a long! And since it is a Classic assignment™

Here's our solution to the problem, in Java:

import java.math.BigInteger;

public class WheatChess {
  public static void main(String[] args) {
    System.out.print("Calculating number of wheat grains...");

    /* sum(2^0, 2^1,...2^63) = 2^64-1.
     * Long.MAX_VALUE = 2^63-1.
     * Long.MAX_VALUE * 2 = 2^64-2.
     * Long.MAX_VALUE * 2 + 1 = 2^64-1.
     * Answer: Number of grains is Long.MAX_VALUE * 2 + 1.
     */
    BigInteger grains = BigInteger.valueOf(Long.MAX_VALUE).multiply(BigInteger.valueOf(2)).add(BigInteger.ONE);
    System.out.println("Done!");
    System.out.println(grains);

  }
}

Here's an explanation for why the IntStream example adding the first 31 powers of two adds up to Ingeger.MAX_VALUE:

    /* sum(2^0..30) = 2^31-1.
     * Integer.MAX_VALUE = 2^31-1.
     * Answer: the sum of 2^0 + 2^1 + 2^2 + ... + 2^30, is the same as Integer.MAX_VALUE
     */

What now?

Now, see the video lectures, and then head on to the exercises!

Slides and videos

One lecture and one live coding video.

Streams and Lambda introduction

Lecture slides

Lecture videos

Filtering lists using lambdas (live coding)

Links

Further reading

Source code

Where to go next

Next page has exercises on streams and lambdas. Previous page is not yet decided.

« PreviousBook TOCNext »