[Notes] Java8 | Stream, Terminal vs Non-Terminal operation, etc
∘ Stream
∘ Parallel stream vs Sequential Stream
∘ Stream vs Loop
∘ Terminal vs Non-Terminal/Intermediate operation
∘ Future read
Stream
“A sequence of elements” supporting sequential and parallel aggregate operations.
- Added in Java 8 to take advantage of lambda expressions, which were introduced in the same release.
- Provides a functional approach to processing bounded streams of objects.
- Capable of internal iteration over its elements.
- Provides functionality for processing its elements during iteration.
- Collections support operations such as add(), remove(), and contains() that work on a single element. Streams, in contrast, have bulk operations such as forEach(), filter(), map(), and reduce() that access all elements in a sequence.
- The actual motivation for inventing streams for Java was performance or, more precisely, making parallelism more accessible to software developers.
- Streams, which come in two flavours (as sequential and parallel streams), are designed to hide the complexity of running multiple threads.
- Parallel streams make it extremely easy to execute bulk operations in parallel — magically, effortlessly, and in a way that is accessible to every Java developer.
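As a minimal sketch of the contrast above (the class and method names here are made up for illustration), the following shows single-element collection operations next to a bulk stream pipeline, and how the same pipeline switches from sequential to parallel by changing only the call that creates the stream:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StreamBasics {

    // Bulk operation on a sequential stream: keep even numbers, square them, sum them.
    static int sumOfEvenSquares(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .reduce(0, Integer::sum);
    }

    // Same pipeline; only the stream flavour changes.
    static int sumOfEvenSquaresParallel(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));

        // Collection operations work on one element at a time.
        numbers.add(7);
        numbers.remove(Integer.valueOf(1));
        System.out.println(numbers.contains(4));

        // Stream operations process the whole sequence.
        System.out.println(sumOfEvenSquares(numbers));
        System.out.println(sumOfEvenSquaresParallel(numbers));
    }
}
```

Both methods return the same value for the same input; whether the parallel one is faster is a separate question.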
Parallel stream vs Sequential Stream
You might be tempted to conclude that parallel streams are always faster than sequential streams: perhaps not twice as fast on dual-core hardware, as one might hope, but at least faster. However, this is not true. Numerous aspects contribute to the performance of a parallel stream operation.
- Splittability: One of them is the splittability of the stream source. An array splits nicely; it just takes an index calculation to figure out the middle element and split the array into halves. There is no overhead and thus barely any cost of splitting. How easily do collections split compared to an array? What does it take to split a binary tree or a linked list? In certain situations you will observe vastly different performance results for different types of collections.
- Statefulness: Another aspect is statefulness. Some stream operations maintain state. An example is the distinct() operation. It is an intermediate operation that eliminates duplicates from the input sequence, i.e., it returns an output sequence with distinct elements. In order to decide whether the next element is a duplicate or not, the operation must compare it to all elements it has already encountered. For this purpose it maintains some sort of data structure as its state. If you call distinct() on a parallel stream, its state will be accessed concurrently by multiple worker threads. That requires some form of coordination or synchronisation, which adds overhead and slows down parallel execution, to the extent that parallel execution may be significantly slower than sequential execution.
With this in mind it is fair to say that the performance model of streams is not a trivial one. Expecting that parallel stream operations are always faster than sequential stream operations is naive. The performance gain, if any, depends on numerous factors, some of which are briefly mentioned above. If you are familiar with the inner workings of streams you will be able to make an informed guess regarding the performance of a parallel stream operation. Yet, you need to benchmark a lot in order to find out for a given context whether going parallel is worth doing or not. There are indeed situations in which parallel execution is slower than sequential execution, and blindly using parallel streams in all cases can be downright counterproductive.
The realisation is: Yes, parallel stream operations are easy to use and often they run faster than sequential operations, but don’t expect miracles. Also, don’t guess; instead, benchmark a lot.
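To make "benchmark a lot" concrete, here is a rough sketch that runs the stateful distinct() operation sequentially and in parallel over an easily split source (ArrayList) and a poorly split one (LinkedList). The class name, the helper methods and the data size are assumptions for illustration. The System.nanoTime() timing is deliberately crude: it ignores JIT warm-up, so treat the printed numbers as a hint only and use a proper harness such as JMH for real measurements.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ParallelVsSequential {

    // Builds `size` boxed integers cycling through 0..999, stored in the list type given by `factory`.
    static List<Integer> sampleData(int size, Supplier<List<Integer>> factory) {
        return IntStream.range(0, size)
                .map(i -> i % 1000)
                .boxed()
                .collect(Collectors.toCollection(factory));
    }

    // Sums the distinct values; distinct() is stateful, so going parallel needs coordination.
    static long sumDistinct(List<Integer> source, boolean parallel) {
        Stream<Integer> stream = parallel ? source.parallelStream() : source.stream();
        return stream.distinct().mapToLong(Integer::longValue).sum();
    }

    // Crude wall-clock timing of a single run (no warm-up, no repetition).
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Integer> arrayBacked = sampleData(2_000_000, ArrayList::new);
        List<Integer> linked = sampleData(2_000_000, LinkedList::new);

        for (boolean parallel : new boolean[] {false, true}) {
            String mode = parallel ? "parallel" : "sequential";
            System.out.println("ArrayList  " + mode + ": "
                    + elapsedMillis(() -> sumDistinct(arrayBacked, parallel)) + " ms");
            System.out.println("LinkedList " + mode + ": "
                    + elapsedMillis(() -> sumDistinct(linked, parallel)) + " ms");
        }
    }
}
```

The results are identical in every combination; only the elapsed time differs, and which variant wins depends on the machine, the data size and the source type.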
Stream vs Loop
- Streams are a more declarative style. Or a more expressive style.
- It may be considered better to declare your intent in code, than to describe how it’s done:
    return people.stream()
        .filter(p -> p.age() < 19)
        .collect(toList());

vs

    List<Person> filtered = new ArrayList<>();
    for (Person p : people) {
        if (p.age() < 19) {
            filtered.add(p);
        }
    }
    return filtered;

(The second version says "I'm doing a loop". The purpose of the loop is buried deeper in the logic.)
- Streams are often terser. The same example shows this. Terser isn’t always better, but if you can be terse and expressive at the same time, so much the better.
- Streams have a strong affinity with functions. Java 8 introduces lambdas and functional interfaces, which opens a whole toybox of powerful techniques. Streams provide the most convenient and natural way to apply functions to sequences of objects.
- Streams encourage less mutability. This is sort of related to the functional programming aspect — the kind of programs you write using streams tend to be the kind of programs where you don’t modify objects.
- Streams encourage looser coupling. Your stream-handling code doesn’t need to know the source of the stream, or its eventual terminating method.
- Streams can succinctly express quite sophisticated behaviour. For example:
    stream.filter(myfilter).findFirst();
This might look at first glance as if it filters the whole stream, then returns the first element. But in fact findFirst() drives the whole operation, so it efficiently stops after finding one item.
- Streams provide scope for future efficiency gains. Some people have benchmarked and found that single-threaded streams from in-memory Lists or arrays can be slower than the equivalent loop. This is plausible because there are more objects and overheads in play.
But streams scale. As well as Java’s built-in support for parallel stream operations, there are a few libraries for distributed map-reduce using Streams as the API, because the model fits.
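The claim above that findFirst() drives the pipeline and stops early can be checked with a small sketch. The class name, the counter and the threshold parameter are illustrative assumptions; the filter counts how many elements it is asked to inspect on a sequential stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuitDemo {

    // Returns how many elements the filter inspected before findFirst() was satisfied.
    static int elementsInspected(List<Integer> numbers, int threshold) {
        AtomicInteger inspected = new AtomicInteger();
        Optional<Integer> first = numbers.stream()
                .filter(n -> {
                    inspected.incrementAndGet(); // side effect used only for counting
                    return n > threshold;
                })
                .findFirst();
        System.out.println("first match: " + first);
        return inspected.get();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Only 1, 2, 3 and 4 are inspected; the remaining six elements are never touched.
        System.out.println("inspected: " + elementsInspected(numbers, 3)); // prints "inspected: 4"
    }
}
```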
Disadvantages?
- Performance. A for loop through an array is extremely lightweight both in terms of heap and CPU usage. If raw speed and memory thriftiness are a priority, using a stream is worse.
- Familiarity. The world is full of experienced procedural programmers, from many language backgrounds, for whom loops are familiar and streams are novel. In some environments, you want to write code that’s familiar to that kind of person.
- Cognitive overhead. Because of its declarative nature, and increased abstraction from what’s happening underneath, you may need to build a new mental model of how code relates to execution. Actually you only need to do this when things go wrong, or if you need to deeply analyse performance or subtle bugs. When it “just works”, it just works.
- Debuggers are improving, but even now, when you’re stepping through stream code in a debugger, it can be harder work than the equivalent loop, because a simple loop is very close to the variables and code locations that a traditional debugger works with.
Terminal vs Non-Terminal/Intermediate operation
- Non-Terminal operation : will transform a stream into another stream, such as filter(Predicate).
  Non-Terminal Operation : Stream --> Stream
- Terminal Operation : will produce a result or side effect, such as count() or forEach(Consumer).
  Terminal Operation : Stream --> Result
- Intermediate operations are lazy: no intermediate operation is executed until a terminal operation is invoked at the end of the pipeline.
- In a way, an intermediate operation is remembered and only carried out once a terminal operation is invoked.
- You can chain multiple intermediate operations and none of them will do anything until you invoke a terminal operation. At that time, all of the intermediate operations that you invoked earlier will be invoked along with the terminal operation.
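The laziness described above can be observed with a short sketch. The class name and the log list are illustrative assumptions; each lambda records when it runs, so the log shows that nothing happens until the terminal collect() is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazinessDemo {

    // Builds a pipeline, then runs it, and returns a log of what happened in which order.
    static List<String> runPipeline() {
        List<String> log = new ArrayList<>();

        // Only intermediate operations so far: nothing is executed yet.
        Stream<String> pipeline = Stream.of("a", "b")
                .filter(s -> { log.add("filter " + s); return true; })
                .map(s -> { log.add("map " + s); return s.toUpperCase(); });
        log.add("pipeline built");

        // The terminal operation triggers the recorded intermediate operations.
        List<String> result = pipeline.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        // Prints: pipeline built, filter a, map a, filter b, map b, result [A, B]
        runPipeline().forEach(System.out::println);
    }
}
```

Note that "pipeline built" is logged first, and that each element passes through filter and then map before the next element is considered.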
Intermediate Operations:
- filter(Predicate<T>)
- map(Function<T, R>)
- flatMap(Function<T, Stream<R>>)
- sorted(Comparator<T>)
- peek(Consumer<T>)
- distinct()
- limit(long n)
- skip(long n)

Terminal Operations:
- forEach
- forEachOrdered
- toArray
- reduce
- collect
- min
- max
- count
- anyMatch
- allMatch
- noneMatch
- findFirst
- findAny
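As a rough illustration of how several of the operations listed above combine, the sketch below chains flatMap, map, distinct, sorted and limit, and ends each pipeline with a terminal operation (collect, reduce, anyMatch). The class name, method names and sample sentences are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class OperationsTour {

    // Intermediate: flatMap, map, distinct, sorted, limit. Terminal: collect.
    static List<String> firstDistinctWords(List<String> lines, int limit) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" "))) // one stream of words per line, flattened
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Terminal: reduce. Sums the lengths of all words.
    static int totalLength(List<String> words) {
        return words.stream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");

        List<String> words = firstDistinctWords(lines, 3);
        System.out.println(words);              // prints [be, not, or]
        System.out.println(totalLength(words)); // prints 7

        // Terminal: anyMatch. Short-circuits on the first word starting with "n".
        System.out.println(words.stream().anyMatch(w -> w.startsWith("n"))); // prints true
    }
}
```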
Future read
http://cr.openjdk.java.net/~briangoetz/lambda/lambda-libraries-final.html