Spliterator for “in-between” lambda processing
The usecase of summing up the distances in a list of vectors can benefit from parallelization on todays multi-core systems with Java 8 lambdas using a Spliterator.
If one writes
vecList.stream().mapToDouble( e-> e.getDistance( b )
Then a reference to the next point “b” isn’t there hence the calculation cannot be performed that way. There is a property of each list item, it has a relation to the previous and a relation to the next item in the list.
Expressed as a piece of java code it might look as simple as this
final class Relation<T> { private final T a; private final T b; public Relation(T t0, T t1) { this.a = t0; this.b = t1; } public T getA() { return a; } public T getB() { return b; } }
If we iterate on the relation organizing the list items into pairs, a distance calculation is possible by using a lambda expression and the stream api.
double totalDistance = stream.mapToDouble( e -> { return e.getA().distance(e.getB()); }).sum();
To get there, a Spliterator class has to be defined that provides items for processing when the lambda expression is evaluated and is also capable of dividing the remaining set of items into subsets (sub-lists in this case) for parallel processing. Here is a possible implementation:
/** * A provider for pairs of consecutive list items * * @param <T> */ public class PairIterator<T> implements Spliterator<Relation<T>> { private final List<T> list; private int index; private final int start; private int end; /** * Allows to iterate over elements and their successors */ public PairIterator(List<T> list) { this(list, 0, list.size()); } /** * Internally used when the iterator is being split */ private PairIterator(List<T> list, int start, int end) { this.list = list; this.start = index = start; this.end = end; } /** * Creates a new Relation between two consecutive items and * performs the {@link Consumer#accept(entry)} operation. */ @Override public boolean tryAdvance(Consumer<? super Relation<T>> action) { Relation<T> entry; if (index >= end || index >= list.size()-1) { return false; } entry = new Relation<T>(list.get(index), list.get(++index)); action.accept(entry); return true; } /** * If the list still has more that 4 elements to be processed * the method splits the work items into two working sets of ordered elements */ @Override public PairIterator<T> trySplit() { int size = end - index; if (size > 4) { //reuse the middle element for both sides, so pair(mid-1,mid) and pair(mid,mid+1) is created int mid = start + (size >> 1); int prevEnd = end; end = mid; return new PairIterator<T>(list, mid, prevEnd); } return null; } /** * @return the size of the remaining workload */ @Override public long estimateSize() { return (end - index); } /** * Configuration of this spliterator */ @Override public int characteristics() { return Spliterator.CONCURRENT | Spliterator.SIZED | Spliterator.ORDERED; } }
There are two methods in the Spliterator implementation that are used to accomplish streaming the relations of predeccessor and successor, tryAdvance
and trySplit
. The first actually performs the action on the item pair by instantiating a Relation
on-the-fly, and calling accept(...)
where the real calculation happens. The latter contains the algorithm to distribute the workload to the consumers. If the size of the remaining elements is greater than 4 it changes the indeces pointing into the list and returns a new instance configured for the remaining indeces. The method is designed to have a list element being used in two relations, once as a sucessor and once as a predecessor, so care must be taken not to modify its state by executing code with side-effects.
To use it a Stream is constructed with the StreamSupport
class by passing the Spliterator as an argument and true to ensure it is processed in parallel:
//creates a parallel stream Stream<Relation<Vec2>> stream = StreamSupport.stream(new PairIterator<Vec2>(vecList), true); double totalDistance = stream.mapToDouble( e -> { return e.getA().distance(e.getB()); }).sum();