Google Collections

Google Collections - Part 1
Google Collections - Part 2

This Google Tech Talk covers the Google Collections library for Java, a library I was introduced to on my last work term. It provides some useful data structures that I’ve had uses for before, as well as utility methods that make working with the standard Java classes easier. This talk gave me a much better overview of what precisely was in these libraries.

Google Collections was open-sourced by Google, so you can also use it. It’s hosted on google code at http://code.google.com/p/google-collections/. It was programmed for JDK 1.5, although being Java it is forward-compatible with Java 1.6. It’s widely used internally at Google and other large companies, so it is very well-tested. It is an extension to the Java Collections library.

The first set of data structures added by Google Collections are Immutable collections. Google provides ImmutableSet, ImmutableMap, ImmutableList, some other immutable data structures and differing implementations of these. These are guarantied to be immutable, and are slightly faster and use less memory. There is also UnmodifiableIterator, which is an iterator that does not supply remove() in order to get a view of a mutable collection. Immutable data structures are made to be initialized - Sets can be created with ImmutableSet.of( Element_1, Element_2, … ), and Immutable Maps are constructed with the following syntax:

ImmutableMap.Builder<Integer, Integer> builder = new ImmutableMap.Builder<Integer, Integer>();
return builder.put( 1, 2 ).put( 2, 3 ).put( 3, 4 ).build();

There is only one problem I’ve had with the Immutable data structures, which more a fault of the ndesign of the Java Collections Library than Google. Since modification methods of most collections return void - like List.add() - Immutable collections throw exceptions when these methods are called on them. In several cases, it would make my life much easier if these methods returned new ImmutableLists without modifying the old one, but I suppose this could be constructor. These data structures are guaranteed to be immutable - All constructors for them are private, so they cannot be subclassed by others who subvert this immutability.

There are a few caveats to the Immutable structures - they cannot store Null, which I’m really quite OK with, and there isn’t a way to enforce deep immutability. This means that if you have a ImmutableList of some data structure, you can retrieve the structures and modify them - just not the contents of the list itself.

Google Collections also introduces a few new types of data structures. One of these is the Multiset, which is a Set that is unordered and allows duplicates. The four main methods on it are count, add, remove, and setCount. There are 6 different implementations of Multiset optimized for various performance characteristics.

Mulimaps are also introduced. These are one of the most usefule things introduced by this library; I’ve had to implement these before, usually in a more unsafe way. Multimaps store many-many relationships; They are Maps, except calls to get() will retrieve a list of values that is added to when you put() a key-value pair. These also do the logical thing of returning empty collections when you attempt to get a value that isn’t in the map, not return null. There are 5 different implementations of Multimap, which can return different types of Collection from Get - ListMultimap, SetMultimap, SortedSetMultimap are a few of these.

Bimaps are also introduced. Bimaps are like Maps which require values to be unique - this has the benefit of being able to map from value->key as well as key->value. This has 3 implementations. Finally, a MapMaker class that will construct a ConcurrentMap. MapMaker allows you to specifie whether you want strong, weak, or soft keys and values, for a total of 9 combinations. It is fully concurrent, and can be used as a ConcurrentMap once you call makeMap(). It also has a makeComputingMap, which allows you to specify a function that will compute the value to be returned.

Another very useful feature Google Collections provides is static factory methods to create all types provided by their library, as well as Java classes. This cuts down the verbosity, due to Java’s type inference for static methods - you can do

Map< Integer, Boolean > m = Maps.newHashMap();

instead of

Map< Integer, Boolean > m = new HashMap< Integer, Boolean >();

Ordering, a better Comparator, is also provided. You can generate an Ordering using Ordering.forComparator( Comparator ), which provides an Ordering object for you to use. Orderings provide a bunch of utility methods - min, max, isIncreasing, and a bunch of others. Orderings work whenever Comparators are expected, so they are pretty much strictly better than Comparators.

Google Collections has as a design philosophy that methods work on Iterators whenever possible. This helps in the cases where you have more data than can fit in memory. To improve iterators, they provide the classes Iterators and Iterables, which provide utility methods for working with Iterators and Iterables. Some of these are transform, filter, and concat - you should look in the documentation for a full list.

The Google Collections library is extensively unit tested. A framework for creating tests was created, resulting in 25000 unit tests generated from a few thousand base tests. Running standard Java Collections classes through this framework has found bugs in these - they are quite thorough.

Having used the Google Collections library, I can testify that if you are programming Java, you should be using it. It makes your life a lot easier in pretty common cases, and some of the more esoteric stuff is always there if you need it. This talk was very interesting, and introduced me to some of the Library I was previously unaware of.

Tags: ,

Leave a Reply