Posts Tagged ‘java’

Google Collections

Monday, October 26th, 2009

Google Collections - Part 1
Google Collections - Part 2

This Google Tech Talk covers the Google Collections library for Java, a library I was introduced to on my last work term. It provides some useful data structures that I’ve had uses for before, as well as utility methods that make working with the standard Java classes easier. This talk gave me a much better overview of what precisely was in these libraries.

Google Collections was open-sourced by Google, so you can also use it. It’s hosted on google code at http://code.google.com/p/google-collections/. It was programmed for JDK 1.5, although being Java it is forward-compatible with Java 1.6. It’s widely used internally at Google and other large companies, so it is very well-tested. It is an extension to the Java Collections library.

The first set of data structures added by Google Collections are Immutable collections. Google provides ImmutableSet, ImmutableMap, ImmutableList, some other immutable data structures and differing implementations of these. These are guarantied to be immutable, and are slightly faster and use less memory. There is also UnmodifiableIterator, which is an iterator that does not supply remove() in order to get a view of a mutable collection. Immutable data structures are made to be initialized - Sets can be created with ImmutableSet.of( Element_1, Element_2, … ), and Immutable Maps are constructed with the following syntax:

ImmutableMap.Builder<Integer, Integer> builder = new ImmutableMap.Builder<Integer, Integer>();
return builder.put( 1, 2 ).put( 2, 3 ).put( 3, 4 ).build();

There is only one problem I’ve had with the Immutable data structures, which more a fault of the ndesign of the Java Collections Library than Google. Since modification methods of most collections return void - like List.add() - Immutable collections throw exceptions when these methods are called on them. In several cases, it would make my life much easier if these methods returned new ImmutableLists without modifying the old one, but I suppose this could be constructor. These data structures are guaranteed to be immutable - All constructors for them are private, so they cannot be subclassed by others who subvert this immutability.

There are a few caveats to the Immutable structures - they cannot store Null, which I’m really quite OK with, and there isn’t a way to enforce deep immutability. This means that if you have a ImmutableList of some data structure, you can retrieve the structures and modify them - just not the contents of the list itself.

Google Collections also introduces a few new types of data structures. One of these is the Multiset, which is a Set that is unordered and allows duplicates. The four main methods on it are count, add, remove, and setCount. There are 6 different implementations of Multiset optimized for various performance characteristics.

Mulimaps are also introduced. These are one of the most usefule things introduced by this library; I’ve had to implement these before, usually in a more unsafe way. Multimaps store many-many relationships; They are Maps, except calls to get() will retrieve a list of values that is added to when you put() a key-value pair. These also do the logical thing of returning empty collections when you attempt to get a value that isn’t in the map, not return null. There are 5 different implementations of Multimap, which can return different types of Collection from Get - ListMultimap, SetMultimap, SortedSetMultimap are a few of these.

Bimaps are also introduced. Bimaps are like Maps which require values to be unique - this has the benefit of being able to map from value->key as well as key->value. This has 3 implementations. Finally, a MapMaker class that will construct a ConcurrentMap. MapMaker allows you to specifie whether you want strong, weak, or soft keys and values, for a total of 9 combinations. It is fully concurrent, and can be used as a ConcurrentMap once you call makeMap(). It also has a makeComputingMap, which allows you to specify a function that will compute the value to be returned.

Another very useful feature Google Collections provides is static factory methods to create all types provided by their library, as well as Java classes. This cuts down the verbosity, due to Java’s type inference for static methods - you can do

Map< Integer, Boolean > m = Maps.newHashMap();

instead of

Map< Integer, Boolean > m = new HashMap< Integer, Boolean >();

Ordering, a better Comparator, is also provided. You can generate an Ordering using Ordering.forComparator( Comparator ), which provides an Ordering object for you to use. Orderings provide a bunch of utility methods - min, max, isIncreasing, and a bunch of others. Orderings work whenever Comparators are expected, so they are pretty much strictly better than Comparators.

Google Collections has as a design philosophy that methods work on Iterators whenever possible. This helps in the cases where you have more data than can fit in memory. To improve iterators, they provide the classes Iterators and Iterables, which provide utility methods for working with Iterators and Iterables. Some of these are transform, filter, and concat - you should look in the documentation for a full list.

The Google Collections library is extensively unit tested. A framework for creating tests was created, resulting in 25000 unit tests generated from a few thousand base tests. Running standard Java Collections classes through this framework has found bugs in these - they are quite thorough.

Having used the Google Collections library, I can testify that if you are programming Java, you should be using it. It makes your life a lot easier in pretty common cases, and some of the more esoteric stuff is always there if you need it. This talk was very interesting, and introduced me to some of the Library I was previously unaware of.

Closures in Java

Wednesday, September 23rd, 2009

Advanced Topics in Programming Languages: Closures for Java

This Google Tech talk was quite interesting, although a bit depressing since it seems unlikely that Java will get closures anytime soon. Still, it was a great talk on how closures could be added to Java cleanly.

The talk starts off with describing use cases for closures. The two main issues were to enable control abstraction APIs and to make API calls easier. Enabling control abstraction APIs means allowing programmers to define abstractions like the foreach loop without having to wait for them to be added to Java itself. Making API calls easier by removing the reliance on anonymous inner classes (which closures get compiled to) also looks quite nice.

The talk first presents a way of defining foreach using anonymous inner classes, and shows the problems that occur with this approach. The first is an issue with toString and lexical scoping, and then goes on to the problems with Exceptions, return and break statements, being unable to access non-final variables, etc. Many of these problems have solutions, but solving these problems has nothing to do with what you are trying to express and solving them just obscures the business logic.

A few examples of where closures would be useful are then presented. The first was a withLock construct, though the concept also translates to streams. The accepted Project Coin proposal on automatic resource management implements a subset of what you would be able to do with closures, using them to automatically close streams. Another example is performing timing on functions - without closures, this adds at least 7 lines of non-business-logic code to each method you want timing information on, whereas you could program a withTiming closure and only take two lines.

The talk outlines the requirements than any proposal for adding closures to Java must implement. The first is that it must be able to create control abstractions that are on par with built-in statement forms. They must be concise, or there’s no advantage in using them. Lastly, they must inter-operate with existing APIs that currently take anonymous inner classes.

The details of the proposal, which were quite interesting, were then presented. This proposal adds function types to Java - for example int=>int represents a function that takes an int and returns an int. Function types are implemented as an interface generated on-demand with an ‘invoke’ method. Since they are regular interfaces, you can extend them, make them serializable, all of the things you can do with regular interfaces. The syntax for closures is:

     {int x => x + 1}

Which represents a closure that takes an int and returns it’s increment. Unfortunately, closures do not have any way to yield a result early - returns will return from the function the closure is in, not the closure itself. The way these closures are made to be inter-operable with existing APIs is through a process called ‘closure conversion’, which will make a closure auto-implement an interface such as runnable if the argument is a Runnable. The compiler is required to enforce this, so you can pass in closures to API calls requiring a Runnable and it will work as expected.

To simulate build-in control abstractions, such as withLock, you don’t want to have to pass in a closure as an argument. The proposal thus suggests that functions requiring a closure as the last argument can be expressed as

withLock( lock ) { doSomething(); };

instead of

withLock( lock, {=>doSomething();});

which is much less cumbersome.

These closures work properly - they can access local variables, throw exceptions, and are interoperable with existing APIs. You also have completion transparency, which is the ability to create functions which return the same type as a closure they take as an argument. This allows for constructs such as withLock to not need to worry about the return type of the closure; you can define one withLock function and then be able to return calls to withLock as the proper type.

The talk wraps up with a number of examples of how closures could be used, and then a link to where you could get more information on the proposal. This was a great talk, and I wish the proposal was going to be implemented in Java 7.

Updating Imenu for Java 1.5

Tuesday, July 21st, 2009

NOTE: In the following regex’s, some strings are broken into two lines. This is due to a C-M character that renders as a newline in wordpress - It looks like ^M in emacs.

While coding in Java for work, I realized that the default imenu indexing for java was not sufficient. I usually use Imenu for jumping to a function in the same file (as opposed to CTags or something like that), and it didn’t load every function into the index. After some experimentation, I found that the problem occurred when either generic types were used or annotations were on the parameters of the functions. Since I’ve started using the @NonNull annotation and Findbugs frequently, this was a problem.

After digging around Imenu’s preferences, it seems the problem was in the variable imenu-generic-expression. If no function is specified for imenu-extract-index-name-function (which is not by default in Java-mode),
imenu-generic-expression is used to determine the name of the function. The implementation is essentially to go to the end of the buffer, repeatedly call beginning-of-defun, and use the regex stored in imenu-generic-expression to parse out the function name. This regex is originally set somewhere in the Emacs code base as the following for java-mode:

"[[:alpha:]_][][.[:alnum:]_]+[ 	\n
]+\\([[:alpha:]_][[:alnum:]_]+\\)[ 	\n
]*([ 	\n
]*\\([][.,[:alnum:]_]+[ 	\n
]+[][.,[:alnum:]_][][.,[:alnum:]_ 	\n
]*\\)?)[.,[:alnum:]_ 	\n
]*{"

Realizing that I would have to modify this regexwas not very encouraging, but breaking it into separate pieces and puzzling over the meaning came up with the following commented regex. While still ugly, it is at least no longer as confusing:

(concat
    "[[:alpha:]_]"                    ;start of type
    "[][.[:alnum:]_]+"                 ;end of type
    "[ \t\n
]+"                      ;whitespace
    "\\([[:alpha:]_][[:alnum:]_]+\\)" ;captured function name
    "[ \t\n
]*"                      ;whitespace
    "("                               ;start of arg list
       "[ \t\n
]*"                   ;whitespace
       "\\("
           "[][.,[:alnum:]_]+"        ;type
           "[ \t\n
]+"               ;whitespace
           "[][.,[:alnum:]_]"         ;start of var name
           "[][.,[:alnum:]_ \t\n
]*" ;end of var name, space
       "\\)?"
    ")"                               ;end of arg space    "[.,[:alnum:]_ \t\n
]"*"         ;throws declarations and whitespace
    "{"                               ;open brace

With this done, it was fairly easy to transform into a regex that accepted generics and annotations by adding <, >, space, and @ in the appropriate places. While this will find some malformed functions, I’m OK with this - It still has to be located at the beginning of a function for a name to come up, in which case even if I made a type I want to be able to jump there. The modified regexp ended up looking like this:

(concat
    "[[:alpha:]_]"                       ;start of type
    "[][.[:alnum:]_<> ]+"                ;type
    "[ \t\n
]+"                         ;whitespace
    "\\([[:alpha:]_][[:alnum:]_]+\\)"    ;funname
    "[ \t\n
]*"                         ;whitespace
    "("
    "[ \t\n
]*"                         ;whitespace
       "\\("                             ;argument list
 
           "[][.,[:alnum:]_@<> ]+"       ;annotations/type
           "[ \t\n
]+"                  ;whitespace
           "[][.,[:alnum:]_]"            ;start of var name
           "[][.,[:alnum:]_@<> \n\t
]*" ;end of var name
       "\\)?"
     ")"
    "[.,[:alnum:]_ \t\n
]*"             ;more whitespace, throws declarations
    "{"                                  ; begin fun
   )
)

This still leaves the problem of having this be automatically defined in java-mode, but any experienced emacs user will know how to do this: hooks! Specifically, adding a function that sets imenu-generic-expression to the correct value to java-mode-hook will automatically execute whenever a java buffer is entered.

(add-hook 'java-mode-hook '(lambda ()
 (setq imenu-generic-expression
 `((nil
    ,(concat
    "[[:alpha:]_]"                       ;start of type
    "[][.[:alnum:]_<> ]+"                ;type
    "[ \t\n
]+"                         ;whitespace
    "\\([[:alpha:]_][[:alnum:]_]+\\)"    ;funname
    "[ \t\n
]*"                         ;whitespace
    "("
    "[ \t\n
]*"                         ;whitespace
       "\\("                             ;argument list
           "[][.,[:alnum:]_@<> ]+"       ;annotations/type
           "[ \t\n
]+"                  ;whitespace
           "[][.,[:alnum:]_]"            ;start of var name
           "[][.,[:alnum:]_@<> \n\t
]*" ;end of var name
       "\\)?"
     ")"
    "[.,[:alnum:]_ \t\n
]*"             ;more whitespace, throws declarations
    "{"                                  ; begin fun
   )
 1)
  ))))

Why the old regex is still in emacs itself I don’t know: It is obviously from Java 1.4, which didn’t have templates or annotations, but Java 1.5 has been out for a while. This new version will catch all the occurrences that would be caught by the old regex, so don’t worry about compatibility issues. I submitted a patch to fix this to the emacs-devel mailing list - this seems like something that should be fixed in the main distro - so we’ll see whether or not this will make it in or not.

Accessing Javadoc via Emacs

Tuesday, July 7th, 2009

Since I do a lot of programming in Java, both for classes and my internships(so far, two have been primarily Java), I often need to look up javadoc. However, most of the time this information has been easily accessible, either through JDE or Eclipse. However, I recently started learning Clojure(I’ll cover my clojure setup in my next blog about emacs), and so neither of these tools were available to me. Searching for help online, I found the javadoc-help package.

Unfortunately, after trying it, I didn’t end up liking it very much. The search it uses, which pulls javadoc for each class that has as a substring the class you input, is far too wide. In most cases, I want an exact match on the class name I’m given - If I search for information on File, I don’t want to have to choose between it and FileFilter. Since this package didn’t work out, I wrote my own quick search function:

(defun java-describe (symbol-name)
  (interactive "MJava Class:")
  (let ((my-string (replace-regexp-in-string 
		    "^.*class-use/.*n" 
		    ""
		    (shell-command-to-string
		     (concat "find ~/.emacs.d/documentation/java/docs/api/ -name "
		     	     symbol-name
			     ".html"))
		   )))
    (string-match "^\(.*\)$" my-string)
    (browse-url (match-string 0 my-string))))
(global-set-key (kbd "C-h j") 'java-describe)

This function just finds the javadoc corresponding to the exact class name you specify, though you don’t specify package. If there are more than one class with the same name in different packages, it’ll choose one to show; I haven’t had an issue with this yet, although once it comes up I’ll probably allow adding part of the package name as well. For this to work, you need to the java API specification downloaded and extracted in the directory pointed to by this function - I just keep mine in ~/.emacs.d/documentation/java/.

The Dangers of String.substring

Sunday, July 5th, 2009

A few days ago at work, I had to track down why my Java application was running out of memory. It processed a few CSV files, storing some of the data in them. The files were large, but the data my application was storing should have easily been able to fit into memory. After an hour or so of investigation, I found it was an issue I had heard about but never run into myself: the String.substring memory leak.

The bug in question is that String.substring, instead of returning a new string, returns a string based on the original string. This means that the larger string can’t be garbage-collected, so if you store a substring thben the entire string will be kept in memory. In my specific case, I was using String.split and storing a few of the fields - however, String.split uses String.substring behind the scenes, so I ended up keeping the entire files in memory, explaining my OutOfMemoryErrors.

This bug is due to the Java implementors wanting String.substring to be as fast as possible. If it did not have this behaviour, a new char[] would have to be allocated on the heap each time the function was called and the data would have had to be copied into it. Moreover, if you did need the larger string to stay around, part of the string would be duplicated, which is inefficient memory-wise. The workaround for this bug is to just wrap calls to String.substring or calls that use it in a new String(). For example:

     String sub = new String( oldString.substring(0, 4) );

and the old string will be garbage collected.

For the above-mentioned reasons of the bug’s existence and the length of time it’s been opened, it probably won’t be fixed. The workaround suggested currently is not *too* painful, although you do have to find which calls themselves use String.substring, in many cases even if you forget the memory inefficiency isn’t too terrible (if you strip off the last character of a string, for example). It would be nice if there was a mention in the javadoc for these methods that this behaviour occurred, so that looking at the description of a method would warn you about issues such as this.

JSR 305

Thursday, May 14th, 2009

JSR-305

This Google Tech Talk was about JSR-305, a JSR that was adopted and will be in Java 7 (I believe). This JSR involves added annotations for software defect detection - essentially, it aims to provide standard annotations that tools can understand and use to help detect errors in your code. Some of these have been implemented nonstandardly in tools such as IntelliJ, such as the @Nonnull annotation.

This talk went over what would be provided and examples of why this would be useful, instead of going into implementation which would be tool-dependent. He did mention one possibility for enforcing these annotations at runtime, but I think it would have been interesting to go more in-depth on this. Oh well, there’s only so much you can cover in an hour.

I’ve known about some of the annotations before as ones that were coming to Java7, eg @Nonnull, but not some of the others. While (hopefully) the ones built-in to Java7 will have compiler support, since the JSR includes a way of defining your own classes it would be nice if there was a way to be able to tell javac how to verify those, reducing the need for external tools. On second thought, this seems like informing the compiler how to check for violations of these annotations seem rather difficult; maybe it should be left to tools developers.

One thing mentioned in this presentation that I was unaware of was class- and package-level default annotations. When I heard about @Nonnull, I liked the idea, but it seemed like it would end up making Java even more verbose, since you’d usually want to add it to essentially every variable. Default annotations allow you to annotate a class or package and essentially have variables ‘inherit’ from this default; this allows you to make every variable in one class @Nonnull with only one annotation, much better than if each variable declaration needed an annotation.