Updating Imenu for Java 1.5

NOTE: The regexps below contain carriage-return characters (C-m, shown as ^M in Emacs). WordPress renders these as line breaks, so some strings may appear broken across two lines. In Emacs Lisp source, the same character can be written as \r.

While coding in Java for work, I realized that the default Imenu indexing for Java was not sufficient. I usually use Imenu to jump to a function in the same file (rather than CTags or something similar), and it didn't put every function into the index. After some experimentation, I found that the missing functions were the ones that used generic types or had annotations on their parameters. Since I've started using the @NonNull annotation and FindBugs frequently, this was a problem.

After digging through Imenu's settings, I found that the problem was the variable imenu-generic-expression. If no function is specified for imenu-extract-index-name-function (and none is by default in java-mode), Imenu builds the index from imenu-generic-expression instead: it goes to the end of the buffer, repeatedly searches backward (with re-search-backward) for the regexp stored there, and takes the function name from the capture group that the entry names. For java-mode, this regexp comes from cc-imenu-java-generic-expression in cc-menus.el and originally looked like this:

"[[:alpha:]_][][.[:alnum:]_]+[ \t\n\r]+\\([[:alpha:]_][[:alnum:]_]+\\)[ \t\n\r]*([ \t\n\r]*\\([][.,[:alnum:]_]+[ \t\n\r]+[][.,[:alnum:]_][][.,[:alnum:]_ \t\n\r]*\\)?)[.,[:alnum:]_ \t\n\r]*{"
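To see roughly how Imenu uses a pattern like this, here is a simplified sketch of the backward scan its generic indexer (imenu--generic-function) performs. my-imenu-scan-names is a hypothetical helper, not part of Imenu: it ignores menu titles, markers, and the extra fields an imenu-generic-expression entry can carry, and only collects the text of one capture group. The toy pattern in the usage line is also an assumption, kept deliberately simpler than the real one.

```elisp
;; Hypothetical helper (not part of Imenu): a simplified version of the
;; backward regexp scan Imenu performs for each generic-expression entry.
(defun my-imenu-scan-names (regexp group)
  "Return the text of GROUP for every match of REGEXP, in buffer order."
  (let ((names '()))
    (save-excursion
      ;; Start at the end of the buffer and walk backward, as Imenu does.
      (goto-char (point-max))
      (while (re-search-backward regexp nil t)
        ;; Pushing while moving backward leaves NAMES in buffer order.
        (push (match-string-no-properties group) names)))
    names))

;; Usage: a toy "type name(" pattern run over a small Java snippet.
(with-temp-buffer
  (insert "class A {\n  void foo() {}\n  int bar(int x) { return x; }\n}\n")
  (my-imenu-scan-names
   "\\b[[:alpha:]_]+[ \t]+\\([[:alpha:]_][[:alnum:]_]*\\)[ \t]*(" 1))
;; => ("foo" "bar")
```

Walking backward and pushing each name is a cheap way to end up with the index in source order without a final reverse.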

Realizing that I would have to modify this regexp was not very encouraging, but after breaking it into separate pieces and puzzling over each one, I came up with the following commented version. While still ugly, it is at least no longer as confusing:

(concat
    "[[:alpha:]_]"                    ;start of type
    "[][.[:alnum:]_]+"                ;end of type
    "[ \t\n\r]+"                      ;whitespace
    "\\([[:alpha:]_][[:alnum:]_]+\\)" ;captured function name
    "[ \t\n\r]*"                      ;whitespace
    "("                               ;start of arg list
       "[ \t\n\r]*"                   ;whitespace
       "\\("
           "[][.,[:alnum:]_]+"        ;type
           "[ \t\n\r]+"               ;whitespace
           "[][.,[:alnum:]_]"         ;start of var name
           "[][.,[:alnum:]_ \t\n\r]*" ;end of var name, space
       "\\)?"
    ")"                               ;end of arg list
    "[.,[:alnum:]_ \t\n\r]*"          ;throws declarations and whitespace
    "{")                              ;open brace
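A quick way to confirm the diagnosis is to try the old pattern with string-match on a few one-line signatures. my-old-java-imenu-regexp and my-method-name are hypothetical names used only for this check, and real declarations that span several lines are not exercised here.

```elisp
;; The Java 1.4-era pattern, with carriage returns written as \r.
(defconst my-old-java-imenu-regexp
  (concat
   "[[:alpha:]_][][.[:alnum:]_]+[ \t\n\r]+"           ; type, whitespace
   "\\([[:alpha:]_][[:alnum:]_]+\\)[ \t\n\r]*"        ; captured name
   "([ \t\n\r]*"                                      ; open paren
   "\\([][.,[:alnum:]_]+[ \t\n\r]+[][.,[:alnum:]_][][.,[:alnum:]_ \t\n\r]*\\)?"
   ")[.,[:alnum:]_ \t\n\r]*{")                        ; close paren .. brace
  "Old Java pattern for `imenu-generic-expression' (hypothetical name).")

(defun my-method-name (regexp signature)
  "Return the name REGEXP captures in group 1 of SIGNATURE, or nil."
  (when (string-match regexp signature)
    (match-string 1 signature)))

(my-method-name my-old-java-imenu-regexp "public void setName(String name) {")
;; => "setName"
(my-method-name my-old-java-imenu-regexp "public List<String> getNames() {")
;; => nil, because `>' is not allowed in the type
(my-method-name my-old-java-imenu-regexp "public void setName(@NonNull String name) {")
;; => nil, because `@' is not allowed in the argument list
```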

With this done, it was fairly easy to make the regexp accept generics and annotations by adding <, >, space, and @ in the appropriate places. This will also match some malformed declarations, but I'm OK with that: a match still has to end in an argument list and an opening brace, and even if I've made a typo in a signature, I want to be able to jump there. The modified regexp ended up looking like this:

(concat
    "[[:alpha:]_]"                       ;start of type
    "[][.[:alnum:]_<> ]+"                ;type
    "[ \t\n\r]+"                         ;whitespace
    "\\([[:alpha:]_][[:alnum:]_]+\\)"    ;funname
    "[ \t\n\r]*"                         ;whitespace
    "("
    "[ \t\n\r]*"                         ;whitespace
       "\\("                             ;argument list
           "[][.,[:alnum:]_@<> ]+"       ;annotations/type
           "[ \t\n\r]+"                  ;whitespace
           "[][.,[:alnum:]_]"            ;start of var name
           "[][.,[:alnum:]_@<> \n\t\r]*" ;end of var name
       "\\)?"
    ")"
    "[.,[:alnum:]_ \t\n\r]*"             ;more whitespace, throws declarations
    "{")                                 ;begin fun
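The same string-match check can be repeated against the widened pattern. Again, my-new-java-imenu-regexp and my-method-name are hypothetical names for this sketch, and only single-line signatures are tested.

```elisp
;; The generics- and annotation-aware pattern, with \r for carriage return.
(defconst my-new-java-imenu-regexp
  (concat
   "[[:alpha:]_][][.[:alnum:]_<> ]+"        ; type, now allowing <, > and spaces
   "[ \t\n\r]+"                             ; whitespace
   "\\([[:alpha:]_][[:alnum:]_]+\\)"        ; captured method name
   "[ \t\n\r]*([ \t\n\r]*"                  ; open paren
   "\\([][.,[:alnum:]_@<> ]+"               ; annotations/type
   "[ \t\n\r]+[][.,[:alnum:]_]"             ; whitespace, start of var name
   "[][.,[:alnum:]_@<> \n\t\r]*\\)?"        ; rest of the argument list
   ")[.,[:alnum:]_ \t\n\r]*{")              ; throws clause, open brace
  "Widened Java pattern for `imenu-generic-expression' (hypothetical name).")

(defun my-method-name (regexp signature)
  "Return the name REGEXP captures in group 1 of SIGNATURE, or nil."
  (when (string-match regexp signature)
    (match-string 1 signature)))

(my-method-name my-new-java-imenu-regexp "public void setName(String name) {")
;; => "setName"
(my-method-name my-new-java-imenu-regexp "public List<String> getNames() {")
;; => "getNames"
(my-method-name my-new-java-imenu-regexp "public void setName(@NonNull String name) {")
;; => "setName"
(my-method-name my-new-java-imenu-regexp
                "public <T> List<T> wrap(@NonNull T item) throws IOException {")
;; => "wrap"
```

Because spaces and > are now allowed inside the argument list, a conditional such as `} else if (x > 0) {` also matches, with "if" as the captured name. That is one of the false positives this looser pattern accepts.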

This still leaves the problem of applying it automatically in java-mode, but any experienced Emacs user will know the answer: hooks! Specifically, adding a function to java-mode-hook that sets imenu-generic-expression to the new value means it runs every time a buffer enters java-mode. Since imenu-generic-expression automatically becomes buffer-local when set, this only affects Java buffers.

(add-hook 'java-mode-hook
          (lambda ()
            (setq imenu-generic-expression
                  `((nil
                     ,(concat
                       "[[:alpha:]_]"                       ;start of type
                       "[][.[:alnum:]_<> ]+"                ;type
                       "[ \t\n\r]+"                         ;whitespace
                       "\\([[:alpha:]_][[:alnum:]_]+\\)"    ;funname
                       "[ \t\n\r]*"                         ;whitespace
                       "("
                       "[ \t\n\r]*"                         ;whitespace
                       "\\("                                ;argument list
                       "[][.,[:alnum:]_@<> ]+"              ;annotations/type
                       "[ \t\n\r]+"                         ;whitespace
                       "[][.,[:alnum:]_]"                   ;start of var name
                       "[][.,[:alnum:]_@<> \n\t\r]*"        ;end of var name
                       "\\)?"
                       ")"
                       "[.,[:alnum:]_ \t\n\r]*"             ;more whitespace, throws declarations
                       "{")                                 ;begin fun
                     1)))))

Why the old regexp is still in Emacs itself, I don't know. It obviously dates from Java 1.4, which didn't have generics or annotations, but Java 1.5 has been out for a while. The new version will match everything the old regexp matched, since it keeps the same structure and only widens its character classes, so don't worry about compatibility issues. I submitted a patch to the emacs-devel mailing list, since this seems like something that should be fixed in the main distribution, so we'll see whether it makes it in.
