Archive for September, 2009

Scalable Propagation-Based Call Graph Construction Algorithms

Wednesday, September 30th, 2009

This paper goes over several algorithms for constructing call graphs for Java programs. It compares an old algorithm, RTA, against three new algorithms (MTA, FTA, and XTA). The problem is that existing algorithms either scale to large programs but compute an excessively large call graph, or compute a much more precise call graph but do not scale to larger programs. This paper introduces algorithms that appear to scale to large programs while computing a much more precise call graph.

A call graph is just a directed graph of which methods each method can call. Each method is a vertex in the call graph, and there is an edge from V_1 to V_2 if method V_1 can call method V_2 during its execution. These graphs are determined by analyzing call sites, which are just Java expressions of the form e.M(), i.e. method calls. Since these calls are dynamically resolved, a call graph may be less precise than strictly necessary, depending on the algorithm used to figure out which methods each call site can invoke. Call graphs are used for whole-program optimization: methods that are not reachable from the main method can be removed, a call site that can only invoke one method can have that method inlined instead of dispatching dynamically, and so on.

Old algorithms

  • RA - RA, Reachability Analysis, only takes into account the name of a method. Essentially, the main method is added to the list of reachable methods. For each method that is added to the list of reachable methods, go through all the call sites e.M() in the method and add all methods with name M to the list of reachable methods. This will give a set of reachable methods, and you can use this list to determine which methods are unreachable and can be deleted.
  • CHA - Class Hierarchy Analysis is an extension of RA that uses information about the class hierarchy to perform its reachability analysis. It works the same as RA, except that instead of adding all methods with name m for a call site e.m(), it only adds methods with name m in classes that are subclasses of the static type of e.

  • RTA - Rapid Type Analysis is an extension of CHA that takes into account which classes were actually instantiated. For a call e.m(), it will only add those methods with name m that occur in subclasses of the static type of e that are instantiated in a reachable method. RTA is still simple to implement, scales, and computes call graphs significantly more precise than CHA.
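To make the difference between the three old algorithms concrete, here is a minimal Emacs Lisp sketch run on a made-up five-method program. Everything in it (the toy- names, the class names, the data layout) is hypothetical and not from the paper, and it simplifies dispatch by only looking at the class that defines a method, ignoring inherited methods.

```elisp
(require 'cl-lib)

;; Hypothetical toy program: each class maps to its superclass.
(defvar toy-superclass
  '((Main . nil) (Shape . nil) (Circle . Shape) (Square . Shape)
    (Cowboy . nil)))

;; Each method (CLASS . NAME) lists the classes it instantiates and its
;; call sites e.m(), written as (STATIC-TYPE-OF-E . NAME).
(defvar toy-methods
  '(((Main . main)   :new (Circle) :calls ((Shape . draw)))
    ((Shape . draw)  :new ()       :calls ())
    ((Circle . draw) :new ()       :calls ())
    ((Square . draw) :new ()       :calls ())
    ((Cowboy . draw) :new ()       :calls ())))

(defun toy-subtype-p (class ancestor)
  "Return non-nil if CLASS is ANCESTOR or inherits from it."
  (while (and class (not (eq class ancestor)))
    (setq class (cdr (assq class toy-superclass))))
  (eq class ancestor))

(defun toy-targets (call algorithm instantiated)
  "Return the methods that CALL may invoke under ALGORITHM.
CALL is (STATIC-TYPE . NAME); ALGORITHM is `ra', `cha' or `rta'."
  (cl-remove-if-not
   (lambda (method)
     (let ((class (car method)) (name (cdr method)))
       (and (eq name (cdr call))
            ;; RA stops at the name; CHA and RTA also check the hierarchy.
            (or (eq algorithm 'ra)
                (toy-subtype-p class (car call)))
            ;; RTA additionally requires the class to be instantiated.
            (or (not (eq algorithm 'rta))
                (memq class instantiated)))))
   (mapcar #'car toy-methods)))

(defun toy-reachable (algorithm)
  "Return the methods reachable from Main.main under ALGORITHM."
  (let ((reachable (list '(Main . main)))
        (instantiated '())
        (changed t))
    ;; Repeat until no new method or instantiated class is discovered.
    (while changed
      (setq changed nil)
      (dolist (method reachable)
        (let ((body (cdr (assoc method toy-methods))))
          (dolist (class (plist-get body :new))
            (unless (memq class instantiated)
              (push class instantiated)
              (setq changed t)))
          (dolist (call (plist-get body :calls))
            (dolist (target (toy-targets call algorithm instantiated))
              (unless (member target reachable)
                (setq reachable (append reachable (list target))
                      changed t)))))))
    reachable))
```

On this toy input, (toy-reachable 'ra) keeps all five methods, (toy-reachable 'cha) drops Cowboy.draw because Cowboy is not a Shape, and (toy-reachable 'rta) keeps only Main.main and Circle.draw because Circle is the only class ever instantiated.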

New algorithms

  • XTA - XTA is an algorithm that uses a distinct set S_M for each method M and a distinct set S_x for each field x, in order to give a more local view of what types are available. XTA is defined by five constraints:
    1. The main method is reachable.
    2. For any method C.M, add C to M_X. Add the
    3. If a method M is reachable, for all statements of the form “new C()” occurring inside M, add C to S_M.
    4. If a method M is reachable, for all fields x that are read in M, add S_x to S_M.
    5. If a method M is reachable, for all fields x that are written in M, add the intersection of S_M and all subtypes of the declared type of x to S_x.
  • CTA - CTA uses a distinct set variable S_C for each class C, which unifies the flow of information about both methods and fields. It is defined by adding two constraints to XTA: If a class C defines a method M, S_C = S_M and if C defines a field x, S_C = S_x. This is less precise than XTA, but has the potential to be more scalable.
  • MTA - MTA uses a set variable S_C for each class C, and S_x for every field x. MTA adds this constraint to XTA: If a class C defines a method M, S_C = S_M.
  • FTA - FTA uses a set variable S_C for each class C, and S_M for each method M. FTA adds this constraint to XTA: If a class C defines a field x, S_C = S_x.
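The per-method and per-field sets are easier to picture with a small example. The following is only a sketch of constraints 3 to 5 (object creation, field reads, field writes) on a made-up two-method program; the call-site constraint is left out entirely, every method is assumed to be reachable, and all the xta- names and the data layout are hypothetical. It assumes Emacs 25 or newer, where alist-get can be used as a place.

```elisp
(require 'cl-lib)

;; Hypothetical toy input: every method below is assumed reachable.
;; Each entry is (METHOD :new CLASSES :reads FIELDS :writes FIELDS).
(defvar xta-methods
  '((writer :new (Circle String) :reads ()      :writes (shape))
    (reader :new ()              :reads (shape) :writes ())))

;; Subtypes (including the type itself) of each field's declared type.
(defvar xta-field-subtypes
  '((shape . (Shape Circle Square))))

(defun xta-solve ()
  "Return an alist mapping each method and field to its type set."
  (let ((sets '()) (changed t))
    (cl-flet ((add (key types)
                ;; Add TYPES to the set for KEY, noting any growth.
                (dolist (type types)
                  (unless (memq type (alist-get key sets))
                    (push type (alist-get key sets))
                    (setq changed t)))))
      ;; Iterate to a fixed point: stop once no set grows any more.
      (while changed
        (setq changed nil)
        (dolist (entry xta-methods)
          (let ((method (car entry)) (body (cdr entry)))
            ;; Constraint 3: new C() inside M adds C to S_M.
            (add method (plist-get body :new))
            ;; Constraint 4: reading field x adds S_x to S_M.
            (dolist (field (plist-get body :reads))
              (add method (alist-get field sets)))
            ;; Constraint 5: writing field x adds S_M, restricted to the
            ;; subtypes of x's declared type, to S_x.
            (dolist (field (plist-get body :writes))
              (add field
                   (cl-intersection
                    (alist-get method sets)
                    (alist-get field xta-field-subtypes))))))))
    sets))
```

Calling (xta-solve) here leaves String only in the set for writer: the set for the shape field and the set for reader both end up containing just Circle. A single global set, as in RTA, would have made String visible to reader as well.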

The paper next covered some implementation issues that occurred while implementing these algorithms. The implementations use JikesBT for parsing class files. The new algorithms were added to Jax, which is an application extractor for Java. Jax uses RTA for constructing call graphs, so many of the data structures from Jax could be reused.

The algorithms were constructed in an iterative style, where types are propagated outwards from the main method. XTA has three lists associated with each method and field: one with types that have already been propagated to other components, one with current types that will be propagated in the next iteration, and one with new types that are propagated to the component in the current iteration. FTA and MTA use a shared set for all the methods and fields in a class.

Arrays are abstracted as classes with a field representing all of its elements. A method is assumed to read an element from array A if an object of type A is propagated to the method and the method contains an aaload bytecode instruction. Array writes are treated similarly. Exceptions can also cause complex type flow between methods, since several stack frames may be skipped. Instead of implementing some way to handle type propagation of exceptions, the implementation keeps one set of types that represents the type of all expressions whose static type is of type Throwable, and uses that set to resolve method calls on exception objects. This is not very precise, but the number of types involved will likely be very small. Reflective calls are dealt with by manually supplying the necessary information.

Another issue that had to be solved was the analysis of incomplete applications. Most applications use external libraries which you may not have the source for, such as the Java standard library. The solution is to have one set of objects for all external classes. This works by assuming that any library call can call any method on an object it is passed. However, this is often quite conservative, so for some heavily-called methods the analysis is coded to not add types to the global set.

Twelve benchmark programs were used to test the different algorithms. XTA greatly reduced the average number of types available in each method compared to RTA: with XTA, only 12.3% of the types are available on average. XTA averages 1.6% fewer possible method definitions than RTA, leading to more methods that can be removed. XTA also computes call graphs with an average of 7.2% fewer edges than RTA. The last comparison, the number of virtual method calls determined to be monomorphic, shows that RTA classifies 7.8% of all call sites as polymorphic and XTA classifies 7% of call sites as polymorphic. MTA and FTA both had better results than RTA, but worse than XTA.

An analysis of the running times of the various algorithms showed that RTA was the fastest, and XTA was slower than RTA by a factor of about 8.3 on average. However, there does not seem to be a correlation between program size and how much slower XTA is than RTA, implying they have the same asymptotic running time. Surprisingly, MTA and FTA were both slightly slower than XTA, although this is probably due to a poor implementation.

XTA, FTA, and MTA all produce more accurate call graphs. FTA is usually not much worse than XTA, and so if space is at a premium it may be used over XTA. XTA is considered the best algorithm for most cases. There does not seem to be a compelling reason to use MTA over RTA.

Emacs Fixes

Monday, September 28th, 2009

This will be a post in the vein of this one. I amassed enough changes to my already blogged-about improvements to Emacs to go over what I changed. When possible, I’ll link to previous blog posts about them; in some cases I have just minor additions that aren’t big enough to go in their own post.

Hooks are quite nice, but they have some problems with lambda functions. Lambdas in hooks cannot be removed easily or modified; you’ll have to modify the hook variable manually in order to remove or modify that function. After having to play around with many of my hooks, I got tired of setting hook variables to nil and re-adding everything so I changed many of my hooks that used lambdas to instead be defined as regular functions. A list of some of these is below:

(defun turn-on-visual-line-mode ()
  (interactive)
  (visual-line-mode 1))
(add-hook 'text-mode-hook 'turn-on-visual-line-mode)
 
(defun c-mode-common-hook-fn ()
  (interactive)
  (c-toggle-syntactic-indentation 1)
  (setq c-basic-offset 4)
  (c-toggle-hungry-state 1)
  (c-toggle-electric-state 1)
  (flymake-mode 1))
 
(add-hook 'c-mode-common-hook 'c-mode-common-hook-fn)
(add-hook 'c-mode-common-hook 'yas/minor-mode-on)
(add-hook 'c-mode-hook 'c-turn-on-eldoc-mode)
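As a small illustration of why named functions are easier to manage, the snippet below removes one of the functions above from its hook and adds it back. It only uses the standard remove-hook and add-hook calls and the c-mode-common-hook-fn name defined above.

```elisp
;; The hook stores the symbol, so removing the function is a one-liner
;; and the hook variable never has to be reset to nil.
(remove-hook 'c-mode-common-hook 'c-mode-common-hook-fn)

;; Adding it back is just as simple.  add-hook does nothing if the
;; symbol is already on the hook, so this is safe to evaluate repeatedly.
(add-hook 'c-mode-common-hook 'c-mode-common-hook-fn)
```

Editing and re-evaluating the defun itself needs neither call, since the hook looks the function up by name each time it runs.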

I also started interpreters for most of the languages I have customizations for, so I decided that Python should be no exception. The following code snippet will open a python interpreter as a new buffer.

(python-switch-to-python t)

I also made yet another change to ido-goto-symbol. This function will prompt the user, using ido, for a function in the current file to move to and will then move point to the definition of the function. I updated it so that if the point is on a function name, that name will appear first in the list of options (and thus be the default choice).

(require 'cl) ; flet and remove-if below come from the cl package
(defun ido-goto-symbol ()
  "Will update the imenu index and then use ido to select a symbol to navigate to"
  (interactive)
  (imenu--make-index-alist)
  (let ((name-and-pos '())
        (symbol-names '()))
    (flet ((addsymbols (symbol-list)
                       (when (listp symbol-list)
                         (dolist (symbol symbol-list)
                           (let ((name nil) (position nil))
                             (cond
                              ((and (listp symbol) (imenu--subalist-p symbol))
                               (addsymbols symbol))
                              ((listp symbol)
                               (setq name (car symbol))
                               (setq position (cdr symbol)))
                              ((stringp symbol)
                               (setq name symbol)
                               (setq position (get-text-property 1 'org-imenu-marker symbol))))
                             (unless (or (null position) (null name))
                               (add-to-list 'symbol-names name)
                               (add-to-list 'name-and-pos (cons name position))))))))
      (addsymbols imenu--index-alist)
      (let* ((symbol-at-point (symbol-name (symbol-at-point)))
             (selected-symbol (ido-completing-read
                               "Symbol? "
                               (if (member symbol-at-point symbol-names)
                                   (cons symbol-at-point (remove-if (lambda (x) (string-equal x symbol-at-point))
                                                                    symbol-names))
                                 symbol-names)))
             (position (cdr (assoc selected-symbol name-and-pos))))
        (if (markerp position)
            (goto-char position) (goto-char (overlay-start position)))))))

I talked about my methods to update dired here. In the comments was a suggestion to just use auto-revert-mode. This greatly simplifies the code for this functionality, and solves some of the issues I mentioned in that post, so I now use it instead for dired buffers.

(defun turn-on-auto-revert-mode ()
  (interactive)
  (auto-revert-mode 1))
 
(add-hook 'dired-mode-hook 'turn-on-auto-revert-mode)

ff-find-other-file will go to a ‘matching’ file in a C or C++ buffer. If point is on a #include line, it will open the file being included. Otherwise, it will go to the corresponding implementation file if you are in a header file, and to the corresponding header if you are in an implementation file. This is quite useful, so it now has its own keybinding.

(define-key c-mode-base-map (kbd "C-c o") 'ff-find-other-file)

I ended up changing the implementation of my auto-indentation functions. I moved the list of major-modes that this would be active in to a new list which both yank and yank-pop check. Also in the comments of that post was a suggestion to advise kill-line instead of replacing it, and the advice works quite well so I am now using it.

(defvar programming-major-modes
  '(emacs-lisp-mode scheme-mode lisp-mode c-mode c++-mode objc-mode latex-mode plain-tex-mode
                    java-mode)
  "List of programming modes")
 
(defadvice yank (after indent-region activate)
  (if (member major-mode programming-major-modes)
      (let ((mark-even-if-inactive t))
        (indent-region (region-beginning) (region-end) nil))))
 
(defadvice yank-pop (after indent-region activate)
  (if (member major-mode programming-major-modes)
      (let ((mark-even-if-inactive transient-mark-mode))
        (indent-region (region-beginning) (region-end) nil))))
 
(defadvice kill-line (after fixup-whitespace activate)
  "Call fixup-whitespace after killing line."
  (save-excursion
    (if (looking-back "^[ \t]*" (line-beginning-position))
        (funcall indent-line-function)
      (fixup-whitespace))))

I also added a few new keybindings. I recompile frequently, so a keybinding for it seems appropriate. So does shortening the keybinding for goto-line.

(global-set-key (kbd "<f2>") 'recompile)
(global-set-key (kbd "M-g") 'goto-line)

The last improvement I have is yet another change to imenu-java-generic-expression. I’m still waiting for the FSF to get my copyright assignment forms so that this becomes a part of Emacs proper, but until then I updated the regexp to work with functions that throw Exceptions as well.

(defun set-java-generic-expression ()
  (setq imenu-generic-expression
        `((nil
           ,(concat
             "[" c-alpha "_][\]\[." c-alnum "_<> ]+[ \t\n\r]+" ; type spec
             "\\([" c-alpha "_][" c-alnum "_]*\\)" ; method name
             "[ \t\n\r]*"
             ;; An argument list that is either empty or contains any number
             ;; of arguments.  An argument is any number of annotations
             ;; followed by a type spec followed by a word.  A word is an
             ;; identifier.  A type spec is an identifier, possibly followed
             ;; by < typespec > possibly followed by [].
             (concat "("
                     "\\("
                     "[ \t\n\r^M]*"
                     "\\("
                     "@"
                     "[" c-alpha "_]"
                     "[" c-alnum "._]""*"
                     "[ \t\n\r^M]+"
                     "\\)*"
                     "\\("
                     "[" c-alpha "_]"
                     "[\]\[" c-alnum "_.]*"
                     "\\("
 
                     "<"
                     "[ \t\n\r^M]*"
                     "[\]\[.," c-alnum "_<> \t\n\r^M]*"
                     ">"
                     "\\)?"
                     "\\(\\[\\]\\)?"
                     "[ \t\n\r^M]+"
                     "\\)"
                     "[" c-alpha "_]"
                     "[" c-alnum "_]*"
                     "[ \t\n\r^M,]*"
                     "\\)*"
                     ")"
                     "[.," c-alnum " \t\n\r^M]*"
                     "{"
                     )) 1))))
 
(add-hook 'java-mode-hook 'set-java-generic-expression)

That’s all for today - if you have any additional remarks, let me know in the comments.

Emacs-Vc

Friday, September 25th, 2009

Emacs has integrated version control support for numerous VC backends, including Git and Perforce. It provides a set of generic commands that let you interact with each version control system in the same way. I’ve been using this for VC commands that don’t involve submitting or changelists. I modify changelists through a shell buffer, and I tend to go to an external shell to do commits, as I haven’t ever gotten emacsclient working properly.

The following is just a list of some of the more useful VC functions. I don’t use all of them, but I do use a lot of them, and I should probably use the others more than I do. Many of them have keybindings, which are suggested when you run the command with M-x. Since Perforce is non-free, Emacs doesn’t come with a wrapper for it, but you can download one from here.

vc-annotate - This will display the current file in a new buffer. This buffer will have the revision each line came from, and the user who added that line. Additionally, the buffer will be color-coded according to how old lines are.

vc-annotate-find-revision-at-line - When in a buffer generated from vc-annotate, this will create a buffer containing the contents of the file at whatever revision the current line came from.

vc-annotate-show-changeset-diff-revision-at-line - When in vc-annotate, this will show the diff of the checkin that added the current line.

vc-annotate-show-diff-revision-at-line - When in vc-annotate, this will show the diff *for this particular file* of the checkin that added the current line.

vc-annotate-show-log-revision-at-line - When in vc-annotate, this will show the checkin message for the commit that added the current line.

vc-create-repo - This will prompt you for a type of repository (svn, bzr, git, etc) and create a new repository in the current directory.

vc-delete-file - This will delete a file and remove it from the repos - for example, it will issue a ‘svn rm’.

vc-diff - This will open a buffer containing a diff of the current file with the current file at last revision.

vc-dir - This will show the status of files in the prompted directory in a new buffer. It will tell you which files are unregistered, edited, or added - files in the repository that have not been changed seem to not be displayed. This is much like a ‘svn status’ command.

vc-merge - This will merge changes between revisions/branches, bound to C-x v m.

vc-next-action - This will perform what it thinks the next action should be. If in an unregistered file, it will add it to the repos. If you’ve edited the file, it will perform a checkin on that file.

vc-print-log - This will open a buffer with all the checkin comments that apply to the current file.

vc-register - This will add a file to the version control system in the directory - will perform a ‘svn add’, ‘p4 add’, etc.

vc-rename-file - This command renames a file in the version control.

vc-resolve-conflicts - vc-resolve-conflicts invokes ediff to resolve conflicts for the current file.

vc-revert - This will revert the current file, losing all changes you’ve made since the last checkin.

vc-revision-other-window - This asks for a revision and then displays that revision of the current file in another window.

vc-update - This will update your copy of the repository to the current status. Essentially, performs a ‘svn up’ or ‘p4 sync’.

vc-version-diff - This will prompt for two revisions of the current file and display the diff of those two revisions.

Closures in Java

Wednesday, September 23rd, 2009

Advanced Topics in Programming Languages: Closures for Java

This Google Tech talk was quite interesting, although a bit depressing since it seems unlikely that Java will get closures anytime soon. Still, it was a great talk on how closures could be added to Java cleanly.

The talk starts off with describing use cases for closures. The two main issues were to enable control abstraction APIs and to make API calls easier. Enabling control abstraction APIs means allowing programmers to define abstractions like the foreach loop without having to wait for them to be added to Java itself. Making API calls easier by removing the reliance on anonymous inner classes (which closures get compiled to) also looks quite nice.

The talk first presents a way of defining foreach using anonymous inner classes, and shows the problems that occur with this approach. The first is an issue with toString and lexical scoping; the talk then goes on to the problems with exceptions, return and break statements, being unable to access non-final variables, etc. Many of these problems have solutions, but solving them has nothing to do with what you are trying to express and just obscures the business logic.

A few examples of where closures would be useful are then presented. The first was a withLock construct, though the concept also translates to streams. The accepted Project Coin proposal on automatic resource management implements a subset of what you would be able to do with closures, using them to automatically close streams. Another example is performing timing on functions - without closures, this adds at least 7 lines of non-business-logic code to each method you want timing information on, whereas you could program a withTiming closure and only take two lines.

The talk outlines the requirements that any proposal for adding closures to Java must meet. First, it must be able to create control abstractions that are on par with built-in statement forms. Closures must also be concise, or there’s no advantage in using them. Lastly, they must inter-operate with existing APIs that currently take anonymous inner classes.

The details of the proposal, which were quite interesting, were then presented. This proposal adds function types to Java - for example int=>int represents a function that takes an int and returns an int. Function types are implemented as an interface generated on-demand with an ‘invoke’ method. Since they are regular interfaces, you can extend them, make them serializable, all of the things you can do with regular interfaces. The syntax for closures is:

     {int x => x + 1}

This represents a closure that takes an int and returns its increment. Unfortunately, closures do not have any way to yield a result early - returns will return from the function the closure is in, not the closure itself. These closures are made inter-operable with existing APIs through a process called ‘closure conversion’, which will make a closure auto-implement an interface such as Runnable if the argument is a Runnable. The compiler is required to enforce this, so you can pass closures to API calls requiring a Runnable and it will work as expected.

To simulate built-in control abstractions, such as withLock, you don’t want to have to pass in a closure as an argument. The proposal thus suggests that calls to functions taking a closure as the last argument can be written as

withLock( lock ) { doSomething(); };

instead of

withLock( lock, {=>doSomething();});

which is much less cumbersome.

These closures work properly - they can access local variables, throw exceptions, and are interoperable with existing APIs. You also have completion transparency, which is the ability to create functions which return the same type as a closure they take as an argument. This allows for constructs such as withLock to not need to worry about the return type of the closure; you can define one withLock function and then be able to return calls to withLock as the proper type.

The talk wraps up with a number of examples of how closures could be used, and then a link to where you could get more information on the proposal. This was a great talk, and I wish the proposal was going to be implemented in Java 7.

More Emacs Refactorings

Monday, September 21st, 2009

I ended up performing a few more refactorings to my initialization code in order to make it work a bit better, so I thought I’d share them with you. The first is adding another level of subdirectories to my customizations directory, so the tree of my .emacs.d directory in svn looks like:

~/.emacs.d/
        customizations/
                1/
                2/
        documentation/
        java-libraries/
        major-modes/
        minor-modes/
        utilities/

This allows me to have finer control over the order in which my customizations execute. Mainly, almost every file is in directory customizations/1/: the only one that is in customizations/2 is state.el, which will open all previously visited files. I want this to happen after my hooks have been added to modes so the buffers are set up how they should be; this wasn’t quite happening with one directory. This does mean my init.el file now has these two lines in it to load the two directories in the proper order:

(setq customizations-directory-load-times-1 (load-directory "~/.emacs.d/customizations/1/"))
(setq customizations-directory-load-times-2 (load-directory "~/.emacs.d/customizations/2/"))
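The load-directory helper itself is not shown in this post. As a rough stand-in, assuming it loads every .el file in a directory and returns how long each one took (a guess based only on the variable names above), something like the following would do. The name my-load-directory is hypothetical so that it does not clash with the real helper.

```elisp
;; Hypothetical stand-in for the load-directory helper used above.
(defun my-load-directory (directory)
  "Load every .el file in DIRECTORY in alphabetical order.
Return an alist of (FILE . SECONDS) recording how long each load took."
  (mapcar (lambda (file)
            (let ((start (float-time)))
              ;; NOERROR is nil so a broken file signals an error;
              ;; NOMESSAGE is t to keep startup quiet.
              (load file nil t)
              (cons file (- (float-time) start))))
          ;; FULL is t so absolute file names are returned.
          (directory-files directory t "\\.el\\'")))
```

Since directory-files returns names in sorted order, the files inside one directory load alphabetically, and calling the function once per directory gives the coarser ordering between customizations/1/ and customizations/2/.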

I also removed several directories from my version control repository, although they are still in my actual .emacs.d directory. Specifically, the slime, clojure, clojure-contrib, swank-clojure, and emacs-w3m folders are all gone from SVN. However, since I still want these accessible from wherever I check out, I added a few lines to clojure.el and w3m-cust.el to fetch these directories if they don’t exist:

(if (not (file-exists-p (concat clojure-src-root "/clojure")))
    (clojure-install))
 
(if (not (file-exists-p "~/.emacs.d/emacs-w3m/"))
    (let ((default-directory "~/.emacs.d/"))
      (shell-command-to-string
       "cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot co emacs-w3m")
      (let ((default-directory "~/.emacs.d/emacs-w3m/"))
        (add-to-list 'load-path default-directory)
        (normal-top-level-add-subdirs-to-load-path))))

Essentially, the first time I start emacs on a new checkout, clojure-install will be run, retrieving all the files I need to start editing Clojure files. The same thing happens with emacs-w3m: If the directory doesn’t exist, it is checked out and the directories are added to the load-path. This can take a while, but I’d have to check the directory out anyway, and this allows for a much easier package update. I also ended up updating clojure-mode.el itself, which is checked into version control.

C-eldoc-mode

Friday, September 18th, 2009

I’ve talked about eldoc in the past, a minor mode that displays arguments to functions in the echo area. Eldoc is mainly used for lisp and scheme files, although it also works with Python, but something fewer people realize is that there is a mode that turns eldoc on for C files.

Unfortunately, this package does not come built in to GNU Emacs. You can obtain it here and put it in your load path. It doesn’t work with C++ files; only straight C files can take advantage of it. This is better than nothing, anyway. Enabling it is quite simple: Add c-eldoc.el to your load path and add the following to your initialization file:

(require 'c-eldoc)
(add-hook 'c-mode-hook 'c-turn-on-eldoc-mode)

Flymake

Wednesday, September 16th, 2009

Flymake is an Emacs package for on-the-fly syntax checking. Many IDEs will highlight syntax errors in red without needing to compile the program, allowing you to fix errors as you are typing; Flymake implements this for Emacs. Flymake works by running an external tool (usually a compiler) and checking its output for errors to highlight. Flymake also provides tooltips over these errors to explain what went wrong.

Flymake’s customization is a huge pain and fairly incomprehensible. There’s a TODO item in Emacs for someone to fix it, if you’re interested in cleaning it up at all. Having looked at it, I’d actually rather just reimplement it than attempt to fix it so that the configuration isn’t as bad. One of the main problems with Flymake is that it relies on functions defined within itself in order to actually run the syntax check; to change how you want syntax checking to be performed, you have to load flymake and override these functions in your initialization.

For C++/Java, there were two functions that I needed to override. One was understandable; the flymake-get-make-cmdline returns the command to call ‘Make’ with. The default version of this function adds arguments to make that tended to fail, so I replaced it with a much simpler version:

(require 'flymake)
(defun flymake-get-make-cmdline (source base-dir)
  (list (car (flymake-split-string compile-command " "))
        (cdr (flymake-split-string compile-command " "))))

This version just uses the buffer-specific compile-command in order to determine how to syntax check the buffer. If I were to fix flymake, this would be one of the key changes; it would respect compile-command instead of providing its own defaults for compiling different types of buffers. There would be an option to change it, but in all cases it would default to using compile-command.

I was not entirely sure why the other function I had to override was failing. It is the function called after syntax checking is complete, and it can turn off flymake-mode if any errors are encountered in the actual command used. It was reporting that the ‘make’ command was returning error code 2 in some cases when there was no output, so I modified it slightly to accept that exit status as well.

(defun flymake-post-syntax-check (exit-status command)
  (setq flymake-err-info flymake-new-err-info)
  (setq flymake-new-err-info nil)
  (setq flymake-err-info
        (flymake-fix-line-numbers
         flymake-err-info 1 (flymake-count-lines)))
  (flymake-delete-own-overlays)
  (flymake-highlight-err-lines flymake-err-info)
  (let (err-count warn-count)
    (setq err-count (flymake-get-err-count flymake-err-info "e"))
    (setq warn-count  (flymake-get-err-count flymake-err-info "w"))
    (flymake-log 2 "%s: %d error(s), %d warning(s) in %.2f second(s)"
                 (buffer-name) err-count warn-count
                 (- (flymake-float-time) flymake-check-start-time))
    (setq flymake-check-start-time nil)
 
    (if (and (equal 0 err-count) (equal 0 warn-count))
        (if (or (equal 0 exit-status)
                (equal 2 exit-status))
            (flymake-report-status "" "")	; PASSED
          (if (not flymake-check-was-interrupted)
              (flymake-report-fatal-status "CFGERR"
                                           (format "Configuration error has occured while running %s" command))
            (flymake-report-status nil ""))) ; "STOPPED"
      (flymake-report-status (format "%d/%d" err-count warn-count) ""))))

I had one other issue with flymake; tooltips for overlays are not displayed in the echo area when point is over them. js2-mode has this functionality, and it is very useful to not have to use the mouse or compile yourself to see the exact error message for each line. I looked at how js2-mode implemented it, which turned out to be with text properties. Strings in emacs *can* have a function called when point enters them: it is only overlays that do not. js2-mode created both an overlay and modified the text properties at that location in the buffer, which is a somewhat inelegant solution. Asking on emacs-devel clued me into the help-at-pt package, which if enabled will display any tooltips for location at point in the echo area. To enable this, put the following in your initialization file:

(setq help-at-pt-timer-delay 0)
(help-at-pt-set-timer)

Help-at-pt uses the idle-timer in order to determine when to display tooltips; after help-at-pt-timer-delay seconds, it will display the tooltip at point in the echo area. A value of 0 will have the tooltip display immediately. This isn’t as elegant as giving overlays the ability to run functions when entered, but it works and is part of emacs-core.

Intro to Subversion

Monday, September 14th, 2009

Subversion is a version control system meant to improve upon CVS, not that that’s particularly difficult. It’s the one I’m currently using for my personal projects - I tried Git and didn’t like it very much, and my favorite VCS, Perforce, is unfortunately not free. This is just an introduction to basic SVN commands.

svn co - SVN checkout, or co, is what you use to first check out a repository. To use it, just do svn co LOCATION in the directory you want the repos to be checked out in. For example, to check out a new copy of my .emacs.d directory, you can just do ‘svn co http://www.nflath.com/svn/.emacs.d’. You can use this to specify which revision you want to check out, as well.

svn add - To add a file to a repository, just do ‘svn add filename’. This lets svn know that it should start tracking changes to this file.

svn mv - This will move a file in your repository. You could instead move the file with shell commands and then do an ‘svn rm’ and an ‘svn add’, but ‘svn mv’ keeps the history of the file from before you renamed it, which adding and removing files does not.

svn cp - This acts just as ‘cp’ does in a shell, but also adds the copy to SVN’s tracking. If you need to copy a file that is already in the repository and have the copy tracked as well, use this instead of ‘cp’.

svn rm - This acts just as ‘rm’ does in a shell, but also schedules the file for removal from the repository. If you need to remove a file in the repository, use this instead of ‘rm’.

svn cl - This command creates a Subversion changelist. While these aren’t as good as Perforce changelists, they are still quite useful if you only want to check in a subset of what you are working on. A changelist is a subset of your modified files that you can check in as a group, without checking in anything else. This is what I use when I have distinct sets of changes that I have been working on simultaneously and want to check in with different commits. ‘svn cl CLNAME FILE’ adds a file to a changelist, and ‘svn cl --remove FILE’ removes it again.

svn diff - svn diff lets you diff either your working copy of a file against a version in the repository, or two different revisions of a file. Use it when you want to see what you or someone else changed in a file.

svn ci - This is what you use when you are finished editing and want to make a checkin. This will open up an editor with the files you are about to check in, and prompt for a log message. ‘svn ci --cl CLNAME’ will perform a checkin on the files in CLNAME. Unfortunately, unlike in Perforce, deleting files from the list of files to check in doesn’t remove them from the checkin itself; I’ve been bitten by this quite a bit.

svn revert - This command is the standard version-control revert command. This will undo any changes to the specified file since the last commit.

svn up - This command will update your working copy to sync with the server copy, retrieving updates made in another location.

There are of course many more SVN commands, but these are the ones I use most frequently.

Perforce

Friday, September 11th, 2009

My work uses perforce for source control, so I ended up having to install perforce integration for emacs. I’m pretty OK with this, since Perforce is so far my favorite version control system. I would like it if perforce allowed at least some functionality when not connected to the network, but I really prefer the interface of p4 to git or svn. As it is, Perforce is required to be able to connect to the central repository in order to do anything.

If you are using perforce, it’s probably for work, in which case they’ll probably have a guide on setting up your client. I don’t actually know a huge amount about setting perforce up: the places I’ve worked that use it have had it pre-set-up, so creating a repository and connecting to it may be ridiculously hard, but using it is quite easy.

Perforce has a very large number of commands; to see the full list, you can type ‘p4 help commands’. Fortunately, you won’t ever need to use most of those. There’s a pretty core group of about ten commands that I end up using frequently, which I’ll describe in a minute; the rest you can look up as you need to find out how to do something. Perforce also has a graphical interface you can use if you prefer that, but I find the command-line much easier. The commands I tend to use most are:

p4 add - This command will add a file to the repository. Before you add files, the version control system has no notion of them, so make sure to add all necessary files before committing.

p4 change - This command allows you to create a changelist. A changelist is a collection of files that you can commit together - basically it allows you to create groups of files that can be operated on. When you execute ‘p4 change’, a screen much like the ‘p4 submit’ screen will pop up, and you can edit the description and delete the lines corresponding to files you want left out of the changelist that will be created.

p4 delete - This command deletes a file from the repository. Unlike p4 obliterate, history is kept of this file, but you do need to actually ‘p4 delete’ files that should not be checked out instead of just ‘rm’ing them.

p4 diff - p4 diff allows you to diff either a file on your workspace against a revision of that file in the repository, or two different repository revisions.

p4 edit - p4 edit will open up a file that currently exists in the repository for editing. The next time you do a p4 submit, the edited file will be in the list of files to submit.

p4 help - Perforce’s help command will let you know more about the system and commands. If you want more info about a command, ‘p4 help COMMAND’ will give you the documentation for that command. ‘p4 help commands’ lists every Perforce command. Other than that, you can just browse through the help categories by reading the options under ‘p4 help’.

p4 move - This command moves a file that is opened either for edit or add in your repository. ‘p4 rename’ will describe how to use this function.

p4 opened - This command will list every file you currently have opened. Similar to ‘svn status’ for Subversion, you can use it to ensure that every file has been added to the repository before committing.

p4 resolve - This command is used to resolve conflicts that occur when syncing. Resolving is always annoying, but this does what it can in order to be able to resolve conflicts. You can edit the file to remove them, accept either version of the file, and other options that are described when you use this command.

p4 revert - The standard version-control ‘revert’ command, this will undo changes to one or many files.

p4 submit - This command is used to submit your changes to the repository. When you type submit, you’ll receive a list of opened files that will be committed. Unlike SVN, deleting files from this list will prevent them from being committed. You can then type in a description of changes and exit, and the commit will be pushed to the repository. You can also submit a specific changelist with ‘p4 submit -c CHANGELIST’. This will submit the changelist with the description of the changelist as the description of the commit.

p4 sync - The equivalent of ‘svn up’, this will pull all the changes made to the repository since you last synced. This can require doing some resolving, which is done with ‘p4 resolve’. This can be a pain, but if you sync frequently you shouldn’t get very many conflicts.

Of course, to be maximally efficient you want to use Perforce directly from your editor. Vim, Emacs, and Eclipse all have ways to do this. In Emacs, you need to find an external elisp file for this. There are two main options for using Perforce in Emacs: p4.el and vc-p4.el. p4.el is more powerful - for example, it can operate on groups of files - but vc-p4.el contains hooks for Emacs’ usual source control actions, like vc-next-action. The two are compatible with each other, and you can use vc-p4 for most things while dipping into p4 for the difficult operations. Personally, I only use p4.el, since I don’t like Emacs’ version control integration too much for reasons I’ll talk about in another post, and so the setup for me is simple. You download p4.el from here and put it in your load path, and then add the following to your initialization file:

(require 'p4)

All of the commands that you can use in this package are prefixed by p4, and are usually just p4-commandname. For example, to do a submit from Emacs is just ‘p4-submit’. Also, most of the commands have keybindings that are ‘C-x p ‘ followed by the first letter of the name - so to add a file you would do ‘C-x p a’, edit is ‘C-x p e’, etcetera. There are a few exceptions, such as ‘p4-diff’ being ‘C-x p =’, but for the most part the keybindings are quite logical. You can look up these keybindings by doing M-x describe-function on the function you want, which tells you what the function does and any keybindings it has.
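
If you want your own shortcuts in addition to the ‘C-x p’ ones, something like the following should work. This is only a sketch: the ~/.emacs.d/lisp/ directory and the two key choices are my own assumptions, not anything p4.el requires, and it assumes your copy of p4.el defines p4-edit and p4-opened under those names.

```elisp
;; Assumption: p4.el was saved in ~/.emacs.d/lisp/ (a hypothetical location).
(add-to-list 'load-path "~/.emacs.d/lisp/")
(require 'p4)

;; Hypothetical extra bindings. C-c followed by a letter is reserved for
;; users, so these should not collide with the bindings p4.el sets up.
(global-set-key (kbd "C-c e") 'p4-edit)    ; open the current file for edit
(global-set-key (kbd "C-c o") 'p4-opened)  ; list the files you have opened
```

You can check that a binding took effect with C-h k followed by the key sequence.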

I haven’t installed vc-p4.el, but you can look at the emacs-wiki page here to see how to set it up.

js2-mode

Wednesday, September 9th, 2009

I’ve started having to do some javascript editing with Emacs for work, and so I installed Steve Yegge’s js2-mode. You can read his description of the mode here. His description is fairly extensive, so you should probably read it instead of or in addition to mine. The small amount I’ve used it has made me like it more than most programming modes Emacs has.

js2-mode, unlike most other modes, comes with a JavaScript parser. This allows it to know much more about the code, so it can pull Eclipse-like tricks of highlighting errors and warnings in your code. Flymake-mode does this for other languages, but it’s not nearly as well integrated as js2-mode’s system. If you have an error, the code causing the error is underlined in red with a message in the minibuffer when your point is over it; if a warning, underlined in orange. This is incredibly useful.

js2’s indentation is more like python-mode’s indentation than cc-mode. It isn’t customizable, which hasn’t so far come up as a problem for me, but if you have very strict code style guidelines you may have to do some indenting manually. It provides several options for cycling through possible indentations, and one of them is usually right. Still, it can be somewhat irritating that it doesn’t just indent correctly all the time.

js2 does a few other things that are quite nice. Pressing enter inside a string will extend it to the next line, which is pretty much always what I want to do. I haven’t used it very much - I’ve mostly just been doing very light editing with it - but it is very customizable through the customize mechanism. It also supports hiding of blocks, which I don’t tend to use, but to each their own.

js2-mode is going to be built into GNU Emacs 23.2, but if you don’t want to wait until then to use it you have to download and enable it yourself. To enable js2-mode in your Emacs, first you must download and byte-compile js2.el from here. If you don’t byte-compile the file, you’ll have to set js2-mode-must-byte-compile to nil in order for it to work at all, but byte-compiling the file will make it work much faster. Once that is done, put the following lines in your initialization file to start js2-mode for all .js files:

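;; Load js2.el lazily, the first time js2-mode is called.
(autoload 'js2-mode "js2" nil t)
;; \\' matches the end of the file name, so only names ending in .js match.
(add-to-list 'auto-mode-alist '("\\.js\\'" . js2-mode))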
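
If you would rather skip the byte-compile step, the setup might look like the following instead. This is a sketch under a couple of assumptions: js2.el is already somewhere on your load-path, and your copy of js2.el still checks the js2-mode-must-byte-compile variable mentioned above.

```elisp
;; Assumption: js2.el is on the load-path but has NOT been byte-compiled.
;; Set this before js2.el is loaded so the byte-compile check is skipped.
(setq js2-mode-must-byte-compile nil)

;; Load js2.el lazily, the first time js2-mode is called.
(autoload 'js2-mode "js2" nil t)

;; Use js2-mode for every file whose name ends in .js.
(add-to-list 'auto-mode-alist '("\\.js\\'" . js2-mode))
```

Expect the mode to run noticeably slower this way, so it is probably only worth doing while you are trying it out.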