Archive for July, 2009

YASnippet - Snippets for Emacs

Saturday, July 18th, 2009

I added another package back to my emacs configuration: yasnippet. Yasnippet is supposed to be based off of TextMate’s snippet system, which I have never used but have heard it is quite good. Yasnippet allows you to define your own ’snippets’ which are inserted into your file when you type a certain phrase and hit the trigger key, which is defaulted to tab. For example, in c, if you type ‘if‘, this gets expanded to

if (__)
{
 
}

Where your point is on the _. Once you finish inserting the condition, you can press TAB again to rotate to the next spot defined in the snippet - in this case, the body of the if statement.

These snippets are all kept in a directory as text files, in a folder heirarchy of modes. For example, to modify that if snippet, you would modify the “snippets/text-mode/cc-mode/if” file in your snippets folder. If you just open it up, you will see:

#name : if (...) { ... }
# --
if (${1:condition})
{
    $0
}

The #’s are just comments, and the rest of the snippet will be inserted, with the $’s being the location where the TAB goes to, with it going from $1 to $2 to … to $0. This snippet will be called by the trigger of ‘if’, which is the file name. For more information on editing snippets, see this.

To add yasnippet to your emacs startup, add the following to your init file:

(require 'yasnippet)
(yas/initialize)
(yas/load-directory "~/.emacs.d/yasnippet-0.5.10/snippets")

Where in the call to yas/load-directory you put the location of your snippets folder.

AVL Tree Implementation in Clojure

Friday, July 17th, 2009

To learn Clojure (and data structures) better, I decided to write an AVL tree implementation in Clojure. It was complicated enough that I ran into problems with Clojure, while still being doable in about a day, so it was about the right size for what I wanted to do.

An AVL tree is a balanced binary search tree, optimized for fast lookups. It’s much like a Red-Black tree, except it has faster asymptotic lookup times. At any node, the difference between the height of the node’s subtrees can be at most 1 - if this invariant is not true, the tree is not a valid AVL tree. The implementation I wrote is entirely immutable: It doesn’t ever change the data on a node, just creates a new one. This is great for concurrency, since there aren’t ever any locking issues on immutable data.

(import '(java.util Random))
(use 'clojure.contrib.test-is)

These are just the imports needed to implement the AVL Tree. The only Java library you need if for generating random numbers, and Clojure’s test-is framework was what I used for unit-testing some of the smaller methods.

(defstruct avl-tree :data :height :left :right)

Defining an avl-tree structure. Ended up only using it it one place and just overwriting attributes with assoc usually, but this also serves as documentation of the structure.

(defn get-height
  "Returns the height of an AVL tree, or -1 if nil"
  ([tree] (if tree (tree :height) -1))
  {:test (fn []
	   (is (= (get-height {:height 0}) 0))
	   (is (= (get-height nil) -1)))})

This gets the height of an AVL tree, or just -1 if the tree doesn’t exist. This is very useful, since it allows treating empty-subtrees the same as actual nodes for purposes of height calculations. If we just attempted to do (tree :height), a null pointer exception could be thrown, instead of returning -1.

(defn balance-factor
  "Returns the height of the right subtree - height of left subtree"
  ([tree] (- (get-height (tree :right)) (get-height (tree :left))))
  {:test (fn []
	   (is (balance-factor {:right {:height 3} :left {:height 5}}) -2))})

This just returns the balance factor, which the height of the right subtree minus the height of the left subtree. This is less than 2 for all nodes in a AVL tree.

(defn rrotate [tree]
  "Performs a right rotation on an AVL tree"
  (let [right-height (inc (max (get-height (tree :right))
       		     	       (get-height ((tree :left) :right))))]
    (assoc (tree :left)
      :height (inc (max right-height (get-height ((tree :left) :left))))
      :right (assoc tree
	       :height right-height
	       :left (if (tree :left) ((tree :left) :right) nil)))))
 
(defn lrotate [tree]
  "Performs a left rotation on an AVL tree"
  (let [left-height (inc (max (get-height (tree :left))
       		    	      (get-height ((tree :right) :left))))]
    (assoc (tree :right)
      :height (inc (max left-height (get-height ((tree :right) :right))))
      :left (assoc tree
	      :height left-height
	      :right (if (tree :right) ((tree :right) :left) nil)))))

These methods perform left and right rotations on AVL trees - they also work for any type of binary tree, but for those they will append extra information(probably incorrect) about the height. Tree rotations are used in AVL and Red-Black trees to maintain the invariants. They are operations that can be performed that rearrange the nodes in a binary search tree in a way that the tree is still valid, essentially by ‘rotating’ about a pivot node. For more information about these, look here.

(defn balance [tree]
  "Return a balanced version of the AVL tree"
  (if tree
    (if (< (Math/abs (balance-factor tree)) 2)
      tree
      (cond
	(= (balance-factor tree) 2) (if (>= (balance-factor (tree :right)) 1)
				      (lrotate tree)
				      (lrotate (assoc tree :right (rrotate (tree :right)))))
	(= (balance-factor tree) -2) (if (<= (balance-factor (tree :left)) -1)
				       (rrotate tree)
				       (rrotate (assoc tree :left (lrotate (tree :left)))))))))

This method balances an unbalanced tree, which can arise after insertion or deletion of a node from an AVL tree. Since the tree was balanced before the node was inserted or removed, the maximum difference in subtree heights is 2, and if the balance factor for a node is either 2 or -2 you must rebalance the tree by performing rotations based on the balance factor of the subtrees. If the balance factor of the current node is 1, 0, or -1, the node is a valid AVL tree and does not need to be rebalanced. However, nodes closer up the root may be invalid AVL trees and so need to be rebalanced.

(defn predecessor [tree]
  (#(if % (if (% :right) (recur (% :right)) %)) (tree :left)))

This finds the predecessor of a tree, or the largest node smaller than the current node. This makes use of the # reader macro, which creates an anonymous function, and %, which stands as the first argument to the function in one of these anonymous functions. It also uses the recur form, used to specify that this recursive call should be tail-call optimized. The recur is mainly used in this context to recursively call the anonymous function; the tree would have to be very large for this to throw a StackOverflowException.

(defn tree-lookup
  "Returns the data from the tree corresponding to val if it exists, otherwise nil"
  ([tree val < >]
     (if tree
       (cond
	 (< (tree :data) val) (tree-lookup (tree :right) val < >)
	 (> (tree :data) val) (tree-lookup (tree :left) val < >)
	 true (tree :data))
       nil))
  ([tree val] (tree-lookup tree val < >)))

This is the standard binary search tree lookup function, generalized so that custom comparators can be passed in. It works on any type of BST, assuming they are implemented as maps with a :left and :right field. I also like how clojure and other lisps allows you to define operators such as < and >, making for much cleaner-looking code. This function has multiple different ways to call it: you can call with the specifying all of [tree val < >], where tree is the tree to look in and val is the value to look for, or just [tree val], where < and > will default to numeric < and >.

(defn avl-insert
  "Inserts a new node into an AVL tree"
  ([tree val < >]
     (if tree
       (balance
	(cond
	  (> (tree :data) val)
	  (let [left (avl-insert (tree :left) val < >)]
	    (assoc tree
	      :height (inc (max (get-height left) (get-height (tree :right))))
	      :left left))
	  (< (tree :data) val)
	  (let [right (avl-insert (tree :right) val < >)]
	    (assoc tree
	      :height (inc (max (get-height (tree :left)) (get-height right)))
	      :right right))
	  true (assoc tree :data val)))
       (struct avl-tree val 0)))
  ([tree val] (avl-insert tree val < >)))

This is the function for inserting a node into an AVL tree. It also allows you to specify comparators - I’d like to be able to specify them on the AVL tree itself somehow so there was less chance of mistakes by passing in different < and > functions, but I couldn’t figure out a way to do it without having to have the comparators specified whenever you created a new node anyway, which still leaves the problem. If the tree you pass in is nil, it just creates a new avl-tree with that value and returns it; otherwise, it does a standard BST insert and calls balance on each subtree it visited. While balance doesn’t need to be called on every one, the calls to balance that are unnecessary will do no work anyway and so probably aren’t much of a performance hit.

 (defn avl-remove
   "Removes a node from an AVL tree"
   ([tree val < >]
      (if tree
	(balance
	 (cond
	   (< (tree :data) val) (let [right (avl-remove (tree :right) val)]
				  (assoc tree
				    :right right
				    :height (inc (max (get-height (tree :left))
						      (get-height right)))))
	   (> (tree :data) val) (let [left (avl-remove (tree :left) val)]
				  (assoc tree
				    :left left
				    :height (inc (max (get-height (tree :right))
						      (get-height left)))))
	   true (if (not (= (tree :height) 0))
		  (let [new-tree
			(if (predecessor tree)
			  (let [left (avl-remove (tree :left) ((predecessor tree) :data))]
			    (assoc tree
			      :data ((predecessor tree) :data)
			      :height (inc (max (get-height (tree :right))
						(get-height left)))
			      :left left))
			  (tree :right))]
		    (assoc new-tree
		      :height (inc (max (get-height (new-tree :left))
					(get-height (new-tree :right)))))))))))
   ([tree val] (avl-remove tree val < >)))

This is the function to call to remove a node from an AVL tree, and it is the most complicated function is the set. To remove it, you descend down the subtree until you find the node you wish to remove. If it is a leaf node, you can just remove it. If the left subtree is nil, the node is just replaced with the right subtree of the node. Otherwise, the node is given the data of it’s predecessor and then the predecessor is removed from the left subtree. This is all done immutably, of course. Once this new node is created, the tree is balanced at each node leading up from where the node was finally removed from to the root node, ending with a balanced AVL tree.

(defn assert-correct-heights [tree]
  (if tree
    (do
      (assert (< (balance-factor tree) 2))
      (assert (.equals (tree :height) (inc (max
					    (assert-correct-heights (tree :left))
					    (assert-correct-heights (tree :right))))))
      (tree :height))
    -1))

This is a function I used to test an AVL tree. It will go through the tree, make sure that all the heights are correct and that the tree is balanced at every node. If this is the case, then the tree is a valid AVL tree and it returns the height. If it is not, an exception is thrown.

(defn test-all []
  (dotimes [x 100]
    (def tree (ref (avl-insert nil 50)))
    (let [rand (Random.)]
      (dotimes [x 100]
	(dosync
	 (alter tree avl-insert (.nextInt rand 1000))))
      (assert-correct-heights tree)
      (def tree (deref tree))
      (dotimes [x 1000]
	(let [t (avl-remove tree x)]
	  (assert-correct-heights t)
	  (assert (not (tree-lookup t x)))))
      (def tree (ref tree))
      (dotimes [x 1000]
	(dosync
	 (alter tree avl-remove x)
	 (assert (not (tree-lookup (deref tree) x)))
	 (assert-correct-heights (deref tree))
	 )))))

This test puts the AVL tree through it’s paces. It creates a reference to an AVL tree so I could conveniently insert items into it. The tree is initially an already-inserted AVL tree because, unfortunately, (ref nil) is not to nil and so was giving null pointer exceptions on the first insert. The tree then has 100 random elements inserted into it, using dosync and alter to mutate tree. Once these insertions are done, this tree is checked to ensure it is a valid AVL tree. The tree then has each of it’s elements removed from it non mutably (so each node is removed once, but this does not effect the next removal which removes from the entire tree), which in a tree of this size will probably test each possible removal scenario. Each time a node is removed, the tree is checked for validity. After this, each node is removed and the tree is mutated one at a time randomly, leaving a nil tree at the end. At each intermediate step, the tree is checked for validity. This entire process is repeated 100 times. This test can take a while to finish, but once it’s done you can be pretty sure the tree works.

Imenu

Thursday, July 16th, 2009

Imenu is a built-in emacs package for navigating around source files. It is a feature that allows you to jump around by scanning on predefined regex’s - mostly to variable of function declarations. Imenu is very useful for quickly navigating around a source file - I recommend you start using it whenever you need to navigate to a different function in the same file. The default for imenu is to use emac’s built in completion for variables; however, I much prefer using ido’s completion, and for that you need the following commands and function in your initialization file:

(require 'imenu)

This just allows you to use the imenu commands.

(defun ido-goto-symbol ()
  "Will update the imenu index and then use ido to select a symbol to navigate to"
  (interactive)
  (imenu--make-index-alist)
  (let ((name-and-pos '())
	(symbol-names '()))
    (flet ((addsymbols (symbol-list)
		       (when (listp symbol-list)
			 (dolist (symbol symbol-list)
			   (let ((name nil) (position nil))
			     (cond
			      ((and (listp symbol) (imenu--subalist-p symbol))
			       (addsymbols symbol))
			      ((listp symbol)
			       (setq name (car symbol))
			       (setq position (cdr symbol)))
			      ((stringp symbol)
			       (setq name symbol)
			       (setq position (get-text-property 1 'org-imenu-marker symbol))))
			     (unless (or (null position) (null name))
			       (add-to-list 'symbol-names name)
			       (add-to-list 'name-and-pos (cons name position))))))))
      (addsymbols imenu--index-alist))
    (let* ((selected-symbol (ido-completing-read "Symbol? " symbol-names))
	   (position (cdr (assoc selected-symbol name-and-pos))))
      (if (markerp position)
	  (goto-char position) (goto-char (overlay-start position))))))

This is the function that uses ido for imenu. Since imenu occasionally has multiple levels - you select whether you want functions or variables, etc - this loops through all those sublists and adds all possibilities to a list. Once the list is compiled, it prompts you using ido for which definition you want to go to.

(setq imenu-auto-rescan t)

This just ensures that whenever you call imenu or the new function ido-goto-symbol you get the latest information, and not only functions which may have been cached.

(global-set-key (kbd "M-s") 'ido-goto-symbol)

I use this function quite frequently, so I rebound it to M-s. M-s is usually a prefix key for a few commands, but the only one I use that this interferes with is occur, but I use that rarely enough that I’m fine with just having to M-x occur. I use ido-goto-symbol enough that I really want it to be as few key presses as possible, so using something like M-s M-s would be pretty irritating.

Booklist Generation in Clojure

Tuesday, July 14th, 2009

I recently decided to learn Clojure, a Lisp that runs on the JVM. Since it does, it can take advantage of all of Java’s libraries. I’ve programmed in both Common Lisp and Scheme before, and so it wasn’t very difficult to pick up, but it may be harder for someone who hasn’t used a Lisp before.

The best way to learn a language is to use it, so I decided to create it to generate the booklist that is now on my sidebar. This was a very simple task, so it only took me about an hour or so while learning Clojure. The code and my explanations of it and some issues I ran into follow: If you know any ways to improve it, please let me know.

 (def book-list
  {:categories
   [
    {:name "Nonfiction - Technical"
     :books [
	     {:title "Hacker's Delight"					:asin "0201914654"}
	     {:title "The Craft of Text Editing"			:asin "0387976167" :blog "http://nflath.com/2009/04/the-craft-of-text-editing"}
	     {:title "On Lisp"						:asin "0130305529"}
	     {:title "Practical Common Lisp"				:asin "1590592395"}
	     {:title "Java Puzzlers"					:asin "032133678X"}
	     {:title "The Little Schemer"				:asin "0262560992"}
	     {:title "The Reasoned Schemer"				:asin "0262562146"}
	     {:title "Beautiful Security"				:asin "0596527489"}
	     {:title "Beautiful Architecture"				:asin "059651798X"}
	     {:title "A Little Java, A Few Patterns"			:asin "0262561158"}
	     {:title "The Algorithm Design Manual"			:asin "1848000693"}
	     {:title "Programming Language Concepts and Paradigms"	:asin "0137288662"}
	     ]}
 
    {:name "Nonfiction" :books nil
     }
 
    {:name "Fiction"
     :books [
	     {:title "Sensei"						:asin "0451411323"}
	     {:title "Angels and Demons"				:asin "1416580824"}
	     {:title "Belgarath the Sorcerer"				:asin "0345403959"}
	     ]}
    ]})

This data structure represents the booklist. It is a nested set that contains all the information about the organizational structure of the books. Clojure (or any lisp) allows you so record your data in ways that are both human-readable and trivially computer-readable. All operations to generate the HTML for the booklist page operate on this data structure.

(defmacro str-concat [& body]
  "Concatenates a list of sequences as a string"
  `(apply str (concat ~@body)))

Unfortunately, the concat method will not work very well on strings. Since concat works on sequences, one of which are strings, if you just concat a list of strings the result will be a sequence of characters, so you have to wrap it in an (apply str …) to get the value as an actual string. I created this as a macro instead of a function because the laziness of Clojure sequences was giving me problems when printing the final result out.

(def intro-string "These are a few of the books I've read - It is incomplete, but I'll try to keep it up to date and blog about any additional technical books I finish reading.<br>")
(def header-link-format-string "<a href="#%s">%s</a><br>n")
(def header-format-string "<h2><a name="%s">%s</a></h2>" )
(def book-link-format-string "<a href="http://www.amazon.com/gp/product/%s?ie=UTF8&tag=randmusiofaso-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=%s">%s</a><img src="http://www.assoc-amazon.com/e/ir?t=randmusiofaso-20&l=as2&o=1&a=%s" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />")
(def blog-link-format-string "<t><t><a href="%s">Blog</a><br>n")

These are just a few configuration variables that determine how the list is laid out. They ensure that links are to the correct anchor and that books are linked to Amazon and any blog posts properly, without me having to manually add the link in for each book.

(defn generate-header [book-list]
  "Generates the header of the book-list"
  (str-concat
   (for [category (book-list :categories)]
     (format header-link-format-string (category :name) (category :name)))))

This just loops over the categories in the booklist and creates a HTML link to what will be the main entry. The for macro returns a list of strings, so we have to str-concat them to get our final string. Unfortunately, format can’t refer to one argument multiple times(not that I saw, anyway - if anyone knows how to do this please tell me!), so I have to pass in the category name twice.

(defn generate-body [book-list]
  "Generates the body of the book-list"
  (str-concat
   (for [category (book-list :categories)]
     (str-concat
      (format header-format-string (category :name) (category :name))
      "<table>"
      (str-concat
	(for [book (category :books)]
	  (str-concat
	    "<tr><td>"
	    (format book-link-format-string (book :asin) (book :asin) (book :title) (book :asin))
	    "</td><td>"
	    (if (book :blog) (format blog-link-format-string (book :blog)) "<br>")
	    "</td></tr>n"
	    )))
      "</table>n"
      ))))

This is similar to the previous function, but more complicated. It generates the headings for each section, then loops through each book in each category and outputs the formatting required to link them to amazon and any blog posts I’ve made. The lazy sequences were very annoying here, forcing me to make four nested str-concats in order to get the output I needed when printed instead of just referring to lazy sequences.

(defn print-html []
  "Prints the HTML for the book-list page"
  (println intro-string)
  (println (str-concat (generate-header book-list) (generate-body book-list))))

This function essentially just prints the introduction, the header, and the body of the list. Nothing complicated is happening here.

(print-html)

This just calls the previously-defined function, giving me the HTML I need to paste into WordPress to generate a new page.

I’m working on another Clojure projects, so I’ll post more of my thoughts about the language once I finish it.

More Random Emacs Config

Saturday, July 11th, 2009

Some more additions to my .emacs file (actually, my init.el file now. I decided it was easier to be able to add my entire .emacs.d directory to source control, and keep all of my customizations in it). Most of these were from my old config files - It turns out there’s a reason I had most of that stuff. I’ll just keep re-adding stuff from them if I need it, it turns out I had a fairly comprehensible structure that allows me to easily find specific customizations I had.

(require 'ansi-color) (add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)
(setq comint-prompt-read-only t)
(shell)

A few more customizations to my shell configuration. The first line adds colours, as well as fixing the issues I mentioned last time. I had this in my old .emacs file, but did not realize it was what fixed shell-mode to be readable - part of the reason I started over I suppose, to better understand all of my customizations. The last line just starts a shell buffer - since I primarily use the shell from emacs, I always open one anyway. This just saves me from doing it manually.

(setq default-major-mode 'text-mode)
(add-hook 'text-mode-hook 'flyspell-mode)
(add-hook 'text-mode-hook 'visual-line-mode)
(defadvice ispell-command-loop (before ispell-reverse-miss-list activate)
  "reverse the first argument to ispell-command-loop"
  (ad-set-arg 0 (reverse (ad-get-arg 0))))

I added a few things to make writing this blog and other text documents easier on myself. There are pretty much no downsides to setting the default major mode to text instead of fundamental. While in text mode, I also want spellchecking and line wrapping based on my frame size. The advice is something I picked up from the internet somewhere(actually, after investigating, from here), which reverses the order in which ispell makes suggestions. This tends to offer me better suggestions than just the normal order, for whatever reason.

(require 'uniquify)
(setq uniquify-buffer-name-style 'reverse)
(setq uniquify-separator "/")
(setq uniquify-after-kill-buffer-p t)
(setq uniquify-ignore-buffers-re "^\\*")

Uniquify is an built-in emacs package that renames buffers when you open two files with the same name. In the default case, they will merely be numbered, so for example you would have a .emacs buffer and an .emacs<2> buffer. Uniquify will append part of the file path to the file, so instead you’d have .emacs/nflath and .emacs/emacs buffers. uniquify-buffer-name-style and uniquify-separator control how the buffer names are created; describe-variable uniquify-buffer-name-style in order to see all the possible options. uniquify-after-kill-buffer-p controls whether or not buffers are renamed when other buffers were killed: When set to t, when one of the about duplicate .emacs buffers is closed, the other will rename itself to .emacs. The last line specifies not to perform this for special buffers, such as *scratch*

(defun insert-strong-tags ()
  "Inserts <strong></strong> around the word at point"
  (interactive)
  (save-excursion
    (re-search-backward "\\s ")
    (forward-char 1)
    (insert "<strong>")
    (forward-char 1)
    (re-search-forward "\\s \\|$")
    (if (not (eq (point) (point-max)))
	(backward-char 1))
    (insert "</strong>")))
(global-set-key (kbd "C-c i s") 'insert-strong-tags)

I actually wrote this function while writing this blog post. This is one of the great things about emacs: You can immediately solve inefficiencies once you notice them. This function will insert strong tags around the word you are currently on, leaving the point at the same location. This is great when you forget to add an opening tag to the beginning of the word, or are noticing an error you made afterwords and just need to add the tags.

(global-set-key (kbd "C-a") 'back-to-indentation)

I used to have C-a bound to switch between going to the beginning of the line and back to the indentation, but I realized I pretty much never wanted to go to the actual beginning of a line. I rebound C-a to what usually M-m because it’s much easier to type.

(global-set-key (kbd "C-c j") 'join-line)

A useful function that doesn’t have a default keybinding, so I provided one

(add-hook 'before-save-hook 'delete-trailing-whitespace)
(set-face-attribute 'default nil :height 80)
(column-number-mode)

The first of these customizations will delete trailing whitespace from files every time I save. I really, really hate it when I press C-e and am not at what I want to edit, and I haven’t ever encountered a situation where I need to keep trailing whitespace, so I just automate the process of removing it. The second line just sets the size of the font; you should customize this to your liking, depending on your monitor’s resolution. On some versions and monitors, I’ve had to have this be at 90; others I could set the size to 60 without any problems. column-number-mode puts the current column number on the mode line, which I like to know.

I have made a few more customizations, but they’re big enough to warrant separate posts.

Clojure Setup for Emacs

Thursday, July 9th, 2009

I’ve recently started learning Clojure, and so of course the first thing I did was set up emacs to properly be able to format and code in it. To do this, you should first download clojure-mode. Now, manually load the file, with M-x load-file, and then perform M-x clojure-install. This only needs to be done once. The clojure-install function will install all of the necessary dependencies, including slime integration for clojure. Once this is set, I have the following to configure clojure-mode:

(require 'clojure-mode)
(add-to-list 'auto-mode-alist '("\\.clj$" . clojure-mode))
(setq clojure-src-root "/home/nflath/.emacs.d")
(setq swank-clojure-extra-classpaths '())
(clojure-slime-config)

This is admittedly a fairly minimal install. It requires clojure-mode to be installed and then sets clojure-mode to be used on all .clj files. You also have to specify the directory you specified with clojure-install as clojure-src-root. If you have any extra jar files or anything that you want Slime to know about, you need to add them to swank-clojure-extra-classpaths, and then clojure-slime-config will enable slime. Once this is done, M-x slime will load up Slime using clojure, acting as a Clojure repl. You can also send expressions in your .clj files to the repl to be evaluated with C-x C-e and all the usual emacs-lisp evaluation functions.

One note: learning the S-exp movement commands will help tremendously for editing clojure. I have only recently been using them, and they have sped up moving around the expressions by a huge amount. C-M-f is forward-sexp, which will take you to the next s-exp inside the current one, and C-M-b is backwards-sexp, which will do the opposite.

Accessing Javadoc via Emacs

Tuesday, July 7th, 2009

Since I do a lot of programming in Java, both for classes and my internships(so far, two have been primarily Java), I often need to look up javadoc. However, most of the time this information has been easily accessible, either through JDE or Eclipse. However, I recently started learning Clojure(I’ll cover my clojure setup in my next blog about emacs), and so neither of these tools were available to me. Searching for help online, I found the javadoc-help package.

Unfortunately, after trying it, I didn’t end up liking it very much. The search it uses, which pulls javadoc for each class that has as a substring the class you input, is far too wide. In most cases, I want an exact match on the class name I’m given - If I search for information on File, I don’t want to have to choose between it and FileFilter. Since this package didn’t work out, I wrote my own quick search function:

(defun java-describe (symbol-name)
  (interactive "MJava Class:")
  (let ((my-string (replace-regexp-in-string 
		    "^.*class-use/.*n" 
		    ""
		    (shell-command-to-string
		     (concat "find ~/.emacs.d/documentation/java/docs/api/ -name "
		     	     symbol-name
			     ".html"))
		   )))
    (string-match "^\(.*\)$" my-string)
    (browse-url (match-string 0 my-string))))
(global-set-key (kbd "C-h j") 'java-describe)

This function just finds the javadoc corresponding to the exact class name you specify, though you don’t specify package. If there are more than one class with the same name in different packages, it’ll choose one to show; I haven’t had an issue with this yet, although once it comes up I’ll probably allow adding part of the package name as well. For this to work, you need to the java API specification downloaded and extracted in the directory pointed to by this function - I just keep mine in ~/.emacs.d/documentation/java/.

The Dangers of String.substring

Sunday, July 5th, 2009

A few days ago at work, I had to track down why my Java application was running out of memory. It processed a few CSV files, storing some of the data in them. The files were large, but the data my application was storing should have easily been able to fit into memory. After an hour or so of investigation, I found it was an issue I had heard about but never run into myself: the String.substring memory leak.

The bug in question is that String.substring, instead of returning a new string, returns a string based on the original string. This means that the larger string can’t be garbage-collected, so if you store a substring thben the entire string will be kept in memory. In my specific case, I was using String.split and storing a few of the fields - however, String.split uses String.substring behind the scenes, so I ended up keeping the entire files in memory, explaining my OutOfMemoryErrors.

This bug is due to the Java implementors wanting String.substring to be as fast as possible. If it did not have this behaviour, a new char[] would have to be allocated on the heap each time the function was called and the data would have had to be copied into it. Moreover, if you did need the larger string to stay around, part of the string would be duplicated, which is inefficient memory-wise. The workaround for this bug is to just wrap calls to String.substring or calls that use it in a new String(). For example:

     String sub = new String( oldString.substring(0, 4) );

and the old string will be garbage collected.

For the above-mentioned reasons of the bug’s existence and the length of time it’s been opened, it probably won’t be fixed. The workaround suggested currently is not *too* painful, although you do have to find which calls themselves use String.substring, in many cases even if you forget the memory inefficiency isn’t too terrible (if you strip off the last character of a string, for example). It would be nice if there was a mention in the javadoc for these methods that this behaviour occurred, so that looking at the description of a method would warn you about issues such as this.

Emacs Customization Grab Bag

Friday, July 3rd, 2009

A bunch more I added back to my emacs configuration, plus a few new things:

(if (fboundp 'normal-top-level-add-subdirs-to-load-path)
        (let* ((my-lisp-dir "~/.emacs.d/")
              (default-directory my-lisp-dir))
           (add-to-list 'load-path my-lisp-dir)
           (normal-top-level-add-subdirs-to-load-path)))

This adds all subdirectories of ~/.emacs.d/ to my load-path, ensuring that I don’t have to worry about manually updating my load path.

(add-hook 'comint-output-filter-functions 'comint-strip-ctrl-m)

In shell mode, there is often a problem where the control-m character is displayed. This *should* fix the problem, but I actually still have issues. When I’m in shell-mode, the following is an example of my output:

nflath@nflath-laptop:~/clojure$ ls
build.xml          clojure.jar                clojure-sources.jar  readme.txt
classes            clojure-slim-1.0.0.jar     epl-v10.html         src
clojure-1.0.0.jar  clojure-slim.jar           pom-template.xml     svninfo.txt
clojure_1.0.0.zip  clojure-sources-1.0.0.jar  pom.xml

Which is very suboptimal, to say the least. This did not occur before I wiped my configuration file, so I fixed it once before, but I couldn’t find the configuration setting I used to do so. If you have any ideas, please let me know - I use shell mode just about constantly and this is a huge problem.

(fset 'yes-or-no-p 'y-or-n-p)
(show-paren-mode 1)
(setq kill-read-only-ok t)
(if (fboundp 'tool-bar-mode) (tool-bar-mode -1))
(setq inhibit-startup-message t)
(when window-system (global-unset-key "\C-z"))

These are just some general Emacs customizations I use. The first line standardizes emacs; instead of sometimes asking for yes/no and sometimes y/n, it will always request y/n. show-paren-mode will highlight the parenthesis matching the one you are over, which is great when you are doing any type of coding. The next line tells emacs not to throw an error when you attempt to kill text in a read-only buffer; it will just save it to the kill ring, which is what I want in most cases.

(defun copy-line (&optional arg)
  "Do a kill-line but copy rather than kill. This function directly calls
kill-line, so see documentation of kill-line for how to use it including prefix
argument and relevant variables. This function works by temporarily making the
buffer read-only, so I suggest setting kill-read-only-ok to t."
  (interactive "P")
  (toggle-read-only 1)
  (kill-line arg)
  (toggle-read-only 0))
 
(defun copy-whole-line (&optional arg)
  (interactive)
  (save-excursion
    (move-beginning-of-line 1)
    (copy-line arg)))
 
(global-set-key (kbd "C-x y") 'copy-whole-line)

I found copy-line online somewhere a while back, but don’t use it very much. What I do use a lot is copy-whole-line, which doesn’t require me to C-a to yank an entire line of text. This is a frequent enough occurrence that I bound it to it’s own keybinding.

(setq eldoc-idle-delay 0)
(autoload 'turn-on-eldoc-mode "eldoc" nil t)
(add-hook 'emacs-lisp-mode-hook 'turn-on-eldoc-mode)
(add-hook 'lisp-interaction-mode-hook 'turn-on-eldoc-mode)

Eldoc is a minor mode that is invaluable for lisp programming - or even other languages, I also use c-eldoc-mode. It will display the arguments to the function you are calling in the mode line. This can be used either as you are writing code and need to remember what order arguments are in, or when just reading it and wondering what each argument corresponds to. Many times this saves having to look up the full documentation for a function. The configurations above will enable eldoc in elisp modes and set it to display the arguments immediately, instead of waiting.

Accessing Documentation from Emacs

Thursday, July 2nd, 2009

Today’s re-additions to my .emacs file:

(defun my-help ()
  "If function given tries to `describe-function' if variable
uses 'describe-variable', otherwise uses `manual-entry' to display
manpage of a `current-word'."
  (interactive)
  (let ((var (variable-at-point)))
    (if (symbolp var)
        (describe-variable var)
      (let ((fn (function-called-at-point)))
        (if fn
            (describe-function fn)
          (man (current-word)))))))
 
(global-set-key [f1] 'my-help)

This function was one I found somewhere online, and is incredibly useful. It will attempt to give help on the word at point, no matter what it is. If it is a variable, it will call the describe-variable function. If it is a function, it will call describe-function to give documentation. If it isn’t a elisp variable of function, it will return the results of the ‘man’ command on the word, which is still useful in programming if you have packages such as manpages-dev. While these commands are all possible individually, having one key that does whatever it should to give documentation, instead of having to choose documentation functions, makes my life so much easier when reading elisp.