Your Botnet is My Botnet: Analysis of a Botnet Takeover

November 9th, 2009

This is a summary of a paper, titled Your Botnet is My Botnet: Analysis of a Botnet Takeover.

Botnets are becoming a large problem for the internet. They are networks of compromised computers that are under the control of someone other than their owners. Botnets are becoming the primary means for criminals to launch DoS attacks, steal personal data, and commit other cyber crimes.

Most previous analyses of botnets have studied them from the inside: intentionally infecting a computer so that it joins the botnet, and analyzing the activity that then occurs. Since many botnets use P2P protocols, other infected computers can be discovered using this technique. However, this gives a very limited view of the activities of the botnet. A better way is to take control of the entire botnet, which can be done with cooperation from either domain registrars or law enforcement.

For this paper, researchers took control of the Torpig botnet. Torpig is primarily associated with bank account and credit card theft. The takeover exploited how the bots locate their command server. Each bot generates a list of domains to contact, and the first host that sends a reply identifying itself is considered genuine until the next domain-generation phase. This allowed the researchers to register domains that the infected hosts would contact.
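
To make the rendezvous idea concrete, here is a minimal Python sketch of a date-seeded domain generator. It is not Torpig's actual algorithm: the hashing scheme, the TLD list, and the names `candidate_domains` and `first_responder` are all invented for illustration. The only point is that the list is deterministic, so anyone who knows the algorithm can compute tomorrow's domains and register them first.

```python
import hashlib
from datetime import date


def candidate_domains(day, count=3, tlds=(".com", ".net", ".biz")):
    """Return the rendezvous domains for a given day.

    Hypothetical scheme: hash the date plus an index. Every bot running the
    same code on the same day derives exactly the same list.
    """
    domains = []
    for i in range(count):
        seed = "{}-{}".format(day.isoformat(), i).encode("ascii")
        label = hashlib.sha256(seed).hexdigest()[:10]
        domains.append(label + tlds[i % len(tlds)])
    return domains


def first_responder(domains, is_live):
    """Mimic the bot's behaviour: trust the first domain that answers."""
    for domain in domains:
        if is_live(domain):
            return domain
    return None
```

For example, `first_responder(candidate_domains(date(2009, 1, 25)), is_live)` returns whichever generated domain answers first according to the `is_live` callback, regardless of who registered it.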

Torpig is distributed to its victims using Mebroot, a rootkit that replaces a system’s Master Boot Record. Victims are infected through vulnerable web sites that have been modified so that the victim’s browser requests JavaScript, which then attempts several exploits. If any are successful, an installer for Mebroot is downloaded and executed. Mebroot does not perform other malicious attacks itself; it acts as a platform to install malicious modules. Mebroot contacts the C&C server every two hours to receive updates.

The C&C server distributes three modules, which together comprise Torpig. These modules are DLLs that are injected into the file manager, Internet Explorer, Firefox, and other popular utilities, allowing Torpig to inspect all data handled by these programs. Every twenty minutes, Torpig uploads new data to the command server. In reply, the C&C server responds with either ‘ok’ or a configuration file containing settings and the parameters used to perform phishing attacks. These attacks can gather data that passive monitoring could not obtain. When the user goes to a site listed in the configuration file, they are instead redirected to a site given by an injection server.

Taking over the botnet was fairly simple: the domains the bots would contact were registered for a three-week period. All network traffic was logged until a new Torpig binary that changed the domain generation algorithm was installed through Mebroot. 70GB of data were collected during the 10-day period that the Torpig botnet was under the researchers’ control.

All bots communicate with the Torpig command server through HTTP POST requests. Each request contains all the collected data, as well as information about the bot. There are 8 different types of data that Torpig sends out: mailbox accounts, email, form data, HTTP accounts, FTP accounts, POP accounts, SMTP accounts, and Windows passwords.

Analyzing the size of the botnet is somewhat difficult. It can’t be done by merely counting how many IPs connect to the C&C server, because of NAT and DHCP. However, each Torpig submission includes information about the host’s hardware configuration and a mostly-unique ID for each bot. This led to an estimated 182,914 bots in the Torpig botnet. Further analysis was done to identify security researchers and search engine bots and get a more accurate number. Security researchers could be found by checking for the default hardware configurations of VMware and other virtualization tools. This gave a final estimate of 182,800 bots. In contrast, the number of IPs connecting to the C&C server was an order of magnitude larger. In the ten days the botnet was taken over, 49,924 new hosts were infected, with large spikes on two days.
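
A small Python sketch shows why counting IPs and counting bot identifiers give different answers. The record layout here (`ip`, `bot_id`, `hw` keys) is made up for illustration and is not the format Torpig actually uses.

```python
def count_hosts(requests):
    """Return (distinct IPs, distinct bots) for a list of request records.

    Each record is a dict with hypothetical keys: 'ip', 'bot_id', 'hw'.
    A bot is identified by its ID together with its hardware fingerprint,
    since the ID alone is only "mostly" unique.
    """
    ips = {r["ip"] for r in requests}
    bots = {(r["bot_id"], r["hw"]) for r in requests}
    return len(ips), len(bots)


# One laptop whose address changes through DHCP, plus two machines
# sharing a single address behind NAT.
sample = [
    {"ip": "10.0.0.1", "bot_id": "A", "hw": "disk-1"},
    {"ip": "10.0.0.2", "bot_id": "A", "hw": "disk-1"},
    {"ip": "10.0.0.3", "bot_id": "A", "hw": "disk-1"},
    {"ip": "192.0.2.7", "bot_id": "B", "hw": "disk-2"},
    {"ip": "192.0.2.7", "bot_id": "C", "hw": "disk-3"},
]
```

Here `count_hosts(sample)` sees four IPs but only three bots: DHCP churn inflates the IP count, while NAT hides machines behind one address.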

Torpig is crafted to retrieve information that can easily be monetized. In the ten days, Torpig obtained 8,310 accounts at financial institutions. 1,660 credit and debit card numbers were also obtained. By pricing these accounts, the estimated value from these ten days is between $83K and $8.3M. In addition to information retrieval, Torpig opens proxies that can be used for spam or other activities, and represents a great deal of bandwidth that can be used in a DoS attack. It also logs all other data it sees, which represents a huge breach of privacy and can be used to look at all chat, email, and other messages sent.
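
The range quoted above can be reproduced with simple arithmetic. The per-item prices in this Python sketch are an assumption on my part ($10 to $1,000 per bank account and $0.10 to $25 per card); they are chosen because they give the quoted range, not because the summary above states them.

```python
ACCOUNTS = 8310   # bank accounts collected in ten days
CARDS = 1660      # credit and debit card numbers collected

# Assumed underground-market prices, in cents to avoid float rounding.
ACCOUNT_PRICE_CENTS = (10 * 100, 1000 * 100)   # $10 to $1,000
CARD_PRICE_CENTS = (10, 25 * 100)              # $0.10 to $25


def estimate_value_cents():
    """Return the (low, high) estimate of the haul, in cents."""
    low = ACCOUNTS * ACCOUNT_PRICE_CENTS[0] + CARDS * CARD_PRICE_CENTS[0]
    high = ACCOUNTS * ACCOUNT_PRICE_CENTS[1] + CARDS * CARD_PRICE_CENTS[1]
    return low, high
```

Under these assumed prices the bounds come out to $83,266 and $8,351,500, which round to the $83K and $8.3M figures; the bank accounts dominate both ends of the range.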

Analysis of the passwords retrieved showed that most were not very strong, and roughly 28% of users reuse their passwords. This is evidence that the reason these botnets are so large is a cultural problem: people do not understand the consequences of irresponsible computer use.

Coders at Work

November 6th, 2009

There’s been a lot of talk about Peter Seibel’s new book, Coders at Work, recently, so I decided to read it as well. I’m definitely glad I did; it’s a very readable book, with some very good programmers’ and designers’ views on debugging, programming, and other technical topics. Among those interviewed are jwz, Peter Norvig, and of course Donald Knuth. This book seems inspired by another one I’m reading, Masterminds of Programming, but I enjoyed this one a lot more.

The book consists of 15 interviews with different programmers in a wide variety of domains. There’s a lot to absorb, and it’s pretty instructive to see how many of these coders don’t rely on modern tools such as debuggers or IDEs. While the interviews are pretty organic and aren’t exactly alike, Seibel does ask all of them some of the same questions, and it’s nice to see that different approaches can work equally well. Each programmer’s approach to API design is another of the big topics he tends to ask about, and API design is what I’d consider one of the most important parts of being a programmer.

I’d definitely recommend reading this book - there is a lot of useful information to absorb from it. It’s hard to pinpoint specific lessons, but it should at least make you think about your methods and techniques. It’s a very readable book, enjoyable to read and easy to understand. I’m with Joel on this - you should definitely read this book.



Java and C++ Utilities

November 4th, 2009

I’ve been working on some utilities for coding in Java. JDE and CEDET ground my Emacs to a halt the last time I tried them, so I wanted something lightweight. So far, I mostly have some functions for looking up documentation - including C++ documentation - that I store locally on my computer and keep in a repository, but I also have a few utilities for auto-importing Java classes.

The utilities that follow need these macros defined. I talked about them previously at:
http://nflath.com/2009/08/emacs-timing-and-upgrades/. They are utilities for generating functions that take an argument defaulting to the word at point.

(defun my-fn (fn prompt)
  "When given a function taking one argument and applying a function to it, will use that function
   and default to the word at point, with a prompt including that word."
  (let ((default (current-word)))
    (let ((needle (read-string (concat prompt " <" default ">: "))))
      (if (equal needle "")
          (funcall fn default)
        (funcall fn needle)))))
 
(defmacro defun-my (name prompt &rest body)
  "Will define both a function and a my- version of the function,
which defaults to the word at point."
  `(progn
     (defun ,name (arg) ,@body)
     (defun ,(intern (concat "my-" (symbol-name name))) ()
       (interactive)
       (my-fn (quote ,name) ,prompt))))

These functions will be used in some of the later functions I wrote. They cache large directory structures in a buffer so that I can search for files there instead of parsing the output of ‘find’ each time. Specifically, I use these to quickly look up which file I should be referencing to view documentation on Java and C++ classes. create-file-list creates a list of files in the given buffer, and find-location-for-doc-from-buffer returns the full path of the matching HTML file you are searching for. java-find-html-for-class is just a helper function that fills in the arguments for find-location-for-doc-from-buffer for Java buffers.

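(defun create-file-list (directory buffer)
  "Creates the list of files in DIRECTORY in BUFFER and returns that buffer."
  (save-window-excursion
    (let ((default-directory directory))
      (shell-command "find . " buffer)
      (switch-to-buffer buffer)
      ;; flush-lines only works from point onward, so start at the top each time.
      (goto-char (point-min))
      (flush-lines "\\.svn")
      (goto-char (point-min))
      (flush-lines "class-use")
      ;; Return the buffer so callers can switch to it.
      (get-buffer buffer))))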
 
(defun find-location-for-doc-from-buffer (arg buffer-name buffer-creation-fn begin)
  "Finds the file for a given documentation name in the buffer
that may be created with buffer-creation"
  (save-excursion
    (save-window-excursion
      (let ((doc-buffer (or (get-buffer buffer-name)
                            (funcall buffer-creation-fn))))
        (switch-to-buffer doc-buffer)
        (goto-char (point-min))
        (while (not (line-matches (concat "/" arg "\\.html")))
          (search-forward arg))
        (concat begin
                (buffer-substring (1+ (line-beginning-position))
                                  (line-end-position)))))))

These next functions are used to look up documentation. my-java-describe-class will open up documentation for the input class, whereas java-describe-variable will take a variable name, look backwards to its declaration, and find documentation for that class. c-search-docs does something similar; it will prompt you for a keyword and see if anything in my C++ documentation matches it.

(defun-my java-describe-class "Open Javadoc for Class"
  "Loads javadoc for specified class in your browser."
  (interactive "MClass Name: ")
  (browse-url (java-find-html-for-class arg)))
 
(defun-my java-describe-variable "Open Javadoc for Variable"
  "Opens the javadoc for the variable at point, if possible."
  (interactive)
  (save-excursion
    (re-search-backward (concat "[ \t\n]"
                                "[A-Za-z]+"
                                "<[][A-Za-z0-9<>]*>"
                                "[ \t\n]"
                                arg))
    (forward-char)
    (java-describe-class (current-word))))
 
(defun-my c-search-docs "Documentation For"
  "Searches C++ Documentation for the requested term"
  (browse-url (find-location-for-doc-from-buffer
               arg
               "*C Documentation*"
               (lambda () (create-file-list "~/.emacs.d/documentation/c++/" "*C Documentation*"))
               "~/.emacs.d/documentation/c++/")))

Another task that I frequently have to do is fix imports in Java classes. Doing this manually is a huge pain, so I wrote a few functions to help. my-java-import-class will prompt for a class, look up its full package name, and add the import to the top of your file. java-import-undefined-classes will run compile-command (through java-get-undefined-class-names) and parse the output to import all unimported classes. This needs java-undefined-symbol-regexp to be defined correctly, as well as compile-command to be set to something like ‘javac filename’.

(defun-my java-import-class "Import Class"
  "Adds an import statement for the class at point."
  (save-excursion
    (let ((my-retn-value nil))
      (let ((my-string (java-find-html-for-class arg)))
        (find-file my-string)
        (end-of-buffer)
        (re-search-backward "\"\\([A-Za-z0-9]+\\.\\)+[A-Za-z0-9]+ [ci][ln][at][se][sr]" )
        (let ((start (point)))
          (re-search-forward " " )
          (setq my-retn-value (substring (buffer-string) start (- (point) 2)))))
      (kill-buffer (current-buffer))
      (beginning-of-buffer)
      (re-search-forward "import " (point-max) t)
      (beginning-of-line)
      (when (looking-at "import")
        (end-of-line)
        (newline))
      (insert "import " my-retn-value ";\n" ))))
 
(require 'cl) ; provides remove-if and remove-duplicates
(defvar java-undefined-symbol-regexp "symbol  : class \\([A-Za-z0-9]*\\)")
(defun java-get-undefined-class-names ()
  (interactive)
  (save-window-excursion
    (remove-if
     #'not
     (remove-duplicates
      (mapcar (lambda (x)
                (if (string-match java-undefined-symbol-regexp x)
                    (match-string 1 x)))
              (split-string (shell-command-to-string compile-command) "\n *^\n")) :test #'string-equal))))
 
(defun java-import-undefined-classes ()
  (interactive)
  (save-window-excursion
    (mapc #'java-import-class (java-get-undefined-class-names))))
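
The parsing step does not depend on Emacs, so here is a rough Python sketch of the same idea for anyone who wants to test the regexp outside the editor. The sample compiler output is illustrative, modelled on the "symbol  : class" lines that java-undefined-symbol-regexp expects; other javac versions may word their errors differently, and `undefined_classes` is a name made up for this sketch.

```python
import re

# Same shape as java-undefined-symbol-regexp, with flexible whitespace.
UNDEFINED_CLASS = re.compile(r"symbol\s*:\s*class\s+([A-Za-z0-9_]+)")


def undefined_classes(compiler_output):
    """Return the undefined class names, de-duplicated, in first-seen order."""
    seen = []
    for name in UNDEFINED_CLASS.findall(compiler_output):
        if name not in seen:
            seen.append(name)
    return seen


sample_output = """Foo.java:3: cannot find symbol
symbol  : class List
location: class Foo
Foo.java:4: cannot find symbol
symbol  : class ArrayList
location: class Foo
Foo.java:9: cannot find symbol
symbol  : class List
location: class Foo
"""
```

Calling `undefined_classes(sample_output)` yields each missing class once, which is the list the Emacs code then feeds to java-import-class.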

Installing Trac

November 2nd, 2009

As part of a group project I’ll be doing over the next few terms, I had to set up a few utilities - trac, mediawiki, and reviewboard. It took me a while to figure out how to install these properly on a shared server - none of the tutorials I saw were entirely correct - so I figured that I should write my own.

Trac is an issue-tracking system that integrates with your version control system so you can track bugs by commits. I haven’t used it extensively yet - I’ll probably do another post on usage once I’ve been using it more. The first thing you need to do to install Trac is to install all the required packages. On Ubuntu, you can do this with:

sudo apt-get install apache2 libapache2-mod-python libapache2-svn python-setuptools subversion python-subversion

Next, you need to create a base ‘trac’ directory somewhere on your filesystem and allow the apache user www-data to be able to read and write to it. If you want it to be in /home/trac, for example, you can do the following:

sudo mkdir /home/trac
sudo chown www-data:www-data /home/trac

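Now, to create an individual Trac project you need to know two things: 1) the name of your project, which is your choice, and 2) the location of your source control repository. Trac supports several different source control systems; we’re using it with Subversion. Run trac-admin with the initenv command on a new directory under your base directory (for example, trac-admin /home/trac/[projectName] initenv, run as a user that can write to that directory), answer all the questions it asks you, and your project will be created.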

Next, you need to create the Apache configuration file for this. Assuming that you want Trac to be accessed at trac.yourdomain.com, create the file /etc/apache2/sites-available/trac.yourdomain.com from the following template, changing the values in []:


<VirtualHost *:80>
        ServerAdmin [yourEmail]
        ServerName [trac.yourDomain.com]

        ErrorLog [errorLogFile]
        CustomLog [logFile] combined

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn
        ServerSignature On

        <Location />
                SetHandler mod_python
                PythonInterpreter main_interpreter
                PythonHandler trac.web.modpython_frontend
                PythonOption TracEnvParentDir [tracParentDirectory (e.g. /home/trac)]
                PythonOption TracUriRoot /
                PythonOption PYTHON_EGG_CACHE /tmp
        </Location>

        # use the following for one authorization for all projects
        # (names containing "-" are not detected):
        <LocationMatch "/[[:alnum:]]+/login">
                AuthType Basic
                AuthName "trac"
                AuthUserFile /var/svn/conf/svnusers
                Require valid-user
        </LocationMatch>
</VirtualHost>

This requires users to be authenticated to access trac.yourdomain.com. To create an authentication file, you need to do the following:

cd /var/svn/conf/
htpasswd -cb svnusers [username] [newpass]
htpasswd -b svnusers [username2] [newpass2]

Repeat this for all the users you wish to be able to authenticate. You also need to define a policy file in /var/svn/conf/svnpolicy. This file has the following format:

[project1:path]
user1 = rw
user2 = r
user3 = rw

[project2:path]
user1 = rw
user2 = rw
user3 = rw

The project name will be the name of your Trac project; the path is probably /, unless you want some parts of your Trac setup to be authenticated differently than others. This should set up the authentication for Trac.

After this, you need to symlink the file you created in sites-available to sites-enabled, then reload and restart Apache:

cd /etc/apache2/sites-enabled
ln -s ../sites-available/trac.yourdomain.com .
/etc/init.d/apache2 reload
/etc/init.d/apache2 restart

After this, the site should be accessible if you edit your hosts file to redirect to it by adding the following line to /etc/hosts:

[siteip] trac.yourdomain.com

However, you probably want to be able to access it from any computer without modifying your hosts file. To do this, you need to go into whatever DNS manager you use and add an entry for trac.yourdomain.com that points to your server.

This ended up being longer than I expected, so I’ll cover mediawiki and reviewboard in later posts. Let me know if anything here is incorrect or more should be covered.

Programming Clojure

October 31st, 2009

I read Programming Clojure a few months ago; I meant to write a review of it then, but I was busy and then forgot. Better late than never, though, so I’m going to do it now. Clojure, as you may know, is a Lisp dialect that runs on the JVM. It is a very modern Lisp - Clojure 1.0 celebrated its first anniversary recently. It can integrate with Java libraries, nullifying the complaint that Lisp doesn’t have enough libraries, and has very good concurrency support.

Programming Clojure was a very good book for helping to get started with Clojure. I had already done a few small projects with it, but hadn’t used it in any substantial way. The book gives a good overview of all of the features of Clojure, including all of the neat concurrency stuff that I probably wouldn’t have run into on my own for a while, along with simple examples of how to use them. I still refer to it when writing Clojure code.

In addition to talking about the language features in isolation, there is an overarching project throughout the book that is used to show the features in a real application. This is pretty interesting, even though you’d probably never have to write a build system in real life. The project starts simple and then gradually uses more of the concepts you learn in each chapter, making it a great way to follow along and ensure you actually understood what you read in the book. It’s fairly short; I recommend that everyone who reads the book make sure they understand the project.

I’d highly recommend this book if you are interested in learning or getting better at Clojure. It will help you understand how to use Clojure to create real projects. It explains how to use Clojure’s features, some of the issues you may initially have with them (printing in lazy sequences, for example), and how to overcome them.



Org-mode

October 28th, 2009

I recently started to use org-mode, an Emacs mode used for scheduling and organizing your tasks. I’ve really started to like it; I now do all my scheduling using it, instead of Google Calendar, which I used previously.

Org-mode files are essentially trees, with * as the separator. An example is:

* Heading 1
** Subheading
some notes
more notes
** Subheading
* Heading 2

You can easily cycle the visibility using TAB. This is its basic note-taking mode; if you go far enough in, it gets quite complicated. For example, tasks can be scheduled, occur at a time, or have a deadline. You can insert deadlines for the current task with C-c C-d, schedule a task to begin with C-c C-s, or just add a timestamp to an item manually. Any of these can be recurring. These will show up on your agenda, which I’ll talk about shortly. An example with all of these types of timestamps is:

* Heading 1
** Subheading
<2009-10-19 Mon +1w>
some notes
more notes
** Subheading
DEADLINE: <2009-10-21 Wed>
** Subheading 3
SCHEDULED: <2009-10-23 Fri>
* Heading 2

For a more detailed explanation of times with org-mode, read this.

Org-mode also has an ‘agenda’ view, which takes all the org files you’ve specified as agenda files and displays a timeline of them. This is one of my most used features; it allows you to easily see your schedule for a given day, see the list of items you have to do, and in general acts as a calendar. You can customize the various views; I haven’t done much of this, but you should read the org-mode manual in order to learn about this and other uses of org-mode.

To get started with org mode, you should require it:

(require 'org)

I set a few keybindings in order to make org easier to use. I remap left and right to raise the tree one level and indent the tree one level, respectively. I map RETURN to add a new item; if I want a linebreak, I can still use C-j. I needed to use <, and it was bound to something I didn't care about, so I bound it to self-insert-command. org-agenda switches to agenda mode, and I wanted it to be easily accessed.

(define-key org-mode-map (kbd "RET") 'org-meta-return)
(define-key org-mode-map (kbd "<left>") 'org-metaleft)
(define-key org-mode-map (kbd "<right>") 'org-metaright)
(define-key org-mode-map (kbd "<" ) 'self-insert-command)
(global-set-key "\C-ca" 'org-agenda)

There are a few customizations I add to org to make it a smoother experience. The first is to turn off flyspell-mode. This is unfortunate; it could be useful, but as-is it just interferes with org-mode’s highlighting. I also set org-mode to log the time when I mark items as done. I set the list of my agenda files, and set the amount of time in advance to start warning me of deadlines to 10 days. The last one is to set pabbrev to not activate; it interferes with org’s expanding and hiding of trees.

(add-hook 'org-mode-hook #'(lambda () (flyspell-mode -1)))
(setq org-log-done 'time)
(setq org-agenda-files '("~/org/technical.org" "~/org/designproject.org" "~/org/personal.org" "~/org/jobs.org" "~/org/martialarts.org" "~/org/emacs.org" "~/org/school.org" "~/org/financial.org"))
(setq org-deadline-warning-days 10)
 
(defadvice pabbrev-global-mode (around org-stop-pabbrev activate)
  (unless (eq major-mode 'org-mode)
    ad-do-it))

These functions are utilities used in publishing my agenda buffer. They work to generate a string describing the section number of the current bullet in an org buffer. These are required because exporting an org file to HTML will use these values as section numbers - to hyperlink to them in the agenda buffer, we will need to know this. There may already be functions to perform this, but I couldn’t find them.

    )))
 
(defun org-full-sections (&optional pos)
  "Returns a list corresponding to the full section number at pos"
  (interactive)
  (save-excursion
    (if pos (goto-char pos))
    (let* ((retn 1)
           (curnum (org-current-section-number))
           (retlist (list curnum)))
      (condition-case nil
          (while t
            (progn
              (outline-up-heading 1)
              (setq retlist (append (list (org-current-section-number)) retlist))))
        (error retlist))
      retlist)))
 
(defun org-full-sections-string (&optional pos)
  "Returns a string corresponding to the section at pos"
  (interactive)
  (substring (reduce (lambda (x y) (concat x "." (number-to-string y)))
                     (org-full-sections pos)
                     :initial-value "") 1))
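
To illustrate what these section strings look like, here is a small Python sketch that numbers the headings of an outline the way the functions above describe it: one counter per level, joined with dots. It is only a model of the numbering, not of org-mode's exporter, and it assumes heading levels are never skipped. The name `section_numbers` is invented for this sketch.

```python
def section_numbers(org_text):
    """Return a list of (section string, heading title) pairs.

    A heading is a line of one or more '*' followed by a space.
    """
    counters = []  # counters[i] is the current number at level i + 1
    result = []
    for line in org_text.splitlines():
        rest = line.lstrip("*")
        level = len(line) - len(rest)
        if level == 0 or not rest.startswith(" "):
            continue  # body text, not a heading
        del counters[level:]            # leaving deeper levels resets them
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        number = ".".join(str(n) for n in counters)
        result.append((number, rest.strip()))
    return result


outline = """* Heading 1
** Subheading
some notes
more notes
** Subheading
* Heading 2"""
```

For the outline above, the second "Subheading" gets the string "1.2", which the publishing code would turn into the anchor "sec-1.2".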

Sometimes, I’m not on my primary computer, which has all of my agenda files, and still want to be able to look at my schedule. The logical way to do this would be to store my agenda as an HTML file that I can access from anywhere. Org has a few ways to publish to HTML, but I could not find how to export the agenda buffer and files, hyperlinked so that the result emulates the actual agenda. I ended up writing a rather ugly method to do it myself; the results are below.

(defun org-publish-agenda ()
  (interactive)
  (save-window-excursion
    (mapcar (lambda (file)
              (find-file file)
              (org-export-as-html 3)
              (kill-buffer))
            org-agenda-files)
    (org-agenda 0 "a")
    (org-agenda-month-view)
    (let ((html-buffer (htmlize-buffer (get-buffer "*Org Agenda*")))
          (agenda-buffer (get-buffer "*Org Agenda*")))
      (switch-to-buffer html-buffer)
      (goto-char (point-min))
      (search-forward "<body>")
      (let ((line-start (line-number-at-pos)))
        (while (< (point) (point-max))
          (beginning-of-line)
          (cond
           ((line-matches "org-agenda-structure") (forward-line))
           ((line-matches "org-agenda-dat") (forward-line))
           ((line-matches "org-time-grid") (forward-line))
           ((line-matches " *\\([^:]+\\):")
            (let ((calendar (after-last " "(match-string 1))))
              (let ((agenda-line-no (1- (- (line-number-at-pos) line-start))))
                (switch-to-buffer agenda-buffer)
                (goto-line agenda-line-no)
                (let* ((marker (or (get-text-property (point) 'org-marker)
                                   (org-agenda-error)))
                       (buffer (marker-buffer marker))
                       (pos (marker-position marker)))
                  (switch-to-buffer buffer)
                  (goto-char pos)
                  (setq sec-string (concat "sec-" (org-full-sections-string)))
                  (switch-to-buffer html-buffer)
                  ))
              (insert (concat "<a href=\"" calendar ".html#" sec-string "\">"))
              (end-of-line)
              (insert "</a>")
              (forward-line)))
           (t (forward-line))
           )))
      (write-file "~/org/agenda.html")
      (let ((default-directory "~/org/"))
        (shell-command "mv *.html publish/")
        (kill-buffer "agenda.html")
        ))))
 
(defun org-publish-agenda-to-site ()
  (interactive)
  (save-window-excursion
    (org-publish-agenda)
    (shell-command "scp -r ~/org/publish nflath.com:/home/nflath/public_html/nflath.com/ &")))
 
(setq agenda-update-timer (run-with-timer 0 3600 'org-publish-agenda-to-site))

These functions require org-publish-location to be set to the location where you wish the HTML to be stored. This isn’t the best code; it assumes that your agenda files are in ~/org, and that the directory ~/org/publish exists, but this should be easy to fix if this is not where you store your agenda files.

org-publish-agenda works by modifying the results of calling htmlize-buffer on the org-agenda; htmlize.el is a package that can turn arbitrary buffers into HTML. It is sometimes useful when you want to export your view of a file, such as in this case. To use it, you have to put the following in your initialization file:

(require 'htmlize)

Google Collections

October 26th, 2009

Google Collections - Part 1
Google Collections - Part 2

This Google Tech Talk covers the Google Collections library for Java, a library I was introduced to on my last work term. It provides some useful data structures that I’ve had uses for before, as well as utility methods that make working with the standard Java classes easier. This talk gave me a much better overview of what precisely was in these libraries.

Google Collections was open-sourced by Google, so you can also use it. It’s hosted on Google Code at http://code.google.com/p/google-collections/. It was written for JDK 1.5, although being Java it also works with Java 1.6. It’s widely used internally at Google and other large companies, so it is very well-tested. It is an extension to the Java Collections library.

The first set of data structures added by Google Collections are immutable collections. Google provides ImmutableSet, ImmutableMap, ImmutableList, some other immutable data structures, and differing implementations of these. These are guaranteed to be immutable, and are slightly faster and use less memory than their mutable counterparts. There is also UnmodifiableIterator, which is an iterator that does not support remove(), for getting a read-only view of a mutable collection. Immutable data structures are meant to be created fully initialized: sets can be created with ImmutableSet.of( Element_1, Element_2, … ), and immutable maps are constructed with the following syntax:

ImmutableMap.Builder<Integer, Integer> builder = new ImmutableMap.Builder<Integer, Integer>();
return builder.put( 1, 2 ).put( 2, 3 ).put( 3, 4 ).build();

There is only one problem I’ve had with the immutable data structures, which is more a fault of the design of the Java Collections library than of Google. Since the modification methods of most collections - like List.add() - do not return a collection, immutable collections throw exceptions when these methods are called on them. In several cases, it would make my life much easier if these methods returned new ImmutableLists without modifying the old one, but I suppose this can be done by constructing a new list instead. These data structures are guaranteed to be immutable - all constructors for them are private, so they cannot be subclassed by others who would subvert this immutability.

There are a few caveats to the immutable structures - they cannot store null, which I’m really quite OK with, and there isn’t a way to enforce deep immutability. This means that if you have an ImmutableList of some data structure, you can retrieve the structures and modify them - just not the contents of the list itself.

Google Collections also introduces a few new types of data structures. One of these is the Multiset, which is a Set that is unordered and allows duplicates. The four main methods on it are count, add, remove, and setCount. There are 6 different implementations of Multiset optimized for various performance characteristics.

Multimaps are also introduced. These are one of the most useful things introduced by this library; I’ve had to implement these before, usually in a more unsafe way. Multimaps store many-to-many relationships; they are like Maps, except that calls to get() retrieve a collection of values, which is added to when you put() a key-value pair. They also do the logical thing of returning an empty collection, rather than null, when you attempt to get a key that isn’t in the map. There are 5 different implementations of Multimap, which can return different types of Collection from get() - ListMultimap, SetMultimap, and SortedSetMultimap are a few of these.
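
To show the behaviour being described without depending on the library, here is a tiny Python model of a list-valued multimap. It is a sketch of the semantics only: the class name and methods are written for this illustration and are not the Google Collections API, and unlike the real library, `get` here returns a copy rather than a live view.

```python
class ListMultimapSketch:
    """Minimal model of a multimap whose values are kept in lists."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Adding a pair appends to the key's list instead of overwriting it.
        self._data.setdefault(key, []).append(value)

    def get(self, key):
        # A missing key yields an empty list, never None.
        return list(self._data.get(key, []))


courses = ListMultimapSketch()
courses.put("alice", "compilers")
courses.put("alice", "networks")
courses.put("bob", "compilers")
```

The hand-rolled alternative is usually a map from keys to lists with a null check at every call site, which is the unsafe version this replaces.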

BiMaps are also introduced. BiMaps are like Maps that require values to be unique - this has the benefit of being able to map from value->key as well as key->value. There are 3 implementations. Finally, there is a MapMaker class that will construct a ConcurrentMap. MapMaker allows you to specify whether you want strong, weak, or soft keys and values, for a total of 9 combinations. The result is fully concurrent, and can be used as a ConcurrentMap once you call makeMap(). It also has a makeComputingMap method, which allows you to specify a function that will compute the value to be returned.

Another very useful feature Google Collections provides is static factory methods to create all types provided by their library, as well as Java classes. This cuts down the verbosity, due to Java’s type inference for static methods - you can do

Map< Integer, Boolean > m = Maps.newHashMap();

instead of

Map< Integer, Boolean > m = new HashMap< Integer, Boolean >();

Ordering, a better Comparator, is also provided. You can generate an Ordering using Ordering.forComparator( Comparator ), which provides an Ordering object for you to use. Orderings provide a bunch of utility methods - min, max, isIncreasing, and a bunch of others. Orderings work whenever Comparators are expected, so they are pretty much strictly better than Comparators.

Google Collections has as a design philosophy that methods work on Iterators whenever possible. This helps in the cases where you have more data than can fit in memory. To improve iterators, they provide the classes Iterators and Iterables, which provide utility methods for working with Iterators and Iterables. Some of these are transform, filter, and concat - you should look in the documentation for a full list.

The Google Collections library is extensively unit tested. A framework for creating tests was created, resulting in 25000 unit tests generated from a few thousand base tests. Running standard Java Collections classes through this framework has found bugs in these - they are quite thorough.

Having used the Google Collections library, I can testify that if you are programming in Java, you should be using it. It makes your life a lot easier in pretty common cases, and some of the more esoteric stuff is always there if you need it. This talk was very interesting, and introduced me to parts of the library I was previously unaware of.

Save Visited Files, V1.1

October 22nd, 2009

If you remember a post I wrote a while back, I had made a mode save-visited-files that was a lightweight replacement for desktop.el.  It saves the files you have opened, reloading these whenever you start emacs.  Ryan Thompson spent some time improving save-visited-files, so I thought I’d release his improvements.  save-visited-files.el can be downloaded from here.

Save-visited-files now uses auto-save-hook instead of a periodic timer.  This is probably the better decision, but it means you don’t want it to trigger too often.  In order to reduce the number of auto-saves, you can use the following, which will auto-save every 3000 input events:

(setq auto-save-interval 3000)

The functions and variables in save-visited-files are also named more consistently, with each one being prefixed with save-visited-files.  The mode can now also be customized with M-x customize-group save-visited-files.  As a final improvement, the temporary buffer is handled much more nicely, so that the user is never presented with it.

To use the new save-visited-files, just put the following in your .emacs file:

(require 'save-visited-files)
(turn-on-save-visited-files-mode)

More Gmail with Emacs

October 21st, 2009

The last time I talked about using Gmail from Emacs, several people suggested using Wanderlust, an IMAP client written in ELisp. Details for setting it up can be found here; just following those set it up well for reading my email. This worked until I wanted to send a mail from Emacs, and found that Wanderlust had replaced Emacs’s send-mail function with its own, which didn’t work. I tried for a bit to figure out why it was failing, but eventually gave up on Wanderlust.

After this, I decided to try using Gnus for accessing my Gmail again. This process is fairly easy: put the following in your configuration file:

(setq gnus-select-method '(nnimap "gmail"
                                  (nnimap-address "imap.gmail.com")
                                  (nnimap-server-port 993)
                                  (nnimap-stream ssl)))

Once you have evaluated this expression, do M-x gnus and then S s and enter INBOX. This will switch to your Gmail inbox. There are a few problems with this, however. The first is that very few messages are actually presented to you; even though it asked how many to retrieve, and I used a value of ‘100’, only three were presented. I really want to be able to read older mail; if anyone knows how to fix this, please let me know. The other issue, which isn’t as bad, is that mail read in Gnus is not marked as read in the Gmail web interface.

Getting back to sending emails, I wrote a function to let me easily insert email addresses. It works for both individual people and groups of addresses.

(defun insert-email ()
  (interactive)
  (insert (cadr (assoc (ido-completing-read
                        "Name: "
                        (mapcar #'car email-alist))
                       email-alist))))
(global-set-key (kbd "C-c m") 'insert-email)

This function uses the value of email-alist, which is an alist mapping names to email addresses. It doesn’t even have to be used to insert email addresses; that’s just what I’m using it for. To set email-alist to the proper value, use:

(setq email-alist
      '(("Name1" "email1")
        ("Name2" "email2")))

This ends up letting you use Ido to select between Name1 and Name2 to insert email1 or email2. As I said, it’s more general than what I’m using it for currently - you can use it to select any arbitrary text for insertion.

C++ Customizations for Emacs

October 19th, 2009

I’ve been doing more C++ than usual for courses, so I ended up revisiting my C++ configuration and loading up some old customizations that I had in order to be more productive. I didn’t pull in all of my old ones, such as all the CEDET stuff that had been slowing down my emacs significantly, but I did find some useful customizations that I had and wanted to put back in.

The first was to define a project type for C/C++ projects using eproject. This lets you easily switch between files in the same project, as well as perform other actions described in the linked post. I define the project base to just be the topmost directory with a Makefile.

(define-project-type cpp (generic)
  (look-for "Makefile")
  :relevant-files ("\\.cpp" "\\.c" "\\.hpp" "\\.h"))

Another re-addition is member-function.el, a mode that will automatically expand member functions in class definitions with stubs. This ends up saving me a lot of time, since I don’t have to write out the function definitions twice for every function - I just add it to the class definition and then it appears in the corresponding implementation file.

Normally expand-member-functions must be run interactively and prompts you for the header and implementation file, but I created a function that finds these values and calls it with them. I added it to c-mode-hook, so that it runs when a file is opened without me having to do anything. Note that C++ buffers run c++-mode-hook (and c-mode-common-hook) rather than c-mode-hook, so add it there too if you want it to run for .cpp files.

I also got tired of typing in #ifndef blocks for my header files, so I wrote a function that inserts one in newly-created .h and .hpp files. I also added this function to c-mode-hook, making it so I never have to type one again (I hope).

(defun h-file-create ()
  "Insert an #ifndef/#define/#endif block into a new, empty .h or .hpp file."
  (interactive)
  (let ((name (buffer-name)))
    (when (and (member (file-name-extension name) '("h" "hpp"))
               (= (buffer-size) 0))
      ;; foo.h and foo.hpp both get the guard FOO_H.
      (let ((guard (concat (upcase (file-name-sans-extension name)) "_H")))
        (insert "#ifndef " guard "\n#define " guard "\n\n#endif")))))

(defun c-file-enter ()
  "Expand all member functions in the corresponding .h file."
  (interactive)
  (let* ((c-file (buffer-name))
         (h-file (concat (file-name-sans-extension c-file) ".h")))
    ;; Only act on .cpp buffers whose header exists in the current directory.
    (when (and (equal (file-name-extension c-file) "cpp")
               (file-exists-p h-file))
      (expand-member-functions h-file c-file))))

I also occasionally write small C++ programs for testing purposes. In these cases, it’s nice to be able to do an M-x compile and have it work without having to type in a compile-command yourself. The following sets compile-command to one that will compile the current file whenever a C/C++ file is opened and no Makefile exists.

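;; c-mode-hook runs only in C buffers, so register on c++-mode-hook as well.
(dolist (hook '(c-mode-hook c++-mode-hook))
  (add-hook hook
            (lambda ()
              ;; Skip buffers with no file and directories that have a Makefile.
              (unless (or (null buffer-file-name)
                          (file-exists-p "Makefile"))
                (set (make-local-variable 'compile-command)
                     (let ((file (file-name-nondirectory buffer-file-name)))
                       (format "%s -c -o %s.o %s %s %s"
                               (or (getenv "CC") "g++")
                               (file-name-sans-extension file)
                               (or (getenv "CPPFLAGS") "-DDEBUG=9")
                               (or (getenv "CFLAGS") "-ansi -pedantic -Wall -g")
                               file)))))))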

I also finally learned how to properly use tags. Tags allow you to quickly navigate to the definitions of functions and symbols in your program. To use tags, you must first create a tags file using either etags or Exuberant Ctags - etags comes with Emacs, so you will have at least one of these. To generate a TAGS file with etags, run it on your source files in the directory where you want the TAGS file, for example ‘etags *.cpp *.h’. etags does not descend into subdirectories on its own, so for a whole source tree you can pipe a file list into it with ‘find . -name "*.cpp" -o -name "*.h" | etags -’, or use ‘ctags -e -R’ if you have Exuberant Ctags.

Once your index is created, you can jump to the definition of a function with M-. (find-tag). This will prompt for a TAGS file the first time you use it, and then use that index to jump to the function definition you supply (defaulting to the one at point). If there are several conflicting definitions, C-u M-. will go to the next possible definition. Once you are done with that location, M-* will take you back to where you originally jumped from. C-x 4 . will perform a find-tag, but will open the definition in another window. There are a few other tags commands, but these are the most useful ones. To find out more, read the Tags node in the Emacs manual.

One last utility I added was cscope. Cscope is the reverse of etags; given a symbol, it will find all uses of that symbol. Cscope must be installed first; in Ubuntu you can do this with ‘sudo apt-get install cscope’. Then use M-x cscope-find-c-symbol to have it output a list of all references to the symbol; this list is linked to your source, so you can jump to files by pressing ‘enter’ on the line you have selected. cscope-find-functions-calling is also useful; it will return a list of all functions that call the function you specify. There are a few more commands, but these are the ones I use; look at the commands starting with cscope- to find a list of them. To enable cscope, you need to put cscope.el in your load path and add the following to your initialization:

(require 'cscope)