Archive for the ‘Uncategorized’ Category

Supporting Dynamic Languages on the JVM

Thursday, November 19th, 2009

Supporting Dynamic Languages on the Java Virtual Machine
Author: Olin Shivers

This paper is about enhancements to the JVM that would make it easier to port dynamic languages, such as Scheme, to it. This is desirable because there is now a JVM for practically everything; running on the JVM means that a language will run on every architecture the JVM runs on. However, the JVM, while well designed for running Java programs quickly, does not support dynamic languages very well. Scheme is one language that suffers performance problems when ported. Scheme needs a uniform representation of data, since a value of any type can be stored in a cons cell. Fitting this into Java's class model means that every value must be an Object, and boxing and unboxing primitive types is expensive.
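To make the boxing cost concrete, here is a small Java sketch of a Scheme-style pair (the class and method names are made up for illustration and don't come from the paper). Because the fields must be Object, every integer stored in the list is autoboxed to an Integer, and every read needs a cast and an unbox:

```java
/** A Scheme-style pair: both fields are Object so that any value fits. */
final class Cons {
    final Object car;
    final Object cdr; // another Cons, or null for the empty list

    Cons(Object car, Object cdr) {
        this.car = car;
        this.cdr = cdr;
    }

    /** Builds the list (0 1 ... n-1); each int is autoboxed into an Integer. */
    static Object range(int n) {
        Object list = null; // null stands in for the empty list '()
        for (int i = n - 1; i >= 0; i--) {
            list = new Cons(i, list); // autoboxing: Integer.valueOf(i)
        }
        return list;
    }

    /** Sums a list of Integers; every element needs a cast and an unbox. */
    static long sum(Object list) {
        long total = 0;
        for (Object p = list; p != null; p = ((Cons) p).cdr) {
            total += (Integer) ((Cons) p).car; // checked cast, then unboxing
        }
        return total;
    }
}
```

For example, Cons.sum(Cons.range(10)) walks ten boxed Integers and returns 45. Integer.valueOf only caches small values (-128 to 127 by default), so larger numbers typically allocate a fresh object each time, which is the kind of overhead the paper wants to avoid.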

The proposal for supporting immediates is to give pointers to Java objects a low bit of one. Since objects are allocated on word boundaries, the low bits of a real address are always zero, so this tag costs nothing. Words whose low bit is zero can then be immediates: a final ImmediateDescriptor class carries 31 bits of state that can quickly be converted to an even integer (and vice versa). Programs that don't use immediates pay no penalty: if a method is called on an immediate, the misaligned access generates a memory alignment exception that the VM can catch, and handling it would still be fast.
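Java itself offers no way to tag pointers, but the encoding can be simulated with plain ints. In this sketch (TaggedWord and its methods are hypothetical names, and a 32-bit int stands in for a machine word), immediates are even words holding a 31-bit value, and object references are word-aligned addresses with the low bit set:

```java
/**
 * Simulation of low-bit tagging: Java can't manipulate raw pointers, so a VM
 * "word" is modelled here as a 32-bit int. Even words are immediates with a
 * 31-bit payload; odd words are tagged object addresses.
 */
final class TaggedWord {
    static final int MIN_IMMEDIATE = -(1 << 30);
    static final int MAX_IMMEDIATE = (1 << 30) - 1;

    private TaggedWord() {}

    static boolean isImmediate(int word) {
        return (word & 1) == 0;
    }

    /** Packs a 31-bit integer into an even word: a single left shift. */
    static int fromInt(int n) {
        if (n < MIN_IMMEDIATE || n > MAX_IMMEDIATE) {
            throw new IllegalArgumentException("does not fit in 31 bits: " + n);
        }
        return n << 1;
    }

    /** Unpacks an immediate; the arithmetic shift preserves the sign. */
    static int toInt(int word) {
        if (!isImmediate(word)) {
            throw new IllegalArgumentException("not an immediate");
        }
        return word >> 1;
    }

    /**
     * Adds two immediates without unpacking them: (a << 1) + (b << 1) == (a + b) << 1.
     * Math.addExact throws ArithmeticException if the sum leaves the 31-bit range.
     */
    static int add(int x, int y) {
        if (!isImmediate(x) || !isImmediate(y)) {
            throw new IllegalArgumentException("not an immediate");
        }
        return Math.addExact(x, y);
    }

    /** Tags a word-aligned address (a multiple of 4) by setting its low bit. */
    static int fromAddress(int alignedAddress) {
        if ((alignedAddress & 3) != 0) {
            throw new IllegalArgumentException("address is not word-aligned");
        }
        return alignedAddress | 1;
    }

    /** Clears the tag bit to recover the address. */
    static int toAddress(int word) {
        if (isImmediate(word)) {
            throw new IllegalArgumentException("not a tagged address");
        }
        return word & ~1;
    }
}
```

Because an immediate is just its value shifted left by one bit, converting in either direction is a single shift, and some arithmetic, such as addition, works directly on the tagged form.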

Other problems remain, such as method lookup. JVM bytecode is well optimized for Java code, but not necessarily for other paradigms. There is a tension in the bytecode between verification and efficiency: we don’t want an unsafe, RISC-like bytecode; we want a safe system, even though safety makes it less efficient in some cases. This tradeoff is made well for Java, but not for other languages.

One proposal to fix this is to allow some bytecode instructions to be linked to C routines at run time, which lets language implementers represent whatever operations they need efficiently. However, this raises the problem of verification, since these C routines may be unsafe. The proposed solution is a central body that maintains a standard set of such instructions: the JVM will ‘check out’ the instructions the language implementer asks for, and warn the user if the requested code is not part of this standard and must be fetched from an unverified location.

Acer AspireONE and Dropbox

Monday, November 16th, 2009

The motherboard on my ASUS recently blew, so I ended up getting a netbook to use until I get a more powerful laptop. Since I mostly use my laptop for development and web browsing, I don’t have a very pressing need for a hugely powerful computer anyway. I got an Acer AspireONE, which was one of the cheapest ones I could find on short notice.

I really should have learned this by now, but you should regularly back up all your files. I had most of my documents in SVN and Dropbox, but losing my current changes to my repositories is pretty annoying. I’m currently copying over files, since my hard drive is OK, but I think I’m going to move as much as possible onto Dropbox.

Dropbox is a program for backing up and synchronizing files across computers. It acts as a regular folder on your computer, and when changes are made they are propagated up to Dropbox’s servers. It works offline and can be used with pretty much any OS. The first 2GB you use are free; if you need more, they charge you.

The one problem I have with the netbook so far is the keyboard. It’s understandable that it’s small, but some of the keys are in strange positions, making it more annoying than it should be. For example, there are two ‘\’ keys, neither of which is in the correct position, and both occupy spaces better used by other keys. Their placement makes the left Shift key very small and the Enter key narrower than I’m used to. I’m not sure that this was at all necessary.

Other than that, everything’s working pretty well. The screen is much smaller than I’m used to, which was expected, but is still somewhat disconcerting at first. If you have no reason to have a powerful laptop, I’d recommend just getting a cheap netbook rather than spending a lot of money on a really good laptop.

The Clean Code Talks - Unit Testing

Thursday, November 12th, 2009

This Google tech talk is about the benefits of unit tests. It starts by asking a fundamental question: how do you write hard-to-test code? Although most developers (myself included) are good at writing hard-to-test code, most aren’t quite sure how they do it. The speaker describes some of the ways: mixing the ‘new’ operator into business logic, looking things up, doing work in the constructor, global state, and deep inheritance, among others. All of these make it hard to isolate specific parts of the code for testing: in the deep inheritance case, you end up testing the superclasses too; in the other cases, you can’t isolate one function from all the others. Purely procedural code is also hard to test, because it has no ‘seams’ that can be exploited to isolate specific parts.

The speaker then describes the progression of levels of testing. The highest level is scenario tests, which exercise the whole application by doing something a user would do and checking that the correct thing happens. Because they run the whole app, they are slow, and when one fails it’s hard to know precisely where; you generally have to trace through with a debugger to find the specific place.

The next level of testing is functional tests. These test specific subsystems, with simulators standing in for external parts. They are much faster than scenario tests, and you have a better idea of what failed. However, you still don’t know precisely what went wrong: if you’re testing a radio and it doesn’t work, you don’t have much of an idea why. You still have to use a debugger to isolate the point of failure.

This leads to the bottom level of testing: unit tests. These test individual classes in isolation from one another. They are very fast; the speaker suggests running them whenever you save. If all your unit tests pass, you can be highly confident that each class is OK on its own, even though you can’t be sure about the interactions between classes. It’s also very easy to figure out precisely what went wrong.

The speaker stressed that this is a continuum of tests; you shouldn’t have only unit tests or only scenario tests. You should have both, although unit tests are more important. It’s also important to have ‘seams’ where you can inject a class’s dependencies in order to mock out their behaviour. This is done using dependency injection, where a class asks for its dependencies in its constructor. Going back to the earlier point, code should either construct or look up objects, or do actual work: it’s easy to test either kind of code, but hard to test a method that mixes the two.
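A minimal Java sketch of the constructor-injection seam the talk describes. The class names (PaymentGateway, Checkout, FakePaymentGateway) are invented for illustration and don't come from the talk. Checkout asks for its gateway instead of constructing one, so a unit test can hand it a fake:

```java
import java.util.ArrayList;
import java.util.List;

/** A collaborator the business logic depends on (hypothetical example). */
interface PaymentGateway {
    boolean charge(String account, long amountCents);
}

/**
 * Business logic that asks for its dependency in the constructor. The
 * constructor only stores the collaborator: no `new`, no global lookups,
 * no real work, so a test can pass in whatever gateway it likes.
 */
final class Checkout {
    private final PaymentGateway gateway;

    Checkout(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    /** Charges the sum of the prices; returns false for an empty order or a declined charge. */
    boolean purchase(String account, List<Long> pricesCents) {
        long total = 0;
        for (long price : pricesCents) {
            total += price;
        }
        return total > 0 && gateway.charge(account, total);
    }
}

/** Test double passed in through the constructor "seam": records charges, never touches a network. */
final class FakePaymentGateway implements PaymentGateway {
    final List<Long> charges = new ArrayList<>();
    boolean approve = true;

    @Override
    public boolean charge(String account, long amountCents) {
        charges.add(amountCents);
        return approve;
    }
}
```

A test builds new Checkout(new FakePaymentGateway()), calls purchase, and then inspects the fake's charges list; no real payment system is involved. If Checkout instead called new on a real gateway inside purchase, there would be no seam through which to substitute the fake.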

Your Botnet is My Botnet: Analysis of a Botnet Takeover

Monday, November 9th, 2009

This is a summary of a paper, titled Your Botnet is My Botnet: Analysis of a Botnet Takeover.

Botnets are becoming a large problem for the Internet. A botnet is a network of compromised computers that are under the control of someone else. Botnets are becoming the primary means for criminals to launch DoS attacks, steal personal data, and commit other cybercrimes.

Most previous analyses of botnets have studied them from the inside: intentionally infecting a computer so that it joins the botnet, then analyzing the activity that occurs. Since many botnets use P2P protocols, other infected computers can be discovered using this technique. However, it gives a very limited view of the botnet’s activities. A better way is to take control of the entire botnet, which can be done with the cooperation of either domain registrars or law enforcement.

For this paper, researchers took control of the Torpig botnet, which is primarily associated with bank account and credit card theft. They did this by exploiting how the bots locate their command-and-control (C&C) server. Each bot generates a list of domains to contact, and the first host that replies with a valid identification is considered genuine until the next round of domain generation. Because the domains could be computed in advance, the researchers were able to register domains that the infected hosts would contact.

Torpig is distributed to its victims using Mebroot, a rootkit that replaces a system’s Master Boot Record. Victims are infected when they visit vulnerable websites that have been modified so that the victim’s browser requests JavaScript, which then attempts several exploits. If any succeed, an installer for Mebroot is downloaded and executed. Mebroot does not perform other malicious attacks itself; it acts as a platform for installing malicious modules. Mebroot contacts its C&C server every two hours to receive updates.

Mebroot’s C&C server distributed three modules, which together make up Torpig. These DLLs are injected into the file manager, Internet Explorer, Firefox, and other popular applications, allowing Torpig to inspect all the data those programs handle. Every twenty minutes, Torpig uploads new data to its command server. In reply, the C&C server responds either with ‘ok’ or with a configuration file containing the parameters for phishing attacks. These attacks can obtain data that passive monitoring alone could not. When the user visits a site listed in the configuration file, they are instead served content supplied by an injection server.

Taking over the botnet was fairly simple: domains were registered for a three-week period. All network data was logged until a new Torpig binary, with a changed domain generation algorithm, was installed through Mebroot. 70GB of data were collected during the ten days that the Torpig botnet was under the researchers’ control.

All bots communicate with the Torpig command server through HTTP POST requests. These requests contain all the collected data, as well as information about the bot. There are 8 different types of data that Torpig sends out: mailbox accounts, email, form data, HTTP accounts, FTP accounts, POP accounts, SMTP accounts, and Windows passwords.

Estimating the size of the botnet is somewhat difficult. It can’t be done by merely counting how many IPs connect to the C&C server, because of NAT and DHCP. However, Torpig’s reports include information about the hardware configuration and a mostly-unique ID for each bot. This led to an estimate of 182,914 bots in the Torpig botnet. Further analysis was done to find the number of security researchers and search engine bots, to get a more accurate number; security researchers could be found by checking for the default hardware configurations of VMware and other virtualization tools. This gave a final estimate of 182,800 bots. In contrast, the number of IPs connecting to the C&C server was an order of magnitude larger. In the ten days the botnet was taken over, 49,924 new hosts were infected, though there were large spikes on two days.
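The difference between counting IP addresses and counting bots can be shown with a toy Java sketch. Submission and its fields are hypothetical stand-ins for what each bot reports (the summary only says that Torpig sends hardware configuration and a mostly-unique ID). A bot that changes address through DHCP is counted several times by IP, while two bots behind one NAT gateway are counted once:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Counting infected hosts from C&C logs two ways (illustrative sketch). */
final class BotCounter {
    /** One bot submission; the field names are hypothetical, not Torpig's wire format. */
    static final class Submission {
        final String sourceIp;
        final String botId;       // the mostly-unique per-bot identifier
        final String hardwareKey; // e.g. a string summarizing the OS and hardware configuration

        Submission(String sourceIp, String botId, String hardwareKey) {
            this.sourceIp = sourceIp;
            this.botId = botId;
            this.hardwareKey = hardwareKey;
        }
    }

    /** Naive count of distinct source IPs: DHCP inflates it, NAT deflates it. */
    static int countByIp(List<Submission> log) {
        Set<String> ips = new HashSet<>();
        for (Submission s : log) {
            ips.add(s.sourceIp);
        }
        return ips.size();
    }

    /** Count of distinct (bot ID, hardware) pairs, unaffected by address changes. */
    static int countByBot(List<Submission> log) {
        Set<String> bots = new HashSet<>();
        for (Submission s : log) {
            bots.add(s.botId + "|" + s.hardwareKey);
        }
        return bots.size();
    }
}
```

With one bot reporting from three addresses and two bots sharing a NAT address, countByIp gives 4 while countByBot gives 3. In the real data the IP-based count was the larger one, by an order of magnitude.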

Torpig is crafted to retrieve information that can easily be monetized. In the ten days, Torpig obtained 8310 accounts at financial institutions, as well as 1660 credit and debit card numbers. By pricing these accounts, the researchers estimated the value of these ten days’ data at between $83K and $8.3M. In addition to stealing information, Torpig opens proxies that can be used for spam or other activities, and represents a great deal of bandwidth that could be used in a DoS attack. It also logs all other data, which is a huge breach of privacy and can be used to read all the chat, email, and other messages its victims send.
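As a rough check on those bounds, assume per-item prices of $0.10 to $25 for a credit card number and $10 to $1,000 for a bank account. These prices are an assumption here, not stated in the summary above, but they reproduce both ends of the quoted range:

$$1660 \times \$0.10 + 8310 \times \$10 = \$166 + \$83{,}100 = \$83{,}266 \approx \$83\text{K}$$

$$1660 \times \$25 + 8310 \times \$1{,}000 = \$41{,}500 + \$8{,}310{,}000 = \$8{,}351{,}500 \approx \$8.3\text{M}$$

Under these prices the bank accounts dominate both ends of the estimate.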

Analysis of the retrieved passwords showed that most were not very strong, and that roughly 28% of users reuse their passwords. This suggests that part of the reason these botnets are so large is cultural: people don’t understand the consequences of irresponsible computer use.

Coders at Work

Friday, November 6th, 2009

There’s been a lot of talk recently about Peter Seibel’s new book, Coders at Work, so I decided to read it as well. I’m definitely glad I did; it’s a very readable book, with some very good programmers’ and designers’ views on debugging, programming, and other technical topics. Those interviewed include jwz, Peter Norvig, and of course Donald Knuth. This book seems inspired by another one I’m reading, Masterminds of Programming, but I enjoyed this one a lot more.

The book consists of 15 interviews with programmers from a wide variety of domains. There’s a lot to absorb, and it’s instructive to see how many of these coders don’t rely on modern tools such as debuggers or IDEs. While the interviews are fairly organic and aren’t exactly alike, Seibel does ask all of them some of the same questions, and it’s nice to see that different approaches can work equally well. One question he asks almost everyone is how they approach API design, which I’d consider one of the most important parts of being a programmer.

I’d definitely recommend reading this book - there is a lot of useful information to absorb from it. It’s hard to pinpoint specific lessons, but it should at least make you think about your methods and techniques. It’s enjoyable to read and easy to understand. I’m with Joel on this - you should definitely read this book.



Java and C++ Utilities

Wednesday, November 4th, 2009

I’ve been working on some utilities for coding in Java. JDE and CEDET ground my Emacs to a halt the last time I tried them, so I wanted something lightweight. So far, I mostly have some functions for looking up documentation (including C++ documentation) that I store locally on my computer and keep in a repository, but I also have a few utilities for auto-importing Java classes.

The utilities that follow need this function and macro defined; I talked about them previously at:
http://nflath.com/2009/08/emacs-timing-and-upgrades/. They generate functions whose argument defaults to the word at point.

(defun my-fn (fn prompt)
  "Calls FN, a function of one argument, on a string read from the user.
The prompt shows PROMPT and the word at point, which is used if the input is empty."
  (let ((default (current-word)))
    (let ((needle (read-string (concat prompt " <" default ">: "))))
      (if (equal needle "")
          (funcall fn default)
        (funcall fn needle)))))
 
(defmacro defun-my (name prompt &rest body)
  "Will define both a function and a my- version of the function,
which defaults to the word at point."
  `(progn
     (defun ,name (arg) ,@body)
     (defun ,(intern (concat "my-" (symbol-name name))) ()
       (interactive)
       (my-fn (quote ,name) ,prompt))))

These functions are used by some of the later ones. They cache a large directory listing in a buffer, so that files can be found by searching the buffer instead of parsing the output of ‘find’ each time. Specifically, I use them to quickly look up which file to open to view the documentation for Java and C++ classes. create-file-list writes the list of files under a directory into the given buffer, and find-location-for-doc-from-buffer returns the full path of the matching HTML file you are searching for. java-find-html-for-class is just a helper that fills in the arguments to find-location-for-doc-from-buffer for Java documentation.

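(defun create-file-list (directory buffer)
  "Writes the list of files under DIRECTORY into BUFFER, one per line,
skipping Subversion metadata and Javadoc class-use pages."
  (save-window-excursion
    (let ((default-directory directory))
      (shell-command "find . " buffer)
      (switch-to-buffer buffer)
      (goto-char (point-min))
      (flush-lines "\\.svn")
      (flush-lines "class-use"))))

;; line-matches isn't defined in this post; this is a minimal stand-in.
(defun line-matches (regexp)
  "Returns non-nil if the current line contains a match for REGEXP."
  (save-excursion
    (beginning-of-line)
    (re-search-forward regexp (line-end-position) t)))

(defun find-location-for-doc-from-buffer (arg buffer-name buffer-creation-fn begin)
  "Returns the path of ARG.html as listed in BUFFER-NAME (created by calling
BUFFER-CREATION-FN if needed); BEGIN is the directory the paths are relative to."
  (save-excursion
    (save-window-excursion
      (let ((doc-buffer (or (get-buffer buffer-name)
                            (progn (funcall buffer-creation-fn)
                                   (get-buffer buffer-name)))))
        (switch-to-buffer doc-buffer)
        (goto-char (point-min))
        ;; Hop between occurrences of ARG until we are on the line for ARG.html.
        (while (not (line-matches (concat "/" (regexp-quote arg) "\\.html")))
          (search-forward arg))
        ;; Lines look like "./java/util/List.html"; drop the leading ".".
        (concat begin
                (buffer-substring (1+ (line-beginning-position))
                                  (line-end-position)))))))

;; java-find-html-for-class isn't defined in this post either. This version assumes
;; the Javadoc tree is in ~/.emacs.d/documentation/java/; adjust to match yours.
(defun java-find-html-for-class (arg)
  "Returns the path of the Javadoc HTML file for class ARG."
  (find-location-for-doc-from-buffer
   arg
   "*Java Documentation*"
   (lambda () (create-file-list "~/.emacs.d/documentation/java/" "*Java Documentation*"))
   "~/.emacs.d/documentation/java/"))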

These next functions are used to look up documentation. my-java-describe-class opens the documentation for the class you enter, whereas java-describe-variable takes a variable name, searches backwards for its declaration, and opens the documentation for its type. c-search-docs does something similar: it prompts you for a keyword and checks whether anything in my C++ documentation matches it.

(defun-my java-describe-class "Open Javadoc for Class"
  "Loads javadoc for specified class in your browser."
  (interactive "MClass Name: ")
  (browse-url (java-find-html-for-class arg)))
 
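(defun-my java-describe-variable "Open Javadoc for Variable"
  "Opens the javadoc for the type of the variable ARG, if possible."
  (save-excursion
    ;; Search backwards for a declaration such as "int arg" or
    ;; "List<String> arg": whitespace, a type name, optional generic
    ;; arguments, whitespace, and then the variable name itself.
    (re-search-backward (concat "[ \t\n]"
                                "[A-Za-z0-9_]+"
                                "\\(<[][A-Za-z0-9<>, ]*>\\)?"
                                "[ \t\n]+"
                                (regexp-quote arg)
                                "\\_>"))
    ;; Step past the leading whitespace onto the type name.
    (forward-char)
    (java-describe-class (current-word))))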
 
(defun-my c-search-docs "Documentation For"
  "Searches C++ Documentation for the requested term"
  (browse-url (find-location-for-doc-from-buffer
               arg
               "*C Documentation*"
               (lambda () (create-file-list "~/.emacs.d/documentation/c++/" "*C Documentation*"))
               "~/.emacs.d/documentation/c++/")))

Another task I frequently have to do is fix imports in Java classes. Doing this manually is a huge pain, so I wrote a few functions to help. my-java-import-class prompts for a class, looks up its full package name, and adds the import near the top of your file. java-import-undefined-classes runs compile-command, uses java-get-undefined-class-names to parse the output, and adds imports for all the unimported classes. This needs java-undefined-symbol-regexp to match your compiler’s error messages, and compile-command to be set to something like ‘javac filename’.

(defun-my java-import-class "Import Class"
  "Adds an import statement for class ARG to the current buffer."
  (let ((qualified-name
         ;; The class's Javadoc page contains text such as
         ;; "java.util.ArrayList class" (or "... interface"); read the
         ;; fully qualified name from it, then kill the page's buffer.
         (with-current-buffer (find-file-noselect (java-find-html-for-class arg))
           (prog1
               (save-excursion
                 (goto-char (point-max))
                 (re-search-backward
                  "\"\\(\\([A-Za-z0-9]+\\.\\)+[A-Za-z0-9]+\\) [ci][ln][at][se][sr]")
                 (match-string-no-properties 1))
             (kill-buffer (current-buffer))))))
    (save-excursion
      (goto-char (point-min))
      ;; Add it after the first existing import; if there are none,
      ;; start an import block after the package line.
      (cond ((re-search-forward "^import " nil t)
             (forward-line 1))
            ((re-search-forward "^package " nil t)
             (forward-line 1)
             (insert "\n")))
      (insert "import " qualified-name ";\n"))))
 
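(require 'cl-lib)

(defvar java-undefined-symbol-regexp "symbol  : class \\([A-Za-z0-9]*\\)"
  "Matches javac's report of a missing class; group 1 is the class name.")

(defun java-get-undefined-class-names ()
  "Runs `compile-command' and returns the classes javac couldn't find."
  (interactive)
  (save-window-excursion
    (cl-remove-duplicates
     (delq nil
           (mapcar (lambda (x)
                     (if (string-match java-undefined-symbol-regexp x)
                         (match-string 1 x)))
                   ;; Split javac's output at the "^" marker lines ending each error.
                   (split-string (shell-command-to-string compile-command)
                                 "\n *^\n")))
     :test #'string-equal)))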
 
(defun java-import-undefined-classes ()
  (interactive)
  (save-window-excursion
    (mapc #'java-import-class (java-get-undefined-class-names))))

Installing Trac

Monday, November 2nd, 2009

As part of a group project I’ll be doing over the next few terms, I had to set up a few utilities: Trac, MediaWiki, and Review Board. It took me a while to figure out how to install these properly on a shared server (none of the tutorials I saw were entirely correct), so I figured that I should write my own.

Trac is an issue-tracking system that integrates with your version control system, so you can link bugs to commits. I haven’t used it extensively yet; I’ll probably do another post on usage once I’ve been using it more. The first thing you need to do to install Trac is to install all the required packages. On Ubuntu, you can do this with:

sudo apt-get install apache2 libapache2-mod-python libapache2-svn python-setuptools subversion python-subversion

Next, you need to create a base ‘trac’ directory somewhere on your filesystem and allow the apache user www-data to be able to read and write to it. If you want it to be in /home/trac, for example, you can do the following:

sudo mkdir /home/trac
sudo chown www-data:www-data /home/trac

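Now, to create an individual Trac project you need to know two things: 1) the name of your project, which is your choice, and 2) the location of your source control repository. Trac supports several different source control systems; we’re using it with Subversion. Create the project with trac-admin, running it as www-data so that Apache can write to the new environment:

sudo -u www-data trac-admin /home/trac/[projectName] initenv

Answer all the questions it asks you and your project will be created.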

Next, you need to create the apache configuration file for this. Assuming that you want trac to be accessed by trac.somedomain.com, create the file /etc/apache2/sites-available/trac.somedomain.com from the following template, changing the values in []:


<VirtualHost *:80>
        ServerAdmin [yourEmail]
        ServerName [trac.yourDomain.com]

        ErrorLog [errorLogFile]
        CustomLog [logFile] combined

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn
        ServerSignature On

        <Location />
                SetHandler mod_python
                PythonInterpreter main_interpreter
                PythonHandler trac.web.modpython_frontend
                PythonOption TracEnvParentDir [tracParentDirectory (e.g. /home/trac)]
                PythonOption TracUriRoot /
                PythonOption PYTHON_EGG_CACHE /tmp
        </Location>

        # use the following for one authorization for all projects
        # (names containing "-" are not detected):
        <LocationMatch "/[[:alnum:]]+/login">
                AuthType Basic
                AuthName "trac"
                AuthUserFile /var/svn/conf/svnusers
                Require valid-user
        </LocationMatch>
</VirtualHost>

This requires users to be authenticated to access trac.yourdomain.com. To create an authentication file, you need to do the following:

cd /var/svn/conf/
sudo htpasswd -cb svnusers [username] [newpass]
sudo htpasswd -b svnusers [username2] [newpass2]

Repeat the second command for every other user you want to be able to authenticate; only use -c for the first user, since it overwrites any existing file. You also need to define a policy file in /var/svn/conf/svnpolicy. This file has the following format:

[project1:path]
user1 = rw
user2 = r
user3 = rw

[project2:path]
user1 = rw
user2 = rw
user3 = rw

The project name will be the name of your Trac project; the path is probably /, unless you want some parts of your Trac setup to be authenticated differently than others. This should set up the authentication for Trac.

After this, you need to symlink the file you created in sites-available into sites-enabled, then reload and restart Apache:

cd /etc/apache2/sites-enabled
sudo ln -s ../sites-available/trac.yourdomain.com .
sudo /etc/init.d/apache2 reload
sudo /etc/init.d/apache2 restart

After this, the site should be accessible if you edit your hosts file to redirect to it by adding the following line to /etc/hosts:

[siteip] trac.yourdomain.com

However, you probably want to be able to access it from any computer without modifying your hosts file. To do this, you need to go into whatever DNS manager you use and add an entry for trac.yourdomain.com that points to your server.

This ended up being longer than I expected, so I’ll cover mediawiki and reviewboard in later posts. Let me know if anything here is incorrect or more should be covered.

Programming Clojure

Saturday, October 31st, 2009

I read Programming Clojure a few months ago; I meant to write a review of it then, but I was busy and then forgot. Better late than never, though, so I’m going to do it now. Clojure, as you may know, is a Lisp dialect that runs on the JVM. It is a very modern Lisp; Clojure 1.0 was only released this year. It can use Java libraries directly, which answers the complaint that Lisp doesn’t have enough libraries, and it has very good concurrency support.

Programming Clojure was a very good book for getting started with Clojure. I had already done a few small projects with it, but hadn’t used it in any substantial way. The book covers all of Clojure’s features, including the neat concurrency features that I probably wouldn’t have run into on my own for a while, with simple examples of how to use each one. I still refer to it when writing Clojure code.

In addition to covering the language features in isolation, the book has an overarching project that shows the features in a real application. This is pretty interesting, even though you’d probably never have to write a build system in real life. The project starts simple and gradually uses more of the concepts from each chapter, making it a great way to follow along and make sure you actually understood what you read. It’s fairly short; I recommend that everyone who reads the book make sure they understand it.

I’d highly recommend this book if you are interested in learning Clojure or getting better at it. It will help you understand how to use Clojure to create real projects. It explains how to use Clojure, some of the issues you may run into at first (such as side effects like printing inside lazy sequences), and how to overcome them.



Org-mode

Wednesday, October 28th, 2009

I recently started to use org-mode, an Emacs mode for scheduling and organizing your tasks. I’ve really started to like it; I now do all my scheduling with it instead of Google Calendar, which I used before.

Org-mode files are essentially trees, with * as the separator. An example is:

* Heading 1
** Subheading
some notes
more notes
** Subheading
* Heading 2

You can easily cycle the visibility of a subtree using TAB. That’s its basic note-taking mode; if you dig further, it gets quite sophisticated. For example, tasks can be scheduled, occur at a specific time, or have a deadline. You can insert a deadline for the current task with C-c C-d, schedule a task to begin with C-c C-s, or just add a timestamp to an item manually. Any of these can be recurring. They will all show up on your agenda, which I’ll talk about shortly. An example with all of these types of timestamps is:

* Heading 1
** Subheading
<2009-10-19 Mon +1w>
some notes
more notes
** Subheading
DEADLINE: <2009-10-21 Wed>
** Subheading 3
SCHEDULED: <2009-10-23 Fri>
* Heading 2

For a more detailed explanation of dates and times in org-mode, read the “Dates and Times” chapter of the org-mode manual.

Org-mode also has an ‘agenda’ view, which takes all the org files you’ve specified as agenda files and displays a timeline of them. This is one of my most used features; It allows you to easily see your schedule for a given day, see the list of items you have to do, and in general acts as a calendar. You can customize the various views; I haven’t done much of this, but you should read the org-mode manual in order to learn about this and other uses of org-mode.

To get started with org mode, you should require it:

(require 'org)

I set a few keybindings to make org easier to use. I remap Left and Right to promote and demote the current heading (raising it one level or indenting it one level, respectively). I map RET to add a new item; if I want a plain line break, I can still use C-j. I needed to type <, and it was bound to something I didn't care about, so I bound it back to self-insert-command. Finally, org-agenda opens the agenda view, and I wanted it to be easy to reach, so I bound it to C-c a.

(define-key org-mode-map (kbd "RET") 'org-meta-return)
(define-key org-mode-map (kbd "<left>") 'org-metaleft)
(define-key org-mode-map (kbd "<right>") 'org-metaright)
(define-key org-mode-map (kbd "<" ) 'self-insert-command)
(global-set-key "\C-ca" 'org-agenda)

There are a few customizations I add to make org a smoother experience. The first is to turn off flyspell-mode. This is unfortunate, since spell checking could be useful, but as-is it just interferes with org-mode’s highlighting. I also set org-mode to log the time when I mark items as done. I set the list of my agenda files, and I set the deadline warning period to 10 days. Finally, I stop pabbrev from activating in org buffers, since it interferes with org’s expanding and hiding of trees.

(add-hook 'org-mode-hook #'(lambda () (flyspell-mode -1)))
(setq org-log-done 'time)
(setq org-agenda-files '("~/org/technical.org" "~/org/designproject.org" "~/org/personal.org" "~/org/jobs.org" "~/org/martialarts.org" "~/org/emacs.org" "~/org/school.org" "~/org/financial.org"))
(setq org-deadline-warning-days 10)
 
(defadvice pabbrev-global-mode (around org-stop-pabbrev activate)
  (unless (eq major-mode 'org-mode)
    ad-do-it))

These functions are utilities used in publishing my agenda buffer. They work to generate a string describing the section number of the current bullet in an org buffer. These are required because exporting an org file to HTML will use these values as section numbers - to hyperlink to them in the agenda buffer, we will need to know this. There may already be functions to perform this, but I couldn’t find them.

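;; Minimal reconstruction (the original definition was truncated):
;; counts the earlier sibling headings of the heading at point.
(defun org-current-section-number (&optional pos)
  "Returns the 1-based index of the heading at POS among its siblings."
  (save-excursion
    (if pos (goto-char pos))
    (org-back-to-heading t)
    (let ((level (funcall outline-level))
          (num 1))
      ;; Walk back over earlier headings, visible or not, stopping at the parent.
      (while (and (outline-previous-heading)
                  (>= (funcall outline-level) level))
        (when (= (funcall outline-level) level)
          (setq num (1+ num))))
      num)))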
 
(defun org-full-sections (&optional pos)
  "Returns a list corresponding to the full section number at POS,
e.g. (2 1 3) for section 2.1.3."
  (interactive)
  (save-excursion
    (if pos (goto-char pos))
    (let ((retlist (list (org-current-section-number))))
      ;; Climb to each ancestor and prepend its number, until
      ;; `outline-up-heading' signals that we are at the top level.
      (condition-case nil
          (while t
            (outline-up-heading 1 t)
            (push (org-current-section-number) retlist))
        (error nil))
      retlist)))
 
(defun org-full-sections-string (&optional pos)
  "Returns a string such as \"2.1.3\" for the section at POS."
  (interactive)
  (mapconcat #'number-to-string (org-full-sections pos) "."))

Sometimes I’m not on my primary computer, which has all of my agenda files, and still want to be able to look at my schedule. The logical way to do this is to store my agenda as an HTML file that I can access from anywhere. Org has a few ways to publish to HTML, but I couldn’t find a way to export the agenda buffer and files, hyperlinked so that they emulate the actual agenda. I ended up writing a rather ugly function to do it myself; the results are below.

(defun org-publish-agenda ()
  (interactive)
  (save-window-excursion
    (mapc (lambda (file)
              (find-file file)
              (org-export-as-html 3)
              (kill-buffer))
            org-agenda-files)
    (org-agenda 0 "a")
    (org-agenda-month-view)
    (let ((html-buffer (htmlize-buffer (get-buffer "*Org Agenda*")))
          (agenda-buffer (get-buffer "*Org Agenda*")))
      (switch-to-buffer html-buffer)
      (goto-char (point-min))
      (search-forward "<body>")
      (let ((line-start (line-number-at-pos)))
        (while (< (point) (point-max))
          (beginning-of-line)
          (cond
           ((line-matches "org-agenda-structure") (forward-line))
           ((line-matches "org-agenda-dat") (forward-line))
           ((line-matches "org-time-grid") (forward-line))
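           ((line-matches " *\\([^:]+\\):")
            ;; sec-string receives the HTML anchor of the matching heading below.
            (let ((calendar (after-last " " (match-string 1)))
                  (sec-string nil))
              (let ((agenda-line-no (1- (- (line-number-at-pos) line-start))))
                (switch-to-buffer agenda-buffer)
                (goto-char (point-min))
                (forward-line (1- agenda-line-no))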
                (let* ((marker (or (get-text-property (point) 'org-marker)
                                   (org-agenda-error)))
                       (buffer (marker-buffer marker))
                       (pos (marker-position marker)))
                  (switch-to-buffer buffer)
                  (goto-char pos)
                  (setq sec-string (concat "sec-" (org-full-sections-string)))
                  (switch-to-buffer html-buffer)
                  ))
              (insert (concat "<a href=\"" calendar ".html#" sec-string "\">"))
              (end-of-line)
              (insert "</a>")
              (forward-line)))
           (t (forward-line))
           )))
      (write-file "~/org/agenda.html")
      (let ((default-directory "~/org/"))
        (shell-command "mv *.html publish/")
        (kill-buffer "agenda.html")
        ))))
 
(defun org-publish-agenda-to-site ()
  (interactive)
  (save-window-excursion
    (org-publish-agenda)
    (shell-command "scp -r ~/org/publish nflath.com:/home/nflath/public_html/nflath.com/ &")))
 
(setq agenda-update-timer (run-with-timer 0 3600 'org-publish-agenda-to-site))

These functions require org-publish-location to be set to the directory where you want the HTML stored. This isn’t the best code: it assumes that your agenda files are in ~/org and that the directory ~/org/publish exists, but this should be easy to change if you keep your agenda files elsewhere.

org-publish-agenda works by modifying the output of calling htmlize-buffer on the Org agenda buffer. htmlize.el is a package that can turn arbitrary buffers into HTML, keeping their fonts and colors, which is useful when you want to export your view of a buffer, as in this case. It is not part of Emacs itself, so once it is on your load-path, put the following in your initialization file:

(require 'htmlize)

Google Collections

Monday, October 26th, 2009

Google Collections - Part 1
Google Collections - Part 2

This Google Tech Talk covers the Google Collections library for Java, a library I was introduced to on my last work term. It provides some useful data structures that I’ve needed before, as well as utility methods that make working with the standard Java classes easier. This talk gave me a much better overview of exactly what is in the library.

Google has open-sourced Google Collections, so you can use it too. It’s hosted on Google Code at http://code.google.com/p/google-collections/. It was written for JDK 1.5, and since newer JVMs run code built for older versions, it also works on Java 1.6. It’s widely used internally at Google and at other large companies, so it is very well tested. It is an extension to the Java Collections library.

The first set of data structures added by Google Collections are the immutable collections. Google provides ImmutableSet, ImmutableMap, ImmutableList, some other immutable data structures, and differing implementations of these. They are guaranteed to be immutable, and they are slightly faster and use less memory than their mutable counterparts. There is also UnmodifiableIterator, an iterator that does not support remove(), which is useful for handing out a read-only view of a mutable collection. Immutable data structures are meant to be fully populated when they are created: sets can be created with ImmutableSet.of( Element_1, Element_2, … ), and immutable maps are constructed with the following syntax:

ImmutableMap.Builder<Integer, Integer> builder = new ImmutableMap.Builder<Integer, Integer>();
return builder.put( 1, 2 ).put( 2, 3 ).put( 3, 4 ).build();
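The idea behind UnmodifiableIterator can be sketched with nothing but the JDK. The class below, ReadOnlyIterator, is a hypothetical name of my own rather than the library’s code: it forwards hasNext() and next() to a wrapped iterator and rejects remove(), so callers can walk a mutable collection without being able to change it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper illustrating the idea behind UnmodifiableIterator:
// reading is delegated to the wrapped iterator, but remove() is refused.
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<E> delegate;

    ReadOnlyIterator(Iterator<E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public E next() {
        return delegate.next();
    }

    @Override
    public void remove() {
        // Structural changes through this iterator are not allowed.
        throw new UnsupportedOperationException("read-only iterator");
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("a");
        names.add("b");
        Iterator<String> it = new ReadOnlyIterator<String>(names.iterator());
        it.next();
        try {
            it.remove();
        } catch (UnsupportedOperationException e) {
            System.out.println("remove() rejected; size is still " + names.size());
        }
    }
}
```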

There is only one problem I’ve had with the immutable data structures, and it is more a fault of the design of the Java Collections library than of Google’s. The collection interfaces define their modification methods, like List.add(), as changing the collection in place rather than returning a new one, so the immutable collections can only throw an exception (UnsupportedOperationException) when these methods are called on them. In several cases, it would make my life much easier if these methods returned a new ImmutableList without modifying the old one, but I suppose the same effect can be had by constructing a new list. These data structures are guaranteed to be immutable: none of their constructors are public, so they cannot be subclassed outside the library by code that subverts this immutability.
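For the cases where I wanted a “new list with one more element”, a small helper gets most of the way there. This is a plain-JDK sketch; withAdded is a hypothetical name, not a Google Collections method. It copies the input, appends the element, and wraps the copy with Collections.unmodifiableList, whose add() throws UnsupportedOperationException much like the immutable collections do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyOnAdd {
    // Hypothetical helper: returns a new read-only list holding the elements
    // of `list` followed by `element`. The input list is never modified.
    static <T> List<T> withAdded(List<T> list, T element) {
        List<T> copy = new ArrayList<T>(list.size() + 1);
        copy.addAll(list);
        copy.add(element);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<Integer> empty = Collections.emptyList();
        List<Integer> one = withAdded(empty, 1);
        List<Integer> two = withAdded(one, 2);
        System.out.println(one + " " + two); // [1] [1, 2]
        try {
            two.add(3); // mutators on the read-only result throw
        } catch (UnsupportedOperationException e) {
            System.out.println("add() is unsupported");
        }
    }
}
```

Each call copies the whole list, so this is fine for small lists built up occasionally but is linear per addition.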

There are a few caveats to the immutable structures: they cannot store null, which I’m really quite OK with, and there isn’t a way to enforce deep immutability. This means that if you have an ImmutableList of some mutable data structure, you can retrieve the elements and modify them, just not the contents of the list itself.
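The lack of deep immutability is easy to demonstrate. To keep the example free of dependencies, this sketch uses a copy wrapped in the JDK’s Collections.unmodifiableList as a stand-in for an ImmutableList (an assumption for illustration); in both cases only the element references are copied, so the elements themselves stay mutable. The helper name readOnlyCopy is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ShallowImmutability {
    // Hypothetical helper: copies the element references into a new read-only
    // list. Only the list structure is protected; elements are shared.
    static <T> List<T> readOnlyCopy(List<T> items) {
        return Collections.unmodifiableList(new ArrayList<T>(items));
    }

    public static void main(String[] args) {
        List<StringBuilder> drafts = new ArrayList<StringBuilder>();
        drafts.add(new StringBuilder("draft"));
        List<StringBuilder> frozen = readOnlyCopy(drafts);

        frozen.get(0).append(" v2"); // allowed: changes the element, not the list
        System.out.println(frozen.get(0)); // draft v2

        try {
            frozen.set(0, new StringBuilder("other")); // rejected: changes the list
        } catch (UnsupportedOperationException e) {
            System.out.println("set() is unsupported");
        }
    }
}
```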

Google Collections also introduces a few new types of data structures. One of these is the Multiset, which is like an unordered Set that allows duplicates: it keeps track of how many times each element has been added. The four main methods on it are count, add, remove, and setCount. There are 6 different implementations of Multiset, optimized for various performance characteristics.
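A multiset is essentially a map from each element to its number of occurrences. The class below is a minimal sketch of that idea using a HashMap; CountingBag is a hypothetical name, and it implements only the four methods mentioned above, not the library’s Multiset interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal multiset ("bag"): one count per distinct element.
class CountingBag<E> {
    private final Map<E, Integer> counts = new HashMap<E, Integer>();

    // Number of occurrences of element; 0 if it was never added.
    int count(E element) {
        Integer c = counts.get(element);
        return c == null ? 0 : c;
    }

    void add(E element) {
        counts.put(element, count(element) + 1);
    }

    // Removes one occurrence; returns false if the element was absent.
    boolean remove(E element) {
        int c = count(element);
        if (c == 0) {
            return false;
        }
        setCount(element, c - 1);
        return true;
    }

    // Sets the number of occurrences directly; zero drops the element.
    void setCount(E element, int newCount) {
        if (newCount < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        if (newCount == 0) {
            counts.remove(element);
        } else {
            counts.put(element, newCount);
        }
    }

    public static void main(String[] args) {
        CountingBag<String> words = new CountingBag<String>();
        for (String w : "the cat and the hat".split(" ")) {
            words.add(w);
        }
        System.out.println(words.count("the")); // 2
        words.remove("the");
        System.out.println(words.count("the")); // 1
        words.setCount("cat", 5);
        System.out.println(words.count("cat")); // 5
    }
}
```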

Multimaps are also introduced. These are one of the most useful things in this library; I’ve had to implement them before, usually in a less safe way. Multimaps store many-to-many relationships: they behave like Maps, except that get() retrieves the collection of all values that have been put() under that key. They also do the logical thing of returning an empty collection, not null, when you ask for a key that isn’t in the map. There are 5 different implementations of Multimap, and they return different types of Collection from get(); the sub-interfaces ListMultimap, SetMultimap, and SortedSetMultimap describe a few of these.
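The core of a list-backed multimap is a map from keys to lists, plus the empty-collection behaviour described above. SimpleListMultimap is a hypothetical, JDK-only sketch of that, not the ListMultimap API; for simplicity its get() returns a read-only view of the stored list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical list-backed multimap sketch (not the library's class).
class SimpleListMultimap<K, V> {
    private final Map<K, List<V>> map = new HashMap<K, List<V>>();

    // Appends value to the list for key, creating the list on first use.
    void put(K key, V value) {
        List<V> values = map.get(key);
        if (values == null) {
            values = new ArrayList<V>();
            map.put(key, values);
        }
        values.add(value);
    }

    // All values for key, in insertion order; never null.
    List<V> get(K key) {
        List<V> values = map.get(key);
        if (values == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(values);
    }

    public static void main(String[] args) {
        SimpleListMultimap<String, String> courses = new SimpleListMultimap<String, String>();
        courses.put("alice", "math");
        courses.put("alice", "physics");
        courses.put("bob", "math");
        System.out.println(courses.get("alice")); // [math, physics]
        System.out.println(courses.get("carol")); // []
    }
}
```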

BiMaps are also introduced. A BiMap is like a Map that requires its values to be unique, which means it can map from value to key as well as from key to value. There are 3 implementations. Finally, there is a MapMaker class that constructs ConcurrentMaps. MapMaker lets you specify whether you want strong, weak, or soft keys and values, for a total of 9 combinations. The resulting map is fully concurrent and can be used as a ConcurrentMap once you call makeMap(). MapMaker also has makeComputingMap(), which lets you specify a function that computes the value to be returned for a key that is not yet present.
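Both ideas in this paragraph can be approximated with the JDK. SimpleBiMap is a hypothetical sketch that keeps a forward and a backward HashMap in sync and, like BiMap.put, refuses a value that is already bound to a different key. For computing maps, the closest standard analogue I know of is ConcurrentHashMap.computeIfAbsent from Java 8; it is not MapMaker and has no weak or soft references, and here it only shows the compute-on-first-lookup behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical two-way map sketch; not the library's BiMap.
class SimpleBiMap<K, V> {
    private final Map<K, V> forward = new HashMap<K, V>();
    private final Map<V, K> backward = new HashMap<V, K>();

    // Binds key to value, rejecting a value already bound to another key.
    void put(K key, V value) {
        K existing = backward.get(value);
        if (existing != null && !existing.equals(key)) {
            throw new IllegalArgumentException("value already present: " + value);
        }
        V old = forward.put(key, value);
        if (old != null) {
            backward.remove(old); // the key's previous value is now unbound
        }
        backward.put(value, key);
    }

    V get(K key) {
        return forward.get(key);
    }

    // Inverse lookup: value -> key.
    K getKey(V value) {
        return backward.get(value);
    }

    public static void main(String[] args) {
        SimpleBiMap<String, Integer> ids = new SimpleBiMap<String, Integer>();
        ids.put("alice", 1);
        ids.put("bob", 2);
        System.out.println(ids.getKey(2)); // bob

        // Rough JDK 8+ analogue of a computing map: the value is computed
        // the first time a key is requested and stored for later lookups.
        ConcurrentMap<Integer, Integer> squares = new ConcurrentHashMap<Integer, Integer>();
        System.out.println(squares.computeIfAbsent(7, k -> k * k)); // 49
    }
}
```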

Another very useful feature of Google Collections is its static factory methods for creating all of the types the library provides, as well as the standard Java collection classes. Because Java infers the type arguments of generic static methods from the assignment, this cuts down on verbosity; you can write

Map< Integer, Boolean > m = Maps.newHashMap();

instead of

Map< Integer, Boolean > m = new HashMap< Integer, Boolean >();
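The shorter form works because Java infers the type arguments of a generic static method from the variable it is assigned to. A hypothetical factory in the same style (Factories.newHashMap is my name, not the library’s) is only a few lines:

```java
import java.util.HashMap;
import java.util.Map;

final class Factories {
    private Factories() {}

    // Hypothetical static factory in the style of Maps.newHashMap():
    // K and V are inferred from the assignment target at the call site.
    static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    public static void main(String[] args) {
        Map<Integer, Boolean> m = newHashMap(); // K=Integer, V=Boolean inferred
        m.put(1, true);
        System.out.println(m); // {1=true}
    }
}
```

Java 7 later added the diamond operator, so new HashMap<>() gives the same brevity for constructors.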

Ordering, a better Comparator, is also provided. You can generate an Ordering using Ordering.forComparator( Comparator ), which wraps an existing Comparator. Orderings provide a number of utility methods, such as min, max, isIncreasing, and several others. Since Ordering implements Comparator, an Ordering works anywhere a Comparator is expected, so it is pretty much strictly better than a plain Comparator.
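To show what Ordering adds on top of a Comparator, here is a hypothetical SimpleOrdering that wraps a Comparator, implements Comparator itself, and adds min and isIncreasing. This is my own sketch rather than the library’s implementation, and I assume isIncreasing means strictly increasing.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical Ordering-like wrapper; not the library's class.
final class SimpleOrdering<T> implements Comparator<T> {
    private final Comparator<T> base;

    SimpleOrdering(Comparator<T> base) {
        this.base = base;
    }

    @Override
    public int compare(T a, T b) {
        return base.compare(a, b);
    }

    // Smallest element under this ordering; items must be non-empty.
    T min(Iterable<T> items) {
        Iterator<T> it = items.iterator();
        T best = it.next();
        while (it.hasNext()) {
            T next = it.next();
            if (compare(next, best) < 0) {
                best = next;
            }
        }
        return best;
    }

    // True if each element is strictly greater than the one before it.
    boolean isIncreasing(Iterable<T> items) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return true;
        }
        T prev = it.next();
        while (it.hasNext()) {
            T next = it.next();
            if (compare(prev, next) >= 0) {
                return false;
            }
            prev = next;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleOrdering<String> byLength = new SimpleOrdering<String>(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        List<String> words = Arrays.asList("pear", "fig", "banana");
        System.out.println(byLength.min(words)); // fig
        System.out.println(byLength.isIncreasing(words)); // false
        // Because it is a Comparator, it plugs straight into the JDK:
        String[] sorted = words.toArray(new String[0]);
        Arrays.sort(sorted, byLength);
        System.out.println(Arrays.toString(sorted)); // [fig, pear, banana]
    }
}
```

Passing the wrapper directly to Arrays.sort is the property that makes this kind of class a drop-in replacement for a plain Comparator.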

Google Collections has a design philosophy of having methods work on Iterators whenever possible. This helps in cases where you have more data than can fit in memory. To make iterators easier to work with, the library provides the Iterators and Iterables classes, which contain static utility methods for working with Iterators and Iterables. Some of these are transform, filter, and concat; see the documentation for the full list.
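Iterator-based utilities help with large inputs because they can be lazy. This sketch of a filtering iterator pulls elements from its source only when asked; it is a plain-JDK illustration, not Iterators.filter itself, and it uses Java 8’s java.util.function.Predicate in place of the library’s own predicate type. The class name LazyIterators is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

final class LazyIterators {
    private LazyIterators() {}

    // Returns a view of `source` that yields only elements accepted by
    // `pred`. Elements are pulled one at a time, on demand, so the input
    // never has to be held in memory all at once.
    static <T> Iterator<T> filter(final Iterator<T> source, final Predicate<? super T> pred) {
        return new Iterator<T>() {
            private T pending;     // next accepted element, valid if `ready`
            private boolean ready; // whether `pending` holds a value

            @Override
            public boolean hasNext() {
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    if (pred.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                T result = pending;
                pending = null; // drop the reference so it can be collected
                return result;
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<Integer>();
        for (int i = 1; i <= 10; i++) {
            numbers.add(i);
        }
        Iterator<Integer> evens = filter(numbers.iterator(), n -> n % 2 == 0);
        StringBuilder out = new StringBuilder();
        while (evens.hasNext()) {
            out.append(evens.next()).append(' ');
        }
        System.out.println(out.toString().trim()); // 2 4 6 8 10
    }
}
```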

The Google Collections library is extensively unit tested. Its authors built a framework for generating tests, which produces 25,000 unit tests from a few thousand base tests. Running the standard Java Collections classes through this framework has found bugs in them, which shows how thorough these tests are.

Having used the Google Collections library, I can testify that if you are programming in Java, you should be using it. It makes your life a lot easier in pretty common cases, and some of the more esoteric stuff is always there if you need it. This talk was very interesting, and it introduced me to parts of the library I was previously unaware of.