Save-Visited-Files v1.2

January 4th, 2010

A while ago, I released save-visited-files-mode, a mode I wrote to keep files open between Emacs sessions. A few people including me have been using it since then; I’ve written about it before here and here. A few weeks ago, Jonathan Kotta emailed me a patch that fixed a few minor issues, as well as cleaned up the code. After applying the patch, I’m now releasing version 1.2. The changes from version 1.1 are as follows:

  • Changed the default location of the persistence file so a new directory doesn’t have to be created.
  • Makes sure that save-visited-files-location is writable; if not, displays an error.
  • Doesn’t print a message every time it saves the file list.
  • Miscellaneous code cleanup.

I’ve also moved save-visited-files-mode.el to a github repository, located at http://github.com/nflath/save-visited-files.This is a much better solution than keeping a copy just in my emacs customization subversion, since I am distributing this as a stand-alone project. I’ll continue posting updates here, as well, but the source will be committed to that repository.

CUDA setup

December 24th, 2009

I’ve started playing around with CUDA, attempting to make a ray-traced pacman clone. Instructions for how to set up the CUDA SDK on Ubuntu are located at http://lifeofaprogrammergeek.blogspot.com/2008/05/cuda-development-in-ubuntu.html; There were a few others instructions that I tried that didn’t end up working. As a note to all of you on Ubuntu 9.10: CUDA does not work with the version of GCC that comes with Ubuntu 9.10. To be able to do CUDA development, you need to downgrade your GCC version to 4.3.

Once the SDK is set up, you can start coding for CUDA. The most useful documents for helping me do this so far have been the CUDA Programming Guide, NVidia’s official guide and reference manual to CUDA, and Dr. Dobb’s CUDA: Supercomputing from the Masses. These have been enough to get me started; currently, my Pacman clone is having some memory issues that causes my program to work in device emulation mode, but I’ve gotten their simpler examples working and understand the basics.

For editing CUDA files, I use cuda-mode. This is an extension to c-mode that provides syntax highlighting for the new constructs, such as __device__ and __global__. It doesn’t do much else, but CUDA shares enough syntax(virtually all of it) with C so that cc-mode is sufficient for editing it. Once you’ve added cuda-mode.el to your load-path, all you have to do to enable it is add the following to your initialization file:

(require 'cuda-mode)

Languages Matter

December 21st, 2009

Languages Matter
http://www.youtube.com/watch?v=ix2DeCzuckc

The world is driven by computers, but most people don’t love them -
they consider them mere tools. The people who love computers,
programmers, are the ones who change the world. Languages matter,
since all software is written in some language, and computers don’t
work without software.

Similarly to the issue with computers, most programmers don’t love
languages. However, languages are important to programmers - they help programmers think, five inspiration, and affect productivity. Different languages will enlighten programmers in different ways - BASIC did not teach me recursion.

In summary, you should be happy using a language.

Useless-Variable Elimination

December 18th, 2009

Useless-Variable Elimination
Olin Shivers

Useless-Variable Elimination(UVE) is a optimization to remove unnecessary computations from Scheme programs. A useless variable is one that contributes nothing to the final outcome of the computation. We can remove these, and the computations creating them, with no loss. This is easy to detect in simple cases; however, in more complex cases(possibly including recursion), this is harder to detect. An example is a simple factorial function which takes an extra ‘bogus’ argument that is continually square rooted; it is not used in computing the factorial, but it is harder to tell it is unused.

UVE is often used to clean up after other optimizations. For example, in continuation-passing style intermediate forms, the continuation is often passed around a loop needlessly.

Useless variables are found by backwards flow analysis. First, variables who are trivially useful - such as variables returned as the final result - are added to the set of useful variables. The control-flow structure of the code is then traced backwards to mark all variables that contributed to this variable. Once we have marked all useful variables, the unmarked variables are useless.

Once we have found useless variables, we do two things; remove all references to useless variables and remove useless variables from lambda lists. In the first step, we can simply replace all useless variables to #f instead of whatever computation was used to define them. The second step is slightly harder, as we cannot delete variables from external calls; a value is still passed in, even if it is unused. This leads to two recursive dependencies to determine when it is safe to remove variables from parameter lists; you can read these in Shiver’s paper. Using these techniques, we can ‘fix’ the factorial example, removing the extra parameter and useless square root calculation.

C customizations

December 14th, 2009

I’ve ben doing a decent amount of C and C++ coding recently - both for courses and for hacking on emacs. I ended up doing a bit of work on my C configuration, though probably not as much as I should.

Part of these efforts included improving h-file-create, the function that is automatically run whenever I create a new header file. Previously, it only created the #ifndef guards; While these are the only things I’ll always need, the vast majority of time I also wanted to define a class, and so I programmed the class to be inserted automatically.

(defun h-file-create ()
  "Create a new h file.  Insert a infdef/define/endif block"
  (interactive)
  (if (or (equal (substring (buffer-name (current-buffer)) -2 ) ".h")
          (equal (substring (buffer-name (current-buffer)) -4 ) ".hpp"))
      (let* ((buffer-name (buffer-name (current-buffer)))
             (class-name (substring buffer-name 0 (string-match "\\." buffer-name))))
      (when (equal "" (buffer-string))
          (insert "#ifndef "
                  (upcase class-name)
                  "_H\n#define "
                  (upcase class-name) 
                  "_H\n\nclass "
                  (capitalize class-name)
                  " {\npublic:\n\n\nprivate:\n\n\n};"
                  "\n\n#endif")
        (search-backward "public:\n")
        (next-line)))))

Another function I ended up needing was to insert includes to header files or forward declarations of classes. Previously, I’d have to go to the top of my file, interrupting whatever I was doing, and maneuver back. This was just an annoyance, so I wrote the following functions that would either include or forward declare something, defaulting to the current word at point. Since whether I #included or forward declared often depended on whether I was in a header or implementation file, so I wrote another function that delegates to one of the other two depending on which file you were in. String-match-any is a utility function I wrote to help do this.

(defun string-match-any (regexp-list string &optional start)
  "Returns whether the given string, starting at position start,
matches any regexp in the list"
  (if regexp-list
      (let ((result (string-match (car regexp-list) string start)))
            (if result
                result
              (string-match-any (cdr regexp-list) string start)))))
 
(defun-my c-include
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "#include" nil t)
        (progn (beginning-of-line)
               (insert (concat "#include \"" (downcase arg) ".hpp\"\n")))
      (search-forward "\n\n")
      (insert "#include \"" (downcase arg) ".hpp\"\n\n"))))
 
(defun-my c-forward-declare
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (if (re-search-forward "class [a-zA-Z]*;" nil t)
        (insert (concat "\nclass " (capitalize arg) ";"))
      (search-forward "\n\n")
      (insert "class " (capitalize arg) ";\n\n"))))
 
(defun-my c-fwdinclude
  (interactive)
  (let ((buffer-name (buffer-name (current-buffer))))
    (if (string-match-any (list "\.h" "\.hpp") buffer-name )
        (c-forward-declare arg)
      (c-include arg))))
(define-key c-mode-base-map (kbd "C-c c f") 'my-c-fwdinclude)

I’ve also been using c-eldoc-mode more. It’s pretty awesome, but there is one problem with it: its speed. On a large project like emacs, it will frequently lock the input for some time. It’s useful enough that I’d for it to be faster. If there’s a later version that fixes this that you know of, please tell me; the latest I’ve found is v0.4. Otherwise, I’m going to look into it further to see if I can figure out how to speed it up some more.

Winter plans: Real-Time Raytraced Pacman

December 10th, 2009

As part of the raytracer I wrote for my CS488 (graphics) course, I rendered a pacman scene. This was fun enough that I’ve decided one of my projects over this next work term will be to try and write a real-time raytraced pacman clone, almost certainly using CUDA. I haven’t used CUDA before, so I’ll undoubtedly end up learning a lot. The goal will be to be able to render it at 30FPS on my laptop, which has a NVidia 8500 as a graphics card. I don’t know how feasible this is, but I can try.

While my raytracer had a few performance enhancements, it still took about three minutes to render my final scene. Clearly, this isn’t nearly fast enough, and I don’t think I’ll be able to get anywhere close to necessary on my CPU; an order of magnitude *may* be possible, but certainly not three. The natural solution is to try using the GPU; CUDA seems to be the most popular way of doing this, and I’ve wanted to learn it for a while.

A few additional things I’m going to have to add to my raytracer while porting it to CUDA:
CSG and better KD-Trees, for one. I didn’t end up implementing constructive solid geometry at all, although I have a pretty good idea of how to do that. Unfortunately, it’s going to require a decent amount of refactoring of my raytracer; currently, my Intersection objects only store one intersection point; this will have to change to store ranges of points. I may not have to end up doing this, depending on how much it helps; I’d probably only end up using it to render Pacman, who was a polygon I created using CSG in blender for my project.

The other main optimization I’ll probably have to make is to improve my KD-Trees. I used a good traversal algorithm for my project (Havran’s T_REC_B from his thesis, if you were wondering). However, I use the naive median tree-splitting algorithm instead of implementing SAH. Implementing a correct tree for a scene as complex as a Pac-Man level will probably give me enough of a performance boost that I’ll want to do this properly.

Those are my current concrete plans; I probably won’t be able to work on it very much for the next week or so due to finals, but I may be able to get started. If you know of any good resources, or have any advice, please let me know.

Pretty Pictures

December 7th, 2009

I haven’t had enough time to write a post for the last week, so instead I’ll post a link to the pictures the raytracer I was working on generated.

http://www.student.cs.uwaterloo.ca/~nflath/website

Java Imenu

November 30th, 2009

My patch to CC-mode, which I talk about here, was finally accepted and submitted. The final regex turns out to be:

 
(defvar cc-imenu-java-generic-expression
  `((nil
     ,(concat
       "[" c-alpha "_][\]\[." c-alnum "_<> ]+[ \t\n\r]+" ; type spec
       "\\([" c-alpha "_][" c-alnum "_]*\\)" ; method name
       "[ \t\n\r]*"
       ;; An argument list that is either empty or contains any number
       ;; of arguments.  An argument is any number of annotations
 
       ;; followed by a type spec followed by a word.  A word is an
       ;; identifier.  A type spec is an identifier, possibly followed
       ;; by < typespec > possibly followed by [].
       (concat "("
               "\\("
                 "[ \t\n\r]*"
		 "\\("
		   "@"
		   "[" c-alpha "_]"
		   "[" c-alnum "._]*"
		   "[ \t\n\r]+"
		 "\\)*"
		 "\\("
		   "[" c-alpha "_]"
		   "[\]\[" c-alnum "_.]*"
		   "\\("
		     "<"
		     "[ \t\n\r]*"
		     "[\]\[.," c-alnum "_<> \t\n\r]*"
		     ">"
		   "\\)?"
                   "\\(\\[\\]\\)?"
                   "[ \t\n\r]+"
                 "\\)"
                 "[" c-alpha "_]"
                 "[" c-alnum "_]*"
                 "[ \t\n\r,]*"
               "\\)*"
               ")"
               "[.," c-alnum " \t\n\r]*"
               "{"
               )) 1))
  "Imenu generic expression for Java mode.  See `imenu-generic-expression'.")

You can install this in your emacs initializations for now; it will be in the 23.2 release. I’m glad it finally made it in - it will make my personal customizations slightly shorter.

As a result of this patch, I was also asked to join the cc-mode project in order to bring their Java support up to 1.6 - it’s unfortunately lagged behind, this patch being one example. If you’ve had any similar issue, please let me know!

3B Courses

November 26th, 2009

I decided to write a little bit about the courses I have this term. Overall, I’m pretty happy with them, although Graphics is the only one I have nothing to complain about. Since the term is nearly over, it seemed like a good idea to reflect on these.

Software Requirements and Specification - SE 457
This course wasn’t one I was particularly looking forward to, but it turned out to be better than I expected. Our prof turned out to be quite good, and while the material isn’t the most interesting I’ve ever learned, but it isn’t as boring as I feared. The assignments are unfortunately not that interesting and a lot of work; they involve reverse-engineering the graduate admissions system. I feel the course could be better if it was more integrated with our design project course. The assignments could be to gather requirements and document the design of our project, instead of something fairly useless.

Graphics - CS 488
This course has been a lot of fun so far, but also lives up to its reputation of being a lot of work. Our professor is good, so the lectures are pretty interesting. The assignments are difficult, starting from 3D tetris and going up to a raytracer, but they are not difficult to get good marks in if you just put in the work - the grading scheme is known and pretty objective. I wasn’t sure how I was going to like this course, not having any real graphics experience previously, but I’m glad I took it. However, I dislike the lack of midterm, since it means there is no indication of what the final will be like.

Introduction to Database Management - CS 348
This course hasn’t been particularly interesting so far. Our prof is decent, which helps, but the material is pretty basic. The assignments are fair, and were representative of what was on the midterm. I probably wouldn’t have taken this class if it weren’t required, but at least it’s not actively bad and is instead just boring.

Concurrency - CS 343
This course is a bit of a mixed bag. The prof is rather mediocre, and a lot of the value in the assignments is obscured by the use of uC++. For those of you that don’t know, uC++ is a C++ dialect that is used by essentially nobody. I really disagree with the use of uC++ for this; I think it should be taught in a higher-level language. As it is, the majority of the time in the assignments is spent debugging C++ issues instead of dealing with actual concurrency; even Java wuoldn’t have nearly as many problems in this. Since uC++ doesn’t use concurrency like any other language, the material from the assignments is also not particularly useful, and as long as this is going to be the case the assignments should be about concurrency, in my opinion. Some of the material is pretty interesting, as long as we are talking about the general form of the concurrency and not uC++, but this isn’t nearly as often as I’d like.

Formal Languages and Parsing - CS 462
The topics covered in this course are really interesting. I like a lot of the theoretical aspects of CS, and this course definitelly is on the very theoretically side. Unfortunately, the lecturer we have is fairly bad, which is disappointing as this was the class I was looking forward to most. The assignments have so far been pretty reasonable, although several of the problems are straight from the book. Again, the lack of midterm means there isn’t necessarily a good way to prepare for the final, but the final is take-home at least.

Design Project - SE 390
I’m still not entirely sure what I think about this course. The entire course revolves around a group project which we pick and continue developing for the next 3 school terms. This is a very cool idea, and a project of this length has brought up some issues that don’t generally come up in school assignments. The actual deliver ables for the project this term are very ill-defined, which is pretty annoying since we are getting an objective grade. The lectures, while entertaining, are also more or less useless - Paul Ward mostly just talks about whatever he wants. It seems a lot like an applied entrepreneurship course, which is neat, but I’m not sure it should be a required class.

On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N)

November 23rd, 2009

On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N)
Ingo Wald, Valstimil Havran

Kd-trees seem to be the most widely used technique for spatial subdivisioning. They are believed to be the best known method for fast ray tracing. The Surface Area Heuristic is the construction technique usually used, and how to building efficient ones and traversing them quickly is well understood. However, constructing them is not as well understood, which is what this paper hopes to analyze.

All kd-tree construction schemes follow the same recursive scene:

function RecBuild( triangles T, voxel V ) returns node
    if Terminate(T,V) then:
        return new leaf node( T )
    p = FindPlane( T, V )
    (V_L, V_R ) = Split V with p
    T_L = all members of T on one side of  V_L
    T_R = all members of T on the other side of V_L
    return new node( p, RecBuild(T_L, V_L), RecBuild(T_R, V_R) )

The difference between a good and a naively build kd-tree is often a factor of two or more. The naive method often used, the “spatial median” method, is to just alternate axis and the plane being split at the spatial median of the voxel.

The Surface Area Heuristic(SAH) tends to perform quite well. It makes several assumptions:

  • Rays are uniformly distributed, infinite lines.
  • The cost for both a traversal and a triangle intersection are known.
  • The cost of intersecting N triangles is roughly NK.

The expected cost for a given plane p then is one traversal step, plus the cost of intersecting the two children. Usually, instead of minimizing the cost of traversing the entire tree, a greedy version of the heuristic is used which minimizes it at each level, due to efficiency reasons. We also need to define a termination criterion; when to stop the recursive build: SAH gives us an elegant one, which is if the cost of this node as a leaf is better than even the best split.

The SAH tends to work best, and only a few modifications are known to consistently yield better improvements. One of these is to favor splits that cut off empty space. Requiring the tree to be split a certain number of times can prevent the tree from being stuck in a local minnima, but this number is hard to determine for general scenes.

There are several ways to construct Kd-trees once you have decided upon the cost function. The mmost trivial way is to iterate all triangles, determine their possible split candidates, and compare the cost for each one. This will end up taking O(N^2) operations, where N is the number of triangles in the scene. This is often too slow.

There is also a widely-known algorithm for constructing SAH trees in O(N log^2 N). I’m not going to summarize the algorithm here: if you are interested, read the paper. It essentially ’sweeps’ a plane, calculating costs incrementally. This is a huge improvement, but as the theoretical lower bound is O(N log N) we wish to be better.

This paper introduces a O(N log N) method of building a kd-tree. What caused the previous algorith to be O(N log^2 N ) instead of O(N log N ) is having to sort once per partition. This algorithm improves on that by sorting the evet list only once. This involves having to events for all dimensions. Again, I’m not going to recap specifics of the algorithm here.

The two algorithms were then compared in actual time, and not just asymptotic behavior. Realistic models of varying size were used. The O(N log N) variant is concisely faster than the O(N log^2 N) variant by a factor of 2-3.5. Analysis shows that the O(N log^2 N) variant spends more time re-sorting than the other variant uses to build the entire tree.