Useless-Variable Elimination

December 18th, 2009

Useless-Variable Elimination
Olin Shivers

Useless-Variable Elimination(UVE) is a optimization to remove unnecessary computations from Scheme programs. A useless variable is one that contributes nothing to the final outcome of the computation. We can remove these, and the computations creating them, with no loss. This is easy to detect in simple cases; however, in more complex cases(possibly including recursion), this is harder to detect. An example is a simple factorial function which takes an extra ‘bogus’ argument that is continually square rooted; it is not used in computing the factorial, but it is harder to tell it is unused.

UVE is often used to clean up after other optimizations. For example, in continuation-passing style intermediate forms, the continuation is often passed around a loop needlessly.

Useless variables are found by backwards flow analysis. First, variables who are trivially useful - such as variables returned as the final result - are added to the set of useful variables. The control-flow structure of the code is then traced backwards to mark all variables that contributed to this variable. Once we have marked all useful variables, the unmarked variables are useless.

Once we have found useless variables, we do two things; remove all references to useless variables and remove useless variables from lambda lists. In the first step, we can simply replace all useless variables to #f instead of whatever computation was used to define them. The second step is slightly harder, as we cannot delete variables from external calls; a value is still passed in, even if it is unused. This leads to two recursive dependencies to determine when it is safe to remove variables from parameter lists; you can read these in Shiver’s paper. Using these techniques, we can ‘fix’ the factorial example, removing the extra parameter and useless square root calculation.

C customizations

December 14th, 2009

I’ve ben doing a decent amount of C and C++ coding recently - both for courses and for hacking on emacs. I ended up doing a bit of work on my C configuration, though probably not as much as I should.

Part of these efforts included improving h-file-create, the function that is automatically run whenever I create a new header file. Previously, it only created the #ifndef guards; While these are the only things I’ll always need, the vast majority of time I also wanted to define a class, and so I programmed the class to be inserted automatically.

(defun h-file-create ()
  "Create a new h file.  Insert a infdef/define/endif block"
  (interactive)
  (if (or (equal (substring (buffer-name (current-buffer)) -2 ) ".h")
          (equal (substring (buffer-name (current-buffer)) -4 ) ".hpp"))
      (let* ((buffer-name (buffer-name (current-buffer)))
             (class-name (substring buffer-name 0 (string-match "\\." buffer-name))))
      (when (equal "" (buffer-string))
          (insert "#ifndef "
                  (upcase class-name)
                  "_H\n#define "
                  (upcase class-name) 
                  "_H\n\nclass "
                  (capitalize class-name)
                  " {\npublic:\n\n\nprivate:\n\n\n};"
                  "\n\n#endif")
        (search-backward "public:\n")
        (next-line)))))

Another function I ended up needing was to insert includes to header files or forward declarations of classes. Previously, I’d have to go to the top of my file, interrupting whatever I was doing, and maneuver back. This was just an annoyance, so I wrote the following functions that would either include or forward declare something, defaulting to the current word at point. Since whether I #included or forward declared often depended on whether I was in a header or implementation file, so I wrote another function that delegates to one of the other two depending on which file you were in. String-match-any is a utility function I wrote to help do this.

(defun string-match-any (regexp-list string &optional start)
  "Returns whether the given string, starting at position start,
matches any regexp in the list"
  (if regexp-list
      (let ((result (string-match (car regexp-list) string start)))
            (if result
                result
              (string-match-any (cdr regexp-list) string start)))))
 
(defun-my c-include
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "#include" nil t)
        (progn (beginning-of-line)
               (insert (concat "#include \"" (downcase arg) ".hpp\"\n")))
      (search-forward "\n\n")
      (insert "#include \"" (downcase arg) ".hpp\"\n\n"))))
 
(defun-my c-forward-declare
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (if (re-search-forward "class [a-zA-Z]*;" nil t)
        (insert (concat "\nclass " (capitalize arg) ";"))
      (search-forward "\n\n")
      (insert "class " (capitalize arg) ";\n\n"))))
 
(defun-my c-fwdinclude
  (interactive)
  (let ((buffer-name (buffer-name (current-buffer))))
    (if (string-match-any (list "\.h" "\.hpp") buffer-name )
        (c-forward-declare arg)
      (c-include arg))))
(define-key c-mode-base-map (kbd "C-c c f") 'my-c-fwdinclude)

I’ve also been using c-eldoc-mode more. It’s pretty awesome, but there is one problem with it: its speed. On a large project like emacs, it will frequently lock the input for some time. It’s useful enough that I’d for it to be faster. If there’s a later version that fixes this that you know of, please tell me; the latest I’ve found is v0.4. Otherwise, I’m going to look into it further to see if I can figure out how to speed it up some more.

Winter plans: Real-Time Raytraced Pacman

December 10th, 2009

As part of the raytracer I wrote for my CS488 (graphics) course, I rendered a pacman scene. This was fun enough that I’ve decided one of my projects over this next work term will be to try and write a real-time raytraced pacman clone, almost certainly using CUDA. I haven’t used CUDA before, so I’ll undoubtedly end up learning a lot. The goal will be to be able to render it at 30FPS on my laptop, which has a NVidia 8500 as a graphics card. I don’t know how feasible this is, but I can try.

While my raytracer had a few performance enhancements, it still took about three minutes to render my final scene. Clearly, this isn’t nearly fast enough, and I don’t think I’ll be able to get anywhere close to necessary on my CPU; an order of magnitude *may* be possible, but certainly not three. The natural solution is to try using the GPU; CUDA seems to be the most popular way of doing this, and I’ve wanted to learn it for a while.

A few additional things I’m going to have to add to my raytracer while porting it to CUDA:
CSG and better KD-Trees, for one. I didn’t end up implementing constructive solid geometry at all, although I have a pretty good idea of how to do that. Unfortunately, it’s going to require a decent amount of refactoring of my raytracer; currently, my Intersection objects only store one intersection point; this will have to change to store ranges of points. I may not have to end up doing this, depending on how much it helps; I’d probably only end up using it to render Pacman, who was a polygon I created using CSG in blender for my project.

The other main optimization I’ll probably have to make is to improve my KD-Trees. I used a good traversal algorithm for my project (Havran’s T_REC_B from his thesis, if you were wondering). However, I use the naive median tree-splitting algorithm instead of implementing SAH. Implementing a correct tree for a scene as complex as a Pac-Man level will probably give me enough of a performance boost that I’ll want to do this properly.

Those are my current concrete plans; I probably won’t be able to work on it very much for the next week or so due to finals, but I may be able to get started. If you know of any good resources, or have any advice, please let me know.

Pretty Pictures

December 7th, 2009

I haven’t had enough time to write a post for the last week, so instead I’ll post a link to the pictures the raytracer I was working on generated.

http://www.student.cs.uwaterloo.ca/~nflath/website

Java Imenu

November 30th, 2009

My patch to CC-mode, which I talk about here, was finally accepted and submitted. The final regex turns out to be:

 
(defvar cc-imenu-java-generic-expression
  `((nil
     ,(concat
       "[" c-alpha "_][\]\[." c-alnum "_<> ]+[ \t\n\r]+" ; type spec
       "\\([" c-alpha "_][" c-alnum "_]*\\)" ; method name
       "[ \t\n\r]*"
       ;; An argument list that is either empty or contains any number
       ;; of arguments.  An argument is any number of annotations
 
       ;; followed by a type spec followed by a word.  A word is an
       ;; identifier.  A type spec is an identifier, possibly followed
       ;; by < typespec > possibly followed by [].
       (concat "("
               "\\("
                 "[ \t\n\r]*"
		 "\\("
		   "@"
		   "[" c-alpha "_]"
		   "[" c-alnum "._]*"
		   "[ \t\n\r]+"
		 "\\)*"
		 "\\("
		   "[" c-alpha "_]"
		   "[\]\[" c-alnum "_.]*"
		   "\\("
		     "<"
		     "[ \t\n\r]*"
		     "[\]\[.," c-alnum "_<> \t\n\r]*"
		     ">"
		   "\\)?"
                   "\\(\\[\\]\\)?"
                   "[ \t\n\r]+"
                 "\\)"
                 "[" c-alpha "_]"
                 "[" c-alnum "_]*"
                 "[ \t\n\r,]*"
               "\\)*"
               ")"
               "[.," c-alnum " \t\n\r]*"
               "{"
               )) 1))
  "Imenu generic expression for Java mode.  See `imenu-generic-expression'.")

You can install this in your emacs initializations for now; it will be in the 23.2 release. I’m glad it finally made it in - it will make my personal customizations slightly shorter.

As a result of this patch, I was also asked to join the cc-mode project in order to bring their Java support up to 1.6 - it’s unfortunately lagged behind, this patch being one example. If you’ve had any similar issue, please let me know!

3B Courses

November 26th, 2009

I decided to write a little bit about the courses I have this term. Overall, I’m pretty happy with them, although Graphics is the only one I have nothing to complain about. Since the term is nearly over, it seemed like a good idea to reflect on these.

Software Requirements and Specification - SE 457
This course wasn’t one I was particularly looking forward to, but it turned out to be better than I expected. Our prof turned out to be quite good, and while the material isn’t the most interesting I’ve ever learned, but it isn’t as boring as I feared. The assignments are unfortunately not that interesting and a lot of work; they involve reverse-engineering the graduate admissions system. I feel the course could be better if it was more integrated with our design project course. The assignments could be to gather requirements and document the design of our project, instead of something fairly useless.

Graphics - CS 488
This course has been a lot of fun so far, but also lives up to its reputation of being a lot of work. Our professor is good, so the lectures are pretty interesting. The assignments are difficult, starting from 3D tetris and going up to a raytracer, but they are not difficult to get good marks in if you just put in the work - the grading scheme is known and pretty objective. I wasn’t sure how I was going to like this course, not having any real graphics experience previously, but I’m glad I took it. However, I dislike the lack of midterm, since it means there is no indication of what the final will be like.

Introduction to Database Management - CS 348
This course hasn’t been particularly interesting so far. Our prof is decent, which helps, but the material is pretty basic. The assignments are fair, and were representative of what was on the midterm. I probably wouldn’t have taken this class if it weren’t required, but at least it’s not actively bad and is instead just boring.

Concurrency - CS 343
This course is a bit of a mixed bag. The prof is rather mediocre, and a lot of the value in the assignments is obscured by the use of uC++. For those of you that don’t know, uC++ is a C++ dialect that is used by essentially nobody. I really disagree with the use of uC++ for this; I think it should be taught in a higher-level language. As it is, the majority of the time in the assignments is spent debugging C++ issues instead of dealing with actual concurrency; even Java wuoldn’t have nearly as many problems in this. Since uC++ doesn’t use concurrency like any other language, the material from the assignments is also not particularly useful, and as long as this is going to be the case the assignments should be about concurrency, in my opinion. Some of the material is pretty interesting, as long as we are talking about the general form of the concurrency and not uC++, but this isn’t nearly as often as I’d like.

Formal Languages and Parsing - CS 462
The topics covered in this course are really interesting. I like a lot of the theoretical aspects of CS, and this course definitelly is on the very theoretically side. Unfortunately, the lecturer we have is fairly bad, which is disappointing as this was the class I was looking forward to most. The assignments have so far been pretty reasonable, although several of the problems are straight from the book. Again, the lack of midterm means there isn’t necessarily a good way to prepare for the final, but the final is take-home at least.

Design Project - SE 390
I’m still not entirely sure what I think about this course. The entire course revolves around a group project which we pick and continue developing for the next 3 school terms. This is a very cool idea, and a project of this length has brought up some issues that don’t generally come up in school assignments. The actual deliver ables for the project this term are very ill-defined, which is pretty annoying since we are getting an objective grade. The lectures, while entertaining, are also more or less useless - Paul Ward mostly just talks about whatever he wants. It seems a lot like an applied entrepreneurship course, which is neat, but I’m not sure it should be a required class.

On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N)

November 23rd, 2009

On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N)
Ingo Wald, Valstimil Havran

Kd-trees seem to be the most widely used technique for spatial subdivisioning. They are believed to be the best known method for fast ray tracing. The Surface Area Heuristic is the construction technique usually used, and how to building efficient ones and traversing them quickly is well understood. However, constructing them is not as well understood, which is what this paper hopes to analyze.

All kd-tree construction schemes follow the same recursive scene:

function RecBuild( triangles T, voxel V ) returns node
    if Terminate(T,V) then:
        return new leaf node( T )
    p = FindPlane( T, V )
    (V_L, V_R ) = Split V with p
    T_L = all members of T on one side of  V_L
    T_R = all members of T on the other side of V_L
    return new node( p, RecBuild(T_L, V_L), RecBuild(T_R, V_R) )

The difference between a good and a naively build kd-tree is often a factor of two or more. The naive method often used, the “spatial median” method, is to just alternate axis and the plane being split at the spatial median of the voxel.

The Surface Area Heuristic(SAH) tends to perform quite well. It makes several assumptions:

  • Rays are uniformly distributed, infinite lines.
  • The cost for both a traversal and a triangle intersection are known.
  • The cost of intersecting N triangles is roughly NK.

The expected cost for a given plane p then is one traversal step, plus the cost of intersecting the two children. Usually, instead of minimizing the cost of traversing the entire tree, a greedy version of the heuristic is used which minimizes it at each level, due to efficiency reasons. We also need to define a termination criterion; when to stop the recursive build: SAH gives us an elegant one, which is if the cost of this node as a leaf is better than even the best split.

The SAH tends to work best, and only a few modifications are known to consistently yield better improvements. One of these is to favor splits that cut off empty space. Requiring the tree to be split a certain number of times can prevent the tree from being stuck in a local minnima, but this number is hard to determine for general scenes.

There are several ways to construct Kd-trees once you have decided upon the cost function. The mmost trivial way is to iterate all triangles, determine their possible split candidates, and compare the cost for each one. This will end up taking O(N^2) operations, where N is the number of triangles in the scene. This is often too slow.

There is also a widely-known algorithm for constructing SAH trees in O(N log^2 N). I’m not going to summarize the algorithm here: if you are interested, read the paper. It essentially ’sweeps’ a plane, calculating costs incrementally. This is a huge improvement, but as the theoretical lower bound is O(N log N) we wish to be better.

This paper introduces a O(N log N) method of building a kd-tree. What caused the previous algorith to be O(N log^2 N ) instead of O(N log N ) is having to sort once per partition. This algorithm improves on that by sorting the evet list only once. This involves having to events for all dimensions. Again, I’m not going to recap specifics of the algorithm here.

The two algorithms were then compared in actual time, and not just asymptotic behavior. Realistic models of varying size were used. The O(N log N) variant is concisely faster than the O(N log^2 N) variant by a factor of 2-3.5. Analysis shows that the O(N log^2 N) variant spends more time re-sorting than the other variant uses to build the entire tree.

Supporting Dynamic Languages on the JVM

November 19th, 2009

Supporting Dynamic Languages on the Java Virtual Machine
Author: Olin Shivers

This paper is about enhancements to the JVM that would make it easier to port dynamic languages, such as Scheme, to the JVM. This is desirable due to the fact that there is now a JVM for practically everything; running on the JVM means that the language will run on every architecture it will run on. However, the JVM, while well-designed for speed of Java programs, does not support dynamic languages very well. One example of a language that suffers performance problems from this porting process is Scheme. In Scheme, there must be a uniform representation of data - all types can be in cons cells. Working this into the Java class models means that every type must extend Object, and boxing and unboxing is expensive for primitive types.

The proposal to support immediates is to give pointers to Java objects a low bit of one. Since these are allocated on word boundaries, this does not hurt. This allows a final ImmediateDescriptor class that has 31 bits of state that can quickly be converted to an even integer(or vice versa). This causes no penalty to programs not using them - if a method iis called on them, this would generate a memory alignment exception that the VM can catch, which would still be fast.

There are still problems with issues such as method lookup, for example. The JVM bytecode is well-optimized for Java code, but not necessarily other paradigms. There is a tension in the bytecode between verification and efficiency - we don’t want an unsafe RISC bytecode system, we do want a safe system, but this will make it less efficient in few cases. This tradeoff is made well for Java, but not other languages.

A proposal to fix this is to have some of the bytecode instructions to be linked to C routines at runtime. This allows language implementers to efficiently represent whatever they need. However, this brings up the problem of verification - these C routines may be unsafe. The solution to this is to have some central body, where the JVM will ‘checkout’ the required instructions as told by the language implementer, and warn the user if the requested code is not in this standard and must be checked out from an unverified location.

Acer AspireONE and Dropbox

November 16th, 2009

My ASUS’ motherboard was recently blown, so I ended up getting a netbook to use until I get a more powerful laptop. Since I mostly use my laptop for development and web browsing, I don’t have a very pressing need for a hugely powerful comptuer anyway. I got an Acer ASPIREOne, which was one of the cheapest ones I could find on short notice.

I really should have learned this by now, but you should regularly back up all your files. I had most of my documents in SVN and Dropbox, but losing my current changes to my repositories is pretty annoying. I’m currently copying over files, since my hard drive is OK, but I htink I’m going to move as much as possible onto Dropbox.

Dropbox is a program that is used for backing up and synchronizing files across computers. It acts as just a regular folder on your computer, and when changes are made they are propogated up to Dropbox’s servers. It works offline and can be used with pretty much any OS. The first 2GB you use are free; if you need any more, they charge you.

The one problem I have so far with the netbook so far is the keyboard. It’s understandable that it’s small, but some of the keys are in strange position, makign it more annoying than it should be. For example, there are two ‘\’ keys, neither of which are in the correct position and both in spaces better used by ther keys. The placement of them makes the left shift key to be very small, and the ‘Enter’ button to not be as wide as I’m used to. I’m not sure that this was at all necessary.

Other than that, everything’s working pretty well. The screen is much smaller than I’m used to, which was expected, but is still somewhat disconcrting at first. If you have no reason to have a powerful laptop, I’d recommend just getting a cheap netbook and not spending a huge amount for a really good laptop.

The Clean Code Talks - Unit Testing

November 12th, 2009

The Clean Code Talks - Unit Testing

This Google tech talk is about the benefit of unit tests. It starts out asking one fundamental question: how do you write hard to test code? Despite most developers(myself included) being good at writing hard-to-test code, most aren’t quite sure how to do it. The speaker describes some of these ways; mixing the ‘new’ operator in business logic, looking for things, doing work in the constructor, global state, and deep inheritance, among others. All of these make it hard to isolate specific parts of the code to test - in the deep inheritance case, you’re also testing superclasses; in the other cases, you can’t isolate one function from all the other functions. It’s also hard to test purely procedural code - there are no ’seams’ that can be exploited to isolate specific parts.

The speaker then describes the progression of levels of testing. The highest level are scenario tests. These test the whole application, doing something a user would do and ensuring the correct thing happens. These test the whole app, which makes them slow; if the test fails, it’s also hard to know precisely where the test failed; you generally have to trace through with a debugger in order to find the specific place.

The next level of testing are functional tests. These test specific subsystems, with simulators for external parts. These are much faster than scenario tests, and you have a better idea of what failed. However, you still don’t know precisely what went wrong - If you’re testing a radio and it doesn’t work, you don’t have much of an idea why. You still have to use a debugger to isolate the point of failure.

This leads to the bottom level of testing: unit tests. These test individual classes in isolation from one another. They are very fast - the speaker suggests running them whenever you save. If all your unit tests pass, you have a high confidence that the class in question is OK, even though you are unsure about the interaction between class. It’s also very easy to figure out what precisely went wrong.

The speaker stressed that this is a continuum of tests; you shouldn’t have only unit tests or only scenario tests - you should have both, although unit tests are more important. It’s also important to have ’seams’, where you can inject the class’s dependencies in order to mock out their behaviour. This is done using dependency injection, where a class asks for its dependencies in the constructor. Going back to before, a class should be either in the business of construction or finding objects, or doing actual work - it’s easy to test either of these methods, just hard to test a method that mixes the two.