Posts Tagged ‘call graph optimization’

Practical Virtual Method Call Resolution for Java

Monday, October 5th, 2009

This paper was on new methods of statically determining which methods can be called by virtual method calls in Java. This is used for method inlining and producing more accurate call graphs, which can improve other analysis. This paper suggest two algorithms that builds type propagation graphs, where vertexes are variables and edges represent the flow of types due to assignments,. anc compares them against CHA and RTA. These new algorithms were designed so that they would scale linearly with the size of the Java program being analyzed. The analysis was implemented with the Soot framework, which provides Jimple, which is a representation of Java bytecode. Thus, this analysis can be used for all JVM languages and not just Java.

The Soot framework is an APO for manipulating Java bytecode. The implementation of the algorithms worked by reading all class files required by an application, converting them into Soot representations of the class, with Jimple classes representing the bytecode. Once the analysis and transformation is complete, the optimized code is output again as bytecode.

Class Hierarchy Analysis is a method for determining the set of types for objects used in Java applications. Basically, for any method call c.m() where the declared type of c is a class, the possible types of c are the declared type of c and it’s subclasses. If the declared type of c is an interface I, the possible types of c are all classes that implement I or a subinterface of I, plus all classes that are subclasses of those classes.

A call graph consists of one node for each method that is reachable from the main method. Each node in the call graph contains a collection of call sites, which are method calls. Edges go from call sites to all nodes that could be called from that call site. This call graph may contain edges and nodes that are not necessary; we want to remove these as much as possible.

Rapid type analysis is a way to improve the call graph. It uses the fact that an object can only be called if it is created by a ‘new’. There are two types of RTA: optimistic and conservative. In conservative RTA, CHA is performed and then the reachable methods are inspected to determine which types can be instantiated. Optimistic RTA constructs the call graph iteratively,

The two new algorithms refine RTA using the property that for any call c.m(), c can only be a type T if there is an expression of the form v = new T() followed by a set of assignments x_1 = v, x_2 = x_1, …, x_n = x_{n - 1}, c = x_{n - 1}. The two analysis work by building a type propagation graph, where nodes represent variables and a directed edge from a to b represents an assignment of the form b = a. This graph is used to determine which variables that methods are called can correspond to which classes in a more precise way than RTA.

One of the new algorithms, Variable-Type Analysis, uses the name of a variable as its representative. There are three types of variables, with the following representative names: local variables/parameters have a name C.m.a, fields are named C.f, and array references are named a[x].

The nodes of the type propagation graph is constructed as follows:

For every class C in the program:
    For every field F in C
        create a node C.F

For every method C.m in the call graph of P:
    For every parameter p_i of C.m, create C.m.p_i
    For every local variable l of C.m, create C.m.l
    Create C.m.this
    Create C.m.return

Afterwards, assignments between variables in the form of assignment statements and method calls are used to construct the edges of the type propagation graph. Native methods have their information input by hand. Aliases do not affect this alias’, because locals and fields cannot be aliased - if there is are two locals a and b, they refer to different locations. This is not true for arrays, and references to arrays can be stored in Objects. To solve this issue, when adding edges to the type propagation graph, if the two types are of array types or java.lang.Object, a bidirectional edge is added to the graph. Declared-type analysis works the same way, except that a node is used for each declared type and not each variable. The size of the graph is smaller, since there are fewer nodes, but the final answer is less precise.

To keep the algorithm simple, some design tradeoffs were made:

  • Aliases - To avoid solving the aliasing problem except in the case of arrays, one node was used for all instances of a field instead of one per arrays.
  • No killing based on casts - For assignments of the form lhs = (C)rhs, we do not take into account the extra type information given by the task of C, as this would require either a iterative solver or a more complex constraint solver.
  • Pessimistic algorithm - The algorithm adds edges for all method calls in the conservative call graph, instead of using which types could currently reach the call at each step of the analysis. However, this would require iteration. We improve the pessimistic algorithm by generating a better conservative call graph.
  • A whole-program approach - Instead of only propagating type information within a procedure and using the call graph to determine the effect of method calls.
  • A 3-address representation of bytecode - This greatly simplified the analysis

Seven benchmarks were used to compare algorithms, including javac. One use of these algorithms is to reduce the size of the call graph. For this, both VTA and DTA perform better than even optimistic RTA, with VTA performing better. VTA removes 10% to 65% of the methods and 17 to 65% of the edges, whereas optimistic RTA varies from 7% to 56%. VTA has more notable improvements on more object-oriented benchmarks. VTA in general seems to be an improvement over RTA in all cases.

The other measurement is how many polymorphic calls. Given a call graph constructed by CHA, the percentage of polymorphic calls that were determined to be monomorphic by RTA, DTA, and VTA were measured. VTA showed a significant improvement, whereas RTA and DTA did not prune the call graph very much - VTA can resolve between 12% and 96% of the calls. The benchmarks were then instrumented to output profiling information to construct a complete call graph. On the large OO benchmarks, while VTA did significantly better than RTA, it still did not match the profiling information. In javac, VTA only determined 82% of the calls that the profile indicated were monomorphic. There are two ways to attempt to improve this percentage; perform more expensive static analysis or do dynamic optimization, such as branch target prediction.

Using this information to implement inlining showed small increases in speed on both the soot and javac benchmarks. For soot, using CHA to determine monomorhpic calls gives a 1% speedup in execution time, whereas VTA give 3% speedup. In javac, these percentages are 0% and 2%.

Both DTA and VTA behave linearly in practise, and thus scale well. VTA is approximately a constant factor of 10 slower than DTA, and so while is more precise is also more expensive. In conclusion, VTA gives significant improvements to DTA, which performs better than both CHA and RTA.

Scalable Propagation-Based Call Graph Construction Algorithms

Wednesday, September 30th, 2009

This paper goes over several algorithms for constructing call graphs for Java programs. They compare an old algorithm, RTA, and three new algorithms (MTA, FTA, and XTA). The problem is that existing algorithms either are scalable to large programs but compute an excessively large call graph or compute a much more precise call graph but do not scale to larger problems. This paper introduced algorithms that appear to scale to large problems while computing a much more precise call graph.

A call graph is just a directed graph of what methods each method can call. Each method is a vertex in a call graph, and there is an edge from V_1 to V_2 if V_1 can call method V_2 in it’s execution. These graphs are determined by analysis of call sites, which are just Java lines of the form e.M(), or just a method call. Since these functions are dynamically resolved, these call graphs may be more imprecise than strictly necessary, depending on the algorithm used to figure out which call site can call each method. Call graphs are used for whole-program optimization. Methods that are not called from the main method can be removed, if only one method can be called from a call site then it can be inlined instead of called dynamically, etc.

Old algorithms

  • RA - RA, Reachability Analysis, only takes into accound the name of a method. Essentially, the main method is added to the list of reachable methods. For each method that is added to the list of reachable methods, go through all the call sites e.M() in the method and add all methods with name M to the list of reachable methods. This will give a set of reachable methods, and you can use this list to determine which methods are unreachable and can be deleted.
  • CHA - Class Hierarchy Analysis is an extension of RA that takes information about the class
    hierarchy to perform its reachability analysis. It works the same as RA, except instead of adding
    all methods with name m from a call site e.m(), it only adds methods with name m in classes that are subclasses of e.

  • RTA - Rapid Type Analysis is an extension of CHA that takes into account which classes were actually instantiated. For a call e.m(), it will only add those methods with name m that occur in subclasses of e that are instantiated in a reachable method. RTA is still simple to implement, scales, and computes call graphs significantly more precise than CHA.

New algorithms

  • XTA - XTA is an algorithm that uses a distinct set for each method M(S_M) and field x(S_x), in order to give a more local view of what types are available. XTA is defined by five constraints:
    1. The main method is reachable.
    2. For any method C.M, add C to M_X. Add the
    3. If a method is reachable, for all statements of the form “new C()” occurring inside M, add C to S_M.
    4. If a method is reachable, for all fields x that are read, add S_X to S_M.
    5. If a method is reachable, for All Fields x that are written, add the intersection of S_M and all subtypes of the declared types of x to S_x.
  • CTA - CTA uses a distinct set variable S_C for each class C, which unifies the flow of information about both methods and fields. It is defined by adding two constraints to XTA: If a class C defines a method M, S_C = S_M and if C defines a field x, S_C = S_x. This is less precise than XTA, but has the potential to be more scalable.
  • MTA - MTA uses a set variable S_C for each class C, and S_x for every field x. MTA adds this constraint to XTA: If a class C defines a method M, S_C = S_M.
  • FTA - FTA uses a set variable S_C for each class aC, and S_M for each method M. MTA adds this constraint to XTA: If a class C defines a field x, S_C = S_x.

The paper next covered some implementation issues that occurred while implementing these algorithms. The implementations use JikesBT for parsing class files. The new algorithms were added to Jax, which is an application extractor for Java. Jax uses RTA for constructing call graphs, so many of the data structures from Jax could be reused.

The algorithms were constructed in an iterative style, where types are propagated outwards from the main method. XTA had three lists associated with each method and field; one with tyeps that have been propagated to other components, current types that will be propagated in the next iteration, and new types that are propogated to the component in the current iteration. FTA and MTA use a shared set for all the methods and fields in a class.

Arrays are abstracted as classes with a field representing all of it’s elements. Methods are assumed to read an element from array A if an object of type A is propagated to the method, and the mehtod contains an aaload bytecode instruction. Array writes are treated similarly. Exceptions can also cause complex type flow between methods, since several stack frames may be skipped. Instead of implementing some way to handle type propagation of exceptions, there is one set of types that represent the type of all expressions whose static type is of type Throwable, and use that set to resolve method calls on Exception objects. This is not very precise, but the number of types involved will likely be very small. Reflective calls are dealt with by manually supplying the necessary information.

Another issue that had to be solved was the analysis of incomplete applications. Most applications use external libraries which you may not have the source for, such as the java standard library. The solution to this is to have one set of objects for all external classes. This works by assuming that any library call can call any method on an object it is passed. However, this is oftentimes quite conservative, and for some heavily-called methods the analysis is coded to not add types to the global set.

12 benchmark programs were used to test the different algorithms. XTA greatly reduced the number of types available per average in methods from RTA - XTA only has 12.3% of the types being available on average. XTA averages 1.6% fewer possible method definitions than RTA, leading to more methods that can be removed. XTA also computes call graphs with an average of 7.2% fewer edges than RTA. The last comparison, the number of virtual method calls determined to be monomorphic, shows that RTA classifies 7.8% of all call sites as polymorphic and XTA classifies 7% of call sites as polymorphic. MTA and FTA both had better results than RTA, but worse than XTA.

An analysis of the running times of the various algorithms showed that RTA was the fastest, and XTA was slower than RTA by a factor of about 8.3 on average. However, there does not seem to be correlation between program size and how much slower XTA is than RTA, implying they have the same asymptotic running times. MTA and FTA were both surprisingly slightly slower than XTA, although this is probably due to a poor implementation.

XTA, FTA, and MTA all produce more accurate call graphs. FTA is usually not much worse than XTA, and so if space is at a premium it may be used over XTA. XTA is considered the best algorithm for most cases. There does not seem to be a compelling reason to use MTA over RTA.