Supporting Dynamic Languages on the Java Virtual Machine
Author: Olin Shivers
This paper proposes enhancements to the JVM that would make it easier to port dynamic languages, such as Scheme, to it. This is desirable because there is now a JVM for practically every platform, so a language that runs on the JVM runs on every architecture the JVM runs on. However, the JVM, while well designed for running Java programs quickly, does not support dynamic languages very well. Scheme is one example of a language that suffers performance problems when ported. Scheme requires a uniform representation of data: a value of any type can be stored in a cons cell. Mapping this onto the Java class model means that every value must be an instance of a class extending Object, so primitive values must be boxed, and boxing and unboxing are expensive.
The proposal to support immediates is to give pointers to Java objects a low bit of one. Since objects are allocated on word boundaries, the low bit of a pointer carries no address information, so this does not hurt. This allows a final ImmediateDescriptor class that has 31 bits of state and can quickly be converted to an even integer (or vice versa). This imposes no penalty on programs that do not use immediates: if a method is called on an immediate, the access generates a memory alignment exception that the VM can catch, so ordinary method calls need no extra tag check and stay fast.
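The tagging scheme above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the paper's implementation: machine words are simulated with 32-bit int values, the class and method names (TaggedWord, encodeImmediate, and so on) are hypothetical, and an explicit check stands in for the hardware alignment exception, which cannot be reproduced from Java code.

```java
// Hypothetical sketch: simulates tagged 32-bit machine words with int values.
final class TaggedWord {
    private TaggedWord() {}

    // Object pointers carry a low bit of one, as in the summary above.
    static boolean isPointer(int word) {
        return (word & 1) == 1;
    }

    // Every other word is an immediate, i.e. an even integer.
    static boolean isImmediate(int word) {
        return (word & 1) == 0;
    }

    // Pack 31 bits of signed state into an even word.
    static int encodeImmediate(int state) {
        if (state < -(1 << 30) || state >= (1 << 30)) {
            throw new IllegalArgumentException("state does not fit in 31 bits");
        }
        return state << 1;
    }

    // Recover the 31 bits of state; the arithmetic shift keeps the sign.
    static int decodeImmediate(int word) {
        if (!isImmediate(word)) {
            throw new IllegalArgumentException("word is a pointer");
        }
        return word >> 1;
    }

    // Tag a word-aligned address (assumed 4-byte alignment) as a pointer.
    static int tagPointer(int alignedAddress) {
        if ((alignedAddress & 3) != 0) {
            throw new IllegalArgumentException("address is not word-aligned");
        }
        return alignedAddress | 1;
    }

    // Strip the tag before use. The exception stands in for the alignment
    // trap that the VM would catch when an immediate is used as an object.
    static int untagPointer(int word) {
        if (!isPointer(word)) {
            throw new IllegalStateException("immediate used as an object");
        }
        return word & ~1;
    }
}
```

Converting between an immediate and its integer state is a single shift in each direction, which is why the conversion is cheap compared with allocating a boxed object.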
There are still other problems, such as method lookup. JVM bytecode is well optimized for Java code, but not necessarily for other paradigms. There is a tension in the bytecode design between verification and efficiency: we don’t want an unsafe RISC-style bytecode, we want a safe system, but safety makes it less efficient in a few cases. This tradeoff is made well for Java, but not for other languages.
A proposal to fix this is to allow some bytecode instructions to be linked to C routines at runtime. This lets language implementers efficiently implement whatever operations they need. However, it raises the problem of verification, since these C routines may be unsafe. The proposed solution is a central body that maintains a standard set of such instructions: the JVM will ‘check out’ the instructions requested by the language implementer from it, and warn the user if the requested code is not in this standard set and must be checked out from an unverified location.
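The checkout step can be sketched as a lookup that records whether a warning is needed. This is a minimal sketch, assuming a set of instruction names stands in for the central body's standard. The class InstructionCheckout, its methods, and the instruction names in the usage note are all hypothetical and do not come from the paper.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the checkout policy; not an API from the paper.
final class InstructionCheckout {

    // Result of resolving one requested instruction.
    static final class Resolution {
        final String instruction;
        final String location;
        final boolean verified;

        Resolution(String instruction, String location, boolean verified) {
            this.instruction = instruction;
            this.location = location;
            this.verified = verified;
        }

        // The user is warned only about code from outside the standard set.
        boolean needsWarning() {
            return !verified;
        }
    }

    private final Set<String> centralStandard;

    InstructionCheckout(Set<String> centralStandard) {
        this.centralStandard =
            Collections.unmodifiableSet(new HashSet<>(centralStandard));
    }

    // Prefer the central body; otherwise fall back to the location named
    // by the language implementer and mark the result as unverified.
    Resolution checkout(String instruction, String fallbackLocation) {
        if (centralStandard.contains(instruction)) {
            return new Resolution(instruction, "central", true);
        }
        return new Resolution(instruction, fallbackLocation, false);
    }
}
```

With a standard set containing the made-up name "scheme.car", a request for "scheme.car" resolves without a warning, while a request for an unknown name such as "my.fastop" resolves to the fallback location and is flagged for a warning.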
Tags: paper