A few days ago at work, I had to track down why my Java application was running out of memory. It processed a few CSV files, storing some of the data in them. The files were large, but the data my application was storing should have easily been able to fit into memory. After an hour or so of investigation, I found it was an issue I had heard about but never run into myself: the String.substring memory leak.
The bug in question is that String.substring, instead of copying characters into a new string, returns a string that shares the original string’s underlying character array. This means that the larger string’s data can’t be garbage-collected: if you store a substring, then the entire original string is kept in memory. In my specific case, I was using String.split and storing a few of the fields. However, String.split uses String.substring behind the scenes, so I ended up keeping the entire files in memory, which explained my OutOfMemoryErrors.
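To make the sharing concrete, here is a minimal sketch of a string type that behaves the way described above. SharedString and its fields (value, offset, count) are hypothetical names chosen for illustration; this is a simplified model of the behaviour, not the actual java.lang.String source.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a string whose substrings share storage.
// It is an illustration of the behaviour only, not the JDK's implementation.
final class SharedString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index in value where this string starts
    final int count;     // number of characters belonging to this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Constant time: nothing is copied, the result points into the same array,
    // so the whole array stays reachable for as long as the result does.
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    // Linear time: allocates an array holding only this string's characters.
    SharedString copy() {
        return new SharedString(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString line = new SharedString("id,name,a very long remainder of the row");
        SharedString id = line.substring(0, 2);
        System.out.println(id + " keeps " + id.value.length + " chars alive");
        System.out.println(id.copy() + " keeps " + id.copy().value.length + " chars alive");
    }
}
```

In this model, storing id keeps the whole line's array reachable, while storing id.copy() keeps only two characters.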
This bug is due to the Java implementors wanting String.substring to be as fast as possible. If it did not have this behaviour, a new char[] would have to be allocated on the heap each time the function was called, and the data would have to be copied into it. Moreover, if you did need the larger string to stay around, part of the string would be duplicated, which is inefficient memory-wise. The workaround for this bug is to wrap calls to String.substring, or calls that use it, in a new String(). For example:
String sub = new String( oldString.substring(0, 4) );
and the old string can be garbage collected once nothing else references it.
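The same wrapping applies to the String.split case that bit me. The following is a minimal sketch, assuming a simple comma-separated line with no quoted fields; CsvFields and keepFields are hypothetical names invented for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; the naive split on "," does not
// handle quoted fields or embedded commas.
class CsvFields {

    // Returns the fields at the given indexes, each wrapped in new String(...)
    // so that the stored values do not depend on the original line's storage.
    static List<String> keepFields(String line, int... indexes) {
        String[] parts = line.split(",");
        List<String> kept = new ArrayList<>();
        for (int i : indexes) {
            // Throws ArrayIndexOutOfBoundsException if the line has too few fields.
            kept.add(new String(parts[i]));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> kept = keepFields("42,Alice,a long free-text column we do not need", 0, 1);
        System.out.println(kept); // prints [42, Alice]
    }
}
```

Only the wrapped copies are stored here, so the full line is no longer held by the kept fields.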
Given the above-mentioned reasons for the bug’s existence and the length of time it’s been open, it probably won’t be fixed. The workaround is not *too* painful, although you do have to find which calls themselves use String.substring. In many cases, even if you forget, the memory inefficiency isn’t too terrible (if you strip off the last character of a string, for example). It would be nice if the javadoc for these methods mentioned this behaviour, so that looking at the description of a method would warn you about issues such as this.