<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: The Dangers of String.substring</title>
	<atom:link href="http://nflath.com/2009/07/the-dangers-of-stringsubstring/feed/" rel="self" type="application/rss+xml" />
	<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/</link>
	<description>Technology-related ideas, mainly involving Emacs</description>
	<pubDate>Thu, 11 Mar 2010 10:10:22 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: JavaBlogging &#187; String and memory leaks</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-131</link>
		<dc:creator>JavaBlogging &#187; String and memory leaks</dc:creator>
		<pubDate>Tue, 28 Jul 2009 11:50:31 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-131</guid>
		<description>[...] post is inspired by an entry on nflath.com about the dangers of String.substring() [...]</description>
		<content:encoded><![CDATA[<p>[...] post is inspired by an entry on nflath.com about the dangers of String.substring() [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Terry</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-74</link>
		<dc:creator>Terry</dc:creator>
		<pubDate>Tue, 07 Jul 2009 14:17:57 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-74</guid>
		<description>I don't understand why this bug is still lurking in Java. Java has been open sourced, correct? So someone should be able to go in and change the behavior of the C++ code in the JVM that's causing this issue, right?</description>
		<content:encoded><![CDATA[<p>I don&#8217;t understand why this bug is still lurking in Java. Java has been open sourced, correct? So someone should be able to go in and change the behavior of the C++ code in the JVM that&#8217;s causing this issue, right?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: admin</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-73</link>
		<dc:creator>admin</dc:creator>
		<pubDate>Mon, 06 Jul 2009 21:17:10 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-73</guid>
		<description>Salman:   
You have to wrap the elements of the array that you want to keep, so something like:
&lt;pre lang="java"&gt;
String[] items = oldString.split(" ");
String stringToKeep = new String(items[0]);
&lt;/pre&gt;
This will allow the original string to be garbage collected.</description>
		<content:encoded><![CDATA[<p>Salman:<br />
You have to wrap the elements of the array that you want to keep, so something like:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> items <span style="color: #339933;">=</span> oldString.<span style="color: #006633;">split</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot; &quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #003399;">String</span> stringToKeep <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">String</span><span style="color: #009900;">&#40;</span>items<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>This will allow the original string to be garbage collected.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Salman Ahmed</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-72</link>
		<dc:creator>Salman Ahmed</dc:creator>
		<pubDate>Mon, 06 Jul 2009 19:44:05 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-72</guid>
		<description>So, String.split() also leaks memory? What workaround would you recommend for the case when String.split() is used?</description>
		<content:encoded><![CDATA[<p>So, String.split() also leaks memory? What workaround would you recommend for the case when String.split() is used?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ghettoimp</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-71</link>
		<dc:creator>ghettoimp</dc:creator>
		<pubDate>Mon, 06 Jul 2009 02:48:08 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-71</guid>
		<description>@SJS: It seems unlikely that Andrew's solution would have any impact on GC performance.  

Instead, his proposal is merely for the JVM to attempt a more aggressive recovery strategy before dying with an OutOfMemoryError.  It's true that in such a case, one might reasonably expect the JVM to run its garbage collector to try to find some memory that can be freed.  But during any "ordinary" run of the garbage collector, there is still plenty of memory available, and so the "last ditch effort" that Andrew describes (looking for some strings to free) would simply not be invoked.

Andrew's solution seems fairly reasonable.  Ask yourself: would you rather that your program immediately die with an OutOfMemoryError, or that the JVM spends some time in a search for additional memory before it gives up?  To me, it seems better to win some of the time than to lose all of the time.</description>
		<content:encoded><![CDATA[<p>@SJS: It seems unlikely that Andrew&#8217;s solution would have any impact on GC performance.  </p>
<p>Instead, his proposal is merely for the JVM to attempt a more aggressive recovery strategy before dying with an OutOfMemoryError.  It&#8217;s true that in such a case, one might reasonably expect the JVM to run its garbage collector to try to find some memory that can be freed.  But during any &#8220;ordinary&#8221; run of the garbage collector, there is still plenty of memory available, and so the &#8220;last ditch effort&#8221; that Andrew describes (looking for some strings to free) would simply not be invoked.</p>
<p>Andrew&#8217;s solution seems fairly reasonable.  Ask yourself: would you rather that your program immediately die with an OutOfMemoryError, or that the JVM spends some time in a search for additional memory before it gives up?  To me, it seems better to win some of the time than to lose all of the time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-70</link>
		<dc:creator>Brian</dc:creator>
		<pubDate>Mon, 06 Jul 2009 00:34:40 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-70</guid>
		<description>Actually, you split with a regex that ignores commas inside of quotes. You don't just split with ",".</description>
		<content:encoded><![CDATA[<p>Actually, you split with a regex that ignores commas inside of quotes. You don&#8217;t just split with &#8220;,&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-69</link>
		<dc:creator>Nick</dc:creator>
		<pubDate>Sun, 05 Jul 2009 23:18:07 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-69</guid>
		<description>If you're parsing CSV by calling String.split you're doing it wrong anyway.

Foo,"Baz,Baz",Qux</description>
		<content:encoded><![CDATA[<p>If you&#8217;re parsing CSV by calling String.split you&#8217;re doing it wrong anyway.</p>
<p>Foo,&quot;Baz,Baz&quot;,Qux</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-68</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Sun, 05 Jul 2009 21:28:42 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-68</guid>
		<description>@Mark

&gt; Variables should default to string type too
WTF are you smoking? How about real type inference, rather than picking one type and raising it up above the others?</description>
		<content:encoded><![CDATA[<p>@Mark</p>
<p>&gt; Variables should default to string type too<br />
WTF are you smoking? How about real type inference, rather than picking one type and raising it up above the others?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SJS</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-67</link>
		<dc:creator>SJS</dc:creator>
		<pubDate>Sun, 05 Jul 2009 21:15:47 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-67</guid>
		<description>The problem is indeed subtle, but the workaround seems perfectly sensible. 

Modifying the GC to look for the (rare) instances of large strings where only a small substring is kept seems likely to be a performance killer. People already whine about GC languages being slow; making one slower for a rare case doesn't seem like a very good tradeoff.  Let a few CS grad students work on a fast and efficient mechanism for handling this case in the JVM, and *then* look into adding it in.

@mark - Java isn't Ruby, nor should it be. The ultimate problem with Ruby and languages like it is that too often the advocates want to turn every other language into a poor copy of their favorite language, so they can smugly point out that everyone else should be using their favorite language.

Take, for example, statement separators.  They're useful, in that they *conveniently* let you wrap long statements across several lines, aiding readability.  Newline-terminated statements require compiler magic (confusing to the reader) or continuation characters (ugly), which is fine in small, uncomplicated, or low-use programs.

And as for the elimination of "String", well, consistency is preferable to conciseness.  One of Java's biggest warts is the package-private scope being "default", instead of making it a required keyword.  Verbosity may not be a virtue, but it's not always a sin, either.

* * *

I think the idea of making the javadoc more clear is the best approach. I don't think I would have looked at "String x = reader.readLine().split(pattern)[1];" as a potential memory "leak" -- it's a non-obvious issue. Good catch.</description>
		<content:encoded><![CDATA[<p>The problem is indeed subtle, but the workaround seems perfectly sensible. </p>
<p>Modifying the GC to look for the (rare) instances of large strings where only a small substring is kept seems likely to be a performance killer. People already whine about GC languages being slow; making one slower for a rare case doesn&#8217;t seem like a very good tradeoff.  Let a few CS grad students work on a fast and efficient mechanism for handling this case in the JVM, and *then* look into adding it in.</p>
<p>@mark - Java isn&#8217;t Ruby, nor should it be. The ultimate problem with Ruby and languages like it is that too often the advocates want to turn every other language into a poor copy of their favorite language, so they can smugly point out that everyone else should be using their favorite language.</p>
<p>Take, for example, statement separators.  They&#8217;re useful, in that they *conveniently* let you wrap long statements across several lines, aiding readability.  Newline-terminated statements require compiler magic (confusing to the reader) or continuation characters (ugly), which is fine in small, uncomplicated, or low-use programs.</p>
<p>And as for the elimination of &#8220;String&#8221;, well, consistency is preferable to conciseness.  One of Java&#8217;s biggest warts is the package-private scope being &#8220;default&#8221;, instead of making it a required keyword.  Verbosity may not be a virtue, but it&#8217;s not always a sin, either.</p>
<p>* * *</p>
<p>I think the idea of making the javadoc more clear is the best approach. I don&#8217;t think I would have looked at &#8220;String x = reader.readLine().split(pattern)[1];&#8221; as a potential memory &#8220;leak&#8221; &#8212; it&#8217;s a non-obvious issue. Good catch.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mark</title>
		<link>http://nflath.com/2009/07/the-dangers-of-stringsubstring/comment-page-1/#comment-66</link>
		<dc:creator>mark</dc:creator>
		<pubDate>Sun, 05 Jul 2009 19:48:20 +0000</pubDate>
		<guid isPermaLink="false">http://nflath.com/?p=147#comment-66</guid>
		<description>Ruby:
  sub = oldString[0..4]

I couldn't resist ;)

I think ultimately one huge problem with languages like C and similar is that they make simple stuff more complicated than necessary.

Set aside memory manipulation and pointer handling - these can stay complicated - but why not try to find the easiest approach in general? The requirement for a semicolon is one example - it shouldn't be needed. It adds no meaningful information to the human coder (unless he wants to put multiple statements on the same line, but this is not a common case, because people like readable code).

Java should be a lot shorter. And variables should default to string type too, without a need to specify "String".</description>
		<content:encoded><![CDATA[<p>Ruby:<br />
  sub = oldString[0..4]</p>
<p>I couldn&#8217;t resist ;)</p>
<p>I think ultimately one huge problem with languages like C and similar is that they make simple stuff more complicated than necessary.</p>
<p>Set aside memory manipulation and pointer handling - these can stay complicated - but why not try to find the easiest approach in general? The requirement for a semicolon is one example - it shouldn&#8217;t be needed. It adds no meaningful information to the human coder (unless he wants to put multiple statements on the same line, but this is not a common case, because people like readable code).</p>
<p>Java should be a lot shorter. And variables should default to string type too, without a need to specify &#8220;String&#8221;.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
