Wednesday, April 6, 2011

Why is it a bad practice to call System.gc?

After answering a question about how to force-free objects in Java (the guy was clearing a 1.5GB HashMap) with System.gc(), I've been told it's a bad practice to call System.gc() manually, but the comments seemed mixed about it. So much so that no one dared to upvote it or downvote it.

I've been told there that it's a bad practice, but then I've also been told that garbage collector runs don't systematically stop the world anymore, and that the call may be treated as only a hint, so I'm kind of at a loss.

I do understand that usually the JVM knows better than you when it needs to reclaim memory. I also understand that worrying about a few kilobytes of data is silly. And I also understand that even megabytes of data isn't what it was a few years back. But still, 1.5 gigabytes? And you know there's like 1.5 GB of data hanging around in memory; it's not like it's a shot in the dark. Is System.gc() systematically bad, or is there some point at which it becomes okay?

So the question is actually twofold:

  • Why is it, or isn't it, a bad practice to call System.gc()? Is it really a hint under certain implementations, or is it always a full collection cycle? Are there really garbage collector implementations that can do their work without stopping the world? Please shed some light on the various assertions people have made.
  • Where's the threshold? Is it never a good idea to call System.gc(), or are there times when it's acceptable? If any, what are those times?
From stackoverflow
  • The reason everyone always says to avoid System.gc() is that it is a pretty good indicator of fundamentally broken code. Any code that depends on it for correctness is certainly broken; any that rely on it for performance are most likely broken.

    You don't know what sort of garbage collector you are running under. There are certainly some that do not "stop the world" as you assert, but some JVMs aren't that smart or for various reasons (perhaps they are on a phone?) don't do it. You don't know what it's going to do.

    Also, it's not guaranteed to do anything. The JVM may just entirely ignore your request.

    The combination of "you don't know what it will do," "you don't know if it will even help," and "you shouldn't need to call it anyway" is why people are so forceful in saying that generally you shouldn't call it. I think it's a case of "if you need to ask whether you should be using this, you shouldn't."

    EDIT to address a few concerns from the other thread:

    After reading the thread you linked, there's a few more things I'd like to point out. First, someone suggested that calling gc() may return memory to the system. That's certainly not necessarily true - the Java heap itself grows independently of Java allocations.

    As in, the JVM will hold memory (many tens of megabytes) and grow the heap as necessary. It doesn't necessarily return that memory to the system even when you free Java objects; it is perfectly free to hold on to the allocated memory to use for future Java allocations.

    To show that it's possible that System.gc() does nothing, view:

    http://bugs.sun.com/view_bug.do?bug_id=6668279

    and in particular that there's a -XX:+DisableExplicitGC VM option.
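
    A minimal sketch of how one might check this on a given JVM, assuming nothing beyond the standard library: it prints the used heap before and after an explicit GC request. The class name ExplicitGcProbe and the 64 MB figure are made up for illustration, and the numbers will vary between JVMs and runs.

```java
/** Prints used heap before and after an explicit GC request (illustrative only). */
final class ExplicitGcProbe {

    /** Used heap in bytes, as reported by the JVM at this instant. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Allocate roughly 64 MB (an arbitrary amount chosen for the demo).
        byte[][] junk = new byte[64][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[1024 * 1024];
        }
        long before = usedHeapBytes();

        // Drop the only reference so the arrays become garbage.
        junk = null;

        // A request only: the JVM is free to ignore it.
        System.gc();

        long after = usedHeapBytes();
        System.out.printf("used before: %d MB, after: %d MB%n",
                before >> 20, after >> 20);
    }
}
```

    Running it once normally and once with -XX:+DisableExplicitGC on a HotSpot JVM should show the difference: with the flag set, the explicit request is ignored, so the second number would be expected to stay close to the first.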

    zneak : How can code depend on garbage collection for its correctness? Also, can I infer that your answer to question 2 would be "it's never a good thing"?
    Steven Schlansker : You may be able to construct some weird Rube Goldberg-esque setup where the method in which the GC is run affects the correctness of your code. Perhaps it's masking some strange threading interaction, or maybe a finalizer has a significant effect on the running of the program. I'm not entirely sure it's possible but it may be, so I figured I'd mention it.
    Martin : @zneak you might for example have put critical code in finalizers (which is fundamentally broken code)
    Steven Schlansker : As for my direct answer to number 2, I'd say that the rule is that it's generally not worth your time. There are exceptions to every rule, though. I'm simply trying to explain the thought process. On a recent Sun JVM it probably won't cause much harm, but when you start moving to other JVMs it may not do what you expect anymore. That's why I'd in general avoid it.
    zneak : @Steve & @Martin: I didn't think about critical code in finalizers. It's been so long since I've written one I almost forgot about them. Thanks for reminding.
    Joachim Sauer : I'd like to add that there are a few corner cases where `System.gc()` is useful and might even be necessary. For example in UI applications on Windows it can greatly speed up the restoring-process of a Window when you call System.gc() *before* you minimize the Window (especially when it stays minimized for quite some time and parts of the process get swapped to disk).
  • Calling System.gc doesn't mean the GC is run. It's on its own thread, and it runs according to its own lights.

    As far as stopping the world goes, I think the newer generational models do a better job than their 1.0 incarnations.

    I'll repeat it: calling System.gc() does NOT run the garbage collector. That's been true since Java 1.0.

    zneak : Javadoc doesn't agree with you though: _When control returns from the method call, the virtual machine has made its best effort to recycle all discarded objects._ That being said, you didn't answer whether it's a good practice or not, and whether there ever is a good time to call it.
    duffymo : It didn't define "best effort". It might have done absolutely nothing. Not a good practice, not now, not ever. I've never written code to call it, ever.
    zneak : So you mean it's a bad practice because it's unreliable and can potentially do nothing? Under which circumstances can it hurt your code? And, on a side note, I think "best efforts to recycle _all discarded objects_" should at least mean it's trying to.
    Steven Schlansker : See the links I put in my answer for the "best effort" claim...
    Stephen C : @duffymo - "calling System.gc() does NOT run the garbage collector" - that statement is not strictly correct. A more correct statement is "calling System.gc() **does not necessarily** run the garbage collector." The actual behavior is controlled by an -XX option ... at least that's what the documentation says.
    duffymo : It's not the calling of the GC that might do nothing that's a bad practice; it's writing code that makes you think that you can and should call it that's a bad sign. I'm saying that regardless of what the documentation says, I've never had to call System.gc. Ever. Is your app running on an app server? That's what handles threading in Java EE apps.
  • Yes, calling System.gc() doesn't guarantee that it will run, it's a request to the JVM that may be ignored. From the docs:

    Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects

    It's almost always a bad idea to call it because the automatic memory management usually knows better than you when to gc. It will do so when its internal pool of free memory is low, or if the OS requests some memory be handed back.

    It might be acceptable to call System.gc() if you know that it helps. By that I mean you've thoroughly tested and measured the behaviour of both scenarios on the deployment platform, and you can show it helps. Be aware though that the gc isn't easily predictable - it may help on one run and hurt on another.
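
    One way to do that kind of measuring, sketched here under assumptions: the standard GarbageCollectorMXBean interface reports how many collections ran and roughly how long they took. The class name GcStats, the --explicit-gc switch, and the string-churning workload are all hypothetical stand-ins; a real test would run the actual application on the deployment platform.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Compares GC activity for a workload with and without explicit GC requests. */
final class GcStats {

    /** Sum of collection counts over all collectors (-1 means "undefined" and is skipped). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionCount());
        }
        return total;
    }

    /** Sum of approximate accumulated collection time, in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionTime());
        }
        return total;
    }

    /** Hypothetical stand-in workload: builds short-lived strings and sums their lengths. */
    static long runWorkload(int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += ("item-" + i).length();
        }
        return sink;
    }

    public static void main(String[] args) {
        boolean explicitGc = args.length > 0 && args[0].equals("--explicit-gc");
        long count0 = totalCollections();
        long millis0 = totalCollectionMillis();
        long start = System.nanoTime();

        long sink = 0;
        for (int round = 0; round < 20; round++) {
            sink += runWorkload(500_000);
            if (explicitGc) {
                System.gc(); // the behaviour under test
            }
        }

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("explicit=%b collections=%d gcMillis=%d elapsedMs=%d (sink=%d)%n",
                explicitGc,
                totalCollections() - count0,
                totalCollectionMillis() - millis0,
                elapsedMs, sink);
    }
}
```

    Run it once with no arguments and once with --explicit-gc, several times each, and compare. A single run proves little, for the reason given above: the gc isn't easily predictable.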

    zneak : But also from the Javadoc: _When control returns from the method call, the virtual machine has made its best effort to recycle all discarded objects_, which I see as a more imperative form of what you've posted about. Screw that, there's a bug report about it being misleading. As for which knows better, what are the harms of hinting the JVM?
    Steven Schlansker : Nothing at all, until you hint it incorrectly ;-)
    tom : The harm is that doing collection at the wrong time can be a huge slow down. The hint you are giving is probably a bad one. As for "best effort" comment, try it and see in a tool like JConsole. Sometimes clicking the "Perform GC" button does nothing
  • It has already been explained that calling System.gc() may do nothing, and that any code that "needs" the garbage collector to run is broken.

    However, the real reason that it is bad practice to call System.gc() is that it is inefficient. And in the worst case, it is horribly inefficient! Let me explain.

    A typical GC algorithm identifies garbage by traversing all non-garbage objects in the heap, and inferring that any object not visited must be garbage. From this, we can model the total work of a garbage collection as consisting of one part that is proportional to the amount of live data, and another part that is proportional to the amount of garbage; i.e. work = (live * W1 + garbage * W2).

    Now suppose that you do the following in a single-threaded application.

    System.gc(); System.gc();
    

    The first call will (we predict) do (live * W1 + garbage * W2) work, and get rid of the outstanding garbage.

    The second call will do (live* W1 + 0 * W2) work and reclaim nothing. In other words we have done (live * W1) work and achieved absolutely nothing.

    We can model the efficiency of the collector as the amount of work needed to collect a unit of garbage (lower is better); i.e. efficiency = (live * W1 + garbage * W2) / garbage. So to make the GC as efficient as possible, we need to maximize the value of garbage when we run the GC; i.e. wait until the heap is full. (And also, make the heap as big as possible. But that is a separate topic.)

    If the application does not interfere (by calling System.gc()), the GC will wait until the heap is full before running, resulting in efficient collection of garbage. But if the application forces the GC to run, the chances are that the heap won't be full, and the result will be that garbage is collected inefficiently. And the more often the application forces GC, the more inefficient the GC becomes.

    Note: the above explanation glosses over the fact that a typical modern GC partitions the heap into "spaces", the GC may dynamically expand the heap, the application's working set of non-garbage objects may vary and so on. Even so, the same basic principle applies across the board to all true garbage collectors. It is inefficient to force the GC to run.

    (I'm also excluding memory managers that use reference counting exclusively, but no current Java implementation uses that approach ... for good reason.)
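
    To put numbers on the model above, here is a small worked example. The values (100 units of live data, W1 = W2 = 1) are made up purely for illustration, and GcCostModel is a hypothetical name; only the two formulas come from the explanation.

```java
/** Worked arithmetic for the model: work = live * W1 + garbage * W2. */
final class GcCostModel {

    /** Total work of one collection under the model. */
    static double work(double live, double garbage, double w1, double w2) {
        return live * w1 + garbage * w2;
    }

    /** Work needed per unit of garbage reclaimed (lower is better). */
    static double costPerUnitOfGarbage(double live, double garbage, double w1, double w2) {
        return work(live, garbage, w1, w2) / garbage;
    }

    public static void main(String[] args) {
        double live = 100, w1 = 1, w2 = 1; // made-up illustrative values

        // Forced early, with little garbage: (100 + 10) / 10
        System.out.printf("garbage=10:  %.2f%n", costPerUnitOfGarbage(live, 10, w1, w2));  // 11.00

        // Left alone until the heap is nearly full: (100 + 400) / 400
        System.out.printf("garbage=400: %.2f%n", costPerUnitOfGarbage(live, 400, w1, w2)); // 1.25

        // A second back-to-back call: all of the tracing work, nothing reclaimed.
        System.out.printf("second call work: %.2f%n", work(live, 0, w1, w2));              // 100.00
    }
}
```

    With these numbers, collecting early costs about nine times as much per unit of garbage as waiting, which is the point being made.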

    sleske : +1 Good explanation. Note however that this reasoning only applies if you care about throughput. If you want to optimize latency at specific points, forcing GC may make sense. E.g. (hypothetically speaking) in a game you might want to avoid delays during levels, but you don't care about delays during level load. Then it would make sense to force GC after level load. It does decrease overall throughput, but that's not what you are optimizing.
  • People have been doing a good job explaining why NOT to use it, so I will tell you a couple of situations where you should use it:

    (The following comments apply to Hotspot running on Linux with the CMS collector, where I feel confident saying that System.gc() does in fact always invoke a full garbage collection).

    JT : 1) After the initial work of starting up your application, you may be in a terrible state of memory usage. Half your tenured generation could be full of garbage, meaning that you are that much closer to your first CMS. In applications where that matters, it is not a bad idea to call System.gc() to "reset" your heap to the starting state of live data.
    JT : 2) Along the same lines as #1, if you monitor your heap usage closely, you want to have an accurate reading of what your baseline memory usage is. If the first 2 minutes of your application's uptime is all initialization, your data is going to be messed up unless you force (ahem... "suggest") the full gc up front.
    JT : 3) You may have an application that is designed to never promote anything to the tenured generation while it is running. But maybe you need to initialize some data up-front that is not-so-huge as to automatically get moved to the tenured generation. Unless you call System.gc() after everything is set up, your data could sit in the new generation until the time comes for it to get promoted. All of a sudden your super-duper low-latency, low-GC application gets hit with a HUGE (relatively speaking, of course) latency penalty for promoting those objects during normal operations.
    JT : 4) It is sometimes useful to have a System.gc call available in a production application for verifying the existence of a memory leak. If you know that the set of live data at time X should exist in a certain ratio to the set of live data at time Y, then it could be useful to call System.gc() at time X and time Y and compare memory usage.
    zneak : Can't you edit your message? Important contents should be there, not in the comments.
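
    The baseline and time-X-versus-time-Y readings from points 2 and 4 could be taken roughly like this. It is only a sketch: LiveSetSampler and the 32 MB of retained data are hypothetical, and MemoryMXBean.gc() is documented as effectively equivalent to System.gc(), so the same "request, not guarantee" caveat applies.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

/** Samples used heap right after a GC request, to approximate the live set. */
final class LiveSetSampler {

    /** Requests a collection, then reports used heap in bytes. */
    static long usedHeapAfterGcRequest() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // effectively the same request as System.gc()
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long atTimeX = usedHeapAfterGcRequest();

        // Stand-in for work that legitimately retains data (about 32 MB here).
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            retained.add(new byte[1024 * 1024]);
        }

        long atTimeY = usedHeapAfterGcRequest();
        System.out.printf("live set grew by about %d MB (%d blocks retained)%n",
                (atTimeY - atTimeX) >> 20, retained.size());
    }
}
```

    If the growth between the two samples is far from what the application should be retaining, that is a hint worth following up with a profiler.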
  • Maybe I write crappy code, but I've come to realize that clicking the trash-can icon in the Eclipse and NetBeans IDEs is a 'good practice'.

  • It's NOT a bad practice.

    But keep in mind that when you call System.gc() you have no guarantee that garbage collection will really be run. It's up to the JVM to decide whether it follows your "suggestion" System.gc().

    From the API:

    Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects.

    zneak : This quote from the Javadoc is so ambiguous considering the last sentence that it shouldn't be allowed to exist.
    Bruno Rothgiesser : zneak, I don't think that it's ambiguous, but the phrase "the JVM has made its best effort" is definitely vague.
  • GC efficiency relies on a number of heuristics. For instance, a common heuristic is that write accesses to objects usually occur on objects which were created not long ago. Another is that many objects are very short-lived (some objects will be used for a long time, but many will be discarded a few microseconds after their creation).

    Calling System.gc() is like kicking the GC. It means: "all those carefully tuned parameters, those smart organizations, all the effort you just put into allocating and managing the objects such that things go smoothly, well, just drop the whole lot, and start from scratch". It may improve performance, but most of the time it just degrades performance.

    To use System.gc() reliably(*) you need to know how the GC operates in all its fine details. Such details tend to change quite a bit if you use a JVM from another vendor, or the next version from the same vendor, or the same JVM but with slightly different command-line options. So it is rarely a good idea, unless you want to address a specific issue in which you control all those parameters. Hence the notion of "bad practice": that's not forbidden, the method exists, but it rarely pays off.

    (*) I am talking about efficiency here. System.gc() will never break a correct Java program. Nor will it conjure extra memory that the JVM could not have obtained otherwise: before throwing an OutOfMemoryError, the JVM does the job of System.gc(), even if as a last resort.

    sleske : +1 for mentioning that System.gc() does not prevent OutOfMemoryError. Some people believe this.
  • Lots of people seem to be telling you not to do this. I disagree. If, after a large loading process like loading a level, you believe that:

    1. You have a lot of objects that are unreachable and may not have been gc'ed. and
    2. You think the user could put up with a small slowdown at this point

    there is no harm in calling System.gc(). I look at it like the C/C++ inline keyword. It's just a hint to the gc that you, the developer, have decided that time/performance is not as important as it usually is and that some of it could be used reclaiming memory.

    Advice to not rely on it doing anything is correct. Don't rely on it working, but giving the hint that now is an acceptable time to collect is perfectly fine. I'd rather waste time at a point in the code where it doesn't matter (loading screen) than when the user is actively interacting with the program (like during a level of a game.)

    There is one time when I will force collection: when attempting to find out if a particular object leaks (either native code or large, complex callback interaction. Oh, and any UI component that so much as glances at Matlab.) This should never be used in production code.
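
    For the leak-hunting case, a common debugging trick (sketched here, not something to ship) is to hold the suspect through a WeakReference, drop every strong reference you know about, request a GC, and see whether the referent goes away. LeakCheck and the attempt and sleep values are made up for illustration. Because the GC request may be ignored, a "still reachable" result is only a hint.

```java
import java.lang.ref.WeakReference;

/** Debugging aid: checks whether an object becomes collectable. */
final class LeakCheck {

    /** Returns true if the referent was cleared within the given number of GC requests. */
    static boolean collectedAfterGc(WeakReference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc(); // a request only; hence the retries
            try {
                Thread.sleep(50); // give the collector a moment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object suspect = new Object();
        WeakReference<Object> ref = new WeakReference<>(suspect);

        // Drop the strong reference we know about. If something else
        // (a listener list, a cache, a static field) still holds the
        // object, it stays reachable and the check reports a possible leak.
        suspect = null;

        if (collectedAfterGc(ref, 10)) {
            System.out.println("collected: no leak observed");
        } else {
            System.out.println("still reachable (or the GC request was ignored)");
        }
    }
}
```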

    sleske : +1 for GC while analyzing for mem leaks. Note that the information about heap usage (Runtime.freeMemory() et al.) is really only meaningful after forcing a GC, otherwise it would depend on when the system last bothered to run a GC.
  • If the JVM is on the edge of an OutOfMemoryError, it will run the GC anyway. If that didn't help (and your code thus dies with an OOME), then either the code simply requires more memory, or the code is simply memory-inefficient. Run a profiler to find that out.

    In a nutshell: calling System#gc() has no value. Let the GC do its work transparently; it's terribly good at it.

  • In my experience, using System.gc() is effectively a platform-specific form of optimization (where "platform" is the combination of hardware architecture, OS, JVM version and possibly more runtime parameters such as RAM available), because its behaviour, while roughly predictable on a specific platform, can (and will) vary considerably between platforms.

    Yes, there are situations where System.gc() will improve (perceived) performance. One example is if delays are tolerable in some parts of your app, but not in others (the game example cited above, where you want GC to happen at the start of a level, not during the level).

    However, whether it will help or hurt (or do nothing) is highly dependent on the platform (as defined above).

    So I think it is valid as a last-resort platform-specific optimization (i.e. if other performance optimizations are not enough). But you should never call it just because you believe it might help (without specific benchmarks), because chances are it will not.
