Tuesday, March 1, 2011

Choosing when to instantiate classes

I recently wrote a class for an assignment in which I had to store names in an ArrayList (in java). I initialized the ArrayList as an instance variable private ArrayList<String> names. Later when I checked my work against the solution, I noticed that they had initialized their ArrayList in the run() method instead.

I thought about this for a bit and I kind of feel it might be a matter of taste, but in general how does one choose in situations like this? Does one take up less memory or something?

PS I like the instance variables in Ruby that start with an @ symbol: they are lovelier.

(meta-question: What would be a better title for this question?)

From stackoverflow
  • In the words of the great Knuth "Premature optimization is the root of all evil".

    Just worry that your program functions correctly and that it does not have bugs. This is far more important than an obscure optimization that will be hard to debug later on.

    But to answer your question - if you initialize in the class member, the memory will be allocated the first time a mention of your class is done in the code (i.e. when you call a method from it). If you initialize in a method, the memory allocation occurs later, when you call this specific method.

    So it is only a question of initializing later... this is called lazy initialization in the industry.

    Vinko Vrsalovic : I'm not sure if it's just lazy initialization or using an instance variable instead of a local variable.
  • From wikibooks:

    There are three basic kinds of scope for variables in Java:

    • local variable, declared within a method in a class, valid for (and occupying storage only for) the time that method is executing. Every time the method is called, a new copy of the variable is used.

    • instance variable, declared within a class but outside any method. It is valid for and occupies storage for as long as the corresponding object is in memory; a program can instantiate multiple objects of the class, and each one gets its own copy of all instance variables. This is the basic data structure rule of Object-Oriented programming; classes are defined to hold data specific to a "class of objects" in a given system, and each instance holds its own data.

    • static variable, declared within a class as static, outside any method. There is only one copy of such a variable no matter how many objects are instantiated from that class.

    So yes, memory consumption is an issue, especially if the ArrayList inside run() is local.

  • I am not completely I understand your complete problem.

    But as far as I understand it right now, the performance/memory benefit will be rather minor. Therefore I would definitely favour the easibility side.

    So do what suits you the best. Only address performance/memory optimisation when needed.

  • Initialization

    As a rule of thumb, try to initialize variables when they are declared.

    If the value of a variable is intended never to change, make that explicit with use of the final keyword. This helps you reason about the correctness of your code, and while I'm not aware of compiler or JVM optimizations that recognize the final keyword, they would certainly be possible.

    Of course, there are exceptions to this rule. For example, a variable may by be assigned in an if–else or a switch. In a case like that, a "blank" declaration (one with no initialization) is preferable to an initialization that is guaranteed to be overwritten before the dummy value is read.

    /* DON'T DO THIS! */
    Color color = null;
    switch(colorCode) {
      case RED: color = new Color("crimson"); break;
      case GREEN: color = new Color("lime"); break;
      case BLUE: color = new Color("azure"); break;
    }
    color.fill(widget);
    

    Now you have a NullPointerException if an unrecognized color code is presented. It would be better not to assign the meaningless null. The compiler would produce an error at the color.fill() call, because it would detect that you might not have initialized color.

    In order to answer your question in this case, I'd have to see the code in question. If the solution initialized it inside the run() method, it must have been used either as temporary storage, or as a way to "return" the results of the task.

    If the collection is used as temporary storage, and isn't accessible outside of the method, it should be declared as a local variable, not an instance variable, and most likely, should be initialized where it's declared in the method.

    Concurrency Issues

    For a beginning programming course, your instructor probably wasn't trying to confront you with the complexities of concurrent programming—although if that's the case, I'm not sure why you were using a Thread. But, with current trends in CPU design, anyone who is learning to program needs to have a firm grasp on concurrency. I'll try to delve a little deeper here.

    Returning results from a thread's run method is a bit tricky. This method is the Runnable interface, and there's nothing stopping multiple threads from executing the run method of a single instance. The resulting concurrency issues are part of the motivation behind the Callable interface introduced in Java 5. It's much like Runnable, but can return a result in a thread-safe manner, and throw an Exception if the task can't be executed.

    It's a bit of a digression, but if you are curious, consider the following example:

    class Oops extends Thread { /* Note that thread implements "Runnable" */
    
      private int counter = 0;
    
      private Collection<Integer> state = ...;
    
      public void run() {
        state.add(counter);
        counter++;
      }
    
      public static void main(String... argv) throws Exception {
        Oops oops = new Oops();
        oops.start();
        Thread t2 = new Thread(oops); /* Now pass the same Runnable to a new Thread. */
        t2.start(); /* Execute the "run" method of the same instance again. */
        ...
      }
    }
    

    By the end of the the main method you pretty much have no idea what the "state" of the Collection is. Two threads are working on it concurrently, and we haven't specified whether the collection is safe for concurrent use. If we initialize it inside the thread, at least we can say that eventually, state will contain one element, but we can't say whether it's 0 or 1.

    Zarkonnen : Not sure if the poster is actually using a Thread. The method he's using just might happen to be called "run"?
    erickson : Good point, could be...
  • My personal rule of thumb for instance variables is to initialize them, at least with a default value, either:

    1. at delcaration time, i.e.

      private ArrayList<String> myStrings = new ArrayList<String>();

    2. in the constructor

    If it's something that really is an instance variable, and represents state of the object, it is then completely initialized by the time the constructor exits. Otherwise, you open yourself to the possibility of trying to access the variable before it has a value. Of course, that doesn't apply to primitives where you will get a default value automatically.

    For static (class-level) variables, initialize them in the declaration or in a static initializer. I use a static initializer if I have do calculations or other work to get a value. Initialize in the declaration if you're just calling new Foo() or setting the variable to a known value.

  • You have to avoid Lazy initialization. It leads to problems later.
    But if you have to do it because the initialization is too heavy you have to do it like this:

    Static fields:

    // Lazy initialization holder class idiom for static fields
    private static class FieldHolder {
       static final FieldType field = computeFieldValue();
    }
    static FieldType getField() { return FieldHolder.field; }
    

    Instance fields:

    // Double-check idiom for lazy initialization of instance fields
    private volatile FieldType field;
    FieldType getField() {
       FieldType result = field;
       if (result == null) { // First check (no locking)
          synchronized(this) {
             result = field;
             if (result == null) // Second check (with locking)
                field = result = computeFieldValue();
          }
       }
       return result;
    }
    

    Acording to Joshua Bolch book's "Effective Java™ Second Edition" (ISBN-13: 978-0-321-35668-0):
    "Use lazy initialization judiciously"

    Zarkonnen : Actually, your double checked locking code isn't quite threadsafe. And your indentation makes it very non-obvious that the call to computeFieldValue() is inside the if (result == null).
    Zarkonnen : http://en.wikipedia.org/wiki/Double-checked_locking to back up my "not quite threadsafe" claim.
    damian : I fix the identation. About the claim: So Joshua Bolch is wrong? In the wikipedia it saids: "As of J2SE 5.0, this problem has been fixed. The volatile keyword now ensures that multiple threads handle the singleton instance correctly"
    Zarkonnen : Re: damian: Actually, you're right, once you use volatile, it's fine. Apologies.

0 comments:

Post a Comment