Friday, February 4, 2011

How should I handle a situation where I need to store several unrelated types but provide specific types on demand?

I'm working on an editor for files that are used by an important internal testing tool we use. The tool itself is large and complicated, and refactoring or rewriting it would take more resources than we are able to devote to it for the foreseeable future, so my hands are tied when it comes to large modifications. I must use a .NET language.

The files are XML serialized versions of four classes that are used by the tool (let's call them A, B, C, and D). The classes form a tree structure when all is well. Our editor works by loading a set of files, deserializing them, working out the relationships between them, and keeping track of any bad states it can find. The idea is for us to move away from hand-editing these files, which introduces tons of errors.

For a particular type of error, I'd like to maintain a collection of all files that have the problem. All four classes can have the problem, and I'd like to reduce duplication of code as much as possible. An important requirement is the user needs to be able to get the items in sets; for example, they need to get all A objects with an error, and telling them to iterate over the whole collection and pick out what they want is unacceptable compared to a GetAs() method. So, my first thought was to make a generic item that related the deserialized object and some metadata to indicate the error:

public class ErrorItem<T>
{
    public T Item { get; set; }
    public Metadata Metadata { get; set; }
}

Then, I'd have a collection class that could hold all of the error items, with helper methods to extract the items of a specific class when the user needs them. This is where the trouble starts.

None of the classes inherit from a common ancestor (other than Object). This was probably a mistake of the initial design, but I've spent a few days thinking about it and the classes really don't have much in common other than a GUID property that uniquely identifies each item so I can see why the original designer did not relate them through inheritance. This means that the unified error collection would need to store ErrorItem<Object> objects, since I don't have a base class or interface to restrict what comes in. However, this makes the idea of this unified collection a little sketchy to me:

public class ErrorCollection
{
    // Every item is stored as ErrorItem<Object> because A, B, C, and D share no base type.
    public List<ErrorItem<Object>> AllItems { get; set; }
}

This has consequences for the public interface. What I really want is to return the appropriate ErrorItem generic type like this:

public ErrorItem<A>[] GetA()

This is impossible because I can only store ErrorItem<Object>! I've gone over some workarounds in my head; mostly they include creating a new ErrorItem of the appropriate type on-the-fly, but it just feels kind of ugly. Another thought has been using a Dictionary to keep items organized by type, but it still doesn't seem right.

Is there some kind of pattern that might help me here? I know the easiest way to solve this is to add a base class that A, B, C, and D derive from, but I'm trying to have as small an impact on the original tool as possible. Is the cost of any workaround great enough that I should push to change the initial tool?
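For reference, the Dictionary idea mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the Description property on Metadata, the Add<T> and Get<T> method names, and the empty A and B stand-in classes are hypothetical and are not part of the real tool. Items are bucketed by the compile-time type argument, so calling Add<object>(...) would put an item in the wrong bucket.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the tool's real classes.
public class A { public Guid Guid { get; set; } }
public class B { public Guid Guid { get; set; } }

// Hypothetical shape; the real Metadata class is not shown in the question.
public class Metadata { public string Description { get; set; } }

public class ErrorItem<T>
{
    public T Item { get; set; }
    public Metadata Metadata { get; set; }
}

public class ErrorCollection
{
    // One bucket per item type. Each bucket only ever holds ErrorItem<T>
    // instances for its own key type, stored as object.
    private readonly Dictionary<Type, List<object>> _buckets =
        new Dictionary<Type, List<object>>();

    public void Add<T>(T item, Metadata metadata)
    {
        List<object> bucket;
        if (!_buckets.TryGetValue(typeof(T), out bucket))
        {
            bucket = new List<object>();
            _buckets[typeof(T)] = bucket;
        }
        bucket.Add(new ErrorItem<T> { Item = item, Metadata = metadata });
    }

    public ErrorItem<T>[] Get<T>()
    {
        List<object> bucket;
        if (!_buckets.TryGetValue(typeof(T), out bucket))
            return new ErrorItem<T>[0];

        // The cast is safe as long as items were added through Add<T>.
        return bucket.Cast<ErrorItem<T>>().ToArray();
    }
}
```

With this shape, a GetA() method would reduce to a one-line wrapper around Get<A>(), at the cost of one cast per item on the way out.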

  • If A, B, C and D have nothing in common then adding a base class won't really get you anything. It will just be an empty class and in effect will be the same as object.

    I'd just create an ErrorItem class without the generics, make Item an object and do some casting when you want to use the objects referenced. If you want to use any of the properties or methods of the A, B, C or D class other than the Guid you would have had to cast them anyway.

    From Mendelt
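A minimal sketch of the non-generic approach described in the answer above might look like this. The Description property on Metadata, the Add and GetItemsOf<T> names, and the stand-in A and B classes are assumptions made for illustration; only the idea of an object-typed Item plus casting comes from the answer itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the tool's real classes.
public class A { public Guid Guid { get; set; } }
public class B { public Guid Guid { get; set; } }

// Hypothetical shape; the real Metadata class is not shown in the question.
public class Metadata { public string Description { get; set; } }

// Non-generic error item: the payload is just an object.
public class ErrorItem
{
    public object Item { get; set; }
    public Metadata Metadata { get; set; }
}

public class ErrorCollection
{
    private readonly List<ErrorItem> _items = new List<ErrorItem>();

    public void Add(object item, Metadata metadata)
    {
        _items.Add(new ErrorItem { Item = item, Metadata = metadata });
    }

    // Returns the error items whose payload is a T. The caller still
    // casts Item to T before using any type-specific members.
    public IEnumerable<ErrorItem> GetItemsOf<T>()
    {
        foreach (ErrorItem error in _items)
        {
            if (error.Item is T)
                yield return error;
        }
    }
}
```

A caller would then write something like `A a = (A)error.Item;` inside a loop over `GetItemsOf<A>()`, which is the cast the answer says is unavoidable anyway.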
  • Is this what you are looking for?

    private List<ErrorItem<object>> _allObjects = new List<ErrorItem<object>>();
    
    public IEnumerable<ErrorItem<A>> ItemsOfA
    {
        get
        {
            foreach (ErrorItem<object> obj in _allObjects)
            {
                if (obj.Item is A)
                    yield return new ErrorItem<A> { Item = (A)obj.Item, Metadata = obj.Metadata };
            }
        }
    }
    

    If you want to cache the ItemsOfA you can easily do that:

    private List<ErrorItem<A>> _itemsOfA = null;
    
    public IEnumerable<ErrorItem<A>> ItemsOfACached
    {
        get
        {
            // Build the typed list once, on first access.
            if (_itemsOfA == null)
                _itemsOfA = new List<ErrorItem<A>>(ItemsOfA);
            return _itemsOfA;
        }
    }
    
    From fryguybob
  • The answer I'm going with so far is a combination of the answers from fryguybob and Mendelt Siebenga.

    Adding a base class would just pollute the namespace and introduce a similar problem, as Mendelt Siebenga pointed out. I would get more control over what items can go into the collection, but I'd still need to store ErrorItem<BaseClass> and still do some casting, so I'd have a slightly different problem with the same root cause. This is why I selected the post as the answer: it points out that I'm going to have to do some casts no matter what, and KISS would dictate that the extra base class and generics are too much.

    I like fryguybob's answer not for the solution itself but for reminding me about yield return, which will make a non-cached version easier to write (I was going to use LINQ). I think a cached version is a little wiser, though the expected performance parameters won't make the non-cached version noticeably slower.

    Mendelt : Thanks for the reply. Most people just do fire-and-forget questions. Nice to see you adding some value by reviewing the answers and sharing your conclusions.
    From OwenP
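For comparison, the LINQ variant mentioned in the last answer could be sketched as follows. This is not code from the thread: the generic ItemsOf<T> method, the Add helper, the Description property on Metadata, and the stand-in A and B classes are all assumptions. It requires .NET 3.5 or later for System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the tool's real classes.
public class A { public Guid Guid { get; set; } }
public class B { public Guid Guid { get; set; } }

// Hypothetical shape; the real Metadata class is not shown in the question.
public class Metadata { public string Description { get; set; } }

public class ErrorItem<T>
{
    public T Item { get; set; }
    public Metadata Metadata { get; set; }
}

public class ErrorCollection
{
    private readonly List<ErrorItem<object>> _allObjects =
        new List<ErrorItem<object>>();

    public void Add(object item, Metadata metadata)
    {
        _allObjects.Add(new ErrorItem<object> { Item = item, Metadata = metadata });
    }

    // Lazily filters by payload type and re-wraps each match in a typed
    // ErrorItem<T>. Nothing is cached, so every enumeration rescans the list.
    public IEnumerable<ErrorItem<T>> ItemsOf<T>()
    {
        return _allObjects
            .Where(error => error.Item is T)
            .Select(error => new ErrorItem<T>
            {
                Item = (T)error.Item,
                Metadata = error.Metadata
            });
    }
}
```

Calling `ItemsOf<A>().ToList()` once and keeping the result would give the cached behavior discussed above, with the usual caveat that the cached list goes stale if items are added afterwards.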
