Wednesday, April 20, 2011

Serialization / Derialization of a tree structure

I'm trying to figure out the best way to save (serialize) and later open (deserialize) a tree structure. My structure is made up of various object types with different properties, but each inherits from a base abstract "Node" class.

Each node has unique ID (GUID), and has an AddSuperNode(Node nd) method that will set the parent of a node. This in turn calls other methods that allow the parent node to know what sub nodes it has. However, some nodes also utilize a AddAuxSuperNode() method that adds a secondary parent to the Node.

I was using binary serialization, but now I think I want to use something where I have a bit more control, and the serialized data is more accessible. I also want to retain Type information when I deserialize, and be able to serialize private values. So DataContractSerializer seemed like the best way to go.

I can't just serialize the root Node directly because of nodes having multiple parents. I do not want to create duplicate objects. So it would seem that I need to deconstruct the tree into a flat list, and then serialize that. Then after serializing that list reconstruct the tree. Does this sound right?

Like I said before each Node has a unique GUID identifier, but right now Nodes reference their parents/children directly and do not store their ids. I could update the AddSuperNode() and AddAuxSuperNode() methods to also update a list of parent ids to be serialized in addition to the direct references. But I'd rather only update/create this list when the object is being serialized. So i was thinking create an UpdateSuperNodeIDRefs() method in the node that would be called right before serialization.

The following is what I'm planning to do for serialization and deserialization of this structure. Can anyone suggestion a better/cleaner/more efficient way to do this?

Serialization

1) Provide the root node of the tree structure

2) Break down tree structure into a flat Dictionary(Guid id,Node nd) where id is the guid of nd.

3) Call UpdateSuperNodeIDRefs(); for each node to update the IDs it has saved for its parents.

4) Serialize the Dictionary of nodes with DataContractSerializer

Deserialization

1) Deserialize the Dictionary of nodes

2) Itterate through each Node in the Dictionary, reconnecting each to their parents. For any Parent IDs stored find the respective Node(s) in the Dictionary with matching ID(s) call the AddSuperNode() or AddAuxSuperNode() to re-connnect the node to its parent(s)

3) From any Node in the Dictionary find the root of the structure

4) Return the root Node

From stackoverflow
  • If a node has multiple parents, then it isn't a tree; it is, presumably, a graph. However - worry not; DataContractSerializer can handle this for you:

    using System;
    using System.IO;
    using System.Runtime.Serialization;
    
    [DataContract]
    class Node {
        [DataMember]
        public Node AnotherNode { get; set; }
    }
    
    static class Program
    {
        static void Main()
        {
            Node a = new Node(), b = new Node();
            // make it a cyclic graph, to prove reference-mode
            a.AnotherNode = b;
            b.AnotherNode = a;
    
            // the preserveObjectReferences argument is the interesting one here...
            DataContractSerializer dcs = new DataContractSerializer(
                typeof(Node), null, int.MaxValue, false, true, null);
            using (MemoryStream ms = new MemoryStream())
            {
                dcs.WriteObject(ms, a);
                ms.Position = 0;
                Node c = (Node) dcs.ReadObject(ms);
                // so .AnotherNode.Another node should be back to "c"
                Console.WriteLine(ReferenceEquals(c, c.AnotherNode.AnotherNode));
            }
    
        }
    }
    

0 comments:

Post a Comment