Wednesday, April 13, 2011

Python - Intersection of two lists

Hi,

I know how to get an intersection of two flat lists:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

or

def intersect(a, b):
     return list(set(a) & set(b))

print intersect(b1, b2)

But when I have to find intersection for nested lists then my problems starts:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

In the end I would like to receive:

c3 = [[13,32],[7,13,28],[1,6]]

Can you guys give me a hand with this?

Related

From stackoverflow
  • Do you consider [1,2] to intersect with [1, [2]]? That is, is it only the numbers you care about, or the list structure as well?

    If only the numbers, investigate how to "flatten" the lists, then use the set() method.

    unbeknown : +1 And there are already plenty of questions about how to flatten nested lists.
    : I'd like to leave the structure of the lists unchanged.
  • You should flatten using this code ( taken from http://kogs-www.informatik.uni-hamburg.de/~meine/python_tricks ), the code is untested, but I'm pretty sure it works:

    
    def flatten(x):
        """flatten(sequence) -> list
    
        Returns a single, flat list which contains all elements retrieved
        from the sequence and all recursively contained sub-sequences
        (iterables).
    
        Examples:
        >>> [1, 2, [3,4], (5,6)]
        [1, 2, [3, 4], (5, 6)]
        >>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, MyVector(8,9,10)])
        [1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]"""
    
        result = []
        for el in x:
            #if isinstance(el, (list, tuple)):
            if hasattr(el, "__iter__") and not isinstance(el, basestring):
                result.extend(flatten(el))
            else:
                result.append(el)
        return result
    

    After you had flattened the list, you perform the intersection in the usual way:

    
    c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
    c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
    
    def intersect(a, b):
         return list(set(a) & set(b))
    
    print intersect(flatten(c1), flatten(c2))
    
    
  • If you want:

    c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
    c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
    c3 = [[13, 32], [7, 13, 28], [1,6]]
    

    Then here is your solution:

    c3 = [filter(lambda x: x in c1, sublist) for sublist in c2]
    

    Explanation:

    The filter part takes each sublist's item and checks to see if it is in the source list c1. The list comprehension is executed for each sublist in c2.

    : Simple and right in the bull's eye. Thank YOU!
    J.F. Sebastian : You can use `filter(set(c1).__contains__, sublist)` for efficiency. btw, the advantage of this solution is that `filter()` preserves strings and tuples types.
  • You don't need to define intersection. It's already a first-class part of set.

    >>> b1 = [1,2,3,4,5,9,11,15]
    >>> b2 = [4,5,6,7,8]
    >>> set(b1).intersection( set(b2) )
    set([4, 5])
    
  • Pure list comprehension version

    >>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
    >>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
    >>> c1set = frozenset(c1)
    

    Flatten variant:

    >>> [n for lst in c2 for n in lst if n in c1set]
    [13, 32, 7, 13, 28, 1, 6]
    

    Nested variant:

    >>> [[n for n in lst if n in c1set] for lst in c2]
    [[13, 32], [7, 13, 28], [1, 6]]
    
    MizardX : +1 for using a set of c1 for lookup.
  • The functional approach:

    input_list = [[1,2,3,4,5],[2,3,4,5,6],[3,4,5,6,7]]
    
    result = reduce(set.intersection,map(set,input_list))
    

    and it can be applied to the more general case of 1+ lists

  • For people just looking to find the intersection of two lists, the Asker provided two methods:

    b1 = [1,2,3,4,5,9,11,15]
    b2 = [4,5,6,7,8]
    b3 = [val for val in b1 if val in b2]
    

    and

    def intersect(a, b):
         return list(set(a) & set(b))
    
    print intersect(b1, b2)
    

    But there is a hybrid method that is more efficient, because you only have to do one conversion between list/set, as opposed to three:

    b1 = [1,2,3,4,5]
    b2 = [3,4,5,6]
    s2 = set(b2)
    b3 = [val for val in b1 if val in s2]
    

    This will run in O(n), whereas his original method involving list comprehension will run in O(n^2)

0 comments:

Post a Comment