Thursday, March 3, 2011

How do I skip processing the attachments of an email that is itself an attachment of another email?

Using Jython.

I have a situation where emails come in with different attachments. Certain file types I process; others I ignore and don't write to file. I am caught in a rather nasty situation, because sometimes people send an email as an attachment, and that attached email itself has legal attachments.

What I want to do is skip that attached email and all its attachments.

Using Python/Jython's standard email library, how can I do this?


To make it clearer:

I need to parse an email (call it the ROOT email) and get its attachments using Jython. Only certain attachment types are supported, i.e. .pdf, .doc, etc. It just so happens that clients sometimes send an email (the ROOT email) with another email message (the CHILD email) as an attachment, and the CHILD email has .pdf attachments and the like.

What I need is to get rid of any CHILD emails attached to the ROOT email AND the CHILD emails' attachments. What happens now is that I walk over the whole email and it parses every attachment, BOTH ROOT attachments and CHILD attachments, as if they were all ROOT attachments.

I cannot have this. I am only interested in ROOT attachments that are legal, i.e. .pdf, .doc, .xls, .rtf, .tif, .tiff.

That should do for now, I have to run to catch a bus! thanks!

From Stack Overflow
  • What about the example named "Here’s an example of how to unpack a MIME message like the one above, into a directory of files"? It looks close to what you want.

    import email
    import mimetypes
    ...
    msg = email.message_from_file(fp)
    ...
    for part in msg.walk():
        # multipart/* are just containers
        if part.get_content_maintype() == 'multipart':
            continue
        # Applications should really sanitize the given filename so that an
        # email message can't be used to overwrite important files
        filename = part.get_filename()
        if not filename:
            ext = mimetypes.guess_extension(part.get_content_type())
        ...
    
    Setori : Oh yeah, I tried that, but guess_extension returns dodgy output. Also, walk seems to walk over every part; I just want the parts in the root of the email, kind of like getting the files in a root directory but not the files inside its subdirectories.
  • Have you tried the get_payload([i[, decode]]) method? Unlike walk, it is not documented to recursively open attachments.
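
A minimal sketch of that suggestion, assuming the message has already been parsed with the standard `email` package. The helper name `top_level_parts` is made up for illustration; `is_multipart()` and `get_payload()` are the standard `Message` methods.

```python
import email


def top_level_parts(msg):
    """Return only the direct children of msg, without recursing."""
    if msg.is_multipart():
        # For a multipart message, get_payload() returns a list of
        # Message objects, one per direct sub-part.
        return list(msg.get_payload())
    # A non-multipart message has no sub-parts.
    return []
```

Note that an attached email shows up here as a single part with content type `message/rfc822`; its own attachments are not in the returned list.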

  • I'm understanding your question to mean "I have to check all attachments of an email, but if an attachment is also an email, I want to ignore it." Either way this answer should lead you down the right path.

    What I think you want is mimetypes.guess_type(). Using this method is also much better than just checking against a list of extensions.

    def check(self, msg):
        import mimetypes
    
        for part in msg.walk():
            if part.get_filename() is not None:
                # The standard Message class has no getaltnames();
                # get_filename() is used here instead.
                filenames = [part.get_filename()]
                for filename in filenames:
                    # guess_type returns (None, None) for unknown extensions
                    mime_type, enc = mimetypes.guess_type(filename)
                    if mime_type is not None and mime_type.startswith('message'):
                        print "This is an email and I want to ignore it."
                    else:
                        print "I want to keep looking at this file."
    

    Note that if this still looks through attached emails, change it to this:

    def check(self, msg):
        import mimetypes
    
        for part in msg.walk():
            filename = part.get_filename()
            if filename is not None:
                # guess_type returns (None, None) for unknown extensions
                mime_type, enc = mimetypes.guess_type(filename)
                if mime_type is not None and mime_type.startswith('message'):
                    print "This is an email and I want to ignore it."
                else:
                    # The standard Message class has no getaltnames(),
                    # so only the part's own filename is considered.
                    print "I want to keep looking at this file."
    

    MIME types documentation

  • The problem with the existing suggestions is the walk method, which recursively walks the entire tree depth-first, including the children of any attached email.

    Look at the source of the walk method, and adapt it to skip the recursive part. A cursory reading suggests:

    if msg.is_multipart():
        for part in msg.get_payload():
            # Process this part, but do not recurse into it
            filename = part.get_filename()
    

    Reading the pydocs, get_payload should return a list of the top-level parts, without recursing.
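
One way to adapt walk along these lines is sketched below, as an illustration and not a tested Jython solution. It still descends into ordinary multipart containers (so attachments nested under, say, a multipart/related body are found) but never enters a part whose main type is `message`. The names `walk_skipping_attached_emails`, `root_attachments` and `ALLOWED_EXTENSIONS` are hypothetical; the extension list is taken from the question.

```python
import os

# Extensions the question treats as legal; adjust as needed.
ALLOWED_EXTENSIONS = ('.pdf', '.doc', '.xls', '.rtf', '.tif', '.tiff')


def walk_skipping_attached_emails(msg):
    """Like Message.walk(), but never descends into message/* parts."""
    yield msg
    if msg.is_multipart():
        for part in msg.get_payload():
            if part.get_content_maintype() == 'message':
                # An attached email (e.g. message/rfc822): skip it and
                # everything nested inside it.
                continue
            for subpart in walk_skipping_attached_emails(part):
                yield subpart


def root_attachments(msg):
    """Return (filename, decoded payload) for allowed ROOT attachments."""
    found = []
    for part in walk_skipping_attached_emails(msg):
        filename = part.get_filename()
        if not filename:
            continue
        ext = os.path.splitext(filename)[1].lower()
        if ext in ALLOWED_EXTENSIONS:
            found.append((filename, part.get_payload(decode=True)))
    return found
```

The check uses the part's declared content type instead of guessing from the filename, so an attached email is skipped even when it carries no filename. As the walk example notes, filenames taken from an email should still be sanitized before anything is written to disk.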
