Courier-MTA has a generic filtering interface that differs from sendmail's milter. Implementations include courier-pythonfilter, which is also available from PyPI. It ships an attachments.py filter, to which the filter presented here is an alternative. You must install courier-pythonfilter before trying this filter.
The original attachments.py uses libarchive-c if present, but doesn't require it. The present filter does. (Beware of a Python package with a similar name: its namespace collides with libarchive, so the two are difficult to tell apart once installed.) In addition, this filter requires oletools, a Python library for analyzing OLE and MS Office files. Both are listed in requirements.txt.
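Since the two packages share the libarchive namespace, one way to tell them apart is to look for a name that the filter actually uses. The following is a minimal sketch, assuming that the presence of memory_reader (which the filter calls) is a good enough indicator; the helper names looks_like_libarchive_c and check_installed are hypothetical and not part of either package.

```python
import importlib


def looks_like_libarchive_c(module):
    """Return True if the module exposes memory_reader, which the filter calls."""
    return hasattr(module, "memory_reader")


def check_installed(name="libarchive"):
    """Import the named module and report what it appears to be."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "not installed"
    if looks_like_libarchive_c(module):
        return "looks like libarchive-c"
    return "a different package owns this namespace"


if __name__ == "__main__":
    print(check_installed())
```

Run it with the same interpreter that Courier uses for pythonfilter, since a virtualenv may hold a different set of packages.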
The filter won't block messages destined exclusively to abuse@ mailboxes. That's meant to let abuse teams receive complaints. You may want to alter this behavior (search for can_pass), although editing the source is not quite a trendy way to maintain software; see below.
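To illustrate what altering can_pass might look like, here is a minimal standalone sketch of the same rule, extended to a configurable set of exempt local parts. It assumes, as the filter does, that recipient records in the control file start with the letter r. The names EXEMPT_LOCAL_PARTS and count_recipients, and the choice of postmaster as a second exempt mailbox, are hypothetical.

```python
# Hypothetical policy: local parts whose mail is never blocked.
EXEMPT_LOCAL_PARTS = ("abuse", "postmaster")


def count_recipients(control_lines, exempt=EXEMPT_LOCAL_PARTS):
    """Return (total recipients, exempt recipients) for control file lines."""
    total = 0
    exempt_count = 0
    for line in control_lines:
        if not line.startswith("r"):
            continue  # not a recipient record
        total += 1
        local_part = line[1:].strip().lower().partition("@")[0]
        if local_part in exempt:
            exempt_count += 1
    return total, exempt_count


def can_pass(control_lines, exempt=EXEMPT_LOCAL_PARTS):
    """True only if there is at least one recipient and all are exempt."""
    total, exempt_count = count_recipients(control_lines, exempt)
    return total > 0 and exempt_count == total
```

A message addressed to both an exempt and an ordinary mailbox is still filtered, which matches the "exclusively" wording above.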
Be careful with the pip install below, as it will try to install courier-pythonfilter. If that is already installed and you're in a virtualenv, you don't need sudo.
sudo pip install -r "http://www.tana.it/svn/pyfilters/trunk/requirements.txt"
curl -O "http://www.tana.it/svn/pyfilters/trunk/attachments.py"
python -m compileall -l "attachments.py"
sudo mv "attachments.pyc" "/usr/local/lib/python2.7/dist-packages/pythonfilter"
sudo courierfilter stop
sudo courierfilter start
Check that the /python2.7/ destination directory is correct! If you'd like to consider a more Pythonic install, please see below.
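One way to check the destination is to ask the interpreter where a package is actually installed. This is a minimal sketch; package_dir is a hypothetical helper, and it assumes the package is importable under the name pythonfilter, as the destination path above suggests.

```python
import importlib
import os


def package_dir(name):
    """Return the directory a package was imported from, or None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    path = getattr(module, "__file__", None)
    if not path:
        return None
    return os.path.dirname(os.path.abspath(path))


if __name__ == "__main__":
    # Prints None if pythonfilter is not importable by this interpreter.
    print(package_dir("pythonfilter"))
```

If the printed directory differs from the one in the mv command, adjust the command accordingly.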
#!/usr/bin/python
" attachments -- Courier filter which blocks specified attachment types"
# Copyright (C) 2005-2008 Robert Penz <robert@penz.name>
# hacked (H) 2017 ale
#

import sys
from email.message import Message, _unquotevalue
import email.utils
import binascii

# this is libarchive-c
import libarchive

import oletools.olevba
from oletools.mraptor import MacroRaptor
from oletools import rtfobj

from io import BytesIO

# Extensions.  Assume any extension appears in at most one list.
# Each list has a different treatment.
# Maintain:
# $ i=0; for e in $(sort < temp |uniq ); do printf " '%s'," $e; if [ $((++i % 8)) -eq 0 ]; then printf "\n"; fi; done; printf "\n"

# https://support.google.com/mail/answer/6590?hl=en
# http://www.theverge.com/2017/1/25/14391462/gmail-javascript-block-file-attachments-malware-security
# https://kb.intermedia.net/Article/23567

blocked_extensions = (
    '.acc', '.ade', '.adp', '.asp', '.bat', '.ccs', '.chm', '.class',
    '.cmd', '.com', '.cpl', '.dll', '.dmg', '.drv', '.exe', '.grp',
    '.hlp', '.hta', '.htx', '.ins', '.isp', '.jar', '.je', '.js',
    '.jse', '.lib', '.lnk', '.mde', '.msc', '.msh', '.msh1', '.msh1xml',
    '.msh2', '.msh2xml', '.mshxml', '.msi', '.msp', '.mst', '.ocx', '.ovl',
    '.pcd', '.php', '.php3', '.pif', '.ps1', '.ps1xml', '.ps2', '.ps2xml',
    '.psc1', '.psc2', '.reg', '.sbs', '.scr', '.sct', '.shb', '.shd',
    '.shs', '.sys', '.vb', '.vba', '.vbe', '.vbs', '.vdl', '.vxd',
    '.ws', '.wsc', '.wsf', '.wsh', '.wst')


# extensions supported by VBA_parser, see also
# https://en.wikipedia.org/wiki/List_of_Microsoft_Office_filename_extensions
# https://datatypes.net/open-ade-files
# https://docs.microsoft.com/en-us/deployoffice/security/block-specific-file-format-types-in-office
# office_extensions = (
#     '.doc', '.dot',             #- Word 97-2003
#     '.docm', '.dotm',           #- Word 2007+
#     '.xml',                     #- Word 2003 XML
#     '.mht',                     #- Word MHT - Single File Web Page / MHTML
#     '.xls',                     #- Excel 97-2003
#     '.xlsm', '.xlsb',           #- Excel 2007+
#     '.ppt',                     #- PowerPoint 97-2003
#     '.pptm', '.ppsm')           #- PowerPoint 2007+

office_extensions = (
    '.accda', '.accdb', '.accde', '.accdr', '.accdt', '.ade', '.adn', '.adp',
    '.cdb', '.doc', '.docb', '.docm', '.docx', '.dot', '.dotm', '.dotx',
    '.htm', '.html', '.laccdb', '.ldb', '.maf', '.mam', '.maq', '.mar',
    '.mat', '.mda', '.mdb', '.mde', '.mdf', '.mdn', '.mdt', '.mdw',
    '.mht', '.mhtml', '.ods', '.pot', '.potm', '.potx', '.ppam', '.ppax',
    '.pps', '.ppsm', '.ppsx', '.ppt', '.pptm', '.pptx', '.rtf', '.sldm',
    '.sldx', '.thmx', '.wbk', '.wiz', '.xla', '.xlam', '.xlb', '.xlc',
    '.xlk', '.xll', '.xlm', '.xlmss', '.xls', '.xlsb', '.xlsm', '.xlsx',
    '.xlt', '.xltm', '.xltx', '.xlw')


# extensions implemented as a zip container have a variety of media
# files, but macro are still implemented as OLE containers.
# See 'Heuristic' below.
# https://www.codeproject.com/Articles/15216/Office-2007-bin-file-format
# https://kb.intermedia.net/Article/23567



# https://en.wikipedia.org/wiki/List_of_archive_formats but must be in
# https://github.com/libarchive/libarchive/wiki/ManPageLibarchiveFormats5
# (all lowercase, because filenames are lowercased before matching)
archive_extensions = ('.zip', '.tar.gz', '.tgz', '.tar.z', '.tar.bz2',
    '.tbz2', '.tar.lzma', '.tlz', '.7z', '.ace', '.rar')


# TODO: detect documents with TargetMode="External" and DDE, see:
# http://staaldraad.github.io/2017/10/23/msword-field-codes/


def de_comment(field):
    """Parse a header field fragment and remove comments.

    copied from AddrlistClass.getdelimited() in email/_parseaddr.py
    """

    slist = ['']
    quote = False
    pos = 0
    depth = 0
    while pos < len(field):
        if quote:
            quote = False
        elif field[pos] == '(':
            depth += 1
        elif field[pos] == ')':
            depth = max(depth - 1, 0)
            pos += 1
            continue
        elif field[pos] == '\\':
            quote = True
        if depth == 0:
            slist.append(field[pos])
        pos += 1

    return ''.join(slist)

def is_quoted(value):
    """ Check whether a value (string or tuple) is quoted
    """
    if isinstance(value, tuple):
        return value[2].startswith('"')
    else:
        return value.startswith('"')

class Recipients(object):
    def __init__(self, controlFileList=None, *args, **kwargs):
        object.__init__(self, *args, **kwargs)
        self.rcpt_count = 0
        self.rcpt_abuse = 0
        self.relay = False

        # tolerate the default None as well as an empty list
        for cf in controlFileList or []:
            with open(cf) as fp:
                for line in fp:
                    if line[0] == 'r':
                        self.rcpt_count = self.rcpt_count + 1
                        if line[1:7].lower() == 'abuse@':
                            self.rcpt_abuse = self.rcpt_abuse + 1
                    elif line[0] == 'u':
                        if line[1:9] == 'authsmtp':
                            self.relay = True

    def can_pass(self):
        "Return true if the only recipient(s) are RFC2142 abuse-mailbox(es)"
        return self.rcpt_count > 0 and self.rcpt_abuse == self.rcpt_count


class MyMessage(Message):
    """Email message with comments stripped
    """
    def __init__(self, *args, **kwargs):
        Message.__init__(self, *args, **kwargs)

    def get_filename(self, failobj=None):
        """Return the filename associated with the payload if present.

        The filename is extracted from the Content-Disposition header's
        `filename' parameter.  If that header is missing the `filename'
        parameter, this method falls back to looking for the `name' parameter.
        """
        # changed from original: get the unquoted string
        missing = object()
        filename = self.get_param('filename', missing, 'content-disposition',
                                  unquote=False)
        if filename is missing:
            filename = self.get_param('name', missing, 'content-type', unquote=False)
        if filename is missing:
            return failobj

        # added to original: non quoted comments are removed
        bare = is_quoted(filename)
        if not bare:
            filename = _unquotevalue(filename)
        filename = email.utils.collapse_rfc2231_value(filename)
        if bare and '(' in filename:
            filename = de_comment(filename)
        # malformed values, e.g. name=3D"blah", we only remove trailing char
        while filename.endswith(('"', "'", '>', ',', ';')):
            filename = filename[0:-1]
        return filename.strip().lower()

def reader_entry(which):
    # print 'Entered', which, 'reader'
    pass

def check_message(msg):
    for part in msg.walk():
        try:
            # reader of attached email message
            def mail_reader():
                # return part.get_payload(decode=True)
                # copied in order to detect malformed stuff
                reader_entry('mail')
                payload = part.get_payload()
                cte = part.get('content-transfer-encoding', '').lower()
                if cte == 'quoted-printable':
                    return binascii.a2b_qp(payload)
                elif cte == 'base64':
                    mem = ''
                    for line in payload.split():
                        ln = line.strip()
                        try:
                            mem += binascii.a2b_base64(ln)
                        except binascii.Error:
                            # Incorrect padding: keep the leading base64
                            # characters and pad them to a multiple of 4
                            l = ''
                            for c in ln:
                                if not (c.isalnum() or c in '+/'):
                                    break
                                l += c
                            if len(l) % 4:
                                l += '===='[0:4-len(l)%4]
                            mem += binascii.a2b_base64(l)
                    return mem
                elif cte in ('x-uuencode', 'uuencode', 'uue', 'x-uue'):
                    return binascii.a2b_uu(payload)
                else: # 7bit, 8bit
                    if part.is_multipart():
                        return payload[0].as_string()
                    else:
                        # When is_multipart() returns False, the payload should be a string object.
                        # https://docs.python.org/2/library/email.message.html#email.message.Message.is_multipart
                        return payload

            # multipart/* are just containers
            if part.get_content_maintype() == 'multipart':
                continue

            if part.get_content_type() == 'message/rfc822':
                inner_msg = email.message_from_string(mail_reader(), _class=MyMessage)
                # a clean attached message must not end the scan:
                # the parts that follow it still have to be checked
                if check_message(inner_msg):
                    return True
                continue

            # get_filename() is in MyMessage
            filename = part.get_filename()
            if filename:
                # print part.get_content_type(), filename
                if block_file(filename, mail_reader):
                    return True

        finally:
            pass

    return False

def block_ole_file(filename, data):
    try:
        parser = oletools.olevba.VBA_Parser(BytesIO(data), data=data, relaxed=True)
        # Heuristic: if an OpenXML contains an OLE container, it is suspicious
        if parser.type == 'OpenXML':
            if len(parser.ole_subfiles) > 0:
                return True
        if parser.detect_vba_macros():
            vba_code_all = ''
            for (subfilename, stream_path, vba_filename, vba_code) in parser.extract_macros():
                vba_code_all += vba_code + '\n'
            mraptor = MacroRaptor(vba_code_all)
            mraptor.scan()
            if mraptor.suspicious:
                return True
    except oletools.olevba.FileOpenError as e:
        sys.stderr.write('attachments FileOpenError: ' + str(e) + '\n')
    return False

def block_file(filename, reader):
    """
    Check if a file should be blocked, either because of its extension
    or its content.  If content must be examined, the reader is called.
    filename must be defined and lower().strip()
    Return True if blocking is deserved.
    """
    # print 'block_file', filename
    if filename.endswith(blocked_extensions):
        return True

    if filename.endswith(archive_extensions):
        # print filename
        try:
            zmem = reader()
            with libarchive.memory_reader(zmem) as archive:
                for entry in archive:
                    def archive_reader():
                        reader_entry('archive')
                        mem = ''
                        for block in entry.get_blocks():
                            mem += block
                        return mem

                    # member names are normalized like attachment names
                    if block_file(entry.pathname.lower().strip(), archive_reader):
                        return True
        except libarchive.exception.ArchiveError as e:
            if e.retcode == libarchive.ffi.ARCHIVE_FATAL:
                # Unrecognized archive format, e.g. rar v5
                return True
        finally:
            pass

    elif filename.endswith(".gz"):
        def gunzip_reader():
            reader_entry('gunzip')
            # plain locals: attributes cannot be set on a bare object()
            mem = ''
            just_1 = 0
            with libarchive.memory_reader(reader(),
                    format_name='raw', filter_name='gzip') as archive:
                for entry in archive:
                    just_1 += 1
                    if just_1 != 1 or entry.size is not None:
                        raise ValueError('Invalid gzip format')
                    for block in entry.get_blocks():
                        mem += block
            return mem
        return block_file(filename[0:len(filename)-3], gunzip_reader)

    elif filename.endswith(office_extensions):
        try:
            data = reader()
            if oletools.rtfobj.is_rtf(data, treat_str_as_data=True):
                rtfp = oletools.rtfobj.RtfObjParser(data)
                rtfp.parse()
                for obj in rtfp.objects:
                    if obj.is_ole:
                        if obj.oledata_size is None:
                            # format_id=TYPE_LINKED?
                            return True
                        elif block_ole_file(filename, obj.oledata):
                            return True
                    elif obj.is_package:
                        return True
            else:
                return block_ole_file(filename, data)
        finally:
            pass
    elif filename.endswith('.eml'):
        msg = email.message_from_string(reader(), _class=MyMessage)
        return check_message(msg)
    return False

def doFilter(bodyFile, controlFileList):
    "Function called by Pythonfilter"
    try:
        rcpts = Recipients(controlFileList)
        if rcpts.can_pass():
            return ''

        with open(bodyFile) as fp:
            msg = email.message_from_file(fp, _class=MyMessage)
        block = check_message(msg)
        if block:
            return "550 Attachment rejected for policy reasons"

    except Exception as e:
        sys.stderr.write('attachments ' + type(e).__name__ + ': ' + str(e) + '\n')

    # nothing found --> to the next filter
    return ''


if __name__ == '__main__':
    # For debugging, you can create a file that contains a message
    # body, possibly including attachments.
    # Run this script with the name of that file as an argument,
    # and it'll print either a permanent failure code to indicate
    # that the message would be rejected, or print nothing to
    # indicate that the remaining filters would be run.
    if len(sys.argv) != 2:
        print "Usage: attachments.py <message_body_file>"
        sys.exit(0)
    re = doFilter(sys.argv[1], [])
    if (re == ''):
        re = '(empty string)'
    print re
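For the debugging entry point at the end of the listing, a test message file can be generated with the standard email package rather than written by hand. This is a minimal sketch; build_sample and write_sample are hypothetical helpers, the addresses are placeholders, and the attachment name invoice.js is chosen only because .js appears in blocked_extensions.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_sample(attachment_name="invoice.js", payload=b"alert('hi');"):
    """Build a multipart message carrying one named attachment."""
    msg = MIMEMultipart()
    msg["From"] = "sender@example.com"
    msg["To"] = "rcpt@example.com"
    msg["Subject"] = "attachment filter test"
    msg.attach(MIMEText("See attachment.\n"))
    part = MIMEApplication(payload)  # base64-encoded by default
    part.add_header("Content-Disposition", "attachment",
                    filename=attachment_name)
    msg.attach(part)
    return msg


def write_sample(path, **kwargs):
    """Write the sample message to the given path as text."""
    with open(path, "w") as fp:
        fp.write(build_sample(**kwargs).as_string())
```

After calling write_sample("sample.eml"), running python attachments.py sample.eml should report the 550 rejection line, provided libarchive-c and oletools are importable. Passing attachment_name="notes.txt" instead gives a message that should pass.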
This is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Please find copies of the GNU General Public License
at http://www.gnu.org/licenses/.
Courier-pythonfilter doesn't seem to support independently written filters. In addition, this one uses exactly the same name as an existing filter, so it has to be re-installed every time courier-pythonfilter is upgraded. Perhaps courier-pythonfilter should define a namespace for the filters.
On another topic, I found it easier to write a can_pass function than to use courier-pythonfilter's configuration. A well-configured framework could provide a start filter that just reads the control file and honors whitelisted targets by skipping all subsequent filters. However, I'm not sure all filters deserve the same whitelisting. An alternative approach would be to cache the contents of the control file, so as to share them among filters.
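The caching idea could be sketched as follows. This is only an illustration under stated assumptions: that all filters run in one process and can import a common helper module, that recipient records start with r and the authenticated-relay record starts with uauthsmtp (as in the Recipients class above), and that thread safety is handled elsewhere. The names parse_control_files and _MAX_ENTRIES are hypothetical and not part of courier-pythonfilter.

```python
from collections import OrderedDict

_MAX_ENTRIES = 64        # hypothetical bound, so the cache cannot grow forever
_cache = OrderedDict()   # maps a tuple of control file paths to parsed data


def parse_control_files(control_file_list):
    """Parse control files once and reuse the result for later filters."""
    key = tuple(control_file_list)
    if key in _cache:
        return _cache[key]

    recipients = []
    relay = False
    for path in key:
        with open(path) as fp:
            for line in fp:
                if line.startswith("r"):
                    recipients.append(line[1:].strip())
                elif line.startswith("uauthsmtp"):
                    relay = True

    info = {"recipients": recipients, "relay": relay}
    _cache[key] = info
    if len(_cache) > _MAX_ENTRIES:
        _cache.popitem(last=False)  # drop the oldest entry
    return info
```

Each filter would then apply its own whitelisting rule to the shared data, which addresses the concern that not all filters deserve the same whitelisting.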
Besides the Courier-users mailing list, this page also supports Hypothesis annotations, provided you allow their JavaScript. You might have noticed the icons in the upper right corner. Please register at their server in order to write comments.
Copyright (C) 2017 Alessandro Vesely, all rights reserved.