+----------------------------------------------------------+
|Email and Batch Attachments Extraction - Tips and Tricks  |
|Written by Wim Stockman - on 06 Aug 2020                  |
|Last Updated - on 5 May 2023                              |
+----------------------------------------------------------+

Preface

All commands are run in bash on an Arch Linux system.

Update 5 May 2023
-----------------

1. Ripmime
----------

After many years of good service in my sieve script, munpack started
misbehaving. My guess is that the mails I process now have changed in
style: they are embedded inside each other, which seems to be a problem
for munpack. So I had to switch to ripmime, which has also matured over
the years.

I use it in the script that gets called by my dovecot sieve script:

** /usr/bin/ripmime -v -i - -d /mydir_where_the_attachment_gets_stored > ripmime.log

The parameters I use are:

-i - : the '-' means input from STDIN
-d   : output directory
-v   : verbose; prints the names of the extracted files, which I
       write to ripmime.log. This comes in handy for processing
       the attachments later on.
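
To show what "processing the attachments later on" could look like, here
is a minimal sketch, not my production script. It ASSUMES that every
extracted file shows up in ripmime.log on a line containing
"filename=NAME"; check your own log before relying on that. The function
names list_extracted and process_extracted are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: read the file names that ripmime -v wrote to ripmime.log.
# ASSUMPTION: each extracted file is logged on a line that contains
# "filename=NAME". Verify this against your own ripmime.log.

# Print one extracted file name per line.
list_extracted() {
    local log=${1:-ripmime.log}
    # Keep only lines containing "filename=" and strip everything
    # up to and including the last "filename=" on the line.
    sed -n 's/^.*filename=//p' "$log"
}

# Hypothetical follow-up step: print "name<TAB>size in bytes" for every
# logged file that really exists in the ripmime output directory.
process_extracted() {
    local dir=$1 log=${2:-ripmime.log} name
    list_extracted "$log" | while IFS= read -r name; do
        if [ -f "$dir/$name" ]; then
            printf '%s\t%s\n' "$name" "$(wc -c < "$dir/$name" | tr -d ' ')"
        fi
    done
}
```

Called as process_extracted /mydir_where_the_attachment_gets_stored ripmime.log
it only reports sizes. Replace the printf with whatever you want to do
with each attachment.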
End of Update
-------------

1. Extracting E-mails
---------------------

1.1 Extracting e-mails from the mbox file format

If you downloaded some mails in mbox format from Gmail Takeout or from
a newsgroup, you sometimes just want every mail as a single file.
Here is a nice one-liner to do this with awk:

** awk '/^From / {nr += 1; next} {print $0 >> (sprintf("%06d.eml", nr))}' your_mbox_file

Every "From " separator line starts a new numbered .eml file; the
separator line itself is not written.

A Perl equivalent:

** perl -pe 'open STDOUT, sprintf(">m%05d.mbx", ++$n) if /^From /' < your_mbox_file > before-first

Anything that comes before the first "From " line ends up in the file
before-first.

1.2 Extracting e-mails from encapsulated emails

Your coworker sends you a bunch of emails as attachments inside one
email and you want every mail separately. Or you selected a lot of
important mails and sent them to yourself in one mail so you could
easily save them, and now you want every mail as a single file for the
new archiving system you are building.
Here are some steps to get you going.

Tools required: munpack
Link to install: https://salsa.debian.org/debian/mpack

The command:

** munpack -t yourmail.eml

This will extract the different emails and name them part1, part2,
part3, etc., without an extension. If you want to rename them to
mail1.eml, mail2.eml, and so on, you can run this command:

** for f in part*; do mv "$f" "mail${f:4}.eml"; done

To combine both commands into a nice one-liner:

** munpack -t yourmail.eml && for f in part*; do mv "$f" "mail${f:4}.eml"; done

2. Extracting Attachments
-------------------------

2.1 Extract attachments of a single mail

The best result is achieved with munpack from the mpack package.
When I wrote this in 2020 I also tried ripmime, but it failed too often.
You can install mpack from source from
https://salsa.debian.org/debian/mpack
or use your package manager.

The command is simple:

** munpack yourmail

This will extract the attachments from your email.
If you also want the text part, use the "-t" option.

2.2 Extract attachments of multiple emails

munpack can only extract from one email at a time, but it does it really well.
So how do we get those hundreds of mails with attachments processed in
a breeze? Presuming all your mails have the suffix .eml, we can easily
select them with the wildcard *.eml.
So we throw in a bash for-loop and end up with this command:

** for f in *.eml; do munpack "$f"; done

If munpack encounters duplicate file names it will add a numbered
suffix, so you don't have to worry about that.

If your mail files do not share a common suffix, first copy them to a
separate folder and then use the wilder wildcard on it:

** for f in *; do munpack "$f"; done

2.3 Extracting all attachments out of an mbox file

Say you have this nice mbox archive of a cartoonist who made a picture
every week over the past decade. You are interested in seeing all the
pictures and not the comments he made.
This combines what we learned in chapters 1 and 2 into one command:

** awk 'BEGIN {RS="\r\n"}
        function run() {if (mail != "") {printf "%s", mail | "munpack"; close("munpack")}; mail = ""}
        /^From / {run()}
        {mail = mail $0 "\n"}
        END {run()}' your_mbox_file

Each mail is collected in the variable mail and piped to munpack when
the next "From " line shows up. The END block makes sure the last mail
is processed too.

The RS="\r\n" is only needed if your mbox file was created in a DOS or
Windows environment. I noticed it is needed with Google Takeout. Remove
it for an mbox file with plain Unix line endings, otherwise awk reads
the whole file as one record.

Sidenote: I love awk, that's why I made this work from inside awk,
where it calls munpack as a subprocess.

Have fun.

+--------------------------------------------------+
|Suggestions? gopher@yasendfile.org                |
+--------------------------------------------------+