---
title: "Automating grabbing payslips for use with Paperless"
date: 2019-02-04T11:47:00
tags: ["Formats", "Guides", "Linux", "Servers", "Software"]
---

My workplace has recently started sending out payslips as email attachments instead of the usual physical sheet, which I'm a big fan of. [Paperless](https://github.com/the-paperless-project/paperless) is always on hand to sort and process any paperwork I have, which keeps things organised and under control.

To tie all these processes together, we're going to use `getmail`, `mpack` (which provides `munpack`) and `qpdf`.

Please note that this will download your **entire inbox** every time, so it helps if you don't run the script too often and keep your inbox at a manageable size.

Firstly, we'll need to specify a number of variables for use later in the script:
```
email_sender="payslipsender@address"
email_username="youremailaddress"
email_password="youremailpassword"
payslip_password="yourpdfpassword"
payslip_pattern=Payslip
payslip_filetype=pdf
import_directory="$HOME/import"
temp_directory="$(mktemp -d)"
```

We'll change to the temporary directory and make the directories that `getmail` expects to be there:
```
cd "$temp_directory" || exit
mkdir {cur,new,tmp}
```

As I don't really want to keep a copy of my whole inbox around for no good reason, I dump my email to a temporary directory and write my `getmail` config file into this directory with a heredoc.
Here I'm using IMAP with SSL, but getmail supports [a number of different methods of grabbing mail](http://pyropus.ca/software/getmail/configuration.html#conf-retriever):

```
cat << EOF > getmailrc
[retriever]
type = SimpleIMAPSSLRetriever
server = your.imap.server
username = $email_username
port = 993
password = $email_password

[destination]
type = Maildir
path = $temp_directory/
EOF
```

Then run `getmail` using the temporary directory as your working directory:
```
getmail --getmaildir "$temp_directory"
```

Change directory to our newly saved items, then extract the attachments from every message that mentions the sender address set in the variables above. Lastly, move the attachments that match our payslip pattern to the Paperless import directory.
```
cd new || exit
# List each message file that mentions the sender once, then unpack its attachments
grep -l "$email_sender" ./* | xargs munpack -f
mv "$payslip_pattern"*"$payslip_filetype" "$import_directory"
```

Now Paperless won't work on these files unless they're decrypted, which we can do as follows:
```
cd "$import_directory" || exit
for i in "$payslip_pattern"*"$payslip_filetype"; do
    [ -e "$i" ] || continue  # skip the literal pattern if nothing matched
    fileProtected=0
    # --check fails when the file cannot be opened without a password
    qpdf --check "$i" || fileProtected=1
    if [ "$fileProtected" -eq 1 ]; then
        qpdf --password="$payslip_password" --decrypt "$i" "decrypt-$i" && rm "$i"
    fi
done
```

Now we have a directory full of unencrypted files for Paperless to work with. Last but not least, we'll need to delete the temporary directory we used:
```
rm -r "$temp_directory"
```

Finally, all you need to do is set up the above script as a cron job to run after pay day! The cron line I'm using, which runs at midnight on the 2nd of every month, is as follows:
```
0 0 2 * * $HOME/path/to/script/payslip.sh &
```

* **Edit 2019-02-27:** `pdftk` replaced with `qpdf`, as `pdftk` requires Java, which pulls down ~200MB of dependencies.
* **Edit 2019-08-09:** Added cron section.
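
The clean-up step in the script above only runs if every earlier step gets that far, and the `getmailrc` it writes holds the email password in plain text. Below is a minimal sketch, assuming bash, of one way this could be tightened up: a restrictive `umask` for anything written to the temporary directory, and an `EXIT` trap so the directory is removed on failure as well as success. The helper name `with_scratch_dir` is made up for illustration and is not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch only: `with_scratch_dir` is a hypothetical helper, not part of the original script.
# It runs a command inside a private Maildir-style scratch directory and always removes it.

with_scratch_dir() {
    (
        # Subshell: the umask, trap and cd below do not leak into the caller.
        umask 077                          # files created here (e.g. getmailrc) are owner-only
        scratch="$(mktemp -d)" || exit 1
        trap 'rm -rf "$scratch"' EXIT      # fires when the subshell exits, on success or failure
        cd "$scratch" || exit 1
        mkdir cur new tmp                  # the directories getmail expects
        "$@"                               # run the supplied command inside the scratch dir
        exit "$?"                          # pass the command's exit status back to the caller
    )
}
```

It could be called as `with_scratch_dir fetch_payslips`, where `fetch_payslips` is a hypothetical shell function wrapping the `getmail` and `munpack` steps. The scratch directory is then removed even if one of those steps fails, and the function's exit status is handed back to the caller.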