blog.minskio.co.uk



---
title: "Automating grabbing payslips for use with Paperless"
date: 2019-02-04T11:47:00
tags: ["Formats", "Guides", "Linux", "Servers", "Software"]
---

My workplace has recently started sending out payslips as email attachments instead of the usual physical sheet, which I'm a big fan of. [Paperless](https://github.com/the-paperless-project/paperless) is always on hand to sort and process any paperwork I have, which keeps things organised and under control.

To tie all these processes together, we're going to use `getmail` to fetch the mail, `munpack` (from the `mpack` package) to extract the attachments and `qpdf` to decrypt the PDFs.

Please note that this will download your **entire inbox** every time, so it helps not to run the script too often and to keep your inbox to a manageable size.

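Firstly, we'll need to specify a number of variables for use later in the script:
```
#!/usr/bin/env bash
# Address the payslip emails are sent from
email_sender="payslipsender@address"
# IMAP login details
email_username="youremailaddress"
email_password="youremailpassword"
# Password protecting the payslip PDFs
payslip_password="yourpdfpassword"
# Attachments are matched as ${payslip_pattern}*${payslip_filetype}
payslip_pattern=Payslip
payslip_filetype=pdf
# Directory Paperless imports documents from
import_directory="$HOME/import"
# Scratch space for the downloaded mail
temp_directory="$(mktemp -d)"
```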
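
Keeping the passwords inside the script means anyone who can read the script can read them too. A minimal sketch of one alternative, assuming a hypothetical config file that holds the same `email_*` and `payslip_*` assignments as above, is to source that file only when its permissions show that nobody but you can read it. The `load_config` helper and the example path are made up for illustration, and `stat -c` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: read the credentials from a private config file instead of
# hard-coding them. load_config and the config path are hypothetical.
load_config() {
	config_file="$1"
	# GNU stat: print the octal permission bits, e.g. "600"
	perms="$(stat -c '%a' "$config_file")" || return 1
	case "$perms" in
		600|400) ;;  # only the owner can read it: OK
		*)
			echo "refusing to read $config_file (mode $perms); try: chmod 600 $config_file" >&2
			return 1
			;;
	esac
	# Run the file's variable assignments in the current shell
	. "$config_file"
}

# Usage in the real script (the path is only an example):
# load_config "$HOME/.config/payslip.conf" || exit 1
```

The remaining variables (`payslip_pattern`, `import_directory`, `temp_directory` and so on) aren't secret, so they could stay in the script itself.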

We'll change to the temporary directory and create the `cur`, `new` and `tmp` subdirectories that `getmail`'s Maildir destination expects to be there:
```
cd "$temp_directory" || exit
mkdir {cur,new,tmp}
```

As I don't really want to keep a copy of my whole inbox around for no good reason, I dump my email to a temporary directory and write my `getmail` config file into this directory with a heredoc.
Here I'm using IMAP with SSL, but getmail supports [a number of different methods of grabbing mail](http://pyropus.ca/software/getmail/configuration.html#conf-retriever):

```
cat << EOF > getmailrc
[retriever]
type = SimpleIMAPSSLRetriever
server = your.imap.server
username = $email_username
port = 993
password = $email_password

[destination]
type = Maildir
path = $temp_directory/
EOF
```
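
The `EOF` delimiter above is deliberately left unquoted, so the shell substitutes `$email_username`, `$email_password` and `$temp_directory` before the file is written. Quoting the delimiter (`'EOF'`) would write the text literally instead. A small self-contained demonstration of the difference, using a made-up username and the hypothetical file names `expanded.rc` and `literal.rc`:

```shell
#!/usr/bin/env bash
# Demonstrate unquoted vs quoted here-document delimiters.
email_username="me@example.com"   # made-up value for the demo
demo_dir="$(mktemp -d)"

# Unquoted delimiter: $email_username is expanded when the file is written
cat << EOF > "$demo_dir/expanded.rc"
username = $email_username
EOF

# Quoted delimiter: the text is written literally, with no expansion
cat << 'EOF' > "$demo_dir/literal.rc"
username = $email_username
EOF

cat "$demo_dir/expanded.rc"   # prints: username = me@example.com
cat "$demo_dir/literal.rc"    # prints: username = $email_username
```

The flip side of the expansion is that the password sits in a plain-text `getmailrc` for as long as the temporary directory exists, which is one more reason to delete that directory at the end.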

Then run `getmail`, pointing `--getmaildir` at the temporary directory so it reads the `getmailrc` we just wrote from there:
```
getmail --getmaildir "$temp_directory"
```

Change into the directory of newly saved messages, find every message from the payslip sender and extract its attachments with `munpack`, then move the attachments whose names match our pattern to the Paperless import directory.
```
cd new || exit
# List each message file that mentions the sender, then unpack its attachments
grep -l "$email_sender" ./* | xargs munpack -f
mv "$payslip_pattern"*"$payslip_filetype" "$import_directory"
```
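
The selection step only needs the names of the message files that mention the sender, which `grep -l` prints directly. As a hedged refinement, anchoring the pattern to a `From:` line avoids picking up, say, a message that merely quotes the address in its body. A self-contained sketch with two fake messages (the address and the messages are made up):

```shell
#!/usr/bin/env bash
# Sketch: pick only the messages whose From: line mentions the sender.
# The address and the two fake messages are made up for illustration.
email_sender="payslipsender@address"
maildir_new="$(mktemp -d)"   # stands in for the Maildir "new" directory

# Fake message 1: really sent by the payslip sender
printf 'From: Payroll <%s>\nSubject: Payslip\n\nSee attached.\n' "$email_sender" \
	> "$maildir_new/msg1"
# Fake message 2: only mentions the sender's address in the body
printf 'From: colleague@example.com\nSubject: Fwd\n\nAsk %s about it.\n' "$email_sender" \
	> "$maildir_new/msg2"

# -l prints each matching file name once; -i because header names are
# case-insensitive. Note that "." in the address matches any character.
matches="$(grep -il "^From:.*$email_sender" "$maildir_new"/*)"
echo "$matches"   # prints only the path ending in /msg1
```

In the script itself, the equivalent pipeline would be `grep -il "^From:.*$email_sender" ./* | xargs munpack -f`. It is still a heuristic: `grep` doesn't know where the headers end, so a quoted `From:` line inside a forwarded message could still match.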

Paperless won't work on these files unless they're decrypted, which we can do as follows:
```
cd "$import_directory" || exit
for i in "$payslip_pattern"*"$payslip_filetype"; do
	# qpdf --check exits non-zero if it can't open the file without
	# the password (or if it finds other problems with the file)
	if ! qpdf --check "$i" > /dev/null 2>&1; then
		qpdf --password="$payslip_password" --decrypt "$i" "decrypt-$i" && rm "$i"
	fi
done
```
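
One edge case: if no payslip arrived, the glob matches nothing and the shell passes the pattern through unexpanded, so the loop body runs once with `$i` set to the literal string `Payslip*pdf` and `qpdf` complains about a missing file. A minimal sketch of guarding against that with a `[ -e "$i" ] || continue` check; the `count_payslips` helper and the file name are hypothetical and only there to make the behaviour visible:

```shell
#!/usr/bin/env bash
# Sketch: skip the literal pattern when no payslip matched the glob.
payslip_pattern=Payslip
payslip_filetype=pdf

# Hypothetical helper: counts the payslips the decrypt loop would process
count_payslips() {
	n=0
	for i in "$payslip_pattern"*"$payslip_filetype"; do
		# With no match, $i is the literal string 'Payslip*pdf';
		# -e fails for it, so we skip it
		[ -e "$i" ] || continue
		n=$((n + 1))
	done
	echo "$n"
}

scratch="$(mktemp -d)"
cd "$scratch" || exit 1
count_payslips                 # prints 0 (empty directory)
touch "Payslip-2019-02.pdf"
count_payslips                 # prints 1
```

In bash specifically, `shopt -s nullglob` achieves the same by making unmatched globs expand to nothing; the `-e` check has the advantage of also working in plain `sh`.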

Now we have a directory full of unencrypted files for Paperless to work with. Finally, we'll need to delete the temporary directory we used:
```
rm -r "$temp_directory"
```
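
Because the earlier steps bail out with `|| exit`, a failure partway through would skip this `rm` and leave the downloaded mail (and the `getmailrc` containing your password) behind. A hedged sketch of one common alternative is to register the cleanup with `trap ... EXIT` straight after `mktemp`, so it runs however the script ends. The `run_with_cleanup` wrapper is hypothetical and only exists so the effect can be checked from outside:

```shell
#!/usr/bin/env bash
# Sketch: an EXIT trap removes the temporary directory however the shell
# exits, including early exits such as `cd ... || exit`.
# run_with_cleanup is a hypothetical name; its body runs in a subshell ( ... )
# so the trap fires when that subshell exits.
run_with_cleanup() (
	temp_directory="$(mktemp -d)"
	trap 'rm -rf "$temp_directory"' EXIT
	echo "$temp_directory"   # report the path so the caller can check it
	cd "$temp_directory" || exit 1
	mkdir cur new tmp
	# ... the getmail, munpack and qpdf steps would go here ...
	exit 1                   # simulate one of those steps failing
)

# The subshell exits non-zero on purpose, so don't let that stop us here
dir="$(run_with_cleanup)" || true
if [ ! -e "$dir" ]; then
	echo "temporary directory removed: $dir"
fi
```

In the real script, the `trap` line would go directly after `temp_directory="$(mktemp -d)"` without the wrapping function, and the final `rm -r` would then be redundant.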

All that's left is to make the script executable (`chmod +x`) and set it up as a cron job that runs after payday. The cron line I'm using is as follows:
```
0 0 2 * * $HOME/path/to/script/payslip.sh
```
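
For reference, the five time fields are minute, hour, day of month, month and day of week, so `0 0 2 * *` means 00:00 on the 2nd of every month. Cron normally emails any output to the crontab's owner, which is easy to miss on a server without local mail set up. One hedged variation is to append the output to a log file instead (the log path is only an example):

```
# m h dom mon dow  command
0 0 2 * * "$HOME/path/to/script/payslip.sh" >> "$HOME/payslip.log" 2>&1
```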

* **Edit 2019-02-27:** `pdftk` replaced with `qpdf`, as `pdftk` required Java, which pulls in ~200MB of dependencies.
* **Edit 2019-08-09:** Added cron section.