summaryrefslogtreecommitdiffstats
path: root/content/posts/automating-paperwork-payslips.md
blob: 0643878b2ba057722c41319c187fa0eca7db5543 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
title: "Automating grabbing payslips for use with Paperless"
date: 2019-02-04T11:47:00
tags: ["Formats", "Guides", "Linux", "Servers", "Software"]
---

My workplace has recently started sending out Payslips as email attachments instead of the usual physical sheet which I'm a big fan of, [Paperless](https://github.com/the-paperless-project/paperless) is always on hand to sort and process any paperwork I have which keeps things organised and under control.

To tie all these processes together, we're going to use `getmail`, `mpack` and `qpdf`.

Please note that this will download your **entire inbox** every time so it helps if you don't run the script too often, and keep your inbox size to manageable levels.

Firstly, we'll need to specify a number of variables for use later in the script:
```
email_sender="payslipsender@address"
email_username="youremailaddress"
email_password="youremailpassword"
payslip_password="yourpdfpassword"
payslip_pattern=Payslip
payslip_filetype=pdf
import_directory="$HOME/import"
temp_directory="$(mktemp -d)"
```

We'll change to the temporary directory and make the directories that `getmail` expects to be there:
```
cd "$temp_directory" || exit
mkdir {cur,new,tmp}
```

As I don't really want to keep an copy of my whole inbox around for no good reason, I dump my email to a temporary directory and write my `getmail` config file into this directory with a heredoc.
Here I'm using IMAP with SSL but getmail supports [a number of different methods of grabbing mail](http://pyropus.ca/software/getmail/configuration.html#conf-retriever):

```
cat << EOF > getmailrc
[retriever]
type = SimpleIMAPSSLRetriever
server = your.imap.server
username = $email_username
port = 993
password = $email_password

[destination]
type = Maildir
path = $temp_directory/
EOF
```

Then run `getmail` using the temporary directory as your working directory:
```
getmail --getmaildir "$temp_directory"
```

Change directory to our newly saved items, then extract all attachments that match our search pattern in the variable above. Lastly, move these attachments to the Paperless import directory.
```
cd new || exit
grep "$email_sender" ./* | cut -f1 -d: | uniq | xargs munpack -f
mv "$payslip_pattern"*"$payslip_filetype" "$import_directory"
```

Now Paperless won't work on these files unless they're decrypted, which we can do as follows:
```
cd "$import_directory" || exit
for i in $payslip_pattern*$payslip_filetype; do
	fileProtected=0
	qpdf "$i" --check || fileProtected=1
	if [ $fileProtected == 1 ]; then
		qpdf --password="$payslip_password" --decrypt "$i" "decrypt-$i" && rm "$i"
	fi
done
```

Now we have a directory full of unencrypted files to let Paperless work with. Last but not least, we'll need to delete the old temporary directory we used:
```
rm -r "$temp_directory"
```

Lastly all you need to do is set up the above script as a cron job to run after pay day! The cron line I'm using is as follows:
```
0 0 2 * * $HOME/path/to/script/payslip.sh &
```

* **Edit 2019-02-27:** `pdftk` replaced with `qpdf` as it required java which pulls down ~200MB dependencies.
* **Edit 2019-08-09:** Added cron section.