blog.minskio.co.uk

Content and theme behind minskio.co.uk

disabling-cgit-scraping-logs.md (2839B)


---
title: "Disabling cgit scraping logs"
date: 2021-11-03T10:47:00
tags: ["Docker", "Linux", "Servers", "Snippets", "Software"]
---

My server runs out of disk space every couple of months. The cause is usually the Docker directory growing to a few gigabytes, which on my small VPS can really start to cause issues.

You can track down the offending containers using the excellent `ncdu`:
```
sudo ncdu /var/lib/docker
```

When you have the ID of the offending directory (e.g. `/var/lib/docker/overlay2/6fe41495127cc92398107df951416ec27463fd4ff6525a7d227bcf0c4e63803a`), you can find the corresponding container via:
```
# Hash taken from the overlay2 directory name found with ncdu
id=6fe41495127cc92398107df951416ec27463fd4ff6525a7d227bcf0c4e63803a
# Loop over the names of all containers, running or stopped
for i in $(docker ps -a --format '{{.Names}}')
do
	if docker inspect "$i" | grep -q "$id"
	then
		echo "$i"
	fi
done
```

With the offender found, you can start a shell in the container and browse to the files (in my case, under `/var/log`):
```
docker exec -it cgit sh
cd /var/log/httpd/
ls -lah
```

Here I have a one-gigabyte `error_log` and a hundred-megabyte `access_log`. Having a look at the files with `tail -f`, the entries mostly come from bots scraping diffs.
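
To put numbers on that impression rather than eyeballing `tail -f`, a small tally of user agents can help. This is only a sketch: `top_agents` is a hypothetical helper name, and it assumes the access log uses Apache's `combined` format, where splitting each line on double quotes leaves the user agent in the sixth field.

```shell
# Hypothetical helper: count requests per user agent in a "combined" format log.
# $1 is the log file, $2 is how many agents to show (defaults to 10).
top_agents() {
	# Split on double quotes so that field 6 is the User-Agent string,
	# then count identical agents and list the busiest first.
	awk -F'"' '{ print $6 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Run inside the container as `top_agents /var/log/httpd/access_log`, assuming `awk`, `sort` and `uniq` are available there. Otherwise, copy the log to the host first.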

Now let's get these disabled. There's a `robots=index, nofollow` option in `/etc/cgitrc` that can be changed to `robots=none`. To stop this option being reset when the container restarts, we'll bind-mount the file from the host filesystem. Below are the relevant lines from my `docker-compose.yml` file:
```
    volumes:
      - $CONFDIR/cgit/cgitrc:/etc/cgitrc
```
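
A single-file bind mount like this needs the file to already exist on the host. If it is missing, Docker will usually create a directory at that path instead. A minimal sketch of seeding and editing the file follows. `set_robots_none` is a hypothetical helper name, and it assumes GNU `sed` and a container named `cgit`, as in the `docker exec` command earlier.

```shell
# Hypothetical helper: rewrite the robots option in a cgitrc file in place.
set_robots_none() {
	# Replace any existing "robots=..." line with "robots=none".
	# If the file has no robots line, it is left unchanged.
	sed -i 's/^robots=.*/robots=none/' "$1"
}

# Assumed usage on the host, before recreating the container:
#   mkdir -p "$CONFDIR/cgit"
#   docker cp cgit:/etc/cgitrc "$CONFDIR/cgit/cgitrc"
#   set_robots_none "$CONFDIR/cgit/cgitrc"
```

Copying the file out of the running container first keeps every other setting from the image intact, so only the `robots` line differs.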

As an added bonus, we can fix a long-standing issue with this container where code that should be highlighted is just blank. The line in question is `source-filter=/opt/highlight.sh`. Comment out or remove this line and you'll have code previews working as expected.

Unfortunately, even with the above in place, the logs immediately start to fill up again with bot user agents. Time for a more janky solution! Logging in this container is controlled via the `/etc/httpd/conf/httpd.conf` file. Again, we're going to mount this from the host filesystem with a `docker-compose` declaration:

```
    volumes:
      - $CONFDIR/cgit/cgitrc:/etc/cgitrc
      - $CONFDIR/cgit/httpd.conf:/etc/httpd/conf/httpd.conf
```

With this file on our host filesystem, we can now edit it. The offending lines are as follows:
```
ErrorLog "logs/error_log"
LogLevel warn

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" combined
```

All you need to do is prefix every line except `ErrorLog` with a `#` symbol, then change the `ErrorLog` location to `/dev/null`.
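
If you'd rather script the edit than make it by hand, something like the following may work. It is a rough sketch, not a tested recipe for every `httpd.conf`: `disable_httpd_logging` is a hypothetical helper name, it assumes GNU `sed`, and it only touches directives that are not already commented out.

```shell
# Hypothetical helper: disable Apache logging in a host copy of httpd.conf.
disable_httpd_logging() {
	# First expression: point any active ErrorLog directive at /dev/null.
	# Second expression: comment out active LogLevel, LogFormat and CustomLog
	# directives, preserving any leading indentation.
	sed -i -E \
		-e 's@^([[:space:]]*)ErrorLog[[:space:]].*@\1ErrorLog "/dev/null"@' \
		-e 's@^([[:space:]]*)(LogLevel|LogFormat|CustomLog)([[:space:]])@\1#\2\3@' \
		"$1"
}
```

Assumed usage on the host: `disable_httpd_logging "$CONFDIR/cgit/httpd.conf"`. Keeping a backup copy of the original file first makes the change easy to undo.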

With that drastic and janky change in place, restart the container and you should notice that no more logs are being created.