blog.minskio.co.uk

Content and theme behind minskio.co.uk

---
title: "Scraping and grabbing Now! albums"
date: 2018-12-04T16:28:00
tags: ["Guides", "Linux", "Lists", "Music", "Servers", "Snippets", "Software"]
---

Recently a colleague at work asked me to download an album for them. Unfortunately, as it was a compilation album and the individual tracks had already been released a million times over, it wasn't available through the usual channels.

No matter though, vague scripting to the rescue! The tracklist I was after was available on the [Now! website](https://www.nowmusic.com/album/now-rock-n-roll/), which could be scraped without any trouble.
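
```
# Fetch the album page (bash is needed for the <( ) process substitution below)
source=$(wget https://www.nowmusic.com/album/now-rock-n-roll/ -qO-)
# On each matching line, keep only the text inside the tag
artists=$(printf '%s\n' "$source" | grep artist | sed 's/^.*>\([^<]*\)<.*$/\1/')
titles=$(printf '%s\n' "$source" | grep '"title"' | sed 's/^.*>\([^<]*\)<.*$/\1/')
# Pair the two lists line by line as "Artist - Title"
paste <(printf '%s\n' "$artists") <(printf '%s\n' "$titles") | sed -e 's/\t/ - /g' > parse_list.txt
```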

Now we have all 73 tracks in a single text file. No fuss, no muss.
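To see what the `grep`/`sed`/`paste` pipeline is doing without hitting the site, here is a minimal offline sketch. The HTML snippet is hypothetical: it assumes each artist and title sits on its own line inside a tag whose attributes mention `artist` or `"title"`, which is what the greps above rely on, and the real page's markup may well differ. The `parse_tracks` function name is made up for illustration, and like the original it assumes GNU `sed` for the `\t` escape.

```shell
#!/usr/bin/env bash
# Hypothetical parse_tracks: the same extraction as above, reading HTML on
# stdin and printing one "Artist - Title" line per track.
parse_tracks() {
  local html artists titles
  html=$(cat)
  # On each matching line, keep only the text between the tag's '>' and the next '<'
  artists=$(printf '%s\n' "$html" | grep artist | sed 's/^.*>\([^<]*\)<.*$/\1/')
  titles=$(printf '%s\n' "$html" | grep '"title"' | sed 's/^.*>\([^<]*\)<.*$/\1/')
  # Pair the lists by line position, then swap paste's tab for " - "
  paste <(printf '%s\n' "$artists") <(printf '%s\n' "$titles") | sed -e 's/\t/ - /g'
}

# Hypothetical markup, assumed to resemble the tracklist's structure
sample='<li>
  <span class="artist">Elvis Presley</span>
  <span class="title">Hound Dog</span>
</li>
<li>
  <span class="artist">Chuck Berry</span>
  <span class="title">Johnny B. Goode</span>
</li>'

printf '%s\n' "$sample" | parse_tracks
```

Because the two lists are paired purely by position, this only works if the page yields exactly one artist line and one title line per track, in the same order; a single stray match on either `grep` shifts every pair after it. HTML entities such as `&amp;` also pass through unchanged.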

All of these tracks are very likely to have been uploaded to YouTube, so we can grab them using the ever-excellent `youtube-dl`.

To do this, we'll run a YouTube search for every entry and download the top result, converting it to `mp3` along the way.
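
```
# Search YouTube for each line and save the top hit as mp3 (</dev/null stops youtube-dl, or anything it spawns, reading the list)
while IFS= read -r line; do youtube-dl -x --audio-format=mp3 ytsearch:"$line lyrics" < /dev/null; done < parse_list.txt
```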

Please note, I append " lyrics" to the search string to avoid the more obvious music videos, which sometimes have…
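Since each search is a bit of a gamble, it can help to record the entries where `youtube-dl` exits with an error so they can be retried by hand. The sketch below is one way to do that; `download_list` and the `failed.txt` filename are hypothetical names, and it only catches outright failures, so a wrong-but-successful match still has to be caught by ear. The `|| [ -n "$line" ]` makes sure a final line without a trailing newline isn't silently skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the youtube-dl search for every line of a tracklist,
# writing any entries whose download fails to a separate file for retrying.
download_list() {
  local list=$1 failed=$2 line
  : > "$failed"                          # start with an empty failure log
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue           # skip blank lines
    # </dev/null keeps youtube-dl (and anything it spawns) off the loop's input
    if ! youtube-dl -x --audio-format=mp3 "ytsearch:$line lyrics" < /dev/null; then
      printf '%s\n' "$line" >> "$failed"
    fi
  done < "$list"
}

# Usage: download_list parse_list.txt failed.txt
```

After a run, `failed.txt` can be edited and fed back in with something like `download_list failed.txt failed-again.txt`. Use a different output name each time, since the function truncates the failure file before it starts reading the list.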

With this, we have 73 `mp3` files with messy filenames dumped into our working directory. I usually throw these into `beets` in singleton mode via Docker to clean up the filenames and tags.
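
```
# Open a shell in the beets container with the current directory mounted at /music
docker run -it -v "$(pwd)":/music linuxserver/beets bash
# Then, from inside the container, import the files as singletons
beet im -s /music
```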

This will take some time and will need a lot of nannying, as there are no existing tags to work with initially. Once the process is done, however, you'll be rewarded with tagged files ready to (rock 'n') roll.