author | breadcat | 2020-09-30 08:59:47 +0100 |
---|---|---|
committer | breadcat | 2020-09-30 08:59:47 +0100 |
commit | 4088ed4c88d34c3e449e90479d965645e586bb3d (patch) | |
tree | e247f7c28fb20540a136d4a835e237e77a2bd935 | |
parent | 6d66953caa4631e560aec1f94c8e3677f40f20cf (diff) | |
Slight script update
-rw-r--r-- | content/posts/formatting-dumped-subtitles.md | 12 |
1 files changed, 7 insertions, 5 deletions
diff --git a/content/posts/formatting-dumped-subtitles.md b/content/posts/formatting-dumped-subtitles.md
index 1d00c4a..9a48b5e 100644
--- a/content/posts/formatting-dumped-subtitles.md
+++ b/content/posts/formatting-dumped-subtitles.md
@@ -8,11 +8,12 @@ tags : [ "Formats", "Languages", "Linux", "Media", "Snippets", "Software", ]
 As per my previous post, you should now have a single `srt` subtitle file, to convert this into a single word list that you can begin translating away at, you can run the below verbose script.
 
 ```
-tr ' ' '\n' < subs.srt \
- sed -e 's/<[^>]*>//g' \
- tr '[:upper:]' '[:lower:]' \
- tr -d '\>\/!-.:?,.\",[:digit:]' \
- sed -e '/^[[:space:]]*$/d' -re 's/\s+$//' -re 's/\{...\}//' \
+tr ' ' '\n' < subs.srt | \
+ sed -e 's/<[^>]*>//g' | \
+ tr '[:upper:]' '[:lower:]' | \
+ tr -d '\>\/!-.:?,.\",[:digit:]' | \
+ tr -d '…' | \
+ sed -e '/^[[:space:]]*$/d' -re 's/\s+$//' -re 's/\{...\}//' | \
  sort -u > subs-sort.srt
 ```
 
@@ -22,4 +23,5 @@ One issue I've noticed is some _special_ characters won't be converted to lowerc
 
 <pre><code>tr 'ÆØÅÄÖÐÞÁÉÍÓÚÝ' 'æøåäöðþáéíóúý'</pre></code>
 
+* **Edit 2020-09-23:** Added elipses removal, fixed pipes
 * **Edit 2020-07-05:** Added {\an} tag removal
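For anyone reading the change without the post to hand, the piped version from this commit can be sketched as a small shell function and tried on a throwaway sample. This is only a sketch under a few assumptions, not the author's script: the name `srt_to_words` and the sample cues are hypothetical, the ellipsis is stripped with `sed` instead of `tr -d '…'` (GNU `tr` works byte by byte, so the swap avoids touching other multibyte characters), and `-E` with `[[:space:]]` stands in for the original `-r` and `\s`.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the commit): reads an .srt file on stdin
# and prints a sorted, de-duplicated, lowercase word list on stdout.
srt_to_words() {
    # 1. one word per line
    # 2. drop HTML-style tags such as <i>...</i>, and the ellipsis character
    # 3. lowercase (ASCII letters only, as the post itself notes)
    # 4. delete punctuation and digits (this also empties timestamps and cue numbers)
    # 5. drop blank lines, trailing whitespace, and leftover {\an} position tags
    # 6. sort and de-duplicate
    tr ' ' '\n' |
        sed -e 's/<[^>]*>//g' -e 's/…//g' |
        tr '[:upper:]' '[:lower:]' |
        tr -d '\>\/!-.:?,.\",[:digit:]' |
        sed -E -e '/^[[:space:]]*$/d' -e 's/[[:space:]]+$//' -e 's/\{...\}//' |
        sort -u
}

# Made-up sample input, only to show the shape of the result.
srt_to_words <<'EOF'
1
00:00:01,000 --> 00:00:02,500
<i>Hello</i> World…

2
00:00:03,000 --> 00:00:04,000
{\an8}Hello again!
EOF
# prints:
# again
# hello
# world
```

On a real file the call would look like `srt_to_words < subs.srt > subs-sort.srt`, matching the file names used in the post.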