author: breadcat, 2020-09-30 08:59:47 +0100
committer: breadcat, 2020-09-30 08:59:47 +0100
commit: 4088ed4c88d34c3e449e90479d965645e586bb3d
tree: e247f7c28fb20540a136d4a835e237e77a2bd935
parent: 6d66953caa4631e560aec1f94c8e3677f40f20cf
Slight script update
-rw-r--r-- content/posts/formatting-dumped-subtitles.md 12
1 files changed, 7 insertions, 5 deletions
diff --git a/content/posts/formatting-dumped-subtitles.md b/content/posts/formatting-dumped-subtitles.md
index 1d00c4a..9a48b5e 100644
--- a/content/posts/formatting-dumped-subtitles.md
+++ b/content/posts/formatting-dumped-subtitles.md
@@ -8,11 +8,12 @@ tags : [ "Formats", "Languages", "Linux", "Media", "Snippets", "Software", ]
As per my previous post, you should now have a single `srt` subtitle file. To convert this into a single word list that you can begin translating away at, run the verbose script below.
```
-tr ' ' '\n' < subs.srt \
- sed -e 's/<[^>]*>//g' \
- tr '[:upper:]' '[:lower:]' \
- tr -d '\>\/!-.:?,.\",[:digit:]' \
- sed -e '/^[[:space:]]*$/d' -re 's/\s+$//' -re 's/\{...\}//' \
+tr ' ' '\n' < subs.srt | \
+ sed -e 's/<[^>]*>//g' | \
+ tr '[:upper:]' '[:lower:]' | \
+ tr -d '\>\/!-.:?,.\",[:digit:]' | \
+ tr -d '…' | \
+ sed -e '/^[[:space:]]*$/d' -re 's/\s+$//' -re 's/\{...\}//' | \
sort -u > subs-sort.srt
```
@@ -22,4 +23,5 @@ One issue I've noticed is some _special_ characters won't be converted to lowerc
<pre><code>tr '&AElig;&Oslash;&Aring;&Auml;&Ouml;&ETH;&THORN;&Aacute;&Eacute;&Iacute;&Oacute;&Uacute;&Yacute;' '&aelig;&oslash;&aring;&auml;&ouml;&eth;&thorn;&aacute;&eacute;&iacute;&oacute;&uacute;&yacute;'</code></pre>
+* **Edit 2020-09-23:** Added ellipsis removal, fixed pipes
* **Edit 2020-07-05:** Added {\an} tag removal
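One caveat with the `tr -d '…'` step added in this commit is that upstream GNU `tr` works on bytes, not characters. It therefore deletes each of the three UTF-8 bytes of `…` (E2 80 A6) wherever they occur, which also mangles characters that share one of those bytes, such as `æ` (C3 A6). The sketch below is a hedged variant of the pipeline, not the author's script. It assumes GNU `tr` and `sed` and wraps the steps in a hypothetical `srt_to_wordlist` function that reads stdin. It removes the ellipsis with `sed`, which matches the whole byte sequence, and strips `{...}` override tags such as `{\an8}` directly. It also swaps the hand-written `tr -d` set for `[:punct:][:digit:]`. That is a close match rather than an exact equivalent. In the original set, `!-.` is read by `tr` as a range, so it already deletes apostrophes and `#$%&()*+`. `[:punct:]` deletes those too, plus a few more symbols such as `;` and `@`.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): SRT text on stdin ->
# sorted, de-duplicated word list on stdout. Sketch assuming GNU tr/sed.
srt_to_wordlist() {
  tr ' ' '\n' |                              # one word per line
    sed -e 's/<[^>]*>//g' |                  # strip HTML-style tags such as <i>
    sed -e 's/{[^}]*}//g' |                  # strip override tags such as {\an8}
    sed -e 's/…//g' |                        # drop "…" as one sequence, not byte by byte
    tr '[:upper:]' '[:lower:]' |             # byte-based: only ASCII A-Z in UTF-8 text
    tr -d '[:punct:][:digit:]' |             # punctuation, timestamps, cue numbers
    sed -e 's/[[:space:]]*$//' -e '/^$/d' |  # trim trailing spaces/CR, drop blank lines
    sort -u                                  # sort and remove duplicates
}

# Example usage, with the file names from the post:
# srt_to_wordlist < subs.srt > subs-sort.srt
```

Non-ASCII capitals such as `Æ` are still not lowercased by `tr '[:upper:]' '[:lower:]'`, so the separate `tr` line for special characters is still needed.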