author | breadcat | 2020-07-05 18:00:47 +0100 |
---|---|---|
committer | breadcat | 2020-07-05 18:00:47 +0100 |
commit | 93f7170fffae046ba492c161c88a30ed77be2a08 (patch) | |
tree | 7139cecf612cad63108f4b3945d8d1e830b9219c /content | |
parent | bc61ec416abc5f083234f46496105ecb9828c9f2 (diff) | |
Update to remove extra tags
Diffstat (limited to 'content')
-rw-r--r-- | content/posts/formatting-dumped-subtitles.md | 7 |
1 files changed, 5 insertions, 2 deletions
diff --git a/content/posts/formatting-dumped-subtitles.md b/content/posts/formatting-dumped-subtitles.md
index 4250b8d..1d00c4a 100644
--- a/content/posts/formatting-dumped-subtitles.md
+++ b/content/posts/formatting-dumped-subtitles.md
@@ -1,6 +1,7 @@
 ---
 title: "Formatting dumped subtitles into a vocabulary list"
 date: 2020-05-28T16:52:00
+lastmod: 2020-07-05T17:59:00
 tags : [ "Formats", "Languages", "Linux", "Media", "Snippets", "Software", ]
 ---
 
@@ -11,7 +12,7 @@ tr ' ' '\n' < subs.srt \
 sed -e 's/<[^>]*>//g' \
 tr '[:upper:]' '[:lower:]' \
 tr -d '\>\/!-.:?,.\",[:digit:]' \
-sed -e '/^[[:space:]]*$/d' -re 's/\s+$//' \
+sed -e '/^[[:space:]]*$/d' -re 's/\s+$//' -re 's/\{...\}//' \
 sort -u > subs-sort.srt
 ```
 
@@ -19,4 +20,6 @@ In short, this will break all spaces into new lines, remove HTML tags, make ever
 
 One issue I've noticed is some _special_ characters won't be converted to lowercase Å to å for example. I don't have an automated workaround for you aside from specifying the letters individually for example using:
 
-<pre><code>tr 'ÆØÅÄÖÐÞÁÉÍÓÚÝ' 'æøåäöðþáéíóúý'</pre></code>
\ No newline at end of file
+<pre><code>tr 'ÆØÅÄÖÐÞÁÉÍÓÚÝ' 'æøåäöðþáéíóúý'</pre></code>
+
+* **Edit 2020-07-05:** Added {\an} tag removal
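
The expression added in this commit, `s/\{...\}//`, removes any three characters wrapped in braces. In the pipeline that is enough, because the digits have already been deleted by the earlier `tr -d` step, so a tag such as `{\an8}` has become `{\an}` by the time `sed` sees it. As a rough sketch of a narrower alternative (the function name `strip_an_tags` is hypothetical and not part of the commit), the pattern could name the tag explicitly so that other short braced words are left alone:

```shell
# Hypothetical helper, not part of the commit: strip subtitle alignment
# tags such as {\an8}, or {\an} once the digits have been removed.
strip_an_tags() {
    # \{ and \} match literal braces, \\ matches a literal backslash,
    # [0-9]* allows an optional digit; g removes every tag on the line.
    sed -E 's/\{\\an[0-9]*\}//g'
}

# Sample input: two tagged words and one unrelated braced word.
printf '%s\n' '{\an8}hello' '{\an}world' '{abc}kept' | strip_an_tags
```

With this pattern the first two lines lose their tag, while `{abc}kept` passes through unchanged; the committed `\{...\}` pattern would also strip `{abc}`.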