author Claire <claire.github-309c@sitedethib.com> 2022-07-17 22:07:20 +0200
committer Claire <claire.github-309c@sitedethib.com> 2022-07-17 22:07:20 +0200
commit cd87d7dcef814ad86fb15334680cb0e3232437a9
tree 63db8838568ea440bb3cb9797cdbaf5c4952e9e7 /app/lib/hashtag_normalizer.rb
parent 9094c2f52c24e1c00b594e7c11cd00e4a07eb431
parent c3f0621a59a74d0e20e6db6170894871c48e8f0f
Merge branch 'main' into glitch-soc/merge-upstream
- `.env.production.sample`:
  Our sample config file is very different from upstream since it is much more
  complete. Upstream added documentation for a few env variables.
  Copied the new variables/documentation from upstream.
- `app/lib/feed_manager.rb`:
  Upstream added a timeline type (hashtags), while glitch-soc already had an
  extra one (direct messages). Not really a conflict but textually close
  changes.
  Ported upstream's changes.
- `app/models/custom_emoji.rb`:
  Upstream upped the custom emoji size limit, while glitch-soc had configurable
  limits.
  Upped the default limits accordingly.
- `streaming/index.js`:
  Upstream reworked how hashtags were normalized. Minor conflict due to
  glitch-soc's handling of instance-local posts.
  Ported upstream's changes.
Diffstat (limited to 'app/lib/hashtag_normalizer.rb')
-rw-r--r-- app/lib/hashtag_normalizer.rb 25
1 files changed, 25 insertions, 0 deletions
diff --git a/app/lib/hashtag_normalizer.rb b/app/lib/hashtag_normalizer.rb
new file mode 100644
index 000000000..c1f99e163
--- /dev/null
+++ b/app/lib/hashtag_normalizer.rb
@@ -0,0 +1,25 @@
+# frozen_string_literal: true
+
+class HashtagNormalizer
+  def normalize(str)
+    remove_invalid_characters(ascii_folding(lowercase(cjk_width(str))))
+  end
+
+  private
+
+  def remove_invalid_characters(str)
+    str.gsub(/[^[:alnum:]#{Tag::HASHTAG_SEPARATORS}]/, '')
+  end
+
+  def ascii_folding(str)
+    ASCIIFolding.new.fold(str)
+  end
+
+  def lowercase(str)
+    str.mb_chars.downcase.to_s
+  end
+
+  def cjk_width(str)
+    str.unicode_normalize(:nfkc)
+  end
+end
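
For readers who want to try the pipeline outside the application, here is a minimal standalone sketch of the same four steps (width normalization, lowercasing, folding, character filtering). It is not the code from this commit: `HashtagNormalizerSketch`, `SEPARATORS`, and `strip_diacritics` are hypothetical names, `SEPARATORS` is an assumed stand-in for `Tag::HASHTAG_SEPARATORS` (whose value is not shown in this diff), and the folding step swaps `ASCIIFolding` for a cruder NFD-based diacritic removal that will not cover everything a real folding table does. Plain `String#downcase` replaces `mb_chars.downcase` so that ActiveSupport is not required.

```ruby
# frozen_string_literal: true

# Standalone sketch of the normalization pipeline, runnable without Rails.
# SEPARATORS and strip_diacritics are assumptions made for illustration only.
module HashtagNormalizerSketch
  # Assumed stand-in for Tag::HASHTAG_SEPARATORS (real value not shown in the diff).
  SEPARATORS = '_'

  module_function

  def normalize(str)
    width_folded = str.unicode_normalize(:nfkc) # e.g. fullwidth letters to ASCII forms
    lowered = width_folded.downcase             # Unicode-aware in Ruby 2.4 and later
    folded = strip_diacritics(lowered)          # crude substitute for ASCIIFolding
    # Keep only alphanumerics and the assumed separator characters.
    folded.gsub(/[^[:alnum:]#{SEPARATORS}]/, '')
  end

  # Decompose, drop combining marks (category Mn), then recompose.
  def strip_diacritics(str)
    str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').unicode_normalize(:nfc)
  end
end

puts HashtagNormalizerSketch.normalize('ＣＡＦÉ')        # prints "cafe"
puts HashtagNormalizerSketch.normalize('Hello_World!') # prints "hello_world"
```

The order matters: NFKC runs first so that fullwidth and other compatibility forms are already collapsed before lowercasing and folding, and the character filter runs last so that it only sees the folded text.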