author: Eugen Rochko <eugen@zeonfederated.com> 2019-08-16 01:24:03 +0200
committer: GitHub <noreply@github.com> 2019-08-16 01:24:03 +0200
commit: 8fdff2748ffbc4ce3d36c75f2bd33182b641e895
tree: ec8d1455c492e9b454ddf55dcb914bbe37007d6b /app/chewy
parent: 2ca6b2bb6c9e811ad98e3df23e70efbf22882e42
Add more accurate account search (#11537)

* Add more accurate account search

When Elasticsearch is available, account search uses a more accurate implementation:

- Using edge n-gram index for acct and display name
- Using asciifolding and CJK width normalization on display names
- Using Gaussian decay on account activity for additional scoring (recency)
- Using followers/friends ratio for additional scoring (spamminess)
- Using followers number for additional scoring (size)
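
As a rough illustration of the recency term, the sketch below computes a Gaussian decay in plain Ruby, using the formula Elasticsearch documents for its `gauss` decay function. The helper name `gauss_decay` and the `scale_days`, `offset_days` and `decay` values are assumptions chosen for the example; the search parameters actually used by this change are not part of this diff.

```ruby
# Hypothetical helper, not part of this commit: Gaussian decay as documented
# for Elasticsearch's function_score `gauss` function.
# The default parameter values are illustrative assumptions.
def gauss_decay(age_days, scale_days: 30.0, offset_days: 0.0, decay: 0.5)
  # sigma^2 is chosen so that the score equals `decay` at distance `scale`.
  sigma_sq = -(scale_days**2) / (2.0 * Math.log(decay))
  # Distances inside the offset are not penalized at all.
  distance = [0.0, age_days.abs - offset_days].max
  Math.exp(-(distance**2) / (2.0 * sigma_sq))
end

gauss_decay(0)   # 1.0 for an account that posted just now
gauss_decay(30)  # about 0.5 at exactly one scale length
gauss_decay(90)  # much smaller for long-inactive accounts
```

Under these assumptions, an account's last activity multiplies its score by a factor between 0 and 1 that falls off smoothly with age instead of cutting off at a hard threshold.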

Exact-match precedence only takes effect when the input conforms to the
username format and its username part is complete, i.e. once the user
has started typing the domain part.
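
A minimal sketch of that condition, assuming a simplified username pattern: the constant `USERNAME_COMPLETE_RE` and the method `username_complete?` are hypothetical names for this example and are not the expressions used in the codebase. The presence of the `@` separator after the username is what marks the username part as finished.

```ruby
# Hypothetical pattern, not the project's own: an optional leading "@",
# a username of one or more word characters, the "@" separator, and a
# possibly still-empty domain prefix.
USERNAME_COMPLETE_RE = /\A@?(?<username>[a-z0-9_]+)@(?<domain>[^\s@]*)\z/i

# True once the query has the username format and the user has moved on
# to the domain part, so the username can be matched exactly.
def username_complete?(query)
  USERNAME_COMPLETE_RE.match?(query.strip)
end

username_complete?('alice')          # false, username may still be partial
username_complete?('alice@')         # true, separator typed
username_complete?('@alice@mast')    # true, domain in progress
username_complete?('a@example.com')  # true, single-letter usernames qualify
```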

* Support single-letter usernames

* Fix tests

* Fix not picking up account updates

* Add weights and normalization for scores, skip zero terms queries

* Use local counts for accounts index, adjust search parameters

* Fix mistakes

* Using updated_at of accounts is inadequate for remote accounts
Diffstat (limited to 'app/chewy')
-rw-r--r-- app/chewy/accounts_index.rb 36
1 files changed, 36 insertions, 0 deletions
diff --git a/app/chewy/accounts_index.rb b/app/chewy/accounts_index.rb
new file mode 100644
index 000000000..e11b80039
--- /dev/null
+++ b/app/chewy/accounts_index.rb
@@ -0,0 +1,36 @@
+# frozen_string_literal: true
+
+class AccountsIndex < Chewy::Index
+  settings index: { refresh_interval: '5m' }, analysis: {
+    analyzer: {
+      content: {
+        tokenizer: 'whitespace',
+        filter: %w(lowercase asciifolding cjk_width),
+      },
+
+      edge_ngram: {
+        tokenizer: 'edge_ngram',
+        filter: %w(lowercase asciifolding cjk_width),
+      },
+    },
+
+    tokenizer: {
+      edge_ngram: {
+        type: 'edge_ngram',
+        min_gram: 1,
+        max_gram: 15,
+      },
+    },
+  }
+
+  define_type ::Account.searchable.includes(:account_stat), delete_if: ->(account) { account.destroyed? || !account.searchable? } do
+    root date_detection: false do
+      field :id, type: 'long'
+      field :display_name, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'content'
+      field :acct, type: 'text', analyzer: 'edge_ngram', search_analyzer: 'content', value: ->(account) { [account.username, account.domain].compact.join('@') }
+      field :following_count, type: 'long', value: ->(account) { account.active_relationships.count }
+      field :followers_count, type: 'long', value: ->(account) { account.passive_relationships.count }
+      field :last_status_at, type: 'date', value: ->(account) { account.last_status_at || account.created_at }
+    end
+  end
+end
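
To make the effect of the `edge_ngram` tokenizer settings above (`min_gram: 1`, `max_gram: 15`) more concrete, here is a small plain-Ruby approximation of the terms that would be indexed for a value. The method `edge_ngrams` is a hypothetical helper for illustration. It assumes the whole input is treated as a single token and models only the `lowercase` filter, not `asciifolding` or `cjk_width`.

```ruby
# Hypothetical helper, not part of this commit: approximate the prefixes an
# edge n-gram tokenizer emits for one value, after lowercasing.
def edge_ngrams(text, min_gram: 1, max_gram: 15)
  chars = text.downcase.chars
  # Prefixes never exceed the input length or max_gram.
  upper = [max_gram, chars.length].min
  (min_gram..upper).map { |n| chars.first(n).join }
end

edge_ngrams('Gargron')
# => ["g", "ga", "gar", "garg", "gargr", "gargro", "gargron"]
```

Because every prefix is stored at index time, a query analyzed with the `content` analyzer only has to match one whole term, which is why typing the first few characters of a name is enough to find the account.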