author: Darius Kazemi <darius.kazemi@gmail.com> 2019-08-03 10:11:09 -0700
committer: multiple creatures <dev@multiple-creature.party> 2020-02-21 01:25:39 -0600
commit: e6e69f091e3414b29271040926cc1d2e7c5f0e41
tree: cbd46c9e52eb68bd4eb341fdc55bd265a7890212
parent: 855de12844c9367e0754acd8f32bcc42996f419c

Add option to exclude suspended domains/subdomains from tootctl domains crawl (#11454)
* Add "--exclude-suspended" to tootctl domains crawl

This new option ignores any instances suspended server-wide, as
well as their subdomains. It queries all domain blocks up front,
then matches each domain against a single regexp. This is faster
than what may be the obvious implementation, which is to ask
`DomainBlock.blocked?(domain)` for each domain; that approach
hits the DB many times, slowing things down considerably.

* cleaning up code style

* Compiling regex

* Removing ternary operator
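
The "query once, then match in memory" approach described above can be sketched in plain Ruby. This is a minimal illustration, not the code from this commit: `build_blocked_pattern` is a hypothetical helper, and the array passed to it stands in for the result of the single `DomainBlock` query. Unlike the pattern in the diff, this sketch escapes the domain names, groups the alternatives so the anchors apply to all of them, and returns a never-matching pattern when the block list is empty.

```ruby
# Hypothetical helper (not part of Mastodon): builds one anchored pattern
# from a list of suspended domains fetched up front.
def build_blocked_pattern(domains)
  # With no blocks, return a pattern that can never match anything.
  return /(?!)/ if domains.empty?

  # Regexp.union escapes each string, so dots in domain names stay literal.
  alternatives = Regexp.union(domains)

  # Match a blocked domain exactly, or any subdomain of it, ignoring case.
  /\A(?:.+\.)?(?:#{alternatives.source})\z/i
end

# Stand-in for the list that a single database query would return.
suspended = ['example.com', 'bad.test']
blocked   = build_blocked_pattern(suspended)

candidates = ['example.com', 'sub.example.com', 'notexample.com', 'fine.social']
allowed    = candidates.reject { |domain| blocked.match?(domain) }
```

Compiling the pattern once outside the work unit keeps the per-domain check to a single in-memory match, which is the performance point the commit message makes.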
-rw-r--r-- lib/mastodon/domains_cli.rb 18
1 file changed, 13 insertions, 5 deletions
diff --git a/lib/mastodon/domains_cli.rb b/lib/mastodon/domains_cli.rb
index f30062363..17cafd1bc 100644
--- a/lib/mastodon/domains_cli.rb
+++ b/lib/mastodon/domains_cli.rb
@@ -58,6 +58,7 @@ module Mastodon
     option :concurrency, type: :numeric, default: 50, aliases: [:c]
     option :silent, type: :boolean, default: false, aliases: [:s]
     option :format, type: :string, default: 'summary', aliases: [:f]
+    option :exclude_suspended, type: :boolean, default: false, aliases: [:x]
     desc 'crawl [START]', 'Crawl all known peers, optionally beginning at START'
     long_desc <<-LONG_DESC
       Crawl the fediverse by using the Mastodon REST API endpoints that expose
@@ -74,18 +75,25 @@ module Mastodon
       default (`summary`), a summary of the statistics is returned. The other options
       are `domains`, which returns a newline-delimited list of all discovered peers,
       and `json`, which dumps all the aggregated data raw.
+
+      The --exclude-suspended (-x) option means that domains that are suspended
+      instance-wide do not appear in the output and are not included in summaries.
+      This also excludes subdomains of any of those domains.
     LONG_DESC
     def crawl(start = nil)
-      stats     = Concurrent::Hash.new
-      processed = Concurrent::AtomicFixnum.new(0)
-      failed    = Concurrent::AtomicFixnum.new(0)
-      start_at  = Time.now.to_f
-      seed      = start ? [start] : Account.remote.domains
+      stats           = Concurrent::Hash.new
+      processed       = Concurrent::AtomicFixnum.new(0)
+      failed          = Concurrent::AtomicFixnum.new(0)
+      start_at        = Time.now.to_f
+      seed            = start ? [start] : Account.remote.domains
+      blocked_domains = Regexp.new('\\.?' + DomainBlock.where(severity: 1).pluck(:domain).join('|') + '$')
 
       pool = Concurrent::ThreadPoolExecutor.new(min_threads: 0, max_threads: options[:concurrency], idletime: 10, auto_terminate: true, max_queue: 0)
 
       work_unit = ->(domain) do
         next if stats.key?(domain)
+        next if options[:exclude_suspended] && domain.match(blocked_domains)
+
         stats[domain] = nil
         processed.increment