| author | Darius Kazemi <darius.kazemi@gmail.com> | 2019-08-03 10:11:09 -0700 |
|---|---|---|
| committer | multiple creatures <dev@multiple-creature.party> | 2020-02-21 01:25:39 -0600 |
| commit | e6e69f091e3414b29271040926cc1d2e7c5f0e41 | |
| tree | cbd46c9e52eb68bd4eb341fdc55bd265a7890212 | |
| parent | 855de12844c9367e0754acd8f32bcc42996f419c | |
Add option to exclude suspended domains/subdomains from tootctl domains crawl (#11454)
* Add "--exclude-suspended" to tootctl domains crawl This new option ignores any instances suspended server-wide as well as their associated subdomains. This queries all domain blocks up front, then runs a regexp on each domain. This improves performance over what may be the obvious implementation, which is to ask `DomainBlocks.blocked?(domain)` for each domain -- this hits the DB many times, slowing things down considerably. * cleaning up code style * Compiling regex * Removing ternary operator
| -rw-r--r-- | lib/mastodon/domains_cli.rb | 18 |

1 file changed, 13 insertions, 5 deletions
```diff
diff --git a/lib/mastodon/domains_cli.rb b/lib/mastodon/domains_cli.rb
index f30062363..17cafd1bc 100644
--- a/lib/mastodon/domains_cli.rb
+++ b/lib/mastodon/domains_cli.rb
@@ -58,6 +58,7 @@ module Mastodon
     option :concurrency, type: :numeric, default: 50, aliases: [:c]
     option :silent, type: :boolean, default: false, aliases: [:s]
     option :format, type: :string, default: 'summary', aliases: [:f]
+    option :exclude_suspended, type: :boolean, default: false, aliases: [:x]
     desc 'crawl [START]', 'Crawl all known peers, optionally beginning at START'
     long_desc <<-LONG_DESC
       Crawl the fediverse by using the Mastodon REST API endpoints that expose
@@ -74,18 +75,25 @@ module Mastodon
       default (`summary`), a summary of the statistics is returned. The
       other options are `domains`, which returns a newline-delimited list
       of all discovered peers, and `json`, which dumps all the aggregated
       data raw.
+
+      The --exclude-suspended (-x) option means that domains that are suspended
+      instance-wide do not appear in the output and are not included in summaries.
+      This also excludes subdomains of any of those domains.
     LONG_DESC
     def crawl(start = nil)
-      stats     = Concurrent::Hash.new
-      processed = Concurrent::AtomicFixnum.new(0)
-      failed    = Concurrent::AtomicFixnum.new(0)
-      start_at  = Time.now.to_f
-      seed      = start ? [start] : Account.remote.domains
+      stats           = Concurrent::Hash.new
+      processed       = Concurrent::AtomicFixnum.new(0)
+      failed          = Concurrent::AtomicFixnum.new(0)
+      start_at        = Time.now.to_f
+      seed            = start ? [start] : Account.remote.domains
+      blocked_domains = Regexp.new('\\.?' + DomainBlock.where(severity: 1).pluck(:domain).join('|') + '$')
 
       pool = Concurrent::ThreadPoolExecutor.new(min_threads: 0, max_threads: options[:concurrency], idletime: 10, auto_terminate: true, max_queue: 0)
 
       work_unit = ->(domain) do
         next if stats.key?(domain)
+        next if options[:exclude_suspended] && domain.match(blocked_domains)
+
         stats[domain] = nil
         processed.increment
```
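For context, the following is a trimmed, standalone sketch of the crawl-loop structure this hunk modifies, with the HTTP crawling omitted: the seed domains are dummies, `exclude_suspended` stands in for `options[:exclude_suspended]`, and a single hard-coded entry stands in for the DomainBlock query. It only needs the concurrent-ruby gem.

```ruby
# Standalone sketch of the structure in the diff above (crawling itself omitted).
# Hypothetical stand-ins: the seed list, exclude_suspended, and blocked_domains.
require 'concurrent'

stats             = Concurrent::Hash.new
processed         = Concurrent::AtomicFixnum.new(0)
exclude_suspended = true  # stands in for options[:exclude_suspended]
blocked_domains   = Regexp.new('\\.?' + ['blocked.example'].join('|') + '$')

pool = Concurrent::ThreadPoolExecutor.new(min_threads: 0, max_threads: 4, idletime: 10, auto_terminate: true, max_queue: 0)

work_unit = ->(domain) do
  next if stats.key?(domain)                                  # already visited
  next if exclude_suspended && domain.match(blocked_domains)  # the new guard

  stats[domain] = nil  # reserve the slot; the real command fetches stats here
  processed.increment
end

seed = %w(alpha.example blocked.example sub.blocked.example beta.example)
seed.each { |domain| pool.post(domain, &work_unit) }

pool.shutdown
pool.wait_for_termination(10)

puts "processed #{processed.value} of #{seed.size} seed domains"
# => processed 2 of 4 seed domains
```

With the patch applied, the same filtering is enabled from the command line with `tootctl domains crawl --exclude-suspended` (or `-x`).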