From edc506770093fb5745ab7db3c17379f1b8d5596a Mon Sep 17 00:00:00 2001 From: The Fox in the Shell Date: Sat, 15 Apr 2017 04:09:28 +0000 Subject: Avoid unrestricted access to the Mastodon DB (#5) * Use a limited, read-only view of public toots This is less scary than unrestricted access to the production database... * README: Update install process * README: Update SQL query * README: Update paragraph on direct DB access --- index.js | 10 +++---- install.sql | 25 ++++++++++++++++++ readme.md | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------- 3 files changed, 103 insertions(+), 19 deletions(-) create mode 100644 install.sql diff --git a/index.js b/index.js index 62f0ed9..8657372 100644 --- a/index.js +++ b/index.js @@ -2,20 +2,18 @@ var mastodon = require('mastodon'); var pg = require('pg'); var query = `SELECT id -FROM statuses +FROM public_toots WHERE favourites_count > ( SELECT avg(favourites_count) - FROM statuses + FROM public_toots WHERE favourites_count > 1 AND created_at > NOW() - INTERVAL '30 days' - AND visibility = 0 ) -AND created_at > NOW() - INTERVAL '5 days' -AND visibility = 0;` +AND created_at > NOW() - INTERVAL '5 days';` var config = { - user: process.env.DB_USER || 'mastodon', + user: process.env.DB_USER || 'ambassador', database: process.env.DB_NAME || 'mastodon_production', password: process.env.DB_PASSWORD || '', host: process.env.DB_HOST || '/var/run/postgresql', diff --git a/install.sql b/install.sql new file mode 100644 index 0000000..5fe7c82 --- /dev/null +++ b/install.sql @@ -0,0 +1,25 @@ +-- -*- mode: sql; sql-product: postgres -*- + +-- Create a login role for ambassador +CREATE USER ambassador; + +-- Use this if your deployment uses passwords rather than peer authentication: +-- ALTER ROLE mastodon_ambassador WITH PASSWORD 'something secret, hopefully'; +-- +-- Note that PostgreSQL supports setting “encrypted” (hashed) passwords, +-- which is a better option if the password must be stored in some configuration +-- management tool. + + +-- Now, create the view that ambassador actually uses +CREATE VIEW public_toots AS + SELECT * + FROM statuses + WHERE visibility = 0 +; + +-- Make sure the role doesn't have access to anything undesireable +REVOKE ALL FROM ambassador; + +-- Let ambassador select from the view +GRANT SELECT ON public_toots TO ambassador; diff --git a/readme.md b/readme.md index 39dc27e..88f883d 100644 --- a/readme.md +++ b/readme.md @@ -11,45 +11,96 @@ But there's no reason some instances couldn't have human ambassadors! Or couldn' This is my attempt to start that conversation, and I'd very much love all of your feedback! :) + ## Installation + +This bot has to be installed on your instance server, so unless you're the admin you're not going to be able to set it up yourself. The reason for that is that the bot reads directly from your database, rather than using the API. + + +### Mastodon account for the ambassador bot + First, you'll need to create a new account on your instance and use [the @tinysubversions extractor](http://tinysubversions.com/notes/mastodon-bot/) to get an OAuth token for it. -This bot has to be installed on your instance server, so unless you're the admin you're not going to be able to set it up yourself. The reason for that is that the bot reads directly from your database, rather than using the API. It requires the following environment variables (and uses the provided defaults when they're missing): +### Creating a UNIX user for ambassador + +Running ambassador as the same user as Mastodon (or worse, `root`) is +problematic, security-wise. Thankfully, creating a new user for it is easy: + + # sudo adduser ambassador + # sudo -u ambassador -s + [You should now be logged in as ambassador] + $ cd + +While you are logged-in as ambassador, you can clone the source repository for it: + + git clone git@github.com:mbilokonsky/ambassador + + +Note: Ideally, ambassador should not get R/W access to its own sources. + However, best practices for deploying software on servers falls outside + the scope of this README. + + +### Creating a database user for ambassador + +To avoid a whole host of security issues, `ambassador` only gets (read-only) +access to a *view* of the database that only contains public toots. + +So you have to start by creating the view and the PostgreSQL user for +`ambassador`: + + # pgsql -f ~ambassador/ambassador/install.sql + + +Note that the default setup assumes that: +- the UNIX and PostgreSQL users are both called `ambassador`; +- PostgreSQL allows [peer authentication]. + +If that's not the case, you can always use password authentication instead. + +[peer authentication]: https://www.postgresql.org/docs/9.6/static/auth-methods.html#AUTH-PEER + + +### Running ambassador + +It requires the following environment variables +(and uses the provided defaults when they're missing): ``` DB_HOST (defaults to '/var/run/postgresql') - DB_NAME (defaults to 'mastodon\_production') - DB_USER (defaults to 'mastodon') + DB_NAME (defaults to 'mastodon_production') + DB_USER (defaults to 'ambassador') DB_PASSWORD (defaults to '') INSTANCE_HOST (no default, host of your instance) AMBASSADOR_TOKEN (no default) ``` -To install it, set your environment variables and do the following: +As user `ambassador`, set the environment variables and do the following: + + cd ~/ambassador + yarn && yarn start - git clone git@github.com:mbilokonsky/ambassador - cd ambassador - yarn && yarn start It'll cycle every 15 minutes, boosting new toots that have crossed the threshold. It keeps track, in memory, of which toots have already been boosted - that way it won't spam the server trying to boost them again and again. This is a very naive cache and technically a memory leak, so I'll fix that soon, but for now it's fine (and pm2 should gracefully restart in the event of a crash). + ## How does it determine what's good enough to boost? + So, this is still sort of an open question but right now I'm using the following query: ``` SELECT id -FROM statuses +FROM public_toots WHERE favourites_count > ( SELECT avg(favourites_count) - FROM statuses + FROM public_toots WHERE favourites_count > 1 AND created_at > NOW() - INTERVAL '30 days' - AND visibility = 0 ) -AND created_at > NOW() - INTERVAL '5 days' -AND visibility = 0; +AND created_at > NOW() - INTERVAL '5 days'; ``` + So we do two things here: 1. Compute our fav threshold. Grab all public toots that have received more than 1 fav over the past 30 days. Average the fav counts for those toots. This is our threshold. @@ -57,12 +108,22 @@ So we do two things here: Goal here is that this sets a pretty high bar (favs over 30 days) and applies it to only the past 5 days. It's an aggressive filter, but it's also a sliding window. If you have a bunch of super popular toots on your instance, they'll skew the curve - but only for a month or so, and this will be normalized if you have a lot of activity. Generally things will even out over time. +I am not, btw, a database expert - I pieced this query together through trial-and-error and if you want to propose an optimization I am all ears. + + ## Seriously? You want me to give this thing access to my production database? -Look, I get it - but how else do you want me to find your top toot in a performant way? I'm not passing any user input into the database, just repeating a static query. I am not, btw, a database expert - I pieced this query together through trial-and-error and if you want to propose an optimization I am all ears. + +Ambassador only gets access to public toots (through the `public_toots` view +created in `install.sql`). + +The direct database access is required to find top toots efficiently. + ## What's next? Can I help? + I'd love it if I could get some eyes on this - am I SQLing right? How do y'all feel about that threshold function? Are there security issues here? + ## Where is it running? 1. A Weirder Earth! -- cgit