
[EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks.
Open, Needs Triage, Public

Description

Rationale

While requestctl has been an integral part of our toolkit for responding to large-scale abuse, extended experience has made it clear that the process of creating a new requestctl action, while generally safe from error, is neither intuitive nor fast for most SREs.

With this work, we aim to make requestctl easier to use, more powerful, and simpler to modify/release.

User stories

We have two categories of user stories: some improve the functionality of requestctl to make it a better tool to use under duress, while others are more strictly related to its UX.

  1. As an SRE, I want to be able to change a feature in conftool/requestctl and have it released automatically to our Debian repository
  2. As an SRE, I want to be able to block requests for cached objects, not just uncached ones, during an attack
  3. As an SRE, I want to be able to inject requestctl rules in the TLS termination layer at the edge (haproxy) to limit bandwidth usage and/or concurrency for specific users or request patterns
  4. As an SRE/Data engineer/developer, I want to have an easy way to visualize all requestctl rules currently active, the ones in log-only mode, and the ones that can be activated. I also want to be able to inspect the current or potential impact in turnilo/superset using a simple link.
  5. As an SRE, I want to be able to add a new rule following a simple guided procedure, securely and with confidence, in under 5 minutes
  6. As an SRE, I want to be able to add rules systematically from traffic filters I’ve created while analyzing traffic on superset / from a data pipeline
  7. Conversely, as an SRE, I want to be able to quickly check on superset what impact a rule I'm adding would have
  8. As an SRE, I want all the actions taken in requestctl to be auditable in the future.

What direction are we moving towards

For the engine improvements, we will need to change the integration of requestctl into varnish to allow acting on cache hits (T317794), and add the ability to translate a rule from the abstract requestctl DSL to different software, such as haproxy but also turnilo/superset.
We will create an HTTP API for requestctl and probably a web interface as well. This means we'll move away from gitops for interacting with requestctl, so we'll need a different way to sync documents from the CLI (which should remain operational under any circumstance). We will use this API to allow creating and managing rules from external sources.
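
As a rough illustration of translating one abstract rule into several backends, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Match`/`And`/`Or` model, the attribute names, and the field mappings are hypothetical and are not the actual requestctl DSL or its implementation. It renders the same expression as a VCL-style condition and as a SQL-style filter of the kind one might paste into superset; a haproxy renderer would presumably follow the same pattern.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical rule model: none of these names come from requestctl itself.


@dataclass(frozen=True)
class Match:
    """A single regex match on an abstract request attribute."""
    attribute: str  # e.g. "user_agent" or "url" (illustrative names)
    regex: str

    def __post_init__(self):
        # Quoting rules differ per backend, so this sketch simply
        # rejects patterns that contain quote characters.
        if '"' in self.regex or "'" in self.regex:
            raise ValueError("quotes are not supported in this sketch")


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Match, And, Or]

# Illustrative mappings from abstract attributes to backend field names.
VCL_FIELDS = {"user_agent": "req.http.User-Agent", "url": "req.url"}
SQL_FIELDS = {"user_agent": "user_agent", "url": "uri_path"}  # assumed columns


def to_vcl(expr: Expr) -> str:
    """Render the expression as a VCL-style boolean condition."""
    if isinstance(expr, Match):
        return f'{VCL_FIELDS[expr.attribute]} ~ "{expr.regex}"'
    op = "&&" if isinstance(expr, And) else "||"
    return f"({to_vcl(expr.left)} {op} {to_vcl(expr.right)})"


def to_sql(expr: Expr) -> str:
    """Render the expression as a SQL-style filter (regexp_like assumed)."""
    if isinstance(expr, Match):
        return f"regexp_like({SQL_FIELDS[expr.attribute]}, '{expr.regex}')"
    op = "AND" if isinstance(expr, And) else "OR"
    return f"({to_sql(expr.left)} {op} {to_sql(expr.right)})"


# One abstract rule, two renderings.
rule = And(Match("user_agent", "^curl/"), Match("url", r"^/w/api\.php"))
print(to_vcl(rule))
print(to_sql(rule))
```

Keeping the rule as a small tree and giving each backend its own renderer means a new target only needs a new function, while the rule definition itself stays unchanged.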

Event Timeline

Change #1054082 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] varnish: add support for hit rules

https://gerrit.wikimedia.org/r/1054082

Change #1054083 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] varnish: actually include the requestctl hit rules

https://gerrit.wikimedia.org/r/1054083

Change #1054082 merged by Giuseppe Lavagetto:

[operations/puppet@production] varnish: add support for hit rules

https://gerrit.wikimedia.org/r/1054082

Change #1054083 merged by Giuseppe Lavagetto:

[operations/puppet@production] varnish: actually include the requestctl hit rules

https://gerrit.wikimedia.org/r/1054083