Simple distributed rate limiter with Elixir
While developing Elixir web apps you’ll find there are some activities or requests that can be way more expensive than others, something like calling an external API, sending a password reset email, a CPU intensive task or any other that fits your business model.
This is where a rate limiter comes handy, most web apps have a centralized rate limiter using technologies like Redis, Memcached or some algorithms such as Token Bucket or Generic cell rate algorithm, but we’re using Elixir and with it we have different posibilities:
Distributed Elixir
Let’s make it clear, stateful distributed systems are hard to implement and debug, even with the BEAM, which give us a set of tools to make it way easier. Since Elixir runs on top of it we can take advantage of these and try to get some distributed functionality without the need of external technologies or different languages.
Currently there are several hex packages that make Elixir nodes clustering eaiser, for our implementation we’ll use libcluster, which support different strategies like empd (Erlang Port Mapper Daemon), Multicast UDP gossip or Kubernetes.
Once we solve our clustering strategy we’ll need a way to run a distributed rate limiter, there are some hex packages out there such as Hammer, which can only handle a local data storage or use Redis, neither is what we’re looking for so we’ll create our own solution. As the title says, we’ll keep it simple.
Helper Libraries
Having a cluster of nodes conected is just part of running our distributed system, but what about net split healing? dynamic cluster size? supervised distributed processes? Fortunately there are some hex packages to help with this:
Swarm, which is probably the most famous in the Elixir community, and the newcomer Horde which is inspired by Swarm but makes it easier to maintain a separation between a global supervisor from a global registry and is based on delta-CRDTs (conflict-free replicated data types).
For our implementation we’ll use Horde since it makes it easier (in my opinion) to know what’s going on, the drawback is that it’s less mature than Swarm, though it looks very promising.
The implementation
Let’s get started, assuming you have Elixir > 1.5 already installed create a new application:
Open the project folder on your favorite text editor and add our dependencies:
Install them with mix deps.get
and we’re all set to start our
implementation.
Lets start by configuring libcluster for our development environment (for production and test environments refer to the package documentation), for simplicity we’ll use the epmd strategy and nodes with fixed names:
Since we’re using Elixir > 1.5, I’ll separate my main supervisor from my application file, let’s update it:
And now lets create the supervisor.ex
file:
A lot is going on here, first we load the topologies configuration
from our config file if one exists, if found we’ll start our
cluster supervisor (part of libcluster) with the name of:
Limiter.ClusterSupervisor
.
Then we’ll start our global supervisor and our global registry (part of Horde),
with names Limiter.GlobalSupervisor
and Limiter.GlobalRegistry
respectively.
Finally we’ll start a Task
that calls two function from the
Limiter.HordeConnector
module, which we’ll create next.
After we have the supervisor’s children set we can call the init
function.
Now let’s create the horde_connector.ex
file:
Here we declare two functions, the first one connect/0
connects every GlobalSupervisor
and GlobalRegistry
when the app starts, with the ones on other nodes of the cluster.
For more info on this, check Horde’s documentation.
The second one start_children/0
is where we start the processes we want our
GlobalSupervisor
to supervise, in this case just our Limiter.RateLimiter
module, which will be a GenServer:
As you can see, when calling the start_link/1
function, we register our
GenServer with a name returned by the via_tuple/0
private function, this is
how Horde registers the process in our GlobalRegistry
.
In the module attributes we can see that there’s a limit of 2 every 60 seconds.
Our storage happens on an ETS table with name :rate_limiter
which is
cleared after the given number of seconds. All handled by GenServer’s messaging,
nothing new here.
The main functionality is provided by the log/1
function, which receives a
unique key to identify who is logging to the table, you can use a user’s ID or
an IP address, etc. for our implementation we’ll use the IP address.
In this function, first we need to know the PID of our RateLimiter
process,
which owns the ETS table, for this we can use the lookup/2
function
provided by Horde.
If a PID is returned we can make an RPC (Remote Procedure Call) after knowing which node from our cluster owns the process.
The get_log/1
function will be called on the node that owns the GenServer and
ETS, returning :ok
or {:error, :rate_limited}
if we try to log 3 or more
times in 60 seconds.
Let’s try our app now, on 3 different shell sessions run:
This will start and connect our 3 nodes with each other. You can use Erlang’s
observer :observer.start()
to see there’s only one Limiter.RateLimiter
process running across
the cluster, if you kill or gracefully terminate the owner node, you’ll se
another node will immediately start a new one thanks to our GlobalSupervisor
.
The next figure shows the functionality we can expect: we successfully call the
log/1
function and the third time we get an error, after the table is cleared
we can try again, receiving the same behavior.
Notice we don’t have to worry where our rate limiter was started, we can call the
log/1
function from any node without caring where in the cluster the ETS lives.
We now have a simple rate limiter working. If we use Phoenix we can limit some actions like this:
And we’re done with our simple rate limiter example, a working app can be found in my Github.
For a better implementation of a pure Elixir rate limiter, check out my library Limitex.
Conclusion
This experiment shows us that Elixir, most times, can help us to get solutions without requiring external dependencies or writing an implementation on another language.
This rate limiter is obviously very basic and has its caveats, for example if there’s a topology change, Horde can termiante and restart our GenServer and ETS on another node, losing its state. For basic functionality where you can afford to lose state from time to time, this solution can work. However if you can’t afford this, maybe Redis or another technology is what you’re looking for.
Also keep in mind that having our rate limiter on a remote node will add latency to our request. As always pros and cons should be evaluated according to our business model.
If you have any observation or something to add please don’t hesitate and leave a comment below.