May 13, 2016

Creating a supervision tree for Elixir GenEvent behavior

One of the projects I am working on right now is based on Event Sourcing and CQRS. Neither concept is the focus of this post, but some context helps in understanding what follows.

One of the components of the application is an event bus or event manager, and for that, I am using the GenEvent behavior.

Some context

In summary, the application works as follows:

  • the interface sends a command to the event manager;
  • a command handler picks up the command and sends a message to the model;
  • the model validates and persists an event;
  • the event is published in the event manager;
  • an event handler picks up the event and updates a projection.

So, basically we have one Event Manager and two Handlers, one for the commands and another one for the events. I will not go into the details of how GenEvent works in this post, but the Elixir GenEvent documentation is an excellent place to go if you have questions about it.

Assuming you know how GenEvent works in general, the following situations need to be considered in terms of fault tolerance:

  1. the Event Manager can fail and should restart;
  2. when the Event Manager fails and restarts, the Handlers need to be added back;
  3. when a Handler fails, it needs to be added back to the Event Manager.

Reacting to Event Manager failures

The most important part of the Let it Crash philosophy is to define, in a deterministic way, what happens when something fails, because nothing will recover by magic.

Let's use a Supervisor in our application to supervise the Event Manager (and later, the handlers as well). When my application starts, it calls start_link/0 in MyApp.EventSupervisor:

defmodule MyApp do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      supervisor(MyApp.EventSupervisor, [])
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

end

Here is my EventSupervisor module that will call start_link/0 in MyApp.EventManager:

defmodule MyApp.EventSupervisor do
  use Supervisor

  @server __MODULE__

  def start_link do
    Supervisor.start_link(@server, :ok, [name: @server])
  end

  def init(:ok) do
    children = [
      worker(MyApp.EventManager, [])
    ]

    supervise(children, strategy: :one_for_one)
  end

end

My Event Manager simply starts itself:

defmodule MyApp.EventManager do

  @server __MODULE__

  def start_link do
    GenEvent.start_link(name: @server)
  end

  # code omitted 

end

If MyApp.EventManager crashes for any reason, MyApp.EventSupervisor is notified and restarts it according to its :one_for_one strategy.
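To see the restart happen, here is a minimal sketch that can be run as a script. It is not the application's code: a named Agent (`:demo_manager`, a hypothetical name) stands in for the event manager, and the supervisor is started with a plain child specification map instead of Supervisor.Spec.

```elixir
# Hypothetical stand-in for MyApp.EventManager: a named Agent.
children = [
  %{
    id: :demo_manager,
    start: {Agent, :start_link, [fn -> :ok end, [name: :demo_manager]]}
  }
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

old_pid = Process.whereis(:demo_manager)

# Simulate a crash of the manager.
Process.exit(old_pid, :kill)

# Poll until the supervisor has registered a fresh process under the same name.
wait_for_restart = fn wait_for_restart ->
  case Process.whereis(:demo_manager) do
    pid when is_pid(pid) and pid != old_pid ->
      pid

    _ ->
      Process.sleep(10)
      wait_for_restart.(wait_for_restart)
  end
end

new_pid = wait_for_restart.(wait_for_restart)

IO.inspect({old_pid, new_pid}, label: "manager pid before and after the crash")
```

The name stays the same while the pid changes, which is why the rest of the application refers to the manager by its registered name.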

Dealing with Handler failures

Because the handlers can crash on their own but also depend on the health of the event manager, supervising them is more complex.

Let's add a third component, which we can call a Watcher. The watcher is responsible for adding the handler back to the manager when the handler fails, and it crashes itself when the event manager crashes.

The handler watcher is a simple GenServer that monitors the event manager process and knows how to add the handler to the manager. As my application has two handlers, I have one watcher for each. Below is an example of one of them:

defmodule MyApp.CommandHandlerWatcher do
  use GenServer

  @server __MODULE__

  def start_link(event_manager) do
    GenServer.start_link(@server, event_manager, [name: @server])
  end

  def init(event_manager) do
    Process.monitor(event_manager)
    start_handler(event_manager)
  end

  # Stops this watcher when the Event Manager goes down.
  # The second argument of handle_info/2 is the GenServer state.
  def handle_info({:DOWN, _ref, :process, {MyApp.EventManager, _node}, _reason}, _state) do
    {:stop, "EventManager down.", []}
  end

  # Handles EXIT messages from the GenEvent handler and adds the handler back.
  def handle_info({:gen_event_EXIT, _handler, _reason}, event_manager) do
    {:ok, event_manager} = start_handler(event_manager)
    {:noreply, event_manager}
  end

  defp start_handler(event_manager) do
    case GenEvent.add_mon_handler(event_manager, MyApp.CommandHandler, []) do
      :ok -> {:ok, event_manager}
      {:error, reason} -> {:stop, reason}
    end
  end
end
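The :DOWN clause in the watcher matches on {MyApp.EventManager, _node} rather than on a pid. That works because the watcher monitors the manager by its registered name. A small sketch of the message shape, using a hypothetical `:demo_monitored` Agent in place of the manager:

```elixir
# Hypothetical registered process standing in for the event manager.
# Agent.start/2 (not start_link) is used so the crash does not reach this script.
{:ok, pid} = Agent.start(fn -> :ok end, name: :demo_monitored)

# Monitoring by name instead of by pid changes the fourth element of :DOWN.
ref = Process.monitor(:demo_monitored)

# Simulate a crash.
Process.exit(pid, :kill)

down =
  receive do
    {:DOWN, ^ref, :process, object, reason} -> {object, reason}
  after
    1_000 -> :timeout
  end

IO.inspect(down, label: "object and reason from the :DOWN message")
```

When monitoring by name, the object in the message is a {name, node} tuple, so a watcher started with a pid instead of a name would need a different pattern.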

The last missing piece is to supervise the watchers so that they are restarted, since by design they crash when the event manager crashes. Let's add two more workers to our current EventSupervisor:

defmodule MyApp.EventSupervisor do

  alias MyApp.{EventManager, CommandHandlerWatcher, EventHandlerWatcher}

  # code omitted

  def init(:ok) do
    children = [
      worker(EventManager, []),
      worker(CommandHandlerWatcher, [EventManager]),
      worker(EventHandlerWatcher, [EventManager])
    ]

    supervise(children, strategy: :one_for_one)
  end

end

It is important to notice that the watchers add the handlers using GenEvent.add_mon_handler/3 and monitor the event manager with Process.monitor/1. Because of these two calls, a watcher receives a {:gen_event_EXIT, handler, reason} message when its handler is removed and a :DOWN message when the manager dies. The handle_info/2 clauses pattern match on each of these messages.
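The handler side of this can be tried in isolation. The sketch below assumes an Elixir version where GenEvent is still available, and uses a hypothetical `DemoHandler` that raises on a `:boom` event. The script process plays the role of the watcher.

```elixir
# Hypothetical handler that crashes on the :boom event.
defmodule DemoHandler do
  use GenEvent

  def handle_event(:boom, _state), do: raise("handler crashed")
  def handle_event(_event, state), do: {:ok, state}
end

{:ok, manager} = GenEvent.start_link()

# The calling process becomes the "watcher" of this handler.
:ok = GenEvent.add_mon_handler(manager, DemoHandler, [])

# Make the handler crash; the manager itself keeps running.
:ok = GenEvent.sync_notify(manager, :boom)

# The manager removes the crashed handler and notifies the process that added it.
got_exit =
  receive do
    {:gen_event_EXIT, DemoHandler, _reason} -> true
  after
    1_000 -> false
  end

# What the watcher does in handle_info/2: add the handler back.
:ok = GenEvent.add_mon_handler(manager, DemoHandler, [])
handlers = GenEvent.which_handlers(manager)

IO.inspect({got_exit, handlers}, label: "exit received and handlers after re-adding")
```

A crashing handler does not take the manager down with it. The manager only removes the handler, so nothing is restarted automatically unless something like the watcher reacts to the exit message.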