November 1, 2019

The Changeset API Pattern

The focus is on enhancing data integrity in your application by making the relationship between your API and your schema changesets explicit.

Over time, as you gain experience with software development, you start noticing paths that lead to much smoother sailing. These are called design patterns: formalized best practices that can be used to solve common problems when implementing a system.

One pattern I have been having great success with while working on web applications in Elixir is what I am calling, for lack of a better name, the Changeset API Pattern.

Before I get to the pattern itself, I'd like to outline what I consider the motivation behind it: data integrity.

Data integrity is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire life-cycle, and is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data.
The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confused with data security, the discipline of protecting data from unauthorized parties.

Overall Goal

Facilitate data integrity's main goal: ensuring that data is recorded exactly as intended.

The Changeset API Pattern is not solely responsible for achieving this goal. However, used in conjunction with data modeling best practices such as column types, constraints, and default values, the pattern becomes an important application layer on top of an already established data layer, improving overall data integrity.

Database Data Integrity

As mentioned above, having good database specifications facilitates data integrity. In Elixir, this is commonly achieved through Ecto, the de facto library for interacting with application data stores, via its migration DSL:

defmodule Core.Repo.Migrations.CreateUsersTable do
  use Ecto.Migration

  def change do
    # Matches the binary_id primary key declared in the schema below
    create table(:users, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :company_id, references(:companies, type: :binary_id), null: false
      add :first_name, :string, null: false
      add :last_name, :string, null: false
      add :email, :string, null: false
      add :age, :integer, null: false
      timestamps()
    end

    create index(:users, [:email], unique: true)
    create constraint(:users, :age_must_be_positive, check: "age > 0")
  end
end

In the migration above, we are specifying:

  • column data types;
  • columns can't have null values;
  • id is a binary primary key;
  • company_id is a foreign key;
  • email column is unique;
  • age has to be greater than zero.

Depending on your datastore and column types, you can apply a variety of constraints to fulfill your needs. Ideally, the specifications defined in the migration should align with your Ecto schema and generic changeset:

defmodule Core.User do
  use Ecto.Schema
  import Ecto.Changeset
  
  alias Core.Company

  @primary_key {:id, :binary_id, autogenerate: true}
  @timestamps_opts [type: :utc_datetime]
  schema "users" do
    belongs_to(:company, Company, type: :binary_id)
    field(:first_name, :string)
    field(:last_name, :string)
    field(:email, :string)
    field(:age, :integer)
    timestamps()
  end
  
  @required_fields ~w(company_id first_name last_name email age)a
  
  def changeset(struct, params) do
    struct
    |> cast(params, @required_fields)
    |> validate_required(@required_fields)
    |> validate_number(:age, greater_than: 0)
    |> unique_constraint(:email)
    |> assoc_constraint(:company)
  end
end

These should be considered your main gate for data integrity, since they ensure data will only be stored if all checks pass. From there, you can build other layers on top, such as the Changeset API Pattern.

The Changeset API Pattern

Once you have a good foundation, it is time to tackle data integrity in your application's API scenarios. While a generic changeset like the one above is sufficient to ensure that data matches what is defined in the database in a general sense (all inserts and all updates), not all changes are equal from the application's standpoint.

The Problem

For example, let's assume that besides the existing columns in the users table above, we also have a column called hashed_password for user authentication. Our API has the following endpoints that modify data:

  • Register User;
  • Update User Profile;
  • Change User Password.

A generic changeset in our schema will allow all three of these operations to happen as desired; however, it opens up data integrity concerns for the two update operations:

  • While updating my first name as part of the Update User Profile flow, I can also change my password;
  • While changing my password as part of the Change User Password flow, I can also update my age.

As long as the fields conform to the generic changeset validations, these unexpected changes will be allowed. You can remedy this by filtering parameters in your API or controller, but that becomes brittle as your application evolves. Moreover, the Ecto.Schema and Ecto.Changeset modules provide plenty of functions for field validation, casting, and database constraint checks; not leveraging them would require lots of duplicated functionality.
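
For illustration, a rough sketch of the kind of controller-level filtering this pattern replaces; the module and field names here are hypothetical:

defmodule Web.UserController do
  # Every endpoint has to maintain its own whitelist, and these lists
  # silently drift away from the schema as the application evolves.
  @profile_fields ["first_name", "last_name", "email", "age"]

  def update_profile(conn, params) do
    filtered_params = Map.take(params, @profile_fields)
    # ... build the generic changeset from filtered_params, persist it,
    # and render a response with conn ...
  end
end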

The Solution

The Changeset API Pattern states that:

For each API operation that modifies data, a specific Ecto Changeset is implemented, making explicit the desired changes and all validations to be performed.

Instead of a generic changeset, we will implement three changesets, each with a very clear combination of casting, validation, and database constraint checks.

Register User Changeset

defmodule Core.User do
  # Code removed

  schema "users" do
    # Code removed
    field(:hashed_password, :string)
    # Code removed
  end

  @register_fields ~w(company_id first_name last_name email age hashed_password)a

  def register_changeset(struct, params) do
    struct
    |> cast(params, @register_fields)
    |> validate_required(@register_fields)
    |> validate_number(:age, greater_than: 0)
    |> unique_constraint(:email)
    |> assoc_constraint(:company)
  end
end

Update User Profile Changeset

defmodule Core.User do
  # Code removed

  @update_profile_fields ~w(first_name last_name email age)a

  def update_profile_changeset(struct, params) do
    struct
    |> cast(params, @update_profile_fields)
    |> validate_required(@update_profile_fields)
    |> validate_number(:age, greater_than: 0)
    |> unique_constraint(:email)
  end
  
  # Code removed
end

Change User Password Changeset

defmodule Core.User do
  # Code removed

  @change_password_fields ~w(hashed_password)a

  def change_password_changeset(struct, params) do
    struct
    |> cast(params, @change_password_fields)
    |> validate_required(@change_password_fields)
  end
end

In your API functions, even if extra data comes in, you are safe: the intent and expected outcome of each operation are defined at the point closest to the data store interaction, in our case the schema module.
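
For instance, a minimal sketch of an API function built on top of one of these changesets; the Core.Accounts context and its function name are illustrative, not prescribed by the pattern:

defmodule Core.Accounts do
  alias Core.{Repo, User}

  def update_profile(%User{} = user, params) do
    # Even if params contains "hashed_password", update_profile_changeset/2
    # only casts the profile fields, so the extra key is simply ignored.
    user
    |> User.update_profile_changeset(params)
    |> Repo.update()
  end
end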

Caveat

One thing I noticed when I started implementing this pattern is that I sometimes did a little more than my initial intent within the changeset functions.

In a few cases, instead of only performing data type casting, validations, and database checks, I was also setting field values. For the sake of illustration (it could be anything along these lines), take a user schema with a verified_at column that is nullable when the user is registered but stores the date and time the user was verified.

The changeset for this operation should only allow the verified_at field to be cast with the proper data type; instead, the current date and time were also being set inside the changeset using Ecto.Changeset.put_change/3.

Instead, the responsibility for setting the value of verified_at should be delegated to the API; that value is then validated in the changeset like any other update.
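
A minimal sketch of that split; verify_changeset/2 and the usage below are hypothetical names, assuming a verified_at field in the schema:

defmodule Core.User do
  # Code removed

  def verify_changeset(struct, params) do
    struct
    |> cast(params, [:verified_at])
    |> validate_required([:verified_at])
  end
end

# In the API function, not in the schema module:
# user
# |> User.verify_changeset(%{verified_at: DateTime.utc_now()})
# |> Repo.update()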

Another common example is hashing the plain-text password (defined as a virtual field) during user registration or password change inside the schema module. The schema should not need to know about password hashing libraries, modules, or functions; that should be delegated to the API functions.
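
A sketch of that delegation, assuming bcrypt_elixir (and its Bcrypt.hash_pwd_salt/1) as the hashing library; any equivalent would do:

defmodule Core.Accounts do
  alias Core.{Repo, User}

  def change_password(%User{} = user, plain_password) do
    # Hashing happens here, in the API function; the schema module only
    # casts and validates the resulting hashed_password.
    params = %{hashed_password: Bcrypt.hash_pwd_salt(plain_password)}

    user
    |> User.change_password_changeset(params)
    |> Repo.update()
  end
end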

There is nothing wrong with Ecto.Changeset.put_change/3; in some cases it makes sense to use it: for values that can't come through the API for any reason, when you need to map the value sent via the API to its datastore representation, or when you need to nullify a field.
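
For instance, a small sketch of such a legitimate use inside the schema module, nullifying a field that never comes through the API (unverify_changeset/1 is a hypothetical name):

def unverify_changeset(struct) do
  struct
  |> change()
  |> put_change(:verified_at, nil)
end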

Advantages

  • pushes data integrity concerns upfront in the development process;
  • protects the schema against unexpected data updates;
  • adds explicitness for allowed data changes and checks to be performed per use-case;
  • complements the commonly present data integrity checks in schema modules with use-cases checks;
  • leverages Ecto.Schema and Ecto.Changeset functions for better data integrity overall;
  • concentrates all data integrity checks in a single place, and the best place: the schema module;
  • simplifies data changes testing per use-case;
  • simplifies input data handling in the API functions or controller actions.

Disadvantages

  • adds extra complexity in the schema modules;
  • can mislead you into handling more than data integrity in the schema modules, as mentioned in the caveat section above.

When the pattern is not needed

Even though this pattern presents itself to me as a great way to achieve better data integrity, there is one scenario in which I find myself skipping it, namely when:

  • the entity (model) is much simpler than usual;
  • the API only provides two types of change (create and a generic update);
  • both create and update require the same data integrity checks.

Conclusion

Data is a very important asset in any software application, and data integrity is a critical component of data quality. So far, this pattern has given me much more reliability and control over the data handled by my applications. Beyond that, it makes me think ahead in the development process about how I structure the data and how the application interacts with it.