Removing Nullables: A Journey to a Cleaner GraphQL Schema
Since I joined PERSUIT in 2022, we’ve embarked on many projects to modernise our codebase. One of the biggest is increasing our adoption of TypeScript. We wanted to generate TypeScript types for our GraphQL schema using graphql-codegen, but we had a problem: our schema is large, and it contains many, many nullable fields, far more than we could manually validate.
The problem with nullable fields
When creating a schema, we want to be as precise as possible. This allows clients to make assumptions that greatly simplify their code. Imagine the following GraphQL schema:
type User {
name: String!
email: String
}
A client receiving a User knows that the user will always have a name, but only some users have an email address. This means the client can safely do this:
function getFirstName(user: User): string {
return user.name.replace(/ .*/, "")
}
To do something similar with the email address, we would first have to check whether it’s null or not. The code below is not safe because it assumes user.email is a string:
function getEmailDomain(user: User): string {
// TypeScript will complain email might not be defined
return user.email.replace(/.*@/, "")
}
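For comparison, a null-safe version might look like the sketch below. It assumes a User type matching the first schema, and returning null for users without an email address is only one possible choice; throwing or returning a default would work too.

```typescript
// Assumed shape for the first schema above: name is required, email is optional.
type User = {
  name: string
  email?: string | null
}

// Returns the part after the "@", or null when the user has no email address.
function getEmailDomain(user: User): string | null {
  if (user.email == null) {
    return null
  }
  return user.email.replace(/.*@/, "")
}
```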
Now imagine our GraphQL schema was:
type User {
name: String
email: String
}
Even though we know that users always have a name, someone consuming our schema wouldn’t, and indeed graphql-codegen will generate the following User type:
type User = {
name?: string
email?: string
}
This will cause TypeScript to complain about our getFirstName() function, because we can no longer assume that name is defined.
If we didn’t want to fix the schema, we could change our getFirstName() function to use the non-null assertion operator like this:
function getFirstName(user: User): string {
return user.name!.replace(/ .*/, "")
}
But when repeated for many fields across the whole application, it becomes a major code smell, and new team members may assume “That’s just how things are done here” and accidentally use that operator on something that really can be null.
Think back to the last time you were poking your way around an unfamiliar codebase. Now imagine that many of the fields were declared nullable even when most of them aren’t. You’d either be wasting time writing unnecessary defensive code or risk shipping bugs to production.
This is the situation we were in - we needed to fix our schema.
Fixing the schema without breaking the app
One of the nice things about GraphQL is that you know the data you get back will match the schema. We use Apollo Server to handle the GraphQL requests, and it checks that our responses match the declared schema. If there are any nulls in fields that have been declared non-nullable, Apollo will reject the response [1]. This means that we’re going to have customer-visible issues if any fields are incorrectly marked as non-nullable. In most cases, these issues will appear as pages that show an error because they can't retrieve the data they need.
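To make the failure mode concrete, here is a toy model of how a null in a non-nullable field travels up through a response. It is not Apollo's implementation: FieldSpec, NonNullViolation, complete and execute are hypothetical names invented for this sketch, and lists are left out for brevity.

```typescript
// Hypothetical, simplified description of a field's type (no lists).
type FieldSpec = {
  nonNull: boolean
  fields?: { [name: string]: FieldSpec } // present for object types
}

// Thrown when a non-nullable field turns out to be null.
class NonNullViolation {
  path: string
  constructor(path: string) {
    this.path = path
  }
}

// A violation travels up to the closest nullable ancestor, which becomes null.
function complete(value: unknown, spec: FieldSpec, path: string, errors: string[]): unknown {
  try {
    if (value === null || value === undefined) {
      if (spec.nonNull) throw new NonNullViolation(path)
      return null
    }
    if (!spec.fields) return value
    const source = value as { [name: string]: unknown }
    const result: { [name: string]: unknown } = {}
    for (const name of Object.keys(spec.fields)) {
      const childPath = path === "" ? name : path + "." + name
      result[name] = complete(source[name], spec.fields[name], childPath, errors)
    }
    return result
  } catch (e) {
    if (e instanceof NonNullViolation && !spec.nonNull) {
      errors.push("null in non-nullable field " + e.path)
      return null
    }
    throw e // this field is non-nullable too, so keep going up
  }
}

// The top-level data entry is always nullable, so it is the last stop.
function execute(data: unknown, rootFields: { [name: string]: FieldSpec }) {
  const errors: string[] = []
  const result = complete(data, { nonNull: false, fields: rootFields }, "", errors)
  return { data: result, errors }
}

const userFields: { [name: string]: FieldSpec } = {
  name: { nonNull: true },
  email: { nonNull: false },
}
// Nullable user field: only that field is nulled out, partial.data is { user: null }.
const partial = execute({ user: { name: null } }, { user: { nonNull: false, fields: userFields } })
// Non-nullable user field: nothing absorbs the null, so empty.data is null.
const empty = execute({ user: { name: null } }, { user: { nonNull: true, fields: userFields } })
```

The second call is the case we care about: the more fields we make non-nullable, the further a single unexpected null travels.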
Finding the nullable fields
To find the fields that are actually nullable, we need to analyse our responses. To do this we created a GraphQL directive to mark the fields we think are not nullable, and an Apollo Server plugin to report any fields that were tagged with this directive that had nulls in them.
Step 1: Create the @proposedNonNullable directive
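Creating a directive in GraphQL is easy. In your schema, declare:
directive @proposedNonNullable on FIELD_DEFINITION | ARGUMENT_DEFINITION | INPUT_FIELD_DEFINITION
You can then tag any field, argument, or input type field with @proposedNonNullable. Queries and mutations are declared as fields on the root Query and Mutation types, so FIELD_DEFINITION covers them as well (the QUERY and MUTATION directive locations apply to operations sent by clients, not to the schema). For example: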
type User {
email: String @proposedNonNullable
}
@proposedNonNullable has no meaning to Apollo, so it won’t complain or reject the response if a user’s email address is null.
Since most of our fields should be non-nullable, we tagged every field that wasn’t already non-nullable as @proposedNonNullable.
Step 2: Create an Apollo plugin to check our responses
This step was far more involved, and I’ll cover it in detail in parts 2 to 5. In short though, we created a plugin that analyses all of our responses, looks for fields that are in violation, and logs its findings.
Our logs look something like this:
{
"msg": "@proposedNonNullable violation",
"paths": [
"getUsers.total",
"getUsers.users[4].email",
"getUsers.users[6].email"
]
}
In this example, the total field of the getUsers response was null when it shouldn’t be, and some of the users had a null email address. For getUsers.total there’s almost certainly a bug to fix, and it would be a good idea to check why the frontend appears to be unaffected: maybe it needed this field in an older version of the software but no longer uses it. For the missing email addresses, we know that not all users have email addresses, so we can remove the @proposedNonNullable tag from that field.
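To give a feel for what such a check involves, here is a much-simplified sketch that produces paths in the same format as the log above. It is not the plugin itself: it works on a plain response object, and findViolations and the list of tagged field paths are hypothetical stand-ins for information the real plugin reads from the schema.

```typescript
// Joins path segments with a dot, without a leading dot at the root.
const join = (prefix: string, key: string): string =>
  prefix === "" ? key : prefix + "." + key

// Walks a response and returns the concrete paths of tagged fields that are null.
// `proposed` lists tagged fields as paths without list indices,
// e.g. "getUsers.users.email" (an assumption made for this sketch).
function findViolations(
  value: unknown,
  proposed: string[],
  path = "",      // concrete path, e.g. "getUsers.users[4].email"
  fieldPath = "", // the same path without indices
): string[] {
  if (value === null) {
    return proposed.indexOf(fieldPath) !== -1 ? [path] : []
  }
  let found: string[] = []
  if (Array.isArray(value)) {
    value.forEach((item: unknown, index: number) => {
      // A null item is governed by the item type, not by the tag on the
      // list field itself, so it is skipped here.
      if (item !== null) {
        found = found.concat(findViolations(item, proposed, path + "[" + index + "]", fieldPath))
      }
    })
  } else if (typeof value === "object") {
    const record = value as { [key: string]: unknown }
    for (const key of Object.keys(record)) {
      found = found.concat(
        findViolations(record[key], proposed, join(path, key), join(fieldPath, key)),
      )
    }
  }
  return found
}

// Rebuild the example from the log: users 4 and 6 have no email address.
const users = [0, 1, 2, 3, 4, 5, 6].map((i) => ({
  email: i === 4 || i === 6 ? null : "user" + i + "@example.com",
}))
const paths = findViolations(
  { getUsers: { total: null, users } },
  ["getUsers.total", "getUsers.users.email"],
)
const report = { msg: "@proposedNonNullable violation", paths }
```

Tracking two paths side by side keeps the lookup simple: the index-free path decides whether a field is tagged, while the concrete path is what ends up in the log.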
Step 3: Analysing the logs and fixing the issues
We run this code in our QA environment for a month and fix all of the issues reported. Most of the reported violations are fields that should be nullable, some may be bugs, and the rest are likely to be caused by bad data.
Once all the issues reported in QA are fixed, we perform the same process in production. At this point the only issues we should find are data-related, and only in very old data that is rarely accessed. We can handle these on a case-by-case basis; in many cases we can create a field resolver that computes the missing value.
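As an illustration of that last point, a field resolver can fill in a value that old records lack. The sketch below is hypothetical: the displayName field and the UserRecord type are invented for this example. It relies only on the usual resolver-map shape, in which a resolver receives the parent object as its first argument.

```typescript
// Hypothetical stored record: very old users may lack a display name.
type UserRecord = {
  name: string
  displayName?: string | null
}

// Resolver map in the usual shape: type name -> field name -> resolver(parent).
const resolvers = {
  User: {
    // Falls back to the first name, so the field never resolves to null.
    displayName: (parent: UserRecord): string =>
      parent.displayName ?? parent.name.replace(/ .*/, ""),
  },
}
```

With a fallback like this in place, the old data no longer produces violations, and the field can stay tagged.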
Step 4: Declaring the @proposedNonNullable fields as non-nullable
This is the scary part. Once there have been no new violations in production for more than a month, replace all of the @proposedNonNullable fields with fields that are actually non-nullable. For example:
type User {
name: String @proposedNonNullable
}
becomes
type User {
name: String!
}
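With many tagged fields, this replacement is worth scripting. Below is a minimal sketch of one way to do it with a regular expression; promoteProposedFields is a hypothetical helper, and it assumes the directive directly follows the type, as in the examples in this article. Fields with default values or other directives in between would need a proper SDL parser.

```typescript
// Rewrites "Type @proposedNonNullable" to "Type!" in schema source text.
// Assumption: the directive comes immediately after the type it applies to.
function promoteProposedFields(sdl: string): string {
  return sdl.replace(
    /(\S+)[ \t]+@proposedNonNullable\b/g,
    (match: string, token: string) =>
      // Leave the directive's own definition line untouched.
      token === "directive" ? match : token + "!",
  )
}

const before = [
  "type User {",
  "  name: String @proposedNonNullable",
  "  tags: [String] @proposedNonNullable",
  "}",
].join("\n")
const after = promoteProposedFields(before)
// after contains "name: String!" and "tags: [String]!"
```

Once nothing in the schema uses the tag any more, the directive declaration itself can be deleted by hand.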
If we have made any mistakes above, Apollo will start rejecting our responses, so it's important to have lengthy testing periods in steps 3 and 4.
Coming up...
In upcoming articles, we will delve deeper into the process of upgrading our schema to accurately reflect reality. Parts 2 to 5 will focus on the creation of the plugin, beginning with part two, which explores Apollo's internal representation and demonstrates how to build a simple plugin that logs all fields in each request. Subscribe below to receive these articles in your mailbox.
[1] This is a simplification. If any of the parent fields are nullable, the closest nullable field will be sent as null, and the error will be added to the list of errors. However, since our goal is to make most of our fields non-nullable, there is usually no nullable parent field, and the client doesn’t receive any of the response data.