Extending our plugin to report data types

Find the complete code for this article on GitHub.

We're building a safe way to transition our many nullable GraphQL fields to non-nullable. I covered why we want to do this in part 1, and in part 2 we built a simple Apollo server plugin that reports on the fields requested by each operation, expanding any fragments as necessary.

In this article, we will extend our plugin so that it also reports the type of each field that it's returning. As we noted in part 2, this information is not available from the request document, so we need to consult the schema.

Querying the schema

Because a FieldNode doesn't tell us the parent type of the field, we need to keep track of the parent type as we traverse the document. At the start it's easy: the operation is either a query, mutation or subscription, and its top-level fields are fields of the Query, Mutation or Subscription type respectively:

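import type { GraphQLRequestContext } from 'apollo-server-plugin-base'
import type { GraphQLObjectType, OperationDefinitionNode } from 'graphql'

// Returns the root type whose fields the operation selects,
// or undefined if the schema doesn't define that root type
export function getOperationParentType(
  { schema }: GraphQLRequestContext<unknown>,
  operation: OperationDefinitionNode
): GraphQLObjectType | undefined {
  switch (operation.operation) {
    case 'query':
      return schema.getQueryType() ?? undefined
    case 'mutation':
      return schema.getMutationType() ?? undefined
    case 'subscription':
      return schema.getSubscriptionType() ?? undefined
    default:
      throw new Error(`Unknown operation ${operation.operation}`)
  }
}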

As a quick aside, note that we use GraphQLRequestContext<unknown> rather than GraphQLRequestContext<any>. unknown is the safer choice: TypeScript won't let us use an unknown value until we've narrowed its type, whereas any silently switches off type checking.

We need the ?? undefined because graphql-js declares those schema methods as returning Maybe<GraphQLObjectType>, which it defines as GraphQLObjectType | null | undefined. Converting null to undefined leaves our calling code with only one "missing" value to check for.
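To see what `?? undefined` buys us in isolation, here is a minimal sketch. It redeclares a `Maybe<T>` alias matching graphql-js's definition, and `lookupRootType()` and `toOptional()` are hypothetical helpers written for this illustration, standing in for the schema getters rather than reproducing the plugin's code.

```typescript
// Mirrors the alias graphql-js uses: a value that may also be null or undefined.
type Maybe<T> = null | undefined | T

interface RootType {
  name: string
}

// Hypothetical stand-in for schema getters such as getMutationType().
function lookupRootType(
  roots: Partial<Record<string, RootType | null>>,
  key: string
): Maybe<RootType> {
  return roots[key]
}

// Collapse null into undefined so callers have a single "missing" value to test for.
function toOptional<T>(value: Maybe<T>): T | undefined {
  return value ?? undefined
}

// A schema that defines Query but has no Mutation or Subscription type.
const roots: Partial<Record<string, RootType | null>> = {
  query: { name: 'Query' },
  mutation: null,
}

const queryRoot = toOptional(lookupRootType(roots, 'query'))
const mutationRoot = toOptional(lookupRootType(roots, 'mutation'))
const subscriptionRoot = toOptional(lookupRootType(roots, 'subscription'))

console.log(queryRoot?.name) // Query
console.log(mutationRoot, subscriptionRoot) // undefined undefined
```

Once null has been folded into undefined, a single `=== undefined` check or optional chaining covers every missing root type, and falsy-but-present values survive because `??` only replaces null and undefined.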

Walking the tree

Now we need to process each of the fields in the operation definition's selection set. There's usually only one, but there can be more. Conveniently, we can handle this step as we would any other GraphQL object, because the operation is represented as selecting fields on the Query, Mutation or Subscription type.

Let's adapt our code from part 2:

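export function listFields(
  requestContext: GraphQLRequestContext<unknown>,
  possibleParents: readonly GraphQLObjectType[],
  selectionSet: SelectionSetNode,
  path: readonly string[] = []
): RequestedPathDetails[] {
  // Every schema field declared by any of the possible parent types
  const allPossibleFields = possibleParents.flatMap((parent) =>
    Object.values(parent.getFields())
  )
  // The same fields indexed by name, e.g. { name: [...], released: [...] }
  const allPossibleFieldsByName = groupBy(prop('name'), allPossibleFields)
  // The requested fields, with fragments expanded by getFields() from part 2
  const fieldNodes = selectionSet.selections.flatMap((selection) =>
    getFields(requestContext, selection)
  )
  // Fields with a selection set are branches; the rest are leaves
  const branchNodes = fieldNodes.filter(({ selectionSet }) => selectionSet)
  const leafNodes = fieldNodes.filter(({ selectionSet }) => !selectionSet)

  // processBranchNode() and processLeafNode() are nested here so they can use
  // path and allPossibleFieldsByName; both are shown below

  return [
    ...branchNodes.flatMap(processBranchNode),
    ...leafNodes.map(processLeafNode)
  ]
}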

There's not much extra in the main function compared to part 2. We've added a new parameter, possibleParents. We use this to look up the fields that each path could reference. To understand why it's an array, imagine this schema, based on the one we used in part 2:

type Query {
    media: [Media]!
}

union Media = Book | Movie

type Book {
    name: String!
    """ This is a new field. Note Book.released is nullable """
    released: Int
    author: Person!
}

type Movie {
    name: String!
    """ This is a new field. Note Movie.released is non-nullable """
    released: Int!
    director: Person!
}

type Person {
    name: String!
}

If we want to query the released field of every item, we need to select it through an inline fragment on each member type, like this:

query {
  media {
    ... on Book {
      released
    }
    ... on Movie {
      released
    }
  }
}

Media could be either a Book or a Movie, so when we process the fields in query.media, they could come from either type. getFields() would return an array with two FieldNodes, one from each inline fragment: one for Book.released and one for Movie.released. Since those fields have different types (Int and Int! respectively), we must process them both.

We get all of the fields of possibleParents, storing them in allPossibleFields. The groupBy and prop functions come from Ramda, a very useful utility library for functional programming. Then for convenience, we create allPossibleFieldsByName so we can look the fields up by name.
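As a rough illustration of what `groupBy(prop('name'), ...)` produces, the sketch below models the Book and Movie types from the schema above. The `SchemaField` and `ObjectTypeModel` interfaces are hypothetical simplifications (not graphql-js's `GraphQLObjectType`), and `groupByName()` is a hand-rolled stand-in for Ramda so the example runs on its own.

```typescript
// Simplified, hypothetical model of schema fields; graphql-js's real types are richer.
interface SchemaField {
  parent: string
  name: string
  type: string // SDL notation, e.g. "Int!"
}

interface ObjectTypeModel {
  name: string
  fields: SchemaField[]
}

const book: ObjectTypeModel = {
  name: 'Book',
  fields: [
    { parent: 'Book', name: 'name', type: 'String!' },
    { parent: 'Book', name: 'released', type: 'Int' },
    { parent: 'Book', name: 'author', type: 'Person!' },
  ],
}

const movie: ObjectTypeModel = {
  name: 'Movie',
  fields: [
    { parent: 'Movie', name: 'name', type: 'String!' },
    { parent: 'Movie', name: 'released', type: 'Int!' },
    { parent: 'Movie', name: 'director', type: 'Person!' },
  ],
}

// Hand-rolled equivalent of Ramda's groupBy(prop('name'), fields).
function groupByName(fields: SchemaField[]): Record<string, SchemaField[]> {
  const groups: Record<string, SchemaField[]> = {}
  for (const field of fields) {
    const group = groups[field.name] ?? []
    group.push(field)
    groups[field.name] = group
  }
  return groups
}

const possibleParents = [book, movie]
const allPossibleFields = possibleParents.flatMap((parent) => parent.fields)
const allPossibleFieldsByName = groupByName(allPossibleFields)

// "released" maps to both schema fields, which differ in nullability.
console.log(allPossibleFieldsByName['released'].map((f) => `${f.parent}.${f.name}: ${f.type}`))
// [ 'Book.released: Int', 'Movie.released: Int!' ]
```

Looking up `released` therefore yields both candidates, which is exactly why a single path can carry more than one possible field.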

As in part 2, we divide the requested fields into leaf and branch nodes according to whether each field has a selection set. Leaf nodes correspond to scalar (and enum) types and branch nodes to object, interface and union types, since you cannot request child fields of a scalar and you must request child fields of an object.
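The split can be sketched with a pared-down node shape. `SimpleFieldNode` and `SimpleSelectionSet` below are hypothetical simplifications that keep only the `name` and optional `selectionSet` properties that graphql-js's `FieldNode` also has; `isBranchNode()` and `partition()` are illustrative helpers, not part of the plugin.

```typescript
// Hypothetical, pared-down stand-ins for graphql-js's FieldNode and SelectionSetNode.
interface SimpleSelectionSet {
  selections: SimpleFieldNode[]
}

interface SimpleFieldNode {
  name: { value: string }
  selectionSet?: SimpleSelectionSet
}

// A type guard tells TypeScript that branch nodes definitely have a selection set.
function isBranchNode(
  node: SimpleFieldNode
): node is SimpleFieldNode & { selectionSet: SimpleSelectionSet } {
  return node.selectionSet !== undefined
}

function partition(fieldNodes: SimpleFieldNode[]) {
  return {
    branchNodes: fieldNodes.filter(isBranchNode),
    leafNodes: fieldNodes.filter((node) => !isBranchNode(node)),
  }
}

// Roughly `{ name author { name } }` requested on a Book.
const nodes: SimpleFieldNode[] = [
  { name: { value: 'name' } },
  { name: { value: 'author' }, selectionSet: { selections: [{ name: { value: 'name' } }] } },
]

const { branchNodes, leafNodes } = partition(nodes)
console.log(branchNodes.map((n) => n.name.value)) // [ 'author' ]
console.log(leafNodes.map((n) => n.name.value)) // [ 'name' ]
```

One design note: filtering with a type guard, rather than a plain truthiness check, means the element type of `branchNodes` records that each node has a selection set, so later code can read it without a non-null assertion.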

Handling leaf nodes (scalar types)

Leaf nodes are passed to processLeafNode(). This function is defined within listFields(), so it has access to all of the outer function's variables. It is substantially similar to the one in part 2; it simply adds the possible schema fields, and therefore their types, to the returned structure:

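function processLeafNode(leafNode: FieldNode): RequestedPathDetails {
  return {
    path: [...path, getDataProperty(leafNode)].join('.'),
    // Meta-fields such as __typename aren't returned by getFields(),
    // so fall back to an empty list rather than undefined
    possibleFields: allPossibleFieldsByName[leafNode.name.value] ?? []
  }
}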

As in part 2, we pass the field node to getDataProperty() which returns the field as it would appear in the response, taking into account its alias, if it has one.
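getDataProperty() itself comes from part 2. A plausible minimal version, written here against a simplified node shape rather than graphql-js's real `FieldNode` (which likewise has an optional `alias` and a `name`, each carrying a `value`), might look like this; `toPath()` is a hypothetical helper that mirrors how processLeafNode() builds its path.

```typescript
// Simplified stand-in for graphql-js's FieldNode: alias and name both carry a `value`.
interface AliasableFieldNode {
  alias?: { value: string }
  name: { value: string }
}

// The response key is the alias when one is given, otherwise the field name.
function getDataProperty(node: AliasableFieldNode): string {
  return node.alias?.value ?? node.name.value
}

// Build the dotted path the same way processLeafNode() does.
function toPath(parentPath: readonly string[], node: AliasableFieldNode): string {
  return [...parentPath, getDataProperty(node)].join('.')
}

// `media { year: released }` is reported under the key that appears in the response.
console.log(toPath(['media'], { alias: { value: 'year' }, name: { value: 'released' } })) // media.year
console.log(toPath(['media'], { name: { value: 'released' } })) // media.released
```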

possibleFields is set to the array of schema fields that the path could be referencing. There can be more than one when possibleParents has more than one element, although in most cases they'll be equivalent. We don't deduplicate them here, because that's beyond the scope of this function and we don't know what "duplicate" means to the caller: the fields could differ in nullability or in the directives placed on them. Additionally, because of how we process branch nodes, the final array returned by listFields() may contain multiple entries with the same path. It's simpler to delegate collating and deduplicating to the calling code.
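What the calling code does with those entries is up to it. As one hypothetical option, the sketch below collates entries by path and deduplicates fields by parent type and field name. `RequestedPathDetails` is simplified here so that each schema field is just a `{ parent, name, type }` record, and `collateByPath()` is an illustrative helper, not the approach part 4 necessarily takes.

```typescript
// Hypothetical simplified shapes; the plugin's possibleFields hold graphql-js field objects.
interface FieldRef {
  parent: string
  name: string
  type: string
}

interface RequestedPathDetails {
  path: string
  possibleFields: FieldRef[]
}

// Merge entries that share a path, keeping one copy of each Parent.field pair.
function collateByPath(details: RequestedPathDetails[]): Map<string, FieldRef[]> {
  const byPath = new Map<string, Map<string, FieldRef>>()
  for (const { path, possibleFields } of details) {
    const fields = byPath.get(path) ?? new Map<string, FieldRef>()
    for (const field of possibleFields) {
      fields.set(`${field.parent}.${field.name}`, field)
    }
    byPath.set(path, fields)
  }
  const result = new Map<string, FieldRef[]>()
  byPath.forEach((fields, path) => result.set(path, Array.from(fields.values())))
  return result
}

const bookReleased = { parent: 'Book', name: 'released', type: 'Int' }
const movieReleased = { parent: 'Movie', name: 'released', type: 'Int!' }

// Each inline fragment in the example query produces its own "media.released" entry.
const collated = collateByPath([
  { path: 'media.released', possibleFields: [bookReleased, movieReleased] },
  { path: 'media.released', possibleFields: [bookReleased, movieReleased] },
])

console.log(collated.get('media.released')?.map((f) => f.type)) // [ 'Int', 'Int!' ]
```

Keying by parent and name, rather than by type, keeps both `released` fields: whether a null at media.released is acceptable depends on whether that item is a Book or a Movie.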

Handling branch nodes (object types)

The main change to processBranchNode() is that we need to generate the array that will be passed to the possibleParents argument of listFields(). Again, it is defined within listFields() so it can access all of its variables. Let's have a look:

function processBranchNode(branchNode: FieldNode): RequestedPathDetails[] {
  // Introspection fields such as __schema aren't returned by getFields()
  const possibleNodeTypes = (allPossibleFieldsByName[branchNode.name.value] ?? [])
    .map(prop('type'))
    .map(getBaseType)
    .flatMap(getPossibleTypes(requestContext))
    .filter(isObjectType)
  const uniquePossibleNodeTypes = uniqBy(prop('name'), possibleNodeTypes)
  const childPathDetails = listFields(
    requestContext,
    uniquePossibleNodeTypes,
    // branchNodes only contains nodes with a selection set, so this is safe
    branchNode.selectionSet!,
    [...path, getDataProperty(branchNode)]
  )

  return [processLeafNode(branchNode), ...childPathDetails]
}

possibleNodeTypes is calculated using a 4-step process:

  1. Get the type of each field using prop('type'), which is equivalent to ({ type }) => type.

  2. Get the base type of the type. This removes any modifiers such as non-nullable and list. For example, [String!]! will be converted to String.

  3. Resolve any abstract types. For unions and interfaces, we need the list of concrete object types that could appear in their place.

  4. Filter out anything that isn't an object type, such as scalars and enums. In a well-designed schema there generally won't be any, but we can't guarantee that: we look fields up by name across every possible parent type, so a same-named field on another parent could have a scalar type.

There may be duplicates if more than one of the possible parents has a field of this name with the same object type. Processing the same type twice would be wasted work, so we filter out the duplicates to create uniquePossibleNodeTypes.
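The four steps and the deduplication can be sketched end to end with a simplified, hypothetical type model. In graphql-js, steps 2 and 3 would plausibly correspond to `getNamedType()` and `schema.getPossibleTypes()`, but the `getBaseType()`, `getPossibleTypes()`, `isObjectType()` and `uniqueByName()` functions below are stand-ins written for this illustration, not the article's helpers.

```typescript
// Hypothetical, simplified model of GraphQL types (graphql-js uses classes instead).
type TypeModel =
  | { kind: 'NonNull'; ofType: TypeModel }
  | { kind: 'List'; ofType: TypeModel }
  | { kind: 'Scalar'; name: string }
  | { kind: 'Object'; name: string }
  | { kind: 'Union'; name: string; types: TypeModel[] }

type ObjectTypeModel = Extract<TypeModel, { kind: 'Object' }>

// Step 2: strip NonNull and List wrappers, so [Media]! becomes Media.
function getBaseType(type: TypeModel): TypeModel {
  return type.kind === 'NonNull' || type.kind === 'List' ? getBaseType(type.ofType) : type
}

// Step 3: a union expands to its members; any other type stands for itself.
function getPossibleTypes(type: TypeModel): TypeModel[] {
  return type.kind === 'Union' ? type.types : [type]
}

// Step 4: keep only object types.
function isObjectType(type: TypeModel): type is ObjectTypeModel {
  return type.kind === 'Object'
}

// Drop repeated object types by name, as uniqBy(prop('name')) does.
function uniqueByName(types: ObjectTypeModel[]): ObjectTypeModel[] {
  const seen = new Set<string>()
  return types.filter((type) => {
    if (seen.has(type.name)) return false
    seen.add(type.name)
    return true
  })
}

const person: TypeModel = { kind: 'Object', name: 'Person' }
const book: TypeModel = { kind: 'Object', name: 'Book' }
const movie: TypeModel = { kind: 'Object', name: 'Movie' }
const media: TypeModel = { kind: 'Union', name: 'Media', types: [book, movie] }

// Illustrative types for same-named fields found on different possible parents.
const candidateFields: { type: TypeModel }[] = [
  { type: { kind: 'NonNull', ofType: { kind: 'List', ofType: media } } }, // [Media]!
  { type: { kind: 'NonNull', ofType: person } }, // Person!
  { type: { kind: 'NonNull', ofType: person } }, // Person! again
  { type: { kind: 'Scalar', name: 'Int' } }, // Int
]

const possibleNodeTypes = candidateFields
  .map((field) => field.type) // step 1
  .map(getBaseType) // step 2
  .flatMap(getPossibleTypes) // step 3
  .filter(isObjectType) // step 4
const uniquePossibleNodeTypes = uniqueByName(possibleNodeTypes)

console.log(uniquePossibleNodeTypes.map((t) => t.name)) // [ 'Book', 'Movie', 'Person' ]
```

The Int field disappears at step 4, and the second Person is removed by the deduplication step.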

We then recurse back into listFields() to get all of the child paths that were requested. When returning the result, we also add an entry for the branch node itself, since in part 4 we will want to check branch nodes as well.
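To see the recursion and the extra branch-node entries in one place, here is a compact, self-contained sketch of the whole walk. It swaps graphql-js's types for a hypothetical miniature model (object types map field names to type names, unions list their members, and `!` and list wrappers are omitted), and it skips fragments, aliases and the getFields() expansion from part 2, so treat it as an outline of the technique rather than the plugin itself.

```typescript
// Hypothetical miniature schema: object types map field names to type names.
const objectTypes: Record<string, Record<string, string>> = {
  Query: { media: 'Media' },
  Book: { name: 'String', released: 'Int', author: 'Person' },
  Movie: { name: 'String', released: 'Int', director: 'Person' },
  Person: { name: 'String' },
}
const unions: Record<string, string[]> = { Media: ['Book', 'Movie'] }

// A requested field; `children` plays the role of a selection set.
interface Selection {
  name: string
  children?: Selection[]
}

interface PathDetails {
  path: string
  possibleFields: string[] // "Parent.field" labels
}

function listFields(possibleParents: string[], selections: Selection[], path: string[] = []): PathDetails[] {
  return selections.flatMap((selection) => {
    // Every possible parent that declares this field contributes a candidate.
    const parents = possibleParents.filter((parent) => selection.name in objectTypes[parent])
    const entry: PathDetails = {
      path: [...path, selection.name].join('.'),
      possibleFields: parents.map((parent) => `${parent}.${selection.name}`),
    }
    if (!selection.children) return [entry] // leaf node

    // Branch node: resolve child types (unions expand), drop scalars, dedupe, recurse.
    const childTypes = parents
      .map((parent) => objectTypes[parent][selection.name])
      .flatMap((typeName) => unions[typeName] ?? [typeName])
      .filter((typeName) => typeName in objectTypes)
    const uniqueChildTypes = Array.from(new Set(childTypes))
    return [entry, ...listFields(uniqueChildTypes, selection.children, [...path, selection.name])]
  })
}

// Roughly `{ media { released author { name } } }` after fragment expansion.
const query: Selection[] = [
  { name: 'media', children: [{ name: 'released' }, { name: 'author', children: [{ name: 'name' }] }] },
]

console.log(listFields(['Query'], query).map((d) => d.path))
// [ 'media', 'media.released', 'media.author', 'media.author.name' ]
```

Note that the branch paths media and media.author appear in the output alongside the leaves, and that media.author only lists Book.author because Movie has no field of that name.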

Summing up

In addition to reporting all of the paths that have been requested, we now include the possible types that each path could be. This is all the information we need to determine which paths we need to check for null values. In part 4, we will take this information and analyse the response to find the fields that are null when they shouldn't be.