The case for prefixed IDs

IDs are often overlooked, but they play a crucial role in almost every application. Most modern IDs are random strings, an improvement on the previously popular sequential IDs. However, we can improve them further by adding a short prefix that designates what type of object the ID is for. This makes log files much easier to search and understand, and can even reduce bugs in our code.

A brief intro to IDs

The two main types of ID are sequential and random. Sequential IDs were commonly used in web applications prior to about 2010 because they were easy to understand and simple to implement as an autoincrement database field. They have fallen out of favour in recent years because they have several drawbacks:

  • When implemented as an autoincrement field in a database, you cannot know the ID of a record until it's been inserted.

  • It can be difficult to create complex data structures, particularly ones that need to be created as a single transaction, since parent objects need to be written to storage before their children can be created.

  • They're easily guessable, so it's much easier to exploit any flaws in your access control.

  • They can leak information about your business to competitors. For example, by comparing the IDs of two new users created a week apart you can estimate how many signups an application is getting.

  • Searching for a particular ID in logs can be a painful experience. It can be difficult to find all the logs related to customer 1234 while excluding the unrelated logs for order 1234, order item 1234, order payment 1234 and product 1234.

  • It's very difficult to merge databases since it's practically guaranteed that the record IDs will clash.

Randomised IDs have become far more popular in recent years. They come in many forms, but ultimately they're just a sequence of random characters. They have two key advantages over sequential IDs:

  • Fast and convenient: They can be generated whenever and wherever they're needed since they don't require a central authority.

  • Unguessable and unenumerable: Assuming that a good-quality PRNG is used, it's practically impossible to discover or enumerate objects without already knowing their IDs.

At this point, I'd like to give special mention to UUIDs, since they have probably the greatest mindshare amongst developers. They have some disadvantages worth bearing in mind:

  • When viewing UUIDs in logs or a database, double-clicking usually only selects part of the ID because the hyphens are treated as word boundaries. Copying and pasting one, for example when searching log files, generally requires manually selecting all of it.

  • Some UUID libraries use poor-quality PRNGs to generate the ID, leading to identifiers that are potentially guessable by a skilled adversary.

  • They're long. UUIDs are represented in hexadecimal (base 16) and have a fixed length of 36 characters. When viewing a database table, a UUID column generally requires around twice the screen space compared to other ID types.

  • They don't support prefixing. If nothing else, this makes them hard to recommend in an article that promotes prefixed IDs 🙃

Why prefix?

My IDs are fine - why bother prefixing them?

- A random person on the street

Here's an example of a prefixed ID, in this case for a user: usr_v2koprohd5vn8u61. It contains two components: usr_, which designates that this ID is for a user, and v2koprohd5vn8u61, which uniquely identifies this person. Similarly, they might belong to the group grp_fd7au0lvf5dabnkl. Why do this?

The primary reason is that you can immediately see what the ID is for. If you're debugging a program and you see grp_fd7au0lvf5dabnkl when you're expecting a user ID, you immediately have a very strong clue as to what's gone wrong. It also means you don't have to contextualise IDs when you're logging them, making notes, chatting about them in Slack etc. You don't need to specify that usr_v2koprohd5vn8u61 is a user because the person you're talking to can already see that.
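Because the type is encoded in the ID itself, tooling can recover it without any extra context. Here is a minimal sketch of that idea, assuming a hypothetical registry of prefixes. The idKinds table and the kindOfId function are illustrative names, not part of any library:

```typescript
// Hypothetical prefix registry; the prefixes and type names are illustrative only.
const idKinds = {
  usr: "user",
  grp: "group",
} as const

type IdKind = (typeof idKinds)[keyof typeof idKinds]

// Returns the kind of object an ID refers to, or undefined if the prefix is unknown.
function kindOfId(id: string): IdKind | undefined {
  const separatorIndex = id.indexOf("_")
  if (separatorIndex <= 0) return undefined
  const prefix = id.slice(0, separatorIndex)
  return Object.prototype.hasOwnProperty.call(idKinds, prefix)
    ? idKinds[prefix as keyof typeof idKinds]
    : undefined
}

console.log(kindOfId("usr_v2koprohd5vn8u61")) // user
console.log(kindOfId("grp_fd7au0lvf5dabnkl")) // group
console.log(kindOfId("1234")) // undefined
```

A log viewer or support tool could use a lookup like this to link each ID straight to the right admin page, something a bare 1234 can't offer.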

Beyond that, if you're using a language like TypeScript, prefixed IDs can almost entirely eliminate a whole class of bugs. Let's take this code for example:

function addUserToGroup(userId: string, groupId: string) {
  // ...
}

const userId = "1234"
const groupId = "5678"

addUserToGroup(groupId, userId) // Oh no!

It's an easy mistake to make, and one that can be hard to spot in code review. Prefixed IDs by themselves obviously can't prevent this error, but with just a little TypeScript magic we can solve it in most cases:

const userIdPrefix = "usr_"
type UserId = `${typeof userIdPrefix}${string}`

const groupIdPrefix = "grp_"
type GroupId = `${typeof groupIdPrefix}${string}`

function addUserToGroup(userId: UserId, groupId: GroupId) {
   // ...
}

const userId: UserId = "usr_1234"
const groupId: GroupId = "grp_5678"

// Now we will get a TypeScript error
addUserToGroup(groupId, userId)

How does this work? UserId has been declared as any string that starts with usr_. Since a group ID will never start with usr_, TypeScript understands that user IDs and group IDs are not interchangeable.

There is a slight annoyance in that sometimes we need to cast a string to an ID, for example, if we have received it from some external source like a file or database. We can do that safely with a function like the following:

function asUserId(id: string): UserId {
   if (id.startsWith(userIdPrefix)) {
      return id as UserId
   } else {
      throw new Error(`${id} is not a valid user ID`)
   }
}
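If throwing isn't convenient, a type guard is one possible alternative. The sketch below assumes a hypothetical generic helper called hasPrefix (not from any library) and redeclares the user ID prefix and type so that it runs on its own:

```typescript
// Generic type guard: narrows a plain string to a prefixed template literal type.
function hasPrefix<P extends string>(id: string, prefix: P): id is `${P}${string}` {
  return id.startsWith(prefix)
}

const userIdPrefix = "usr_"
type UserId = `${typeof userIdPrefix}${string}`

// Hypothetical function that only accepts user IDs.
function greetUser(userId: UserId): string {
  return `Hello, ${userId}`
}

// Pretend this value arrived from a file, database or HTTP request.
const raw: string = "usr_v2koprohd5vn8u61"

if (hasPrefix(raw, userIdPrefix)) {
  // Inside this branch TypeScript treats raw as a UserId.
  console.log(greetUser(raw))
} else {
  console.log(`${raw} is not a valid user ID`)
}
```

The guard lets the caller decide how to handle a bad ID, whereas asUserId is handier when an invalid ID should simply stop processing.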

Generating IDs

If you're coding in JavaScript/TypeScript I recommend using cuid2 as the ID generator. It's simple to use, secure (in that it generates IDs with very high entropy), and if necessary you can choose how long you want your IDs to be. To generate an ID:

import { createId } from '@paralleldrive/cuid2';

const myId: UserId = `usr_${createId()}`
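If you have many object types, you may prefer to wrap the prefixing in a small factory rather than repeating the template string everywhere. The following is only a sketch: idFactory is a hypothetical helper, and the generator is passed in as a parameter so that the example runs without any dependencies. In real code you would pass createId (or another secure generator) instead of the predictable stand-in used here:

```typescript
// Builds an ID factory for one prefix. `generate` is whatever random-ID function
// you use; it is injected so that this sketch has no dependencies.
function idFactory<P extends string>(prefix: P, generate: () => string): () => `${P}${string}` {
  return () => `${prefix}${generate()}`
}

// Stand-in generator for demonstration ONLY: predictable, not random, not secure.
let counter = 0
const fakeGenerate = () => `demo${counter++}`

const newUserId = idFactory("usr_", fakeGenerate)
const newGroupId = idFactory("grp_", fakeGenerate)

const userId = newUserId()   // typed as `usr_${string}`
const groupId = newGroupId() // typed as `grp_${string}`

console.log(userId, groupId) // usr_demo0 grp_demo1
```

Because the prefix is captured in the return type, each factory's output is compatible with the matching ID type and with no other.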

If you're using classes to represent your objects, you can make your instances auto-generate their IDs like so:

import { createId } from '@paralleldrive/cuid2';

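class User {
  // Field initialisers run before the constructor body, so a supplied id overrides this default
  id: UserId = `usr_${createId()}`

  name!: string

  // id is optional here; every other field is required
  constructor(user: Omit<User, "id"> & { id?: UserId }) {
    Object.assign(this, user)
  }
}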

const fred = new User({ name: "Fred" })

// Will print out "usr_sa43u90qdfsankjf" or similar
console.log(fred.id)

Recommendations

Assuming you're sold on the idea of prefixing your IDs, I have a few recommendations:

  • Use _ as the separator between the type and ID. It is accepted as a valid character by almost all systems, and most importantly is considered to be part of the word when double-clicking, so copying and pasting becomes much easier.

  • Use 3-letter prefixes. It's good for all of your prefixes to have the same length, and 3 letters gives a good compromise between length and finding a unique prefix that's relatable to its type.

  • Use a secure library to generate your IDs.

  • Use base 36 (0-9, a-z) IDs. They provide about 29% more entropy per character than hex (base 16), roughly 5.2 bits versus 4, while still being easy to type.

  • Use all lower-case for the IDs because it's easier to type and when speaking you don't need to say things like "Upper case A, lower case Q" etc.

  • Don't bikeshed your ID length. Use whatever default your generation library provides, or (assuming base 36) 16 characters excluding the prefix. Even if you continuously generated 1,000 16-character, base 36 IDs every second, it would take over a decade before you had a 1% chance of a collision (by the birthday bound, assuming uniformly random IDs).
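If you'd like to check the entropy and collision figures for your own alphabet, ID length and generation rate, here is a back-of-the-envelope sketch. It assumes the IDs are uniformly random and uses the standard birthday-bound approximation. The function names are illustrative:

```typescript
// Entropy carried by one character drawn uniformly from an alphabet of the given size.
function bitsPerCharacter(alphabetSize: number): number {
  return Math.log2(alphabetSize)
}

// Birthday-bound approximation: how many uniformly random IDs can be generated
// before the chance of at least one collision reaches `probability`.
function idsBeforeCollisionRisk(alphabetSize: number, length: number, probability: number): number {
  const possibleIds = Math.pow(alphabetSize, length)
  return Math.sqrt(2 * possibleIds * Math.log(1 / (1 - probability)))
}

const secondsPerYear = 365.25 * 24 * 60 * 60
const idsPerSecond = 1000 // generation rate used in the text

const ids = idsBeforeCollisionRisk(36, 16, 0.01)
const years = ids / idsPerSecond / secondsPerYear

console.log(`base 36: ${bitsPerCharacter(36).toFixed(2)} bits/char, hex: ${bitsPerCharacter(16).toFixed(2)} bits/char`)
console.log(`IDs before a 1% collision risk: ${ids.toExponential(2)}`)
console.log(`Years at ${idsPerSecond} IDs/second: ${years.toFixed(1)}`)
```

Real generators may not be perfectly uniform (some reserve the first character for a letter, for instance), so treat the result as an order-of-magnitude estimate rather than a guarantee.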

Conclusion

Prefixing your IDs makes your logs easier to read and (where the language supports it) can help remove a whole class of potential errors from your code. Prefixed IDs are simple to implement and greatly improve the development experience. Give them a try in your next project and let me know how you go :)
