Unique ID Generation Cheat Sheet

Unique identifiers are crucial for data integrity: they let applications track and reference records consistently across systems and services. This article contains a cheat sheet and descriptions of common unique ID generators, how they work, and their tradeoffs, to help you choose the best option for your application.

Unique ID Generation Cheat Sheet by bool.dev
You can find a high-resolution PDF here.

0. Database Auto-Increment

Auto-increment fields in databases automatically generate unique identifiers for new records.

Example ID: 1, 2, 3 (incrementing sequentially)

Pros:

  • Simple and ensures uniqueness within a single database
  • Easy to implement in single-node applications
  • Efficient for small-scale applications
  • Sequentially ordered
  • Efficient storage of a numeric value

Cons:

  • Not suitable for distributed systems: every insert depends on the single database that owns the counter
  • That single counter becomes a bottleneck when writes are spread across many nodes
  • Performance can degrade with large datasets
  • Range is limited by the integer type (INT, BIGINT, etc.)

Database ID Implementation

-- MySQL syntax; other databases use IDENTITY or SERIAL columns instead
CREATE TABLE Users (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

1. Twitter Snowflake

A 64-bit identifier that includes a timestamp, machine ID, and sequence number, ensuring uniqueness across distributed systems.

Example ID: 13572484591234567

Pros:

  • Guaranteed unique within the distributed system
  • Highly scalable, can generate millions of IDs per second
  • Time-ordered, useful for chronological sorting
  • Efficient storage as a 64-bit integer

Cons:

  • Requires each node to be assigned a unique worker ID to avoid collisions
  • Added complexity in managing and coordinating those worker IDs
  • Slightly more work per ID than a simple numeric increment (clock read plus bitwise packing)

Twitter Snowflake Implementation
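The combination of timestamp, machine ID, and sequence number can be sketched in a few lines of Python. This is a minimal illustration, not Twitter's code: the 41/10/12 bit split and the epoch constant follow the commonly described Snowflake layout, and the SnowflakeGenerator class name is hypothetical.

```python
import threading
import time

# Assumed layout: 41-bit timestamp | 10-bit worker ID | 12-bit sequence.
EPOCH_MS = 1288834974657  # custom epoch (assumption); any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical Snowflake-style generator for one worker."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = time.time_ns() // 1_000_000
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the sequence number.
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self.last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Because the timestamp sits in the high bits, IDs from one generator sort by creation time; uniqueness across nodes depends entirely on every node using a different worker_id.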

2. UUID v4 (Universally Unique Identifier)

A 128-bit identifier that is globally unique and can be generated independently by different systems without coordination.

Example ID: 550e8400-e29b-41d4-a716-446655440000

Pros:

  • Globally unique without the need for a central authority
  • Can be generated independently on multiple systems without coordination
  • Fast generation, no need for network communication

Cons:

  • Larger storage requirement (128-bit)
  • No inherent orderability, random distribution
  • Inefficient as a primary key in large tables: random values scatter inserts across the index

UUID Generation

UUID generation is built in or available through a widely used library in most programming languages.

Python

import uuid

# Generate a UUID
uuid_generated = uuid.uuid4()

print(uuid_generated)

Java

import java.util.UUID;

public class Main {
    public static void main(String[] args) {
        // Generate a UUID
        UUID uuid = UUID.randomUUID();
        
        System.out.println(uuid.toString());
    }
}

Node.js

const { v4: uuidv4 } = require('uuid');

// Generate a UUID
const uuid = uuidv4();

console.log(uuid);

JavaScript

function uuidv4() {
  return "10000000-1000-4000-8000-100000000000".replace(/[018]/g, c =>
    (+c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> +c / 4).toString(16)
  );
}

console.log(uuidv4());

C#

using System;

class Program
{
    static void Main()
    {
        // Generate a UUID
        Guid uuid = Guid.NewGuid();
        
        Console.WriteLine(uuid.ToString());
    }
}

PHP

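function GUID()
{
    // Use the Windows COM generator when it is available.
    if (function_exists('com_create_guid') === true)
    {
        return trim(com_create_guid(), '{}');
    }

    // Fallback: version 4 UUID from random_int() (PHP 7+), which is cryptographically secure unlike mt_rand().
    return sprintf(
        '%04X%04X-%04X-%04X-%04X-%04X%04X%04X',
        random_int(0, 65535), random_int(0, 65535), random_int(0, 65535),
        random_int(16384, 20479), // 0x4000-0x4FFF sets version 4
        random_int(32768, 49151), // 0x8000-0xBFFF sets the RFC 4122 variant
        random_int(0, 65535), random_int(0, 65535), random_int(0, 65535)
    );
}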

Ruby

require 'securerandom'

# Generate a UUID
uuid = SecureRandom.uuid

puts uuid

Go

package main

import (
    "fmt"
    "github.com/google/uuid"
)

func main() {
    // Generate a random (version 4) UUID.
    // The variable is named id so it does not shadow the uuid package.
    id := uuid.New()

    fmt.Println(id.String())
}

Bash

#!/bin/bash

# Generate a UUID
uuid=$(uuidgen)

echo "$uuid"

3. UUID v7

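UUIDv7 is a newer UUID version, standardized in RFC 9562, that places a Unix millisecond timestamp in the most significant bits and fills the rest with random data. The result is a sortable, unique identifier that combines ideas from UUID and ULID.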

Example ID: 01890c8e-bc4d-7b3f-915e-0d4d4e310e67

Pros:

  • Combines global uniqueness with timestamp-based sorting.
  • Can be generated independently on multiple systems.
  • Maintains compatibility with existing UUID systems and libraries.
  • Useful for chronological sorting and time-based queries.

Cons:

  • Larger storage requirement (128-bit).
  • Slightly more complex generation algorithm than UUID v4.
  • Small generation overhead from reading the system clock for every ID.
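
To make the timestamp-plus-randomness idea concrete, here is a minimal Python sketch that assembles a UUIDv7-shaped value by hand. It assumes the RFC 9562 field layout (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits); uuid7_sketch is a hypothetical helper name, and the sketch does not add any ordering guarantee for IDs created within the same millisecond.

```python
import os
import time
import uuid


def uuid7_sketch() -> uuid.UUID:
    """Build a UUIDv7-style value: timestamp first, random bits after."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                 # top 12 bits
    rand_b = rand & ((1 << 62) - 1)     # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp
    value |= 0x7 << 76                          # version 7
    value |= rand_a << 64                       # 12 random bits
    value |= 0b10 << 62                         # RFC variant bits
    value |= rand_b                             # 62 random bits
    return uuid.UUID(int=value)


print(uuid7_sketch())
```

Since the result is an ordinary uuid.UUID object, it can be stored in any column or field that already accepts UUIDs.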

4. ULID (Universally Unique Lexicographically Sortable Identifier)

A 128-bit, lexicographically sortable identifier that combines a 48-bit millisecond timestamp with 80 bits of randomness to ensure uniqueness and orderability.

Example ID: 01ARZ3NDEKTSV4RRFFQ69G5FAV

Pros:

  • Globally unique, combining randomness and timestamp
  • Can be generated independently on multiple systems
  • Compatible with UUID/GUID storage (same 128-bit size)
  • 1.21e+24 unique ULIDs per millisecond (1,208,925,819,614,629,174,706,176 to be exact)
  • Lexicographically sortable
  • Uses Crockford's base32 for better efficiency and readability (5 bits per character)
  • Case insensitive
  • No special characters (URL safe)
  • Monotonic sort order (correctly detects and handles the same millisecond)

Cons:

  • A slightly more complex generation algorithm
  • Larger than Snowflake and others

ULID Implementation
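
As a rough illustration, the sketch below builds a ULID-style string in Python from a 48-bit millisecond timestamp and 80 random bits, encoded with Crockford's base32. The function name ulid_sketch is hypothetical, and the monotonic handling of IDs within the same millisecond is deliberately left out to keep the example short.

```python
import os
import time
from typing import Optional

# Crockford's base32 alphabet: no I, L, O, or U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_sketch(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID-style string (non-monotonic sketch)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((timestamp_ms & ((1 << 48) - 1)) << 80) | randomness

    chars = []
    for _ in range(26):  # 26 characters x 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


print(ulid_sketch())
```

The first 10 characters encode the timestamp and the remaining 16 encode the randomness, which is why plain string comparison orders ULIDs by creation time.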

5. KSUID (K-Sortable Unique Identifier)

A 160-bit identifier that is sortable by time and consists of a 32-bit timestamp (one-second resolution) followed by a 128-bit random payload.

Example ID: 0ujsszwN8NRY24YaXiTIEEIo7K0

Pros:

  • Combines timestamp and randomness, globally unique
  • Can be generated independently without coordination
  • Fast generation and verification, suitable for high-throughput systems
  • K-sortable, suitable for time-based sorting
  • Structured format aids in unique and ordered ID generation

Cons:

  • Larger storage requirement (160-bit)
  • Complexity in generation algorithm
  • Coarse ordering: the timestamp has one-second resolution, so IDs created within the same second are not ordered by creation time

KSUID Implementation
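
The following Python sketch shows one way a KSUID-style value could be assembled. It assumes the layout of the reference implementation (a 4-byte timestamp counted in seconds from a custom epoch, then 16 random bytes, rendered as 27 base62 characters); ksuid_sketch is a hypothetical name, not a library function.

```python
import os
import time
from typing import Optional

KSUID_EPOCH = 1_400_000_000  # custom epoch in seconds (assumption from the reference implementation)
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def ksuid_sketch(unix_seconds: Optional[int] = None) -> str:
    """Return a 27-character KSUID-style string."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    timestamp = (unix_seconds - KSUID_EPOCH).to_bytes(4, "big")
    raw = timestamp + os.urandom(16)  # 20 bytes = 160 bits

    number = int.from_bytes(raw, "big")
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(BASE62[remainder])
    # Left-pad so every ID has the same width and sorts as a string.
    return "".join(reversed(chars)).rjust(27, "0")


print(ksuid_sketch())
```

Fixed-width encoding with an alphabet in ASCII order is what lets the text form sort the same way as the underlying bytes.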

6. MongoDB ObjectID

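A 96-bit identifier that includes a timestamp, machine ID, process ID, and a counter, ensuring uniqueness and a rough creation order. (Current MongoDB drivers replace the machine ID and process ID fields with a 5-byte random value generated once per process.)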

Example ID: 507f1f77bcf86cd799439011

Pros:

  • Combines timestamp, machine ID, process ID, and counter for uniqueness
  • Can be generated independently on multiple nodes
  • Fast generation, used natively in MongoDB
  • Roughly ordered based on creation time
  • 96-bit size balances uniqueness and storage efficiency

Cons:

  • Not guaranteed to be globally unique in all scenarios
  • Potential for collision if machine and process IDs are not managed correctly
  • Less efficient than simple integer-based IDs
  • Tied to the MongoDB ecosystem: ObjectId is native to MongoDB, and other database systems have no built-in type for it. If you integrate with a different database or migrate to a new one, the ObjectId values may need to be converted or mapped.
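
For illustration, the 96-bit structure can be sketched in Python as 4 timestamp bytes, 5 per-process bytes, and a 3-byte counter. This assumes the newer layout in which a random value stands in for the machine and process IDs; ObjectIdSketch and creation_time are hypothetical names, not part of any MongoDB driver.

```python
import os
import threading
import time


class ObjectIdSketch:
    """Hypothetical generator following a 4 + 5 + 3 byte ObjectId layout."""

    def __init__(self):
        self._process_random = os.urandom(5)                   # fixed for this process
        self._counter = int.from_bytes(os.urandom(3), "big")   # random starting point
        self._lock = threading.Lock()

    def next_id(self) -> str:
        with self._lock:
            self._counter = (self._counter + 1) & 0xFFFFFF     # wrap at 3 bytes
            counter = self._counter
        raw = (
            int(time.time()).to_bytes(4, "big")                # seconds since Unix epoch
            + self._process_random
            + counter.to_bytes(3, "big")
        )
        return raw.hex()                                       # 24 hex characters


def creation_time(object_id: str) -> int:
    """Read the Unix timestamp (seconds) back from the first 4 bytes."""
    return int(object_id[:8], 16)


gen = ObjectIdSketch()
oid = gen.next_id()
print(oid, creation_time(oid))
```

The leading timestamp is what gives ObjectIDs their rough creation order, at one-second granularity.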

7. CUID (Collision-resistant Unique Identifier)

CUIDs are designed to be highly unique, focusing on being readable and less prone to collisions even in high-concurrency environments.

A Collision-resistant Unique Identifier (CUID) with 25 characters in base-36 encoding typically has a size of around 129 bits. This provides a substantial level of uniqueness while keeping the identifier relatively compact.

Example ID: cjld2cyuq0006s1rxy8123456

Pros:

  • Highly unique with collision resistance even in high concurrency
  • Suitable for distributed systems with high traffic
  • Fast generation, designed for high-concurrency environments
  • Efficient storage, compact format

Cons:

  • Slightly larger than simple numeric IDs
  • More complex than simple numeric or auto-increment IDs
  • Not naturally ordered, designed primarily for uniqueness

CUID implementation

8. NanoID

NanoID is a tiny, secure, URL-friendly unique string ID generator that is designed to be more flexible and performant than UUID.

Example ID: V1StGXR8_Z5jdHi6B

Pros:

  • Secure and highly unique with customizable size and alphabet
  • Suitable for distributed systems, highly scalable
  • Fast generation with high security
  • Compact and efficient due to customizable length and alphabet

Cons:

  • Slightly more complex generation than UUID
  • Customizability might lead to misuse if not properly managed
  • No inherent orderability

NanoID Implementation
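
The core idea, picking each character from an alphabet with a cryptographically secure random source, can be sketched in Python as follows. The 64-symbol URL-safe alphabet and the default length of 21 are assumptions based on NanoID's documented defaults, and nanoid_sketch is a hypothetical name rather than the library's API.

```python
import secrets

# 64 URL-safe symbols: letters, digits, underscore, and hyphen.
DEFAULT_ALPHABET = "_-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def nanoid_sketch(size: int = 21, alphabet: str = DEFAULT_ALPHABET) -> str:
    """Return `size` characters drawn uniformly from `alphabet`."""
    return "".join(secrets.choice(alphabet) for _ in range(size))


print(nanoid_sketch())           # default length and alphabet
print(nanoid_sketch(10, "0123456789abcdef"))  # shorter ID, custom alphabet
```

Shortening the ID or shrinking the alphabet reduces the number of random bits, which raises the collision probability; that is the misuse risk mentioned in the cons above.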

9. Sonyflake

Sonyflake is a distributed unique ID generator inspired by Twitter Snowflake. It produces 64-bit IDs with a different bit assignment (39 bits of time in 10 ms units, an 8-bit sequence number, and a 16-bit machine ID) and is designed for generating unique IDs within a single data center.

Example ID: 1132088477364927953

Pros:

  • Highly unique with a low risk of collisions in a single data center
  • Optimized for single data center deployment, highly scalable
  • Fast generation with efficient use of 64-bit storage
  • Time-ordered, useful for chronological sorting
  • Efficient 64-bit size, similar to Twitter Snowflake

Cons:

  • Requires careful machine ID configuration to avoid collisions
  • Less suitable for multi-data center environments, where machine IDs must be coordinated across sites

Sonyflake Implementation
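
The sketch below shows, in Python, how a Sonyflake-style ID could be packed and unpacked. It assumes the bit assignment described in the Sonyflake README (39 bits of elapsed time in 10 ms units, 8 bits of sequence, 16 bits of machine ID) and its default start time of 2014-09-01 UTC; the function names are hypothetical.

```python
from datetime import datetime, timezone

START_TIME = datetime(2014, 9, 1, tzinfo=timezone.utc)  # assumed default start time
TIME_UNIT_MS = 10
TIME_BITS = 39
SEQUENCE_BITS = 8
MACHINE_BITS = 16


def elapsed_units(now: datetime) -> int:
    """Number of 10 ms units between START_TIME and `now`."""
    delta_ms = int((now - START_TIME).total_seconds() * 1000)
    return delta_ms // TIME_UNIT_MS


def compose(elapsed: int, sequence: int, machine_id: int) -> int:
    """Pack the three fields into one 63-bit integer."""
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("elapsed time out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    return (
        (elapsed << (SEQUENCE_BITS + MACHINE_BITS))
        | (sequence << MACHINE_BITS)
        | machine_id
    )


def decompose(sonyflake_id: int) -> dict:
    """Split an ID back into its fields."""
    return {
        "elapsed": sonyflake_id >> (SEQUENCE_BITS + MACHINE_BITS),
        "sequence": (sonyflake_id >> MACHINE_BITS) & ((1 << SEQUENCE_BITS) - 1),
        "machine_id": sonyflake_id & ((1 << MACHINE_BITS) - 1),
    }


now = datetime.now(timezone.utc)
print(decompose(compose(elapsed_units(now), 0, 42)))
```

Under this layout a single machine can issue at most 2^8 = 256 IDs per 10 ms, and 2^39 time units last roughly 174 years.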

10. FlakeID

FlakeID is a decentralized unique ID generator that creates 128-bit IDs based on timestamps, machine IDs, and sequence numbers.

Flake IDs have 128-bit size:

  • 64-bit timestamp - milliseconds since the epoch (Jan 1, 1970)
  • 48-bit worker ID - MAC address from a configurable device
  • 16-bit sequence # - usually 0, incremented when more than one ID is requested in the same millisecond and reset to 0 when the clock ticks forward

Example ID: 4zqG3B2TnMs57S1PvQ

Pros:

  • Combines timestamp, machine ID, and sequence number for high uniqueness
  • Suitable for distributed systems, highly scalable
  • Fast generation, ideal for high-throughput systems
  • Roughly ordered based on timestamp
  • Larger 128-bit size balances uniqueness and orderability

Cons:

  • Requires careful management of machine IDs and system clocks
  • Larger storage size compared to simpler numeric IDs
  • Only roughly ordered across machines, since their clocks are not perfectly synchronized

FlakeID Implementation
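
A minimal Python sketch of the 64/48/16-bit layout described above might look like this. FlakeGenerator is a hypothetical class name; the sketch uses uuid.getnode() as a stand-in for reading a MAC address and is not thread-safe.

```python
import time
import uuid
from typing import Optional


class FlakeGenerator:
    """Hypothetical 128-bit generator: 64-bit ms timestamp | 48-bit worker | 16-bit sequence."""

    def __init__(self, worker_id: Optional[int] = None):
        # uuid.getnode() returns a 48-bit hardware address (or a random
        # 48-bit number if none can be found).
        self.worker_id = uuid.getnode() if worker_id is None else worker_id
        if not 0 <= self.worker_id < (1 << 48):
            raise ValueError("worker_id must fit in 48 bits")
        self.last_ms = -1
        self.sequence = 0

    def next_id(self) -> int:
        now = time.time_ns() // 1_000_000
        if now < self.last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self.last_ms:
            # More than one ID in the same millisecond: increment the sequence.
            self.sequence += 1
            if self.sequence >= (1 << 16):
                raise RuntimeError("sequence exhausted for this millisecond")
        else:
            # Clock ticked forward: reset the sequence to 0.
            self.sequence = 0
        self.last_ms = now
        return (now << 64) | (self.worker_id << 16) | self.sequence


gen = FlakeGenerator()
print(hex(gen.next_id()))
```

The resulting integer can then be rendered in any compact text encoding, such as Base62, for use in URLs.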

11. Base62

Base62 encoding represents IDs using digits plus uppercase and lowercase letters (62 symbols), making them URL-friendly and compact. Each character carries log2(62) ≈ 5.95 bits, so a 22-character Base62 string holds about 22 × 5.95 ≈ 131 bits, enough for a 128-bit value.

Example ID: 1B2M2Y8AsgTpgAmY7PhCfg

Pros:

  • URL-friendly: contains no special characters
  • Suitable for web applications needing URL-friendly IDs
  • Fast generation with compact encoding
  • Compact storage due to Base62 encoding

Cons:

  • Not inherently collision-resistant in high-concurrency environments
  • Requires careful management to ensure uniqueness
  • Complexity in encoding and decoding compared to numeric IDs
  • No inherent orderability

Base62 implementation
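
As a small illustration, the Python sketch below encodes and decodes non-negative integers in Base62. The alphabet order (digits, then uppercase, then lowercase) is an assumption, since there is no single standard ordering, and the function names are hypothetical.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    """Decode a Base62 string back into an integer."""
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    return number


largest_128_bit = (1 << 128) - 1
encoded = base62_encode(largest_128_bit)
print(encoded, len(encoded))
```

Note that the encoding only changes how an ID is written; uniqueness still has to come from whatever produced the underlying number.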

Summary

When selecting a unique identifier format for your application, consider the specific requirements, including uniqueness, scalability, performance, orderability, and storage efficiency. Each type of identifier has its tradeoffs, and the best choice will depend on your application's context.

  • Database Auto-Increment: Simple and efficient for single-node applications, but not suitable for distributed systems.
  • Twitter Snowflake: Best for high-throughput, time-ordered IDs in distributed systems.
  • UUID v4: Ideal for global uniqueness without central coordination but with larger storage needs.
  • UUID v7: Adds timestamp-based sorting while staying compatible with existing UUID systems.
  • ULID: Combines timestamp-based ordering with global uniqueness, suitable for sortable IDs.
  • KSUID: Offers time-sortable IDs with a large 128-bit random payload, though with larger storage requirements.
  • MongoDB ObjectID: A balanced approach for systems needing roughly ordered IDs with reasonable storage efficiency.
  • CUID: Best for high-concurrency environments needing collision resistance.
  • NanoID: Ideal for secure, URL-friendly unique IDs with customizable length.
  • Sonyflake: Optimized for single data center deployment with efficient 64-bit IDs.
  • FlakeID: Suitable for distributed systems needing 128-bit unique IDs.
  • Base62: Compact, URL-friendly IDs for web applications needing readable IDs.

By understanding these tradeoffs, you can make an informed decision that aligns with your application's needs for uniqueness, performance, and simplicity.
