Skip to content
Menu
Portfolio
  • Artificial Intelligence
  • Personal Projects
  • Assignments
  • Algorithms
  • Notes
  • Home
Portfolio

TechPi Forum

Posted on July 12, 2025July 14, 2025

Snowflake‑based ID generation system

Snowflake algorithm core properties:

  • Global Uniqueness: No duplicate IDs should ever be generated.
  • Trending (Roughly) Increasing: Especially important for MySQL InnoDB, which uses clustered indexes based on B-trees. Sequential IDs help ensure high write performance.
  • Monotonic Increase: Useful in cases such as versioning or message ordering — the next ID must always be larger than the previous.
  • Obfuscation & Security: Sequential IDs are predictable. If exposed (e.g., as order numbers), they can be exploited. Sometimes you need IDs that are non-sequential or randomized.

Snowflake uses a 64-bit long to store an ID composed of 4 parts:

  1. Sign Bit (1 bit): Always 0 to ensure the ID is positive.
  2. Timestamp (41 bits): Millisecond-level timestamp.
  3. Worker ID (10 bits): Represents the node ID, allowing up to 1024 machines.
  4. Sequence Number (12 bits): Auto-increment sequence within the same millisecond, allowing up to 4096 IDs per worker per ms.

Our generator implementation:

public class PiSnowflakeIdGenerator implements IdGenerator {

    /**
     * Number of bits used for the sequence number.
     */
    private static final long SEQUENCE_BITS = 10L;

    /**
     * Number of bits used for the worker ID.
     */
    private static final long WORKER_ID_BITS = 7L;

    /**
     * Number of bits used for the data center ID.
     */
    private static final long DATA_CENTER_BITS = 3L;

    private static final long SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1;

    private static final long WORKER_ID_LEFT_SHIFT_BITS = SEQUENCE_BITS;
    private static final long DATACENTER_LEFT_SHIFT_BITS = SEQUENCE_BITS + WORKER_ID_BITS;
    private static final long TIMESTAMP_LEFT_SHIFT_BITS = WORKER_ID_LEFT_SHIFT_BITS + WORKER_ID_BITS + DATA_CENTER_BITS;

    /**
     * Worker ID (7 bits)
     */
    private long workId = 1;

    /**
     * Data center ID (3 bits)
     */
    private long dataCenter = 1;

    /**
     * Last timestamp recorded
     */
    private long lastTime;

    /**
     * Sequence number
     */
    private long sequence;

    private byte sequenceOffset;

    public PiSnowflakeIdGenerator() {
        try {
            String ip = IpUtil.getLocalIp4Address();
            String[] cells = StringUtils.split(ip, ".");
            this.dataCenter = Integer.parseInt(cells[0]) & ((1 << DATA_CENTER_BITS) - 1);
            this.workId = Integer.parseInt(cells[3]) >> 16 & ((1 << WORKER_ID_BITS) - 1);
        } catch (Exception e) {
            this.dataCenter = 1;
            this.workId = 1;
        }
    }

    public PiSnowflakeIdGenerator(int workId, int dataCenter) {
        this.workId = workId;
        this.dataCenter = dataCenter;
    }

    /**
     * Generate a trend-incrementing ID.
     *
     * @return Unique ID
     */
    @Override
    public synchronized Long nextId() {
        long nowTime = waitToIncrDiffIfNeed(getNowTime());
        if (lastTime == nowTime) {
            if (0L == (sequence = (sequence + 1) & SEQUENCE_MASK)) {
                // Sequence number used up in the current time unit; wait for the next second.
                nowTime = waitUntilNextTime(nowTime);
            }
        } else {
            // Alternate the starting sequence value between 0 and 1
            vibrateSequenceOffset();
            sequence = sequenceOffset;
        }
        lastTime = nowTime;

        long ans = ((nowTime % DateUtil.ONE_DAY_SECONDS) << TIMESTAMP_LEFT_SHIFT_BITS)
                 | (dataCenter << DATACENTER_LEFT_SHIFT_BITS)
                 | (workId << WORKER_ID_LEFT_SHIFT_BITS)
                 | sequence;

        if (log.isDebugEnabled()) {
            log.debug("seconds:{}, datacenter:{}, work:{}, seq:{}, ans={}",
                    nowTime % DateUtil.ONE_DAY_SECONDS, dataCenter, workId, sequence, ans);
        }

        return Long.parseLong(String.format("%s%011d", getDaySegment(nowTime), ans));
    }

    /**
     * If the current time is earlier than the last recorded time, wait until the clock catches up to avoid duplicates.
     *
     * @param nowTime Current timestamp
     * @return Adjusted timestamp
     */
    private long waitToIncrDiffIfNeed(final long nowTime) {
        if (lastTime <= nowTime) {
            return nowTime;
        }
        long diff = lastTime - nowTime;
        AsyncUtil.sleep(diff);
        return getNowTime();
    }

    /**
     * Wait until the next time unit (second).
     *
     * @param lastTime Previous timestamp
     * @return Next timestamp
     */
    private long waitUntilNextTime(final long lastTime) {
        long result = getNowTime();
        while (result <= lastTime) {
            result = getNowTime();
        }
        return result;
    }

    /**
     * Toggle the sequence offset between 0 and 1 to avoid fixed starting sequence.
     */
    private void vibrateSequenceOffset() {
        sequenceOffset = (byte) (~sequenceOffset & 1);
    }

    /**
     * Get the current time in seconds.
     *
     * @return Current time in seconds
     */
    private long getNowTime() {
        return System.currentTimeMillis() / 1000;
    }

    /**
     * Build a date-based prefix using year and day-of-year format.
     *
     * @param time Timestamp
     * @return Date segment prefix
     */
    private static String getDaySegment(long time) {
        LocalDateTime localDate = DateUtil.time2LocalTime(time * 1000L);
        return String.format("%02d%03d", localDate.getYear() % 100, localDate.getDayOfYear());
    }
}

Our implementation details:

  • Uses seconds instead of milliseconds
  • Uses a year+day prefix to IDs
  • Adjusts bit ratio: workerId:dataCenterId = 3:7
  • If time goes backward, it waits instead of throwing an error
  • Alternates the starting value of the sequence between 0 and 1 to avoid even-only IDs

image credit: https://levelup.gitconnected.com/exploring-snowflake-algorithm-for-global-unique-primary-keys-in-distributed-systems-7b1adf1d435d

Pages: 1 2 3 4 5

CATEGORIES

  • Artificial Intelligence
  • Personal Projects
  • Notes
  • Algorithms

  • University of Maryland
  • CMSC426 - Computer Vision
  • CMSC320 - Introduction to Data Science
  • CMSC330 - Organization of Programming Languages
  • CMSC216 - Introduction to Computer Systems
©2025 Portfolio | WordPress Theme by Superbthemes.com