Kishore Karanam archive  

Implementing Bitcask in Go

May 30, 2025.

#computers

Imagine you're building a database. Not something huge like PostgreSQL, but a smaller, simpler, faster storage engine. Something that's easy to reason about, yet powerful enough for real-world applications. That's what Bitcask is all about: a minimal yet extremely efficient key-value storage engine designed to perform reads and writes lightning-fast.

In this post, we'll walk through exactly how we implemented the Bitcask paper in Go, focusing not only on what we built but also on why we built it this way, and what makes it special. We won't rush; each idea, each piece of code will make sense as we go along.

First, let's grasp the core idea behind Bitcask. Its strategy is straightforward but powerful: every write is an append to a log file, and an in-memory hash table (the keyDir) maps each key to the exact file and offset where its latest value lives. Writes are sequential appends; reads are a single seek.
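The snippets below lean on a few declarations that never appear in the post: a FileOffset struct, the keyDir map, and an activeFileSize counter. Here's a minimal sketch, reconstructed from how the later code uses them (the field names are assumptions):

```go
package main

import "fmt"

// FileOffset records where the latest value for a key lives on disk:
// which file, and the byte offset of the entry's flag byte within it.
type FileOffset struct {
	FileID string // e.g. "data.txt" or "data_1748466343.log"
	Offset int64  // byte offset of the entry's start within FileID
}

// keyDir is the in-memory index: every key maps to the on-disk
// location of its most recent value.
var keyDir = make(map[string]FileOffset)

// activeFileSize tracks how many bytes we've appended to data.txt,
// so we know when it's time to rotate.
var activeFileSize int64

func main() {
	keyDir["name"] = FileOffset{FileID: "data.txt", Offset: 0}
	fmt.Println(keyDir["name"].FileID) // data.txt
}
```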

Let's begin with how our Bitcask handles storing data. For every PUT operation (storing a key-value pair), we don't overwrite old data. Instead, we simply append new entries to a file named data.txt.

Here's how we handle writing each entry:

bitcask.go
// Constants for entry flags
const (
    flagNormal   byte = 0x01
    flagTombstone byte = 0x02
)

// WriteEntry appends a key-value pair to the active file
func WriteEntry(w *bufio.Writer, key, value []byte) {
    // Write entry header
    w.WriteByte(flagNormal)  // Normal entry flag
    binary.Write(w, binary.BigEndian, uint32(len(key)))   // Key length
    binary.Write(w, binary.BigEndian, uint32(len(value))) // Value length
    
    // Write actual data
    w.Write(key)
    w.Write(value)
    
    // Update active file size tracker
    activeFileSize += int64(1 + 8 + len(key) + len(value))  // 1 byte flag + 8 bytes for lengths
}

Each record is clearly structured: a flag indicating if it's a normal entry or a tombstone (for deletion), the key length, the value length, then the actual key and value bytes. No mystery here, simple and straightforward.
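To make that layout concrete, here's a small standalone roundtrip: encode one entry into a buffer exactly as WriteEntry does, then decode it back. The encodeEntry/decodeEntry names are ours, and the buffer stands in for data.txt:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const flagNormal byte = 0x01

// encodeEntry lays out one record the same way WriteEntry does:
// flag, key length, value length, key bytes, value bytes.
func encodeEntry(key, value []byte) []byte {
	var buf bytes.Buffer
	buf.WriteByte(flagNormal)
	binary.Write(&buf, binary.BigEndian, uint32(len(key)))
	binary.Write(&buf, binary.BigEndian, uint32(len(value)))
	buf.Write(key)
	buf.Write(value)
	return buf.Bytes()
}

// decodeEntry reverses encodeEntry.
func decodeEntry(b []byte) (flag byte, key, value []byte) {
	r := bytes.NewReader(b)
	flag, _ = r.ReadByte()
	var kLen, vLen uint32
	binary.Read(r, binary.BigEndian, &kLen)
	binary.Read(r, binary.BigEndian, &vLen)
	key = make([]byte, kLen)
	value = make([]byte, vLen)
	r.Read(key)
	r.Read(value)
	return
}

func main() {
	rec := encodeEntry([]byte("name"), []byte("bitcask"))
	fmt.Println(len(rec)) // 9-byte header + 4 + 7 = 20
	flag, k, v := decodeEntry(rec)
	fmt.Println(flag == flagNormal, string(k), string(v)) // true name bitcask
}
```

The 9-byte header (1 flag byte plus two 4-byte lengths) is why the activeFileSize update above adds `1 + 8` to the key and value lengths.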

But we face a challenge. As we continue writing entries, our file will grow indefinitely. We solve this problem using a strategy called file rotation. Once our active file (data.txt) exceeds a certain size (we chose 100 bytes for demonstration), we rotate the file. Rotating means renaming the current active file to a timestamped log file, for example, data_1748466343.log. After that, we start fresh again with an empty data.txt.
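The rotation code below sorts files by the timestamp embedded in their names via extractTimestamp, which the post doesn't list. A plausible version, assuming the data_&lt;unix-seconds&gt;.log naming above, could look like:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// extractTimestamp pulls the Unix timestamp out of names like
// "data_1748466343.log" or "data_1748466343.hint". It returns 0
// when the name doesn't match the expected pattern.
func extractTimestamp(name string) int64 {
	s := strings.TrimPrefix(name, "data_")
	if i := strings.LastIndex(s, "."); i >= 0 {
		s = s[:i] // drop the ".log" / ".hint" extension
	}
	ts, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0
	}
	return ts
}

func main() {
	fmt.Println(extractTimestamp("data_1748466343.log")) // 1748466343
}
```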

Here's exactly how we perform file rotation safely and atomically in our code:

bitcask.go
// RotateFile handles file rotation and compaction
func RotateFile(oldW *bufio.Writer, keyDir map[string]FileOffset) (*bufio.Writer, error) {
    // Flush any pending writes before moving the file
    if err := oldW.Flush(); err != nil {
        return oldW, fmt.Errorf("flush failed: %w", err)
    }
    
    // Acquire file lock to prevent concurrent access
    lock := flock.New("data.txt.lock")
    if err := lock.Lock(); err != nil {
        return oldW, fmt.Errorf("lock file failed: %w", err)
    }
    defer lock.Unlock()

    // Generate new log filename with timestamp
    timestamp := fmt.Sprintf("%d", time.Now().Unix())
    newLogName := fmt.Sprintf("data_%s.log", timestamp)
    if err := os.Rename("data.txt", newLogName); err != nil {
        return oldW, fmt.Errorf("rotate rename failed: %w", err)
    }

    // Create new active file
    f, err := os.OpenFile("data.txt", os.O_CREATE|os.O_RDWR|os.O_APPEND, 0644)
    if err != nil {
        return oldW, err
    }
    newW := bufio.NewWriter(f)

    // Update keyDir references
    for k, fo := range keyDir {
        if fo.FileID == "data.txt" {
            keyDir[k] = FileOffset{FileID: newLogName, Offset: fo.Offset}
        }
    }
    activeFileSize = 0

    // Find all log files and sort by timestamp (newest first)
    files, _ := filepath.Glob("data_*.log")
    sort.Slice(files, func(i, j int) bool {
        return extractTimestamp(files[i]) > extractTimestamp(files[j])
    })
    
    // Compact files
    MergeFiles(files, keyDir)

    // Rename compacted file
    compactedLog := "compacted_data.txt"
    os.Rename(compactedLog, newLogName)

    // Create hint file for faster startup
    hintFileName := fmt.Sprintf("data_%s.hint", timestamp)
    hintFile, _ := os.Create(hintFileName)
    for key, fo := range keyDir {
        if fo.FileID == newLogName {
            // Write key length, key, and offset
            binary.Write(hintFile, binary.BigEndian, uint32(len(key)))
            hintFile.Write([]byte(key))
            binary.Write(hintFile, binary.BigEndian, uint64(fo.Offset))
        }
    }
    hintFile.Close()

    // Clean up old log files
    for _, oldLog := range files {
        if oldLog != newLogName {
            os.Remove(oldLog)
        }
    }

    // Rebuild keyDir from hint files
    freshKeyDir, _ := RebuildKeyDir()
    for k := range keyDir {
        delete(keyDir, k)
    }
    for k, fo := range freshKeyDir {
        keyDir[k] = fo
    }
    
    return newW, nil
}

When we rotate files, we rename the active file to a new name and create a fresh active file. But we also need to rebuild the key index, what we call the keyDir, so it points to the right locations. To do that, we use compaction.

Compaction takes multiple rotated files and merges them into one single file containing only the most recent entries for each key. It also drops entries marked as deleted, known as "tombstones." This keeps storage efficient and ensures recovery is quick.

Next comes the neat part: hint files.

Hint files are small, secondary files that store a quick index of keys to their offsets in the compacted logs. With these tiny hint files, we drastically speed up the startup process. Instead of scanning entire log files (which could be huge), we just quickly read these small hint files at startup to rebuild our in-memory keyDir.

Here's how our RebuildKeyDir function uses hint files:

bitcask.go
// RebuildKeyDir reconstructs the in-memory index from hint files
func RebuildKeyDir() (map[string]FileOffset, error) {
    keyDir := make(map[string]FileOffset)

    // Find all hint files and sort by timestamp (oldest first)
    hints, _ := filepath.Glob("data_*.hint")
    sort.Slice(hints, func(i, j int) bool {
        return extractTimestamp(hints[i]) < extractTimestamp(hints[j])
    })

    // Process each hint file
    for _, hintFile := range hints {
        // Determine corresponding log file
        logFile := strings.TrimSuffix(hintFile, ".hint") + ".log"
        
        f, _ := os.Open(hintFile)
        r := bufio.NewReader(f)
        
        // Read hint file entries
        for {
            var keyLen uint32
            if err := binary.Read(r, binary.BigEndian, &keyLen); err != nil {
                break // io.EOF here means we've read every entry
            }
            
            // Read key
            key := make([]byte, keyLen)
            io.ReadFull(r, key)
            
            // Read offset
            var offset uint64
            binary.Read(r, binary.BigEndian, &offset)
            
            // Add to keyDir
            keyDir[string(key)] = FileOffset{
                FileID: logFile, 
                Offset: int64(offset),
            }
        }
        f.Close()
    }

    return keyDir, nil
}

This is why Bitcask shines: when we restart our database, we don't have to slowly scan huge data files. We rapidly scan these tiny hint files. Our database is back up and running, with a full, fresh index in seconds or even milliseconds, even for millions of keys.

Finally, let's briefly touch on the GET operation. It's elegantly simple: Given a key, we just look it up in our keyDir, find the offset, open the file listed in FileOffset, seek to that offset, and read out the value directly. It's a single seek operation, extremely fast.

bitcask.go
// Get looks up a key in keyDir, seeks to its offset, and reads the value
func Get(key string, keyDir map[string]FileOffset) (string, error) {
    fo, ok := keyDir[key]
    if !ok {
        return "", fmt.Errorf("key not found")
    }
    f, err := os.Open(fo.FileID)
    if err != nil {
        return "", err
    }
    defer f.Close()

    f.Seek(fo.Offset, io.SeekStart)
    flag := make([]byte, 1)
    f.Read(flag)
    if flag[0] == flagTombstone {
        return "", fmt.Errorf("key '%s' was deleted", key)
    }

    var kLen, vLen uint32
    binary.Read(f, binary.BigEndian, &kLen)
    binary.Read(f, binary.BigEndian, &vLen)
    f.Seek(int64(kLen), io.SeekCurrent)

    valBuf := make([]byte, vLen)
    io.ReadFull(f, valBuf)
    return string(valBuf), nil
}

By following this process, we built a clean, robust implementation of the Bitcask storage engine in Go, closely sticking to the original design outlined in the paper. It's simple, efficient, and very fast. Bitcask reminds us that simplicity often outperforms complexity, and that clever design decisions can lead to elegant solutions that scale beautifully in the real world.

Credits

Phil Eaton told me to read papers and implement them. This was last year. He suggested Delta Lake at the time, so I did it in Python. Then came the busy, the lazy, the everything. It stopped. March arrived, and the interest returned. A kind Reddit user, anonymous and random, suggested Bitcask: simpler, good for starting. So I worked for a week. Maybe. Then again, busy, lazy, everything. It stopped again. Now it's May. Here I am, done. So thanks to Phil Eaton for the idea to read papers and implement them. And thanks to that Reddit user, kind and random, dude or dudess? Who knows. For the Bitcask paper. It was fun, in the end.