Avoiding Data Races in Go

The following diagram illustrates a common scenario that can lead to a data race in a web application. An updater runs in a single goroutine (a goroutine is similar to a lightweight thread) and is responsible for periodically querying a database and updating some internal application state. At the same time, a listener, such as a web server, spawns new goroutines as needed to handle incoming requests, and each of these goroutines needs read access to the shared state.

The data race we need to avoid is a “partial read”, where a handler goroutine attempts to read from the shared state while the updater goroutine is writing to it. How can we construct the shared state object to avoid this race condition?

What doesn’t work

As a simple example, let’s say our internal state just consists of a slice of users. If we just store the shared state in a plain struct, and both the updater and the handlers access the fields directly, we are at risk of partial reads:

type State struct {
    Users []User
}

Solutions

Method 1: Mutex + Accessors

sync.Mutex is a mutual exclusion lock. Only one goroutine at a time can hold the lock. If a goroutine tries to acquire the lock while it’s held by another goroutine, the goroutine requesting the lock will block until the lock is available. To prevent the data race, create a mutex as part of the State struct and only expose the data fields through accessor functions that acquire and release the lock when called.

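type State struct {
    mu    sync.Mutex // guards users
    users []User
}

// GetUsers returns the current users slice. Callers must treat it as read-only.
func (s *State) GetUsers() []User {
    s.mu.Lock()
    defer s.mu.Unlock()
    return s.users
}

// SetUsers replaces the users slice with a new one.
func (s *State) SetUsers(users []User) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.users = users
}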
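To see how the accessors fit into the updater/handler scenario described at the top, here is a minimal sketch. The User field, the fetchUsers helper, and the goroutine counts are assumptions made up for illustration; a real updater would query the database on a timer.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// User is a placeholder; the article does not define its fields.
type User struct {
	Name string
}

type State struct {
	mu    sync.Mutex // guards users
	users []User
}

func (s *State) GetUsers() []User {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.users
}

func (s *State) SetUsers(users []User) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.users = users
}

// fetchUsers is a hypothetical stand-in for the database query.
// It always builds a fresh slice, so earlier results are never modified.
func fetchUsers(round int) []User {
	return []User{{Name: fmt.Sprintf("user-%d", round)}}
}

func main() {
	state := &State{}
	state.SetUsers(fetchUsers(0))

	var wg sync.WaitGroup

	// Updater goroutine: swaps in a new slice on every refresh.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for round := 1; round <= 3; round++ {
			state.SetUsers(fetchUsers(round))
			time.Sleep(time.Millisecond)
		}
	}()

	// Handler goroutines: only read through the accessor.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = len(state.GetUsers())
		}()
	}

	wg.Wait()
	fmt.Println("users after final refresh:", len(state.GetUsers()))
}
```

The important design choice is that the updater replaces the whole slice instead of modifying it in place, so a handler that already holds an older slice keeps seeing a consistent view.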

With a plain Mutex, only one goroutine can hold the lock at any given time, readers included. If there are a large number of concurrent readers (as can easily happen with a web server handling many requests), they all queue up behind each other, and each could end up waiting a long time to acquire the lock. There is a better solution for this case: sync.RWMutex.

Method 2: RWMutex + Accessors

sync.RWMutex is a reader/writer mutual exclusion lock: it can be held by any number of readers OR by exactly one writer. For our read-heavy use case, RWMutex is a better fit because we are happy to allow multiple goroutines to read the data simultaneously. But when the updater goroutine acquires the lock, we want all the readers to wait until the updater releases it. RWMutex lets us achieve exactly that.

The code looks almost the same as for the plain Mutex, except we use RLock() and RUnlock() to acquire and release the lock when reading:

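type State struct {
    mu    sync.RWMutex // guards users
    users []User
}

// GetUsers takes the read lock, so many readers can run at once.
func (s *State) GetUsers() []User {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.users
}

// SetUsers takes the write lock, which excludes all readers and other writers.
func (s *State) SetUsers(users []User) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.users = users
}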

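That’s it! You can safely call SetUsers() from your update goroutine(s) and GetUsers() from your reader goroutines without a data race on the State fields. One caveat: the lock protects the slice header stored in the struct, not the elements it points to. This stays safe only as long as the updater swaps in a fresh slice each time and nobody modifies a slice after passing it to SetUsers() or receiving it from GetUsers().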
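If you can't guarantee that callers will leave the slices alone, one option is to copy on the way in and on the way out. This is a sketch of that variation, not something the accessors above do; the User field and the UsersSnapshot name are made up for illustration, and the extra copies cost an allocation per call.

```go
package main

import (
	"fmt"
	"sync"
)

// User is a placeholder; the article does not define its fields.
type User struct {
	Name string
}

type State struct {
	mu    sync.RWMutex // guards users
	users []User
}

// UsersSnapshot (hypothetical name) returns a copy of the users slice,
// so the caller can modify the result without touching shared state.
func (s *State) UsersSnapshot() []User {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]User, len(s.users))
	copy(out, s.users)
	return out
}

// SetUsers stores a private copy, so later changes to the caller's
// slice are not visible to readers.
func (s *State) SetUsers(users []User) {
	cp := make([]User, len(users))
	copy(cp, users)

	s.mu.Lock()
	defer s.mu.Unlock()
	s.users = cp
}

func main() {
	s := &State{}
	s.SetUsers([]User{{Name: "alice"}, {Name: "bob"}})

	snap := s.UsersSnapshot()
	snap[0].Name = "mallory" // changes the copy only

	fmt.Println(s.UsersSnapshot()[0].Name) // prints "alice"
}
```

Note that copy() is shallow: if User contained pointers, maps, or slices, those would still be shared and would need their own handling.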

Real-world examples

One of the things I love about Go is the standard library. It’s a great place to find well-documented, idiomatic code. The log package makes use of a sync.Mutex to serialize concurrent writes from multiple goroutines to the same output destination.

An example of sync.RWMutex in the wild can be found in go-cache, a generic thread-safe in-memory cache, which uses sync.RWMutex to protect access to an underlying map.

Data Race Detection

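Go comes with a built-in data race detector, which can be used to find partial reads and other unsynchronized accesses to shared memory in your code. It only reports races that actually happen while the program is running, so the racy code paths need to be exercised concurrently.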

To detect a data race in a web server, you can run the server code with:

go run -race myserver.go

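Then you can use a tool like ApacheBench (ab) to send a large number of concurrent requests to the server while the updater is running. If a read and a write overlap without synchronization, the detector prints a report with the stack traces of the conflicting accesses.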
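If you'd rather not set up an external load tool, you can get a similar effect in-process by calling the handler from many goroutines while an updater runs, and then running the program (or a test built the same way) with the -race flag. The sketch below assumes the RWMutex-based State from earlier; usersHandler and hammer are hypothetical helpers, and the request counts are arbitrary.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// User is a placeholder; the article does not define its fields.
type User struct {
	Name string
}

type State struct {
	mu    sync.RWMutex // guards users
	users []User
}

func (s *State) GetUsers() []User {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.users
}

func (s *State) SetUsers(users []User) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.users = users
}

// usersHandler (hypothetical) reads the shared state on every request.
func usersHandler(s *State) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%d", len(s.GetUsers()))
	})
}

// hammer (hypothetical) sends n concurrent in-process requests to h while
// update runs in its own goroutine. It returns the number of 200 responses.
func hammer(h http.Handler, n int, update func()) int {
	var (
		wg sync.WaitGroup
		mu sync.Mutex // guards ok
		ok int
	)

	wg.Add(1)
	go func() {
		defer wg.Done()
		update()
	}()

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			rec := httptest.NewRecorder()
			req := httptest.NewRequest(http.MethodGet, "/users", nil)
			h.ServeHTTP(rec, req)
			if rec.Code == http.StatusOK {
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}()
	}

	wg.Wait()
	return ok
}

func main() {
	state := &State{}
	ok := hammer(usersHandler(state), 200, func() {
		for i := 0; i < 50; i++ {
			state.SetUsers(make([]User, i))
		}
	})
	fmt.Println("successful responses:", ok)
}
```

With the locks in place, running this under -race should stay quiet. Swapping the accessors for direct field access is a quick way to see what a race report looks like.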

Avoiding starvation

This blog post explains how Go avoids the issue of starvation, which is when a goroutine is never able to acquire the lock because too many other goroutines are competing for it.

sync/atomic

The sync/atomic package has primitives that allow for atomic updates of data without the need for locks, but it is generally not recommended for use in application code: “These functions require great care to be used correctly. Except for special, low-level applications, synchronization is better done with channels or the facilities of the sync package. Share memory by communicating; don’t communicate by sharing memory.”
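For completeness, here is roughly what the same State could look like with sync/atomic, using atomic.Value to swap the whole slice in one step. Treat this as a sketch under the same assumptions as before (the User field is made up, and the updater always stores a fresh slice), not as a recommendation over the mutex versions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// User is a placeholder; the article does not define its fields.
type User struct {
	Name string
}

type State struct {
	users atomic.Value // only ever holds a []User
}

// GetUsers loads the current slice without taking a lock.
// Callers must treat the result as read-only.
func (s *State) GetUsers() []User {
	// Load returns nil before the first Store; the two-value type
	// assertion turns that into a nil slice instead of panicking.
	users, _ := s.users.Load().([]User)
	return users
}

// SetUsers atomically replaces the stored slice. Every call must store
// the same concrete type ([]User), or atomic.Value panics.
func (s *State) SetUsers(users []User) {
	s.users.Store(users)
}

func main() {
	s := &State{}
	fmt.Println(len(s.GetUsers())) // prints 0

	s.SetUsers([]User{{Name: "alice"}, {Name: "bob"}})
	fmt.Println(len(s.GetUsers())) // prints 2
}
```

This only works because each update is a single value swap. As soon as an update has to touch several fields together, a mutex is usually the simpler and safer tool.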