The Wrong Database Connection: A Go Deadlock Story

Posted on Aug 2, 2025

TL;DR: Using the wrong connection handle inside an open transaction makes each worker hold one pooled connection while waiting for a second. When concurrent executions exhaust a limited connection pool this way, the application deadlocks.


Imagine you’re a software engineer (and you probably are, considering you’re reading this). One day you log in at work, check your workload status or your Grafana dashboard, or, if you were diligent enough to set up proper alerting, you get paged via PagerDuty, only to discover your application has been stuck for minutes, hours, or even days.

Not dead, just stuck in an idle state, unresponsive to any events. Or even worse: liveness/readiness probes start failing seemingly at random, restarting your service and leaving you with few insights to debug. And if no proper alerting or monitoring is in place, you won’t easily detect this at all.

Unfortunately, this has happened to me twice in previous roles. In both cases, the issue boiled down to the same subtle problem: the wrong use of database connections and transactions within the same connection pool, causing a terrible deadlock at the application level.

Let me explain this clearly, with the hope it will save your day in the future. I’ll provide examples in Go since it’s the language I’m most familiar with, but the concept applies to other languages as well.

The Setup

Suppose you have a backend application that relies on a PostgreSQL database. Here’s the simplest way to connect to a database and set up a connection pool in Go:

// Create DB Config
// Note: the Postgres driver must also be imported for its side effects,
// e.g. _ "github.com/lib/pq"
connString := "host=%s port=%d user=%s password=%s dbname=%s sslmode=disable"
databaseUrl := fmt.Sprintf(connString, host, port, user, password, dbname)

db, err := sql.Open("postgres", databaseUrl)
if err != nil {
    panic(err)
}
err = db.Ping()
if err != nil {
    panic(err)
}
log.Println("Successfully connected to the db")

// Setup connection pool size
// 4 is an arbitrary number for this example
db.SetMaxOpenConns(4)

Let’s focus on the db.SetMaxOpenConns(N) method. According to the documentation:

// SetMaxOpenConns sets the maximum number of open connections to the database.

This means you can have at most N open connections from your process to the database (4 in this example). When the maximum number of connections is reached, goroutines will wait until an existing connection is released. Once that happens, the connection is acquired to perform whatever operation is needed.
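To see this blocking behavior in isolation, here is a minimal sketch (my own aside, not part of the setup above) that pins all four connections with db.Conn and lets a fifth request wait until one is released:

// Sketch: demonstrate how the pool blocks once SetMaxOpenConns is reached.
ctx := context.Background()

var held []*sql.Conn
for i := 0; i < 4; i++ {
    c, err := db.Conn(ctx) // each call pins one pooled connection
    if err != nil {
        log.Fatal(err)
    }
    held = append(held, c)
}

go func() {
    // Blocks here: all 4 connections are in use.
    c, err := db.Conn(ctx)
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()
    log.Println("got a connection once one was released")
}()

time.Sleep(time.Second)
held[0].Close() // returning one connection to the pool unblocks the goroutine above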

The code that looks fine

Let’s expand our example by adding concurrent workers that use the connection pool to perform transactional operations against the database:

numberOfWorkers := 2        // case numberOfWorkers < size of connection pool
// numberOfWorkers := 4     // case numberOfWorkers == size of connection pool
// numberOfWorkers := 10    // case numberOfWorkers > size of connection pool

for i := range numberOfWorkers {
    go func(id int) {
        log.Printf("Running worker %d\n", id)

        tx, err := db.BeginTx(context.Background(), &sql.TxOptions{})
        if err != nil {
            log.Fatalf("worker %d failed to create tx\n", id)
        }
        defer tx.Rollback()

        _, err = db.Exec("INSERT INTO recipes (id, name, description, created_at) VALUES ($1, $2, $3, $4)", 
            id, fmt.Sprintf("Pizza %d", id), "Just a pizza", time.Now())
        if err != nil {
            log.Fatalf("worker %d failed query\n", id)
        }
        err = tx.Commit()
        if err != nil {
            log.Fatalf("worker %d failed committing tx\n", id)
        }
        log.Printf("worker %d ended\n", id)
    }(i)
}

Some of you may have already spotted what’s wrong with this code, but in production codebases with layers of wrappers and nested methods, such issues aren’t nearly as obvious. Let’s continue and see what happens when we run this code.
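One detail not shown above: for the runs below to print the “worker N ended” lines, the main goroutine has to wait for the workers instead of exiting right away. A minimal way to do that (a sketch, not the exact harness used for these logs) is to wrap the same worker body in a sync.WaitGroup:

// Sketch: make main wait for all workers to finish (or hang).
var wg sync.WaitGroup
for i := range numberOfWorkers {
    wg.Add(1)
    go func(id int) {
        defer wg.Done()
        // ... same transactional worker body as above ...
    }(i)
}
wg.Wait()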

Executing the code

When we run our code with a number of workers less than the connection pool size:

2025/08/08 11:53:32 Successfully connected to the db
2025/08/08 11:53:32 Running worker 1
2025/08/08 11:53:32 Running worker 0
2025/08/08 11:53:37 worker 1 ended
2025/08/08 11:53:37 worker 0 ended

Everything works fine. Now let’s increase the number of workers to 4 (equal to the connection pool size):

2025/08/08 11:59:10 Successfully connected to the db
2025/08/08 11:59:10 Running worker 3
2025/08/08 11:59:10 Running worker 0
2025/08/08 11:59:10 Running worker 1
2025/08/08 11:59:10 Running worker 2
2025/08/08 11:59:15 worker 2 ended
2025/08/08 11:59:15 worker 0 ended
2025/08/08 11:59:15 worker 1 ended
2025/08/08 11:59:15 worker 3 ended

Still working fine. Now let’s increase the number of workers to exceed the connection pool size:

2025/08/08 12:00:44 Successfully connected to the db
2025/08/08 12:00:44 Running worker 9
2025/08/08 12:00:44 Running worker 3
2025/08/08 12:00:44 Running worker 7
2025/08/08 12:00:44 Running worker 4
2025/08/08 12:00:44 Running worker 0
2025/08/08 12:00:44 Running worker 2
2025/08/08 12:00:44 Running worker 8
2025/08/08 12:00:44 Running worker 5
2025/08/08 12:00:44 Running worker 6
2025/08/08 12:00:44 Running worker 1

No workers ended this time—the application entered a deadlock state.

Investigating the issue

At this point, what should be the next step to get more insights? In my case, it was using a profiler. You can achieve this in Go by instrumenting your application with pprof. One of the simplest ways to use it is by exposing a web server that serves runtime profiling data:

// The pprof handlers are registered on http.DefaultServeMux by the blank import:
// import _ "net/http/pprof"
go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()

One interesting thing you can get from pprof, besides CPU and memory profiles, is the full goroutine stack dump by accessing http://localhost:6060/debug/pprof/goroutine?debug=2. This gives you something like:

goroutine 28 [select]:
database/sql.(*DB).conn(0xc000111450, {0x866310, 0xb10e20}, 0x1)
	/usr/local/go/src/database/sql/sql.go:1369 +0x425
database/sql.(*DB).exec(0xc000111450, {0x866310, 0xb10e20}, {0x7f09d8, 0x4f}, {0xc000075f10, 0x4, 0x4}, 0xbe?)
	/usr/local/go/src/database/sql/sql.go:1689 +0x54
// ... more stack trace
main.main.func1(0x9)
	/home/carmelor/Dev/go/src/github.com/rioloc/go-demo/main.go:72 +0x2b8

goroutine 33 [chan receive]:
database/sql.(*Tx).awaitDone(0xc00025e000)
	/usr/local/go/src/database/sql/sql.go:2212 +0x29
created by database/sql.(*DB).beginDC in goroutine 28

goroutine 51 [chan receive]:
database/sql.(*Tx).awaitDone(0xc0000b0100)
	/usr/local/go/src/database/sql/sql.go:2212 +0x29
created by database/sql.(*DB).beginDC in goroutine 25
// ...

The full dump can be found here.

By inspecting the dump more carefully, we can see the evidence of the problem:

goroutine 19 [select]: database/sql.(*DB).conn() // Waiting for connection
goroutine 20 [select]: database/sql.(*DB).conn() // Waiting for connection  
goroutine 21 [select]: database/sql.(*DB).conn() // Waiting for connection
// ... and so on

The [select] state means these goroutines are blocked in a select, waiting on multiple channel operations; here they are stuck inside database/sql waiting for a free connection. Meanwhile, other goroutines are holding active transactions:

goroutine 33 [chan receive]: database/sql.(*Tx).awaitDone() // Active transaction
goroutine 51 [chan receive]: database/sql.(*Tx).awaitDone() // Active transaction
// etc.

The awaitDone() goroutines are transaction monitors that wait for the transaction to be committed, rolled back, or canceled—they’re doing their job correctly.

What we have is a resource deadlock where all available database connections are held by transactions that aren’t progressing, while other goroutines indefinitely wait for those same resources.
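Besides pprof, database/sql exposes pool statistics that make this kind of exhaustion visible. A small addition (a sketch using the sql.DBStats fields) that logs them periodically:

// Sketch: periodically log connection pool statistics.
go func() {
    for range time.Tick(5 * time.Second) {
        s := db.Stats()
        log.Printf("open=%d inUse=%d idle=%d waitCount=%d waitDuration=%s",
            s.OpenConnections, s.InUse, s.Idle, s.WaitCount, s.WaitDuration)
    }
}()

During the hang, you would expect inUse to sit at the pool size (4 here) while waitCount keeps growing.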

The Root Cause

Let’s examine our worker code again, focusing on this critical part:

tx, err := db.BeginTx(context.Background(), &sql.TxOptions{})
if err != nil {
    log.Fatalf("worker %d failed to create tx\n", id)
}
defer tx.Rollback()

// THE BUG IS HERE ↓
_, err = db.Exec("INSERT INTO recipes (id, name, description, created_at) VALUES ($1, $2, $3, $4)", 
    id, fmt.Sprintf("Pizza %d", id), "Just a pizza", time.Now())
if err != nil {
    log.Fatalf("worker %d failed query\n", id)
}
err = tx.Commit()

This code is:

  1. Beginning a transaction, which acquires a connection from the pool
  2. Using the db client to execute a query, which tries to acquire another connection
  3. Committing or rolling back based on the operation status

The problem is that using the db client after creating a transaction results in double connection usage. Here’s exactly how the deadlock occurs:

  • Worker 0 begins a transaction → acquires connection 1 (Pool: 1/4 used)
  • Worker 1 begins a transaction → acquires connection 2 (Pool: 2/4 used)
  • Worker 2 begins a transaction → acquires connection 3 (Pool: 3/4 used)
  • Worker 3 begins a transaction → acquires connection 4 (Pool: 4/4 used)
  • Worker 0 calls db.Exec() → tries to acquire a second connection, but the pool is exhausted
  • Worker 1 calls db.Exec() → tries to acquire a second connection, but the pool is exhausted
  • Worker 2 calls db.Exec() → tries to acquire a second connection, but the pool is exhausted
  • Worker 3 calls db.Exec() → tries to acquire a second connection, but the pool is exhausted
  • Deadlock! Every worker is waiting for a connection that will never be released.
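As an aside (not part of the original scenario), you don’t even need multiple workers to hit this: with a pool of a single connection, one goroutine deadlocks on itself. A minimal sketch:

// Sketch: single-goroutine variant of the same bug.
db.SetMaxOpenConns(1)

tx, err := db.BeginTx(context.Background(), &sql.TxOptions{})
if err != nil {
    log.Fatal(err)
}
defer tx.Rollback()

// Blocks forever: the pool's only connection is already held by tx.
_, err = db.Exec("SELECT 1")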

The Fix

The issue causing all this trouble is the wrong use of db.Exec() instead of tx.Exec(). The correct way is to use the transaction handle, which uses the same connection that the transaction already holds:

tx, err := db.BeginTx(context.Background(), &sql.TxOptions{})
if err != nil {
    log.Fatalf("worker %d failed to create tx\n", id)
}
defer tx.Rollback()

// FIXED: Use tx.Exec() instead of db.Exec()
_, err = tx.Exec("INSERT INTO recipes (id, name, description, created_at) VALUES ($1, $2, $3, $4)", 
    id, fmt.Sprintf("Pizza %d", id), "Just a pizza", time.Now())
if err != nil {
    log.Fatalf("worker %d failed query\n", id)
}
err = tx.Commit()

It’s remarkable how two characters (db vs tx) can halt your entire production system. While this may seem simple to spot in this example, in large production codebases it can be much harder to detect. In my case, it was randomly affecting one pod in our Kubernetes deployment at unpredictable times, especially during high load and concurrency spikes.

Prevention Strategies

How can you avoid this? I see several alternatives:

1. Proper Concurrency Testing
Test your application under realistic concurrent load to spot such issues before production.
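A sketch of what such a test could look like (openTestDB and runWorker are hypothetical helpers standing in for your own setup and worker logic):

// Sketch: fail the test if workers cannot finish within a deadline,
// which is a strong hint of connection pool exhaustion.
func TestWorkersDoNotExhaustPool(t *testing.T) {
    db := openTestDB(t) // hypothetical helper returning a *sql.DB
    db.SetMaxOpenConns(4)

    const workers = 10 // deliberately more than the pool size
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            runWorker(ctx, db, id) // hypothetical: the transactional code under test
        }(i)
    }

    done := make(chan struct{})
    go func() { wg.Wait(); close(done) }()

    select {
    case <-done:
        // all workers finished, no pool exhaustion
    case <-ctx.Done():
        t.Fatal("workers did not finish in time: possible connection pool deadlock")
    }
}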

2. Code Structure Design
Structure your code so you can’t accidentally open new connections within a transactional block.
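One way to enforce this (a sketch of a common pattern, not code from this post) is a helper that owns the transaction lifecycle and only hands the *sql.Tx to the caller, so there is no bare *sql.DB in scope inside the transactional block:

// Sketch: callers never touch the *sql.DB inside the callback.
func withTx(ctx context.Context, db *sql.DB, fn func(tx *sql.Tx) error) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    if err := fn(tx); err != nil {
        tx.Rollback() // best effort; the callback's error is what we return
        return err
    }
    return tx.Commit()
}

// Usage:
// err := withTx(ctx, db, func(tx *sql.Tx) error {
//     _, err := tx.Exec("INSERT INTO recipes ...")
//     return err
// })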

3. Use pgxpool for Better Control
For Go applications, consider using pgxpool, which provides more granular control over the connection pool. The explicit Acquire()/Release() pattern makes it much clearer when you’re using connections:

// Error handling omitted for brevity; assumes github.com/jackc/pgx/v5
// and github.com/jackc/pgx/v5/pgxpool.
conn, err := pool.Acquire(ctx)
defer conn.Release()

tx, err := conn.BeginTx(ctx, pgx.TxOptions{})
defer tx.Rollback(ctx)

// It's much harder to accidentally use pool.Exec() here:
// the transaction is bound to the connection we explicitly acquired.
_, err = tx.Exec(ctx, "INSERT INTO recipes ...")
tx.Commit(ctx)

4. Monitoring and Alerting
I intentionally avoid mentioning palliative measures like connection timeouts, which may mask the underlying issue while causing performance degradation. Instead, implement proper liveness/readiness probes paired with alerts on restart frequency. This provides a good tradeoff between keeping the system running and being notified when something isn’t behaving correctly.
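On the probe side, a readiness endpoint can be as small as a ping with a short deadline; when the pool is exhausted, the ping cannot obtain a connection in time and the probe fails (a sketch, the endpoint path is arbitrary):

// Sketch: readiness handler that fails when the pool cannot serve a connection.
http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
    defer cancel()
    if err := db.PingContext(ctx); err != nil {
        http.Error(w, "database not reachable: "+err.Error(), http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
})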

Key lessons

  • Test under realistic concurrent load
  • Be consistent in resource usage (always be aware of which connection each query runs on)
  • Use the right tools, like pprof, when debugging mysterious hangs