Tracing makes a bug easy to spot

Today, I found a bug before I noticed it. Like, it was subtle, and so I wasn’t quite sure I saw it–maybe I hadn’t hit refresh yet? Later, I looked at the trace of my function and boom, there was a clear bug.

Here’s the function with the bug. It responds to a request to /win by saving a record of the win and returning the total of my winnings so far. Can you spot the problem in the TypeScript?

the code pictured is repeated in text below
in my IDE with syntax highlighting

  public respondToWin = async (req: express.Request, res: express.Response) => {

    const username = "jessitron";
    const record = new Winnings(username, 1);;

    const { total_winnings } = await this.winRepository
      .select("SUM(winnings)", "total_winnings")
      .where("wins.username = :username", { username })

    res.send("Total so far:" + total_winnings);

It’s subtle.

Now here’s a trace in Honeycomb:

the /win call runs for 6.2ms;
the save starts about 1ms in and runs for 14.8ms;
the select starts about 4ms in and runs for 2.2ms
a call to /win, with spans showing when everything happens. It shows calls to middleware, the database save (connect, then start, then insert, then commit) and the database select (connect, then select).

Now do you see the bug?

Hint: the problem is that the total_winnings returned does not include the winnings represented by this call.

In the trace, I can see that the select ran concurrently with the save.

same trace, annotated to highlight that the "select" span covers some of the same time range as the "save" span

Ohhh, so the result of the select definitely is not including the record I just inserted.

In the code, the bug is an absence before this line:;

I need to await the promise returned by save. Here’s the fix:

same code, with 'await' inserted before the save.

Now I remember that the first time I tested /win, I got back zero total winnings the first time. But then on refresh, it went up and kept going up. I thought maybe I saw that first one wrong — nope, it was a sneaky bug.

This one is sneaky because it’s a problem with something that is not there. It takes an expert to notice a dangerous absence. Meanwhile, in the trace, I saw something that was there: the select running at the same time as the save. If you want to get better at concurrent programming, this is the kind of tool that builds your expertise.

(For the record, this trace is created by auto-instrumentation for Express, TypeORM, and Postgres. It’s generic configuration that works in any Node app with these libraries.)

Spans in a trace are great for showing you when stuff happened in relation to other stuff. Use them to check your asynchronous code. Include tracing in your development process!