Gracefully terminate a program in Go

This post shows how to terminate a Go program gracefully, without cutting off work that is already in progress.

Let’s implement some dummy task to run.

package main

import (
    "fmt"
    "time"
)

type Task struct {
    ticker *time.Ticker
}

func (t *Task) Run() {
    for {
        select {
        case <-t.ticker.C:
            handle()
        }
    }
}

func handle() {
    for i := 0; i < 5; i++ {
        fmt.Print("#")
        time.Sleep(time.Millisecond * 200)
    }
    fmt.Println()
}

func main() {
    task := &Task{
        ticker: time.NewTicker(time.Second * 2),
    }
    task.Run()
}

Every two seconds, Task.Run() calls handle(), which prints five ‘#’ symbols with a 200ms delay after each one.

If we terminate the running program by pressing Ctrl+C in the middle of handle(), we’re left with a partly done job.

$ go run main.go
#####
###^Csignal: interrupt

But we want our program to handle the interrupt signal gracefully, i.e. finish the currently running handle() and, possibly, perform some cleanup. First, let’s capture Ctrl+C. Note that we receive from channel c in a separate goroutine. Otherwise, the receive would block main, and we would never reach task.Run().

func main() {
    task := &Task{
        ticker: time.NewTicker(time.Second * 2),
    }

    // signal.Notify does not block when sending, so c needs a buffer of at least 1.
    c := make(chan os.Signal, 1)
    signal.Notify(c, os.Interrupt)

    go func() {
        select {
        case sig := <-c:
            fmt.Printf("Got %s signal. Aborting...\n", sig)
            os.Exit(1)
        }
    }()

    task.Run()
}

Now, if we interrupt in the middle of handle(), we’ll get this:

$ go run main.go
#####
##^CGot interrupt signal. Aborting...
exit status 1

Well, apart from seeing our own message instead of the default one, nothing has changed: handle() is still cut off midway.

Graceful exit

There is a pattern for a graceful exit that uses a channel.

type Task struct {
    closed chan struct{}
    ticker *time.Ticker
}

The channel tells all interested parties that we intend to stop the Task. That’s why it’s called closed, although that’s just a convention. The element type of the channel doesn’t matter, which is why it’s usually chan struct{}. What matters is that a receive from this channel succeeds. Every long-running process that wants to shut down gracefully listens on this channel in addition to doing its actual job, and terminates once the receive succeeds.

In our example, the long-running process is the Run() method.

func (t *Task) Run() {
    for {
        select {
        case <-t.closed:
            return
        case <-t.ticker.C:
            handle()
        }
    }
}

If the receive from the closed channel succeeds, we simply exit from Run() with return.

To express the intention to terminate the task, we could send a value on the channel. But we can do better. Since a receive from a closed channel returns the zero value immediately [1], we can just close the channel, which wakes up every receiver rather than only one.

func (t *Task) Stop() {
    close(t.closed)
}
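To see why closing works better than sending, here is a minimal sketch. The helper name notifyAll and the acks channel are made up for this illustration and are not part of the Task code. It starts several goroutines that all wait on the same channel and wakes them with a single close.

```go
package main

import "fmt"

// notifyAll is a hypothetical helper: it starts n goroutines that block on
// the same done channel, closes the channel once, and counts how many of
// them woke up.
func notifyAll(n int) int {
	done := make(chan struct{})
	acks := make(chan int, n) // buffered so the goroutines never block on send

	for i := 0; i < n; i++ {
		go func(id int) {
			<-done // blocks until done is closed
			acks <- id
		}(i)
	}

	close(done) // a single close releases every receiver

	woken := 0
	for i := 0; i < n; i++ {
		<-acks
		woken++
	}
	return woken
}

func main() {
	fmt.Println(notifyAll(3)) // prints 3

	// A receive from a closed channel never blocks; ok reports false.
	done := make(chan struct{})
	close(done)
	_, ok := <-done
	fmt.Println(ok) // prints false
}
```

A send on an unbuffered channel would be received by only one of the waiting goroutines, so with several listeners we would have to send once per listener. Closing avoids that bookkeeping.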

We call this method upon receiving an interrupt signal. Before we can close the channel, we first need to create it with make (closing a nil channel panics).

func main() {
    task := &Task{
        closed: make(chan struct{}),
        ticker: time.NewTicker(time.Second * 2),
    }

    c := make(chan os.Signal, 1) // buffered: signal.Notify does not block when sending
    signal.Notify(c, os.Interrupt)

    go func() {
        select {
        case sig := <-c:
            fmt.Printf("Got %s signal. Aborting...\n", sig)
            task.Stop()
        }
    }()

    task.Run()
}

Let’s try pressing Ctrl+C in the middle of handle() now.

$ go run main.go
#####
##^CGot interrupt signal. Aborting...
###

This works. Even though we got an interrupt signal, the currently running handle() finished printing.

Waiting for a goroutine to finish

But there is a tricky part. This works because task.Run() is called from the main goroutine, while the interrupt signal is handled in another one. When the signal is caught and task.Stop() is called, that other goroutine exits, while the main goroutine finishes the current handle(), goes back to the select in Run(), receives from the t.closed channel, and returns.

What if we run task.Run() outside the main goroutine? Like this:

func main() {
    // previous code...

    go task.Run()

    select {
    case sig := <-c:
        fmt.Printf("Got %s signal. Aborting...\n", sig)
        task.Stop()
    }
}

If you interrupt the execution now, the currently running handle() will not finish, because the program terminates immediately. This happens because, once the interrupt signal is caught and processed, the main goroutine has nothing more to do (task.Run() is executing in another goroutine) and simply returns, and a Go program exits as soon as main returns. To fix this, we need to wait for the task to finish. This is where sync.WaitGroup helps.

First, we associate a WaitGroup with our Task:

type Task struct {
    closed chan struct{}
    wg     sync.WaitGroup
    ticker *time.Ticker
}

We tell the WaitGroup to wait for one background goroutine to finish, namely the one running task.Run().

func main() {
    // previous code...

    task.wg.Add(1)
    go func() { defer task.wg.Done(); task.Run() }()

    // other code...
}

Finally, we need to actually wait for task.Run() to finish. This happens in Stop():

func (t *Task) Stop() {
    close(t.closed)
    t.wg.Wait()
}
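One thing to watch with this Stop(): calling it a second time panics, because closing an already closed channel is a runtime error. The sketch below is one possible way to guard against that, not part of the original pattern. The once field, the started and finished counters, and the runAndStop helper are hypothetical additions, and the timings are shortened so the example finishes quickly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Task mirrors the struct above; once and the two counters are
// hypothetical additions for this sketch.
type Task struct {
	closed   chan struct{}
	wg       sync.WaitGroup
	ticker   *time.Ticker
	once     sync.Once
	started  int // jobs begun; written only by the Run goroutine
	finished int // jobs completed; written only by the Run goroutine
}

func (t *Task) Run() {
	for {
		select {
		case <-t.closed:
			return
		case <-t.ticker.C:
			t.started++
			time.Sleep(50 * time.Millisecond) // stands in for handle()
			t.finished++
		}
	}
}

// Stop can be called repeatedly: only the first call stops the ticker and
// closes the channel, and every call waits for Run to return.
func (t *Task) Stop() {
	t.once.Do(func() {
		t.ticker.Stop()
		close(t.closed)
	})
	t.wg.Wait()
}

// runAndStop starts a task, stops it shortly afterwards (most likely in the
// middle of a job), and reports the counters once Stop has returned.
func runAndStop() (started, finished int) {
	t := &Task{
		closed: make(chan struct{}),
		ticker: time.NewTicker(20 * time.Millisecond),
	}
	t.wg.Add(1)
	go func() { defer t.wg.Done(); t.Run() }()

	time.Sleep(40 * time.Millisecond)
	t.Stop()
	t.Stop() // no panic: the second call only waits, and Run has already returned
	return t.started, t.finished
}

func main() {
	started, finished := runAndStop()
	fmt.Println(started == finished) // prints true: no job was cut off
}
```

Reading the counters after Stop() is safe because wg.Wait() returns only after the Run goroutine has called Done().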

The full code:

package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"time"
)

type Task struct {
	closed chan struct{}
	wg     sync.WaitGroup
	ticker *time.Ticker
}

func (t *Task) Run() {
	for {
		select {
		case <-t.closed:
			return
		case <-t.ticker.C:
			handle()
		}
	}
}

func (t *Task) Stop() {
	close(t.closed)
	t.wg.Wait()
}

func handle() {
	for i := 0; i < 5; i++ {
		fmt.Print("#")
		time.Sleep(time.Millisecond * 200)
	}
	fmt.Println()
}

func main() {
	task := &Task{
		closed: make(chan struct{}),
		ticker: time.NewTicker(time.Second * 2),
	}

	c := make(chan os.Signal, 1) // buffered: signal.Notify does not block when sending
	signal.Notify(c, os.Interrupt)

	task.wg.Add(1)
	go func() { defer task.wg.Done(); task.Run() }()

	select {
	case sig := <-c:
		fmt.Printf("Got %s signal. Aborting...\n", sig)
		task.Stop()
	}
}

Update: Ahmet Alp Balkan pointed out that the pattern used in this post is more error-prone and should probably be avoided in favor of a pattern based on the context package. For details, read Make Ctrl+C cancel the context.Context.

Notes

[1] Channel Axioms
