Timeouts

Modern software rarely runs in isolation. Most of the time the successful operation of a system depends on multiple services, with HTTP calls, database queries, queues and so on acting as the glue between them. Let's spend some time thinking about what timeouts really mean in a more complex environment.

Reading from a socket is easy, you might think. Why should I care about the low-level stuff? My framework handles these things for me. Even if I needed to drop down to the low level, I would just call read() on a file descriptor and out comes the data, right? Well yes, usually. However, it's the cases where it isn't that easy that you need to focus on. It might be shocking to learn that many programming languages and frameworks ship with extremely poor timeout values, and if you want your software to behave properly you definitely need to pay attention to timeouts.

Proper timeout values are always a compromise. Timeouts that are too short might waste resources if timeouts are handled poorly on the server side, while timeouts that are too long will frustrate users and cause your system to break suddenly. Infinite timeouts are the worst, since they will just gobble up all the resources and will surely exhaust your system's ability to do meaningful work.

Generally, from an operational point of view, it is better to fail fast than to be stuck forever. Failing fast gives you options, while infinite timeouts reduce your options for reacting to errors.

TCP

Before discussing HTTP, REST and other higher-level protocols, let's first concentrate on the low-level stuff, namely TCP connections. First of all, you should always configure a connection timeout. You don't want to wait a long time and then fail. You want to get feedback quickly.

Setting TCP timeouts is highly dependent on where you are running and what services you are calling. Running inside a K8S cluster and calling other services in the same cluster is very different from calling services over the Internet.

You want to set the TCP connection timeout to roughly 3-4x the round-trip time (RTT) from your service to the remote service being called. This gives your service some headroom if something out of the ordinary is going on in the network and causes a slowdown. Note! Allow some time for DNS lookups as well, since name resolution often counts against the connection timeout. Go's net.DialTimeout, for example, includes it.

Consider a service running in the same K8S cluster. The RTT is probably at most 2-10ms. Setting the TCP connection timeout to 20-40ms is a reasonable value; it will fail fairly fast if the server is not performing as it should. However, a service running on a different continent or in a different geographical region will have an RTT of 30-200ms, and for this kind of service the connection timeout should be somewhere around 90-800ms.

Server:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	listener, err := net.Listen("tcp", "127.0.0.1:9090")
	if err != nil {
		fmt.Println("Error listening:", err.Error())
		return
	}
	defer listener.Close()
	fmt.Println("Listening on :9090...")

	for {
		conn, err := listener.Accept()
		if err != nil {
			fmt.Println("Error accepting: ", err.Error())
			return
		}
		go handleRequest(conn)
	}
}

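// handleRequest simulates a slow server: it waits one second before replying.
func handleRequest(conn net.Conn) {
	defer conn.Close()
	time.Sleep(1000 * time.Millisecond)
	if _, err := conn.Write([]byte("hello\n")); err != nil {
		fmt.Println("Error writing:", err.Error())
	}
}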

Client:

package main

import (
	"bufio"
	"fmt"
	"net"
	"time"
)

func main() {
	server := "127.0.0.1:9090"
	connectionTimeout := 3 * time.Millisecond
	fmt.Printf("Setting connection timeout to %v\n", connectionTimeout)
	conn, err := net.DialTimeout("tcp", server, connectionTimeout)
	if err != nil {
		fmt.Println("Error connecting:", err.Error())
		return
	}
	defer conn.Close()
	fmt.Printf("Connected to %s\n", server)

	readTimeout := 3000 * time.Millisecond
	fmt.Printf("Setting read timeout to %v\n", readTimeout)
	conn.SetReadDeadline(time.Now().Add(readTimeout))

	reader := bufio.NewReader(conn)
	response, err := reader.ReadString('\n')
	if err != nil {
		fmt.Println("Error reading:", err.Error())
		return
	}

	fmt.Print("Server response: ", response)
}

We explicitly set the TCP connection timeout to 3ms, which is plenty when running on the same laptop. If we set the connection timeout to 700 microseconds instead, we would occasionally get an error while opening the connection.

Read timeouts

Once we have successfully opened a TCP connection, we usually want to transmit data back and forth, which opens up a whole new can of worms.

Both writing to and reading from a socket can have their own timeouts.

In our client example above we set a timeout of 3000 milliseconds (3s) for the read operation. In a similar way you can set a timeout for write operations by calling:

writeTimeout := 3000 * time.Millisecond // write timeout duration
conn.SetWriteDeadline(time.Now().Add(writeTimeout))

Other timeouts

The previous examples targeted pure TCP connections rather than HTTP. Configuring timeouts for HTTP servers depends heavily on the language and the framework being used. To make your software more robust you should at least consider the following:

  • DNS lookups and their timeouts
  • TLS handshake timeouts
  • Keepalive timeouts
  • etc

Go specific

Some Go-specific tips and tricks…

You can customize the timeouts of Go's net/http server when you initialize it:

srv := &http.Server{
	ReadTimeout:       2 * time.Second,
	WriteTimeout:      2 * time.Second,
	IdleTimeout:       15 * time.Second,
	ReadHeaderTimeout: 3 * time.Second,
	TLSConfig:         tlsConfig,
	Handler:           srvMux,
}

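These are connection-level timeouts. They limit how long the server spends reading a request and writing a response, but they do not stop a handler that is still running. To put a time limit on the handler itself, wrap your handler functions with http.TimeoutHandler(h Handler, dt time.Duration, msg string) Handler, which replies with 503 Service Unavailable once dt has passed and cancels the request context so the handler can stop its work. Additionally, get familiar with http.ResponseController, which enables setting read and write deadlines on a per-request basis.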