One Thing You Might Overlook When Reading Response Body in Go




The Go net/http package provides a robust set of tools for working with HTTP requests and responses. Reading the response body is a common task, but it requires a bit of caution. In this post, we will explore a pitfall I ran into when streaming the response body in chunks, and how to avoid it.

How to Read the Response Body

There are different ways of reading the response body like the following:

resp, err := http.Get("http://example.com/")
if err != nil {
    // handle error
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
// ...

or

resp, err := http.Get("https://example.com")
if err != nil {
    // handle error
}
defer resp.Body.Close()
// dump the full response (status line, headers, and body) as a byte slice
dump, err := httputil.DumpResponse(resp, true)
// ...

Both examples load the entire response into memory as a byte slice: io.ReadAll returns only the body, while httputil.DumpResponse returns the status line and headers followed by the body. We will not go into detail about these approaches; there are plenty of resources available on them.

What if the response body is huge and we want to avoid the memory and performance cost of loading the whole response before it can be processed? In that case, we can stream the response body and process it in chunks.

Reading the Response Body in Chunks

resp, err := http.Get("https://example.com")
if err != nil {
    // handle error
}
defer resp.Body.Close()

buff := make([]byte, 10)  
for {  
    var bytesRead int  
    bytesRead, err = resp.Body.Read(buff)
    if err == io.EOF {
        break
    }
    if err != nil {  
        // handle error
    }
    // send the data to the Send function for further processing.
    send(buff[:bytesRead])  
}

As you can see from the example above, instead of reading the response body all at once, we create a byte slice called buff that holds one chunk at a time. resp.Body is an io.ReadCloser, so it implements the io.Reader interface, which lets us stream on demand by calling the Read(p []byte) method. Finally, the last line of the loop passes the data read in the previous step to a hypothetical send function. The [:bytesRead] expression slices buff so that it contains only the bytes actually read.

To keep reading until the whole body has been consumed, we put the Read call in an infinite loop. The loop breaks when io.EOF is returned, indicating that the end of the response body has been reached. This may sound familiar: it is pretty much the same as reading a file in chunks using the os package (more detail here).

If you are like me and this loop looks fine to you, the next section shows what can go wrong with it.

io.EOF Might Not Be What You Think

The code snippet appears correct at first glance, but if we break out of the loop as soon as we see io.EOF, without checking how many bytes were returned alongside it, we risk dropping data. The chunks we process may then not match what the remote server actually sent.

Let's have a look at how the Read method is described in the official documentation of the Reader Interface:

When Read encounters an error or end-of-file condition after successfully reading n > 0 bytes, it returns the number of bytes read. It may return the (non-nil) error from the same call or return the error (and n == 0) from a subsequent call. An instance of this general case is that a Reader returning a non-zero number of bytes at the end of the input stream may return either err == EOF or err == nil. The next Read should return 0, EOF.
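
The case described in the quoted documentation, a final Read that returns both data and io.EOF, can be reproduced without a network by using iotest.DataErrReader from the standard library's testing/iotest package, which wraps a reader so that the final error arrives together with the final data. The sketch below is only an illustration: readNaive and naiveResult are hypothetical helper names, and the loop is modeled on the chunked example earlier in this post.

```go
package main

import (
	"io"
	"strings"
	"testing/iotest"
)

// readNaive mirrors the chunked loop shown earlier: it stops as soon as
// Read reports io.EOF and ignores any bytes returned by that same call.
func readNaive(r io.Reader, chunkSize int) (string, error) {
	var out strings.Builder
	buf := make([]byte, chunkSize)
	for {
		n, err := r.Read(buf)
		if err == io.EOF {
			break // n may still be > 0 here; those bytes are dropped
		}
		if err != nil {
			return out.String(), err
		}
		out.Write(buf[:n])
	}
	return out.String(), nil
}

// naiveResult (hypothetical helper) runs readNaive over s. When eofWithData
// is true, the source is wrapped in iotest.DataErrReader so that the last
// chunk is delivered together with io.EOF.
func naiveResult(s string, chunkSize int, eofWithData bool) string {
	var r io.Reader = strings.NewReader(s)
	if eofWithData {
		r = iotest.DataErrReader(r)
	}
	got, _ := readNaive(r, chunkSize) // these in-memory readers only fail with io.EOF
	return got
}
```

Under these assumptions, naiveResult("0123456789abcde", 10, false) returns the full string, while naiveResult("0123456789abcde", 10, true) returns only "0123456789": the last five bytes arrive together with io.EOF and are discarded by the early break.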

It is tempting to break out of the loop as soon as io.EOF is returned, but that final call may still carry the last chunk of data, with return values such as 5, EOF, and breaking immediately would lose it. To make the code more robust, we need a more careful check, as illustrated in the following example:

//...

buff := make([]byte, 10)  
for {  
    var bytesRead int  
    bytesRead, err = resp.Body.Read(buff)
    if err == io.EOF {
        err = nil  
        // There may be one last chunk to receive before breaking the loop.
        if bytesRead <= 0 {  
            break  
        }
    }
    if err != nil {  
        // handle error.
    }
    // send the data to the Send function for further processing.
    send(buff[:bytesRead])  
}
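
An alternative arrangement, following the io.Reader documentation's advice that callers should always process the n > 0 bytes returned before considering the error, is to handle the data first and inspect err afterwards. A minimal sketch, in which readChunks and collect are hypothetical names introduced for illustration:

```go
package main

import (
	"io"
	"strings"
	"testing/iotest"
)

// readChunks reads r in chunks of at most chunkSize bytes and passes every
// non-empty chunk to send. The bytes are handled before the error is
// inspected, so a final (n > 0, io.EOF) result is not lost.
func readChunks(r io.Reader, chunkSize int, send func([]byte)) error {
	buf := make([]byte, chunkSize)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			send(buf[:n])
		}
		if err == io.EOF {
			return nil // end of stream is not an error for the caller
		}
		if err != nil {
			return err
		}
	}
}

// collect (hypothetical helper) feeds s through readChunks and returns the
// reassembled data and the number of chunks passed to send. When
// eofWithData is true, the last chunk is delivered together with io.EOF.
func collect(s string, chunkSize int, eofWithData bool) (string, int, error) {
	var r io.Reader = strings.NewReader(s)
	if eofWithData {
		r = iotest.DataErrReader(r)
	}
	var out strings.Builder
	chunks := 0
	err := readChunks(r, chunkSize, func(p []byte) {
		out.Write(p) // copies p, which matters because buf is reused
		chunks++
	})
	return out.String(), chunks, err
}
```

Note that buf is reused on every iteration, so a send function that keeps the data beyond the call needs to copy it, as collect does here.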

The net/http package returns io.EOF together with the final bytes on purpose: doing so lets the HTTP transport code recycle its connection earlier. We can see this in the following snippet from the http package itself:

...
// If we can return an EOF here along with the read data, do
// so. This is optional per the io.Reader contract, but doing
// so helps the HTTP transport code recycle its connection
// earlier (since it will see this EOF itself), even if the
// client doesn't do future reads or Close.
if err == nil && n > 0 {
    if lr, ok := b.src.(*io.LimitedReader); ok && lr.N == 0 {
        err = io.EOF
        b.sawEOF = true
    }
}
...

For those of you who want to delve deeper, you can find the source code here.

Test Exercise

Testing is essential to ensure that our code works as intended, and the best way to gain proficiency is through practical application.

Exercise:

Write a test file to test the following function using both a mock HTTP client and a real HTTP server. Share your findings in the comments section below :)

// HTTPClient is an interface for http client.
type HTTPClient interface {
    Do(req *http.Request) (*http.Response, error)
}
// Download downloads a file from the given url and sends the data to the send function.
func Download(client HTTPClient, url string, bufferSize int64, send func([]byte)) error {
    // TODO
}
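
One possible starting point for the mock half of the exercise is a client whose response body reports io.EOF together with its final bytes, so that the test exercises exactly the case discussed above. This is a sketch, not the sample solution: mockClient and readViaMock are hypothetical names, the URL is a placeholder, and the HTTPClient interface is repeated only to keep the block self-contained.

```go
package main

import (
	"io"
	"net/http"
	"strings"
	"testing/iotest"
)

// HTTPClient is repeated from the exercise so the sketch compiles on its own.
type HTTPClient interface {
	Do(req *http.Request) (*http.Response, error)
}

// mockClient (hypothetical) returns a canned 200 response whose body
// delivers its last bytes together with io.EOF.
type mockClient struct {
	body string
}

// Do implements HTTPClient without touching the network.
func (m mockClient) Do(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Body:       io.NopCloser(iotest.DataErrReader(strings.NewReader(m.body))),
		Request:    req,
	}, nil
}

// readViaMock (hypothetical helper) sends a request through the mock and
// returns the body it served, to check that the mock itself behaves.
func readViaMock(body string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	if err != nil {
		return "", err
	}
	var client HTTPClient = mockClient{body: body}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	return string(data), err
}
```

A test for Download could pass mockClient{body: "..."} as the client and compare the bytes received by send with the configured body, using a buffer size smaller than the body so that several chunks are produced.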

Conclusion

Things are not always as obvious as we think they are, especially when our expectations come from past experience. There is always a blind spot until we look from another angle, as we learned with io.EOF: receiving it does not mean there is no data left to process.

Reading the response body in Go requires a bit of care and attention. You should be aware of pitfalls such as reading the body twice, not closing it, not handling errors properly, and using the wrong decoder. By following best practices and using the right tools, you can avoid these pitfalls and make your HTTP requests more efficient and reliable.

The sample solution to the exercise can be found here. Feel free to clone it, try it out and share your thoughts in the comments below.

I hope you found something useful and that it will save you a lot of valuable debugging time :)

Thank you for reading!

Did you find this article valuable?

Support Ray Yang by becoming a sponsor. Any amount is appreciated!
