
Kubernetes probes cause nginx HTTP 499 error

HTTP 499 error

I split Nginx and PHP-FPM from a single container into two containers in a Kubernetes cluster. When the Kubernetes deployment started rolling out a new application version, I found a strange error:

36#36: *1 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 10.0.0.1, server: _, request: "GET /api/health HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.0.0.1:80"

It says the client closed the connection prematurely, so the upstream connection (to PHP-FPM in this case) was closed too. To understand the root cause, we need to find out which client closed the connection.

After going through the logs carefully, I found that the liveness and readiness probes had failed once. These probes send HTTP GET requests, and when they closed the connection before a response arrived, Nginx recorded HTTP 499, its non-standard status code for a request closed by the client.

But why?

Probes have a configuration parameter called timeoutSeconds, which defaults to 1 second. In my case, the probes did not get an HTTP response from Nginx because PHP-FPM could not finish processing the request within one second. The probes then closed the connection and sent a new request on the next attempt.

To fix this issue, I simply increased the timeoutSeconds from 1 to 5.
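
As a rough sketch of that change, the probe section of the container spec could look like the fragment below. The path and port are taken from the log line above; using the same endpoint for both probes is an assumption, so adjust it to match your own manifest.

```yaml
# Fragment of a container spec inside the Deployment's pod template.
# Path and port come from the logged request; everything else is illustrative.
livenessProbe:
  httpGet:
    path: /api/health
    port: 80
  timeoutSeconds: 5   # raised from the default of 1 second
readinessProbe:
  httpGet:
    path: /api/health
    port: 80
  timeoutSeconds: 5   # raised from the default of 1 second
```

A longer timeout only hides the symptom if the health endpoint is slow for other reasons, so it is worth checking why PHP-FPM needs more than a second to answer.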

Other notes

In old versions of Kubernetes (before 1.20), you could use an exec probe instead of an httpGet probe to avoid the HTTP 499 error, because timeoutSeconds was not respected for exec probes: the command kept running indefinitely, even past its configured deadline, until it returned a result. In Kubernetes 1.20+, exec probes honour timeoutSeconds. If you still want the old behaviour, you can restore it by setting the ExecProbeTimeout feature gate to false on the kubelet.
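
For reference, a minimal sketch of turning that feature gate off through a kubelet configuration file is shown below. It assumes you manage the kubelet configuration yourself (managed clusters often do not expose it) and that the gate still exists in your Kubernetes version, so check the feature gate list for your release first.

```yaml
# KubeletConfiguration fragment; apply it on every node whose kubelet runs the probes.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ExecProbeTimeout: false   # exec probes ignore timeoutSeconds again, as before 1.20
```

The same setting can be passed on the kubelet command line as --feature-gates=ExecProbeTimeout=false.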