DOC-6252 sections about failover behaviour when all endpoints are unhealthy #2768
andy-stark-redis wants to merge 2 commits into main
Thanks @dwdougherty!
ggivo left a comment
LGTM from Jedis perspective
in the [Retry configuration]({{< relref "#retry-configuration" >}}) section). However, if the client exhausts
all the available failover attempts before any endpoint becomes healthy again, commands will throw a `JedisPermanentlyNotAvailableException`. The client won't recover automatically from this situation, so you
should handle it by reconnecting with the `MultiDBClient` builder after a suitable delay (see
[Failover configuration](#failover-configuration) for a connection example).
On a second look, I don’t think this is technically correct.
Even after a JedisPermanentlyNotAvailableException, if an endpoint becomes healthy again, the client can recover.
JedisPermanentlyNotAvailableException just means that there were no healthy connections for a configured amount of time, so we treat it as a permanent error at that moment. It doesn’t necessarily mean the client is incapable of recovering later.
It also looks like we’re missing an integration test for this scenario — e.g. recovery after a JedisPermanentlyNotAvailableException has already been thrown.
@atakavci, any concerns if we clarify this behavior in the docs around JedisPermanentlyNotAvailableException, given that the client can recover?
@ggivo, agreed.
JedisPermanentlyNotAvailableException is how Jedis signals to the application that the "all unhealthy" state has been stable for some period of time and that the configured number of attempts (with the configured delay between them) is already exhausted. On receiving this type of exception, the application can then decide how to react to a consistent/stable availability issue.
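To make the "application decides how to react" point concrete, here is a minimal sketch of one possible reaction: wait, then try the command again, on the assumption (per the comments above) that a later attempt can succeed once an endpoint becomes healthy. It is not taken from Jedis. `PermanentlyNotAvailableException` is a hypothetical stand-in for `JedisPermanentlyNotAvailableException` so that the example compiles without Jedis on the classpath, and `callWithRetry`, `maxRounds`, and `delayMillis` are illustrative names only.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryAfterUnavailable {

    // Hypothetical stand-in for JedisPermanentlyNotAvailableException
    // (an assumption, so the sketch needs no Jedis dependency).
    static class PermanentlyNotAvailableException extends RuntimeException {
        PermanentlyNotAvailableException(String message) {
            super(message);
        }
    }

    // Runs the command. If it reports "permanently not available", waits
    // delayMillis and tries again, up to maxRounds attempts in total.
    // Rethrows the last exception if every attempt fails.
    static <T> T callWithRetry(Callable<T> command, int maxRounds, long delayMillis)
            throws Exception {
        if (maxRounds < 1) {
            throw new IllegalArgumentException("maxRounds must be at least 1");
        }
        PermanentlyNotAvailableException last = null;
        for (int round = 1; round <= maxRounds; round++) {
            try {
                return command.call();
            } catch (PermanentlyNotAvailableException e) {
                last = e;
                if (round < maxRounds) {
                    Thread.sleep(delayMillis); // give endpoints time to recover
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated command: fails twice, then behaves as if an endpoint
        // has become healthy again.
        String result = callWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new PermanentlyNotAvailableException("no healthy endpoint");
            }
            return "OK";
        }, 5, 10);
        System.out.println(result + " after " + calls.get() + " calls");
    }
}
```

In a real application the lambda would wrap an actual client command and the catch clause would name the real Jedis exception. Whether a plain retry is enough, or whether the app has to switch endpoints explicitly, is the open question in this thread.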
@atakavci @ggivo OK, so after the app gets a JedisPermanentlyNotAvailableException, does Jedis still keep trying to find a healthy endpoint automatically in the background (so if you try a command again a bit later, it might succeed)? Or do you have to add some code to handle this explicitly from the app (e.g., use isHealthy to check all the current endpoints and then use setActiveDatabase to start using a healthy endpoint if you can find one)?
Added info about this based on customer feedback. The corresponding section for the Lettuce geo failover page will be added in a separate PR.