
Cluster Upgrade

The basic strategy for upgrading a cluster revolves around the server’s ability to gossip cluster configuration to clients and other servers. When cluster configuration changes, clients become aware of new servers automatically. In case of a disconnect, a client has a list of servers that joined the cluster in addition to the ones it knew about from its connection settings.
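The merging behavior can be sketched in a few lines of Python. This is a minimal illustration, not the code of any NATS client. `merge_server_pool` is a hypothetical helper, and the gossiped list stands in for the client URLs a server advertises. The NATS protocol's INFO message carries such a list in its optional `connect_urls` field.

```python
def merge_server_pool(configured, gossiped):
    """Hypothetical helper: keep the configured URLs first, then append any
    URLs learned from cluster gossip, dropping duplicates."""
    pool = []
    seen = set()
    for url in list(configured) + list(gossiped):
        if url not in seen:
            seen.add(url)
            pool.append(url)
    return pool


# The client was configured with server 'A' only...
configured = ["nats://localhost:4222"]
# ...and the cluster later advertises the client URLs of 'A' and 'B'.
gossiped = ["nats://localhost:4222", "nats://localhost:4333"]

pool = merge_server_pool(configured, gossiped)
print(pool)  # ['nats://localhost:4222', 'nats://localhost:4333']
```

If the client later loses ‘A’, ‘B’ is still in its pool even though it never appeared in the connection settings.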

Note that since each server stores its own permission and authentication configuration, new servers added to a cluster should provide the same users and authorization to prevent clients from getting rejected or gaining unexpected privileges.
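To make the risk concrete, the sketch below compares two hypothetical authorization tables. Each table maps a user to the subjects that user may publish to. The data model and the `auth_drift` helper are illustrative assumptions, not gnatsd's configuration format. The point is only that any difference shows up either as a user the new server would reject or as permissions it would widen.

```python
def auth_drift(reference, candidate):
    """Compare a new server's user table against an existing one.

    Both arguments map user name -> set of permitted publish subjects.
    """
    # Users known to the cluster but missing on the new server: rejected there.
    rejected = sorted(u for u in reference if u not in candidate)
    # Users that only the new server accepts.
    added_users = sorted(u for u in candidate if u not in reference)
    # Users whose permissions grow on the new server.
    broadened = {
        u: sorted(candidate[u] - reference[u])
        for u in sorted(reference)
        if u in candidate and candidate[u] - reference[u]
    }
    return {"rejected": rejected, "added_users": added_users, "broadened": broadened}


server_a = {"alice": {"orders.>"}, "bob": {"metrics.>"}}
server_t = {"alice": {"orders.>", "admin.>"}}  # bob missing, alice widened

print(auth_drift(server_a, server_t))
# {'rejected': ['bob'], 'added_users': [], 'broadened': {'alice': ['admin.>']}}
```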

To describe the scenario, let’s get some fingers on keyboards and go through the motions. Consider a cluster of two servers, ‘A’ and ‘B’. Yes, clusters should have three to five servers, but for the purpose of describing the behavior and the cluster upgrade process, a cluster of two servers will suffice.

Let’s build this cluster:

gnatsd -D -p 4222 -cluster nats://localhost:6222 -routes nats://localhost:6222,nats://localhost:6333

The command above starts gnatsd with debug output enabled (-D), listening for clients on port 4222 (-p), and accepting cluster connections on port 6222 (-cluster). The -routes option specifies a list of NATS URLs where the server will attempt to connect to other servers. These URLs point at the cluster ports of the cluster peers.

Keen readers will notice a self-route. Gnatsd will ignore the self-route, but it makes for a single consistent configuration for all servers.
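One way a server can tell that a route leads back to itself is to compare the identity reported by the remote end with its own. The Python sketch below models that idea with hypothetical `Server` objects and a dictionary standing in for the network. It assumes an ID comparison and is not a transcription of gnatsd's code.

```python
import uuid


class Server:
    """Toy cluster member (hypothetical model, not gnatsd internals)."""

    def __init__(self, cluster_url, routes):
        self.id = uuid.uuid4().hex  # unique identity, like a server ID
        self.cluster_url = cluster_url
        self.routes = routes        # same list on every server
        self.peers = set()


def connect_routes(server, network):
    """Dial each configured route, skipping unreachable ones and the self-route.

    `network` maps a cluster URL to the Server listening there.
    """
    for url in server.routes:
        remote = network.get(url)
        if remote is None:
            continue  # nothing listening yet: "connection refused"
        if remote.id == server.id:
            continue  # self-route: the remote end is this very server
        server.peers.add(remote.id)


routes = ["nats://localhost:6222", "nats://localhost:6333"]
a = Server("nats://localhost:6222", routes)
b = Server("nats://localhost:6333", routes)
network = {a.cluster_url: a, b.cluster_url: b}
for s in (a, b):
    connect_routes(s, network)
print(a.peers == {b.id}, b.peers == {a.id})  # True True
```

Because the self-route is skipped, both servers can share the identical -routes list.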

You will see the server start. Notice that it reports that it cannot connect to ‘localhost:6333’. The exact message reads:

 Error trying to connect to route: dial tcp localhost:6333: connect: connection refused

Let’s fix that by starting the second server:

gnatsd -D -p 4333 -cluster nats://localhost:6333 -routes nats://localhost:6222,nats://localhost:6333

The second server listens for clients on port 4333 and for cluster connections on port 6333. Otherwise, it is configured the same as ‘A’.

Let’s start one client, so we can observe it moving between servers as servers get removed:

nats-sub -s nats://localhost:4222 ">"

Nats-sub is a subscriber sample included with all NATS clients. It subscribes to a subject and prints out any messages it receives. The subject ‘>’ used here is a wildcard that matches every subject. You can find the source code for the Go version of nats-sub [here](https://github.com/nats-io/go-nats/tree/master/examples). After starting the subscriber, you should see a message on ‘A’ that a new client connected.

We have two servers and a client. Time to simulate our rolling upgrade. But wait: before we upgrade ‘A’, let’s introduce a new server, ‘T’. Server ‘T’ will join the existing cluster while we perform the upgrade. Its sole purpose is to provide an additional place where clients can go besides ‘A’, and to ensure we don’t end up with a single server serving all the clients after the upgrade procedure.

Clients randomly select a server when connecting unless a special option disables that behavior (usually called ‘DontRandomize’ or ‘noRandomize’). You can read more about this in “Avoiding the Thundering Herd”. Suffice it to say that clients redistribute themselves roughly evenly among all servers in the cluster. In our case, about half of the clients on ‘A’ will jump over to ‘B’ and the remaining half to ‘T’.
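The redistribution can be illustrated with a small simulation. It is a sketch under simplifying assumptions: each client shuffles its own copy of the server pool and connects to the first entry. The `no_randomize` parameter is a hypothetical stand-in for options such as ‘DontRandomize’ or ‘noRandomize’.

```python
import random


def pick_server(pool, rng, no_randomize=False):
    """Return the server a reconnecting client tries first."""
    candidates = list(pool)
    if not no_randomize:
        rng.shuffle(candidates)  # each client gets its own random order
    return candidates[0]


rng = random.Random(7)   # fixed seed so the run is repeatable
survivors = ["B", "T"]   # 'A' has just been stopped
counts = {s: 0 for s in survivors}
for _ in range(1000):    # 1000 clients that were on 'A'
    counts[pick_server(survivors, rng)] += 1
print(counts)
```

With randomization, the two counts come out close to each other. With `no_randomize=True`, every client would pile onto ‘B’, the first URL in its list.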

Let’s start our temporary server:

gnatsd -D -p 4444 -cluster nats://localhost:6444 -routes nats://localhost:6222,nats://localhost:6333

Within a moment, clients on ‘A’ learn of the new cluster member. In our hands-on tutorial, nats-sub is now aware of three possible servers: ‘A’ (specified when we started the tool), plus ‘B’ and ‘T’, learned from the cluster gossip.

We invoke our admin powers and turn off ‘A’ by pressing CTRL+C in the terminal running ‘A’. We observe that either ‘B’ or ‘T’ reports that a new client connected. That is our nats-sub client.
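The failover step can be sketched as a loop over the client's server pool. `reconnect` and `is_up` are hypothetical names for illustration. Real clients typically also apply reconnect delays and retry limits, which are omitted here.

```python
def reconnect(pool, is_up, failed):
    """Return the first server in the pool, other than the failed one, that is up."""
    for url in pool:
        if url != failed and is_up(url):
            return url
    return None  # nothing reachable; a real client would wait and retry


A, B, T = "nats://localhost:4222", "nats://localhost:4333", "nats://localhost:4444"
running = {B, T}  # 'A' was stopped with CTRL+C

# Pool after gossip: 'A' from the settings, 'B' and 'T' learned from the cluster.
print(reconnect([A, B, T], running.__contains__, failed=A))  # nats://localhost:4333
# Without gossip the pool would hold only 'A', leaving nowhere to go.
print(reconnect([A], running.__contains__, failed=A))        # None
```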

We perform the upgrade process, update the binary for ‘A’, and restart ‘A’:

gnatsd -D -p 4222 -cluster nats://localhost:6222 -routes nats://localhost:6222,nats://localhost:6333

We move on to upgrade ‘B’. When we stop it, notice that its clients reconnect to ‘A’ and ‘T’. We then upgrade and restart ‘B’:

gnatsd -D -p 4333 -cluster nats://localhost:6333 -routes nats://localhost:6222,nats://localhost:6333

If we had more servers, we would continue the stop, update, restart rotation as we did for ‘A’ and ‘B’. After restarting the last server, we can turn off ‘T’. Any clients on ‘T’ will redistribute to our permanent cluster members.
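The whole rotation can be summarized as a simulation. This is a hypothetical model, not a deployment script. Server names stand in for processes, version labels stand in for binaries, and clients of a stopped server move to a randomly chosen running server.

```python
import random


def rolling_upgrade(permanent, temporary, clients, rng):
    """Stop, update and restart each permanent server in turn, then retire
    the temporary server. `clients` maps client name -> current server."""
    running = set(permanent) | {temporary}    # 'T' has already joined
    versions = {s: "old" for s in permanent}

    def evacuate(server):
        running.discard(server)               # stop the server
        for name, current in clients.items():
            if current == server:             # its clients fail over
                clients[name] = rng.choice(sorted(running))

    for server in permanent:
        evacuate(server)
        versions[server] = "new"              # update the binary
        running.add(server)                   # restart it
    evacuate(temporary)                       # finally turn off 'T'
    return versions, clients


clients = {f"client{i}": "A" for i in range(10)}
versions, clients = rolling_upgrade(["A", "B"], "T", clients, random.Random(1))
print(versions)                            # {'A': 'new', 'B': 'new'}
print(set(clients.values()) <= {"A", "B"})  # True
```

Because ‘T’ stays up for the whole loop, there is always at least one running server for displaced clients to move to.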

Seed Servers

In the examples above, we started gnatsd specifying two cluster routes. It is possible to let the server gossip protocol discover the remaining routes and reduce the amount of configuration. For example, you could start A, B and C as follows:

A - Seed Server

gnatsd -D -p 4222 -cluster nats://localhost:6222

B

gnatsd -D -p 4333 -cluster nats://localhost:6333 -routes nats://localhost:6222

C

gnatsd -D -p 4444 -cluster nats://localhost:6444 -routes nats://localhost:6222

Once they connect to the ‘seed server’, they will learn about all the other servers and connect to each other, forming a full mesh.
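The effect of a seed server can be checked with a toy model of route gossip. The assumptions are loud ones: each new server dials only the seed, the seed tells it about every member already in the cluster, and the newcomer then dials those members as well. The real protocol may differ in which side dials which, but the end state is the same full mesh.

```python
from itertools import combinations


def form_mesh(seed, joiners):
    """Return the set of routes (unordered server pairs) after all joiners
    have contacted the seed and learned about the existing members."""
    members = [seed]
    routes = set()
    for server in joiners:
        for known in members:  # the seed, plus every member it gossips about
            routes.add(frozenset((server, known)))
        members.append(server)
    return routes


routes = form_mesh("A", ["B", "C"])
full_mesh = {frozenset(pair) for pair in combinations("ABC", 2)}
print(routes == full_mesh)  # True
```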