Wondering About Redis Architecture

John Obelenus
2 min read · Jan 30, 2019

I learned something a little strange about Redis behavior when set up in a cluster. First, a little backstory.

The builders of Redis made a smart decision to insulate application engineers from having to worry about which node in the cluster actually contains the key they are trying to look up. On top of nodes they created the concept of a “slot”. There are a finite number of slots available to a Redis cluster: 16384, or 2 to the 14th power. Redis uses a hashing function (CRC16 of the key, modulo 16384) to determine how to partition your data, that is, which slot your key will end up in. And every slot is always assigned to some node, so the whole slot-space is always in use.
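To make the key-to-slot mapping concrete, here is a minimal sketch in Python. It assumes the CRC16 variant described in the Redis Cluster specification (XMODEM: polynomial 0x1021, initial value 0) and deliberately ignores the specification's hash-tag rule for keys containing `{...}`. The function names are mine, not part of any Redis client.

```python
TOTAL_SLOTS = 16384  # 2 ** 14


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8  # feed the next byte into the high bits
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 slots (hash tags are not handled here)."""
    return crc16_xmodem(key.encode("utf-8")) % TOTAL_SLOTS


for key in ("foo", "bar", "hello"):
    print(key, "->", key_slot(key))
```

If the assumptions hold, the numbers printed here should match what `CLUSTER KEYSLOT <key>` reports on a real cluster for the same keys.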

Once you decide how many nodes to put in the cluster, the slot-space is divided among those nodes. So, if you had two nodes, Node1 would contain slots 0~8191 and Node2 would contain slots 8192~16383. If you had 3 it would divide by 3, etc. When you add or remove nodes, slots get re-partitioned across the new set of nodes (in practice this is a resharding step you run with the cluster tooling, rather than something that happens on its own). While that happens your keys start flying around, re-mapping to different nodes while they remain in the same slot.
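The arithmetic of an even split can be sketched as follows. This is an illustration only: `slot_ranges` and `node_index` are hypothetical helpers, and the boundaries picked by real cluster tooling may differ by a slot or so when 16384 does not divide evenly.

```python
TOTAL_SLOTS = 16384


def slot_ranges(node_count: int) -> list[tuple[int, int]]:
    """Split slots 0..16383 into contiguous, near-equal (first, last) ranges."""
    if node_count < 1:
        raise ValueError("need at least one node")
    bounds = [i * TOTAL_SLOTS // node_count for i in range(node_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(node_count)]


def node_index(slot: int, node_count: int) -> int:
    """Return the zero-based index of the node whose range contains the slot."""
    for index, (first, last) in enumerate(slot_ranges(node_count)):
        if first <= slot <= last:
            return index
    raise ValueError(f"slot {slot} is outside 0..{TOTAL_SLOTS - 1}")


print(slot_ranges(2))  # [(0, 8191), (8192, 16383)]
print(slot_ranges(3))

# The slot number itself never changes; only its owning node does.
slot = 12182
print(node_index(slot, 2), node_index(slot, 3))
```

The last line shows the point of the indirection: a key's slot is fixed by its hash, while the node that owns that slot depends on how the slot-space is currently partitioned.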

Up until now every decision makes perfect sense. This is a good architectural decision. But there is one more wrinkle. There are commands in Redis that operate on multiple keys, such as MGET or SUNION. It makes perfect sense to me that you cannot use these commands on keys that live on different nodes; that would never work. But you also cannot use these commands on keys that are in different slots, even if they are on the same node. And that I just do not understand.
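Here is a small sketch of the situation that puzzles me: two keys that land on the same node but in different slots. It assumes the two-node layout described earlier, uses Python's `binascii.crc_hqx` with an initial value of 0 as the CRC16, and skips hash-tag handling entirely. `can_run_multi_key` is a made-up name that only mimics the rule; it is not how Redis implements the check.

```python
from binascii import crc_hqx

TOTAL_SLOTS = 16384

# Two-node layout from the example above (an assumption for this sketch).
NODE_RANGES = {"Node1": (0, 8191), "Node2": (8192, 16383)}


def key_slot(key: str) -> int:
    """CRC16 of the key modulo 16384 (no hash-tag handling)."""
    return crc_hqx(key.encode("utf-8"), 0) % TOTAL_SLOTS


def node_for_key(key: str) -> str:
    """Name of the node whose slot range contains the key's slot."""
    slot = key_slot(key)
    for node, (first, last) in NODE_RANGES.items():
        if first <= slot <= last:
            return node
    raise ValueError(f"no node owns slot {slot}")


def can_run_multi_key(keys: list[str]) -> bool:
    """Mimic the cluster rule: every key must hash to the same slot."""
    return len({key_slot(key) for key in keys}) == 1


keys = ["hello", "bar"]
for key in keys:
    print(key, "slot", key_slot(key), "on", node_for_key(key))

# Same node, different slots: a real cluster rejects this with a
# "CROSSSLOT Keys in request don't hash to the same slot" error.
print(can_run_multi_key(keys))
```

Both keys sit on Node1 in this layout, yet the check fails because their slots differ, which is exactly the behavior described above.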

Without cluster mode, every key would still hash to some slot, but there is only one node, so slots never come into play. Redis knows it has everything and can do any multi-key operation you want, no problem. But once you turn cluster mode on, it stops working. And each Redis node knows exactly which slots it's responsible for. So it ought to know which keys are on the node, even in different slots. And it ought to know it can perform that operation, just like in a non-cluster.

If you know why this limitation exists I’d love to hear it, because working around this issue involves some large trade-offs.
