There are two main reasons.
To give you a few examples (all obtained using 64-bit instances):
Testing your use case is trivial: use the `redis-benchmark` utility to generate random data sets, then check the space used with the `INFO memory` command.
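As a rough sketch, assuming a local instance on the default port, the check could look like this (the number of keys and the value size below are just example parameters):

```
# Load 1 million SET commands against random keys with 100-byte values:
redis-benchmark -t set -n 1000000 -r 1000000 -d 100

# Then ask Redis how much memory the resulting dataset takes:
redis-cli info memory | grep used_memory_human
```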
64-bit systems will use considerably more memory than 32-bit systems to store the same keys, especially if the keys and values are small. This is because pointers take 8 bytes in 64-bit systems. But of course the advantage is that you can have a lot of memory in 64-bit systems, so in order to run large Redis servers a 64-bit system is more or less required. The alternative is sharding.
In the past the Redis developers experimented with Virtual Memory and other systems in order to allow larger than RAM datasets, but after all we are very happy if we can do one thing well: data served from memory, disk used for storage. So for now there are no plans to create an on disk backend for Redis. Most of what Redis is, after all, is a direct result of its current design.
If your real problem is not the total RAM needed, but the fact that you need to split your data set into multiple Redis instances, please read the Partitioning page in this documentation for more info.
Recently Redis Labs, the company sponsoring Redis development, developed a "Redis on flash" solution that is able to use a mixed RAM/flash approach for larger data sets with a biased access pattern. You may check their offering for more information; however, this feature is not part of the open source Redis code base.
Yes, a common design pattern involves keeping very write-heavy small data in Redis (together with the data for which you need the Redis data structures to model your problem efficiently), and putting big blobs of data into an SQL or eventually consistent on-disk database. Similarly, Redis is sometimes used to keep in memory another copy of a subset of the same data stored in the on-disk database. This may look similar to caching, but it is actually a more advanced model, since normally the Redis dataset is updated together with the on-disk DB dataset, and not refreshed on cache misses.
If you can, use Redis 32-bit instances. Also make good use of small hashes, lists, sorted sets, and sets of integers, since Redis is able to represent those data types in a much more compact way in the special case of a few elements. There is more info in the Memory Optimization page.
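For instance, you can verify that a small aggregate value is using one of the compact encodings with the `OBJECT ENCODING` command; the exact encoding name depends on your Redis version, and the key names below are just examples:

```
redis-cli hset user:1000 name alice age 30
redis-cli object encoding user:1000   # small hash: "ziplist" (or "listpack" on recent versions)

redis-cli sadd small:ids 1 2 3
redis-cli object encoding small:ids   # set of a few integers: "intset"
```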
Redis will either be killed by the Linux kernel OOM killer, crash with an error, or start to slow down. With modern operating systems malloc() returning NULL is not common; usually the server will start swapping (if some swap space is configured), Redis performance will degrade, and you'll probably notice something is wrong.
Redis has built-in protections allowing the user to set a maximum limit on memory usage, using the `maxmemory` option in the configuration file to put a limit on the memory Redis can use. If this limit is reached, Redis will start to reply with an error to write commands (but will continue to accept read-only commands), or you can configure it to evict keys when the max memory limit is reached, in case you are using Redis for caching.
We have detailed documentation in case you plan to use Redis as an LRU cache.
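As a sketch of a cache-style setup, both the limit and the eviction policy can be set at runtime with `CONFIG SET`; the 100mb limit and the allkeys-lru policy below are just example values, and the same directives can also be placed in redis.conf:

```
# Cap Redis memory usage at 100 megabytes (example value):
redis-cli config set maxmemory 100mb

# Evict any key using an approximated LRU when the limit is reached
# (the default policy, noeviction, makes write commands fail instead):
redis-cli config set maxmemory-policy allkeys-lru
```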
The INFO command reports the amount of memory Redis is using, so you can write scripts that monitor your Redis servers and check for critical conditions before they are reached.
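A minimal monitoring sketch, assuming a local instance and a purely hypothetical threshold, might look like this:

```
#!/bin/sh
# Warn when used_memory goes above ~1 GB (hypothetical threshold).
LIMIT=1073741824
USED=$(redis-cli info memory | awk -F: '/^used_memory:/ {print $2}' | tr -d '\r')
if [ "$USED" -gt "$LIMIT" ]; then
    echo "WARNING: Redis is using $USED bytes of memory (limit $LIMIT)"
fi
```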
Short answer: echo 1 > /proc/sys/vm/overcommit_memory
:)
And now the long one:
The Redis background saving scheme relies on the copy-on-write semantics of fork in modern operating systems: Redis forks (creates a child process that is an exact copy of the parent). The child process dumps the DB to disk and finally exits. In theory the child should use as much memory as the parent, being a copy, but actually, thanks to the copy-on-write semantics implemented by most modern operating systems, the parent and child processes will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the `overcommit_memory` setting is set to zero the fork will fail unless there is as much free RAM as required to really duplicate all the parent's memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory the fork will fail.
Setting `overcommit_memory` to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
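A quick sketch of how to apply the setting, assuming a typical Linux distribution (some distributions prefer a drop-in file under /etc/sysctl.d/ instead of /etc/sysctl.conf):

```
# Apply the setting immediately (does not survive a reboot):
sudo sysctl vm.overcommit_memory=1

# Make it persistent across reboots:
echo "vm.overcommit_memory = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```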
A good source for understanding how Linux virtual memory works, and for other alternatives for `overcommit_memory` and `overcommit_ratio`, is the classic article from Red Hat Magazine, "Understanding Virtual Memory". Beware: that article had the 1 and 2 configuration values for `overcommit_memory` reversed; refer to the proc(5) man page for the correct meaning of the available values.
Yes, the Redis background saving process is always forked when the server is outside of the execution of a command, so every command that is reported to be atomic in RAM is also atomic from the point of view of the disk snapshot.
It's not very frequent that the CPU becomes your bottleneck with Redis, as usually Redis is either memory or network bound. For instance, when using pipelining, Redis running on an average Linux system can deliver as much as 1 million requests per second, so if your application mainly uses O(N) or O(log(N)) commands, it is hardly going to use too much CPU.
However, to maximize CPU usage you can start multiple instances of Redis in the same box and treat them as different servers. At some point a single box may not be enough anyway, so if you want to use multiple CPUs it is a good idea to start thinking about sharding early on.
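As a rough sketch, running two independent instances on the same box is just a matter of giving each one its own port and working directory (the ports and paths below are only examples):

```
# Create a data directory for each instance (example paths):
mkdir -p /var/lib/redis/6379 /var/lib/redis/6380

# Start two independent Redis servers on different ports:
redis-server --port 6379 --dir /var/lib/redis/6379 --daemonize yes
redis-server --port 6380 --dir /var/lib/redis/6380 --daemonize yes
```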
You can find more information about using multiple Redis instances in the Partitioning page.
However with Redis 4.0 we started to make Redis more threaded. For now this is limited to deleting objects in the background, and to blocking commands implemented via Redis modules. For the next releases, the plan is to make Redis more and more threaded.
Redis can handle up to 2^32 keys, and was tested in practice to handle at least 250 million keys per instance.
Every hash, list, set, and sorted set can hold 2^32 elements.
In other words, your limit is likely the available memory in your system.
If you use keys with limited time to live (Redis expires) this is normal behavior. This is what happens: the RDB file that the master generates on the first synchronization with a replica does not include keys that are already logically expired. However, such keys remain in the master's memory until they are reclaimed, and while they are no longer considered part of the dataset, they are still advertised in the `INFO` output and by the `DBSIZE` command. As a result of this, it is common for users with many keys with an expire set to see fewer keys in the replicas because of this artifact, but there is no actual logical difference in the content of the instances.
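If you want to verify that this is what you are seeing, a quick check might look like the following (host names and ports are just examples); the master will typically report a slightly higher count until the expired keys are actually reclaimed:

```
# Compare the advertised key count on the master and on a replica:
redis-cli -h master.example.com -p 6379 dbsize
redis-cli -h replica.example.com -p 6379 dbsize
```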
It means REmote DIctionary Server.
Originally Redis was started in order to scale LLOOGG. But after I got the basic server working I liked the idea of sharing the work with other people, and Redis was turned into an open source project.
It's "red" like the color, then "iss".