In Defence of Disabling Swap

Posted on 08 Mar 2021, tagged Linux, swap, memory, technology

Recently, I read an article, In Defence of Swap, that discusses memory swap in Linux. I have also run into some swap-related problems at work. So in this article, I’ll talk about swap and why we want to disable it.

In that article, the author recommends using swap because it makes memory reclamation more efficient. That is, the OS can swap out memory allocated by programs, so that more physical memory is available for file pages, which improves the cache hit rate. The author thinks a lot of people dislike swap because they don’t understand how it works. But I think it’s quite the opposite: most experienced people understand this, and that’s exactly one of the reasons they want to disable swap.
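To see this trade-off on a real machine, it helps to look at how much swap is in use next to how much memory the page cache holds. The following is a minimal sketch, assuming the Linux /proc/meminfo format (lines like `SwapTotal: 2097148 kB`). The helper names `parse_meminfo` and `swap_summary` are my own, not part of any library.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are reported in kB; a few fields (such as HugePages_Total)
    are plain counts with no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_summary(fields):
    """Summarize swap usage and page cache size from parsed meminfo fields."""
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return {
        "swap_enabled": total > 0,       # SwapTotal is 0 when no swap is configured
        "swap_used_kb": total - free,    # memory currently swapped out
        "page_cache_kb": fields.get("Cached", 0),
    }
```

On a Linux machine this can be fed with `open("/proc/meminfo").read()`. A node with swap disabled should report `swap_enabled` as `False`.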

The first and biggest reason for disabling swap is transparency. Programmers have expectations about data access latency when writing a program: accessing data in memory is fast; accessing it from a file is slower. When a program puts something into memory, it may be rarely used. But when it is needed, access is expected to be fast. When the program reads data from a file with swap disabled, it may be slower because there is less memory for the file cache, but that’s okay because it’s expected. So yes, swap can sometimes make memory usage more efficient, but that comes at the expense of stability. It’s probably okay for desktop users, but not for servers. If you really need efficiency, you need to manage the cache yourself. For example, if some data is rarely used and its latency doesn’t matter, the program can store it in a file or a database instead of handing control over to the OS, because the OS doesn’t really know which parts of memory are important: less frequently used memory is not necessarily less important memory. It’s much harder to optimize a program when memory management is a complex black box. Another advantage of managing memory yourself is that you can add metrics for cache misses and so on, so when there is a performance problem it’s easier to track down what’s happening.
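As a rough illustration of managing the cache yourself, here is a minimal sketch of an application-level cache that keeps a bounded number of entries in memory, writes the least recently used ones to files, and counts hits and misses. Everything in it is an assumption made for illustration: the class name `SpillCache`, the JSON file layout, and the requirement that keys are strings safe to use as file names.

```python
import json
import os
from collections import OrderedDict


class SpillCache:
    """Keep at most `capacity` entries in memory; spill older ones to files."""

    def __init__(self, capacity, directory):
        self.capacity = capacity
        self.directory = directory
        self.hot = OrderedDict()  # ordered from least to most recently used
        self.hits = 0             # lookups served from memory
        self.misses = 0           # lookups that had to go to disk

    def _path(self, key):
        # Assumes `key` is a string that is safe to use as a file name.
        return os.path.join(self.directory, f"{key}.json")

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Evict least recently used entries to disk once over capacity.
        while len(self.hot) > self.capacity:
            old_key, old_value = self.hot.popitem(last=False)
            with open(self._path(old_key), "w") as f:
                json.dump(old_value, f)

    def get(self, key):
        if key in self.hot:
            self.hits += 1
            self.hot.move_to_end(key)
            return self.hot[key]
        self.misses += 1
        try:
            with open(self._path(key)) as f:
                value = json.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None
        self.put(key, value)  # promote back into memory
        return value
```

The point is not the eviction policy but the visibility: `hits` and `misses` can be exported as metrics, and the slow path is an explicit file read in the program’s own code rather than a page fault hidden inside the OS.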

The second reason is the OOM killer. This reason applies when the system is under high load. Swap gets its bad reputation mostly from the very slow response when memory runs short, and that’s fair in my opinion. With swap disabled, a process that exhausts memory gets killed by the OOM killer instead of dragging the machine into swapping. Even on a desktop, it’s better to kill a program than to let the whole system become slow. For an online service, you may think it’s better to handle requests slowly than to go down completely. That’s true if your service has only one node, but that’s rarely the case for web services. It’s better to let the node go down, so you know something is wrong. If there is a memory leak, killing and restarting the service normally solves the problem, and you can fix the root cause later. If there are more requests than the service can handle, it’s better to detect that and scale up or limit requests accordingly. Both are better than running the node in an unknown state.
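Limiting requests can be as simple as capping the number of requests in flight and rejecting the rest quickly. Below is a minimal sketch under my own assumptions: the `InflightLimiter` class, the `handle` function, and the use of status code 503 for rejected requests are all hypothetical names and choices, not taken from any particular framework.

```python
import threading


class InflightLimiter:
    """Allow at most `max_inflight` concurrent requests; count rejections."""

    def __init__(self, max_inflight):
        self.max_inflight = max_inflight
        self.inflight = 0
        self.rejected = 0  # export this as a metric to know when to scale up
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:
            if self.inflight >= self.max_inflight:
                self.rejected += 1
                return False
            self.inflight += 1
            return True

    def release(self):
        with self._lock:
            self.inflight -= 1


def handle(limiter, work):
    """Run `work()` if there is capacity, otherwise fail fast."""
    if not limiter.try_acquire():
        return 503, None  # reject immediately instead of queueing
    try:
        return 200, work()
    finally:
        limiter.release()
```

A growing `rejected` counter is a clear signal that the service needs more capacity, which is easier to act on than a node that is silently getting slower.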

So in conclusion, whether a feature is good depends on real-world use cases rather than on how the designer imagines it will be used. There is a reason disabling a feature becomes normal practice for some use cases. Maybe it’s off topic, but as developers, we must always keep real-world use cases in mind instead of solving hypothetical problems.